# troubleshooting
j
Hi! I am looking into pulling CSV data from an S3 bucket, but am having issues because the files have .csv.gz extensions. I have tried using the extractors tap-s3-csv, tap-s3, and tap-spreadsheets-anywhere.
The error is:
time=2023-11-30 11:32:17 name=tap_s3_csv level=INFO message=Will download key "home/100105408/STL-AFP-20231130.csv.gz" as it was last modified 2023-11-30 13:31:20+00:00
time=2023-11-30 11:32:17 name=tap_s3_csv level=INFO message=Sampling home/100105408/STL-AFP-20231126.csv.gz (max records: 1000, sample rate: 5)     
time=2023-11-30 11:32:18 name=tap_s3_csv level=CRITICAL message='utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
e
The smart_open library used by tap-spreadsheets-anywhere should be correctly handling the compressed file, which makes me think it's an encoding issue with the decompressed file. You can try using the encoding setting of the tap if you have a guess as to which encoding the file actually has.
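To narrow this down outside the tap, a small local check can help. The sketch below is not part of any of the taps mentioned; the function name decode_maybe_gzipped and the candidate encoding list are made up for illustration. It tests whether the downloaded bytes still start with the gzip magic number (0x1f 0x8b), decompresses them if so, and then tries a few encodings in order. Note that byte 0x8b at position 1 is the second byte of that magic number, so the error in the log above may mean the raw compressed bytes were being decoded as UTF-8 rather than the decompressed content.

```python
import gzip

# The first two bytes of every gzip stream.
GZIP_MAGIC = b"\x1f\x8b"


def decode_maybe_gzipped(raw, encodings=("utf-8", "latin-1")):
    """Return (text, encoding_used, was_gzipped) for a downloaded file body.

    `encodings` is a hypothetical list of guesses; put your best guess first.
    latin-1 accepts any byte sequence, so it acts as a last-resort fallback.
    """
    was_gzipped = raw[:2] == GZIP_MAGIC
    data = gzip.decompress(raw) if was_gzipped else raw
    for enc in encodings:
        try:
            return data.decode(enc), enc, was_gzipped
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")
```

Running this on a file downloaded from the bucket (read in binary mode) shows whether the content is still compressed and which of the guessed encodings decodes it; that encoding is then a reasonable value to try in the tap's encoding setting.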
j
Just did a quick search and that looks about right. I will do some testing. Thank you!