Nick McMahon
10/15/2024, 6:35 AM
tap-spreadsheets-anywhere but I'm facing an error with it that I suspect is related to the file compression
WARNING unable to transparently decompress <_io.BufferedReader name=7> because it seems to lack a string-like .name
ERROR Unable to write Catalog entry for 'myfeed' - it will be skipped due to error underlying stream is not seekable
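The two log lines above describe a stream whose `.name` is a file descriptor number rather than a path, and which cannot be rewound. A minimal sketch of that situation, using the read end of an OS pipe as a stand-in for the FTP stream (an assumption; the real stream comes from the tap's file library), and a hypothetical `make_seekable` helper that buffers the bytes in memory so they can be read twice:

```python
import io
import os


def describe_stream(stream):
    """Report the two properties the log messages complain about."""
    return {
        "name_is_str": isinstance(getattr(stream, "name", None), str),
        "seekable": stream.seekable(),
    }


def make_seekable(stream):
    """Hypothetical helper: copy the whole stream into memory so it can be rewound."""
    return io.BytesIO(stream.read())


# Stand-in for a network stream: a pipe has an integer .name and cannot seek.
read_fd, write_fd = os.pipe()
with os.fdopen(write_fd, "wb") as writer:
    writer.write(b"1|Widget|SKU-1|Tools\n")

with os.fdopen(read_fd, "rb") as reader:
    info = describe_stream(reader)
    buffered = make_seekable(reader)

first_pass = buffered.read()   # e.g. sampling for schema inference
buffered.seek(0)               # rewinding works on the in-memory copy
second_pass = buffered.read()  # e.g. the actual sync
```

Buffering trades memory for seekability, so it only suits files that fit comfortably in RAM.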
Nick McMahon
10/15/2024, 6:38 AM
Andy Carter
10/15/2024, 7:11 AM
meltano.yml config for the extractor?
Andy Carter
10/15/2024, 8:18 AM
Edgar Ramírez (Arch.dev)
10/15/2024, 2:38 PM
max_sampling_read: 0 and overriding schema would fix it.
The problem seems to be that the file can't be inspected for inferring a schema because the stream isn't seekable. I would hope the tap or FS library would then fall back to reopening the stream, but maybe it's not possible.
Nick McMahon
10/16/2024, 12:59 AM
- name: tap-spreadsheets-anywhere
  variant: ets
  pip_url: git+https://github.com/ets/tap-spreadsheets-anywhere.git
  config:
    tables:
    - path: ftp://myserver.com/
      max_sampling_read: 0
      name: myfeed
      pattern: myfile.txt.gz
      start_date: '2024-10-14T00:00:00Z'
      key_properties: []
      format: csv
      delimiter: "|"
      field_names:
      - id
      - title
      - sku
      - category
Nick McMahon
10/16/2024, 1:05 AM
2024-10-16 11:59:00 INFO Checking 354 resolved objects for any that match regular expression "myfile.txt.*" and were modified since 2024-10-14 00:00:00+00:00
2024-10-16 11:59:00 INFO Processing 1 resolved objects that met our criteria. Enable debug verbosity logging for more details.
2024-10-16 11:59:00 INFO Sampling myfile.txt.gz (0 records, every 5th record).
2024-10-16 11:59:00 WARNING unable to transparently decompress <_io.BufferedReader name=4> because it seems to lack a string-like .name
2024-10-16 11:59:00 ERROR Unable to write Catalog entry for 'myfeed' - it will be skipped due to error line contains NUL
2024-10-16 11:59:00 CRITICAL line contains NUL
2024-10-16 11:59:00 Traceback (most recent call last):
2024-10-16 11:59:00 File "/projects/.meltano/extractors/tap-spreadsheets-anywhere/venv/bin/tap-spreadsheets-anywhere", line 8, in <module>
2024-10-16 11:59:00 sys.exit(main())
2024-10-16 11:59:00 File "/projects/.meltano/extractors/tap-spreadsheets-anywhere/venv/lib/python3.9/site-packages/singer/utils.py", line 235, in wrapped
2024-10-16 11:59:00 return fnc(*args, **kwargs)
2024-10-16 11:59:00 File "/projects/.meltano/extractors/tap-spreadsheets-anywhere/venv/lib/python3.9/site-packages/tap_spreadsheets_anywhere/__init__.py", line 151, in main
2024-10-16 11:59:00 catalog = discover(tables_config)
2024-10-16 11:59:00 File "/projects/.meltano/extractors/tap-spreadsheets-anywhere/venv/lib/python3.9/site-packages/tap_spreadsheets_anywhere/__init__.py", line 92, in discover
2024-10-16 11:59:00 raise err
2024-10-16 11:59:00 File "/projects/.meltano/extractors/tap-spreadsheets-anywhere/venv/lib/python3.9/site-packages/tap_spreadsheets_anywhere/__init__.py", line 68, in discover
2024-10-16 11:59:00 samples = file_utils.sample_files(table_spec, target_files,sample_rate=sample_rate,
2024-10-16 11:59:00 File "/projects/.meltano/extractors/tap-spreadsheets-anywhere/venv/lib/python3.9/site-packages/tap_spreadsheets_anywhere/file_utils.py", line 111, in sample_files
2024-10-16 11:59:00 to_return += sample_file(table_spec, target_file['key'], sample_rate, max_records)
2024-10-16 11:59:00 File "/projects/.meltano/extractors/tap-spreadsheets-anywhere/venv/lib/python3.9/site-packages/tap_spreadsheets_anywhere/file_utils.py", line 87, in sample_file
2024-10-16 11:59:00 for row in iterator:
2024-10-16 11:59:00 File "/projects/.meltano/extractors/tap-spreadsheets-anywhere/venv/lib/python3.9/site-packages/tap_spreadsheets_anywhere/csv_handler.py", line 8, in generator_wrapper
2024-10-16 11:59:00 for row in reader:
2024-10-16 11:59:00 File "/usr/local/lib/python3.9/csv.py", line 111, in __next__
2024-10-16 11:59:00 row = next(self.reader)
2024-10-16 11:59:00 _csv.Error: line contains NUL
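The `line contains NUL` error is consistent with the CSV reader receiving bytes that are still gzip-compressed, since the preceding warning says decompression was skipped. A small sketch of why compressed data trips that check; the sample row and the `looks_gzipped` helper are hypothetical, and only the two-byte gzip magic number is taken from the format itself:

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member


def looks_gzipped(head: bytes) -> bool:
    """Return True when the leading bytes carry the gzip magic number."""
    return head[:2] == GZIP_MAGIC


# Hypothetical row in the same pipe-delimited layout as the config above.
row = b"1|Widget|SKU-1|Tools\n"

# mtime=0 keeps the header deterministic; its timestamp bytes are then all NUL.
compressed = gzip.compress(row, mtime=0)

plain_has_nul = b"\x00" in row
compressed_has_nul = b"\x00" in compressed
```

Checking the first two bytes of a downloaded file this way is a quick test of whether a reader is being fed raw or decompressed data.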
Edgar Ramírez (Arch.dev)
10/16/2024, 1:48 AM
Nick McMahon
10/16/2024, 2:08 AM
Nick McMahon
10/16/2024, 2:15 AM
The smart_open library seems to be failing to read the file name so it never decompresses the file
Nick McMahon
10/16/2024, 3:13 AM
smart_open handles FTP files
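One way around a stream with no usable `.name` is to choose the decompressor from the file name that was requested, not from the stream object. The sketch below uses only the standard library and is not how smart_open or the tap does it; `open_text` is a hypothetical helper and the pipe again stands in for the FTP stream. `gzip.GzipFile` reads forward only here, so the underlying stream does not need to be seekable:

```python
import csv
import gzip
import io
import os


def open_text(stream, filename, encoding="utf-8"):
    """Hypothetical helper: wrap a binary stream as text, gunzipping when the
    requested file name (not the stream's .name) ends in .gz."""
    if filename.endswith(".gz"):
        stream = gzip.GzipFile(fileobj=stream, mode="rb")
    return io.TextIOWrapper(stream, encoding=encoding, newline="")


# Stand-in for the remote file: gzip data arriving over a non-seekable pipe.
read_fd, write_fd = os.pipe()
with os.fdopen(write_fd, "wb") as writer:
    writer.write(gzip.compress(b"1|Widget|SKU-1|Tools\n"))

with os.fdopen(read_fd, "rb") as raw:
    with open_text(raw, "myfile.txt.gz") as text:
        rows = list(csv.reader(text, delimiter="|"))
```

This streams the data and avoids holding the whole file in memory, but it still allows only a single pass, so any schema sampling would have to be skipped or supplied up front.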