christoph
10/04/2022, 9:29 AM
tap-spreadsheets-anywhere for the first time and can't get past this ...
smartopen/elt » meltano invoke tap-spreadsheets-anywhere
2022-10-04T09:27:47.017417Z [info ] Environment 'dev' is active
INFO Using supplied catalog /home/cwegener/Scratch/smartopen/elt/.meltano/run/tap-spreadsheets-anywhere/tap.properties.json.
INFO Processing 0 selected streams from Catalog
No streams are being returned. What am I doing wrong?
christoph
10/04/2022, 9:30 AM
version: 1
default_environment: dev
project_id: ab328611-7e9e-41ea-9408-c3fe3a8deb80
environments:
- name: dev
- name: staging
- name: prod
plugins:
  extractors:
  - name: tap-spreadsheets-anywhere
    variant: ets
    pip_url: git+https://github.com/ets/tap-spreadsheets-anywhere.git azure-storage-blob azure-common azure-core
    namespace: tap_spreadsheets_anywhere
    executable: tap-spreadsheets-anywhere
    capabilities:
    - catalog
    - discover
    - state
    config:
      tables:
      - name: "target_table_name"
        path: "file:///tmp/folderwithfiles"
        pattern: "tenants.*"
        start_date: "2017-05-01T00:00:00Z"
        key_properties: []
        format: "json"
        selected: true
Reuben (Matatika)
10/04/2022, 9:44 AM
TAP_SPREADSHEETS_ANYWHERE_TABLES env var), which seems to work.
config:
  tables: '[{"name":"target_table_name","path":"file:///tmp/folderwithfiles","pattern":"tenants.*","start_date":"2017-05-01T00:00:00Z","key_properties":[],"format":"json","selected":true}]'
christoph
10/04/2022, 10:56 AM
voluptuous.error.MultipleInvalid: expected a list for dictionary value @ data['tables']
christoph
10/04/2022, 11:00 AM
INFO Generating catalog through sampling.
INFO Walking /tmp/folderwithfiles.
INFO Found 1 files.
ERROR Unable to write Catalog entry for 'table' - it will be skipped due to error nothing to repeat at position 0
INFO Processing 0 selected streams from Catalog
christoph
10/04/2022, 11:05 AM
re.error: nothing to repeat at position 0
(Pdb) ll .
 59     def discover(config):
 60         streams = []
 61         for table_spec in config['tables']:
 62             try:
 63                 import pdb;pdb.set_trace()
 64                 modified_since = dateutil.parser.parse(table_spec['start_date'])
 65  ->             target_files = file_utils.get_matching_objects(table_spec, modified_since)
 66                 sample_rate = table_spec.get('sample_rate',5)
 67                 max_sampling_read = table_spec.get('max_sampling_read', 1000)
 68                 max_sampled_files = table_spec.get('max_sampled_files', 50)
 69                 samples = file_utils.sample_files(table_spec, target_files,sample_rate=sample_rate,
 70                                                   max_records=max_sampling_read, max_files=max_sampled_files)
 71                 schema = generate_schema(table_spec, samples)
 72                 stream_metadata = []
 73                 key_properties = table_spec.get('key_properties', [])
 74                 streams.append(
 75                     CatalogEntry(
 76                         tap_stream_id=table_spec['name'],
 77                         stream=table_spec['name'],
 78                         schema=schema,
 79                         key_properties=key_properties,
 80                         metadata=stream_metadata,
 81                         replication_key=None,
 82                         is_view=None,
 83                         database=None,
 84                         table=None,
 85                         row_count=None,
 86                         stream_alias=None,
 87                         replication_method=None,
 88                     )
 89                 )
 90             except Exception as err:
 91                 LOGGER.error(f"Unable to write Catalog entry for '{table_spec['name']}' - it will be skipped due to error {err}")
 92
 93         return Catalog(streams)
(Pdb) n
INFO Walking /tmp/folderwithfiles.
INFO Found 1 files.
re.error: nothing to repeat at position 0
> /home/cwegener/Scratch/smartopen/.meltano/extractors/tap-spreadsheets-anywhere/venv/lib/python3.10/site-packages/tap_spreadsheets_anywhere/__init__.py(65)discover()
-> target_files = file_utils.get_matching_objects(table_spec, modified_since)
christoph
10/04/2022, 11:11 AM
pattern from test.* to *.test which caused the exception.
Anyway. I have a way forward now for troubleshooting, which is to directly run the tap without meltano in order to see the error messages (gotta remember that as a troubleshooting step)
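For anyone hitting the same thing: the traceback suggests the tap compiles `pattern` as a Python regular expression, so a glob-style value like `*.test` fails in `re.compile` with exactly this "nothing to repeat at position 0" message. Below is a rough pre-flight check, not part of the tap or Meltano. `check_tables` is a made-up helper name, and accepting `tables` as either a list or a JSON string is an assumption based on the two config styles in this thread.

```python
import fnmatch
import json
import re


def check_tables(tables):
    """Hypothetical helper: return (table_name, problem) tuples; an empty list means no problems found."""
    # Accept a real list or a JSON-encoded string (as in the env-var-style config above).
    if isinstance(tables, str):
        tables = json.loads(tables)
    if not isinstance(tables, list):
        return [("<tables>", "expected a list of table specs")]

    problems = []
    for spec in tables:
        name = spec.get("name", "<unnamed>")
        pattern = spec.get("pattern", "")
        try:
            # Assumption: the tap treats 'pattern' as a regex, as the re.error in the traceback implies.
            re.compile(pattern)
        except re.error as err:
            # fnmatch.translate turns a shell-style glob into an equivalent regex string.
            hint = fnmatch.translate(pattern)
            problems.append(
                (name, f"invalid regex {pattern!r}: {err}; as a glob it would translate to {hint!r}")
            )
    return problems


good = [{"name": "target_table_name", "pattern": "tenants.*"}]
bad = '[{"name": "table", "pattern": "*.test"}]'

print(check_tables(good))
print(check_tables(bad))
```

If the intent was "anything ending in .test", a regex such as `.*\.test` is probably what the config needs rather than the glob form.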