# troubleshooting
c
I'm trying to run `tap-spreadsheets-anywhere` for the first time and can't get past this:
```
smartopen/elt » meltano invoke tap-spreadsheets-anywhere
2022-10-04T09:27:47.017417Z [info     ] Environment 'dev' is active
INFO Using supplied catalog /home/cwegener/Scratch/smartopen/elt/.meltano/run/tap-spreadsheets-anywhere/tap.properties.json.
INFO Processing 0 selected streams from Catalog
```
No streams are being returned. What am I doing wrong?
`meltano.yml`:
```yaml
version: 1
default_environment: dev
project_id: ab328611-7e9e-41ea-9408-c3fe3a8deb80
environments:
- name: dev
- name: staging
- name: prod
plugins:
  extractors:
  - name: tap-spreadsheets-anywhere
    variant: ets
    pip_url: git+https://github.com/ets/tap-spreadsheets-anywhere.git azure-storage-blob azure-common azure-core
    namespace: tap_spreadsheets_anywhere
    executable: tap-spreadsheets-anywhere
    capabilities:
    - catalog
    - discover
    - state
    config:
      tables:
      - name: "target_table_name"
        path: "file:///tmp/folderwithfiles"
        pattern: "tenants.*"
        start_date: "2017-05-01T00:00:00Z"
        key_properties: []
        format: "json"
        selected: true
```
r
We use this tap and supply the config as a JSON string (through the `TAP_SPREADSHEETS_ANYWHERE_TABLES` env var), which seems to work.
```yaml
config:
  tables: '[{"name":"target_table_name","path":"file:///tmp/folderwithfiles","pattern":"tenants.*","start_date":"2017-05-01T00:00:00Z","key_properties":[],"format":"json","selected":true}]'
```
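Hand-writing that JSON inside quotes is easy to get wrong. A minimal Python sketch that generates the value with `json.dumps`, assuming (as described above) that `TAP_SPREADSHEETS_ANYWHERE_TABLES` takes a JSON array of table specs; the spec values are copied from the example config:

```python
import json

# Table spec copied from the example config; adjust path/pattern to your files.
tables = [
    {
        "name": "target_table_name",
        "path": "file:///tmp/folderwithfiles",
        "pattern": "tenants.*",
        "start_date": "2017-05-01T00:00:00Z",
        "key_properties": [],
        "format": "json",
        "selected": True,
    }
]

# Compact separators produce the same single-line form as the string above.
env_value = json.dumps(tables, separators=(",", ":"))

if __name__ == "__main__":
    # Single quotes are safe here only because the JSON contains none.
    print(f"export TAP_SPREADSHEETS_ANYWHERE_TABLES='{env_value}'")
```

Running it prints an `export` line you can paste into a shell or adapt for a `.env` file.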
c
That doesn't seem to pass the config schema validation ...
```
voluptuous.error.MultipleInvalid: expected a list for dictionary value @ data['tables']
```
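The message suggests the quoted value reaches the tap's validator as a plain string, while the schema expects `tables` to be a list (which is what the native YAML list in the first `meltano.yml` provides). A stdlib-only illustration of that type mismatch; `check_tables` is a hypothetical stand-in, not the tap's actual voluptuous schema:

```python
import json

# The quoted value from meltano.yml, as the tap would see it if passed through unparsed.
raw_tables = '[{"name":"target_table_name","path":"file:///tmp/folderwithfiles","pattern":"tenants.*","start_date":"2017-05-01T00:00:00Z","key_properties":[],"format":"json","selected":true}]'


def check_tables(value):
    """Hypothetical stand-in for the schema rule: `tables` must be a list."""
    if not isinstance(value, list):
        raise TypeError(f"expected a list for 'tables', got {type(value).__name__}")
    return value


try:
    check_tables(raw_tables)  # a str is rejected, mirroring the voluptuous error
except TypeError as err:
    print(err)

# Decoding the JSON first yields the list the schema wants.
tables = check_tables(json.loads(raw_tables))
print(len(tables), tables[0]["name"])
```

In `meltano.yml` itself, the native YAML list form avoids the mismatch entirely.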
Ah. The tap's error message gets swallowed by `meltano invoke`; running the tap directly shows the actual error:
```
INFO Generating catalog through sampling.
INFO Walking /tmp/folderwithfiles.
INFO Found 1 files.
ERROR Unable to write Catalog entry for 'table' - it will be skipped due to error nothing to repeat at position 0
INFO Processing 0 selected streams from Catalog
```
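Running the tap directly can be scripted so its log output is always visible. A minimal sketch, assuming the tap follows the usual Singer CLI convention (`--config <file> --discover`) and that the `tap-spreadsheets-anywhere` executable is on `PATH` (for example by activating the Meltano-managed venv under `.meltano/extractors/tap-spreadsheets-anywhere/venv`); the `config.json` file name and the helper names are hypothetical:

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path

# Same table spec as in meltano.yml; the tap reads it from a JSON config file.
CONFIG = {
    "tables": [
        {
            "name": "target_table_name",
            "path": "file:///tmp/folderwithfiles",
            "pattern": "tenants.*",
            "start_date": "2017-05-01T00:00:00Z",
            "key_properties": [],
            "format": "json",
        }
    ]
}


def build_discover_command(executable: str, config_path: Path) -> list[str]:
    """Assumed Singer-style invocation: discovery mode with a config file."""
    return [executable, "--config", str(config_path), "--discover"]


def run_discovery(executable: str = "tap-spreadsheets-anywhere") -> int:
    with tempfile.TemporaryDirectory() as tmp:
        config_path = Path(tmp) / "config.json"  # hypothetical file name
        config_path.write_text(json.dumps(CONFIG))
        result = subprocess.run(
            build_discover_command(executable, config_path),
            capture_output=True,
            text=True,
        )
    # Singer taps conventionally log to stderr and write the catalog to stdout.
    sys.stderr.write(result.stderr)
    print(result.stdout)
    return result.returncode


if __name__ == "__main__":
    sys.exit(run_discovery())
```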
The tap raises `re.error: nothing to repeat at position 0` here: https://github.com/ets/tap-spreadsheets-anywhere/blob/5d9115985d3f9e7a568c6dcc68975f0c038253ff/tap_spreadsheets_anywhere/__init__.py#L64
```
(Pdb) ll .
 59     def discover(config):
 60         streams = []
 61         for table_spec in config['tables']:
 62             try:
 63                 import pdb;pdb.set_trace()
 64                 modified_since = dateutil.parser.parse(table_spec['start_date'])
 65  ->             target_files = file_utils.get_matching_objects(table_spec, modified_since)
 66                 sample_rate = table_spec.get('sample_rate',5)
 67                 max_sampling_read = table_spec.get('max_sampling_read', 1000)
 68                 max_sampled_files = table_spec.get('max_sampled_files', 50)
 69                 samples = file_utils.sample_files(table_spec, target_files,sample_rate=sample_rate,
 70                                                   max_records=max_sampling_read, max_files=max_sampled_files)
 71                 schema = generate_schema(table_spec, samples)
 72                 stream_metadata = []
 73                 key_properties = table_spec.get('key_properties', [])
 74                 streams.append(
 75                     CatalogEntry(
 76                         tap_stream_id=table_spec['name'],
 77                         stream=table_spec['name'],
 78                         schema=schema,
 79                         key_properties=key_properties,
 80                         metadata=stream_metadata,
 81                         replication_key=None,
 82                         is_view=None,
 83                         database=None,
 84                         table=None,
 85                         row_count=None,
 86                         stream_alias=None,
 87                         replication_method=None,
 88                     )
 89                 )
 90             except Exception as err:
 91                 LOGGER.error(f"Unable to write Catalog entry for '{table_spec['name']}' - it will be skipped due to error {err}")
 92
 93         return Catalog(streams)
(Pdb) n
INFO Walking /tmp/folderwithfiles.
INFO Found 1 files.
re.error: nothing to repeat at position 0
> /home/cwegener/Scratch/smartopen/.meltano/extractors/tap-spreadsheets-anywhere/venv/lib/python3.10/site-packages/tap_spreadsheets_anywhere/__init__.py(65)discover()
-> target_files = file_utils.get_matching_objects(table_spec, modified_since)
```
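The listing also explains the terse log line: `discover()` catches every `Exception` and logs only `str(err)`, so the traceback is dropped. If you patch the installed tap locally while debugging, `logger.exception` keeps the same skip-and-continue behaviour but also records the traceback. A minimal, self-contained sketch of that pattern; `discover_tables` and `build_entry` are hypothetical names, not the tap's code:

```python
import logging
import re

logging.basicConfig(level=logging.INFO)
LOGGER = logging.getLogger("discover-sketch")


def build_entry(table_spec):
    """Hypothetical stand-in for the per-table discovery work."""
    re.compile(table_spec["pattern"])  # raises re.error for an invalid regex
    return table_spec["name"]


def discover_tables(tables):
    entries = []
    for table_spec in tables:
        try:
            entries.append(build_entry(table_spec))
        except Exception:
            # Still skips the bad table, but LOGGER.exception logs at ERROR
            # level and appends the full traceback to the message.
            LOGGER.exception("Unable to write Catalog entry for '%s'", table_spec["name"])
    return entries


if __name__ == "__main__":
    print(discover_tables([
        {"name": "good", "pattern": "test.*"},
        {"name": "bad", "pattern": "*.test"},
    ]))
```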
Ah. That was due to a different problem: I changed the `pattern` from `test.*` to `*.test`, which caused the exception. `pattern` is treated as a regular expression, and a leading `*` has nothing to repeat. Anyway, I now have a way forward for troubleshooting: run the tap directly, without Meltano, to see its error messages (gotta remember that as a troubleshooting step).
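The failure reproduces with Python's `re` module alone, so a pattern can be checked before running the tap. A minimal sketch; `check_pattern` is a hypothetical helper, and matching with `re.search` against sample file names is for illustration only, not a claim about exactly how the tap applies `pattern`:

```python
import fnmatch
import re


def check_pattern(pattern):
    """Return None if the pattern compiles as a regex, else the re.error message."""
    try:
        re.compile(pattern)
    except re.error as err:
        return str(err)
    return None


print(check_pattern("test.*"))  # None: valid regex
print(check_pattern("*.test"))  # nothing to repeat at position 0

# If "*.test" was meant as a shell glob, translate it into an equivalent regex.
glob_as_regex = fnmatch.translate("*.test")
print(bool(re.search(glob_as_regex, "tenants.test")))  # True
print(bool(re.search(glob_as_regex, "tenants.json")))  # False
```

Written by hand, a roughly equivalent regex is `.*\.test$`.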