# troubleshooting
d
Hey all, has anyone encountered this error with tap-spreadsheets-anywhere?
Cannot start extractor: Catalog discovery failed: command ['/project/.meltano/extractors/tap-spreadsheets-anywhere/venv/bin/tap-spreadsheets-anywhere', '--config',
We run Meltano via Docker using the latest image, orchestrated by Airflow 2.0, and started seeing this error in the last few days. I am relatively new to Meltano and am stumped as to what to check next.
d
@dylan_accorti Is there anything more to the output? I’d expect to see an error message of some kind
d
Yeah apologies - the error log is fairly large. Does this provide enough detail or is there a specific portion I should look for?
[2021-09-10 14:45:36,320] {pod_launcher.py:149} INFO - INFO:root:ELT could not be completed: Cannot start extractor: Catalog discovery failed: command ['/project/.meltano/extractors/tap-spreadsheets-anywhere/venv/bin/tap-spreadsheets-anywhere', '--config', '/project/.meltano/run/elt/2021-09-10T144421--tap-spreadsheets-anywhere--target-bigquery/aee21ffc-534c-4366-9e17-8ce4208f9cd6/tap.554307c6-688e-4a35-9c93-7b313e73ea17.config.json', '--discover'] returned 1
[2021-09-10 14:45:36,325] {pod_launcher.py:149} INFO - 
[2021-09-10 14:45:36,619] {pod_launcher.py:149} INFO - INFO:root:Completed Meltano pipeline
[2021-09-10 14:45:38,642] {pod_launcher.py:198} INFO - Event: run-meltano-staging-ticket-bigquery.31b5814a9e494cadaef8da441656c333 had an event of type Failed
[2021-09-10 14:45:38,642] {pod_launcher.py:308} ERROR - Event with job id run-meltano-staging-ticket-bigquery.31b5814a9e494cadaef8da441656c333 Failed
[2021-09-10 14:45:38,648] {pod_launcher.py:198} INFO - Event: run-meltano-staging-ticket-bigquery.31b5814a9e494cadaef8da441656c333 had an event of type Failed
[2021-09-10 14:45:38,648] {pod_launcher.py:308} ERROR - Event with job id run-meltano-staging-ticket-bigquery.31b5814a9e494cadaef8da441656c333 Failed
[2021-09-10 14:45:38,702] {taskinstance.py:1503} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/opt/python3.8/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py", line 368, in execute
    raise AirflowException(f'Pod {self.pod.metadata.name} returned a failure: {remote_pod}')
d
This helps; I was hoping there’d be more after "Catalog discovery failed: command [...] returned 1", but if there’s not we’ll have to debug in a different direction
@florian.hines Could this be related to https://meltano.slack.com/archives/C01TCRBBJD7/p1631288005162300? I don’t know if it could cause discovery to fail
@dylan_accorti Are you able to run "meltano invoke tap-spreadsheets-anywhere --discover" in that same Docker environment / with your same Docker image? That may give us some more information
f
It does seem plausible that it's related; the bug that led to https://meltano.slack.com/archives/C01TCRBBJD7/p1631288005162300 was specifically in tap discovery.
d
@florian.hines Although "command X returned 1" suggests that the actual subprocess returned an exit code of 1, which wouldn’t be affected by our code
I wonder what discovery mode outputs when run by itself (as I suggested in my second-to-last message)
d
Thank you! Running this in docker environment now, will have an answer in a minute
This seems to work fine in the Docker environment:
root@0164621ca0c2:/project# meltano invoke tap-spreadsheets-anywhere --discover
INFO Found 16688 files.
INFO Checking 16688 resolved objects for any that match regular expression "data-warehouse/data-extraction/orange/timecard/timecard_20211404.csv*" and were modified since 2017-05-01 000000+00:00
INFO Processing 1 resolved objects that met our criteria. Enable debug verbosity logging for more details.
INFO Sampling data-warehouse/data-extraction/orange/timecard/timecard_20211404.csv (100000 records, every 5th record).
@florian.hines @douwe_maan - Apologies, I think I missed this initial suggestion, but the line that precedes the "ELT could not be completed" error does seem to relate to the error that Florian suggested:
[2021-09-10 14:45:36,156] {pod_launcher.py:149} INFO - INFO:root:TypeError: can't concat generator to bytes
[2021-09-10 14:45:36,156] {pod_launcher.py:149} INFO - 
[2021-09-10 14:45:36,159] {pod_launcher.py:149} INFO - INFO:root:meltano                   | ELT could not be completed: Cannot start extractor: Catalog discovery failed: command ['/project/.meltano/extractors/tap-spreadsheets-anywhere/venv/bin/tap-spreadsheets-anywhere', '--config', '/project/.meltano/run/elt/2021-09-10T144421--tap-spreadsheets-anywhere--target-bigquery/aee21ffc-534c-4366-9e17-8ce4208f9cd6/tap.554307c6-688e-4a35-9c93-7b313e73ea17.config.json', '--discover'] returned 1
f
Yea, although I’m a little worried that you’re bumping into two separate errors.
d
Yeah, I don’t think "can’t concat generator to bytes" would result in "Catalog discovery failed: command X returned 1", but if "meltano invoke tap-spreadsheets-anywhere --discover" by itself runs fine, I’m having trouble understanding where the exit code of 1 is coming from
Either way it’s probably a good idea to try out the new v1.80.1 release with the fix to the concat issue, to rule that out
f
Yea, a legitimate error running tap discovery would trigger that concat generator error on 1.80.0, so it's a little hard to tell
d
@dylan_accorti Which Python version are you using? I can get you a Docker image URL that contains the fix 🙂
Try pulling from registry.gitlab.com/meltano/meltano:f5774af62847b100f055600bd085a769f6f3c50f-python3.6, where 3.6 can be replaced with 3.7 or 3.8
d
Perfect, we are using 3.6.15
Update: the image you provided fixed the type error, but we are still seeing the same catalog discovery error as before
@eric_simmerman - Do you have any suggestions or recommendations with this error message?
ELT could not be completed: Cannot start extractor: Catalog discovery failed: command ['/project/.meltano/extractors/tap-spreadsheets-anywhere/venv/bin/tap-spreadsheets-anywhere', '--config', '/project/.meltano/run/elt/2021-09-12T041216--tap-spreadsheets-anywhere--target-bigquery/719e7209-a9b5-471a-8c45-716d761bbd74/tap.3b986729-0924-4455-b3f1-27cd19f872d6.config.json', '--discover'] returned 1
d
@dylan_accorti That still indicates that when Meltano runs "meltano invoke tap-spreadsheets-anywhere --discover", it gets an exit code of 1 (indicating an error). Can you confirm once more that running that directly succeeds without any error messages?
e
Chiming in late to agree with Douwe. Need to invoke the tap directly to try to diagnose root cause.
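A minimal sketch of that kind of direct invocation in Python, under stated assumptions: it runs a command as a subprocess and reports the exit code together with stderr, which is where the underlying error would normally appear. The names run_discovery and STAND_IN_COMMAND are hypothetical, and the stand-in command only simulates a failing tap; in a real project you would substitute the tap executable, the --config path, and the --discover flag from the error message.

```python
import subprocess
import sys

# Hypothetical stand-in for the tap command: a child Python process that
# writes an error to stderr and exits with code 1, like a failing discovery.
STAND_IN_COMMAND = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('simulated discovery failure\\n'); sys.exit(1)",
]


def run_discovery(command):
    """Run a command and return (exit_code, stdout, stderr) as text."""
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text; works on Python 3.6
    )
    return result.returncode, result.stdout, result.stderr


if __name__ == "__main__":
    exit_code, stdout, stderr = run_discovery(STAND_IN_COMMAND)
    print("exit code:", exit_code)
    print("stderr:", stderr.strip())
```

Capturing stderr separately matters here because the "returned 1" message on its own carries no detail about why the subprocess failed.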
w
I am seeing a very similar error. I am also running meltano via docker using the latest image and orchestrating via airflow 2.0! However, I am using a different tap.
ELT could not be completed: Cannot start extractor: Catalog discovery failed: command ['/project/.meltano/extractors/tap-postgres/venv/bin/tap-postgres', '--config', '/project/.meltano/run/elt/xdw_to_edw/f6b3457e-db6a-4614-8a36-e255cd33c8f3/tap.34633486-f760-4d6f-a00b-457d107fa6e8.config.json', '--discover'] returned 1
@dylan_accorti Did your team come to a resolution on this issue?
f
@wyatt Running with --log-level=debug might help surface an issue that is occurring during discovery.
w
@florian.hines Thanks - simple fix: the Postgres DB host on Google Cloud SQL had changed. Next up for me to solve: why the Snowflake loader is failing.
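Since the root cause in this last case was a changed database host, a quick reachability check from inside the same container can help rule that class of problem in or out. This is a minimal sketch, assuming plain TCP reachability is a useful first signal; can_connect and the example host are hypothetical, and a successful connection says nothing about credentials or TLS settings.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS resolution failures, refused connections, and timeouts.
        return False


if __name__ == "__main__":
    # Hypothetical values: replace with the host and port from the tap config.
    print(can_connect("db.example.invalid", 5432))
```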