# troubleshooting
n
Hi everyone - my team’s run into an issue that’s got us mystified, so I’m calling in the experts 🙂 We have a pipeline combining tap-typeform and pipelinewise-target-redshift. I’m able to run it just fine both on my local machine and in production. We’re trying to set up another staff member to run meltano jobs on her local machine, and for some reason we can’t get it to run successfully. We’ve confirmed we’ve got the same meltano.yml, correct credentials in `.env`, and the same version of meltano (1.80.1). She’s able to run jobs from other taps fine as well. Importantly, the job is not erroring out either - it’s “completing successfully”, but it’s not loading any data, nor is there any state information getting written back to the `job` table in the meltano db (which on both our local machines is sqlite, but postgres in prod). Any thoughts on where we might look next?
d
A few things to try:
• Run `meltano --log-level=debug elt ...` so that we get additional debug output, see if any state file is being picked up, and check whether any messages are making it out of the tap
• Verify that the `database_uri` points at `.meltano/meltano.db` in `meltano config meltano`
Also, can you share your `meltano.yml` definition for `tap-typeform` so I can verify it’s correct?
n
Yep - though I don’t think it’s the `meltano.yml`, since we both have the same one and it’s running fine on my end
I’ve passed that along and will report back!
Confirmed correct `database_uri`
In the meantime, here’s the relevant chunk of meltano.yml:
```
- name: tap-typeform
  namespace: tap_typeform
  pip_url: tap-typeform
  executable: tap-typeform
  capabilities:
    - discover
    - catalog
    - state
  config:
    token: $TYPEFORM_API_KEY

    # NOTE: Unless a new use case comes to light and this comment is removed, all typeform
    # ingest should happen via manual/local meltano commands instead of automatic
    # jobs run out of airflow. See the Dataverse Superuser Guide for more details.

    # Only try to load one form at a time!
    forms: "REDACTED"

    # Figure out the earliest response for your form and set the start date accordingly
    start_date: "2021-03-12T00:00:00Z"

    incremental_range: "daily"
  select:
  - answers.answer
  - answers.data_type
  - answers.landing_id
  - answers.question_id
  - answers.ref
  - answers.type

  - landings.browser
  - landings.hidden
  - landings.landed_at
  - landings.landing_id
  - landings.network_id
  - landings.platform
  - landings.referer
  - landings.submitted_at
  - landings.token
  - landings.user_agent

  - questions.form_id
  - questions.question_id
  - questions.ref
  - questions.title
```
Good call on turning on debug messages - through that, we can see that it can’t find `tap.properties.json`, `tap.properties.cache_key`, or `state.json` in `meltano/.meltano/extractors/tap-typeform`
I’d thought that those got created automatically when installing/running a tap for the first time, but I realize that might not be the case? Is there a particular command I can run to do that manually?
d
Not being able to find `state.json` is expected if there’s no state yet in the empty DB. `tap.properties.json` should definitely be there, since it should be generated by the `tap --discover` that runs as part of `meltano elt`
On your machine, can `tap.properties.json` be found?
If there’s no catalog/properties file, the tap probably isn’t pulling data from any stream. It should still create a record in the `job` table though, so it’s odd that that’s missing
s
Is she accessing the database as a different user in Redshift (i.e. her own)? Maybe she doesn't have the appropriate redshift privileges for the table?
n
Sorry - long string of meetings there! Some updates:
1.) I have both `tap.properties.json` and `tap.properties.cache_key` on my machine, as expected
2.) When she runs the tap, it does create a record in the `job` table, as expected, but that record does not contain a value in the `payload` or `payload_flags` column
3.) She is accessing Redshift as a different user account, so it’s possible that could be wrapped up in there somehow. We audited her permissions at the start of all this, so I don’t think that’s the issue, but it’s certainly worth double-checking
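Point 2 can be checked directly against the local system database. The following is a minimal Python sketch, not an official Meltano tool, that reads the newest rows of the `job` table from the SQLite file at `.meltano/meltano.db` (the path shown in the debug logs in this thread) and flags rows whose `payload` is empty. The helper names `recent_jobs`, `is_empty_payload`, and `jobs_missing_payload` are hypothetical; ordering by SQLite's implicit `rowid` assumes rows are inserted in run order, and treating `NULL`, an empty string, and `{}` as "empty" is an assumption about how an unset payload is stored.

```python
import json
import sqlite3
from pathlib import Path


def recent_jobs(db_path, limit=5):
    """Return the newest rows of the `job` table as dicts (newest first).

    Assumes a SQLite system database containing a table named `job`;
    rows are ordered by SQLite's implicit rowid, i.e. insertion order.
    """
    path = Path(db_path)
    if not path.exists():
        # sqlite3.connect() would otherwise silently create an empty database.
        raise FileNotFoundError(path)
    conn = sqlite3.connect(str(path))
    conn.row_factory = sqlite3.Row  # lets each row be converted to a dict
    try:
        rows = conn.execute(
            "SELECT * FROM job ORDER BY rowid DESC LIMIT ?", (limit,)
        ).fetchall()
        return [dict(row) for row in rows]
    finally:
        conn.close()


def is_empty_payload(value):
    """Treat NULL, an empty string, or an empty JSON object as "no payload"."""
    if value is None or value == "":
        return True
    if isinstance(value, str):
        try:
            return json.loads(value) in ({}, None)
        except json.JSONDecodeError:
            return False
    return False


def jobs_missing_payload(rows):
    """Keep only the rows whose `payload` column is empty."""
    return [row for row in rows if is_empty_payload(row.get("payload"))]
```

For example, running `jobs_missing_payload(recent_jobs(".meltano/meltano.db"))` from the project root on both machines makes it easy to compare what each run recorded.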
d
@nick_muller In the debug output she’s seeing, are you seeing any indication that `<tap> --discover` is run as part of `meltano elt`? Sharing the entire output in a snippet may be easiest
n
Importantly though, the tap doesn’t return any results in the first place (I assume because of something related to the missing properties file), and since the tap wouldn’t care about her Redshift permissions, I suspect that’s unlikely to be the issue
@douwe_maan, yep - I see that in there. Relevant line:
```
meltano                            | DEBUG Invoking: ['/Users/cassiehudson/git/meltano/.meltano/extractors/tap-typeform/venv/bin/tap-typeform', '--config', '/Users/cassiehudson/git/meltano/.meltano/run/elt/typeform_to_redshift/8cabfb65-f249-47ac-8ef3-53a94be8722f/tap.693a2128-10e7-47c9-b04d-2a6209a8087f.config.json', '--discover']
```
d
@nick_hamlin OK, good. It’s confusing, then, that that’s not resulting in a `tap.properties.json` file being created. Can she run `meltano invoke tap-typeform --discover` and see if that raises any errors, or generates a valid catalog file?
n
Thanks! It’s reassuring to know that I’m not totally losing my mind 🙂 I’ll ask her to give that a shot and will report back
Looks like it’s generating something sensible?
```
gg-C02CV0FWMD6R:meltano cassiehudson$ meltano invoke tap-typeform --discover
{"streams": [{"tap_stream_id": "landings", "key_properties": ["landing_id"], "schema": {"properties": {"landing_id": {"selected": true, "type": ["null", "string"]}, "token": {"selected": true, "type": ["null", "string"]}, "landed_at": {"selected": true, "format": "date-time", "type": ["null", "string"]}, "submitted_at": {"selected": true, "format": "date-time", "type": ["null", "string"]}, "user_agent": {"selected": true, "type": ["null", "string"]}, "platform": {"selected": true, "type": ["null", "string"]}, "referer": {"selected": true, "type": ["null", "string"]}, "network_id": {"selected": true, "type": ["null", "string"]}, "browser": {"selected": true, "type": ["null", "string"]}, "hidden": {"selected": true, "type": ["null", "string"]}}, "selected": true, "type": ["null", "object"], "additionalProperties": false}, "stream": "landings", "metadata": [{"metadata": {"inclusion": "automatic"}, "breadcrumb": ["properties", "landing_id"]}, {"metadata": {"inclusion": "available"}, "breadcrumb": ["properties", "token"]}, {"metadata": {"inclusion": "available"}, "breadcrumb": ["properties", "landed_at"]}, {"metadata": {"inclusion": "available"}, "breadcrumb": ["properties", "submitted_at"]}, {"metadata": {"inclusion": "available"}, "breadcrumb": ["properties", "user_agent"]}, {"metadata": {"inclusion": "available"}, "breadcrumb": ["properties", "platform"]}, {"metadata": {"inclusion": "available"}, "breadcrumb": ["properties", "referer"]}, {"metadata": {"inclusion": "available"}, "breadcrumb": ["properties", "network_id"]}, {"metadata": {"inclusion": "available"}, "breadcrumb": ["properties", "browser"]}, {"metadata": {"inclusion": "available"}, "breadcrumb": ["properties", "hidden"]}]}, {"tap_stream_id": "answers", "key_properties": ["landing_id", "question_id"], "schema": {"properties": {"landing_id": {"selected": true, "type": ["null", "string"]}, "question_id": {"selected": true, "type": ["null", "string"]}, "type": {"selected": true, "type": ["null", "string"]}, "ref": {"selected": true, "type": ["null", "string"]}, "data_type": {"selected": true, "type": ["null", "string"]}, "answer": {"selected": true, "type": ["null", "string"]}}, "selected": true, "type": ["null", "object"], "additionalProperties": false}, "stream": "answers", "metadata": [{"metadata": {"inclusion": "automatic"}, "breadcrumb": ["properties", "landing_id"]}, {"metadata": {"inclusion": "automatic"}, "breadcrumb": ["properties", "question_id"]}, {"metadata": {"inclusion": "available"}, "breadcrumb": ["properties", "type"]}, {"metadata": {"inclusion": "available"}, "breadcrumb": ["properties", "ref"]}, {"metadata": {"inclusion": "available"}, "breadcrumb": ["properties", "data_type"]}, {"metadata": {"inclusion": "available"}, "breadcrumb": ["properties", "answer"]}]}, {"tap_stream_id": "questions", "key_properties": ["form_id", "question_id"], "schema": {"properties": {"form_id": {"selected": true, "type": ["null", "string"]}, "question_id": {"selected": true, "type": ["null", "string"]}, "title": {"selected": true, "type": ["null", "string"]}, "ref": {"selected": true, "type": ["null", "string"]}}, "selected": true, "type": ["null", "object"], "additionalProperties": false}, "stream": "questions", "metadata": [{"metadata": {"inclusion": "automatic"}, "breadcrumb": ["properties", "form_id"]}, {"metadata": {"inclusion": "automatic"}, "breadcrumb": ["properties", "question_id"]}, {"metadata": {"inclusion": "available"}, "breadcrumb": ["properties", "title"]}, {"metadata": {"inclusion": "available"}, "breadcrumb": ["properties", "ref"]}]}]}
```
d
@nick_hamlin Yeah, that looks right. Can you share a few lines of log output after https://meltano.slack.com/archives/C01TCRBBJD7/p1631649736218000?thread_ts=1631637898.213900&cid=C01TCRBBJD7 ?
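To compare discovery output between machines at a glance, the catalog can be reduced to just the streams and properties marked as selected. This is a minimal Python sketch, assuming the catalog has been saved to a file; `load_catalog` and `selected_streams` are hypothetical helpers. It checks both the legacy `"selected": true` flag inside `schema` (which is what appears in the output above) and a `"selected": true` entry in the stream's `metadata` list, since the Singer spec places selection in metadata.

```python
import json


def load_catalog(path):
    """Read a Singer catalog (e.g. a saved tap.properties.json) from disk."""
    with open(path) as fh:
        return json.load(fh)


def selected_streams(catalog):
    """Map each selected stream's id to the names of its selected properties.

    A stream or property counts as selected if its schema carries
    "selected": true (legacy style, as in the output above) or if the
    metadata entry for its breadcrumb does.
    """
    result = {}
    for stream in catalog.get("streams", []):
        # Index metadata by breadcrumb: () is the stream, ("properties", name) a field.
        meta = {
            tuple(entry.get("breadcrumb", [])): entry.get("metadata", {})
            for entry in stream.get("metadata", [])
        }
        schema = stream.get("schema", {})
        if not (schema.get("selected") or meta.get((), {}).get("selected")):
            continue
        props = [
            name
            for name, prop in schema.get("properties", {}).items()
            if prop.get("selected")
            or meta.get(("properties", name), {}).get("selected")
        ]
        result[stream.get("tap_stream_id", stream.get("stream"))] = props
    return result
```

For the discovery output above, every stream and property carries `"selected": true` in its schema, so `landings`, `answers`, and `questions` would all be listed with all of their properties.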
n
Yep - here’s everything prior to the step where it echoes out all the env variables:
```
[2021-09-15 08:42:12,085] [5802|MainThread|root] [DEBUG] Creating engine <meltano.core.project.Project object at 0x112cadb38>@sqlite:////Users/cassiehudson/git/meltano/.meltano/meltano.db
[2021-09-15 08:42:12,114] [5802|MainThread|asyncio] [DEBUG] Using selector: KqueueSelector
meltano | DEBUG Variable '$MELTANO_LOAD_SCHEMA' is missing from the environment.
meltano                            | INFO Running extract & load...
meltano                            | DEBUG Created configuration at /Users/cassiehudson/git/meltano/.meltano/run/elt/typeform_to_redshift/683befa1-e9b7-423b-9234-dc9b43f90dfe/tap.50e36044-c631-4a52-9dca-72d5d27ee93d.config.json
meltano                            | DEBUG Could not find tap.properties.json in /Users/cassiehudson/git/meltano/.meltano/extractors/tap-typeform/tap.properties.json, skipping.
meltano                            | DEBUG Could not find tap.properties.cache_key in /Users/cassiehudson/git/meltano/.meltano/extractors/tap-typeform/tap.properties.cache_key, skipping.
meltano                            | DEBUG Could not find state.json in /Users/cassiehudson/git/meltano/.meltano/extractors/tap-typeform/state.json, skipping.
meltano                            | DEBUG Variable '$MELTANO_LOAD_SCHEMA' is missing from the environment.
meltano                            | DEBUG Variable '$MELTANO_LOAD_SCHEMA' is missing from the environment.
meltano                            | DEBUG Created configuration at /Users/cassiehudson/git/meltano/.meltano/run/elt/typeform_to_redshift/683befa1-e9b7-423b-9234-dc9b43f90dfe/target.536a3012-206e-496f-b09e-3aed176ccefa.config.json
meltano                            | INFO Performing full refresh, ignoring state left behind by any previous runs.
meltano                            | DEBUG Invoking: ['/Users/cassiehudson/git/meltano/.meltano/extractors/tap-typeform/venv/bin/tap-typeform', '--config', '/Users/cassiehudson/git/meltano/.meltano/run/elt/typeform_to_redshift/683befa1-e9b7-423b-9234-dc9b43f90dfe/tap.50e36044-c631-4a52-9dca-72d5d27ee93d.config.json', '--discover']
```
Something else that stands out to me is `| DEBUG Variable '$MELTANO_LOAD_SCHEMA' is missing from the environment.`
I think that’s one that meltano sets automatically?
Either way - here’s what comes immediately after (skipping the env vars line):
```
meltano                            | DEBUG Visiting CatalogNode.STREAM at '.streams[0]'.
meltano                            | DEBUG Setting '.streams[0].selected' to 'False'
meltano                            | DEBUG Setting '.streams[0].selected' to 'True'
meltano                            | DEBUG Setting '.streams[0].selected' to 'True'
meltano                            | DEBUG Setting '.streams[0].selected' to 'True'
meltano                            | DEBUG Setting '.streams[0].selected' to 'True'
meltano                            | DEBUG Setting '.streams[0].selected' to 'True'
meltano                            | DEBUG Setting '.streams[0].selected' to 'True'
meltano                            | DEBUG Setting '.streams[0].selected' to 'True'
meltano                            | DEBUG Setting '.streams[0].selected' to 'True'
meltano                            | DEBUG Setting '.streams[0].selected' to 'True'
meltano                            | DEBUG Setting '.streams[0].selected' to 'True'
meltano                            | DEBUG Skipping node at '.streams[0].tap_stream_id'
meltano                            | DEBUG Skipping node at '.streams[0].key_properties[0]'
meltano                            | DEBUG Visiting CatalogNode.PROPERTY at '.streams[0].schema.properties.landing_id'.
meltano                            | DEBUG Skipping node at '.streams[0].schema.properties.landing_id.selected'
meltano                            | DEBUG Visiting CatalogNode.PROPERTY at '.streams[0].schema.properties.token'.
meltano                            | DEBUG Skipping node at '.streams[0].schema.properties.token.selected'
```
d
The `MELTANO_LOAD_SCHEMA` warning is not an issue. Those `DEBUG Visiting` lines indicate that Meltano is seeing the discovered catalog file and applying your selection rules, which is a good sign. But then it sounds like the catalog file is disappearing before it’s passed to the tap 😕
Can you show the line saying that `tap.properties.json` is missing?
One thing we’re just noticing is that those JSON files DO exist for her in `meltano/.meltano/run/tap_typeform` (this is also where they exist on my instance). EDIT - I now think this is a red herring. Mine does this too
But, per those logs, meltano is looking for them in `/meltano/.meltano/extractors/tap-typeform/`
d
@nick_hamlin Do you see a second `DEBUG Invoking:` line without `--discover`, for the actual sync run?
n
Sure do:
```
meltano                            | DEBUG Invoking: ['/Users/cassiehudson/git/meltano/.meltano/extractors/tap-typeform/venv/bin/tap-typeform', '--config', '/Users/cassiehudson/git/meltano/.meltano/run/elt/typeform_to_redshift/683befa1-e9b7-423b-9234-dc9b43f90dfe/tap.50e36044-c631-4a52-9dca-72d5d27ee93d.config.json', '--catalog', '/Users/cassiehudson/git/meltano/.meltano/run/elt/typeform_to_redshift/683befa1-e9b7-423b-9234-dc9b43f90dfe/tap.properties.json']
```
d
And you’re saying that `/Users/cassiehudson/git/meltano/.meltano/run/elt/typeform_to_redshift/683befa1-e9b7-423b-9234-dc9b43f90dfe/tap.properties.json` actually exists? Does it look the same on her machine as it does on yours?
All this time I was thinking that the properties file wasn’t being generated at all, and the tap was being invoked without it, but it looks like it’s actually there 😅
The “missing” warning is a red herring, because that takes place before discovery even runs
n
Yeah, I thought the same thing!
But it is definitely there, and (as far as we can tell) looks the same as mine
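Since the question here is whether the `tap.properties.json` passed via `--catalog` really is identical on both machines, a plain text diff can be noisy if keys happen to be serialized in a different order. Here is a minimal Python sketch that compares two parsed JSON files structurally and reports each differing path; `json_diff` and `diff_files` are hypothetical helper names. Object key order is ignored, list order is not, and a key or list item present on only one side is reported with `None` for the missing side.

```python
import json


def json_diff(a, b, path="$"):
    """Yield (path, left, right) for every place two parsed JSON values differ."""
    if isinstance(a, dict) and isinstance(b, dict):
        for key in sorted(set(a) | set(b)):
            if key not in a or key not in b:
                # Key exists on only one side.
                yield f"{path}.{key}", a.get(key), b.get(key)
            else:
                yield from json_diff(a[key], b[key], f"{path}.{key}")
    elif isinstance(a, list) and isinstance(b, list):
        for i in range(max(len(a), len(b))):
            if i >= len(a) or i >= len(b):
                # One list is longer; report the extra item.
                left = a[i] if i < len(a) else None
                right = b[i] if i < len(b) else None
                yield f"{path}[{i}]", left, right
            else:
                yield from json_diff(a[i], b[i], f"{path}[{i}]")
    elif a != b:
        yield path, a, b


def diff_files(left_path, right_path):
    """Structurally diff two JSON files, e.g. two copies of tap.properties.json."""
    with open(left_path) as lf, open(right_path) as rf:
        return list(json_diff(json.load(lf), json.load(rf)))
```

For example, `diff_files("mine/tap.properties.json", "hers/tap.properties.json")` (placeholder paths for copies of each machine's file) returns an empty list when the two catalogs match.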
d
OK, so the tap is being invoked in the same way with the same catalog and config file, but on your machine it’s finding records, and on hers it isn’t. To confirm: when running in debug mode on your machine, you’re seeing lines with the `tap-typeform (out)` prefix and a `RECORD` message type, but on hers you aren’t?
n
yep, exactly
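The `RECORD` check can also be done outside Meltano's log output. Per the Singer spec, a tap writes one JSON message per line to stdout, each with a `type` field such as `SCHEMA`, `RECORD`, or `STATE`. Below is a minimal Python sketch that tallies those types from captured tap output; `count_message_types` is a hypothetical helper, and how the output is captured (for example, redirecting the tap's stdout to a file) is left to your setup.

```python
import json
from collections import Counter


def count_message_types(lines):
    """Tally the Singer message `type` of each JSON line (RECORD, SCHEMA, STATE, ...).

    Blank lines are skipped; lines that are not JSON objects count as "UNPARSEABLE".
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            counts["UNPARSEABLE"] += 1
            continue
        if isinstance(message, dict):
            counts[message.get("type", "UNKNOWN")] += 1
        else:
            counts["UNPARSEABLE"] += 1
    return counts
```

Passing an open file works directly, e.g. `with open("tap_output.jsonl") as fh: print(count_message_types(fh))` (placeholder file name). A tally with `SCHEMA` messages but zero `RECORD` messages would be consistent with a tap that ran and announced its streams but emitted no rows.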
d
Are you comfortable putting debug print statements inside the tap itself? On the Meltano side everything appears to look the same, so I think we have to dig into the tap and figure out where the different behavior arises
n
yeah, we can certainly do that
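One thing worth keeping in mind when adding those print statements: a Singer tap writes its messages to stdout, so a bare `print()` inside the tap would inject non-JSON lines into the stream the target reads. Here is a minimal sketch of a hypothetical `debug` helper that writes to stderr instead, so the extra lines should show up among the tap's log output rather than being handed to the target.

```python
import json
import sys


def debug(label, value):
    """Write a debug line to stderr so it never mixes with Singer messages on stdout."""
    # default=str keeps non-JSON-serializable values (datetimes, etc.) printable.
    print(f"DEBUG {label}: {json.dumps(value, default=str)}", file=sys.stderr, flush=True)
```

For example, calling `debug("forms", config.get("forms"))` near the start of the sync would show which form IDs the tap actually received; `config` stands for whatever variable holds the parsed config in the tap's source, which may be named differently.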
OK - making progress! I’m pretty sure it’s related to a bug in a more recent version of tap-typeform. Two quick things to validate that’ll help me pin it down:
1.) Is there a sanctioned way to determine the exact version of a tap that meltano is using? Presumably I could treat it like any other python package, but curious if there’s a “meltano official” alternative
2.) I think this is true based on my read of the docs, but wanted to confirm: if I pin a tap to a particular version via its `pip_url` in meltano.yml and then run `meltano install extractor tap-typeform --clean`, presumably that should allow me to jump between versions as needed. Do I have that correct?
s
I think the simplest way is to just anchor the `pip_url` to a specific version: https://meltano.com/docs/plugin-management.html#pinning-a-plugin-to-a-specific-version
n
Yep, that’s exactly what I’m planning to do - I just need to figure out which is the right version to anchor there and make sure that everyone is on the same one
d
We don’t have a way yet to show the installed version, but there’s an issue for that: https://gitlab.com/meltano/meltano/-/issues/2337
Right now, the trick would be to go into the installation directory and find the version identifier there
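The debug logs above show each plugin installed into its own virtualenv under `.meltano/extractors/<name>/venv`, so the version identifier mentioned here lives in that venv's `*.dist-info` metadata. A minimal sketch that reads it with the standard library's `importlib.metadata`; `venv_package_version` is a hypothetical helper, and the `lib/python*/site-packages` layout is an assumption that holds for POSIX virtualenvs (Windows venvs use `Lib/site-packages`).

```python
import re
from importlib import metadata
from pathlib import Path


def _normalize(name):
    # PEP 503 normalization so "tap_typeform" and "tap-typeform" compare equal.
    return re.sub(r"[-_.]+", "-", name).lower()


def venv_package_version(venv_dir, distribution):
    """Return the version of `distribution` installed in a virtualenv, or None.

    Searches every lib/python*/site-packages directory under `venv_dir`
    without needing to run that venv's own interpreter.
    """
    site_dirs = [str(p) for p in Path(venv_dir).glob("lib/python*/site-packages")]
    if not site_dirs:
        return None
    wanted = _normalize(distribution)
    for dist in metadata.distributions(path=site_dirs):
        name = dist.metadata.get("Name")
        if name and _normalize(name) == wanted:
            return dist.version
    return None
```

For example, `venv_package_version(".meltano/extractors/tap-typeform/venv", "tap-typeform")` from the project root should return the version pip installed, which makes it easy to confirm that everyone landed on the same pinned release after reinstalling.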
n
Alright - success! Confirmed that the issue was a bug in a recent release of tap-typeform (which, I think, remains unresolved in the most recent version - I didn’t dig deeply into this - so heads up for anyone else using this tap!). Pinning to 1.3.0 and reinstalling allows things to run as normal again. Thanks everyone for your support in digging through this!
d
@nick_hamlin Awesome! Did you file an issue on the tap?
n
not yet, but I plan to
Closing loops - I’ve filed an issue 🙂
Another update on this one - something VERY similar just happened with tap-zendesk. Thanks to this thread, it was much easier to troubleshoot on my end and I’ve filed another issue (https://github.com/singer-io/tap-zendesk/issues/73). Seems like there might be a wave of changes on the Singer side? Figured I’d flag here for others in case your implementations are similarly affected