# troubleshooting
j
Hi guys, I have a machine pulling historical data from tap-shopify to target-bigquery, and I'm using the embedded Airflow to orchestrate it. Sometimes the data pull just hangs and I can't see any error in the logs. Has anyone faced this problem before? This leads to my next question: where does Meltano store the state file? How does it know where to continue the data pull from if I need to restart the machine?
e
Hi @jose_ribeiro!
the data pull just hangs and I can't see any log error
Can you run
meltano --log-level=debug elt tap-shopify target-bigquery
to see where it's hanging?
Where does Meltano store the state file? How does it know where to continue the data pull from if I need to restart the machine?
Meltano uses its system database for that. If you're not explicitly declaring it, it's just the local .meltano/meltano.db, so depending on your environment, you may want to make that an external database. See also https://meltano.com/docs/integration.html#incremental-replication-state.
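If you want to peek at what is stored, here is a minimal sketch that reads the default SQLite file with Python's standard library. It assumes the system database has a table named job with job_id, ended_at and payload columns, and that payload holds the state as JSON text. That layout is an assumption and may differ between Meltano versions, so check your own schema first. The latest_state helper is a hypothetical name, not part of Meltano.

```python
import json
import sqlite3
from typing import Optional


def latest_state(db_path: str, job_id: str) -> Optional[dict]:
    """Return the most recently stored payload for job_id, or None if there is none.

    Assumption: a `job` table with `job_id`, `ended_at` and `payload` columns,
    where `payload` is JSON text. Verify this against your Meltano version.
    """
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute(
            "SELECT payload FROM job "
            "WHERE job_id = ? AND payload IS NOT NULL "
            "ORDER BY ended_at DESC LIMIT 1",
            (job_id,),
        ).fetchone()
    finally:
        conn.close()
    # No matching row means nothing has been saved for this job ID yet.
    return json.loads(row[0]) if row else None
```

For example, latest_state(".meltano/meltano.db", "my-job-id") would return the stored dictionary for a job ID of your choosing, where "my-job-id" is a placeholder.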
j
Hi Edgar, many thanks for guiding me through this! Looks like I'm getting this error sometimes:
No heartbeat recorded for 5 minutes. The process was likely killed unceremoniously
e
Hmm, looks like the pipeline is not running, so Meltano is killing the job. Can you check if
meltano invoke tap-shopify
works ok?
j
Yes, it works! I'm running a few machines using the embedded Airflow to orchestrate the data pulls, and sometimes I get this error.
e
So it seems like either Shopify or BigQuery has intermittent issues. In that case, I think Meltano is doing the sane thing and killing the elt job after 5 minutes of inactivity. It should be OK to resume with --job_id :)
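A rough sketch of how the resume could be automated, in case it helps: re-run the same elt command with a fixed job ID until it succeeds, and give up on an attempt if it runs past a time limit. The helper names, the retry count, the wait time and the timeout are all illustrative assumptions, not Meltano features. The only Meltano pieces used are the elt command and the --job_id option mentioned above.

```python
import subprocess
import time
from typing import Callable, List


def build_elt_command(job_id: str) -> List[str]:
    # Same extractor and loader as in this thread. Reusing one job ID means
    # every attempt refers to the state saved under that ID.
    return ["meltano", "elt", "tap-shopify", "target-bigquery", f"--job_id={job_id}"]


def run_once(cmd: List[str], timeout_seconds: float) -> bool:
    # subprocess.run kills the child process and raises TimeoutExpired once
    # the timeout passes, so a hung run counts as a failed attempt.
    try:
        return subprocess.run(cmd, timeout=timeout_seconds).returncode == 0
    except subprocess.TimeoutExpired:
        return False


def run_with_retries(
    attempt: Callable[[], bool],
    max_attempts: int = 3,
    wait_seconds: float = 60.0,
) -> bool:
    # Call attempt() until it reports success or the attempts run out.
    for n in range(1, max_attempts + 1):
        if attempt():
            return True
        if n < max_attempts:
            time.sleep(wait_seconds)  # give the remote API time to recover
    return False
```

You could call it as run_with_retries(lambda: run_once(build_elt_command("shopify-to-bigquery"), 3600)), where the job ID and the one-hour limit are placeholders. Each retry should pick up from whatever state was last saved for that job ID.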
j
hey Edgar! Thanks again for helping me on that! Actually, I've been doing that using the crontab to check