hey y all we have an elt job that got interrupted mid run an Meltano #troubleshooting

david_wallace

10/01/2021, 7:40 PM

hey y’all - we have an elt job that got interrupted mid-run and now seems to be stuck in “running” state according to the db. All subsequent jobs have failed with the message “Another pipeline is already running which started at <timestamp>. To ignore this check use the ‘--force’ option.” I am aware of --force but was also wondering if there is some sort of job timeout where meltano will proceed with running an elt job after a previous job has been running for x time.

david_wallace

10/01/2021, 7:43 PM

if not, what’s the suggested remediation path here instead of needing to use --force on all subsequent runs?

taylor

10/01/2021, 9:51 PM

We need to revisit this specifically soon https://gitlab.com/meltano/meltano/-/issues/2812 anything you can share in that issue would be appreciated 🙏

david_wallace

10/01/2021, 9:55 PM

yeah, this is exactly the issue. tbh I would expect some sort of CLI functionality that lets you interact with runs in the database. A lot of other frameworks have something similar. Dagster comes to mind (example)

taylor

10/01/2021, 9:56 PM

https://gitlab.com/meltano/meltano/-/issues/2754 is an issue specifically around state too which might interest you

david_wallace

10/01/2021, 11:45 PM

what’s weird is that I do sometimes see these messages in the logs for subsequent runs:

[2021-10-01 23:43:28,411] [14|MainThread|meltano.core.job.stale_job_failer] [INFO] Marked stale run that started at 2021-10-01 23:10:29.859352 as failed: No heartbeat recorded for 5 minutes. The process was likely killed unceremoniously.

david_wallace

10/01/2021, 11:45 PM

not totally sure why that doesn’t seem to happen all the time though
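
For anyone reading along, here is a rough sketch of what a heartbeat-based stale check like the one in that log line could look like. This is not Meltano's actual code: the `Run` class, `is_stale`, `fail_stale_runs`, and the state names are all hypothetical, and the only detail taken from the thread is the 5-minute window. The handling of runs that never recorded a heartbeat is purely an assumption of this sketch, shown because a rule like that would be one way a stuck run slips past the check.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

# The "5 minutes" comes from the log message above; everything else is hypothetical.
HEARTBEAT_TIMEOUT = timedelta(minutes=5)


@dataclass
class Run:
    """Hypothetical stand-in for a row in the job/run table."""
    started_at: datetime
    last_heartbeat_at: Optional[datetime]
    state: str = "RUNNING"


def is_stale(run: Run, now: datetime, timeout: timedelta = HEARTBEAT_TIMEOUT) -> bool:
    """Return True if a running job has not sent a heartbeat within `timeout`."""
    if run.state != "RUNNING":
        return False
    if run.last_heartbeat_at is None:
        # Assumption for this sketch only: a run that never recorded a
        # heartbeat is not covered by the heartbeat rule.
        return False
    return now - run.last_heartbeat_at > timeout


def fail_stale_runs(runs: List[Run], now: datetime) -> List[Run]:
    """Mark every stale run as failed and return the runs that were changed."""
    failed = []
    for run in runs:
        if is_stale(run, now):
            run.state = "FAIL"
            failed.append(run)
    return failed


# Example using the timestamps from the log line (heartbeat times are made up).
now = datetime(2021, 10, 1, 23, 43, 28)
stuck = Run(datetime(2021, 10, 1, 23, 10, 29), datetime(2021, 10, 1, 23, 12, 0))
healthy = Run(datetime(2021, 10, 1, 23, 35, 0), datetime(2021, 10, 1, 23, 42, 0))
no_heartbeat = Run(datetime(2021, 10, 1, 23, 10, 29), None)

changed = fail_stale_runs([stuck, healthy, no_heartbeat], now)
```

In this sketch only `stuck` ends up marked as failed; `no_heartbeat` stays in the running state, which would keep blocking later runs unless something else cleans it up.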
