# troubleshooting
d
hey y’all - we have an elt job that got interrupted mid-run and now seems to be stuck in “running” state according to the db. All subsequent jobs have failed with the message “Another pipeline is already running which started at <timestamp>. To ignore this check use the ‘--force’ option.” I am aware of --force but was also wondering if there is some sort of job timeout where meltano will proceed with running an elt job after a previous job has been running for x time.
if not, what’s the suggested remediation path here instead of needing to use --force on all subsequent runs?
t
We need to revisit this specifically soon: https://gitlab.com/meltano/meltano/-/issues/2812. Anything you can share in that issue would be appreciated 🙏
d
yeah, this is exactly the issue. tbh I would expect some sort of CLI functionality that lets you interact with runs in the database. A lot of other frameworks have something similar. Dagster comes to mind (example)
t
https://gitlab.com/meltano/meltano/-/issues/2754 is an issue specifically around state too which might interest you
d
what’s weird is that I do sometimes see these messages in the logs for subsequent runs:
[2021-10-01 23:43:28,411] [14|MainThread|meltano.core.job.stale_job_failer] [INFO] Marked stale run that started at 2021-10-01 23:10:29.859352 as failed: No heartbeat recorded for 5 minutes. The process was likely killed unceremoniously.
not totally sure why that doesn’t seem to happen all the time though
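for anyone following along, here is a rough sketch of what a heartbeat-based stale-run check like the one in that log line could look like. This is not meltano's actual implementation: the `runs` table, its columns, the `FAIL` state value and the `fail_stale_runs` function are all hypothetical names made up for illustration, and only the 5 minute threshold comes from the log message. It also assumes a run with no heartbeat at all is judged by its start time, which may not match what meltano does.

```python
import sqlite3
from datetime import datetime, timedelta

# Threshold taken from the log message ("No heartbeat recorded for 5 minutes").
HEARTBEAT_TIMEOUT = timedelta(minutes=5)


def ts(dt: datetime) -> str:
    """Fixed-width ISO timestamp so string comparison matches time order."""
    return dt.isoformat(timespec="microseconds")


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical table, not meltano's real system database schema.
    conn.execute(
        """
        CREATE TABLE runs (
            id INTEGER PRIMARY KEY,
            state TEXT NOT NULL,
            started_at TEXT NOT NULL,
            last_heartbeat_at TEXT
        )
        """
    )


def fail_stale_runs(
    conn: sqlite3.Connection,
    now: datetime,
    timeout: timedelta = HEARTBEAT_TIMEOUT,
) -> int:
    """Mark RUNNING runs with no recent heartbeat as FAIL; return how many."""
    cutoff = ts(now - timeout)
    cur = conn.execute(
        """
        UPDATE runs
        SET state = 'FAIL'
        WHERE state = 'RUNNING'
          -- Assumption: fall back to the start time if no heartbeat exists.
          AND COALESCE(last_heartbeat_at, started_at) < ?
        """,
        (cutoff,),
    )
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    now = datetime(2021, 10, 1, 23, 43, 28)
    rows = [
        # Interrupted run: last heartbeat is far older than the timeout.
        ("RUNNING", ts(datetime(2021, 10, 1, 23, 10, 29)), ts(datetime(2021, 10, 1, 23, 15, 0))),
        # Healthy run: heartbeat recorded within the last 5 minutes.
        ("RUNNING", ts(datetime(2021, 10, 1, 23, 40, 0)), ts(datetime(2021, 10, 1, 23, 42, 0))),
    ]
    conn.executemany(
        "INSERT INTO runs (state, started_at, last_heartbeat_at) VALUES (?, ?, ?)",
        rows,
    )
    print(fail_stale_runs(conn, now))  # prints 1: only the interrupted run is failed
```

the point of the sketch is that a check like this only clears a stuck run when something actually calls it, and only for runs whose last heartbeat is older than the cutoff.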