bulky-park-65916
01/29/2021, 3:37 PM
ripe-musician-59933
01/29/2021, 3:52 PM
`job_id`) from the `job` table in the system database, so that the next time the DAG runs, there will be no previous state, and they'll automatically start from scratch
2. Manually SSH into the machine and run `meltano schedule run <name> --full-refresh` for each schedule, but first pause the DAGs in the Airflow web UI to prevent them from accidentally running while the full refresh is running, which could result in duplicate records. Once we have https://gitlab.com/meltano/meltano/-/issues/2356, pausing shouldn't be necessary anymore
bulky-park-65916
01/29/2021, 3:56 PM
ripe-musician-59933
01/29/2021, 3:58 PM
`psql` or any graphical Postgres UI, you'll see a `job` table with a row for every schedule run, with a `job_id` corresponding to the schedule name, a unique `run_id`, and a `payload` containing (among other things) `singer_state`.
If you delete all records for a given `job_id` there, the next schedule run will not be able to find any old singer state and will start from scratch.
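A minimal sketch of that delete, written in Python against an in-memory SQLite table so it can be run safely. The demo table only mimics the three columns named above (`job_id`, `run_id`, `payload`); the real `job` table may have more. The `clear_job_state` helper and the schedule names are hypothetical, and against a Postgres system database the same `DELETE` would go through a Postgres driver instead.

```python
import sqlite3


def clear_job_state(conn: sqlite3.Connection, job_id: str) -> int:
    """Delete every recorded run for one schedule and return the row count."""
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute("DELETE FROM job WHERE job_id = ?", (job_id,))
    return cursor.rowcount


# Demo table: an assumption that copies only the columns mentioned in the thread.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job (job_id TEXT, run_id TEXT, payload TEXT)")
conn.executemany(
    "INSERT INTO job VALUES (?, ?, ?)",
    [
        ("example-schedule", "run-1", '{"singer_state": {}}'),
        ("example-schedule", "run-2", '{"singer_state": {}}'),
        ("other-schedule", "run-3", '{"singer_state": {}}'),
    ],
)

deleted = clear_job_state(conn, "example-schedule")
remaining = conn.execute("SELECT job_id FROM job").fetchall()
```

Scoping the `DELETE` by `job_id` leaves the state of other schedules untouched. As with the CLI option, it is probably safest to pause the DAGs before touching the table.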
If you'd like to fully refresh all schedules, you can delete all rows in that table
bulky-park-65916
01/29/2021, 4:36 PM
ripe-musician-59933
01/29/2021, 6:32 PM
bulky-park-65916
01/29/2021, 6:35 PM
ripe-musician-59933
01/29/2021, 6:36 PM
`job_id`, does it not pick up state?
bulky-park-65916
01/29/2021, 6:37 PM
ripe-musician-59933
01/29/2021, 6:40 PM
bulky-park-65916
01/29/2021, 6:40 PM
ripe-musician-59933
01/29/2021, 6:40 PM
`job` and have `job_id` and `run_id` columns: https://meltano.slack.com/archives/CFG3C3C66/p1611935913067200?thread_ts=1611934620.063300&cid=CFG3C3C66
bulky-park-65916
01/29/2021, 6:41 PM
ripe-musician-59933
01/29/2021, 6:42 PM
bulky-park-65916
01/29/2021, 6:42 PM
ripe-musician-59933
01/29/2021, 6:43 PM
`MELTANO_DATABASE_URI` env var
bulky-park-65916
01/29/2021, 6:47 PM
ripe-musician-59933
01/29/2021, 6:47 PM
`.meltano/meltano.db`
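The surrounding messages lost most of their text, so the following is only a sketch under an assumption: that the two surviving fragments describe where the system database lives, namely whatever `MELTANO_DATABASE_URI` points at if it is set, and otherwise the SQLite file at `.meltano/meltano.db` inside the project. The `system_database_uri` helper and the example paths are hypothetical.

```python
import os
from pathlib import PurePosixPath


def system_database_uri(project_root: str, env=None) -> str:
    """Guess which system database a project uses (illustrative only)."""
    env = os.environ if env is None else env
    override = env.get("MELTANO_DATABASE_URI")
    if override:
        # An explicitly configured database takes precedence.
        return override
    # Assumed fallback: the SQLite file under the project's .meltano directory.
    db_path = PurePosixPath(project_root) / ".meltano" / "meltano.db"
    return f"sqlite:///{db_path}"


default_uri = system_database_uri("/srv/project", env={})
custom_uri = system_database_uri(
    "/srv/project",
    env={"MELTANO_DATABASE_URI": "postgresql://user@host/meltano"},
)
```

Checking this before deleting any rows helps confirm that the `job` table being inspected is the one the schedules actually write to.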
bulky-park-65916
01/29/2021, 6:47 PM
ripe-musician-59933
01/29/2021, 6:48 PM
bulky-park-65916
01/29/2021, 6:48 PM
ripe-musician-59933
01/29/2021, 6:48 PM
bulky-park-65916
01/29/2021, 6:48 PM
ripe-musician-59933
01/29/2021, 6:48 PM
bulky-park-65916
01/29/2021, 6:57 PM
`start_date` on the schedules level is overwritten by the extractor's default config `start_date`.
As shown in the picture, this date is not taken into consideration. Is this expected behavior?
ripe-musician-59933
01/29/2021, 6:58 PM
`start_date` doesn't actually do anything, it just has to be in the past: https://gitlab.com/meltano/meltano/-/issues/2529
bulky-park-65916
01/29/2021, 6:59 PM
ripe-musician-59933
01/29/2021, 7:00 PM
`start_date` configured (under the plugin definition's `config`), but the `start_date`s under `schedules` really don't matter, they can be the same for every schedule, as long as it's in the past, because otherwise the pipeline won't run until that day
bulky-park-65916
01/29/2021, 7:01 PM
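To illustrate the `start_date` rule from the thread, here is a rough Python sketch that flags schedules whose `start_date` is still in the future, since those are the ones that reportedly won't run yet. The `project` dict is a simplified stand-in for a `meltano.yml` layout; its exact keys, the plugin and schedule names, and the `schedules_not_yet_due` helper are assumptions made for the example.

```python
from datetime import date
from typing import List


def schedules_not_yet_due(project: dict, today: date) -> List[str]:
    """Return the names of schedules whose start_date is after today."""
    return [
        schedule["name"]
        for schedule in project.get("schedules", [])
        if date.fromisoformat(schedule["start_date"]) > today
    ]


# Hypothetical project: the extractor's own config carries the start_date that
# governs how far back data is pulled; the schedule dates only need to be past.
project = {
    "plugins": {
        "extractors": [
            {"name": "tap-example", "config": {"start_date": "2020-01-01"}},
        ],
    },
    "schedules": [
        {"name": "example-daily", "extractor": "tap-example", "start_date": "2021-01-01"},
        {"name": "example-later", "extractor": "tap-example", "start_date": "2021-06-01"},
    ],
}

pending = schedules_not_yet_due(project, today=date(2021, 1, 29))
```

Here only `example-later` is flagged. The extractor-level `start_date` under `config` is deliberately not part of the check, because that is the value the thread says needs to be set for the extractor itself.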