colossal-cricket-61413
02/17/2021, 12:49 AM
ripe-musician-59933
02/17/2021, 12:59 AM
When `meltano elt` is run a subsequent time, it will look for the most recent completed (successful or failed) pipeline run with the same job ID that generated some state. If found, this state is then passed along to the extractor. For scheduled pipelines, the schedule name is used as the job ID.
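A minimal sketch of that behavior, assuming a hypothetical tap-mysql / target-postgres pipeline and the `--job_id` flag spelling (the thread only writes `job_id=foo`):

```
# First run with this job ID: no prior state exists, so everything is synced.
# (Extractor/loader names here are assumptions, not taken from the thread.)
meltano elt tap-mysql target-postgres --job_id=foo

# A later run with the same job ID finds the state generated by the most
# recent completed run above and passes it to the extractor, so replication
# continues incrementally. For a scheduled pipeline, the schedule name
# plays the role of the job ID.
meltano elt tap-mysql target-postgres --job_id=foo
```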
wide-salesclerk-68871
02/18/2021, 4:42 PM
I want to use `--select` in a run to backfill a part of the job. For example:
• `job_id=foo` is selecting a `users` table using binlog replication. It's been run on a schedule for a while and is up to date.
• I want to add a HUGE table. Let's call it `results`.
• Outside of the job I run `meltano elt … job_id=foo --select results`. _I want the same job_id because after the backfill they'll be part of the same Airflow task._
• The next time I run, will it do a full-refresh on `users`, since it wasn't part of the previous run and can't find a binlog?
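For reference, the backfill flow described in this question might look roughly like the following; the extractor and loader names are assumptions, and the `…` in the quoted command is left unexpanded:

```
# Regular scheduled pipeline (job ID "foo"), already up to date on the
# users table via binlog replication.
meltano elt tap-mysql target-postgres --job_id=foo

# One-off backfill of the huge results table, run outside the schedule but
# with the same job ID so it shares state with the scheduled pipeline.
meltano elt tap-mysql target-postgres --job_id=foo --select results

# Next scheduled run: the question is whether this still replicates users
# incrementally or falls back to a full refresh.
meltano elt tap-mysql target-postgres --job_id=foo
```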
ripe-musician-59933
02/18/2021, 5:08 PM
> The next time I run, will it do a full-refresh on `users`, since it wasn't part of the previous run and can't find a binlog?
@wide-salesclerk-68871 No, it should incrementally replicate both `users` and `results` as expected, since runs with `--select` or `--except` are stored as incomplete, and their state is merged into the most recent complete run's state to create the state for the next incremental run, so the next run should see bookmarks for both `users` and `results`.
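A rough illustration of that state merging, with made-up bookmark contents (a conceptual sketch, not actual Meltano state output; plugin names are assumptions):

```
# 1. State from the last complete scheduled run (job ID foo):
#      {"bookmarks": {"users": {"log_file": "...", "log_pos": ...}}}
#
# 2. The --select backfill is stored as an incomplete run with its own state:
meltano elt tap-mysql target-postgres --job_id=foo --select results
#      {"bookmarks": {"results": {...}}}
#
# 3. The next run merges the incomplete run's state into the last complete
#    run's state, so bookmarks for both tables are passed to the extractor:
meltano elt tap-mysql target-postgres --job_id=foo
#      {"bookmarks": {"users": {...}, "results": {...}}}
```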
wide-salesclerk-68871
02/18/2021, 5:09 PM
ripe-musician-59933
02/18/2021, 5:12 PM