# announcements

bulky-park-65916 (01/29/2021, 3:37 PM)
Hi, guys! What is the best approach to running `--full-refresh` for multiple scheduled connectors that already have DAG runs?

ripe-musician-59933 (01/29/2021, 3:52 PM)
@bulky-park-65916 You have a few options:
1. Delete all rows for these schedules (with the same `job_id`) from the `job` table in the system database, so that the next time the DAG runs there will be no previous state, and they'll automatically start from scratch.
2. Manually SSH into the machine and run `meltano schedule run <name> --full-refresh` for each schedule, but first pause the DAGs in the Airflow web UI to prevent them from accidentally running while the full refresh is in progress, which could result in duplicate records. Once we have https://gitlab.com/meltano/meltano/-/issues/2356, pausing shouldn't be necessary anymore.
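To avoid typing that command 20-30 times, option 2 can be scripted; here's a sketch, not an official recipe. It assumes `meltano schedule list --format=json` is available in your Meltano version, that `jq` is installed, and that the DAGs are already paused; the exact JSON shape varies between versions, so adjust the `jq` filter accordingly:
```sh
# Sketch: run every schedule once with --full-refresh.
# Assumes the DAGs are paused in Airflow first to avoid overlapping runs.
for name in $(meltano schedule list --format=json | jq -r '.[].name'); do
  meltano schedule run "$name" --full-refresh
done
```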
@bulky-park-65916 Does that help? 🙂 I'm not sure what approaches you had considered before and why you weren't sure if they would be the best!

bulky-park-65916 (01/29/2021, 3:56 PM)
I've considered option 2, but with 20-30 connector schedules (or more) it becomes a hassle. As for option 1, where can I find more information about how to do it? 🙂

ripe-musician-59933 (01/29/2021, 3:58 PM)
The system database is documented here: https://meltano.com/docs/project.html#system-database In your production setup, you're probably using a separate Postgres database. If you connect to that database using `psql` or any graphical Postgres UI, you'll see a `job` table with a row for every schedule run, with a `job_id` corresponding to the schedule name, a unique `run_id`, and a `payload` containing (among other things) `singer_state`. If you delete all records for a given `job_id` there, the next schedule run will not be able to find any old Singer state and will start from scratch. If you'd like to fully refresh all schedules, you can delete all rows in that table.
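As a concrete sketch (the schedule name `gitlab-to-postgres` is hypothetical, and `$MELTANO_DATABASE_URI` is assumed to hold your Postgres connection string):
```sh
# Sketch: clear saved state for one schedule so its next run starts fresh.
psql "$MELTANO_DATABASE_URI" -c "DELETE FROM job WHERE job_id = 'gitlab-to-postgres';"

# Or, to fully refresh every schedule, clear the whole table.
psql "$MELTANO_DATABASE_URI" -c "DELETE FROM job;"
```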

bulky-park-65916 (01/29/2021, 4:36 PM)
Thanks, I'll try now 🙂
Interesting: I dropped all rows from the `job` table, but it still picks up state, as shown in the attached image

ripe-musician-59933 (01/29/2021, 6:32 PM)
@bulky-park-65916 Are you sure you're looking at the same system database?
If the table is actually empty, there's really no way it could find state 😄

bulky-park-65916 (01/29/2021, 6:35 PM)
Well, it's the same DB. I just changed the names of the connectors, and rows for them started appearing in the same table.
It currently has only 12 rows, with the new connector names, so it's the same database

ripe-musician-59933 (01/29/2021, 6:36 PM)
😬
Then I'm confused
If you use a different `job_id`, does it not pick up state?

bulky-park-65916 (01/29/2021, 6:37 PM)
Changing the name of the schedule works, but that's still weird 😄
Wait, this is the table we're talking about, right?

ripe-musician-59933 (01/29/2021, 6:40 PM)
No! That looks like one of Airflow's tables

bulky-park-65916 (01/29/2021, 6:40 PM)
😄
aw, shit

ripe-musician-59933 (01/29/2021, 6:40 PM)
😄
Well at least I'm not going crazy

bulky-park-65916 (01/29/2021, 6:41 PM)
[attachment: image.png]
So it's not this one

ripe-musician-59933 (01/29/2021, 6:42 PM)
Those are all Airflow's

bulky-park-65916 (01/29/2021, 6:42 PM)
Right, let me find out how to connect to Meltano's then
😄

ripe-musician-59933 (01/29/2021, 6:43 PM)
Look at the `database_uri` you've configured, probably with the `MELTANO_DATABASE_URI` env var
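A quick sanity check (a sketch):
```sh
# If this prints nothing, Meltano falls back to the default SQLite
# database at .meltano/meltano.db inside the project directory.
printenv MELTANO_DATABASE_URI
```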

bulky-park-65916 (01/29/2021, 6:47 PM)
Not defined, perhaps it uses a default configuration?
`sqlite:///$MELTANO_PROJECT_ROOT/.meltano/meltano.db`

ripe-musician-59933 (01/29/2021, 6:47 PM)
If so, that would be the default SQLite database inside your project directory at `.meltano/meltano.db`
Yeah, that one
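You can peek inside it to be sure you're in the right place (a sketch; assumes the `sqlite3` CLI is installed and that the column names match your Meltano version):
```sh
# Sketch: list recent pipeline runs recorded in Meltano's job table.
sqlite3 .meltano/meltano.db \
  "SELECT job_id, state, started_at FROM job ORDER BY started_at DESC LIMIT 10;"
```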

bulky-park-65916 (01/29/2021, 6:47 PM)
this should be it, right?

ripe-musician-59933 (01/29/2021, 6:48 PM)
Yeah

bulky-park-65916 (01/29/2021, 6:48 PM)
oh, right, sorry for the small heart attack.. 😄

ripe-musician-59933 (01/29/2021, 6:48 PM)
Haha no worries

bulky-park-65916 (01/29/2021, 6:48 PM)
I think all should be fine now

ripe-musician-59933 (01/29/2021, 6:48 PM)
Glad you figured it out
Perfect

bulky-park-65916 (01/29/2021, 6:57 PM)
Small thing: it appears that `start_date` at the schedule level is overwritten by the extractor's default `start_date` config. As shown in the picture, this date is not taken into consideration. Is this expected behavior?

ripe-musician-59933 (01/29/2021, 6:58 PM)
Yeah, that `start_date` doesn't actually do anything, it just has to be in the past: https://gitlab.com/meltano/meltano/-/issues/2529

bulky-park-65916 (01/29/2021, 6:59 PM)
Right, so an additional extractor would be required to set up a different `start_date`, correct?

ripe-musician-59933 (01/29/2021, 7:00 PM)
Each extractor should have its own `start_date` configured (under the plugin definition's `config`), but the `start_date`s under `schedules` really don't matter: they can be the same for every schedule, as long as they're in the past, because otherwise the pipeline won't run until that day
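For example (a sketch; the extractor name `tap-gitlab` and the date are hypothetical):
```sh
# Sketch: the start_date that actually matters is the extractor's own config.
meltano config tap-gitlab set start_date 2021-01-01T00:00:00Z
```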

bulky-park-65916 (01/29/2021, 7:01 PM)
Got it 🙂