victor_macaubas
01/17/2025, 5:23 PM
runs table and update the state table with it. But it's happening more and more.
Here are the logs of when it starts doing a full load:
time=2025-01-17 05:13:20 name=tap_mysql level=INFO message=LOG_BASED stream prod-table_1 will resume its historical sync cmd_type=elb consumer=False job_name=prod:prod-to-snowflake:prod_nrt_extract_load name=prod producer=True run_id=8bf705b8-0e81-45d5-a71a-77bf7261754e stdio=stderr string_id=clockwork
time=2025-01-17 05:13:20 name=tap_mysql level=INFO message=LOG_BASED stream prod-table_2 requires full historical sync cmd_type=elb consumer=False job_name=prod:prod-to-snowflake:prod_nrt_extract_load name=prod producer=True run_id=8bf705b8-0e81-45d5-a71a-
Meltano: 3.5.4
Python: 3.9
tap: pipelinewise tap-mysql
Meltano backend: Postgres
Edgar Ramírez (Arch.dev)
01/17/2025, 10:20 PM
victor_macaubas
01/20/2025, 11:46 AM
victor_macaubas
01/22/2025, 7:50 PM
.meltano/run/{extractor_name}/state.json
Since both DAGs used the same extractor, when both pipelines started at the same time, the state files in that run folder would get overwritten. This caused Meltano to perform a full sync for any table it couldn't find in the state file.
This behavior became evident when looking at the logs with debug mode on:
--state', '/usr/local/airflow/.meltano/run/my_extractor/state.json'
The simplest solution was to create a separate extractor for each DAG; that way we avoid the state files being overwritten.
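For reference, a minimal meltano.yml sketch of that approach using plugin inheritance (the plugin names and variant here are illustrative assumptions, not taken from the thread); each inherited extractor gets its own .meltano/run/<plugin_name>/ directory, so concurrent DAGs stop sharing a state file:

# Minimal sketch (hypothetical names): one inherited extractor per DAG,
# so each DAG writes its run state under its own .meltano/run/ subdirectory.
plugins:
  extractors:
    - name: tap-mysql
      variant: transferwise            # pipelinewise-tap-mysql
      pip_url: pipelinewise-tap-mysql
    - name: tap-mysql--dag-a           # used only by DAG A
      inherit_from: tap-mysql
    - name: tap-mysql--dag-b           # used only by DAG B
      inherit_from: tap-mysql

Each DAG would then invoke its own copy (e.g. meltano run tap-mysql--dag-a target-snowflake, with the loader name also hypothetical), so simultaneous runs no longer write to the same .meltano/run/tap-mysql/state.json path shown in the debug log above.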
Edgar Ramírez (Arch.dev)
01/22/2025, 10:01 PM
Edgar Ramírez (Arch.dev)
01/22/2025, 10:01 PM
victor_macaubas
01/23/2025, 12:02 PM
.meltano/run/{extractor_name}/state_{state_id}.json?
Edgar Ramírez (Arch.dev)
01/23/2025, 8:06 PM