Arnaud Stephan
11/04/2024, 3:08 PM
2024-11-04T14:57:02.091508Z [info ] Incremental state has been updated at 2024-11-04 14:57:02.091402.
Also, all my tables have a date_modified column.
Since we are using Docker, we build the whole Meltano project from scratch every time we load our data. So the info about incremental state is lost, correct?
Is there a way in my meltano.yml file to set up my tap-postgres so that I can fetch only the data from the last week (for instance), and then upsert the resulting data?
visch
11/04/2024, 3:36 PM
There's a lot to unpack here, but I'll try to just answer you. Generally it seems like some choices about architecture may need to change.
> So the info about incremental state is lost, correct?
Yes, depending on your meltano.yml file, specifically how you're managing your engine / state backend.
> Is there a way in my meltano.yml file to setup my tap-postgres so that I can fetch only the data from the last week (for instance), and then upsert the resulting data?
Yes, check out state backends / backend engine.
> Everything works so far, but recently I added a table that is >100M rows and the time to load everything has completely exploded.
Easiest fix for you might be to just run this one table on a separate schedule if everything else is working great for you.
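To make the state question concrete, here is a minimal Python sketch of bookmark-based incremental extraction. It is not Meltano's implementation: `load_state`, `run_incremental`, and the JSON state file are hypothetical stand-ins, and the file path plays the role of a state backend that lives outside the container.

```python
import json
import tempfile
from pathlib import Path


def load_state(path):
    """Return the saved bookmark dict, or an empty dict if no state file exists."""
    if path.exists():
        return json.loads(path.read_text())
    return {}


def run_incremental(rows, state_path, replication_key="date_modified"):
    """Return rows newer than the saved bookmark, then persist the new bookmark."""
    state = load_state(state_path)
    bookmark = state.get("bookmark")
    new_rows = [
        r for r in rows
        if bookmark is None or r[replication_key] > bookmark
    ]
    if new_rows:
        state["bookmark"] = max(r[replication_key] for r in new_rows)
        state_path.write_text(json.dumps(state))
    return new_rows


# Toy source table; ISO date strings compare correctly as plain strings.
rows = [
    {"id": 1, "date_modified": "2024-10-01"},
    {"id": 2, "date_modified": "2024-11-03"},
]

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical state file that survives between runs.
    persistent = Path(tmp) / "state.json"
    first_count = len(run_incremental(rows, persistent))
    second_count = len(run_incremental(rows, persistent))

    # A rebuilt container with no persisted state starts from scratch.
    fresh = Path(tmp) / "fresh_state.json"
    rebuilt_count = len(run_incremental(rows, fresh))

print(first_count, second_count, rebuilt_count)
```

The second run returns nothing because the bookmark survived, while the run against a fresh state file reloads every row. That is the behaviour to expect when the state is stored inside an image that is rebuilt for each load.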
Arnaud Stephan
11/04/2024, 3:39 PM
stream_maps:
  public-metric_value:
    __key_properties__:
    - id
    - instance_id
    comment: sha3(comment) if comment else ''
    instance_id: str('${instance_id}')
    meltano_extracted_at: datetime.datetime.now()
select:
- public-metric_value.*
where:
- date_update >= current_date - interval '7 days'
I couldn't find doc regarding where filters
visch
11/04/2024, 3:40 PM
Arnaud Stephan
11/04/2024, 3:40 PM
visch
11/04/2024, 3:40 PM
visch
11/04/2024, 3:41 PM
Arnaud Stephan
11/04/2024, 3:48 PM
Arnaud Stephan
11/04/2024, 4:18 PM
Arnaud Stephan
11/04/2024, 4:19 PM
visch
11/04/2024, 4:29 PM
Edgar Ramírez (Arch.dev)
11/04/2024, 4:30 PM
NULLS FIRST (see the flag in the tap code), so if the replication key column has null values, they come in first and shouldn't affect the state if there are non-null values.
Arnaud Stephan
11/04/2024, 4:30 PM
> generally your replication keys shouldn't have nulls would be the first recommendation I'd have for you
Sadly I'm dealing with a shitty backend 🫠
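A small Python sketch of why NULLS FIRST ordering keeps the bookmark safe. It only imitates the ordering in plain Python and is not the tap's code; `sync_nulls_first` and the sample rows are hypothetical.

```python
def sync_nulls_first(rows, replication_key="date_modified"):
    """Order rows with null replication keys first and return (ordered_rows, bookmark)."""
    ordered = sorted(
        rows,
        # (False, "") sorts before (True, value), so nulls come first.
        key=lambda r: (r[replication_key] is not None, r[replication_key] or ""),
    )
    bookmark = None
    for row in ordered:
        value = row[replication_key]
        if value is not None:
            # Rows are ascending, so the last non-null value is the maximum.
            bookmark = value
    return ordered, bookmark


rows = [
    {"id": 1, "date_modified": "2024-11-01"},
    {"id": 2, "date_modified": None},
    {"id": 3, "date_modified": "2024-11-04"},
]

ordered, bookmark = sync_nulls_first(rows)
ordered_ids = [r["id"] for r in ordered]
print(ordered_ids, bookmark)
```

Because the null rows are emitted before any real value, the final bookmark still ends up at the latest non-null value. Note that in this sketch the null rows would be re-sent on every run, since they can never be compared against a bookmark.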
visch
11/04/2024, 4:30 PM
Arnaud Stephan
11/04/2024, 4:31 PM
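For the "last week, then upsert" idea raised earlier in the thread, a rough Python sketch of the intended behaviour, assuming the `date_update` column and the `id` / `instance_id` key properties from the config above. `recent_rows` and `upsert` are hypothetical helpers, and the in-memory dict stands in for the target table; this is not how tap-postgres or any target implements it.

```python
from datetime import date, timedelta


def recent_rows(rows, today, days=7, column="date_update"):
    """Keep only rows updated within the last `days` days (inclusive cutoff)."""
    cutoff = today - timedelta(days=days)
    return [r for r in rows if r[column] >= cutoff]


def upsert(target, rows, key_properties=("id", "instance_id")):
    """Insert new rows and overwrite existing ones that share the same key."""
    for row in rows:
        key = tuple(row[k] for k in key_properties)
        target[key] = row
    return target


today = date(2024, 11, 4)

source = [
    {"id": 1, "instance_id": "a", "date_update": date(2024, 10, 1), "value": 10},
    {"id": 2, "instance_id": "a", "date_update": date(2024, 11, 2), "value": 25},
    {"id": 3, "instance_id": "a", "date_update": date(2024, 11, 4), "value": 30},
]

# Target already holds older versions of rows 1 and 2.
target = {
    (1, "a"): {"id": 1, "instance_id": "a", "date_update": date(2024, 10, 1), "value": 10},
    (2, "a"): {"id": 2, "instance_id": "a", "date_update": date(2024, 10, 20), "value": 20},
}

batch = recent_rows(source, today)
upsert(target, batch)
print(len(batch), len(target), target[(2, "a")]["value"])
```

Only the two recently updated rows are fetched. Row 2 is overwritten in place and row 3 is added, so the old row 1 is never re-read. A fixed window like this misses any update older than the window if a run is skipped, which is one reason a persisted bookmark is usually the more robust option.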