Hey team and happy tuesday Question for you guys Append vs u Meltano #best-practices

Stéphane Burwash

01/17/2023, 6:23 PM

Hey team, and happy Tuesday! Question for you guys: append vs. upsert, do you have any preference? Which should be used when? Currently we are using upsert, but as storage is so cheap in BigQuery, I'm leaning towards append so we can easily view changes.

taylor

01/17/2023, 10:46 PM

Generally append. Storage is cheap and I’d rather have the history than not!

binoy_shah

01/17/2023, 11:24 PM

How does the append model work when data updates are very fine-grained?

binoy_shah

01/17/2023, 11:25 PM

Event 1 - User changed first name only
Event 2 - User changed zip only
Event 3 - User changed phone only
and so on…
How should the latest (full) user profile be built?

taylor

01/18/2023, 3:26 PM

Likely you would compute up-to-date snapshots at specific points in time. It’s also possible that each event carries the full set of data with it, so it really depends on the shape of the data.
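To make the snapshot idea concrete, here is a minimal Python sketch, assuming each appended event carries only the fields that changed. The field names (`user_id`, `changed_at`, `first_name`, `zip`, `phone`) and the sample values are hypothetical and only illustrate the partial-update case from the question above.

```python
from datetime import datetime

# Hypothetical append-only event log: each row holds only the changed fields.
events = [
    {"user_id": 1, "changed_at": datetime(2023, 1, 1), "first_name": "Ada"},
    {"user_id": 1, "changed_at": datetime(2023, 1, 5), "zip": "H2X 1Y4"},
    {"user_id": 1, "changed_at": datetime(2023, 1, 9), "phone": "555-0100"},
    {"user_id": 1, "changed_at": datetime(2023, 1, 12), "first_name": "Adele"},
]


def snapshot(events, as_of):
    """Fold partial events, oldest first, into one full profile per user."""
    profiles = {}
    for event in sorted(events, key=lambda e: e["changed_at"]):
        if event["changed_at"] > as_of:
            break  # events are sorted, so everything after this is too recent
        profile = profiles.setdefault(event["user_id"], {})
        for field, value in event.items():
            if field not in ("user_id", "changed_at"):
                profile[field] = value  # later events overwrite earlier values
    return profiles


latest = snapshot(events, as_of=datetime(2023, 1, 31))
mid_month = snapshot(events, as_of=datetime(2023, 1, 10))
```

Because the raw events are kept, the same function can rebuild the profile as of any past date, which is the history that upsert would have discarded. If each event already carries the full record, this fold is unnecessary and picking the latest row per key is enough.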

Stéphane Burwash

01/18/2023, 3:46 PM

From my understanding, it would look like this:

If your stream is set to full_table replication in the tap:
• Every sync, you get a full new set of records (for example, if your API returns 10 records, you will have 10 new records per sync).

If your stream is set to incremental replication in the tap:
• Every update, a new version of the updated record is sent (for example, if your API returns 10 records and 1 was updated, you will have 11 records at the end).

Regardless of which replication method you use, it would probably be good practice to partition by id, order by updated_at descending, and keep the first record of each partition. This ensures your final table only contains the latest version of each record.
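The deduplication step can be sketched in plain Python as follows. This is only an illustration of the logic, assuming every row has an `id` and an ISO 8601 `updated_at` string; the sample rows and the helper name `latest_per_id` are hypothetical. In a warehouse, the same result is usually obtained with a window function such as `ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC)` and a filter on the first row.

```python
def latest_per_id(rows):
    """Keep only the most recently updated row for each id."""
    latest = {}
    for row in rows:
        current = latest.get(row["id"])
        # ISO 8601 timestamps in the same format and timezone compare
        # correctly as plain strings.
        if current is None or row["updated_at"] > current["updated_at"]:
            latest[row["id"]] = row
    return list(latest.values())


# Hypothetical appended table: id 1 was synced twice, id 2 once.
rows = [
    {"id": 1, "name": "Ada", "updated_at": "2023-01-01T00:00:00"},
    {"id": 2, "name": "Bob", "updated_at": "2023-01-01T00:00:00"},
    {"id": 1, "name": "Adele", "updated_at": "2023-01-17T18:23:00"},
]

result = latest_per_id(rows)
```

With full_table replication, rows that were not modified between syncs share the same `updated_at`, so a tie-breaker such as the load timestamp may be needed; which column is available for that depends on the loader and is not assumed here.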
