# best-practices
s
Hey team, and happy Tuesday! Question for you guys: append vs upsert -> do you have any preference? Which should be used when? Currently we are using upsert, but as storage is so cheap in BigQuery, I'm leaning towards append so we can easily view changes over time.
t
Generally append. Storage is cheap and I’d rather have the history than not!
b
How does the append model work when data updates are very fine-grained?
Event 1 - user changed first name only
Event 2 - user changed zip only
Event 3 - user changed phone only
and so on …
How should the latest (full) user profile be built?
t
Likely you would compute up-to-date snapshots at specific points in time. It’s also possible that each event carries the full set of data with it, so it really depends on the shape of the data.
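To make the snapshot idea concrete, here is a minimal Python sketch of folding fine-grained events into one full profile per user. The field names (user_id, updated_at, changes) and the sample values are made up for illustration and assume each event only carries the fields that changed; your real event shape may differ.

```python
from typing import Any


def build_latest_profiles(events: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Fold partial update events into one full profile per user."""
    profiles: dict[str, dict[str, Any]] = {}
    # Apply events oldest-first so later changes overwrite earlier ones.
    for event in sorted(events, key=lambda e: e["updated_at"]):
        profile = profiles.setdefault(event["user_id"], {})
        profile.update(event["changes"])
        # Track when the profile was last touched.
        profile["updated_at"] = event["updated_at"]
    return profiles


# Hypothetical events: each one changes a single field.
events = [
    {"user_id": "u1", "updated_at": "2024-01-01T10:00:00", "changes": {"first_name": "Ann"}},
    {"user_id": "u1", "updated_at": "2024-01-02T10:00:00", "changes": {"zip": "94105"}},
    {"user_id": "u1", "updated_at": "2024-01-03T10:00:00", "changes": {"phone": "555-0100"}},
]
latest = build_latest_profiles(events)
```

If each event already carries the full record, the fold is unnecessary and picking the newest row per user is enough.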
s
From my understanding, it would look like this:
If your stream is set to full_table replication in the tap
• Every sync you get a full new set of records (for example, if your API returns 10 records, you will have 10 new records per sync)
If your stream is set to incremental replication in the tap
• Every update, a new version of the updated record is sent (for example, if your API returns 10 records and 1 was updated, you will have 11 records at the end)
Regardless of which replication method you use, it would probably be good practice to partition by id, order by updated_at descending, and keep the first record from each partition. This ensures your final table only outputs the latest version of each record.
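As a rough illustration of that "latest record per id" step, here is a small Python sketch. It assumes every appended row has an id and an ISO 8601 updated_at string (so plain string comparison orders them correctly); the sample rows and the status field are invented.

```python
from typing import Any


def latest_per_id(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep only the most recently updated row for each id."""
    latest: dict[Any, dict[str, Any]] = {}
    for row in rows:
        current = latest.get(row["id"])
        # Same effect as: partition by id, order by updated_at descending, take the first row.
        if current is None or row["updated_at"] > current["updated_at"]:
            latest[row["id"]] = row
    return list(latest.values())


# Hypothetical appended rows: id 1 was synced twice, id 2 once.
rows = [
    {"id": 1, "updated_at": "2024-01-01T00:00:00", "status": "new"},
    {"id": 2, "updated_at": "2024-01-01T00:00:00", "status": "new"},
    {"id": 1, "updated_at": "2024-01-05T00:00:00", "status": "active"},
]
deduped = sorted(latest_per_id(rows), key=lambda r: r["id"])
```

In the warehouse itself the same idea is usually written as a window function, e.g. ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC) and keeping the rows where the row number is 1.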