# best-practices
s
Morning everyone 😄 Question for you all: how do you guys backfill data? I have an incremental sync that only picks up my most recent changes (a full resync every time would mean a 40h stream instead of 20 minutes). My issue is that I'm not sure what to do when I need to add a new column to the stream. How do I backfill the historical data? Should I:
• Resync everything, tying up the stream for 40+ hours?
• Create a separate stream only for updating?
• Any other suggestions?
I'm guessing there's a devops best practice for this somewhere, I'm just not an expert in the subject 😅
c
I'm currently going with option 1 (resync everything). It's simple. It works. It can take a long time, but luckily my source datasets are not THAT huge.
h
Resync everything if that is OK. If not, some taps have a start-date config setting which does the job. And if neither of those is possible, I am not above editing the state.
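For anyone wondering what "editing the state" could look like, here is a minimal sketch. It assumes a Singer-style state with a top-level `bookmarks` key; the stream name `orders`, the bookmark key `updated_at`, and the helper `rewind_bookmark` are all hypothetical, and the real layout depends on the tap, so check your own state file first.

```python
import copy
import json


def rewind_bookmark(state, stream, key, new_value):
    """Return a copy of `state` with one stream's bookmark set to `new_value`.

    Assumes a Singer-style layout: {"bookmarks": {stream: {key: value}}}.
    The original state is left untouched so it can be kept as a backup.
    """
    new_state = copy.deepcopy(state)
    bookmarks = new_state.setdefault("bookmarks", {})
    bookmarks.setdefault(stream, {})[key] = new_value
    return new_state


# Hypothetical state, as it might look after the latest incremental run.
state = {
    "bookmarks": {
        "orders": {"updated_at": "2023-06-01T00:00:00Z"},
        "customers": {"updated_at": "2023-06-01T00:00:00Z"},
    }
}

# Move only the `orders` bookmark back so the next run re-reads its history.
rewound = rewind_bookmark(state, "orders", "updated_at", "2020-01-01T00:00:00Z")
print(json.dumps(rewound, indent=2))
```

Only the stream that needs the new column gets rewound, so the other streams stay incremental. Keeping a copy of the original state makes it easy to roll back if the backfill goes wrong.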
s
Yeah, I think it's going to be option 1 😅. I'll need to create a "schedule-less" DAG.