# singer-taps
l
Hello, I am looking for guidance on incremental handling in taps. I have developed some code and would like to get incremental updates right. Right now I am using a plain fs directory over fsspec, and I am planning to add S3-compatible storage.
I have files like the following, with monthly/daily additions: 2025-01.csv, 2025-02.csv
So far I have followed https://sdk.meltano.com/en/latest/incremental_replication.html and https://sdk.meltano.com/en/latest/implementation/state.html
What I have been working on is here: https://github.com/celine-eu/tap-spreadsheets/blob/main/tap_spreadsheets/stream.py#L33-L41
Basically, I am using a custom ___updated_at field_ to track row-level progress, and I am also tracking the mtime of the reference file. So I suspect I have reinvented the wheel :)
My question: what is already managed by the SDK, and what should I do in my own code? Thank you
v
Sounds similar to the implementation in https://github.com/MeltanoLabs/tap-universal-file. Maybe you could look at how incremental is done there.
l
Thank you for the link @visch. If I read correctly, the point is to track the file mtime (fs) or LastModified (S3) and add them as additional fields in the schema: https://github.com/MeltanoLabs/tap-universal-file/blob/main/tap_universal_file/client.py#L156-L159
The SDK will then automatically track updates on its own.
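A minimal sketch of the idea described above, using only the standard library rather than the SDK or fsspec: each record gets the source file's modification time as an extra field, and files not newer than the saved bookmark are skipped. The names `file_mtime`, `iter_new_records`, `_file_path` and `_file_mtime` are hypothetical and are not taken from tap-universal-file or tap-spreadsheets. In a real tap, the mtime would presumably come from the fsspec filesystem instead of `Path.stat()`, and the bookmark would come from the SDK's state handling.

```python
import csv
import os
from datetime import datetime, timezone
from pathlib import Path
from typing import Iterator, Optional


def file_mtime(path: Path) -> datetime:
    """Return the file's modification time as a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)


def iter_new_records(directory: Path, bookmark: Optional[datetime]) -> Iterator[dict]:
    """Yield CSV rows from files modified after `bookmark`.

    Files are visited in mtime order so the emitted `_file_mtime` values never
    decrease. The strict comparison skips a file whose mtime equals the
    bookmark; a tap may prefer `<` here and deduplicate downstream instead.
    """
    for path in sorted(directory.glob("*.csv"), key=file_mtime):
        mtime = file_mtime(path)
        if bookmark is not None and mtime <= bookmark:
            continue  # already processed in a previous run
        with path.open(newline="") as handle:
            for row in csv.DictReader(handle):
                # Hypothetical metadata fields that would also be declared in the schema.
                yield {**row, "_file_path": path.name, "_file_mtime": mtime.isoformat()}
```

With `bookmark=None` every file is read (a full sync); passing the largest `_file_mtime` seen in the previous run limits the next run to new or modified files.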
v
Kinda sorta, you're right, but it sounds like you might need to read https://hub.meltano.com/singer/spec#state
Hard to say without knowing you and your thought process deeply. But hand-wavingly, yes.
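For reference, a rough sketch of what a bookmark in a Singer STATE message can look like. The Singer spec leaves the contents of `value` up to the tap; the `bookmarks` / `replication_key` / `replication_key_value` layout below follows the convention described in the Meltano SDK state docs and should be treated as an assumption here. The helper names and the stream name `spreadsheets` are hypothetical.

```python
import json
from typing import Any, Dict


def advance_bookmark(
    state: Dict[str, Any], stream_name: str, replication_key: str, value: str
) -> Dict[str, Any]:
    """Move the stream's bookmark forward, never backward.

    Values are compared as strings, which is only safe when they are ISO 8601
    timestamps written in the same format and timezone.
    """
    bookmark = state.setdefault("bookmarks", {}).setdefault(stream_name, {})
    current = bookmark.get("replication_key_value")
    if current is None or value > current:
        bookmark["replication_key"] = replication_key
        bookmark["replication_key_value"] = value
    return state


def state_message(state: Dict[str, Any]) -> str:
    """Serialize the state as a Singer STATE message line."""
    return json.dumps({"type": "STATE", "value": state})


state: Dict[str, Any] = {}
advance_bookmark(state, "spreadsheets", "_file_mtime", "2025-01-31T00:00:00+00:00")
advance_bookmark(state, "spreadsheets", "_file_mtime", "2025-01-15T00:00:00+00:00")  # older value, ignored
print(state_message(state))
```

When the stream declares a replication key, the SDK is expected to do this bookkeeping itself, so a tap would normally only need to read the starting value back and filter on it.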