# singer-tap-development
n
Hi again. New day, new question! I'm trying to figure out an incremental strategy for an API call that doesn't really have an obvious replication key. To work around it, I'd like to query the API for all rows with a certain status, save all of those IDs, run a child stream for each of those IDs at the next runtime, and then run the first call again and store the updated set of IDs. Any ideas?
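For reference, the "child stream for each of those IDs" half of that maps onto parent/child streams in the Meltano Singer SDK roughly as below. This is only a sketch under assumptions: the base URL, the `/data` and `/data/{record_id}` paths, and the class names are made up for illustration, not taken from the thread.

```python
# Hypothetical sketch using the Meltano Singer SDK (singer_sdk).
# The base URL, endpoint paths, and class names are assumptions for illustration.
from singer_sdk import typing as th
from singer_sdk.streams import RESTStream


class ExampleAPIStream(RESTStream):
    """Shared base for this hypothetical API."""

    url_base = "https://api.example.com"  # assumed base URL


class UnpostedRecordsStream(ExampleAPIStream):
    """Parent stream: only the records with a certain status (posted = false)."""

    name = "unposted_records"
    path = "/data"
    primary_keys = ["id"]
    schema = th.PropertiesList(
        th.Property("id", th.StringType),
        th.Property("posted", th.BooleanType),
    ).to_dict()

    def get_url_params(self, context, next_page_token):
        # Assumes the API accepts a status filter; if it doesn't,
        # you'd have to drop already-posted rows client-side instead.
        return {"posted": "false"}

    def get_child_context(self, record, context):
        # Hand each matching ID to the child stream below.
        return {"record_id": record["id"]}


class RecordDetailStream(ExampleAPIStream):
    """Child stream: one request per ID emitted by the parent."""

    name = "record_detail"
    parent_stream_type = UnpostedRecordsStream
    path = "/data/{record_id}"      # assumed per-ID endpoint
    records_jsonpath = "$"          # single-object response (assumed)
    primary_keys = ["id"]
    state_partitioning_keys = []    # avoid one state partition per ID
    schema = th.PropertiesList(
        th.Property("id", th.StringType),
        th.Property("posted", th.BooleanType),
    ).to_dict()
```

Note this runs the per-ID calls in the same invocation; deferring them to the next run is a state question (a sketch of that is at the end of the thread).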
v
If you have to pull all the data anyway, just send all of it to the target. The target should upsert for you and you're good to go. Keeps it simpler
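For what it's worth, that simpler route amounts to a full-table stream with a primary key declared, so an upserting loader (e.g. target-postgres) overwrites rows on `id`. A minimal sketch, assuming the Meltano Singer SDK and a made-up `/data` endpoint:

```python
# Minimal sketch of the "resend everything, let the target upsert" route,
# assuming the Meltano Singer SDK and a made-up /data endpoint.
from singer_sdk import typing as th
from singer_sdk.streams import RESTStream


class AllRecordsStream(RESTStream):
    name = "records"
    url_base = "https://api.example.com"  # assumed base URL
    path = "/data"
    primary_keys = ["id"]  # key_properties let an upserting loader overwrite on id
    # No replication_key: the stream is treated as FULL_TABLE and re-emits
    # every record on every run; the target dedupes on the primary key.
    schema = th.PropertiesList(
        th.Property("id", th.StringType),
        th.Property("posted", th.BooleanType),
    ).to_dict()
```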
n
Problem is that there are ≈41,000 records that are already processed and roughly 200 that are not, and I don't want to pull the 41,000 records every time
it's not the fastest API...
v
I'm missing some context: you're saying storing the IDs somewhere prevents you from having to do all the API calls again? Maybe if you could describe the requests in order, we could help.
It sounds like you have to do `GET /data` and all the results come back every time. If there are no filter options, there's nothing you can do anyway. Are you saying there is a filter option? If so, what are the options? It sounds like you're saying there's an option for filtering based on id?
n
Here's some more context. The basic structure of the endpoint data that's relevant for this:
```
{
    "id": guid,
    "posted": boolean
}
```
My idea is to have one stream that exports all records, regardless of posted status, incrementally based on id, which I've already got working. A second stream would filter on posted = false and store the IDs for the next run. A third stream would then use those IDs at the next runtime to make an individual call for each of them and see whether they have been posted. Later, I'd combine the results of these streams into one table in dbt. Does that make more sense?
and thanks again for taking the time!
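A rough sketch of how streams 2 and 3 of that plan could collapse into a single SDK stream that remembers IDs across runs. The endpoint paths, the direct `requests` calls, and the custom `pending_ids` state key are all assumptions, and whether custom keys in stream state survive the SDK's state finalization should be verified before relying on this.

```python
# Rough sketch with the Meltano Singer SDK: streams 2 and 3 folded into one
# stream that re-checks the IDs remembered from the previous run, then stores
# the currently-unposted IDs for the next run. Endpoint paths, the direct
# requests calls, and the custom "pending_ids" state key are assumptions.
from typing import Iterable, Optional

import requests
from singer_sdk import typing as th
from singer_sdk.streams import Stream

API_BASE = "https://api.example.com"  # assumed base URL


class PostedStatusStream(Stream):
    name = "posted_status"
    primary_keys = ["id"]
    schema = th.PropertiesList(
        th.Property("id", th.StringType),
        th.Property("posted", th.BooleanType),
    ).to_dict()

    def get_records(self, context: Optional[dict]) -> Iterable[dict]:
        state = self.stream_state  # writable per-stream state dict

        # 1. Re-check each ID saved on the previous run with an individual call.
        for record_id in state.get("pending_ids", []):
            resp = requests.get(f"{API_BASE}/data/{record_id}", timeout=30)
            resp.raise_for_status()
            yield resp.json()

        # 2. Fetch the currently-unposted records (assumes a filter exists and
        #    the endpoint returns a JSON array) and remember their IDs.
        resp = requests.get(
            f"{API_BASE}/data", params={"posted": "false"}, timeout=30
        )
        resp.raise_for_status()
        pending = []
        for row in resp.json():
            pending.append(row["id"])
            # Duplicates across steps 1 and 2 are fine for an upserting target.
            yield row

        # 3. Stash the new ID list for the next run. Whether custom keys in
        #    stream_state survive state finalization can depend on the SDK
        #    version, so verify this against your tap before relying on it.
        state["pending_ids"] = pending
```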