# singer-tap-development
n
Hi again. New day, new question! I'm trying to figure out an incremental strategy for an API call that doesn't really have an obvious replication key. To work around it, I'd like to query the API for all rows with a certain status, save all of those IDs, run a child stream for each of those IDs at the next runtime, and then run the first call again and store the updated set of IDs. Any ideas?
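For reference, the "child stream for each of those IDs" half of that maps onto parent/child streams in the Meltano Singer SDK roughly as below. This is only a sketch under assumptions: the base URL, the `/data` and `/data/{record_id}` paths, and the class names are made up for illustration, not taken from the thread.

```python
# Hypothetical sketch using the Meltano Singer SDK (singer_sdk).
# The base URL, endpoint paths, and class names are assumptions for illustration.
from singer_sdk import typing as th
from singer_sdk.streams import RESTStream


class ExampleAPIStream(RESTStream):
    """Shared base for this hypothetical API."""

    url_base = "https://api.example.com"  # assumed base URL


class UnpostedRecordsStream(ExampleAPIStream):
    """Parent stream: only the records with a certain status (posted = false)."""

    name = "unposted_records"
    path = "/data"
    primary_keys = ["id"]
    schema = th.PropertiesList(
        th.Property("id", th.StringType),
        th.Property("posted", th.BooleanType),
    ).to_dict()

    def get_url_params(self, context, next_page_token):
        # Assumes the API accepts a status filter; if it doesn't,
        # you'd have to drop already-posted rows client-side instead.
        return {"posted": "false"}

    def get_child_context(self, record, context):
        # Hand each matching ID to the child stream below.
        return {"record_id": record["id"]}


class RecordDetailStream(ExampleAPIStream):
    """Child stream: one request per ID emitted by the parent."""

    name = "record_detail"
    parent_stream_type = UnpostedRecordsStream
    path = "/data/{record_id}"      # assumed per-ID endpoint
    records_jsonpath = "$"          # single-object response (assumed)
    primary_keys = ["id"]
    state_partitioning_keys = []    # avoid one state partition per ID
    schema = th.PropertiesList(
        th.Property("id", th.StringType),
        th.Property("posted", th.BooleanType),
    ).to_dict()
```

Note this runs the per-ID calls in the same invocation; deferring them to the next run is a state question (a sketch of that is at the end of the thread).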
v
If you have to pull all the data anyway, just send all of it to the target. The target should upsert for you and you're good to go. Keeps it simpler
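For what it's worth, that simpler route amounts to a full-table stream with a primary key declared, so an upserting loader (e.g. target-postgres) overwrites rows on `id`. A minimal sketch, assuming the Meltano Singer SDK and a made-up `/data` endpoint:

```python
# Minimal sketch of the "resend everything, let the target upsert" route,
# assuming the Meltano Singer SDK and a made-up /data endpoint.
from singer_sdk import typing as th
from singer_sdk.streams import RESTStream


class AllRecordsStream(RESTStream):
    name = "records"
    url_base = "https://api.example.com"  # assumed base URL
    path = "/data"
    primary_keys = ["id"]  # key_properties let an upserting loader overwrite on id
    # No replication_key: the stream is treated as FULL_TABLE and re-emits
    # every record on every run; the target dedupes on the primary key.
    schema = th.PropertiesList(
        th.Property("id", th.StringType),
        th.Property("posted", th.BooleanType),
    ).to_dict()
```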
n
Problem is that there are ≈41,000 records that are already processed and roughly 200 that are not, and I don't want to pull the 41,000 records every time
it's not the fastest API...
v
I'm missing some context: you're saying storing the IDs somewhere prevents you from having to do all the API calls again? Maybe if you could describe the requests in order, we could help.
It sounds like you have to do `GET /data` and all the results come back every time. If there are no filter options, there's nothing you can do anyway. Are you saying there is a filter option? If so, what are the options? It sounds like you're saying there's an option for filtering based on id?
n
Here's some more context. The basic structure of the endpoint data that's relevant for this:
```
{
    "id": guid,
    "posted": boolean
}
```
My idea is to have one stream that exports all records, regardless of posted status, incrementally based on id, which I've already got working. A second stream would filter on posted = false and store the IDs for the next run. A third stream would then use those IDs at the next runtime to make an individual call for each of them and see whether they have been posted. Later, I'd combine the results of these streams into one table in dbt. Does that make more sense?
and thanks again for taking the time!
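A rough sketch of how streams 2 and 3 of that plan could collapse into a single SDK stream that remembers IDs across runs. The endpoint paths, the direct `requests` calls, and the custom `pending_ids` state key are all assumptions, and whether custom keys in stream state survive the SDK's state finalization should be verified before relying on this.

```python
# Rough sketch with the Meltano Singer SDK: streams 2 and 3 folded into one
# stream that re-checks the IDs remembered from the previous run, then stores
# the currently-unposted IDs for the next run. Endpoint paths, the direct
# requests calls, and the custom "pending_ids" state key are assumptions.
from typing import Iterable, Optional

import requests
from singer_sdk import typing as th
from singer_sdk.streams import Stream

API_BASE = "https://api.example.com"  # assumed base URL


class PostedStatusStream(Stream):
    name = "posted_status"
    primary_keys = ["id"]
    schema = th.PropertiesList(
        th.Property("id", th.StringType),
        th.Property("posted", th.BooleanType),
    ).to_dict()

    def get_records(self, context: Optional[dict]) -> Iterable[dict]:
        state = self.stream_state  # writable per-stream state dict

        # 1. Re-check each ID saved on the previous run with an individual call.
        for record_id in state.get("pending_ids", []):
            resp = requests.get(f"{API_BASE}/data/{record_id}", timeout=30)
            resp.raise_for_status()
            yield resp.json()

        # 2. Fetch the currently-unposted records (assumes a filter exists and
        #    the endpoint returns a JSON array) and remember their IDs.
        resp = requests.get(
            f"{API_BASE}/data", params={"posted": "false"}, timeout=30
        )
        resp.raise_for_status()
        pending = []
        for row in resp.json():
            pending.append(row["id"])
            # Duplicates across steps 1 and 2 are fine for an upserting target.
            yield row

        # 3. Stash the new ID list for the next run. Whether custom keys in
        #    stream_state survive state finalization can depend on the SDK
        #    version, so verify this against your tap before relying on it.
        state["pending_ids"] = pending
```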