#singer-tap-development

julian_knight

11/17/2021, 7:45 PM
The question I was going to ask in demo day: I’m working with an API that has time-series data but doesn’t support ordering (the order is hard-coded to most-recent-first, aka time descending). I could just use `is_sorted = False` and accept that the first run will need to sync all of the data before any state can be saved. However, the API supports time-range filtering, so it’s hypothetically possible (and what I would do with a non-SDK tap) to batch the API calls into time windows (days, weeks, whatever), run the batches from oldest to newest, and emit state at the end of each batch, which would allow picking back up if the first run is interrupted. This seems like a potential use case for partitions, generating a partition for each time window. A few questions about this though:
• Are partitions really the recommended way to solve this, or is there some other recommended solution?
• Will using partitions save a new state value for each time window? That seems unnecessary, as every time-window partition should only be run once; the only state we need is the last partition that was completed.
• Is there a way to use both partitioning and child streams? This API call happens to also be a child stream, and when I specified my own partitions it broke the context from the parent stream. Is there a way to merge these contexts together? Is this a bug? If so, I’d be happy to open an issue.
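For reference, the non-SDK approach described above could be sketched roughly like this. It is plain Python, not Singer SDK code: `time_windows`, `sync_in_windows`, `fetch_window`, and the `last_completed_window_end` state key are all hypothetical names made up for illustration, and `fetch_window` stands in for whatever call applies the API's time-range filter.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, Iterable, Iterator, Tuple


def time_windows(
    start: datetime, end: datetime, size: timedelta
) -> Iterator[Tuple[datetime, datetime]]:
    """Yield (window_start, window_end) pairs from oldest to newest."""
    cursor = start
    while cursor < end:
        window_end = min(cursor + size, end)
        yield cursor, window_end
        cursor = window_end


def sync_in_windows(
    fetch_window: Callable[[datetime, datetime], Iterable[dict]],
    start: datetime,
    end: datetime,
    size: timedelta,
    state: Dict[str, str],
) -> Iterator[dict]:
    """Yield records window by window, recording progress in `state`.

    `fetch_window` is a hypothetical callable that queries the API with a
    time-range filter. Record order inside a window does not matter here,
    because the bookmark only advances once a whole window has finished.
    """
    # Resume from the last fully completed window, if there is one.
    resume_from = state.get("last_completed_window_end")
    if resume_from is not None:
        start = max(start, datetime.fromisoformat(resume_from))

    for window_start, window_end in time_windows(start, end, size):
        for record in fetch_window(window_start, window_end):
            yield record
        # The only state needed is the end of the last completed window.
        state["last_completed_window_end"] = window_end.isoformat()
```

A real tap would write a STATE message at the point where the bookmark is updated. If a run is interrupted partway through a window, the next run repeats that window from its start, so downstream loading would need to tolerate duplicate records.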