Bit of an abstract question: we have a lot of custom sources that are heavily rate limited, so we cache the API calls (we don’t trust checkpointing alone enough). If we see the API calls as the E and the caching as the L in ETL, then we basically have a tightly coupled E/L that is hard to pull apart. The alternative is to see caching as part of E and run a separate L step on top of the cache.
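One way to picture the "caching as part of E" option is the sketch below. It is only an illustration under assumptions, not an established pattern or anyone's actual pipeline: `cached_extract`, `load_from_cache`, `fake_fetch`, and the one-JSON-file-per-request cache layout are all hypothetical names and choices.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable, Iterator


def cache_key(endpoint: str, params: dict) -> str:
    """Deterministic key so the same request always maps to the same file."""
    raw = json.dumps({"endpoint": endpoint, "params": params}, sort_keys=True)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def cached_extract(
    fetch: Callable[[str, dict], dict],
    cache_dir: Path,
    endpoint: str,
    params: dict,
) -> dict:
    """E step: return the raw response, calling the API only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{cache_key(endpoint, params)}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    response = fetch(endpoint, params)  # the only place the rate-limited API is hit
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(response), encoding="utf-8")
    tmp.replace(path)  # rename after writing so a crash never leaves a partial .json
    return response


def load_from_cache(cache_dir: Path) -> Iterator[dict]:
    """L step: reads only cached raw responses and never touches the API."""
    for path in sorted(cache_dir.glob("*.json")):
        yield json.loads(path.read_text(encoding="utf-8"))


# Tiny demo with a stand-in for the real API client (hypothetical).
calls = []


def fake_fetch(endpoint: str, params: dict) -> dict:
    calls.append((endpoint, params))
    return {"endpoint": endpoint, "page": params["page"]}


demo_dir = Path(tempfile.mkdtemp()) / "raw_cache"
first = cached_extract(fake_fetch, demo_dir, "/orders", {"page": 1})
second = cached_extract(fake_fetch, demo_dir, "/orders", {"page": 1})  # served from cache
rows = list(load_from_cache(demo_dir))  # the load step can be rerun without any API calls
```

In this shape the cache acts as a raw landing zone owned by E, and the L step can be rerun or changed independently because it only ever reads files.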
Has anyone else faced this? Is there a common approach to “cached E”? Or should we take a different route: stop caching, and invest more in reliable checkpointing so we never make the same call twice?
Would love to hear some opinions.