Hi everyone, I'm looking into Meltano to replace our current ETL. It looks like a good fit because I think it will make it much easier for others in my company to contribute new sources, and to use dbt to contribute new analyses.
However, I have a very specific data source: blockchain RPCs. They are notoriously unreliable because they fail often and are rate-limited.
Our current extractor handles things like:
• Pulls the list of contracts from a DB so it only fetches data for those contracts, and maintains an import state for each contract
• Handles rate limiting across multiple importer processes using Redis shared locks
• Processes out of order: it generates a batch of queries and tries each one; if a query fails, the failed range is persisted to that contract's import state and retried on the next run
• Optimizes for the smallest number of queries, to keep the pay-per-query RPC bill down
• Selects the best RPC for each type of query, and spreads the load when multiple RPCs are available for a single chain
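To make the out-of-order part concrete, here is a rough sketch of the per-contract state handling. It is not our actual code: the names `ContractState`, `planQueries`, `runImport` and the `maxSpan` limit are made up for illustration, and the real extractor is async and talks to Redis and the RPCs, which this leaves out.

```typescript
// Inclusive block range, e.g. { from: 100, to: 199 }.
type BlockRange = { from: number; to: number };

// Hypothetical per-contract import state; the field names are assumptions.
type ContractState = {
  lastBlock: number; // highest block attempted so far
  failedRanges: BlockRange[]; // ranges to retry on the next run
};

// Merge overlapping or adjacent ranges so the state stays compact.
function mergeRanges(ranges: BlockRange[]): BlockRange[] {
  const sorted = [...ranges].sort((a, b) => a.from - b.from);
  const merged: BlockRange[] = [];
  for (const r of sorted) {
    const last = merged[merged.length - 1];
    if (last !== undefined && r.from <= last.to + 1) {
      last.to = Math.max(last.to, r.to);
    } else {
      merged.push({ ...r });
    }
  }
  return merged;
}

// Plan one run: previously failed ranges plus the new blocks up to `head`,
// merged and then split into chunks of at most `maxSpan` blocks per query.
function planQueries(state: ContractState, head: number, maxSpan: number): BlockRange[] {
  const pending = [...state.failedRanges];
  if (head > state.lastBlock) {
    pending.push({ from: state.lastBlock + 1, to: head });
  }
  const plan: BlockRange[] = [];
  for (const r of mergeRanges(pending)) {
    for (let from = r.from; from <= r.to; from += maxSpan) {
      plan.push({ from, to: Math.min(from + maxSpan - 1, r.to) });
    }
  }
  return plan;
}

// Try every planned query. A failure is recorded in the returned state
// instead of aborting the run, so it gets retried next time.
function runImport(
  state: ContractState,
  head: number,
  maxSpan: number,
  fetchRange: (range: BlockRange) => void, // assumed to throw on RPC failure
): ContractState {
  const failed: BlockRange[] = [];
  for (const range of planQueries(state, head, maxSpan)) {
    try {
      fetchRange(range);
    } catch {
      failed.push(range);
    }
  }
  return {
    lastBlock: Math.max(state.lastBlock, head),
    failedRanges: mergeRanges(failed),
  };
}
```

The point is that `lastBlock` can move forward even when some ranges failed, because the gaps are tracked separately in `failedRanges`. That is the part I'm not sure maps cleanly onto a single incremental bookmark.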
Given what I know about Singer and how complex this extractor already is, I think the best approach would be to use the extractor as is, as a Singer tap, and plug it into the Meltano ecosystem. But it's written in TypeScript, so idk how that would work.
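As far as I understand the Singer spec, a tap is just an executable that writes SCHEMA, RECORD and STATE messages as JSON lines to stdout, so the language shouldn't matter in principle. Below is a minimal sketch of what the TypeScript side might emit. The `transfers` stream, the `TransferEvent` shape and the function names are hypothetical, and how to register a non-Python executable with Meltano is something I'd still need to check.

```typescript
// The three core Singer message types, one JSON object per line on stdout.
type SingerMessage =
  | { type: "SCHEMA"; stream: string; schema: object; key_properties: string[] }
  | { type: "RECORD"; stream: string; record: Record<string, unknown> }
  | { type: "STATE"; value: Record<string, unknown> };

// Hypothetical row shape for a decoded contract event.
type TransferEvent = {
  contract: string;
  block: number;
  logIndex: number;
  amount: string;
};

const STREAM = "transfers"; // hypothetical stream name

// Build the lines a tap would print: schema first, then records, then state.
function buildMessages(events: TransferEvent[], state: Record<string, unknown>): string[] {
  const messages: SingerMessage[] = [
    {
      type: "SCHEMA",
      stream: STREAM,
      schema: {
        type: "object",
        properties: {
          contract: { type: "string" },
          block: { type: "integer" },
          logIndex: { type: "integer" },
          amount: { type: "string" },
        },
      },
      key_properties: ["contract", "block", "logIndex"],
    },
    ...events.map((e): SingerMessage => ({ type: "RECORD", stream: STREAM, record: { ...e } })),
    // The STATE value is free-form, so per-contract state could go here.
    { type: "STATE", value: state },
  ];
  return messages.map((m) => JSON.stringify(m));
}

// A tap writes each message on its own line to stdout.
function emit(lines: string[]): void {
  for (const line of lines) {
    console.log(line);
  }
}
```

Calling `emit(buildMessages(events, state))` at the end of a run would print the stream for whatever target sits downstream. Since the STATE value is free-form JSON, the existing per-contract import state could probably be passed through it more or less unchanged.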