Hello, everyone! I'm pretty new here, but for the past couple of weeks, I've been delving into Meltano and the SDK. At work, we're creating a tap, and while we've managed to make it perform its intended function, I've encountered a few issues and have some questions about our approach.
Our ultimate goal is to load data from an Excel file into Snowflake. The extractor first calls one endpoint to retrieve an object with data from a series of hosted files. With this data, we then call a second endpoint, using the most recent record for a specific file name (an Excel file).
Firstly, how would you go about it? Would you opt for two streams or just one? My other question is this: if you were to implement a dynamic schema discovery function, how would you handle it?
Currently, our working version uses only one stream, and schema discovery is handled by a dedicated schema function. This function retrieves data from the API and infers the schema from the response received (the transformed contents of the Excel file).
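To make the idea concrete, here is a minimal sketch of that kind of inference. It is not our actual code: it assumes the transformed Excel contents arrive as a list of dicts (one per row), and `infer_schema` and `_json_type` are hypothetical names made up for illustration.

```python
from datetime import date, datetime
from typing import Any, Optional

# Fallback type for empty or conflicting columns (an assumption of this sketch).
STRING = {"type": ["string", "null"]}


def _json_type(value: Any) -> dict:
    """Map one Python value to a nullable JSON Schema type."""
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return {"type": ["boolean", "null"]}
    if isinstance(value, int):
        return {"type": ["integer", "null"]}
    if isinstance(value, float):
        return {"type": ["number", "null"]}
    # datetime must be checked before date, because datetime is a subclass of date.
    if isinstance(value, datetime):
        return {"type": ["string", "null"], "format": "date-time"}
    if isinstance(value, date):
        return {"type": ["string", "null"], "format": "date"}
    return dict(STRING)


def infer_schema(rows: list) -> dict:
    """Infer a flat JSON Schema from a list of row dicts."""
    properties: dict = {}
    for row in rows:
        for key, value in row.items():
            if value is None:
                # Remember the column, but wait for a non-empty value to type it.
                properties.setdefault(key, None)
                continue
            inferred = _json_type(value)
            current: Optional[dict] = properties.get(key)
            if current is None:
                properties[key] = inferred
            elif current != inferred:
                kinds = {current["type"][0], inferred["type"][0]}
                if kinds == {"integer", "number"} and "format" not in current:
                    # Mixed ints and floats widen to number.
                    properties[key] = {"type": ["number", "null"]}
                else:
                    # Any other conflict falls back to string.
                    properties[key] = dict(STRING)
    # Columns that were always empty default to nullable string.
    return {
        "type": "object",
        "properties": {k: (v or dict(STRING)) for k, v in properties.items()},
    }
```

For example, `infer_schema([{"id": 1}, {"id": 2.5}])` types `id` as a nullable number, since the two rows disagree between integer and float.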
Lastly, my current issue: the schema discovery function works, but only when it's cached. Otherwise, it gets 'stuck' and fails to complete the job. As a newcomer, my debugging skills are limited, and I'm finding it hard to understand this behaviour (the caching was just my intuition, without fully grasping what might be happening). Can someone kindly explain what could be causing this issue? Or perhaps point me to where I could start debugging it?
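For clarity, the caching I mean is along these lines. This is a simplified sketch rather than our real tap: `CachedSchemaSketch` and `fetch_rows` are hypothetical names, `fetch_rows` stands in for the API call, and every column is typed as a string to keep it short. It shows how caching collapses repeated reads of the schema into a single API call, in case the schema is being read more than once per run, which I have not confirmed.

```python
from functools import cached_property
from typing import Callable


class CachedSchemaSketch:
    """Stand-in for a stream whose schema comes from an API response."""

    def __init__(self, fetch_rows: Callable[[], list]) -> None:
        # fetch_rows is a hypothetical callable wrapping the HTTP request.
        self._fetch_rows = fetch_rows

    @cached_property
    def schema(self) -> dict:
        # Runs once per instance; later reads reuse the stored result
        # instead of hitting the API again.
        rows = self._fetch_rows()
        columns = {key for row in rows for key in row}
        return {
            "type": "object",
            "properties": {
                name: {"type": ["string", "null"]} for name in sorted(columns)
            },
        }
```

A cheap way to check whether repeated reads are the problem would be to log or count each call to the fetch function with the cache removed and see how often it fires during one run.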
While it's currently working as expected, I'm a bit concerned about potential surprises down the road. Thank you for everything you've all shared here; it has already been immensely helpful to me. :) Have a great day!