# singer-tap-development
a
I have a tap that is based on an Azure Data Factory (ADF) pipeline run - it's a long story... The tap class itself triggers a pipeline run, which extracts CSV data and saves it into separately named files (one per database table). Each SDK stream (40+ streams) checks for the presence of the new file corresponding to its table in storage (using backoff). Once the file arrives, it is read and then emitted via the stream in `get_records` in the normal way. Instead of checking and rechecking for each new file in storage, I've discovered I can check the pipeline logs to see when each table / stream file is complete, then just read the file once I know it's saved to storage. However, I don't want to replace my 'file check' code with 'pipeline log check' code in each stream, as the REST call takes a while. Is there a process I can run asynchronously at the `tap` level every 10 seconds or so, so that my `stream.get_records()` can check the tap's cached version of the logs from ADF and emit records if appropriate?
Ideally I don't want to wait for the whole pipeline to finish before I start emitting records - some data is ready in seconds but others take minutes.
👀 1
e
Interesting use case! I don't think there's a built-in way to accomplish that, but I'm happy to help you rubber duck 🙂
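As a starting point for the rubber ducking, here is a minimal sketch of the "poll once at the tap level, read the cache from each stream" idea, using only the standard library. Everything in it is an assumption for illustration: `PipelineLogCache`, `fetch_completed`, and `wait_for` are hypothetical names, not part of the Singer SDK or any Azure client, and `fetch_completed` stands in for whatever REST call returns the tables that ADF reports as finished.

```python
import threading
from typing import Callable, Iterable, Optional, Set


class PipelineLogCache:
    """Polls a log source in a background thread and caches finished tables."""

    def __init__(
        self,
        fetch_completed: Callable[[], Iterable[str]],
        interval: float = 10.0,
    ) -> None:
        # Hypothetical callable: wraps the slow ADF log request and returns
        # the names of tables whose files are already saved to storage.
        self._fetch_completed = fetch_completed
        self._interval = interval
        self._completed: Set[str] = set()
        self._cond = threading.Condition()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()

    def _run(self) -> None:
        while not self._stop.is_set():
            tables = set(self._fetch_completed())
            with self._cond:
                self._completed |= tables
                # Wake any stream that is blocked in wait_for().
                self._cond.notify_all()
            # Sleep for the interval, but return early if stop() is called.
            self._stop.wait(self._interval)

    def wait_for(self, table: str, timeout: Optional[float] = None) -> bool:
        """Block until `table` is reported complete; False on timeout."""
        with self._cond:
            return self._cond.wait_for(
                lambda: table in self._completed, timeout=timeout
            )
```

The tap could create one cache, call `start()` right after triggering the pipeline run, and hand the same object to every stream; each `get_records` would then call something like `cache.wait_for(self.name, timeout=600)` before reading its file. That way only one thread makes the REST call, and a stream whose table finished earlier returns immediately. Note that an exception inside `fetch_completed` would end the polling thread in this sketch, so real code would want error handling there.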
a
I think I am far off the intended use cases for Meltano 🙂 I will let you know if I manage to cook something up.
👍 1