I have a tap that is based on an Azure Datafactory ADF pipel Meltano #singer-tap-development

Andy Carter

02/24/2025, 2:52 PM

I have a tap that is based on an Azure Data Factory (ADF) pipeline run (it's a long story). The tap class itself triggers a pipeline run, which extracts CSV data and saves it into separately named files, one per database table. Each of the 40+ SDK streams checks storage, with backoff, for the new file corresponding to its table. Once the file arrives, it is read and emitted via the stream's get_records in the normal way.

Instead of checking and rechecking storage for each new file, I've discovered I can check the pipeline logs to see when each table / stream file is complete, then read the file once I know it has been saved to storage. However, I don't want to replace my 'file check' code with 'pipeline log check' code in each stream, as the REST call takes a while. Is there a process I can run asynchronously at the tap level every 10 seconds or so, so that stream.get_records() can check the tap's cached version of the ADF logs and emit records if appropriate? Ideally I don't want to wait for the whole pipeline to finish before I start emitting records: some data is ready in seconds, but other tables take minutes.
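
For illustration, here is a rough sketch of that idea using only the Python standard library; it is not a Singer SDK feature. A daemon thread owned by the tap polls the logs on an interval and sets one threading.Event per completed table, and each stream blocks on its own event before reading its file. The names PipelineLogPoller, fetch_completed, read_file and the standalone get_records helper are all hypothetical, and fetch_completed stands in for whatever REST call returns the tables ADF has finished writing.

```python
import threading
from typing import Callable, Dict, Iterable, Iterator, Optional


class PipelineLogPoller:
    """Polls a log source in a background thread and signals per-table completion."""

    def __init__(self, fetch_completed: Callable[[], Iterable[str]], interval: float = 10.0):
        # fetch_completed is a hypothetical callable wrapping the ADF log query;
        # it should return the names of tables whose files are already saved.
        self._fetch_completed = fetch_completed
        self._interval = interval
        self._events: Dict[str, threading.Event] = {}
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()

    def _event_for(self, table: str) -> threading.Event:
        # Create the event lazily so streams and the poller can arrive in any order.
        with self._lock:
            return self._events.setdefault(table, threading.Event())

    def _run(self) -> None:
        while not self._stop.is_set():
            for table in self._fetch_completed():
                self._event_for(table).set()
            # Sleep for the interval, but wake immediately if stop() is called.
            self._stop.wait(self._interval)

    def wait_for(self, table: str, timeout: Optional[float] = None) -> bool:
        """Block until the table is reported complete; False on timeout."""
        return self._event_for(table).wait(timeout)


def get_records(poller: PipelineLogPoller, table: str,
                read_file: Callable[[str], Iterable[dict]],
                timeout: float = 600.0) -> Iterator[dict]:
    """Stand-in for a stream's record generator: wait for the table, then read it."""
    if not poller.wait_for(table, timeout):
        raise TimeoutError(f"No completion log entry for {table!r} within {timeout}s")
    yield from read_file(table)
```

With this shape, only one REST call is made per interval regardless of the number of streams, and a stream whose file is already complete starts emitting immediately. An exception raised inside fetch_completed would end the polling thread, so a real version would likely need error handling and retries around that call.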

👀 1

Edgar Ramírez (Arch.dev)

02/25/2025, 3:08 PM

Interesting use case! I don't think there's a built-in way to accomplish that, but I'm happy to help you rubber duck 🙂

Andy Carter

02/25/2025, 4:24 PM

I think I am far off the intended use cases for Meltano 🙂 I will let you know if I manage to cook something up.

👍 1
