Andy Carter
02/24/2025, 2:52 PM
get_records in the normal way.
Instead of repeatedly checking storage for each new file, I've discovered I can check the pipeline logs to see when each table / stream file is complete, then read the file only once I know it has been saved to storage. However, I don't want to replace my 'file check' code with 'pipeline log check' code in each stream, as the REST call takes a while.
Is there a process I can run asynchronously at the tap level every 10 seconds or so, and in my stream.get_records() check the tap's cached version of the logs from ADF, and emit records if appropriate?
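For illustration, here is a minimal sketch of the pattern being asked about, using only the Python standard library. It is not taken from the Singer SDK or from ADF: `PipelineLogCache`, `get_records_when_ready`, `fetch_logs`, `read_file`, and the status strings are all hypothetical names and values chosen for the example. A background thread makes the slow log call once per interval and caches the result, and each stream waits on the cache rather than calling the REST API itself.

```python
import threading
import time
from typing import Callable, Dict, Iterable, Iterator, Optional


class PipelineLogCache:
    """Polls a log source in a background thread and caches the latest statuses."""

    def __init__(self, fetch_logs: Callable[[], Dict[str, str]], interval: float = 10.0):
        # fetch_logs is a hypothetical callable returning {table_name: status}.
        self._fetch_logs = fetch_logs
        self._interval = interval
        self._lock = threading.Lock()
        self._statuses: Dict[str, str] = {}
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()

    def _run(self) -> None:
        while not self._stop.is_set():
            # The slow REST call happens here, once for all streams.
            statuses = self._fetch_logs()
            with self._lock:
                self._statuses = dict(statuses)
            # Sleep for the interval, but wake immediately if stop() is called.
            self._stop.wait(self._interval)

    def status(self, table: str) -> Optional[str]:
        with self._lock:
            return self._statuses.get(table)


def get_records_when_ready(
    cache: PipelineLogCache,
    table: str,
    read_file: Callable[[str], Iterable[dict]],
    poll: float = 1.0,
    timeout: float = 600.0,
) -> Iterator[dict]:
    """Wait until the cached status marks the table as done, then yield its records."""
    deadline = time.monotonic() + timeout
    # "Succeeded" and "Failed" are assumed status values for this sketch.
    while cache.status(table) != "Succeeded":
        if cache.status(table) == "Failed":
            raise RuntimeError(f"pipeline activity for {table!r} failed")
        if time.monotonic() > deadline:
            raise TimeoutError(f"timed out waiting for {table!r}")
        time.sleep(poll)  # cheap: only reads the in-memory cache
    yield from read_file(table)
```

In a tap, one plausible arrangement is for the tap object to own a single cache and pass it to each stream, so that a stream's get_records() simply delegates to a helper like the one above. If `fetch_logs` can raise, the polling loop would also need error handling so the thread keeps running.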
Ideally I don't want to wait for the whole pipeline to finish before I start emitting records - some data is ready in seconds but others take minutes.
Edgar Ramírez (Arch.dev)
02/25/2025, 3:08 PM
Andy Carter
02/25/2025, 4:24 PM