Howdy everyone I ve been using singer for a few months now a Meltano #best-practices

borna_almasi

05/18/2021, 9:17 PM

Howdy everyone. I’ve been using Singer for a few months now and I’ve noticed some gaps in the spec. I wanted to get an idea of how others are thinking about these perceived shortcomings, and any thoughts on extending the spec:

1. No support for “raw artifacts”: I typically write taps against metered/paid APIs and send the data to an S3/GCS target. I want the data in as raw a format as possible (ELT) so that if there are any issues with the code, I don’t have to call the APIs again and incur a cost. For example, the raw artifact could be an API response, or it could be a ZIP file from an FTP server. My thoughts: I think the Singer spec can be extended to support API -> Artifact -> Records, and I have a design in mind. Is anyone else dealing with similar issues?

2. Recovery from hard failures: For long-running replications, things can go wrong on the machine itself. Hardware/virtual hardware fails. If the machine crashes without the process having a chance to exit and make the state available, we lose the state. My thoughts: I can imagine a world where the persistence of state is a sort of adapter and not purely reliant on the file system. Or perhaps some “Singer middleware spec” as a pipe that sits between the tap and the target and ... Has anyone else experienced these failures? How have you handled them?

3. Env var interpolation for the config files: I’m often running these Singer workloads in containers. It’s quite a painful dance to pass environment variables to the Singer config files. Why not provide first-class support for env vars via

{{ MY_ENV_VAR }}

in the config files?
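To make the third point concrete, here is a minimal sketch of what such interpolation could look like as a pre-processing step. This is not part of the Singer spec or of any existing tool: `render_config` and `load_config` are hypothetical helpers, and the sketch assumes each `{{ NAME }}` placeholder sits inside a quoted JSON string in the config file.

```python
import json
import os
import re

# Matches {{ NAME }}, allowing optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def render_config(text, env=None):
    """Replace each {{ NAME }} in text with the value of NAME taken from env.

    Hypothetical helper: assumes placeholders appear inside JSON strings.
    """
    env = os.environ if env is None else env

    def substitute(match):
        name = match.group(1)
        if name not in env:
            # Fail loudly rather than passing a literal "{{ NAME }}" to the tap.
            raise KeyError(f"environment variable {name} is not set")
        # json.dumps adds surrounding quotes; strip them and keep the escaping
        # so quotes and backslashes in the value cannot break the JSON.
        return json.dumps(env[name])[1:-1]

    return PLACEHOLDER.sub(substitute, text)


def load_config(path, env=None):
    """Read a templated JSON config file and return the parsed dict."""
    with open(path) as f:
        return json.loads(render_config(f.read(), env))
```

In a container, something like this could run at startup to write the rendered JSON to a temporary file, which is then passed to the tap as its usual config file. Raising on a missing variable is a design choice here; falling back to a default value would be an equally reasonable alternative.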

aaronsteers

05/18/2021, 11:59 PM

Hi, @borna_almasi - All great questions. I’ll reply to each in a separate thread so we can keep the conversation flowing. 🙂
