:wave: Hi everyone, Can someone help me understan...
# random
r
👋 Hi everyone, Can someone help me understand what is the difference between Meltano and Singer?
a
Hi @rhys_fisher, I'm sure @douwe_maan can give a better answer but in summary: Singer is a framework for building what they call taps and targets. Taps are used to pull data out of source systems/APIs and targets load data into destinations such as your data warehouse
Meltano uses Singer for the EL part of ELT, i.e. the extraction and loading. But it also adds a lot of extra functionality to make working with Singer easier and also includes tools to provide an 'end-to-end' data workflow. For example, Meltano can use dbt to do transformations on your data using SQL. Also, it bundles Airflow which is used to orchestrate and schedule your data pipelines.
r
@al_whatmough ok makes sense - ideally everyone would maintain singer rather than create their own code to extract the data then load to the format needed for say google bigquery or wherever you're storing it. Meltano uses singer but then has a bunch of functionality on top for transforming the data in some way. Isn't the bigger hard problem to solve letting users connect their credentials to authenticate into the systems - they're all behaving in slightly different ways and storing the tokens is as dangerous as having plaintext passwords in your database IMO. If we then had to hope that singer was up to date as well... just added complexity for no real gain unless I'm missing something big. Right now this feels like a false economy.
a
@rhys_fisher yeah Singer is really just an attempt at creating a standardised way of pulling data out of APIs and loading it into a warehouse (such as bigquery as you mention). I agree that storing the credentials is an issue but imo it's already being solved by projects such as Vault. I think there's been talk of using something like this in Meltano, it doesn't make sense to reinvent that wheel. Meltano is more of an open source equivalent of Stitch or Fivetran. Stitch actually maintains a lot of the Singer taps as you might already know. Pipelinewise is another framework which uses Singer to do extract/load but it doesn't have some of the conveniences of Meltano such as a frontend UI or a bundled orchestrator. Hopefully other people can expand on anything I've said!
d
@rhys_fisher As @al_whatmough was saying (thanks for taking this, Al 🙂), Singer is a standard that defines the behavior of executables that extract data from sources, and executables that load extracted data into destinations. Extraction executables following the Singer specification are called taps, and loader executables following the Singer specification are called targets. While taps and targets can be run together using a simple Unix pipe (`tap | target`), that leaves the management of configuration, catalog generation, and incremental state as an exercise to the reader. Meltano is a runner for Singer taps and targets that adds some abstraction layers on top of the Singer fundamentals. It manages the configuration for you, for example: https://meltano.com/docs/configuration.html Beyond that, Meltano adds a UI like Stitch's or Fivetran's, support for orchestrators (e.g. Airflow), and support for dbt models as transformations, so that you can build full ELT pipelines made up of only open source components.
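To make the tap/target split a bit more concrete, here is a minimal sketch in Python of the kind of messages that flow through that pipe. It assumes the usual Singer message shape (one JSON object per line, with `SCHEMA`, `RECORD`, and `STATE` types). The names `toy_tap`, `toy_target`, `run_pipeline`, the `users` stream, and the `last_id` bookmark are all made up for illustration and are not part of Singer or Meltano. An in-memory buffer stands in for the Unix pipe.

```python
import io
import json


def toy_tap(out, state=None):
    """Hypothetical tap: writes Singer-style messages to `out`, one JSON object per line."""
    # Hard-coded rows stand in for a real source system or API.
    rows = [
        {"id": 1, "name": "Ada"},
        {"id": 2, "name": "Grace"},
        {"id": 3, "name": "Edsger"},
    ]
    # Incremental state: skip rows already extracted in a previous run.
    last_id = (state or {}).get("last_id", 0)
    schema = {
        "type": "object",
        "properties": {"id": {"type": "integer"}, "name": {"type": "string"}},
    }
    out.write(json.dumps({
        "type": "SCHEMA",
        "stream": "users",
        "schema": schema,
        "key_properties": ["id"],
    }) + "\n")
    for row in rows:
        if row["id"] <= last_id:
            continue
        out.write(json.dumps({"type": "RECORD", "stream": "users", "record": row}) + "\n")
        last_id = row["id"]
    # The STATE message carries the bookmark that the next run should start from.
    out.write(json.dumps({"type": "STATE", "value": {"last_id": last_id}}) + "\n")


def toy_target(inp):
    """Hypothetical target: collects records in a list instead of loading a warehouse."""
    loaded = []
    state = None
    for line in inp:
        line = line.strip()
        if not line:
            continue
        message = json.loads(line)
        if message["type"] == "RECORD":
            loaded.append(message["record"])
        elif message["type"] == "STATE":
            state = message["value"]
    return loaded, state


def run_pipeline(state=None):
    """Connects tap and target through an in-memory buffer standing in for the pipe."""
    buffer = io.StringIO()
    toy_tap(buffer, state)
    buffer.seek(0)
    return toy_target(buffer)
```

Calling `run_pipeline()` once and then passing the returned state into a second call shows the bookkeeping that a bare pipe leaves to you: something has to store that state between runs and hand it back to the tap.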
> ok makes sense - ideally everyone would maintain singer rather than create their own code to extract the data then load to the format needed for say google bigquery or wherever you're storing it.
Exactly.
> Meltano uses singer but then has a bunch of functionality on top for transforming the data in some way.
We also want to make the entire experience of using Singer taps and targets more accessible and easier to get right for production, which is where things like https://meltano.com/docs/configuration.html, https://meltano.com/docs/integration.html#selecting-entities-and-attributes-for-extraction, and https://meltano.com/docs/containerization.html come in.
> Isn't the bigger hard problem to solve letting users connect their credentials to authenticate into the systems - they're all behaving in slightly different ways and storing the tokens is as dangerous as having plaintext passwords in your database IMO.
Meltano lets you expose sensitive settings through the environment, which you can use together with your deployment platform's secrets manager: https://meltano.com/docs/production.html#managing-configuration Would that help address your concern or are you referring to something else?
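As a rough illustration of the general idea (secrets injected through the environment rather than stored in a config file or database), here is a small Python sketch. The function `load_settings`, the plugin name `tap-example`, the setting `api_token`, and the `PLUGIN_SETTING` variable naming scheme are assumptions made for this example; check the Meltano docs linked above for the variable names it actually expects.

```python
import os


def load_settings(plugin, names, env=None):
    """Hypothetical helper: read each setting from an environment variable.

    The variable name is built as <PLUGIN>_<SETTING>, upper-cased with dashes
    turned into underscores. This scheme is an assumption for illustration.
    """
    env = os.environ if env is None else env
    settings = {}
    missing = []
    for name in names:
        var = f"{plugin}_{name}".upper().replace("-", "_")
        if var in env:
            settings[name] = env[var]
        else:
            missing.append(var)
    if missing:
        # Fail loudly instead of falling back to a value stored on disk.
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return settings
```

With a secrets manager setting `TAP_EXAMPLE_API_TOKEN` at deploy time, `load_settings("tap-example", ["api_token"])` would pick the token up at runtime without it ever being written into the project files.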
> If we then had to hope that singer was up to date as well...
Not all Singer taps and targets are currently equally well maintained and supported, but we expect that to change as we keep working on better tooling around deploying and developing them: https://meltano.com/blog/2020/05/13/why-we-are-building-an-open-source-platform-for-elt-pipelines/
> just added complexity for no real gain unless I'm missing something big. Right now this feels like a false economy.
How do you mean? I obviously don't think Meltano is "just added complexity for no real gain", but I'm not sure what you're getting at 🙂