# random
k
hello, i've set up a very simple Meltano/dbt pipeline to get customer data from a CSV into a DB for use by a SaaS developer, with some dbt tests. (Context: some of our customers subscribe outside the main payment platform & we need a way to sync their information.) I started using Meltano for analytics from multiple sources & that's STILL my main intention, but that is a separate pipeline. This use case would be more operational. ...my coworker asked
I’m struggling to justify:
• the amount of configuration/code written
• the added dependencies for meltano/DBT. more dependencies =
◦ devops / maintenance burden
◦ developer learning overhead
◦ security risk
that this solution entails.
He proposed a "pure postgres solution that is transactional, enforces constraints, is trivially testable, and cleans itself up"
Could you explain why you’re a proponent of Meltano/DBT for this job?
I can answer this for my analytics use case... but I'm not sure about the operational case. Any thoughts on how to respond, or on which solution is more logical? What advantages does Meltano/dbt offer in this situation, where there's only one straightforward source?
a
Hi, @kathryn_cowie. It's sometimes hard to communicate to folks why DevOps and CI/CD are innately valuable. I liken it to trying to convince someone that agile or scrum approaches are better than the 'older' traditional waterfall project management cycles. While it's often hard to "prove" which is the better approach abstractly, in practice the differences become more and more pronounced over time.
What might be helpful is to start with the confidence, safety, and reproducibility that Meltano, dbt, and a DevOps/DataOps approach can provide. If you make the changes directly in Postgres, how will you know when your Postgres code changes? How could you roll back? If/when there's a typo or bug in your Postgres code, how would you debug it and how would you replay it? With a DataOps approach using Meltano and dbt, the entire paradigm is built on confidence and stability promises.
The other thing I'll add is that the investment in a DataOps approach pays off over time: the built-in safety/stability promises make iteration much faster over the long run and give the whole team more confidence in maintaining the solution over its lifetime. You can hand off your code to anyone and they'll be able to learn the solution and iterate on it without risk of breaking anything. Compare that with stored procedures or other scripts directly in the database - someone will find them 2 or 5 years later and will have no idea how to maintain them, or if they are still even needed.
Hope this helps. And I expect others probably have their own unique experiences and perspectives.
There are probably other good ways to manage the operational use case that still follow DevOps best practices - but without knowing what other dev and deployment patterns you already have in-house for similar use cases, it's hard to comment on how this compares with any particular alternative.
c
Compare that with stored procedures or other scripts directly in the database - someone will find them 2 or 5 years later and will have no idea how to maintain them, or if they are still even needed.
IMO, this is the key reason why schema management as code (Alembic) and DataOps (Meltano, dbt) are beneficial. Traceability, visibility, and built-in documentation provide a solution that "explains itself" AND that can support "code (git) archaeology" (that's my favourite part)
k
^for the alternative, the script would not be stored in the database; it would be checked into version control. I personally agree that the generated docs greatly aid understanding and add needed context
Being new to data engineering, it's easier for me to jump in on a new tool than to convince established developers to learn something new, if the outcome can be achieved in a paradigm they already trust (even one with fewer features).
"it’s hard to disentangle 'things that just work' from things that i’m familiar/comfortable with"
c