# docker
a
FYI, I have been working on a Meltano deployment with Dagster and dbt in Azure for about a year now. I have tried various containerised platforms in Azure and have just migrated onto App Service, which I hope will be a 'final resting place' for our infra; it seems to offer good value and high performance. If anyone has any questions I would be happy to answer them.
j
Hi Andy, thanks for offering to share your knowledge! I am starting to look into hosting my Docker container on Azure like you are doing, using the following packages:
• tap-mssql-buzzcutnoram
• target-snowflake
• dbt (Snowflake)
• Dagster (dagster-ext)
I don't have much experience with Azure infrastructure, so I've been trying to read up on best practices, but it feels like a jungle. I was wondering if you would be willing to share your insights on how to get things set up w.r.t.:
• Meltano state backend? I don't want to lose state every time I make a new change to my Docker container.
• Assuming production DBs exist in your Azure env as well, how hard is it to connect to them from Meltano? (I think you're using Postgres, but the idea would be the same?)
• Any "gotchas" you ran into in general?
• Extra configuration needed anywhere?
Hoping to raise this again; I have been successful in loading my container to Azure using Container App, running meltano commands in the shell, and then seeing Dagster load in Azure.
• Meltano state backend? I don't want to lose state every time I make a new change to my docker container
What I am trying to figure out right now is the best way to host the state backend / SQLite db so that we don't lose state/data when we have to rebuild the container. Is there any advice or setup recommendations to take care of this?
e
j
ya that's what we're hoping for
Looking through the documentation, we'll need to set the following env vars?
• `AZURE_STORAGE_CONNECTION_STRING` (or is it `MELTANO_STATE_BACKEND_AZURE_STORAGE_ACCOUNT_URL`?)
• `MELTANO_STATE_BACKEND_URI`
e
Rather:
• `MELTANO_STATE_BACKEND_URI`
• `MELTANO_STATE_BACKEND_AZURE_STORAGE_ACCOUNT_URL` or `MELTANO_STATE_BACKEND_AZURE_CONNECTION_STRING`
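For anyone wiring this up, here is a minimal Python sketch of the variables listed above, e.g. for passing into the container's app settings. The `azure://<container>/<prefix>` URI shape is my reading of the Meltano state backend docs, and the container name, prefix, and helper function are hypothetical, so check them against your Meltano version.

```python
from typing import Optional


def azure_state_backend_env(
    container: str,
    prefix: str = "state",
    *,
    connection_string: Optional[str] = None,
    storage_account_url: Optional[str] = None,
) -> dict:
    """Build the env vars for an Azure Blob Storage state backend.

    Hypothetical helper: exactly one of connection_string or
    storage_account_url must be given, matching the "or" in the list above.
    """
    if (connection_string is None) == (storage_account_url is None):
        raise ValueError(
            "pass exactly one of connection_string or storage_account_url"
        )

    # Assumed URI shape: azure://<blob container>/<prefix for state blobs>
    env = {"MELTANO_STATE_BACKEND_URI": f"azure://{container}/{prefix}"}

    if connection_string is not None:
        env["MELTANO_STATE_BACKEND_AZURE_CONNECTION_STRING"] = connection_string
    else:
        env["MELTANO_STATE_BACKEND_AZURE_STORAGE_ACCOUNT_URL"] = storage_account_url
    return env
```

Because state then lives in blob storage rather than in the container's SQLite file, rebuilding the image should not reset it.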
j
Thanks Edgar, always appreciated.
e
np
a
Hi @joshua_janicas, I have the Azure state backend working well. I believe you can now use a Managed Identity if you (and your ops sec team) prefer that to a plaintext connection string. I haven't tried that though.
👍 1
Regarding Container apps (ACA), I used that for a while but eventually I found the memory limits of 4GB too low to use dagster to orchestrate meltano and dbt. I was running meltano in the same container as dagster though.
I can give you some code examples for your meltano assets and linking them to dbt assets. There is quite a bit of boilerplate though and knowledge of dbt asset groups.
Regarding production db, yes, I use Postgres Flexible Server, which works well for our smaller data scale. For network access I restrict access to certain site IPs and also have to permit all Azure-related IPs. I think I pre-generate some logins and pass them as env variables to the container for meltano to use. You could probably vnet it if you like, but that's a bit above my expertise.
j
> Regarding Container apps (ACA), I used that for a while but eventually I found the memory limits of 4GB too low to use dagster to orchestrate meltano and dbt. I was running meltano in the same container as dagster though.
How much memory does your container need, may I ask?
> I can give you some code examples for your meltano assets and linking them to dbt assets. There is quite a bit of boilerplate though and knowledge of dbt asset groups.
I'd be more than happy to compare notes!!!
> Regarding production db, yes I use postgres flexible which works well for our smaller data scale. For network access I restrict access to certain site IPs and also have to permit all azure related IPs. I think I pre-generate some logins and pass them as env variables to the container for meltano to use. You could probably vnet it if you like but that's a bit above my expertise.
> Hi @joshua_janicas I have the Azure state backend working well, I believe you can now use a Managed Identity if you (and your ops sec team) prefer that to a plaintext connection string. I haven't tried that though.
Good to know, thank you!
a
I now use a 16 GB instance in Azure App Service (AAS), but I have one particularly memory-hungry tap reading huge CSVs. My Dagster instance alone consumes around 2.5 to 3 GB, so the 4 GB limit on ACA only left me about 1 GB of headroom to run my taps and targets in.
One downside of AAS is that you can't (currently) ssh into the docker container itself and execute meltano commands directly. You can do this in ACA. However, I've hit far fewer issues with memory etc. in AAS, so there's much less need to actually do this.
👍 1
@joshua_janicas do you have a structured dbt project you are beginning with?
j
i've been able to get dagster assets materialized as well at this point
i remember following some of (i think your) github discussions around how to best connect meltano and dbt assets together in dagster, and i worked off that a bit as a great starting point
a
Do you have the lineage working between meltano and dagster?
Hard to tell from your screenshot, but maybe 🙂
j
i think the answer here is yes? 🤔
🎉 1
a
I would consider grouping your assets in dbt, then you can associate a subset of your dbt models with a particular meltano tap.
So if you have a job called `salesforce` in meltano that does `meltano run tap-salesforce target-bigquery`, then you have a dbt group tagged with `salesforce`, and this also does `dbt run --select tag:salesforce`.
Then you can have a dagster asset job called `salesforce` that runs your meltano job (tap & target combo), and runs all dbt models associated with that job name. This is assuming you are going to grow beyond one tap.
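A rough Python sketch of that pairing, assuming the `meltano` and `dbt` CLIs are both on the PATH (in some projects dbt is invoked through Meltano instead). The function names are hypothetical and this is plain `subprocess`, not Dagster code; each command could be wrapped in its own Dagster asset or op. dbt selects models by tag with `--select tag:<name>`.

```python
import subprocess
from typing import Callable, List


def commands_for_job(job: str, tap: str, target: str) -> List[List[str]]:
    """Return the extract/load command, then the dbt run for the same job name."""
    return [
        # Extract and load with the tap/target pair for this job.
        ["meltano", "run", tap, target],
        # Build only the dbt models tagged with the job name.
        ["dbt", "run", "--select", f"tag:{job}"],
    ]


def run_job(
    job: str,
    tap: str,
    target: str,
    runner: Callable[..., object] = subprocess.run,
) -> None:
    """Run the commands in order, stopping at the first failure (check=True)."""
    for cmd in commands_for_job(job, tap, target):
        runner(cmd, check=True)
```

Calling `run_job("salesforce", "tap-salesforce", "target-bigquery")` would run the load first and the tagged dbt models second; the `runner` argument is only there so the sequence can be exercised without the CLIs installed.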
j
at the moment we only have one tap, but that might very well change long term. i'll definitely keep the `tags` in mind, haven't thought about that. Thank you!
a
I tend to separate my models by folder within `staging`, e.g. `staging/freshdesk/*.sql`, so I can tag them in dbt by folder path. https://docs.getdbt.com/reference/resource-configs/tags
melty bouncy 1
❤️ 1
You can then create a dagster asset group that includes meltano and dbt assets, set dagster schedules for the asset job etc.
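To make the folder-based tagging concrete, here is a small Python sketch that builds the `models:` section you might merge into `dbt_project.yml`, with one tag per staging folder. The project name `my_project`, the source names, and the helper are all hypothetical; `+tags` set at folder level is the dbt config described on the tags page linked above.

```python
import json
from typing import Dict, List


def folder_tag_config(project: str, sources: List[str]) -> Dict:
    """Tag every model under staging/<source>/ with the source name."""
    staging = {source: {"+tags": [source]} for source in sources}
    return {"models": {project: {"staging": staging}}}


if __name__ == "__main__":
    # Printed as JSON, which can be translated to (or pasted as) YAML.
    config = folder_tag_config("my_project", ["freshdesk", "salesforce"])
    print(json.dumps(config, indent=2))
```

With tags assigned this way, the tag for a source can match the Meltano job name, so one name can tie the tap, its dbt models, and the Dagster asset group together.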