# random
i
Hey everyone, we're choosing our ELT stack and I've landed on Meltano along with Airflow (maybe Dagster) and dbt (Snowflake for the DW). I was first planning on just setting this all up on a VM, but Azure Container Apps seems like a better/easier route (it's built on top of AKS). Has anyone had experience with a similar stack? If so - where should I store audit logs? (Airflow requires a Postgres backend, and I'll probably want to query these logs for troubleshooting.) Should I spin up separate containers for my Meltano and dbt jobs, respectively? Is there a more efficient way to do this? Thank you!
a
Hi Ian. I use Meltano with Dagster, dbt and Postgres; the Dagster front end is exposed via a Container App. For the state store I use Azure Storage, and Dagster logs go to Postgres too. I don't spin up separate Docker instances for my tasks, I run them directly on my container. A snipped version of my meltano.yml:
```yaml
version: 1
include_paths:
- ./meltano-yml/dynamics-dev.yml
- ./meltano-yml/dynamics-prod.yml
default_environment: dev
project_id: 418e6bdf
plugins:
  extractors:
  loaders:
  utilities:
  - name: dagster
    variant: quantile-development
    pip_url: dagster-ext dagster-postgres dagster-dbt dbt-postgres dagster-azure
    settings:
    - name: dagster_home
      env: DAGSTER_HOME
      value: $MELTANO_PROJECT_ROOT/orchestrate/dagster
    commands:
      dev:
        args: dev -f $REPOSITORY_DIR/repository.py --dagit-host 0.0.0.0 -d $REPOSITORY_DIR
        executable: dagster_invoker
```
My Dockerfile looks like this. I have my dbt models below the orchestrate/dagster directory, so I need to do things slightly differently from the stock Dockerfile:
```dockerfile
# registry.gitlab.com/meltano/meltano:latest is also available in GitLab Registry
ARG MELTANO_IMAGE=meltano/meltano:v2.20.0-python3.10
FROM $MELTANO_IMAGE

WORKDIR /project

# Install any additional requirements
COPY ./requirements.txt .
RUN pip install -r requirements.txt

COPY meltano.yml logging.yaml ga4_reports.json ./
ADD meltano-yml meltano-yml
ADD plugins plugins
# Copy over Meltano project directory
# COPY . .

RUN meltano install

# then copy dbt models in orchestrate folder
ADD orchestrate orchestrate

# overwrite dagster.yaml with contents of dagster_azure.yaml
RUN rm -rf ./orchestrate/dagster/dagster.yaml
COPY ./orchestrate/dagster/dagster_azure.yaml ./orchestrate/dagster/dagster.yaml

# Don't allow changes to containerized project files
ENV MELTANO_PROJECT_READONLY 1

# Expose default port used by `meltano ui`
EXPOSE 5000
# Expose port used for postgres connection
EXPOSE 5432

# Expose default port used by the Dagster webserver (dagit)
EXPOSE 3000

ENTRYPOINT ["meltano"]
```
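For context on the "front end exposed via a Container App" part: since the image exposes 3000 for the Dagster webserver and the ENTRYPOINT is meltano, the Container App spec presumably just points ingress at that port and passes the dagster dev command as args. A rough sketch of the relevant fields in the az containerapp YAML format (all names and the image tag are placeholders, not taken from the thread):
```yaml
properties:
  configuration:
    ingress:
      external: true      # or internal-only if the Dagster UI should stay inside a VNet
      targetPort: 3000    # matches EXPOSE 3000 above (Dagster webserver)
  template:
    containers:
      - name: meltano-dagster                          # placeholder name
        image: myregistry.azurecr.io/meltano:latest    # placeholder image
        # ENTRYPOINT is `meltano`, so these args would run the `dev` command
        # defined under the dagster utility in meltano.yml (an assumption,
        # not confirmed in the thread)
        args: ["invoke", "dagster:dev"]
```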
i
Gotcha - Does the Dagster Daemon run 24/7 in the same container as the web server? And for state store (I'm assuming azure blob storage) - is this just storing the Dagster log files? If your Dagster logs go to Postgres too - does that mean you have a process that pulls them from blob storage - or does each task run get logged straight to your Postgres db? (I'm assuming it's in a diff env)
a
`Does the Dagster Daemon run 24/7 in the same container as the web server?` Yes. The state store is for Meltano state JSON files. Dagster compute logs go to Azure Blob Storage too, configured in dagster_azure.yaml:
```yaml
compute_logs:
  module: dagster_azure.blob.compute_log_manager
  class: AzureBlobComputeLogManager
  config:
    storage_account:
      env: DAGSTER_COMPUTE_LOG_STORAGE_ACCOUNT_NAME
    container:
      env: DAGSTER_COMPUTE_LOG_STORAGE_CONTAINER_NAME
    secret_key:
      env: DAGSTER_COMPUTE_LOG_STORAGE_KEY
```
General Dagster logs (run/event history) go to the Postgres DWH.
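Concretely, that part usually lives in the storage section of dagster.yaml pointing at Postgres via the dagster-postgres package (already in the pip_url above). A minimal sketch, with placeholder env var names rather than anything from the thread:
```yaml
# dagster.yaml: keep Dagster's run / event-log / schedule storage in Postgres
storage:
  postgres:
    postgres_db:
      hostname:
        env: DAGSTER_PG_HOST        # placeholder env var names
      username:
        env: DAGSTER_PG_USERNAME
      password:
        env: DAGSTER_PG_PASSWORD
      db_name:
        env: DAGSTER_PG_DB
      port: 5432
```
That is what makes run history queryable straight from the Postgres instance, and it also answers the earlier "do you pull them from blob storage" question: structured run/event metadata goes directly to Postgres, while only the raw stdout/stderr compute logs land in blob.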
i
Oh okay so your meltano extracts load into blob storage before you pull it into your dwh for dbt transforms? Could I theoretically log my Dagster run logs to my Snowflake dwh? Or would it have to go to blob storage first too? Where do you store environment variables?
Again - sorry for all the questions hahaha I really appreciate your help
a
No sorry, all tap data lands in Postgres. Only Meltano state data and Dagster compute logs go to Azure Storage. dbt runs from Postgres (separate source schemas) -> Postgres (central dbt schema), all in the same instance. I don't know about Snowflake unfortunately; I'm sure the Dagster community would have more info. I have a long string of secrets in the secrets manager in Azure, and these are passed through to the container as environment variables. It's all managed via Bicep IaC, so not too bad once it's set up.
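The Meltano state piece maps to Meltano's state_backend setting in meltano.yml; a rough sketch (container name, prefix and env var are placeholders, not from the thread):
```yaml
# meltano.yml: keep Meltano state JSON in Azure Blob Storage instead of the local filesystem
state_backend:
  uri: azure://meltano-state/dev          # azure://<container>/<prefix>, placeholder values
  azure:
    connection_string: ${AZURE_STORAGE_CONNECTION_STRING}   # assumed to arrive as an env var
```
The connection string would presumably be one of the secrets passed into the container as an environment variable.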
i
Ahh alright gotcha. By secrets manager do you mean azure key vault? Also - why do you use bicep as opposed to github for your repo?
a
I use Bicep to manage the Azure infrastructure: the Container App, the registry, etc. The main Meltano repo is in GitHub.
Bicep is like Terraform, but Azure-specific.
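The "secrets passed through to the container as environment variables" part would look roughly like this on the Container App itself, shown here in the az containerapp YAML format rather than Bicep (Bicep sets the same configuration.secrets and template.containers[].env properties); all names are made up:
```yaml
properties:
  configuration:
    secrets:
      - name: dagster-pg-password               # placeholder secret names
        value: <injected at deploy time>
      - name: azure-storage-connection-string
        value: <injected at deploy time>
  template:
    containers:
      - name: meltano-dagster
        env:
          - name: DAGSTER_PG_PASSWORD           # surfaced to Meltano/Dagster as env vars
            secretRef: dagster-pg-password
          - name: AZURE_STORAGE_CONNECTION_STRING
            secretRef: azure-storage-connection-string
```
Container Apps can also reference Key Vault secrets directly (keyVaultUrl plus a managed identity) instead of inlining values.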
i
Ahh okay. I'm assuming dbt is in GitHub too?
a
yes, dbt is part of my meltano repo because I haven't figured out how to do it separately with a sidecar container yet 🙂
i
You're the man. Does the bicep repo / container registry come packaged with the Container Apps Service?
a
No, it's separate, but you can find plenty of Azure examples to adapt here: https://github.com/Azure/azure-quickstart-templates/tree/master/quickstarts/microsoft.app/container-app-acr
i
Gotcha. Could you store your Bicep repo in GitHub too, then? Also, how long did this all take to set up? Could you point me to some good resources to start?