# best-practices
b
Hi, we are nascent Meltano+dbt adopters. I wanted to get a general consensus on how people are managing their Meltano codebase.
• Mono-repo or per-pipeline repo for Meltano?
• Embedded dbt codebase per pipeline, or a separate dbt codebase?
• Use the Meltano UI, or an Airflow/Dagster UI?
• What/how do you manage observability of pipelines and data flow/journey?
v
I generally go for mono-repo, embedded dbt. I use the orchestrator UI; for me that can be SQL Agent jobs (eek) or GitHub/GitLab.
• What/how do you manage observability of pipelines and data flow/journey?
Same as the orchestrator. When I need more "observability" I make new dbt models that give me information about the things I care about.
I almost never use the UI. I rely on notifications that I have set up, and then I'll check a UI if needed. 🤷
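(v builds custom dbt models for this; a related, minimal sketch using dbt's built-in source freshness checks instead, assuming a Singer target that writes the `_sdc_batched_at` metadata column. Schema, table, and thresholds are placeholders:)
```yaml
# Hypothetical dbt sources.yml: running `dbt source freshness` turns
# stale loads into warnings/errors, a cheap pipeline-observability check.
version: 2
sources:
  - name: raw
    schema: tap_postgres              # assumed raw schema name
    loaded_at_field: _sdc_batched_at  # Singer metadata column, if your target emits it
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: orders                  # placeholder table
```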
b
With a mono-repo, would each pipeline have its own parent folder?
or are the pipelines commingled?
v
Commingled; it's not a big deal for me. Let me count
v
I care about making things easy for me and the team. I don't really care about principles; I use principles as guides.
b
👆 yes. I have a 20+ year software engineering background 🙂
v
As much as possible. The beauty of one repo is that your definitions are all in one place, vs. how almost every other tool I've used is set up: they have things all over the place.
b
Right, so then what do you consider the best strategies for naming and folder/file organization, so that the pipelines don't end up a hot mess?
v
When they are hot messes
Not until then
Normally it's triggered when someone new gets added to maintaining the repo; it's a good way for them to learn 🤷
b
so you only have one `meltano.yml` file in your mono-repo?
v
yep
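(For newer adopters, a hypothetical sketch of what one commingled `meltano.yml` can look like; the plugin names echo the CI snippet later in this thread, and everything else is a placeholder:)
```yaml
# Hypothetical single-file Meltano project: every extractor, loader,
# and schedule lives together in one meltano.yml.
version: 1
default_environment: dev
environments:
  - name: dev
plugins:
  extractors:
    - name: tap-toggl
    - name: tap-postgres
  loaders:
    - name: target-postgres
  transformers:
    - name: dbt
jobs:
  - name: toggl-to-warehouse
    tasks:
      - tap-toggl target-postgres dbt:run
schedules:
  - name: daily-toggl
    interval: "@daily"
    job: toggl-to-warehouse
```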
b
what is your deployment…
v
But that works for us; we only have 4 extractors and 2-3 loaders.
Inherited a bunch, but generally that's it; if we had 20 it'd be different.
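(If "inherited" here means Meltano plugin inheritance, it looks roughly like this; the derived name and config key are hypothetical:)
```yaml
# Hypothetical plugin inheritance: the derived extractor reuses the
# base plugin's installation but carries its own name and config.
plugins:
  extractors:
    - name: tap-postgres
    - name: tap-postgres--billing    # placeholder inherited copy
      inherit_from: tap-postgres
      config:
        filter_schemas: [billing]    # assumed tap setting
```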
b
we have 2-3 extractors and 1 loader currently
where/how do you deploy.. and what's your orchestrator?
v
Git repo on GitLab; deploy to a Windows server. The orchestrator is SQL Agent jobs 😉
b
We have a Kubernetes ecosystem; we're deploying each pipeline as a singular Docker image in kube.
v
Good. If you already have a k8s ecosystem you should 100% do that!
b
so if I have the facility to do one pipeline per deployment, do you think a commingled deployment is still better vs. each Meltano pipeline as its own Docker image?
not using any Kubernetes operator..
v
I personally would, as it's easier to maintain one Docker image.
Meltano handles all your dependency management.
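(Concretely: each plugin in `meltano.yml` can pin its own `pip_url`, and `meltano install` builds an isolated virtualenv per plugin, so taps and targets don't collide on Python dependencies. A hypothetical sketch; the version pins are placeholders:)
```yaml
# Hypothetical per-plugin dependency pinning in meltano.yml.
plugins:
  extractors:
    - name: tap-toggl
      pip_url: tap-toggl==0.1.0                      # placeholder pin
  loaders:
    - name: target-postgres
      pip_url: meltanolabs-target-postgres==0.0.9    # placeholder pin
```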
b
each Docker image has an entrypoint that starts Meltano and points to a single Meltano UI location
v
Doesn't mean you have to do that 🙂
b
but currently each Meltano instance starts its own Airflow scheduler
b
but the Airflow backend DB is shared
v
Just giving you options 🤷
```yaml
stages:
- build
- run

#Only triggered via a scheduled run. We pull the latest Docker image to run the job with
#Using the docker image is faster to run as we don't have to install meltano or the tap/target packages
runner:
  image:
    name: $CI_REGISTRY_IMAGE:latest
    entrypoint: [""]
  before_script:
  - cp -Rn /project/. . #Copy meltano project into image
  stage: run
  variables:
    TARGET_POSTGRES_PASSWORD: $TAP_POSTGRES_PASSWORD
    TARGET_POSTGRES_HOST: $TAP_POSTGRES_HOST
    DBT_HOST: $TAP_POSTGRES_HOST
    DBT_PASSWORD: $TAP_POSTGRES_PASSWORD
    POSTGRES_PASSWORD: $TAP_POSTGRES_PASSWORD

  services:
  - name: postgres
  script:
  - "meltano run tap-toggl target-postgres dbt:run tap-postgres target-apprise"
  rules:
    - if: $CI_PIPELINE_SOURCE == "schedule" 

# Tags <project>:<sha>
# Tags <project>:<ref> (<branch> or <tag>)
# Tags <project>:latest
# Saves us time by building the Dockerfile once when things change. Runner runs a lot (every 15 minutes or so); docker-build-latest runs infrequently
docker-build-latest:
  stage: build
  image: docker:stable
  variables:
    DOCKER_DRIVER: overlay2
    MELTANO_IMAGE: meltano/meltano
  before_script:
  - docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY
  - docker pull $CI_REGISTRY_IMAGE:$CI_COMMIT_REF_NAME || true
  services: ["docker:dind"]
  script:
  - >
    docker build
    --cache-from $CI_REGISTRY_IMAGE:$CI_COMMIT_REF_NAME
    --tag $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
    --tag $CI_REGISTRY_IMAGE:$CI_COMMIT_REF_NAME
    --build-arg MELTANO_IMAGE=$MELTANO_IMAGE
    .
  - docker tag $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA $CI_REGISTRY_IMAGE:latest
  - docker push $CI_REGISTRY_IMAGE:latest
  rules:
  - if: $CI_COMMIT_BRANCH == "main" && $CI_PIPELINE_SOURCE != "schedule"
```
For a k8s deploy I'd look at @ken_payne’s work wherever it is at
He did a bunch of work with Helm on how to deploy a standalone system. I'm just showing some other options.
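(For flavor, a hypothetical Kubernetes CronJob running a project image on a schedule; all names, the schedule, and the secret are placeholders, and it assumes the image's entrypoint is `meltano`:)
```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: meltano-toggl-pipeline        # placeholder name
spec:
  schedule: "*/15 * * * *"            # placeholder schedule
  concurrencyPolicy: Forbid           # don't overlap runs of the same pipeline
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: meltano
              image: registry.example.com/meltano-project:latest  # placeholder image
              args: ["run", "tap-toggl", "target-postgres", "dbt:run"]
              envFrom:
                - secretRef:
                    name: meltano-secrets   # placeholder secret with TAP/TARGET creds
```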
b
sure, what is the above code snippet.. I don't exactly recognize the syntax
v
GitLab CI
b
ah okay
so a prebuilt Docker image with all taps/targets pre-installed; just copy in the new layer of codebase, push, and run
v
That's the way I like it 🤷
b
the pre-built image auto-refreshes to `latest`, I am guessing
v
That's these two
```yaml
- docker tag $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA $CI_REGISTRY_IMAGE:latest
- docker push $CI_REGISTRY_IMAGE:latest
```
Note that this Docker setup comes from Meltano's files-docker bundle.
b
Oh thanks, I’ll look into Ken’s work