# best-practices
j
I don't understand why "the Meltano convention is to name the model directory after the extractor using snake_case (i.e. tap_gitlab)". For an ELT (E -> L -> T), wouldn't it be more appropriate to name it after the *L*oader (target)? After all, "Once your raw data has arrived in your data warehouse, its schema will likely need to be transformed to be more appropriate for analysis." (emphasis mine, from https://docs.meltano.com/getting-started#transform-loaded-data-for-analysis)
Or maybe dbt reads the schema from the source (tap)? I'm confused
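In other words, the two options I'm comparing (paths as in the getting started guide; `target_bigquery` is just a hypothetical alternative I made up):
```
transform/models/tap_gitlab/       # the Meltano convention: named after the extractor
transform/models/target_bigquery/  # the alternative I'd have expected: named after the loader
```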
v
Even if you don't want to use dbt, https://courses.getdbt.com/courses/fundamentals explained the "why" to me. It takes a bit, though.
j
I've approached that course 3 times; I always end up saying "too much theory" or "why do I need a dbt Cloud account when I have the open-source version?" 😅
v
You don't need to use the cloud to take the course (I just followed along in my terminal). The quick answer to your question: you have N data sources, how do you put those into your target? You need to separate them somehow.
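For example, a minimal sketch of how dbt separates N sources: each one gets its own entry in a sources file, typically pointing at its own schema in the warehouse (the `tap_gitlab` / `tap_stripe` names and tables below are illustrative, not from your actual project):
```yaml
# models/staging/sources.yml (illustrative)
version: 2
sources:
  - name: tap_gitlab       # data landed by one EL pipeline
    schema: tap_gitlab
    tables:
      - name: commits
  - name: tap_stripe       # data landed by another EL pipeline
    schema: tap_stripe
    tables:
      - name: charges
```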
j
"you have N data sources, how do you put those into your target? You need to separate them somehow"
hmm, but that's the EL part, right? IIUC, the T part comes after putting all the sources into the target
v
yes it does; I don't understand your question, sorry!
j
then for the meltano models (using dbt), one could ignore what the sources are, since all the data is now in the target. that's why I don't get why the model names should refer to the sources. let me know if it's still not clear 😅
a
I separate staging models by source. I couldn't imagine why I wouldn't do it. It's both intuitive and helps with configuration. That being said, Meltano supports and recommends running dbt in `meltano run` commands, which don't require any specific folder name.
Basically, the TL;DR is you don't need to name folders or sources tap_-anything. Just run dbt via Meltano and it works like a normal project.
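For example (assuming a dbt transformer plugin such as `dbt-postgres` is installed; the exact plugin and command names depend on your setup):
```
# EL then T in one pipeline
meltano run tap-gitlab target-postgres dbt-postgres:run

# or run just the dbt models
meltano invoke dbt-postgres:run
```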
v
@juan_luis_cano_rodriguez yeah, too many words in there that aren't defined in my mind.
"meltano models using dbt": can you just show what you mean via code?
j
basically this part of the getting started guide:
"For example, the `/transform/models/tap_gitlab/source.yml` below configures dbt sources from the postgres tables where our tap-gitlab EL job output to"
I... need to re-read that a few more times
v
What about it? I'm not following what your issue is at all I'm sorry!
j
in my head, I wanted to treat the sources as an "implementation detail" that the models don't need to be aware of, since the data is already in the target and the models don't care where it came from. But I guess it's what @alexander_butler said: you can separate the models by source for debuggability purposes.
v
Do you have a dbt project set up right now?
j
yep, it's working. I just didn't get the naming convention
v
Can you show your code that points to naming conventions you don't understand?
j
```
mkdir /transform/models/tap_gitlab/
```
why `tap_gitlab` and not, say, `target_bigquery`? That's the root of the question.
v
You can call the folder whatever you want, can you show the code for `cat /transform/models/tap_gitlab/*`?
j
the guide puts a `source.yml` with
```yaml
config-version: 2
version: 2
sources:
  - name: tap_gitlab
    schema: public
    tables:
      - name: commits
      - name: tags
```
and a `commits_last_7d.sql` with
```sql
{{
  config(
    materialized='table'
  )
}}

select *
from {{ source('tap_gitlab', 'commits') }}
where created_at::date >= current_date - interval '7 days'
```
but this is what I don't get. I thought that the transformation phase happened solely inside the warehouse/target, yet the naming convention and the `source.yml` seem to indicate that dbt is reading the data from the source.
"dbt reading the data from the source" sounds a lot like Extract, Transform, Load to me. Supposedly, if I'm doing Extract, Load, Transform it's because I want to decouple (EL) from (T). And if I decouple them, then (T) does not need to know anything about (E).
v
dbt pulls data from your DW; there's a `profiles.yml` that points to your DW.
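To make that concrete: a `profiles.yml` only describes the warehouse connection, nothing about the tap. A minimal sketch assuming Postgres, as in the getting started guide (all values are placeholders):
```yaml
# profiles.yml (placeholder values)
meltano:                 # profile name; must match the 'profile' in dbt_project.yml
  target: dev
  outputs:
    dev:
      type: postgres
      host: localhost
      port: 5432
      user: warehouse_user
      password: "{{ env_var('PGPASSWORD') }}"
      dbname: warehouse
      schema: analytics
```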
j
maybe this is all very easy and very obvious but... I dunno, it's Friday afternoon
v
```yaml
config-version: 2
version: 2
sources:
  - name: tap_gitlab
    schema: public
    tables:
      - name: commits
      - name: tags
```
That's a table in your DW.
I think we're getting there!
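Put differently, `{{ source('tap_gitlab', 'commits') }}` just resolves to a table that already exists in the warehouse. The model above compiles to roughly this (the database name depends on your profile; `warehouse` is a placeholder):
```sql
-- roughly what dbt compiles commits_last_7d.sql into (names illustrative)
select *
from "warehouse"."public"."commits"
where created_at::date >= current_date - interval '7 days'
```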
j
so this is just because the name of the table in the DW is... `tap_gitlab`?
v
This might come from meltano magically connecting dbt to your DW for you!
yes!
a
I gotchu. Here is a real-world example of a staging folder. Each folder is named after a data source/schema. In the "getting started" it is called tap_gitlab not because it came from a tap in EL, but because that is the default schema name.
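A hypothetical layout along those lines (folder and source names made up for illustration):
```
models/
  staging/
    tap_gitlab/        # one folder per source schema
      sources.yml
      stg_commits.sql
    tap_stripe/
      sources.yml
      stg_charges.sql
  marts/
    commits_last_7d.sql
```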
j
omg I get it now
v
dance
j
hahahaha
thanks folks, really appreciated 🙏
v
look at `.meltano/transformers/dbt`. ./compiled has the generated SQL as well, if you want to peruse it and trust it some more. Also, running dbt in debug works.
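e.g. (plain dbt commands, run from inside the dbt project directory):
```
# compiled SQL ends up under the project's target path
ls target/compiled/

# dbt debug validates the profile and warehouse connection
dbt debug
```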