# best-practices
j
I don't understand why "the Meltano convention is to name the model directory after the extractor using snake_case (i.e. tap_gitlab)". For an ELT (E -> L -> T), wouldn't it be more appropriate to name it after the *L*oader (target)? After all, "Once your raw data has arrived in your data warehouse, its schema will likely need to be transformed to be more appropriate for analysis." (emphasis mine, from https://docs.meltano.com/getting-started#transform-loaded-data-for-analysis)
Or maybe dbt reads the schema from the source (tap)? I'm confused
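In other words, the two options I'm comparing (paths as in the getting started guide; `target_bigquery` is just a hypothetical alternative I made up):
```
transform/models/tap_gitlab/       # the Meltano convention: named after the extractor
transform/models/target_bigquery/  # the alternative I'd have expected: named after the loader
```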
v
Even if you don't want to use dbt, https://courses.getdbt.com/courses/fundamentals explained the "why" to me. It takes a bit, though.
j
I've approached that course 3 times; I always end up saying "too much theory" or "why do I need a dbt Cloud account when I have the open-source version?" 😅
v
You don't need to use the cloud to take the course (I just followed along in my terminal). The quick answer to your question: you have N data sources, how do you put those into your target? You need to separate them somehow.
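For example, a minimal sketch of how dbt separates N sources: each one gets its own entry in a sources file, typically pointing at its own schema in the warehouse (the `tap_gitlab` / `tap_stripe` names and tables below are illustrative, not from your actual project):
```yaml
# models/staging/sources.yml (illustrative)
version: 2
sources:
  - name: tap_gitlab       # data landed by one EL pipeline
    schema: tap_gitlab
    tables:
      - name: commits
  - name: tap_stripe       # data landed by another EL pipeline
    schema: tap_stripe
    tables:
      - name: charges
```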
j
"you have N data sources, how do you put those into your target? You need to separate them somehow"
hmm, but that's the EL part, right? IIUC, the T part comes after putting all the sources into the target
v
yes it does; I don't understand your question, sorry!
j
then for the meltano models (using dbt), one could ignore what the sources are, since all the data is now in the target. that's why I don't get why the model names should refer to the sources. let me know if it's still not clear 😅
a
I separate staging models by source. I couldn't imagine why I wouldn't do it. It's both intuitive and helps with configuration. That being said, Meltano supports and recommends running dbt in `meltano run` commands, which don't require any specific folder name.
Basically, the TL;DR is you don't need to name folders or sources tap_-anything. Just run dbt via Meltano and it works like a normal project.
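For example (assuming a dbt transformer plugin such as `dbt-postgres` is installed; the exact plugin and command names depend on your setup):
```
# EL then T in one pipeline
meltano run tap-gitlab target-postgres dbt-postgres:run

# or run just the dbt models
meltano invoke dbt-postgres:run
```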
v
@juan_luis_cano_rodriguez yeah, too many words in there that aren't defined in my mind.
"meltano models using dbt": can you just show what you mean via code?
j
basically this part of the getting started guide:
"For example, the `/transform/models/tap_gitlab/source.yml` below configures dbt sources from the postgres tables where our tap-gitlab EL job output to"
I... need to re-read that a few more times
v
What about it? I'm not following what your issue is at all I'm sorry!
j
in my head, I wanted to treat the sources as an "implementation detail" that the models don't need to be aware of, since the data is already in the target and the models don't care where it came from. But I guess it's what @alexander_butler said: you can separate the models by source for debuggability purposes.
v
Do you have a dbt project set up right now?
j
yep, it's working. I just didn't get the naming convention
v
Can you show your code that points to naming conventions you don't understand?
j
```
mkdir /transform/models/tap_gitlab/
```
why `tap_gitlab` and not, say, `target_bigquery`? That's the root of the question.
v
You can call the folder whatever you want, can you show the code for `cat /transform/models/tap_gitlab/*`?
j
the guide puts a `source.yml` with
```yaml
config-version: 2
version: 2
sources:
  - name: tap_gitlab
    schema: public
    tables:
      - name: commits
      - name: tags
```
and a `commits_last_7d.sql` with
```sql
{{
  config(
    materialized='table'
  )
}}

select *
from {{ source('tap_gitlab', 'commits') }}
where created_at::date >= current_date - interval '7 days'
```
but this is what I don't get. I thought that the transformation phase happened solely inside the warehouse/target, yet the naming convention and the `source.yml` seem to indicate that dbt is reading the data from the source.
"dbt reading the data from the source" sounds a lot like Extract, Transform, Load to me. Supposedly, if I'm doing Extract, Load, Transform it's because I want to decouple (EL) from (T). And if I decouple them, then (T) does not need to know anything about (E).
v
dbt pulls data from your DW; there's a `profiles.yml` that points to your DW.
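To make that concrete: a `profiles.yml` only describes the warehouse connection, nothing about the tap. A minimal sketch assuming Postgres, as in the getting started guide (all values are placeholders):
```yaml
# profiles.yml (placeholder values)
meltano:                 # profile name; must match the 'profile' in dbt_project.yml
  target: dev
  outputs:
    dev:
      type: postgres
      host: localhost
      port: 5432
      user: warehouse_user
      password: "{{ env_var('PGPASSWORD') }}"
      dbname: warehouse
      schema: analytics
```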
j
maybe this is all very easy and very obvious but... I dunno, it's Friday afternoon
v
```yaml
config-version: 2
version: 2
sources:
  - name: tap_gitlab
    schema: public
    tables:
      - name: commits
      - name: tags
```
That's a table in your DW.
I think we're getting there!
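Put differently, `{{ source('tap_gitlab', 'commits') }}` just resolves to a table that already exists in the warehouse. The model above compiles to roughly this (the database name depends on your profile; `warehouse` is a placeholder):
```sql
-- roughly what dbt compiles commits_last_7d.sql into (names illustrative)
select *
from "warehouse"."public"."commits"
where created_at::date >= current_date - interval '7 days'
```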
j
so this is just because the name of the table in the DW is... `tap_gitlab`?
v
This might come from meltano magically connecting dbt to your DW for you!
yes!
a
I gotchu. Here is a real-world example of a staging folder. Each folder is named after a data source/schema. In the "getting started" it is called tap_gitlab not because it came from a tap in EL, but because that is the default schema name.
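A hypothetical layout along those lines (folder and source names made up for illustration):
```
models/
  staging/
    tap_gitlab/        # one folder per source schema
      sources.yml
      stg_commits.sql
    tap_stripe/
      sources.yml
      stg_charges.sql
  marts/
    commits_last_7d.sql
```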
j
omg I get it now
v
dance
j
hahahaha
thanks folks, really appreciated 🙏
v
look at `.meltano/transformers/dbt`. ./compiled has the generated SQL as well, if you want to peruse it and trust it some more. Also, running dbt in debug works.
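e.g. (plain dbt commands, run from inside the dbt project directory):
```
# compiled SQL ends up under the project's target path
ls target/compiled/

# dbt debug validates the profile and warehouse connection
dbt debug
```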