# troubleshooting
n
I have a proof of concept working nicely, and am really encouraged by how clean and easy this all is. I’ve seen discussions of this in the channel, but want to make sure I’m following the current best practices for choosing a specific dbt model to run as a transform in a scheduled job. Hoping someone might weigh in on how to avoid doing a full dbt run for every schedule.
t
yep, recommend you use the dbt selection syntax for specifying the set of models you want (usually something like `@my_model` or `my_model+`) and then you can alias that as a command for the dbt plugin https://docs.meltano.com/concepts/project#plugin-commands
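For reference, the two operators mentioned here are standard dbt node-selection syntax (not Meltano-specific), and they select different graph slices:

```
dbt run --select my_model+   # my_model and everything downstream of it
dbt run --select +my_model   # my_model and everything upstream of it
dbt run --select @my_model   # my_model, its descendants, and all ancestors of those descendants
```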
n
ok so let’s say i’ve got this config:
```yaml
transformers:
  - name: dbt-snowflake
    pip_url: dbt-core~=1.0.0 dbt-snowflake~=1.0.0
    commands:
      bi_model:
        args: run --select +bi_model
        description: Run dbt, selecting model `bi_model` and all upstream models.
          Read more about the dbt node selection syntax at https://docs.getdbt.com/reference/node-selection/syntax
  files:
```
seems like that would be step 1
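(Assuming that config is valid, the plugin-commands docs linked above say an aliased command can also be smoke-tested directly, which is a quick way to verify step 1 before wiring up a schedule:)

```
meltano invoke dbt-snowflake:bi_model
```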
i’m struggling to find how to reference that command in a schedule
what I think I should be able to get to is a set of dags that follow this pattern:
```
1. extract tables foo, bar, baz from postgres
2. load tables foo, bar, baz into snowflake
3. run dbt model that queries tables foo, bar, and baz
```
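(As a sketch, that three-step pattern maps onto the `meltano run` invocation style that comes up later in this thread, assuming `meltano run` accepts the `plugin:command` form with the names defined above:)

```
meltano run tap-postgres target-snowflake dbt-snowflake:bi_model
```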
what i haven’t figured out yet is how to properly configure a schedule with the `select` key to limit the list of tables it replicates, and how to reference that specific dbt command for the transform
I think based on some previous conversations I’ve read here on the subject, the answer to #3 is environment variables, but it’s not clear which ones i need. When i run `meltano schedule run postgres-snowflake` with this config:
```yaml
schedules:
- name: postgres-snowflake
  extractor: tap-postgres
  loader: target-snowflake
  transform: run
  interval: '@daily'
  start_date: 2022-04-26 17:58:03.162731
  env:
    MELTANO_EXTRACT_SELECT: public-table_1
    DBT_SNOWFLAKE_MODELS: bi_model
    MELTANO_TRANSFORM_NAME: dbt-snowflake
```
i get `Transformer 'dbt' is not known to Meltano`
feels very close. just can’t figure out which env var (if this is the correct pattern) to tell the schedule to use the `dbt-snowflake` transformer, and to limit the select on the postgres tap to the one table.
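(Side note on the table-selection half of this: Meltano also supports a `select` extra directly on the extractor in meltano.yml, which may be simpler than an env var. A minimal sketch, assuming the stream is named `public-table_1` as in the env var above:)

```yaml
extractors:
  - name: tap-postgres
    select:
      - public-table_1.*   # replicate only this stream, with all of its properties
```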
t
so sorry - was in a meeting. So schedules don’t currently work with commands. schedules are currently super integrated with `meltano elt`, which limits what you can do with dbt. but! https://gitlab.com/meltano/meltano/-/issues/2924 is slated to be worked on starting next week and will basically solve this. You’d be able to use `meltano run` and the `jobs` definition to do whatever you want, and then schedules could just reference that job
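A rough sketch of the shape that issue describes (speculative here, since the feature hadn’t shipped at the time of this thread; all names are illustrative):

```yaml
# hypothetical meltano.yml additions once jobs land
jobs:
  - name: daily-el-with-dbt
    tasks:
      - tap-postgres target-snowflake dbt-snowflake:bi_model
schedules:
  - name: daily-el-with-dbt-schedule
    job: daily-el-with-dbt
    interval: '@daily'
```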
n
no worries! just trying to be as helpful as i can with context.
i appreciate your time!
t
the `Transformer 'dbt' is not known to Meltano` error makes sense to me too if you’re using the adapter-specific installation for dbt-snowflake. `meltano elt` is looking for the full `dbt` plugin
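(If that’s the cause, the legacy `transform: run` flow presumably needs the generic transformer installed under that exact name, something like:)

```
meltano add transformer dbt
```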
n
got it! yes that issue looks like exactly what I’m after. exciting that you’ll be breaking ground on it soon!
so in the meantime, would the best thing be to simply make my own dag file and tee up a bunch of `meltano` commands to do the individual tasks?
t
most likely, yes. that said, @pat_nadolny has done some fun work making a custom dag generator that he shared in https://gitlab.com/meltano/files-airflow/-/merge_requests/8/diffs#60a335aa08790bf789eb4924a24ac9342dde7cb7 This is unlikely to get merged into Meltano proper, but it can give you an idea of what you can do with Meltano and Airflow
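A minimal sketch of that manual-DAG approach, assuming Airflow 2’s BashOperator and a Meltano project on disk (the path, IDs, and schedule here are illustrative, not from this thread):

```python
# orchestrate/dags/postgres_snowflake_bi.py (hypothetical file name)
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="postgres_snowflake_bi",
    schedule_interval="@daily",
    start_date=datetime(2022, 4, 26),
    catchup=False,
) as dag:
    # EL step: replicate the selected postgres tables into snowflake
    el = BashOperator(
        task_id="el_postgres_to_snowflake",
        bash_command="cd /path/to/meltano/project && meltano elt tap-postgres target-snowflake",
    )
    # Transform step: run only the bi_model selection via the aliased plugin command
    dbt = BashOperator(
        task_id="run_dbt_bi_model",
        bash_command="cd /path/to/meltano/project && meltano invoke dbt-snowflake:bi_model",
    )
    el >> dbt
```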
n
neat! I think I’ll probably just tinker with some manual dags to get something up and running, and migrate stuff to jobs once those are released. Am I good to just write dags in the `/orchestrate/dags` folder, or would that mess up whatever autogeneration meltano is doing?
p
Yes, that MR is an experimental dbt integration; I demoed it in https://www.youtube.com/watch?v=pNGJ96HOioM&list=PLO0YrxtDbWAvytzdULRNfvWDTErPr-qZG&index=3

Another option that’s a little simpler, and what we’re running in production for our own meltano instance, is https://gitlab.com/meltano/squared/-/tree/master/data/orchestrate. I wrote a custom dag generator that reads from a custom dag definition yaml file. It’s doing essentially what we’d hope to make a default feature of meltano as part of https://gitlab.com/meltano/meltano/-/issues/2924
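For flavor, a custom dag definition yaml for a generator like that might look roughly like this (an illustrative shape only, not the actual format used in the squared repo; see the repo for the real thing):

```yaml
# hypothetical dag definition consumed by a custom generator script
dags:
  - name: postgres_snowflake_daily
    interval: '@daily'
    tasks:
      - meltano elt tap-postgres target-snowflake
      - meltano invoke dbt-snowflake:bi_model
```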
n
awesome thank you Pat! Giving this a watch now.
p
@nick_james awesome, and just to be totally clear, what we run in production is https://gitlab.com/meltano/squared/-/tree/master/data/orchestrate. The features in the demo video are definitely experimental
n
that worked beautifully @pat_nadolny thank you!
this doesn’t feel real. amazing stuff.
p
@nick_james great to hear! I'm curious, which implementation did you end up trying?
n
i copied your dag generator script into my `/orchestrate/dags` folder and just started putting dag configs in a yaml file in `/orchestrate`