# getting-started
c
Hi. I have a couple of questions regarding transformers. Please note that I am just starting out with Meltano: I have a working setup with a custom tap, a Postgres target, and multiple schedules executed by the included Airflow orchestrator, but I have not used dbt before.

1. I see a warning on https://docs.meltano.com/guide/transformation that transform plugins are de-prioritized, but I am not sure what this means. Is every "transformer" a "transform plugin" and I should not use this feature at all, or is only the "dbt" package deprecated and the usage of dbt-postgres still supported, or ...?
2. I have added a `dbt-postgres` transformer to my existing project with `meltano add transformer dbt-postgres`. This added a new section in my `meltano.yml` and created a bunch of files in the transform directory. I have created a custom `source.yml` and `my_new_table.sql` within `./transform/models/<my_project_name>`. I can successfully execute those by running `meltano run dbt-postgres:run` or `meltano invoke dbt-postgres:run` locally, but I am failing to run them as part of my schedules.
   a. I tried enabling it by setting `transform: run` within a schedule. The schedule now fails with this error, even though I have not added the `dbt` plugin, only `dbt-postgres`:
      `{"error": "Plugin 'Transformer 'dbt' not found.\nUse of the legacy 'dbt' Transformer is deprecated in favor of new adapter specific implementations (e.g. 'dbt-snowflake') compatible with the 'meltano run ...' command.\n<https://docs.meltano.com/guide/transformation>\nTo continue using the legacy 'dbt' Transformer, add it to your Project using 'meltano add transformer dbt'.' is not known to Meltano"}`
   b. How can I specify which transformer plugin to run within a schedule? I only find the option to set it to `run`, `skip`, or `only`. I would like to run different transformers on different schedules. Thank you!
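For reference, the legacy EL(T)-style schedule entry in question looks roughly like this (plugin names are placeholders for my actual tap and target):

```yaml
schedules:
- name: my-elt-schedule
  extractor: tap-mycustom      # placeholder for the custom tap
  loader: target-postgres
  transform: run               # the setting mentioned in 2.a
  interval: '@daily'
```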
s
To answer 1 really quickly: you can use all existing transformers (all dbt dialects). They work, and we support them. We just aren't adding new transformers (of a type other than dbt) right now.
p
Also, context for 1: Meltano has `transformers` plugins (only dbt as of today) to manage your data transformations after the data lands in the warehouse. Meltano also had a legacy feature called `transforms`, which is what that warning message is about. Transforms were Meltano-specific dbt packages that included pre-built transformations for a specific tap's data; they were tightly coupled, and we've moved away from that pattern. You can still use normal dbt packages in your project, though.
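For example, a plain dbt `packages.yml` under `transform/` still works as usual (package and version here are just an illustration):

```yaml
# transform/packages.yml -- ordinary dbt packages are still fine; only the
# Meltano-specific, tap-coupled "transform" packages are the deprecated part.
packages:
  - package: dbt-labs/dbt_utils
    version: 1.0.0
```

You would then install them with dbt deps (e.g. `meltano invoke dbt-postgres:deps`, assuming that command is defined for your plugin).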
For 2: there are two ways to schedule pipelines, and it looks like the getting started guide docs are a little unclear right now. If you follow this guide https://docs.meltano.com/guide/orchestration it will show you how to create jobs and schedule them using the new, more flexible syntax.
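As a rough sketch of that newer syntax (job, tap, and schedule names here are placeholders), this is also how you could run different transformers, or no transformer at all, on different schedules:

```yaml
jobs:
- name: el-plus-dbt
  tasks:
  - tap-mycustom target-postgres dbt-postgres:run   # EL followed by the dbt-postgres transformer
- name: el-only
  tasks:
  - tap-mycustom target-postgres                    # EL with no transformation step

schedules:
- name: daily-with-transform
  interval: '@daily'
  job: el-plus-dbt
- name: hourly-el
  interval: '@hourly'
  job: el-only
```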
Let us know if you run into any issues and we can help resolve!
c
Thanks everyone, those explanations already helped in understanding it a bit better. I am now trying to replicate my existing setup (without a transformer) first, but I am running into a problem with env variables in combination with the Airflow orchestrator. I am setting `TARGET_POSTGRES_DBNAME` inside my schedule, like:

```yaml
schedules:
- name: my-schedule-name
  job: my-job-name
  env:
    TARGET_POSTGRES_DBNAME: mydbname
```

This works when I run the schedule with `meltano schedule run my-schedule-name`, but fails when the schedule is run inside the orchestrator (`meltano invoke airflow scheduler`). Setting this env variable worked with the old EL(T)-style schedule in the orchestrator. It also works when I set the `dbname` property directly on the loader, but I want to avoid duplicating (via `inherit_from`) the same Postgres loader many times with only a difference in the dbname. Any ideas?
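What I am trying to avoid is roughly this kind of duplication (names made up):

```yaml
plugins:
  loaders:
  - name: target-postgres--db-one    # one inherited copy per database
    inherit_from: target-postgres
    config:
      dbname: db_one
  - name: target-postgres--db-two
    inherit_from: target-postgres
    config:
      dbname: db_two
```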
This seems to be a known issue; this thread describes the problem: https://meltano.slack.com/archives/C01TCRBBJD7/p1663679958598429
p
@christopher_kintzel it sounds like that thread explains it - with the new job/tasks syntax we opted to split each job into its own Airflow task instead of lumping them all into a single task like the older elt style did. It still seems like a good idea to split out those tasks. Have you tried the suggested tweak to the DAG generator from https://github.com/meltano/files-airflow/issues/32#issue-1379590061? That seems like a reasonable solution. I created a PR for that change: https://github.com/meltano/files-airflow/pull/36
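For reference, the change in that issue/PR boils down to something like this in the generated DAG code (function and variable names here are placeholders, not the exact ones in `orchestrate/dags/meltano.py`); it needs Airflow >= 2.3, where `BashOperator` gained `append_env`:

```python
# Hypothetical sketch of the tweak discussed in meltano/files-airflow#32:
# forward a schedule's `env` block to the generated task and merge it with
# the worker's environment instead of replacing it (Airflow >= 2.3).
from airflow.models import DAG
from airflow.operators.bash import BashOperator


def make_meltano_task(dag: DAG, task_id: str, command: str, schedule: dict) -> BashOperator:
    """Build one Airflow task for a Meltano job, forwarding the schedule's env."""
    return BashOperator(
        task_id=task_id,
        bash_command=command,            # e.g. "cd /project && meltano run my-job"
        env=schedule.get("env", {}),     # the env: block from the meltano.yml schedule
        append_env=True,                 # merge with os.environ instead of replacing it
        dag=dag,
    )
```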
c
Thanks. Yes, I am currently in the process of trying this workaround. I had to additionally use a newer version of Airflow, min. 2.3.4 (`append_env` is not available in previous versions). Is there a specific reason Meltano uses 2.1.2 by default? But I am now getting this error in the log: `Environment variable 'MELTANO_LOAD_SCHEMA' referenced but not set. Make sure the environment variable is set.` I have actually seen this error multiple times already while trying different unrelated things as well, but could not find any good information about it online 🙂

I was on the wrong Meltano version. I can confirm the tweak works with Airflow 2.3.4.

I am currently on meltano==2.8.0. Unrelated to this fix, my current setup stops working when upgrading to 2.10.0 or 2.11.1 because of the `MELTANO_LOAD_SCHEMA` issue.
p
Hmm, I was able to find https://github.com/meltano/meltano/issues/2928, which sounds like it might be related. But it was closed a year ago, so I wouldn't think it would be an issue in 2.10.0 🤔. Although I'm not sure an exact fix was put in place. I see a note at the top that seems relevant: "According to @DouweM, this can occur if a custom loader doesn't define a schema setting, so the env var doesn't get populated and dbt doesn't know what schema to read from."
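If that is the cause in your setup, the fix that note suggests would look roughly like this (loader name and schema value are hypothetical): declare and set a `schema` setting on the loader so Meltano can populate `MELTANO_LOAD_SCHEMA` for dbt.

```yaml
plugins:
  loaders:
  - name: target-mycustom          # hypothetical custom loader
    namespace: target_mycustom
    settings:
    - name: schema                 # declaring the setting lets Meltano export MELTANO_LOAD_SCHEMA
    config:
      schema: analytics            # example value
```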