# troubleshooting
b
Hey all! I'm using Airflow to orchestrate my scheduled Meltano jobs, but it seems that environment variables set on schedules in `meltano.yml` are ignored. I pasted the relevant parts of my `meltano.yml` below. When I run the schedule using `meltano schedule run extract-load`, everything works as expected. I found that for elt schedules, Airflow calls `meltano schedule run`, but for job schedules, Airflow calls `meltano run`, ignoring all schedule-level settings. Is there any way to define which dbt models I want to run per schedule? Or am I using it wrong?
```yaml
version: 1
plugins:
  extractors: [...]
  orchestrators: [...]
  transformers: [...]
  files: [...]
schedules:
  - name: extract-load
    interval: 0 0 * * *
    job: EL
    env:
      DBT_MODELS: calendar appdata woocommerce shopify
jobs:
  - name: EL
    tasks:
      - tap-calendar target-postgres
      - tap-appdata target-postgres
      - tap-woocommerce target-postgres
      - tap-shopify target-postgres
      - dbt:run
environments: [...]
```
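For context on what that schedule-level `env` is meant to drive: with the dbt transformer, `DBT_MODELS` selects which models `dbt:run` builds, ending up roughly as `dbt run --models <models>`. A minimal sketch of that mapping (`dbt_run_args` is a hypothetical helper, not Meltano's actual code):

```python
import shlex

# Hedged sketch (not Meltano's actual implementation): the DBT_MODELS
# env var selects which models the dbt transformer builds, ending up
# roughly as `dbt run --models <models>`.
def dbt_run_args(environ):
    models = environ.get("DBT_MODELS", "")
    args = ["dbt", "run"]
    if models:
        # split the space-separated model list into individual CLI args
        args += ["--models", *shlex.split(models)]
    return args

print(dbt_run_args({"DBT_MODELS": "calendar appdata woocommerce shopify"}))
```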
t
That should work, and if it doesn't then you've found a bug. Can you share which version of Meltano you're on? @florian.hines nothing about `run` should be ignoring a schedule env like that, right?
b
I'm on v2.4.0. I think the problem lies in the way jobs are executed in Airflow. I took a look at the auto-generated `orchestrate/dags/meltano.py`, and for job-type schedules the command is not `meltano schedule run`, which would respect env vars set at that level, but `meltano run <task>`.
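To illustrate the difference, here's a simplified sketch of the two command forms the generated DAG ends up running. The constant and function names are illustrative, not the generator's actual code:

```python
# Simplified sketch of the two command shapes the DAG generator emits.
# MELTANO_BIN and PROJECT_ROOT are placeholders for illustration.
MELTANO_BIN = "meltano"
PROJECT_ROOT = "/path/to/project"

def elt_schedule_command(schedule_name):
    # elt-type schedules go through `meltano schedule run`,
    # so schedule-level env vars are applied
    return f"cd {PROJECT_ROOT}; {MELTANO_BIN} schedule run {schedule_name}"

def job_schedule_command(run_args):
    # job-type schedules invoke `meltano run` directly,
    # bypassing anything set on the schedule itself
    return f"cd {PROJECT_ROOT}; {MELTANO_BIN} run {run_args}"
```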
f
@taylor I think that's expected behavior - it popped up during one of the discussions when we were adding support for scheduled jobs, and was one of the reasons I'd originally started to roll with still using `meltano schedule run`.
@benjamin_mitzkus yep, `meltano run` doesn't know it's being invoked as part of a schedule, just that it's being asked to run a job.
@taylor @benjamin_mitzkus it'd be pretty easy to extend the DAG generator to inject the env, I think - it does get returned when the generator grabs the schedule listing.
t
@florian.hines do we already have an issue on that? At face value it does break the contract of `env` on schedules in an unexpected way. I totally get why we have the current state, though.
f
No issue on it, but I'll go pop one. It's a super lightweight change.
b
@florian.hines @taylor Thanks for your help, I really appreciate the support! For now, overwriting the generated `orchestrate/dags/meltano.py` so that `meltano schedule run` is invoked should be a workaround, right?
t
@benjamin_mitzkus yep!
f
@benjamin_mitzkus if you're feeling adventurous, I think the change we'd be making in the operator is actually a pretty simple one. Probably just changing the BashOperator call to look like:
```python
task = BashOperator(
    task_id=task_id,
    bash_command=f"cd {PROJECT_ROOT}; {MELTANO_BIN} run {run_args}",
    dag=dag,
    env=schedule.get("env", {}),
    append_env=True,
)
```
with `env` and `append_env` being the two new additions.
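For anyone wondering what `append_env=True` buys here: with it, BashOperator layers the task's `env` on top of the inherited environment instead of replacing it. A simplified model of that merge (not Airflow's actual code; names are illustrative):

```python
# Simplified model of how append_env=True merges environments:
# the schedule's env is layered over the inherited one, so existing
# variables like PATH survive while DBT_MODELS gets injected.
def merged_task_env(inherited, schedule_env):
    env = dict(inherited)      # start from the worker's environment
    env.update(schedule_env)   # schedule-level vars take precedence
    return env

schedule = {"name": "extract-load",
            "env": {"DBT_MODELS": "calendar appdata woocommerce shopify"}}
env = merged_task_env({"PATH": "/usr/bin"}, schedule.get("env", {}))
```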
b
@florian.hines I'll take a look this evening 😃
t
@florian.hines do we need that on the extension as well?