# troubleshooting
f
Hi, my team is currently using three environments: dev, staging, and production. We want to set different cron schedules for some of the DAGs in some of these environments. Is there a way to override the schedule defined in `jobs.yml` (a file inside `./config` where we keep our predefined pipeline configuration) for specific environments only? That is, we want a universal schedule for all the DAGs, but a specific schedule for one environment only (in our case, `staging`). Is it possible to do this without having to edit `meltano.py`?
k
cc: @huiming @adrian_soltesz @leonardo_sjahputra
f
For more context, the `./config/jobs.yml` file contains something like this:
```yaml
jobs:
...
schedules:
- name: xxx1
  interval: '*/15 * * * *'
  job: job1
  env:
    ...
- name: xxx2
  interval: '*/30 * * * *'
  job: job2
  env:
    ...
```
And the `./config/environment/staging.meltano.yml` file:
```yaml
environments:
- name: staging
  config:
    plugins:
      extractors:
        ...
      loaders:
        ...
      transformers:
        ...
      orchestrators:
        ...

  env:
    ...
```
We want the `staging` environment file to be able to override the `schedule`, and more specifically the `interval` defined in `jobs.yml`, with something like this:
```yaml
environments:
- name: staging
  config:
    plugins:
      extractors:
        ...
      loaders:
        ...
      transformers:
        ...
      orchestrators:
        ...

  env:
    ...
  schedules:
  - name: xxx1
    interval: '*/15 1-5 * * 1-3' # overridden schedule for this env
    job: job1
    env:
      ...
  - name: xxx2
    interval: '*/30 1-5 * * 1-3' # overridden schedule for this env
    job: job2
    env:
      ...
```
But currently this is not possible: the DAG creation fails altogether, or, if we don't nest the schedules within `environment:`, it throws a duplicate schedule error.
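For illustration only, here is a minimal Python sketch of the override semantics being asked for: a `staging` entry replaces only the fields it restates (here `interval`) on the same-named base schedule. The function `resolve_schedules` and the plain-dict inputs are hypothetical; this is not how Meltano or its Airflow DAG generator actually loads schedules.

```python
# Hypothetical sketch, NOT Meltano's implementation: merge environment-specific
# schedule overrides into base schedules, matching entries by `name`.
from copy import deepcopy


def resolve_schedules(base, env_overrides):
    """Return copies of `base` with fields replaced by same-named overrides."""
    base_names = {s["name"] for s in base}
    overrides = {s["name"]: s for s in env_overrides}
    # Reject overrides that don't match a base schedule instead of
    # silently adding a new (possibly duplicate) entry.
    unknown = sorted(set(overrides) - base_names)
    if unknown:
        raise ValueError(f"overrides for undefined schedules: {unknown}")
    resolved = []
    for schedule in base:
        merged = deepcopy(schedule)
        # Only top-level keys present in the override (e.g. "interval")
        # replace the base values; everything else is kept.
        merged.update(overrides.get(schedule["name"], {}))
        resolved.append(merged)
    return resolved


# Plain-dict stand-ins for the jobs.yml and staging YAML shown above.
base = [
    {"name": "xxx1", "interval": "*/15 * * * *", "job": "job1"},
    {"name": "xxx2", "interval": "*/30 * * * *", "job": "job2"},
]
staging = [
    {"name": "xxx1", "interval": "*/15 1-5 * * 1-3"},
    {"name": "xxx2", "interval": "*/30 1-5 * * 1-3"},
]

for s in resolve_schedules(base, staging):
    print(s["name"], s["interval"], s["job"])
```

Run as-is, it prints `xxx1 */15 1-5 * * 1-3 job1` and `xxx2 */30 1-5 * * 1-3 job2`. Matching on `name` means an environment only restates what it changes, and an unknown name raises an error instead of producing a second schedule.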
p
@fauzaan I tested a few variations of this as well but couldn't get it to work; it might not be supported. Would you mind opening an issue in the meltano repo to explain your use case? There's a chance there's a workaround I don't know of (e.g. overriding via env vars somehow 🤔), but either way it could be a good feature to support.
a
Just came across this myself; I was hoping to achieve something like this:
```yaml
schedules:
  - name: ga_schedule
    job: ga_sync
    interval: '${SCHEDULE_CRON}'
environments:
  - name: dev
    env:
      SCHEDULE_CRON: '@daily'
  - name: prod
    env:
      SCHEDULE_CRON: '@hourly'
```
But it doesn't seem to parse. I would typically run a daily update in dev around 3am, and production hourly between 6am and 8pm.
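The `${SCHEDULE_CRON}` idea amounts to expanding variables in `interval` from the active environment's `env` block. The sketch below assumes exactly that; `expand_intervals` is hypothetical, and per this thread Meltano does not currently interpolate `interval` this way. It uses Python's `string.Template`, whose `${NAME}` placeholder syntax happens to match the YAML above.

```python
# Hypothetical sketch: expand ${VAR} placeholders in schedule intervals using the
# active environment's `env` values. Meltano does not do this for `interval`
# today (per this thread); the dicts mirror the YAML above.
from string import Template

schedules = [{"name": "ga_schedule", "job": "ga_sync", "interval": "${SCHEDULE_CRON}"}]
environments = {
    "dev": {"SCHEDULE_CRON": "@daily"},
    "prod": {"SCHEDULE_CRON": "@hourly"},
}


def expand_intervals(schedules, env_vars):
    """Return new schedule dicts with ${VAR} in `interval` substituted.

    Template.substitute raises KeyError if a referenced variable is missing,
    so an environment that forgets SCHEDULE_CRON fails loudly.
    """
    return [
        {**s, "interval": Template(s["interval"]).substitute(env_vars)}
        for s in schedules
    ]


for env_name, env_vars in environments.items():
    print(env_name, expand_intervals(schedules, env_vars)[0]["interval"])
```

Note that `@hourly` fires every hour of the day; restricting production to 6am through 8pm would need an explicit cron value such as `0 6-20 * * *` (minute 0 of hours 6 through 20), and a 3am daily dev run would be `0 3 * * *`.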
h
We need what @Andy Carter suggested too! 👍
p
Did anyone find an existing issue for this feature? If not, would one of you mind creating a new issue so everyone can share their thoughts and use cases?
f
Thank you for your replies, I appreciate it! I have found a couple of issues that target this problem:
• https://github.com/meltano/meltano/issues/6853?cf_lbyyhhwhyjj5l3rs65cb3w=zpc7iil3ewo5hgctl91zqq - from the discussion here, it seems that the environment specific tag is only available for meltano-cloud (please confirm). Another concern that I had with the approach discussed here is: if you define schedules this way, will we be able to define two schedules with the same name, but different intervals and different environments? That is, can we have something like this:
```yaml
schedules:
  - name: xxx1
    ...
    interval: '@daily'
    environments: [staging]
  - name: xxx1
    ...
    interval: '@hourly'
    environments: [production]
# The above have the same name, but different environments and schedules; not sure this will be possible if the linked approach is adopted.
```
• https://github.com/meltano/meltano/issues/6848 - not sure if this is still being pursued. This PR seems to address our problem directly, but it is currently in draft. Is it possible to get a status update on it, or to confirm whether it's still being pursued? Can I also confirm if the PR linked above indeed directly addresses the recommended changes in this comment?
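To make the same-name question concrete, here is a hypothetical sketch of how a per-schedule `environments: [...]` filter could behave: entries are first filtered to the active environment, and names only have to be unique within that environment. `schedules_for`, and the choice that an entry without `environments` applies everywhere, are assumptions for illustration, not behavior defined by Meltano or the linked issue.

```python
# Hypothetical sketch, NOT Meltano's implementation: filter schedules by an
# optional per-entry `environments` list, then require unique names only
# among the schedules active in that environment.

schedules = [
    {"name": "xxx1", "interval": "@daily", "environments": ["staging"]},
    {"name": "xxx1", "interval": "@hourly", "environments": ["production"]},
]


def schedules_for(env, schedules):
    """Return schedules active in `env`; entries without `environments` apply everywhere."""
    active = [s for s in schedules if env in s.get("environments", [env])]
    names = [s["name"] for s in active]
    duplicates = sorted({n for n in names if names.count(n) > 1})
    if duplicates:
        raise ValueError(f"duplicate schedule names in {env!r}: {duplicates}")
    return active


for env in ("staging", "production"):
    print(env, [s["interval"] for s in schedules_for(env, schedules)])
```

Checking uniqueness per environment rather than across the whole file is what would let both `xxx1` entries coexist.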
Hello! Are there any updates on this @pat_nadolny?
u
@fauzaan I'm not super in tune with the roadmap and how all this will be prioritized (cc @taylor), but I'll do my best to clarify some of the questions you asked:
> from the discussion here, it seems that the environment specific tag is only available for meltano-cloud (please confirm)
I'm not sure about this one. I wouldn't read into that comment too deeply; it related specifically to how the cloud backend handled schedules and deployments during the Beta. A lot has changed since then, and that is not part of Meltano core.
> Another concern that I had with the approach discussed here...
Can you put your thoughts in that issue? Whoever works on the issue won't have your context from this Slack thread, so we want to make sure it's considered in the design.
> Can I also confirm if the PR linked above indeed directly addresses the recommended changes in this comment?
This PR is specific to the Airflow extension (https://github.com/meltano/airflow-ext/pull/36) and how it consumes Meltano schedules, whereas it sounds like the feature you're after needs to be supported by Meltano core itself. From reviewing all of these, it does seem like they want to pursue what is described in https://github.com/meltano/meltano/issues/6853?cf_lbyyhhwhyjj5l3rs65cb3w=zpc7iil3ewo5hgctl91zqq#issuecomment-1301373550, and that makes sense to me.