fauzaan
06/14/2023, 9:08 AM
[...] jobs.yml (a file where we have our predefined pipeline configuration, inside ./config) for specific environments only? I.e., we want to be able to have a universal schedule for all the DAGs, but a specific schedule for one of the environments only (in our case, staging). Is it possible to do this without having to edit meltano.py?
khoa_nguyen
06/14/2023, 9:09 AM
fauzaan
06/14/2023, 10:35 AM
./config/jobs.yml file contains something like this:
jobs:
  ...
schedules:
- name: xxx1
  interval: '*/15 * * * *'
  job: job1
  env:
    ...
- name: xxx2
  interval: '*/30 * * * *'
  job: job2
  env:
    ...
And the ./config/environment/staging.meltano.yml file:
environments:
- name: staging
  config:
    plugins:
      extractors:
        ...
      loaders:
        ...
      transformers:
        ...
      orchestrators:
        ...
  env:
    ...
And we want the staging environment file to be able to override the schedule, and more specifically the interval, defined in jobs.yml. Something like this:
environments:
- name: staging
  config:
    plugins:
      extractors:
        ...
      loaders:
        ...
      transformers:
        ...
      orchestrators:
        ...
  env:
    ...
  schedules:
  - name: xxx1
    interval: '*/15 1-5 * * 1-3' # overridden schedule for this env
    job: job1
    env:
      ...
  - name: xxx2
    interval: '*/30 1-5 * * 1-3' # overridden schedule for this env
    job: job2
    env:
      ...
But currently this is not possible, as DAG creation fails altogether. Or it throws a duplicate schedule error if we don't nest it within environment:.
pat_nadolny
06/19/2023, 3:51 PM
Andy Carter
06/20/2023, 10:05 PM
schedules:
- name: ga_schedule
  job: ga_sync
  interval: '${SCHEDULE_CRON}'
environments:
- name: dev
  env:
    SCHEDULE_CRON: '@daily'
- name: prod
  env:
    SCHEDULE_CRON: '@hourly'
but it doesn't seem to parse. I would typically run a daily update in dev around 3am, and the production one hourly between 6am and 8pm.
huiming
06/21/2023, 8:37 AM
pat_nadolny
06/21/2023, 2:10 PM
fauzaan
06/21/2023, 4:35 PM
schedules:
- name: xxx1
  ...
  interval: '@daily'
  environments: [staging]
- name: xxx1
  ...
  interval: '@hourly'
  environments: [production]
# The above have the same name but different environments and schedules; not sure if this will be possible if the linked approach is adopted.
• https://github.com/meltano/meltano/issues/6848 - not sure if this is still being pursued
This PR seems to address our problem directly, but it is currently a draft. Is it possible to get a status update on it, and to know whether it's still being pursued?
Can I also confirm if the PR linked above indeed directly addresses the recommended changes in this comment?
fauzaan
06/30/2023, 8:47 AM
user
06/30/2023, 1:47 PM
"from the discussion here, it seems that the environment specific tag is only available for meltano-cloud (please confirm)"
I'm not sure about this question. I wouldn't read into that comment too deeply; it was related specifically to how the cloud backend worked in beta around schedules and deployments. A lot has changed since then, and that is not part of Meltano core.
"Another concern that I had with the approach discussed here..."
Can you put your thoughts in that issue? When the issue is worked on, they won't have your context from this Slack thread, so we want to make sure it's being considered in the design.
"Can I also confirm if the PR linked above indeed directly addresses the recommended changes in this comment?"
This PR is specific to the Airflow extension https://github.com/meltano/airflow-ext/pull/36 and how it consumes the Meltano schedules, whereas it sounds like the feature you're after needs to be supported by Meltano core itself. From reviewing all of these, it does seem like they want to pursue what is described in https://github.com/meltano/meltano/issues/6853?cf_lbyyhhwhyjj5l3rs65cb3w=zpc7iil3ewo5hgctl91zqq#issuecomment-1301373550 and that makes sense to me.
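Until something along those lines is supported in core, the behaviour asked for in this thread could be approximated in a pre-processing step outside Meltano. The following is only a minimal sketch under stated assumptions: the function names `apply_overrides` and `expand_interval` are hypothetical, the dictionaries simply mirror the YAML pasted earlier in the thread, and none of this is a Meltano or airflow-ext API. It shows two ideas from the discussion: overriding a schedule's interval by name for one environment, and substituting a `${SCHEDULE_CRON}`-style placeholder from per-environment variables.

```python
from string import Template

# Hypothetical data mirroring the YAML from the thread (already parsed into dicts).
BASE_SCHEDULES = [
    {"name": "xxx1", "interval": "*/15 * * * *", "job": "job1"},
    {"name": "xxx2", "interval": "*/30 * * * *", "job": "job2"},
]

# Assumed per-environment overrides, keyed by schedule name.
STAGING_OVERRIDES = [
    {"name": "xxx1", "interval": "*/15 1-5 * * 1-3"},
    {"name": "xxx2", "interval": "*/30 1-5 * * 1-3"},
]


def apply_overrides(base, overrides):
    """Return new schedule dicts where same-named overrides replace base keys."""
    by_name = {o["name"]: o for o in overrides}
    # Schedules without an override are copied unchanged.
    return [{**s, **by_name.get(s["name"], {})} for s in base]


def expand_interval(schedule, env):
    """Substitute ${VAR} placeholders in the interval from an env mapping.

    Raises KeyError if a referenced variable is missing from env.
    """
    interval = Template(schedule["interval"]).substitute(env)
    return {**schedule, "interval": interval}


if __name__ == "__main__":
    for schedule in apply_overrides(BASE_SCHEDULES, STAGING_OVERRIDES):
        print(schedule["name"], schedule["interval"])

    ga = {"name": "ga_schedule", "job": "ga_sync", "interval": "${SCHEDULE_CRON}"}
    print(expand_interval(ga, {"SCHEDULE_CRON": "@daily"})["interval"])
```

Matching on the schedule name keeps a single universal definition in jobs.yml and avoids declaring two schedules with the same name, which is what triggered the duplicate schedule error described above.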