Is there a best way to manually trigger a pipeline...
# troubleshooting
g
Is there a best way to manually trigger a pipeline (it has a DAG and is orchestrated through Airflow)? I've noticed that when I trigger it using Airflow through the Meltano CLI, it doesn't sync up with the Meltano UI. I'm hoping to have the Airflow UI and Meltano UI synced up, but I want to manually trigger a pipeline (another pipeline is currently running). Would it be best to do so through the Meltano UI? That's what I was going to try next.
n
I don’t believe the meltano UI and airflow UI automatically sync up (at least, I haven’t encountered that in my testing, which sounds similar to yours). As for manual pipeline triggers, I’ve had success doing that in the airflow UI directly. In the view that lists all your DAGs, there’s a column on the far right with lots of icons. I forget which one specifically, but one of them is for manual triggering of jobs.
g
Perfect, that's what I tried, but I ended up resetting it because I wasn't sure why it wasn't syncing... Thanks for the clarity! I'm currently working through why a pipeline is still loading data even though both it and its DAG have been deleted.
@nick_hamlin Do you know if it's necessary to include anything in the Configuration JSON? That's the page that comes up after selecting "Trigger DAG".
n
I haven’t been doing that and it’s been working just fine 🙂
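My understanding is that the Configuration JSON only matters if the DAG actually reads it, e.g. something like this (a minimal sketch with Airflow 1.10-era imports; the DAG and task names are made up):

```python
# Minimal sketch of a DAG that reads the "Trigger DAG" Configuration JSON.
# Airflow 1.10-era imports; the dag/task ids here are made-up examples.
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator


def print_conf(**context):
    # dag_run.conf holds whatever JSON you type into the "Trigger DAG" page;
    # it's empty when you leave the field blank, so default accordingly.
    conf = context["dag_run"].conf or {}
    print("run config:", conf)


dag = DAG(
    dag_id="example_manual_trigger",
    start_date=datetime(2020, 1, 1),
    schedule_interval=None,  # manual triggers only
)

PythonOperator(
    task_id="print_conf",
    python_callable=print_conf,
    provide_context=True,  # needed in Airflow 1.10 to receive **context
    dag=dag,
)
```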
g
Ah just leaving blank? Awesome! Sounds good 🙂
n
One idea: airflow is always running a process in the background that checks for DAGs it doesn’t know about. So even if you delete the DAG via the UI, airflow will add it back in automatically unless you also remove the underlying python code.
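So if you want a DAG gone for good, I believe both the python file and the metadata have to go. Roughly (the path and dag id below are hypothetical):

```python
# Sketch: fully removing a DAG means deleting the .py file *and* the metadata;
# otherwise the scheduler's periodic DAG-folder scan re-registers it.
# The path and dag id are hypothetical; `delete_dag` is the 1.10-era CLI command.
import subprocess
from pathlib import Path

dag_file = Path("orchestrate/dags/my_old_pipeline.py")  # hypothetical location
if dag_file.exists():
    dag_file.unlink()  # remove the source so the scheduler can't re-import it

# Clear the DAG's runs/task instances from Airflow's metadata database.
subprocess.run(["airflow", "delete_dag", "my_old_pipeline", "-y"], check=True)
```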
g
When removing the DAG I believe I ran the airflow CLI command, which I thought deletes all files/references related to that DAG. However, before that I believe I deleted a DAG via the UI, which could be causing this issue. Thanks for the info! I wasn't totally sure about that beforehand.
Hmm, I've been waiting to trigger the DAG to see if the table is still being generated and populated without it running (an old instance, I think). It seems as though one of my old DAGs that has been deleted is still loading data. This issue has been ongoing and has occurred before.
d
@nick_hamlin @gunnar What do you mean when you say the Airflow and Meltano UIs aren't synced up? Triggering a DAG in Airflow should make the same pipeline go to "running" in Meltano UI, and you should be able to view the logs from Meltano UI as well
n
A major reason I’ve leaned towards doing things in airflow rather than meltano directly is that I have greater control over the parameters I can include in a given job (mostly getting really specific with `--select` for different pipelines within a single tap). This means that I don’t have corresponding jobs appearing in the meltano UI for the pipelines I’ve written directly in airflow. And (at least as far as I know), because nothing is appearing under the Pipelines tab, I also don’t have a way to see those logs through the meltano UI (which is what led to the exploration of the airflow logs in the other thread).
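Roughly, the pattern looks like this (just a sketch; the tap/target names, job ids, and select patterns are invented):

```python
# Sketch: one Airflow task per slice of a single tap, each running its own
# `meltano elt` with a different --select. Tap/target names, job ids, and
# selection patterns are invented; adjust the project path to yours.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG(
    dag_id="meltano_selective_elt",
    start_date=datetime(2020, 1, 1),
    schedule_interval="@daily",
)

# Each task selects a different subset of the tables handled by one extractor.
selections = {
    "orders": "orders.*",
    "customers": "customers.*",
}

for name, pattern in selections.items():
    BashOperator(
        task_id=f"elt_{name}",
        bash_command=(
            "cd /path/to/project && "
            "meltano elt tap-postgres target-snowflake "
            f"--job_id={name}-pipeline --select '{pattern}'"
        ),
        dag=dag,
    )
```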
d
Ah right, Meltano UI's Pipelines will only list pipelines from the `schedules` section in `meltano.yml`. If you're creating pipelines on the fly with dynamic `job_id`s, they won't show up there.
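For reference, a sketch of what the UI keys off (plugin names and the interval are placeholders, embedded as a string so it's self-contained):

```python
# Sketch: the Pipelines tab keys off the `schedules` section of meltano.yml.
# Example embedded as a string so this runs standalone; plugin names and the
# interval are placeholders.
import yaml  # PyYAML

MELTANO_YML = """
schedules:
- name: orders-to-warehouse
  extractor: tap-postgres
  loader: target-snowflake
  transform: skip
  interval: '@daily'
"""

config = yaml.safe_load(MELTANO_YML)
for schedule in config.get("schedules", []):
    # Ad-hoc `meltano elt` runs with dynamic job_ids never land in this list,
    # so they won't appear in the UI.
    print(schedule["name"], schedule["interval"])
```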
e
@nick_hamlin, I'm interested in how you're managing this. I've just gotten started with Meltano, and I wound up having "base extractors" which define connections to various databases. From there, I have "selection extractors" which inherit from the base, each with a different `select` set, so that each "selection extractor" pulls different tables from the db.
n
That seems like it would totally work, though I’m doing it slightly differently. I have the one extractor with all the selects set up. From there, I run separate `meltano elt` commands with different `--select` args to orchestrate different subsets of the tables/fields handled by that extractor.
Sounds like the end result would be the same as yours (and I may wind up refactoring to something that more closely resembles what you’ve got down the road), but I’ve found it nice to have the flexibility to modify jobs at the `meltano elt` level instead of needing to make changes to `meltano.yml` every time, because we’re still early in our implementation and lots is changing rapidly.
e
Gotcha... I'm happy with how mine is working, although here's another approach I'd consider now that I've seen airflow in action: define an extractor for each table, then use 'pools' in airflow to control how many of them execute at once.
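Something like this, I imagine (just a sketch; the extractor/target names and the pool are made up, and the pool itself would need to be created in Airflow first):

```python
# Sketch of the pools idea: one `meltano elt` task per table-specific extractor,
# with an Airflow pool capping concurrency. The extractor/target names are made
# up, and the pool ("meltano_elt", with e.g. 2 slots) has to be created first
# via the Airflow UI (Admin > Pools) or CLI.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG(
    dag_id="per_table_elt",
    start_date=datetime(2020, 1, 1),
    schedule_interval="@daily",
)

for extractor in ["tap-db-orders", "tap-db-customers", "tap-db-invoices"]:
    BashOperator(
        task_id=f"elt_{extractor}",
        bash_command=f"cd /path/to/project && meltano elt {extractor} target-snowflake",
        pool="meltano_elt",  # Airflow runs at most <pool slots> of these at once
        dag=dag,
    )
```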