# troubleshooting
m
I have ~200 tables configured to incrementally update under the same tap-postgres configuration. Then I simply created a job and a schedule, and meltano created a DAG with a single task on Airflow. No problem so far, it works. I've been thinking... These tables don't need to wait for each other, but they do get extracted sequentially because they are in the same task. Also, if there is an error in one of them, the others should be able to continue, but they don't. Wouldn't it be better to treat each table as a separate entity and create independent tasks for each?
v
Yes, you can run them independently. The select_filter env variable helps you here. Pat made an example of generating a DAG automatically on Airflow for this as well.
m
I don't think I get it :D Can you point me to an example?
m
Thanks, I'll look into it. I think I have been using option 1, since my schedules are converted into a single DAG.
r
@mert_bakir any luck with this? I would like to know too. I also want to prevent this issue: "if there is an error in one of them, the others should be able to continue, but they don't."
m
not really @rida, I only just had a chance to look into it. The DAG generator method looks promising, but it's kind of complicated and I couldn't figure it out. 😅
v
Instead of using the generator, the way I think about it is:
1. Set up your normal meltano project with the 200 tables working serially.
2. Run the same meltano run, but using https://docs.meltano.com/concepts/plugins/#select_filter-extra via an env var, e.g.
TAP_GITLAB__SELECT_FILTER=users
3. Does it work? Cool, now you just need a way to make that run with your 200 tables.
m
can't say it's a bad workaround. I think I'll write a script to do that.
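Roughly what I have in mind (the table names, the tap/target names, and the JSON-array form of the env var are placeholders/assumptions on my part):

```python
import os
import subprocess

# Hypothetical list of stream names; in practice this could come from a file
# or from the project's select rules.
TABLES = ["public-users", "public-orders", "public-invoices"]

failed = []
for table in TABLES:
    # Scope each run to a single stream via the select_filter extra.
    env = {**os.environ, "TAP_POSTGRES__SELECT_FILTER": f'["{table}"]'}
    result = subprocess.run(
        ["meltano", "run", "tap-postgres", "target-postgres"],
        env=env,
    )
    # One failing table shouldn't stop the rest; record it and move on.
    if result.returncode != 0:
        failed.append(table)

if failed:
    print("Failed tables:", ", ".join(failed))
```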
v
Automating this on the meltano side is pretty hard; it's doable, but everyone needs to optimize it differently. Doing it for one use case isn't so bad!
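If you go the Airflow route, one way to get a task per table (so failures stay isolated and tables can run in parallel) is a dynamically generated DAG. A rough sketch with the same hypothetical names as above, assuming a recent Airflow where BashOperator supports append_env:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Hypothetical table list, same caveat as the script above.
TABLES = ["public-users", "public-orders", "public-invoices"]

with DAG(
    dag_id="meltano_per_table",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    for table in TABLES:
        # One independent task per table: a failure only marks that task as
        # failed, and Airflow can run the others in parallel up to its
        # concurrency limits.
        BashOperator(
            task_id=f"run_{table.replace('-', '_')}",
            bash_command="meltano run tap-postgres target-postgres",
            env={"TAP_POSTGRES__SELECT_FILTER": f'["{table}"]'},
            append_env=True,  # keep the rest of the worker's environment intact
        )
```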