# best-practices
c
Exporting from a modest Postgres database (450 GB) and importing into BigQuery (foregoing transformation) is taking about 32 hours, but not because of insufficient resources... is there any foreseeable problem with splitting tables of the same DB between two taps (and concomitant pipelines) and then running them both in parallel?
a
No problem with that at all. It's actually a good practice. You can use the `inherit_from` option, as in our hub project here, to declare multiple instances of the same tap. The only thing to watch out for is that, if you later want to join them back together as a single job, you may need to manually merge their states. Also, make sure you use a job_id so that state is captured for each running instance. @douwe_maan - Do I have the above correct regarding a merge of states? I imagine you'd need distinct job_ids, because two jobs running simultaneously with the same job_id would probably clobber each other, but I haven't actually tested that myself.
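As a rough sketch, the `meltano.yml` could declare two inherited instances of the same tap, each selecting a disjoint subset of tables (the plugin variant, instance names, and table selections below are illustrative, not prescriptive):

```yaml
plugins:
  extractors:
  - name: tap-postgres
    variant: transferwise
    pip_url: pipelinewise-tap-postgres
  # Two inherited instances, each extracting a different subset of tables
  - name: tap-postgres--batch-a
    inherit_from: tap-postgres
    select:
    - public-orders.*
    - public-customers.*
  - name: tap-postgres--batch-b
    inherit_from: tap-postgres
    select:
    - public-events.*
```

Then each pipeline would run with its own job_id so their states are tracked separately (job_id values are made up):

```
meltano elt tap-postgres--batch-a target-bigquery --job_id=pg-to-bq-batch-a
meltano elt tap-postgres--batch-b target-bigquery --job_id=pg-to-bq-batch-b
```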
d
Yep, each parallel pipeline should have its own job_id, but you can manually merge the state JSON dicts later on. https://gitlab.com/meltano/meltano/-/issues/2727 will basically automate this “splitting up over multiple tap/target combinations” process.
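For what it's worth, merging the two captured states might look something like this minimal sketch. It assumes Singer-style state files with per-stream bookmarks under a `bookmarks` key, and the file names are hypothetical; the exact state shape depends on the tap:

```python
import json

# Load the state captured by each parallel job (file names are hypothetical).
with open("state-batch-a.json") as f:
    state_a = json.load(f)
with open("state-batch-b.json") as f:
    state_b = json.load(f)

# Each job extracted a disjoint set of tables, so the per-stream bookmark
# keys should not collide; a shallow merge of the two dicts is enough.
merged = {
    "bookmarks": {
        **state_a.get("bookmarks", {}),
        **state_b.get("bookmarks", {}),
    }
}

with open("merged-state.json", "w") as f:
    json.dump(merged, f, indent=2)
```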
c
Ah, great. Thanks for the advice regarding job states; it would have tripped me up.