# troubleshooting
b
I'm using `tap-postgres` (meltanolabs variant), and running the extraction takes 2+ hours due to having to fetch the catalog / tap properties. I have limited it to the `public` schema using the `filter_schemas` config. How can I bypass the discovery or speed it up? Is the only solution to manually specify a catalog? Also, how long is the default catalog cached for?
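One way to skip discovery on every run is to capture the catalog once and hand it to the extractor. A minimal sketch, assuming Meltano's extractor `catalog` extra and the standard Singer `--discover` flag (the file path and config values here are illustrative):

```yaml
# meltano.yml — hypothetical snippet; paths and config are illustrative
plugins:
  extractors:
    - name: tap-postgres
      variant: meltanolabs
      config:
        filter_schemas: [public]
      # The `catalog` extra makes Meltano pass this file to the tap
      # via --catalog instead of running discovery each invocation.
      catalog: extract/catalog.json
```

The file itself can be produced once with `meltano invoke tap-postgres --discover > extract/catalog.json`, then refreshed whenever the source schema changes.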
v
That's a crazy long time for that to run! If you run the tap directly with something like

```shell
time tap-postgres --config config.json --discover
```

is the time still 2 hours? How many tables are in your public schema? I wonder if we could replicate this.
b
I am running the tap now. As far as I can tell, `filter_schemas` only filters after discovery, i.e. it fetches every schema and its tables, then filters down to the public schema after everything is fetched.
We have a total of 4451 tables/views, and the public schema has 3452.
v
iirc it does only query the schema you filtered to, but I don't think it'd make much of a difference here with 3452 tables/views in a single schema. Sounds to me like we'd want a way to limit the tables we're generating the catalog for, or to speed up catalog creation. That's a very high number, hmm.
The easiest solution today would be to add some kind of filtering that lets you filter down to just the tables/views you want (not just a schema filter), so you could get down to something reasonable like 300-some tables. Just an idea. Also, pulling 4,451 catalog entries doesn't seem like it should take 2 hours.
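In the meantime, a table-level filter can be approximated outside the tap by pruning the discovered catalog file before passing it back in. A minimal sketch (the `filter_catalog` helper and the allow-list are hypothetical, but `streams` and `tap_stream_id` are standard Singer catalog fields):

```python
import json


def filter_catalog(catalog: dict, keep: set) -> dict:
    """Return a copy of a Singer catalog containing only the wanted streams.

    `keep` holds tap_stream_id values, e.g. {"public-users", "public-orders"}.
    """
    return {
        "streams": [
            s for s in catalog.get("streams", [])
            if s.get("tap_stream_id") in keep
        ]
    }


# Tiny illustrative catalog; a real one comes from `--discover` output.
catalog = {
    "streams": [
        {"tap_stream_id": "public-users", "schema": {}, "metadata": []},
        {"tap_stream_id": "public-audit_log", "schema": {}, "metadata": []},
    ]
}

trimmed = filter_catalog(catalog, {"public-users"})
print(json.dumps([s["tap_stream_id"] for s in trimmed["streams"]]))
# → ["public-users"]
```

The trimmed file can then be handed back to the tap with the standard Singer `--catalog` flag, so sync only touches the streams you kept.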
b
Running the tap outside of Meltano (i.e. I pulled the repo and ran it directly), I see that it does filter out the schemas before fetching. But using the version I installed with Meltano, it doesn't seem to filter.
```shell
tap-postgres --config config.json --discover  59.60s user 10.66s system 1% cpu 1:21:01.21 total
```
Looking at the output of the command, I see tables from other schemas. I have `v0.0.2` installed with Meltano. As a temporary workaround, I modified the code to fetch the catalog for just my tables and am passing it in directly. It looks like it's taking about a second to pull the schema per table.
v
hmm, the fact that it's scaling linearly makes it seem like the tap could be much more efficient here
Could you throw an issue into the tap?
b
Yes I can do that!
I upgraded my tap version and it looks like it's properly filtering out the schemas. But it is still very slow. I will make an issue to increase the efficiency. 👍