# troubleshooting
b
I'm using `tap-postgres` (meltanolabs variant), and running the extraction takes 2+ hours due to having to fetch the catalog / tap properties. I have limited it to the `public` schema using the `filter_schemas` config. How can I bypass the discovery or speed it up? Is the only solution to manually specify a catalog? Also, how long is the default catalog cached for?
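One way to skip discovery on every run is to capture the catalog once and hand it to the extractor. A minimal sketch, assuming Meltano's extractor `catalog` extra and the standard Singer `--discover` flag (the file path and config values here are illustrative):

```yaml
# meltano.yml — hypothetical snippet; paths and config are illustrative
plugins:
  extractors:
    - name: tap-postgres
      variant: meltanolabs
      config:
        filter_schemas: [public]
      # The `catalog` extra makes Meltano pass this file to the tap
      # via --catalog instead of running discovery each invocation.
      catalog: extract/catalog.json
```

The file itself can be produced once with `meltano invoke tap-postgres --discover > extract/catalog.json`, then refreshed whenever the source schema changes.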
v
That's a crazy long time for that to run! If you run the tap directly with something like

```shell
time tap-postgres --config config.json --discover
```

is the time still 2 hours? How many tables are in your public schema? I wonder if we could replicate this.
b
I am running the tap now. As far as I can tell, `filter_schemas` only filters after discovery, i.e. it fetches every schema and its tables, then filters down to the public schema after everything is fetched.
We have a total of 4451 tables/views, and the public schema has 3452.
v
iirc it does only query the schema you filtered to, but I don't think it'd make much of a difference here with 3452 tables/views in a single schema. Sounds to me like we'd want a way to limit the tables we're generating the catalog for, or to speed up catalog creation. That's a very high number, hmm.
The easiest solution today would be to add some kind of filtering that lets you filter down to just the tables/views you want (not just a schema filter), so you could get down to something reasonable like 300-some tables. Just an idea. Also, pulling 4,451 catalog entries doesn't seem like it should take 2 hours.
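In the meantime, a table-level filter can be approximated outside the tap by pruning the discovered catalog file before passing it back in. A minimal sketch (the `filter_catalog` helper and the allow-list are hypothetical, but `streams` and `tap_stream_id` are standard Singer catalog fields):

```python
import json


def filter_catalog(catalog: dict, keep: set) -> dict:
    """Return a copy of a Singer catalog containing only the wanted streams.

    `keep` holds tap_stream_id values, e.g. {"public-users", "public-orders"}.
    """
    return {
        "streams": [
            s for s in catalog.get("streams", [])
            if s.get("tap_stream_id") in keep
        ]
    }


# Tiny illustrative catalog; a real one comes from `--discover` output.
catalog = {
    "streams": [
        {"tap_stream_id": "public-users", "schema": {}, "metadata": []},
        {"tap_stream_id": "public-audit_log", "schema": {}, "metadata": []},
    ]
}

trimmed = filter_catalog(catalog, {"public-users"})
print(json.dumps([s["tap_stream_id"] for s in trimmed["streams"]]))
# → ["public-users"]
```

The trimmed file can then be handed back to the tap with the standard Singer `--catalog` flag, so sync only touches the streams you kept.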
b
Running the tap outside of Meltano (i.e. I pulled the repo and ran it directly), I see that it does filter out the schemas before fetching. But using the version I installed with Meltano, it doesn't seem to filter.
```shell
tap-postgres --config config.json --discover  59.60s user 10.66s system 1% cpu 1:21:01.21 total
```
Looking at the output of the command, I see tables from other schemas. I have `v0.0.2` installed with Meltano. As a temporary workaround, I modified the code to fetch the catalog for just my tables and am passing it in directly. It looks like it's taking about a second to pull the schema per table.
v
hmm, the fact that it's scaling linearly makes it seem like the tap could be much more efficient here
Could you throw an issue into the tap?
b
Yes I can do that!
I upgraded my tap version and it looks like it's properly filtering out the schemas. But it is still very slow. I will make an issue to increase the efficiency. 👍