# troubleshooting
j
I'm having an issue with a Postgres connector timing out and I want to make sure I'm understanding the issue correctly. The connection times out regularly, I think because the sync is just taking a long time due to large volume of data.
tap-postgres.public-ecm_profiles | An unhandled error occurred while syncing 'public-ecm_profiles' 
  File "/home/runner/work/analytics-etl/analytics-etl/.***/extractors/tap-postgres/venv/lib/python3.10/site-packages/sqlalchemy/engine/cursor.py", line [1066](...), in _buffer_rows 
    new_rows = dbapi_cursor.fetchmany(size) 
psycopg2.OperationalError: SSL connection has been closed unexpectedly
Googling the issue leads me to think that I would ideally adjust the SQLAlchemy connection to include pool_pre_ping=True as a param, so the pool checks that a connection is still alive before handing it out:
engine = create_engine("<mysql+pymysql://user:pw@host/db>", pool_pre_ping=True)
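Translated to our setup, I assume it would look something like this (placeholder credentials, and postgresql+psycopg2 in place of the MySQL driver from the docs example):

import sqlalchemy as sa

# Same idea as the docs snippet, but for a Postgres/psycopg2 connection.
# pool_pre_ping issues a lightweight test query before a pooled connection is
# reused, and transparently replaces the connection if the ping fails.
engine = sa.create_engine(
    "postgresql+psycopg2://user:pw@host:5432/db",  # placeholder DSN
    pool_pre_ping=True,
)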
The problem I'm running into is that Meltano's Postgres tap doesn't expose this parameter. For context, I'm pulling data from an AWS RDS postgres db. What are my options for getting this Meltano sync working again?
e
The simplest would be sending MeltanoLabs/tap-postgres a PR to create the engine with that flag, or at least opening an issue describing the request 🙂
j
I added an issue 3 weeks ago 👀 I tried looking through the source code to add the pool_pre_ping flag locally to verify that it would fix the issue, but I couldn't figure out the right place to add the pool_pre_ping config.
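What I had in mind for a local test was something like the blunt patch below, run before the tap starts, so every engine gets pool_pre_ping=True without me needing to know where the connector builds it. Totally untested, and I'm not sure it would even take effect, since it depends on how the connector imports create_engine:

import functools

import sqlalchemy

# Blunt local experiment, not a proper fix: wrap sqlalchemy.create_engine so
# every engine is built with pool_pre_ping=True by default.
# Caveat: code that did `from sqlalchemy import create_engine` before this
# patch runs keeps a reference to the original function and won't see it.
_original_create_engine = sqlalchemy.create_engine

@functools.wraps(_original_create_engine)
def _create_engine_with_pre_ping(*args, **kwargs):
    kwargs.setdefault("pool_pre_ping", True)  # don't clobber an explicit value
    return _original_create_engine(*args, **kwargs)

sqlalchemy.create_engine = _create_engine_with_pre_ping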
If you're familiar with the connector, I'd be happy to hop on a screenshare to try to add this config together, if you're open to it 🙂
e
@jacob_mulligan Does this look like what you need? https://github.com/meltano/sdk/issues/2257
j
Yeah, exactly that
I saw you put up a PR for this. Thanks Edgar 🙇 I subscribed to the PR to get notified when that merges.
Looks like the PR is merged 🎉 do you know how long it takes for future meltano install commands to be able to pull the latest version of the sdk package?
e
the steps are:
1. ✅ merge the PR
2. publish a new version of singer-sdk to PyPI
3. update tap-postgres to use the latest singer-sdk (at this point you could already get the fix if your pip_url is a direct git reference to the main branch)
4. publish a new version of meltanolabs-tap-postgres to PyPI
between 1 and 2, I'm gonna publish a prerelease first and test out a few taps and targets with it. So I'll do 2 probably on Tuesday. The others should follow quickly if nothing's broken.
j
That makes sense. Thanks again 🙇
Hey Edgar, has this made its way to being published?
e
There's a pending PR for tap-postgres: https://github.com/MeltanoLabs/tap-postgres/pull/376, and a release after that cc @visch
v
I'll take a peek soon!
ty 2
Merged and released: https://github.com/MeltanoLabs/tap-postgres/releases/tag/v0.0.9 Let us know if this works for you @jacob_mulligan, I don't have an easy way to test this locally.
j
The sync is still failing for the same reason. Do I need to update this config:
extractors:
  - name: tap-postgres-arc-incremental
    inherit_from: tap-postgres
    variant: meltanolabs
    pip_url: git+https://github.com/MeltanoLabs/tap-postgres.git
    config:
      host: ${ARC_DB_HOST}
      user: ${ARC_DB_USER}
      port: ${ARC_DB_PORT}
I have assumed that every time meltano install is run in our GitHub Action that executes Meltano, we'd fetch the latest version of all taps/targets. But perhaps that's not the case?
I'll specify the version and rerun to see if that does it:
  - name: tap-postgres-arc-incremental
    inherit_from: tap-postgres
    variant: meltanolabs
    pip_url: git+https://github.com/MeltanoLabs/tap-postgres.git@v0.0.9
Unfortunately I'm still getting the same error:
psycopg2.OperationalError: SSL connection has been closed unexpectedly
2024-03-04T14:40:20.612742Z [info     ] 2024-03-04 14:40:20,611 | ERROR    | tap-postgres.public-message_channels | An unhandled error occurred while syncing 'public-message_channels' cmd_type=elb consumer=False name=tap-postgres-arc-full-refresh producer=True stdio=stderr string_id=tap-postgres-arc-full-refresh
2024-03-04T14:40:20.612949Z [info     ] Traceback (most recent call last): cmd_type=elb consumer=False name=tap-postgres-arc-full-refresh producer=True stdio=stderr string_id=tap-postgres-arc-full-refresh
2024-03-04T14:40:20.613101Z [info     ]   File "/home/runner/work/analytics-etl/analytics-etl/.***/extractors/tap-postgres-arc-full-refresh/venv/lib/python3.10/site-packages/sqlalchemy/engine/cursor.py", line 1210, in _buffer_rows cmd_type=elb consumer=False name=tap-postgres-arc-full-refresh producer=True stdio=stderr string_id=tap-postgres-arc-full-refresh
2024-03-04T14:40:20.613245Z [info     ]     new_rows = dbapi_cursor.fetchmany(size) cmd_type=elb consumer=False name=tap-postgres-arc-full-refresh producer=True stdio=stderr string_id=tap-postgres-arc-full-refresh
2024-03-04T14:40:20.613388Z [info     ] psycopg2.OperationalError: SSL connection has been closed unexpectedly cmd_type=elb consumer=False name=tap-postgres-arc-full-refresh producer=True stdio=stderr string_id=tap-postgres-arc-full-refresh
👀 1
v
It depends on how you're installing: if it's on a fresh container every time and you're running meltano install, then yeah, you should be running on the latest version. The docs here https://docs.sqlalchemy.org/en/14/core/pooling.html#disconnect-handling-pessimistic state that pre-ping won't fix this if the disconnect happens in the middle of a transaction. There are other options that would help make this work. Personally, if I were diving in to fix this and wanted a high likelihood of success, I'd need to be able to see this thing failing myself so that I could replicate it. Then we have to pick a method for handling the failure. Some options:
1. Figure out what is dropping the SSL connection on us (flaky internet, a timeout in Postgres, etc.)
2. If we can't figure out #1, then it's figuring out a retry mechanism that may work. That's really dependent on when this is failing and why: do most queries work and this happens "randomly", or does the failure happen consistently for long-running queries?
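If it helps with #1, a quick way to see what the server side might be cutting off is to ask Postgres for its timeout settings from the same environment the sync runs in. Rough sketch, placeholder DSN, and some of these settings only exist on newer Postgres versions:

import sqlalchemy as sa
from sqlalchemy.exc import DBAPIError

# Print the server-side timeouts that could explain a dropped connection.
# AUTOCOMMIT keeps a failed SHOW (e.g. a setting missing on this Postgres
# version) from aborting the remaining statements.
engine = sa.create_engine(
    "postgresql+psycopg2://user:pw@host:5432/db",  # placeholder DSN
    isolation_level="AUTOCOMMIT",
)

settings = [
    "statement_timeout",
    "idle_in_transaction_session_timeout",
    "idle_session_timeout",  # PostgreSQL 14+ only
    "tcp_keepalives_idle",
]

with engine.connect() as conn:
    for name in settings:
        try:
            value = conn.execute(sa.text(f"SHOW {name}")).scalar()
        except DBAPIError:
            value = "not available on this server version"
        print(f"{name} = {value}")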
e
I'm curious though if forcing a different isolation level would make a difference. Since we're doing a read-only operation, autocommit should be pretty safe. Anyway I plan on giving https://docs.sqlalchemy.org/en/20/faq/connections.html#using-dbapi-autocommit-allows-for-a-readonly-version-of-transparent-reconnect a good read and a quick try this week.
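The gist of that FAQ section, as I read it, comes down to roughly this (placeholder DSN, pool_pre_ping kept in the mix):

import sqlalchemy as sa

# With DBAPI-level autocommit there is no open transaction tied to the
# connection, which is what makes reconnect-style handling (like pre-ping
# replacing a dead connection at checkout) safe for a read-only workload.
engine = sa.create_engine(
    "postgresql+psycopg2://user:pw@host:5432/db",  # placeholder DSN
    isolation_level="AUTOCOMMIT",  # each statement commits immediately
    pool_pre_ping=True,            # still ping before handing out a pooled connection
)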
➕ 1