Running into an issue where a `meltano run` command is faili Meltano #troubleshooting

Running into an issue where a `meltano run` comman...

mark_gordon

05/02/2023, 2:45 AM

Running into an issue where a

meltano run

command is failing after seemingly successfully performing the requested extraction and DBT step. The error below is shown without any context. We are using a postgres cluster for both the meltano state and extracted/dbt tables.

Copy code

... normal output above ...
2023-05-01T23:56:10.389373Z [info     ] 23:56:10  Finished running 21 view models, 1 table model, 4 incremental models in 0 hours 8 minutes and 22.61 seconds (502.61s). cmd_type=command name=dbt-postgres stdio=stderr
2023-05-01T23:56:10.424722Z [info     ] 23:56:10                   cmd_type=command name=dbt-postgres stdio=stderr
2023-05-01T23:56:10.426245Z [info     ] 23:56:10  Completed successfully cmd_type=command name=dbt-postgres stdio=stderr
2023-05-01T23:56:10.427779Z [info     ] 23:56:10                   cmd_type=command name=dbt-postgres stdio=stderr
2023-05-01T23:56:10.429591Z [info     ] 23:56:10  Done. PASS=26 WARN=0 ERROR=0 SKIP=0 TOTAL=26 cmd_type=command name=dbt-postgres stdio=stderr
2023-05-01T23:56:10.644030Z [info     ]                                cmd_type=command name=dbt-postgres stdio=stderr
2023-05-01T23:56:10.644438Z [info     ]                                cmd_type=command name=dbt-postgres stdio=stderr
Need help fixing this problem? Visit <http://melta.no/> for troubleshooting steps, or to
join our friendly Slack community.

(psycopg2.OperationalError) SSL connection has been closed unexpectedly

(Background on this error at: <https://sqlalche.me/e/14/e3q8>)

It looks like our postgres connection was dropped for whatever reason. I don’t see our postgres cluster as having any relevant timeouts that I think could be causing this, however (I only see a transaction idle timeout of 60s). Any ideas on how to debug what’s going on here?

aaron_phethean

05/02/2023, 8:31 AM

Looks like the Meltano system db, trying to write the job history perhaps? You could use a local system db to narrow down the problem. If the error goes away then you know it’s the job history write. Assuming it is the job history and the connection being closed by Postgres- you could prove this by running a job with a utility that just sleeps for 8 minutes. (that appears to be how long your dbt transforms are running)

mark_gordon

05/02/2023, 8:05 PM

thanks for the suggestion. It appears that it actually was the idle transaction timeout triggering the issue. I did a few more test runs and could see a connection from meltano that was stuck in the “idle in transaction” state until disappearing. After removing that timeout it seems things are working. That said, this seems like something that meltano shouldn’t be doing.

aaron_phethean

05/02/2023, 8:12 PM

Interesting. I’m confident a bug report would be very welcome. We use a few elephant sql instances, where there’s no idle timeout. Was yours a self hosted postgres?

mark_gordon

05/02/2023, 8:14 PM

It’s hosted on AWS RDS as an Aurora PostgreSQL cluster. I think the setting I started with were the default settings on AWS.

aaron_phethean

05/02/2023, 8:14 PM

Good to know!

mark_gordon

05/02/2023, 8:19 PM

Just looked through our configs, it looks like we explicitly set the timeout and it’s not the default

aaron_phethean

05/02/2023, 8:23 PM

Oops. Short transactions, and releasing resources asap, is a good idea for OLTP. Can’t think of much benefit for OLAP.

Open in Slack

Previous Next