# troubleshooting
j
Hi, I've been stuck with a consistent error running
meltano run tap-postgres target-bigquery
or a similar command. This error isn't too helpful. It looks like the ORM session is lost during streaming. I tried increasing the buffer and a bunch of other tweaks. Any direction is appreciated.
c
All those colourful boxes and squares and code sections are actually not what you need to look at. Basically, you need to scroll up all the way to the beginning of that colourful box output and then look at the text lines preceding the coloured boxes. (A bit unintuitive, I know.) I think you can disable the coloured boxes by setting
NO_COLOR=1
as a shell variable now with meltano 2.10 and higher.
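Something like this should do it for the run above (assuming a POSIX shell; the pipeline names are just the ones from your command):
# disable the coloured log rendering so the plain traceback is easier to read
NO_COLOR=1 meltano run tap-postgres target-bigquery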
p
It also looks like it's a problem with the extractor, so you can try running it with invoke to get more details. You can also set the log level to debug to get more logs.
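Something like this, for example (same extractor name as in your run):
# run just the extractor on its own with verbose logging to surface the underlying error
meltano --log-level=debug invoke tap-postgres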
j
time=2022-11-29 02:21:41 name=singer level=INFO message=METRIC: {"type": "counter", "metric": "record_count", "value": 21186, "tags": {}}
time=2022-11-29 02:21:41 name=tap_postgres level=CRITICAL message=connection already closed
Traceback (most recent call last):
  File "/project/.meltano/extractors/tap-postgres/venv/lib/python3.9/site-packages/tap_postgres/sync_strategies/full_table.py", line 144, in sync_table
    for rec in cur:
  File "/project/.meltano/extractors/tap-postgres/venv/lib/python3.9/site-packages/psycopg2/extras.py", line 120, in __iter__
    yield next(res)
psycopg2.OperationalError: terminating connection due to conflict with recovery
DETAIL:  User query might have needed to see row versions that must be removed.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
SSL connection has been closed unexpectedly
meltano --log-level=debug invoke tap-postgres
did reveal more of what happened; not sure what it implies though
c
You are reading from a hot standby replica. You'll need to try and avoid query conflicts.
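If you want to confirm that, a quick check against the same connection the tap uses should return t on a standby (psql here; the connection string is a placeholder):
# pg_is_in_recovery() is true on a hot standby / recovering instance
psql "postgresql://user:password@replica-host:5432/dbname" -c "SELECT pg_is_in_recovery();"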
j
Thank you so much Christoph. Definitely learnt something today. I am most likely going to see what my options are. The primary being able to impact the replica is not something I can rely on. This is a GCP CloudSQL snapshot; there might be some other replica options there. Thank you again.
c
This is a GCP CloudSQL snapshot.
Ah. Interesting. I'm not familiar with Google's version. They may have options to configure the Postgres WAL delay settings on their replicas. Alex from Harness uses GCP, but I think mainly BigQuery. He may have some input.
A quick scan of the instance creation options doesn't really show anything where you would configure WAL delays .... https://cloud.google.com/sdk/gcloud/reference/sql/instances/create
j
That's what I'm finding too
And I don't want to hit Primary for ELT regularly
c
I wonder what the difference between
FAILOVER
and
READ
replica-type is, though ... it's the most suspicious-looking option ... https://cloud.google.com/sdk/gcloud/reference/sql/instances/create#--replica-type
Doesn't really provide a lot of details ... 🤷 https://cloud.google.com/sql/docs/postgres/replication/create-replica
Consider adjusting the
max_standby_archive_delay
and
max_standby_streaming_delay
flags for your replica.
Seems like it's possible somehow in Cloud SQL
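Before (and after) touching anything you could check what the replica currently has for those, e.g. (the connection string is a placeholder again):
# show the current standby conflict delay settings on the replica
psql "postgresql://user:password@replica-host:5432/dbname" \
  -c "SHOW max_standby_streaming_delay;" \
  -c "SHOW max_standby_archive_delay;"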
j
That's what it sounds like.
c
I think it's this 'instance flags' thing ... the CLI doesn't seem to expose a dedicated command to adjust the flags. Might just be in the API or GUI Console. https://cloud.google.com/sdk/gcloud/reference/sql/flags/list
Ey. Found it ...
gcloud sql instances patch
https://cloud.google.com/sql/docs/postgres/flags#gcloud
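Untested, but based on that page it should look roughly like this (the instance name and the 15-minute values are just examples, and I believe --database-flags replaces whatever flags are already set, so include any existing ones too):
# raise the standby conflict timeouts on the read replica (values are in milliseconds)
gcloud sql instances patch my-read-replica \
  --database-flags=max_standby_archive_delay=900000,max_standby_streaming_delay=900000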
j
You rock dude. Thank you so much
c
I am now Google Cloud certified. 😂
I did use GCP many, many years ago though, so that helps.
j
I just ran a job and it's working
c
Amazing!
j
After that, the loader was failing and it turned out the container it ran on was OOMKilled; after bumping the memory to double, the manual jobs are finishing properly. Now I'm faced with
[2022-12-01 14:35:14,964] {dagbag.py:496} INFO - Filling up the DagBag from /project/orchestrate/dags/meltano.py
[2022-12-01 14:35:19,042] {meltano.py:193} INFO - Received meltano v2 style schedule export: {'schedules': {'job': [{'name': 'postgres-to-bq-schedule', 'interval': '0 0/2 * * *', 'cron_interval': '0 0/2 * * *', 'env': {}, 'job': {'name': 'tap-postgres-to-bigquery', 'tasks': ['tap-postgres target-bigquery']}}, {'name': 'postgres-to-ds-bq-schedule', 'interval': '30 0/2 * * *', 'cron_interval': '30 0/2 * * *', 'env': {}, 'job': {'name': 'tap-postgres-to-ds-bigquery', 'tasks': ['tap-postgres target-ds-bigquery']}}], 'elt': []}}
[2022-12-01 14:35:19,043] {meltano.py:146} INFO - Considering task 'tap-postgres target-bigquery' of schedule 'postgres-to-bq-schedule': {'name': 'postgres-to-bq-schedule', 'interval': '0 0/2 * * *', 'cron_interval': '0 0/2 * * *', 'env': {}, 'job': {'name': 'tap-postgres-to-bigquery', 'tasks': ['tap-postgres target-bigquery']}}
[2022-12-01 14:35:19,045] {meltano.py:165} INFO - Spun off task '<Task(BashOperator): meltano_postgres-to-bq-schedule_tap-postgres-to-bigquery_task0>' of schedule 'postgres-to-bq-schedule': {'name': 'postgres-to-bq-schedule', 'interval': '0 0/2 * * *', 'cron_interval': '0 0/2 * * *', 'env': {}, 'job': {'name': 'tap-postgres-to-bigquery', 'tasks': ['tap-postgres target-bigquery']}}
[2022-12-01 14:35:19,045] {meltano.py:170} INFO - DAG created for schedule 'postgres-to-bq-schedule', task='tap-postgres target-bigquery'
[2022-12-01 14:35:19,046] {meltano.py:146} INFO - Considering task 'tap-postgres target-ds-bigquery' of schedule 'postgres-to-ds-bq-schedule': {'name': 'postgres-to-ds-bq-schedule', 'interval': '30 0/2 * * *', 'cron_interval': '30 0/2 * * *', 'env': {}, 'job': {'name': 'tap-postgres-to-ds-bigquery', 'tasks': ['tap-postgres target-ds-bigquery']}}
[2022-12-01 14:35:19,046] {meltano.py:165} INFO - Spun off task '<Task(BashOperator): meltano_postgres-to-ds-bq-schedule_tap-postgres-to-ds-bigquery_task0>' of schedule 'postgres-to-ds-bq-schedule': {'name': 'postgres-to-ds-bq-schedule', 'interval': '30 0/2 * * *', 'cron_interval': '30 0/2 * * *', 'env': {}, 'job': {'name': 'tap-postgres-to-ds-bigquery', 'tasks': ['tap-postgres target-ds-bigquery']}}
[2022-12-01 14:35:19,046] {meltano.py:170} INFO - DAG created for schedule 'postgres-to-ds-bq-schedule', task='tap-postgres target-ds-bigquery'
Running <TaskInstance: meltano_postgres-to-ds-bq-schedule_tap-postgres-to-ds-bigquery.meltano_postgres-to-ds-bq-schedule_tap-postgres-to-ds-bigquery_task0 2022-12-01T12:30:00+00:00 [queued]> on host airflow-scheduler-deployment-b955b69fb-7dpjq
[2022-12-01 14:35:26,147] {scheduler_job.py:1218} INFO - Executor reports execution of meltano_postgres-to-ds-bq-schedule_tap-postgres-to-ds-bigquery.meltano_postgres-to-ds-bq-schedule_tap-postgres-to-ds-bigquery_task0 execution_date=2022-12-01 12:30:00+00:00 exited with status success for try_number 2
[2022-12-01 14:35:30,423] {dagrun.py:429} ERROR - Marking run <DagRun meltano_postgres-to-ds-bq-schedule_tap-postgres-to-ds-bigquery @ 2022-12-01 12:30:00+00:00: scheduled__2022-12-01T12:30:00+00:00, externally triggered: False> failed
I wonder if this has to do with spinning up a new worker container to run the job? It's a Helm chart I used to deploy this setup. https://gitlab.com/meltano/infra/helm-meltano/-/tree/master/airflow
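I guess I could check whether the spun-up task pod itself got killed again, something like (pod names are placeholders, apart from the scheduler one from the log above):
# look for OOMKilled / the termination reason on the task pod
kubectl describe pod <task-pod-name> | grep -iE 'oomkilled|reason|exit code'
# and cross-check what the scheduler logged for that task
kubectl logs airflow-scheduler-deployment-b955b69fb-7dpjq | grep -i task0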