# troubleshooting
a
Any tips for diagnosing a `BrokenPipeError` when the stack trace isn't very revealing? I'm running a bespoke version of tap-spreadsheets-anywhere and seem to be getting the error in more or less the same place, but I can't tell exactly which file is causing it. Logs in the thread
(attachment: stack trace)
I'm not sure if the error is tap- or target-related, but I shall see if I can run individual streams locally to narrow it down
Going to spin up a local postgres target, as it looks like the tap isn't the issue
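(just a throwaway postgres in docker for testing; the credentials, database name and image tag below are placeholders, not my real config:)
docker run --rm -d --name local-pg \
  -e POSTGRES_USER=meltano -e POSTGRES_PASSWORD=meltano -e POSTGRES_DB=raw \
  -p 5432:5432 postgres:15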
v
smells like some logs are missing. If you run tap-dynamics and target-postgres directly I bet you'll see the error; then it's just working out what exactly is causing the error not to be printed. My hunch would be the dagster utility, but that's a huge guess
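something like this, assuming the plugin names from your logs (the output path is just an example):
# run the pipeline directly in the container, bypassing the orchestrator
meltano run tap-dynamics target-postgres-small-batch
# or split the two halves apart to see which side actually dies
meltano invoke tap-dynamics > /tmp/tap-output.jsonl
meltano invoke target-postgres-small-batch < /tmp/tap-output.jsonl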
a
Here's what I got running meltano directly in the container:
2024-04-26T16:02:26.368733Z [info     ] INFO Syncing file "msdyn_ocsession/Snapshot/2024-03_1711966124.csv". cmd_type=elb consumer=False name=tap-dynamics producer=True stdio=stderr string_id=tap-dynamics
2024-04-26T16:02:31.108158Z [info     ] time=2024-04-26 16:02:31 name=target_postgres level=INFO message=Loading 5000 rows into 'raw__tap_dynamics."msdyn_ocsession"' cmd_type=elb consumer=True name=target-postgres-small-batch producer=False stdio=stderr string_id=target-postgres-small-batch
2024-04-26T16:02:31.683842Z [info     ] time=2024-04-26 16:02:31 name=target_postgres level=INFO message=Loading into raw__tap_dynamics."msdyn_ocsession": {"inserts": 0, "updates": 5000, "size_bytes": 6528892} cmd_type=elb consumer=True name=target-postgres-small-batch producer=False stdio=stderr string_id=target-postgres-small-batch
2024-04-26T16:02:37.309368Z [info     ] INFO Syncing file "msdyn_ocsession/Snapshot/2024-04_1714144124.csv". cmd_type=elb consumer=False name=tap-dynamics producer=True stdio=stderr string_id=tap-dynamics
2024-04-26T16:02:38.887010Z [info     ] time=2024-04-26 16:02:38 name=target_postgres level=INFO message=Loading 5000 rows into 'raw__tap_dynamics."msdyn_ocsession"' cmd_type=elb consumer=True name=target-postgres-small-batch producer=False stdio=stderr string_id=target-postgres-small-batch
2024-04-26T16:02:39.235047Z [info     ] time=2024-04-26 16:02:39 name=target_postgres level=INFO message=Loading into raw__tap_dynamics."msdyn_ocsession": {"inserts": 0, "updates": 5000, "size_bytes": 6524929} cmd_type=elb consumer=True name=target-postgres-small-batch producer=False stdio=stderr string_id=target-postgres-small-batch
2024-04-26T16:02:41.908509Z [error    ] Loader failed
2024-04-26T16:02:41.909077Z [error    ] Block run completed.           block_type=ExtractLoadBlocks err=RunnerError('Loader failed') exit_codes={<PluginType.LOADERS: 'loaders'>: -9} set_number=0 success=False
Need help fixing this problem? Visit <http://melta.no/> for troubleshooting steps, or to
join our friendly Slack community.

Run invocation could not be completed as block failed: Loader failed
root@prod-dagster-bntalb3flzn22--i0uvfge-55b8ffc885-l95w2:/project# ERROR: {"Error":{"Code":"ClusterExecFailure","Message":"Cluster exec API returns error: command terminated with non-zero exit code: error executing command [/bin/bash], exit code 137, code: 0.","Details":null,"Target":null,"AdditionalInfo":null,"TraceId":null}}
v
the context missing here is with your orchestrator 🙂
Again best guess 😄
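fwiw exit code 137 is 128 + 9, i.e. the process was SIGKILLed, which on k8s is almost always the OOM killer, and it lines up with the -9 in your meltano exit_codes:
echo $((137 - 128))   # prints 9
kill -l 9             # prints KILL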
a
Yes, I just arrived at that myself. Looks like time to up the memory again! I was looking for something in the meltano logs. Explains why I couldn't repro locally with postgres in docker
v
Dagster should imo have the 137 available if it's calling a k8s pod directly
a
I am already using a small batch size of 5000 to mitigate the issue.
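(for reference, that's the target's batch_size_rows setting, set roughly like this; the setting name may differ depending on which target-postgres variant you use:)
meltano config target-postgres-small-batch set batch_size_rows 5000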
Other way around: I am running Dagster in Azure Container Apps, which I believe is k8s under the hood
v
ie it should be spinning up a pod to run that task (I think it's called a task, I don't use dagster)
So the operator (again, probably the wrong word), ie the thing that runs the job, is the instance itself?
That would mean the whole thing dies then though, so something else is probably running it? Anyway, it's an orchestration thing
a
Yes. Anyway, luckily this time I can drop that stream, plus I can up the memory if I need to. Cheers and have a good weekend, I'm about done with critical errors on a Friday
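(if I do bump the memory, I believe it's roughly this for Container Apps, though the flags are from memory so worth double-checking; the app and resource group names are placeholders:)
az containerapp update \
  --name <app-name> \
  --resource-group <resource-group> \
  --cpu 1.0 \
  --memory 2.0Gi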
🍻 1
dancingpenguin 1
np 1