# troubleshooting
l
running into issues with postgres -> bigquery, using the latest versions of meltano, the postgres tap (transferwise variant), and the bigquery loader (z3z1ma variant, method is `storage_write_api`), with incremental replication on a ~260M row table. after running for about 12 hours, meltano starts using almost all available RAM on a 4G EC2 box and basically grinds to a halt. when things are working smoothly, logs look like this:
```
2023-05-09T16:25:04.617198Z [info     ] 2023-05-09 16:25:04,613 | INFO     | target-bigquery      | [a57699c1088f4a17a83f1812f70b389e] Sent 500 rows to projects/REDACTED/datasets/meltano_prod/tables/public_versions/streams/_default with offset 71500. cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-09T16:25:05.896054Z [info     ] 2023-05-09 16:25:05,895 | INFO     | target-bigquery      | Target sink for 'public-versions' is full. Draining... cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-09T16:25:05.896655Z [info     ] 2023-05-09 16:25:05,895 | INFO     | target-bigquery      | [a57699c1088f4a17a83f1812f70b389e] Sent 500 rows to projects/REDACTED/datasets/meltano_prod/tables/public_versions/streams/_default with offset 72000. cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-09T16:25:06.589807Z [info     ] 2023-05-09 16:25:06,589 | INFO     | target-bigquery      | Target sink for 'public-versions' is full. Draining... cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-09T16:25:06.591187Z [info     ] 2023-05-09 16:25:06,591 | INFO     | target-bigquery      | [a57699c1088f4a17a83f1812f70b389e] Sent 500 rows to projects/REDACTED/datasets/meltano_prod/tables/public_versions/streams/_default with offset 72500. cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-09T16:25:07.034480Z [info     ] 2023-05-09 16:25:07,034 | INFO     | target-bigquery      | Target sink for 'public-versions' is full. Draining... cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-09T16:25:07.035043Z [info     ] 2023-05-09 16:25:07,034 | INFO     | target-bigquery      | [a57699c1088f4a17a83f1812f70b389e] Sent 500 rows to projects/REDACTED/datasets/meltano_prod/tables/public_versions/streams/_default with offset 73000. cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-09T16:25:07.463604Z [info     ] 2023-05-09 16:25:07,463 | INFO     | target-bigquery      | Target sink for 'public-versions' is full. Draining... cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-09T16:25:07.466982Z [info     ] 2023-05-09 16:25:07,466 | INFO     | target-bigquery      | [a57699c1088f4a17a83f1812f70b389e] Sent 500 rows to projects/REDACTED/datasets/meltano_prod/tables/public_versions/streams/_default with offset 73500. cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-09T16:25:07.853789Z [info     ] 2023-05-09 16:25:07,853 | INFO     | target-bigquery      | Target sink for 'public-versions' is full. Draining... cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-09T16:25:07.854452Z [info     ] 2023-05-09 16:25:07,854 | INFO     | target-bigquery      | [a57699c1088f4a17a83f1812f70b389e] Sent 500 rows to projects/REDACTED/datasets/meltano_prod/tables/public_versions/streams/_default with offset 74000. cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
```
but at a certain point, the logs start looking like this:
```
2023-05-09T15:09:21.273318Z [info     ] 2023-05-09 15:09:21,272 | INFO     | target-bigquery      | Target sink …
```
v
smells like a target-bigquery issue but it's hard to say for sure. I'd profile it somehow, but context really matters and it's hard to give a general answer for how to dig into this one
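One low-effort way to start profiling (a rough sketch, nothing Meltano ships; the process-name substrings and the 60-second poll interval are just assumptions) is to poll the resident memory of the tap and target subprocesses while the pipeline runs and see which side is ballooning:
```python
import time

import psutil  # third-party: pip install psutil

# substrings to look for in the command lines of the processes Meltano spawned
WATCHED = ("tap-postgres", "target-bigquery")


def snapshot():
    """Return {label: RSS in MiB} for any running tap/target processes we can see."""
    usage = {}
    for proc in psutil.process_iter(["pid", "cmdline", "memory_info"]):
        cmdline = " ".join(proc.info["cmdline"] or [])
        mem = proc.info["memory_info"]
        if mem is None:  # access denied for this process
            continue
        for label in WATCHED:
            if label in cmdline:
                usage[f"{label} (pid {proc.info['pid']})"] = mem.rss / 2**20
    return usage


if __name__ == "__main__":
    while True:
        for label, rss_mib in sorted(snapshot().items()):
            print(f"{time.strftime('%H:%M:%S')}  {label}: {rss_mib:,.0f} MiB RSS")
        time.sleep(60)
```
Running something like this in a second tmux pane over the 12-hour window would at least show whether the RAM growth is in the tap, the target, or Meltano itself.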
p
i would expect meltano to eventually just crash or enter a recovery process if it's been in this state for long enough, but it probably only does so if the target bubbles up an error or something
I think this is true - Meltano is running the tap/target and doesn't get any signal that the target is hanging unless it errors out, so I don't think Meltano would be able to do anything here. Depending on what you figure out the cause is, it's possible the SDK could implement a way to detect this for target developers and raise an error instead of hanging.
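As a rough illustration of that idea (a generic sketch, not the Singer SDK's actual API; the class name and the 10-minute timeout are made up), a target could keep a watchdog thread that gets "ticked" on every successfully written batch and forces a failure if progress stalls:
```python
import os
import sys
import threading
import time


class ProgressWatchdog:
    """Exit the process with an error if tick() isn't called for `timeout` seconds,
    so the orchestrator sees a failed run instead of a silent hang."""

    def __init__(self, timeout: float = 600.0):
        self.timeout = timeout
        self._last_tick = time.monotonic()
        threading.Thread(target=self._watch, daemon=True).start()

    def tick(self) -> None:
        """Call this every time a batch is successfully written."""
        self._last_tick = time.monotonic()

    def _watch(self) -> None:
        while True:
            time.sleep(5)
            stalled_for = time.monotonic() - self._last_tick
            if stalled_for > self.timeout:
                print(f"ERROR: no rows written for {stalled_for:.0f}s, exiting", file=sys.stderr)
                os._exit(1)  # non-zero exit so the run is recorded as failed


# usage sketch:
#   watchdog = ProgressWatchdog(timeout=600)
#   for batch in batches:
#       write_batch(batch)
#       watchdog.tick()
```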
t
Are there no errors in the log at all? They may occur way before things appear broken, and they won't necessarily be logged with level=ERROR - the tap/target logs may indicate an error, but Meltano logs it all as INFO, so grep works better than any kind of standard log parsing.
l
thanks all - there probably are errors in the logs, but i’m running meltano in tmux and unfortunately didn’t have it configured to support sufficient scrollback
t
Meltano writes the logs to a file too... you'll be able to find it there, in the `.meltano/logs/elt/<pipeline_name>` directory.
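Along the lines of the grep suggestion above, a quick sketch for scanning those files for error-looking lines regardless of the level Meltano wrapped them in (the keyword list and the `*.log` glob are assumptions, adjust to taste):
```python
from pathlib import Path

# keywords worth flagging even when Meltano re-logs everything as INFO
KEYWORDS = ("error", "exception", "traceback", "failed", "400")

log_dir = Path(".meltano/logs/elt")  # one subdirectory per pipeline
for log_file in sorted(log_dir.rglob("*.log")):
    for lineno, line in enumerate(log_file.read_text(errors="replace").splitlines(), start=1):
        if any(kw in line.lower() for kw in KEYWORDS):
            print(f"{log_file}:{lineno}: {line.strip()}")
```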
l
ah thanks, here’s what I see:
```
2023-05-09T21:50:12.304554Z [info     ] 2023-05-09 21:50:12,303 | INFO     | target-bigquery      | [a57699c1088f4a17a83f1812f70b389e] Sent 500 rows to projects/REDACTED/datasets/meltano_prod/tables/public_versions/streams/_default with offset 10034500. cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-09T21:50:12.580110Z [info     ] 2023-05-09 21:50:12,575 | INFO     | target-bigquery      | Target sink for 'public-versions' is full. Draining... cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-09T21:50:12.580700Z [info     ] 2023-05-09 21:50:12,579 | INFO     | target-bigquery      | [a57699c1088f4a17a83f1812f70b389e] Sent 500 rows to projects/REDACTED/datasets/meltano_prod/tables/public_versions/streams/_default with offset 10035000. cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-09T21:50:12.939226Z [info     ] 2023-05-09 21:50:12,939 | INFO     | target-bigquery      | Target sink for 'public-versions' is full. Draining... cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-09T21:50:12.949876Z [info     ] 2023-05-09 21:50:12,947 | INFO     | target-bigquery      | [a57699c1088f4a17a83f1812f70b389e] Sent 500 rows to projects/REDACTED/datasets/meltano_prod/tables/public_versions/streams/_default with offset 10035500. cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-09T21:50:19.421579Z [info     ] 2023-05-09 21:50:19,421 | INFO     | target-bigquery      | Target sink for 'public-versions' is full. Draining... cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-09T21:50:19.430226Z [info     ] 2023-05-09 21:50:19,430 | INFO     | target-bigquery      | [a57699c1088f4a17a83f1812f70b389e] Sent 500 rows to projects/REDACTED/datasets/meltano_prod/tables/public_versions/streams/_default with offset 10036000. cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-09T21:50:20.047010Z [info     ] 2023-05-09 21:50:20,046 | INFO     | google.cloud.bigquery_storage_v1.writer | RPC termination has signaled streaming pull manager shutdown. cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-09T21:50:20.105434Z [info     ] 2023-05-09 21:50:20,105 | INFO     | google.api_core.bidi | Thread-ConsumeBidirectionalStream exiting cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-09T21:50:45.828067Z [info     ] 2023-05-09 21:50:45,827 | INFO     | target-bigquery      | Target sink for 'public-versions' is full. Draining... cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-09T21:50:45.828597Z [info     ] 2023-05-09 21:50:45,827 | INFO     | target-bigquery      | [a57699c1088f4a17a83f1812f70b389e] Sent 500 rows to projects/REDACTED/datasets/meltano_prod/tables/public_versions/streams/_default with offset 10036500. cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-09T21:51:08.839056Z [info     ] time=2023-05-09 21:51:08 name=singer level=INFO message=METRIC: {"type": "counter", "metric": "record_count", "value": 11081, "tags": {}} cmd_type=elb consumer=False name=tap-postgres producer=True stdio=stderr string_id=tap-postgres
2023-05-09T21:52:00.731144Z [info     ] 2023-05-09 21:52:00,730 | INFO     | target-bigquery      | Target sink for 'public-versions' is full. Draining... cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-05-09T21:52:08.890740Z [info     ] time=2023-05-09 21:52:08 name=singer level=INFO message=METR…
```
filed an issue here: https://github.com/z3z1ma/target-bigquery/issues/43 - seems like an unhandled 400 error from the BigQuery API
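For anyone hitting something similar, the general fail-fast pattern the issue asks for might look roughly like this (a hypothetical wrapper, not target-bigquery's actual code; `write_fn` stands in for whatever call pushes a batch to the Storage Write API):
```python
from google.api_core import exceptions as gexc  # installed alongside google-cloud-bigquery-storage


def send_batch(write_fn, rows):
    """Push one batch and surface BigQuery 400s instead of letting the stream hang silently."""
    try:
        return write_fn(rows)
    except gexc.InvalidArgument as err:  # gRPC INVALID_ARGUMENT, the 400 equivalent
        # Re-raise so the target exits non-zero and Meltano records the run as failed.
        raise RuntimeError(f"BigQuery rejected the batch: {err}") from err
```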