# troubleshooting
p
Hi all, I'm debugging an issue with a tap-snowflake-to-target-csv pipeline. It runs fine locally, but when I package it into a Docker image and run it in a k8s pod attached to an EFS volume, I get the error below and I'm not sure how to debug it.
```
2022-12-16T23:42:05.670660Z [info ] time=2022-12-16 23:42:05 name=singer level=INFO message=METRIC: {"type": "counter", "metric": "record_count", "value": 59725, "tags": {"database": "ANALYTICS", "table": "SALES_INVOICE_V"}} cmd_type=elb consumer=False name=tap-snowflake-sap-sales-invoice producer=True stdio=stderr string_id=tap-snowflake-sap-sales-invoice
2022-12-16T23:43:04.671892Z [error ] Loader failed
Traceback (most recent call last):

/.venv/lib/python3.9/site-packages/meltano/core/logging/output_logger.py:201 in redirect_logging

    198             *ignore_errors,
    199         )
    200         try:
  ❱ 201             yield
    202         except ignored_errors:  # noqa: WPS329
    203             raise
    204         except Exception as err:

  locals:
    err            = RunnerError('Loader failed')
    ignore_errors  = ()
    ignored_errors = (<class 'KeyboardInterrupt'>, <class 'asyncio.exceptions.CancelledError'>)
    logger         = <RootLogger root (INFO)>
    self           = <meltano.core.logging.output_logger.Out object at 0x7f8c9b48a970>

/.venv/lib/python3.9/site-packages/meltano/core/block/extract_load.py:461 in run

    458             # TODO: legacy `meltano elt` style logging should be deprecated
    459             legacy_log_handler = self.output_logger.out("meltano", logger)
    460             with legacy_log_handler.redirect_logging():
  ❱ 461                 await self.run_with_job()
    462                 return
…
```
w
Looks like you're getting exit code 9, i.e. the loader process is being killed with SIGKILL, which usually means it's running out of memory.
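If it helps, one way to confirm that is to look at the pod's last termination state. Here's a minimal sketch assuming the official `kubernetes` Python client, with placeholder pod and namespace names:
```
# Hedged sketch: check why the pipeline pod's container last terminated.
# The pod and namespace names below are placeholders, not from the thread.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside the cluster
v1 = client.CoreV1Api()

pod = v1.read_namespaced_pod(name="meltano-pipeline-pod", namespace="default")
for status in pod.status.container_statuses or []:
    last = status.last_state.terminated
    if last is not None:
        # reason == "OOMKilled" and exit_code == 137 (128 + SIGKILL) point at memory pressure
        print(status.name, last.reason, last.exit_code)
```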
p
That's problematic.
Thanks for pointing that out.
So it's not ideal to hold a large Snowflake table in memory while it waits to be written to CSV. Are there any strategies for handling this, like using partition keys? Do any CSV load plugins do chunked loading?
I see that `target-csv` (MeltanoLabs variant) implements `process_batch`. Would this help in copying only batches at a time from Snowflake to CSV? Does the tap need to implement batch as well?
w
@edgar_ramirez_mondragon would know better than I
e
> Would this help in copying only batches at a time from Snowflake to CSV?
It doesn't, unfortunately. It does process records in batches, but there's nothing controlling the maximum batch size. Some targets allow you to fine-tune this batch size, but that one doesn't.
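For context, in Singer SDK based targets that limit usually comes from the sink's `max_size`. A rough sketch of what a capped sink could look like (the `CappedCsvSink` name and the 10,000-record limit are made up for illustration; this is not target-csv's actual code):
```
# Hedged sketch (not target-csv's real code): a Singer SDK BatchSink that caps
# how many records it buffers before process_batch() is called to drain them.
from singer_sdk.sinks import BatchSink


class CappedCsvSink(BatchSink):
    """Hypothetical sink that flushes every 10,000 records."""

    max_size = 10_000  # drain the batch once this many records are buffered

    def process_batch(self, context: dict) -> None:
        # context["records"] holds at most `max_size` records here;
        # a real sink would append them to the stream's CSV file.
        for record in context["records"]:
            ...
```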
> Does the tap need to implement batch as well?
@peter_pezon not necessarily. Most targets should be able to batch the records they read from stdin and commit them based on a few triggers, e.g. a SCHEMA message from a different stream or a STATE message. https://github.com/MeltanoLabs/target-csv/issues/3
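In other words, the generic pattern looks roughly like this (a plain-Python sketch of the Singer message flow, not target-csv's actual implementation; the 10,000-record threshold is made up):
```
# Hedged sketch of the generic Singer target pattern described above:
# buffer RECORD messages read from stdin and flush ("commit") them when a
# SCHEMA or STATE message arrives, or when a size threshold is hit.
import json
import sys

MAX_BATCH = 10_000  # illustrative threshold; real targets choose/expose their own
buffers: dict[str, list[dict]] = {}


def flush(stream: str) -> None:
    records = buffers.pop(stream, [])
    if records:
        # a real CSV target would append these rows to the stream's CSV file
        print(f"flushing {len(records)} records for {stream}", file=sys.stderr)


for line in sys.stdin:
    message = json.loads(line)
    if message["type"] == "RECORD":
        stream = message["stream"]
        buffers.setdefault(stream, []).append(message["record"])
        if len(buffers[stream]) >= MAX_BATCH:
            flush(stream)
    elif message["type"] in ("SCHEMA", "STATE"):
        # commit everything buffered so far before handling the new message
        for stream in list(buffers):
            flush(stream)
```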