sean_han
06/29/2023, 4:30 AM
I ran:
meltano run tap-s3-csv-update target-postgres-application --full-refresh
When it loaded the current/update/update.csv file (14.5 KB), it worked fine, but when it tried to load the current/update/update_data.csv file (50.0 MB), I got the errors below. The error messages don't show the root cause, and the same run works fine from my local environment. How can I solve this, and how should I debug it? Thanks in advance.
```
2023-06-28T22:17:53.819815Z [info] Environment 'dev' is active
2023-06-28T22:18:06.906567Z [info] Performing full refresh, ignoring state left behind by any previous runs.
2023-06-28T22:18:17.993238Z [info] time=2023-06-28 22:18:17 name=tap_s3_csv level=INFO message=Attempting to create AWS session cmd_type=elb consumer=False name=tap-s3-csv-update producer=True stdio=stderr string_id=tap-s3-csv-update
2023-06-28T22:18:18.095091Z [info] time=2023-06-28 22:18:18 name=tap_s3_csv level=INFO message=Starting sync. cmd_type=elb consumer=False name=tap-s3-csv-update producer=True stdio=stderr string_id=tap-s3-csv-update
2023-06-28T22:18:18.193323Z [info] time=2023-06-28 22:18:18 name=tap_s3_csv level=INFO message=data_update: Starting sync cmd_type=elb consumer=False name=tap-s3-csv-update producer=True stdio=stderr string_id=tap-s3-csv-update
2023-06-28T22:18:18.193762Z [info] time=2023-06-28 22:18:18 name=tap_s3_csv level=INFO message=Syncing table "data_update". cmd_type=elb consumer=False name=tap-s3-csv-update producer=True stdio=stderr string_id=tap-s3-csv-update
2023-06-28T22:18:18.194022Z [info] time=2023-06-28 22:18:18 name=tap_s3_csv level=INFO message=Getting files modified since 2023-06-01 00:00:00+00:00. cmd_type=elb consumer=False name=tap-s3-csv-update producer=True stdio=stderr string_id=tap-s3-csv-update
2023-06-28T22:18:18.194254Z [info] time=2023-06-28 22:18:18 name=tap_s3_csv level=INFO message=Checking bucket "my-data-platform-dev" for keys matching ".csv" cmd_type=elb consumer=False name=tap-s3-csv-update producer=True stdio=stderr string_id=tap-s3-csv-update
2023-06-28T22:18:18.194480Z [info] time=2023-06-28 22:18:18 name=tap_s3_csv level=INFO message=Skipping files which have a LastModified value older than 2023-06-01 00:00:00+00:00 cmd_type=elb consumer=False name=tap-s3-csv-update producer=True stdio=stderr string_id=tap-s3-csv-update
2023-06-28T22:18:19.136358Z [info] time=2023-06-28 22:18:19 name=tap_s3_csv level=INFO message=Found 3 files. cmd_type=elb consumer=False name=tap-s3-csv-update producer=True stdio=stderr string_id=tap-s3-csv-update
2023-06-28T22:18:19.139097Z [info] time=2023-06-28 22:18:19 name=tap_s3_csv level=INFO message=Skipping matched file "current/update/" as it is empty cmd_type=elb consumer=False name=tap-s3-csv-update producer=True stdio=stderr string_id=tap-s3-csv-update
2023-06-28T22:18:19.139614Z [info] time=…
```
Andy Carter
06/29/2023, 8:28 AM
Setting NO_COLOR=1 as an environment variable will clean up the logs in Dagster a bit. Also, I would use meltano --log-level=debug run ... and see if you get any more info in the logs.
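For reference, both suggestions can be combined into a single invocation; the plugin names below are taken from the original command:
```
# Suppress ANSI color codes and raise Meltano's log level for this one run
NO_COLOR=1 meltano --log-level=debug run tap-s3-csv-update target-postgres-application --full-refresh
```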
Andy Carter
06/29/2023, 8:29 AM
Can you share the tables: section of your meltano.yml?
Andy Carter
06/29/2023, 8:50 AM
Could you also try a smaller update_data.csv and see if the run completes? Maybe start with a couple MB and work up from there?
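One way to produce such a test file, assuming a local copy of the CSV with a single header row (the sample file name is hypothetical):
```
# Keep the header plus the first 100,000 data rows as a smaller sample
head -n 100001 update_data.csv > update_data_sample.csv
```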
sean_han
06/29/2023, 4:16 PM
tables:
  - search_prefix: current/update
    search_pattern: .csv
    table_name: data_update
    key_properties: ["[ID]"]
    delimiter: " "
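For context, a sketch of how that section would typically sit inside meltano.yml for a pipelinewise-style tap-s3-csv extractor; the inherit_from line and start_date value are assumptions (the bucket name comes from the logs above):
```
plugins:
  extractors:
    - name: tap-s3-csv-update
      inherit_from: tap-s3-csv      # assumption: a renamed copy of the base tap
      config:
        bucket: my-data-platform-dev
        start_date: "2023-06-01T00:00:00Z"   # matches "files modified since 2023-06-01" in the logs
        tables:
          - search_prefix: current/update
            search_pattern: .csv
            table_name: data_update
            key_properties: ["[ID]"]
            delimiter: " "
```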
sean_han
06/29/2023, 4:33 PM
After running with --log-level=debug last night, I got more helpful info from Dagster. It is related to memory, so I am re-running the job with more memory. Here is the message:
Multiprocess executor: child process for step load_update_data was terminated by signal 9 (SIGKILL). This usually indicates that the process was killed by the operating system due to running out of memory. Possible solutions include increasing the amount of memory available to the run, reducing the amount of memory used by the ops in the run, or configuring the executor to run fewer ops concurrently.
dagster._core.executor.child_process_executor.ChildProcessCrashException

Stack Trace:
  File "/usr/local/lib/python3.9/site-packages/dagster/_core/executor/multiprocess.py", line 240, in execute
    event_or_none = next(step_iter)
  File "/usr/local/lib/python3.9/site-packages/dagster/_core/executor/multiprocess.py", line 357, in execute_step_out_of_process
    for ret in execute_child_process_command(multiproc_ctx, command):
  File "/usr/local/lib/python3.9/site-packages/dagster/_core/executor/child_process_executor.py", line 174, in execute_child_process_command
    raise ChildProcessCrashException(exit_code=process.exitcode)
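The error text itself names the two mitigations. A minimal sketch of the second one (fewer concurrent ops), assuming the job uses Dagster's default multiprocess executor and this run config is supplied at launch:
```
# Dagster run config: execute one op at a time so only a single
# Meltano load holds memory at any given moment
execution:
  config:
    multiprocess:
      max_concurrent: 1
```
Raising the memory itself depends on where the runs execute; on Kubernetes, for example, it is typically done by increasing the run container's memory request and limit.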