# cli

aloof-twilight-67541

09/24/2020, 8:26 PM
I'm using the Meltano 1.51.0 CLI to run an ELT for a single table, but it seems like I run out of memory, and when that happens the pipeline breaks (I've allocated 7GB to an Ubuntu 16 container).
.venv/meltano/bin/meltano elt --job_id job_directsource tap-postgres target-postgres --transform run
I can complete the full sync ELT if I break it into 3 steps:
.venv/meltano/bin/meltano invoke tap-postgres > singer.jsonl
cat singer.jsonl | .venv/meltano/bin/meltano invoke target-postgres
meltano invoke dbt run --models my_meltano_project --profiles-dir $(pwd)/transform/profile/ --profile meltano
I was wondering if there's something I can do to complete the ELT without having to increase the container's memory or break it into 3 steps. Ideally I want to use log-based replication, but I'm not sure if that will work with meltano invoke.

ripe-musician-59933

09/24/2020, 8:50 PM
@aloof-twilight-67541 Do you know if the memory is being used up by the tap, the target, or Meltano itself? How large is the singer.jsonl file? How much memory do the tap and target invocations use when running by themselves? Meltano keeps at most 1MB worth of Singer messages in its buffer, so I think we're either looking at a memory leak, or an inefficiently implemented tap or target.
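If it helps, a rough way to check is to wrap each step with GNU time and compare the reported maximum resident set size (this assumes /usr/bin/time is available in the container; the commands below just mirror your 3-step run):
/usr/bin/time -v .venv/meltano/bin/meltano invoke tap-postgres > singer.jsonl
/usr/bin/time -v sh -c 'cat singer.jsonl | .venv/meltano/bin/meltano invoke target-postgres'
Each invocation prints a "Maximum resident set size" line at the end, which should make it clear which process is growing.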
Multiple variants of tap-postgres and target-postgres exist, so you may have more success with different versions such as https://github.com/transferwise/pipelinewise-tap-postgres and https://github.com/transferwise/pipelinewise-target-postgres
a

aloof-twilight-67541

09/24/2020, 9:29 PM
It's the target-postgres that's using up the memory. Meltano is at about 64MB and the tap about 46.2MB; the target kept growing. I'll give the alternatives a go. Thanks!

ripe-musician-59933

09/24/2020, 9:31 PM
Sounds good, please let me know if pipelinewise-target-postgres is better behaved! If it's still buffering a lot, changing the batch_size_rows setting may make a difference: https://github.com/transferwise/pipelinewise-target-postgres#configuration-settings For what it's worth, I'm currently working on a feature that will let us start defaulting to the pipelinewise version of target-postgres instead of ours, for new users at least: https://gitlab.com/meltano/meltano/-/issues/2134
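To try that, assuming the loader is installed in your project under the name target-postgres (adjust the name and value to your setup), lowering batch_size_rows so rows are flushed to Postgres more often would look roughly like:
.venv/meltano/bin/meltano config target-postgres set batch_size_rows 10000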

aloof-twilight-67541

10/29/2020, 4:26 PM
As a follow-up to this, I switched the target to pipelinewise-target-postgres and it was better behaved, but the pipeline would still break. I ended up breaking the ELT into 3 steps, like I mentioned, and that works, but it's slow. The table size is ~3.5 GB in the jsonl file.

ripe-musician-59933

10/29/2020, 4:46 PM
Was it still pipelinewise-target-postgres that was taking up too much memory?

aloof-twilight-67541

10/30/2020, 11:53 PM
It seems like it's Meltano this time.

ripe-musician-59933

11/02/2020, 3:52 PM
@aloof-twilight-67541 Hmm, that's not good! Can you please create an issue about this high memory usage in https://gitlab.com/meltano/meltano/-/issues, so that we can collect more data and try to get to the bottom of this?
Do you have experience profiling Python memory usage using a tool like https://github.com/pythonprofilers/memory_profiler, by any chance?
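If you'd like to give it a try, I believe the mprof wrapper that ships with memory_profiler can record memory over time for a whole command, including child processes. Roughly (flags from memory, so please double-check against the memory_profiler docs):
pip install memory_profiler matplotlib
mprof run --include-children .venv/meltano/bin/meltano elt --job_id job_directsource tap-postgres target-postgres --transform run
mprof plot
The resulting plot of memory usage over time would be great to attach to the issue.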