# troubleshooting
s
Hi folks, I am trying to create an ephemeral Meltano pipeline that does a simple ELT operation, wherein other people simply configure the `meltano.yml` file. I plan to orchestrate this via our existing Airflow installation (via the KubernetesPodOperator) by running the `meltano elt tap-xyz target-abc` command incrementally. I am using S3 as a state backend. My solution works fine except for state management: every time the pipeline starts, it does a complete fresh import, even though the state files are being written to S3. I see this at the end of the logs:
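For reference, the S3 state backend is configured along these lines in `meltano.yml` (the bucket name, prefix, and credential settings below are placeholders, not my real values):

```yaml
# Sketch of an S3 state backend configuration; bucket/prefix and the
# credential environment variables are illustrative placeholders.
state_backend:
  uri: "s3://mybucketxyz/lever-data-state"
  s3:
    aws_access_key_id: $AWS_ACCESS_KEY_ID
    aws_secret_access_key: $AWS_SECRET_ACCESS_KEY
```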
00954--tap-lever--target-redshift stdio=stderr
2023-01-14T20:11:06.023202Z [info     ] Writing state to AWS S3
2023-01-14T20:11:07.232593Z [info     ] smart_open.s3.MultipartWriter('mybucketxyz', 'lever-data-state/2023-01-14T200954--tap-lever--target-redshift/lock'): uploading part_num: 1, 17 bytes (total 0.000GB)
2023-01-14T20:11:07.480713Z [info     ] smart_open.s3.MultipartWriter('mybucketxyz', 'lever-data-state/2023-01-14T200954--tap-lever--target-redshift/state.json'): uploading part_num: 1, 141 bytes (total 0.000GB)
2023-01-14T20:11:07.744721Z [info     ] Incremental state has been updated at 2023-01-14 20:11:07.744534.
2023-01-14T20:11:07.758494Z [info     ] Extract & load complete!       name=meltano run_id=c62c772d-5297-4a7f-ba20-46cf1965b638 state_id=2023-01-14T200954--tap-lever--target-redshift
2023-01-14T20:11:07.759291Z [info     ] Transformation skipped.
Next run:
2023-01-14T20:11:39.625689Z [info     ] Reading state from AWS S3
2023-01-14T20:11:41.101692Z [info     ] smart_open.s3.MultipartWriter('mybucketxyz', 'lever-data-state/2023-01-14T201131--tap-lever--target-redshift/lock'): uploading part_num: 1, 17 bytes (total 0.000GB)
2023-01-14T20:11:41.313667Z [info     ] No state found for 2023-01-14T201131--tap-lever--target-redshift.
2023-01-14T20:11:41.369482Z [warning  ] No state was found, complete import.
Do I need an external Meltano database to implement this? From what I understood in the docs, the database is optional if we are relying on cloud storage for state management.
k
Provide `--state-id` as well:
meltano elt tap-xyz target-abc --state-id <some-id>
s
It should pick up the state automatically from S3, right? I dug in a bit deeper, and it looks like the tap I am using will always load some of the streams into a cache before beginning. I might be wrong though: https://github.com/singer-io/tap-lever/blob/master/tap_lever/streams/__init__.py#L21
s
@edgar_ramirez_mondragon?
e
> It should pick up the state automatically from S3, right?
`meltano run` would auto-generate the state key and pull the right payload from S3, but `meltano elt` requires you to explicitly set the `--state-id` option.
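In other words, each `meltano elt` run without `--state-id` ends up under a fresh timestamped key (compare `...T200954--...` vs `...T201131--...` in the logs above), so the next run looks up a key that does not exist yet and falls back to a full import. A minimal Python sketch of that mismatch, where `default_state_id` is a hypothetical helper that just mirrors the id format seen in the logs:

```python
from datetime import datetime, timezone

def default_state_id(extractor: str, loader: str, now: datetime) -> str:
    """Hypothetical helper mirroring the timestamped ids in the logs above."""
    return f"{now.strftime('%Y-%m-%dT%H%M%S')}--{extractor}--{loader}"

# Two runs a couple of minutes apart get different keys, so the second
# run finds no state under its own key and does a complete fresh import.
run1 = default_state_id("tap-lever", "target-redshift",
                        datetime(2023, 1, 14, 20, 9, 54, tzinfo=timezone.utc))
run2 = default_state_id("tap-lever", "target-redshift",
                        datetime(2023, 1, 14, 20, 11, 31, tzinfo=timezone.utc))
print(run1)  # 2023-01-14T200954--tap-lever--target-redshift
print(run2)  # 2023-01-14T201131--tap-lever--target-redshift
print(run1 == run2)  # False

# Passing a stable id (e.g. --state-id tap-lever--target-redshift) makes
# every run read and write the same S3 key, so bookmarks carry over.
```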