# announcements
r
Hey Meltano Team, I am wondering: what is the recommended way to retry the download of a few objects out of the hundreds of configured objects? Let’s say I configured `meltano.yml` with hundreds of objects for a `tap` and realized some of the downloaded data is incorrect. I worked with the source data provider to fix the data, and now I want to re-download only a few selected objects. Obviously I can do that by editing `meltano.yml` or running `meltano select`, but since this re-download is only a temporary phase, I would like to avoid changing `meltano.yml`. Is there an easier way, like passing a list to the Meltano CLI, to re-download only some selected objects from a `tap`?
j
If you mean you want to re-run an ELT job from the beginning, you can pass `--full-refresh` to `meltano elt`, which will cause it to ignore the saved state and start over from the start date.
d
@rahul_anand Good question! Meltano does not currently have an option to only select/refresh a single entity type, but an issue to track this has already been created: https://gitlab.com/meltano/meltano/-/issues/2155. I hope to get to that over the next few weeks.
r
@douwe_maan I think it would be more helpful if Meltano could take a list of streams on which to do a full-refresh or incremental run.
d
@rahul_anand I agree. I’m imagining resolving that issue with a `meltano elt` CLI option that can take one or multiple stream names, e.g. `meltano elt ... --select foo,bar,baz` (exact naming and format to be determined, of course).
Note that you can already override the “select” feature on a per-pipeline basis using an environment variable (https://meltano.com/docs/command-line-interface.html#extractor-extra-select). However, the stored state will not refer to any of the other streams, meaning that the next ELT run will assume those other streams have not been extracted at all and need to be refreshed entirely.
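As a rough illustration of that environment-variable override, here is a minimal Python sketch that turns a comma-separated list of stream names into a per-run select override. It assumes the `<EXTRACTOR>__SELECT` naming convention from the linked docs (plugin name uppercased, non-alphanumerics replaced with `_`) and a JSON list of `entity.attribute` patterns as the value; the helper names and the `tap-gitlab`/`target-postgres` plugins are hypothetical examples, so check the docs for your Meltano version.

```python
import json
import os
import re


def select_env_var_name(extractor: str) -> str:
    """Derive the override variable name, e.g. "tap-gitlab" -> "TAP_GITLAB__SELECT".

    Assumed convention: uppercase the plugin name and replace every
    non-alphanumeric character with an underscore.
    """
    return re.sub(r"[^A-Za-z0-9]", "_", extractor).upper() + "__SELECT"


def build_select_override(extractor: str, streams: str) -> dict:
    """Turn "foo,bar" into an env mapping selecting all properties of just those streams."""
    names = [s.strip() for s in streams.split(",") if s.strip()]
    patterns = [f"{name}.*" for name in names]  # "stream.*" = every attribute of the stream
    return {select_env_var_name(extractor): json.dumps(patterns)}


if __name__ == "__main__":
    override = build_select_override("tap-gitlab", "projects, merge_requests")
    env = {**os.environ, **override}
    # Hypothetical usage: pass `env` to the process that runs the pipeline, e.g.
    # subprocess.run(["meltano", "elt", "tap-gitlab", "target-postgres"], env=env, check=True)
    print(override)
```

Because the override only lives in the environment of that one run, `meltano.yml` stays unchanged, but the state caveat above still applies.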
In addressing that issue, we’ll have to make state management slightly smarter to prevent that from happening.
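One way "slightly smarter" state management could work is to merge the state from a partial run into the previously stored state instead of replacing it. The sketch below assumes Singer-style state shaped like `{"bookmarks": {stream: {...}}}`; the function name and the sample bookmarks are hypothetical and not Meltano's actual implementation.

```python
from copy import deepcopy


def merge_partial_state(previous: dict, partial: dict) -> dict:
    """Merge bookmarks from a run that extracted only some streams into the
    previously stored state (assumed shape: {"bookmarks": {stream: {...}}})."""
    merged = deepcopy(previous)  # never mutate the stored state in place
    bookmarks = merged.setdefault("bookmarks", {})
    for stream, bookmark in partial.get("bookmarks", {}).items():
        # Streams extracted in this run get their fresh bookmark...
        bookmarks[stream] = deepcopy(bookmark)
    # ...while streams absent from `partial` keep their previous bookmark,
    # so the next full run does not treat them as never extracted.
    return merged


# Hypothetical example: only "issues" was re-extracted in the partial run.
previous = {"bookmarks": {"projects": {"updated_at": "2020-06-01"},
                          "issues": {"updated_at": "2020-06-02"}}}
partial = {"bookmarks": {"issues": {"updated_at": "2020-07-01"}}}
merged = merge_partial_state(previous, partial)
```

Here `merged` keeps the old `projects` bookmark and takes the new `issues` bookmark, which is exactly the information a plain overwrite would lose.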
r
This makes sense. We need to be careful about how it interferes with state management.
In the most frequent scenario, this is needed to troubleshoot failed runs or bad data downloaded from the data source itself. In my view, the feature to temporarily override the select list to run a few streams should integrate with the other executions and remember the state for the next execution, with or without the overridden select list. We should also allow specifying the `--full-refresh` option to selectively refresh data for only the selected few streams.
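A selective `--full-refresh` like this could, as a rough sketch, start the run from a copy of the stored state with only the chosen streams' bookmarks removed, so those streams fall back to the start date while every other bookmark survives for the next regular run. This again assumes Singer-style `{"bookmarks": {...}}` state; the function name and sample data are hypothetical, and the state the run emits would still need to be merged back into the stored state afterwards.

```python
from copy import deepcopy
from typing import Iterable


def state_for_selective_refresh(stored: dict, refresh_streams: Iterable[str]) -> dict:
    """Return the state to start a run from when only `refresh_streams` should be
    fully refreshed: their bookmarks are dropped so the tap falls back to its
    start date, while every other stream's bookmark is left untouched."""
    state = deepcopy(stored)  # keep the stored state intact
    bookmarks = state.get("bookmarks")
    if bookmarks is not None:
        for stream in refresh_streams:
            bookmarks.pop(stream, None)  # ignore streams that have no bookmark yet
    return state


# Hypothetical example: re-download "issues" from scratch, keep "projects" incremental.
stored = {"bookmarks": {"projects": {"updated_at": "2020-06-01"},
                        "issues": {"updated_at": "2020-06-02"}}}
start_state = state_for_selective_refresh(stored, ["issues"])
```

Keeping the untouched bookmarks in `start_state` is harmless for the refresh run itself (a tap typically only reads bookmarks for the streams it extracts), and it means nothing is lost if that state is later written back.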
d
I agree! Feel free to comment that in the issue if you want to make sure it doesn’t get lost :)