# troubleshooting
t
I am stuck in a loop using Meltano to copy data from Mongo to Snowflake. I have set the replication strategy to incremental, but when a Meltano run fails (I'm not sure exactly why, maybe Mongo is busy; the reason is not important), Meltano seems to start from the beginning when restarted, even though the state in meltano.db shows the latest replication key. Is this the expected behavior?
t
What command are you using to run Meltano? If you're using `elt`, then you need to specify a job name for it to track state. If you're using `run`, it should do it automatically though.
t
meltano run tap-mongodb target-snowflake
That's how I run it
I sqlite-d into meltano.db and can see that the state is up to date. It shows the last replication key. But then when meltano runs again, the replication key is reset.
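As a cross-check that does not depend on the table layout of meltano.db, the stored state can also be read through the `meltano state` subcommands. Here is a minimal sketch, assuming an environment named `dev` (hypothetical; the thread does not say which environment is active), the `<environment>:<tap>-to-<target>` state ID convention that `meltano run` uses, and that `meltano state get` prints the state as JSON on stdout. The helper names are made up for illustration.

```python
import json
import subprocess


def run_state_id(environment: str, tap: str, target: str) -> str:
    """Build the state ID that `meltano run` derives for a tap/target pair."""
    return f"{environment}:{tap}-to-{target}"


def get_state(state_id: str) -> dict:
    """Return the stored state for state_id by calling `meltano state get`.

    Must be called from inside the Meltano project directory.
    Assumes the command writes only the JSON state to stdout.
    """
    result = subprocess.run(
        ["meltano", "state", "get", state_id],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)
```

Calling `get_state(run_state_id("dev", "tap-mongodb", "target-snowflake"))` before and after a failed run would show whether the bookmark for each stream survives the restart.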
t
What do your `select` rules in meltano.yml look like? Meltano will store the high-water mark for each stream at the end of the run, but whether or not it retrieves that and uses it for the next run depends on the configuration.
t
select:
- table1.*
- table2.*
metadata:
  '*':
    replication-method: INCREMENTAL
    replication-key: _id
Isn't that how you configure?
t
Yeah, that looks right... maybe try replacing the wildcard under metadata with an explicit table name? Also, none of the patterns in our configs are in single quotes... ours are all things like `dbo-*`
Could also be something about Mongo... I haven't worked with that at all
t
single quote was added by Meltano - I did not hand edit meltano.yml
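The quotes are probably not the culprit: in YAML a plain scalar cannot start with `*` (it would be read as an alias), so a bare wildcard key has to be quoted, while a pattern such as `dbo-*` does not. To check which streams a metadata pattern covers, here is a rough sketch. It assumes the patterns behave like shell-style globs as implemented by Python's `fnmatch`. The stream names are taken from the `select` rules above and may not match what the tap actually reports.

```python
from fnmatch import fnmatch

# Stream names taken from the select rules; the tap may name them differently.
streams = ["table1", "table2"]

# Metadata rules keyed by pattern, mirroring the meltano.yml snippet.
metadata_rules = {
    "*": {"replication-method": "INCREMENTAL", "replication-key": "_id"},
}


def rules_for(stream: str) -> dict:
    """Merge every metadata rule whose pattern matches the stream name."""
    merged: dict = {}
    for pattern, settings in metadata_rules.items():
        if fnmatch(stream, pattern):
            merged.update(settings)
    return merged


# Map each stream to the metadata it would receive under these rules.
matched = {stream: rules_for(stream) for stream in streams}
```

If a stream comes back with an empty dict here, the pattern does not cover it and an explicit stream name under `metadata` would be worth trying.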
t
I have no proof that's the problem, of course, but it sounds plausible 😉
t
This is good information. I believe this issue has been resolved in the latest version of tap-mongodb.
m
Which tap-mongodb variant are you using, is it the default one?
t
- name: tap-mongodb
  variant: z3z1ma
  pip_url: git+https://github.com/z3z1ma/tap-mongodb.git
Yes
I can see that Meltano emits state as the replication key is updated, but if I kill it and then restart, Meltano goes back to the beginning instead of picking up where it left off, which should be the last replication key value it knows about.
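To make the expected and the observed behavior concrete, here is a small simulation. It is not tap-mongodb code, only a sketch of bookmark-based incremental sync under simplifying assumptions: integer `_id` values stand in for Mongo ObjectIds, and `sync_incremental` is a hypothetical helper.

```python
from typing import Any, Dict, Iterable, Iterator, Optional


def sync_incremental(
    records: Iterable[Dict[str, Any]],
    replication_key: str,
    bookmark: Optional[Any] = None,
) -> Iterator[Dict[str, Any]]:
    """Yield records newer than the bookmark, in replication-key order."""
    for record in sorted(records, key=lambda r: r[replication_key]):
        if bookmark is None or record[replication_key] > bookmark:
            yield record


# Hypothetical collection with an increasing _id.
collection = [{"_id": i} for i in range(1, 6)]

# First run is killed after three records; the last emitted key is the bookmark.
first_run = []
for record in sync_incremental(collection, "_id"):
    first_run.append(record)
    bookmark = record["_id"]
    if len(first_run) == 3:
        break

# Expected behavior: the restart passes the bookmark back in and resumes.
resumed = list(sync_incremental(collection, "_id", bookmark))

# Behavior reported in this thread: the bookmark is ignored on restart.
restarted = list(sync_incremental(collection, "_id", None))
```

In the expected case `resumed` holds only the records after the bookmark, while `restarted` contains the whole collection again, which is what a full re-sync after every failure looks like.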
m
My menzenski variant of tap-mongodb picks up where it left off in this scenario, for what it’s worth. You might see if that works for your use case.
t
Thanks @Matt Menzenski I will try it today