# plugins-general
j
Hey everyone! I’m trying to configure tap-mongodb but am having trouble setting the host to connect to in the YAML file. I have the cluster activated, and the configuration template is set as follows (correctly filled):
config:
  mongo:
    host: mongodb+srv://<cluster>:<pass>@<cluster>.oji0mra.mongodb.net/?retryWrites=true&w=majority&appName=<cluster>
    port: 27017
    username: username
    password: password
I tried several other variants with no luck, and I could not find a proper example of this setting. I can reach the cluster from Python or the shell, but not from this Meltano config. Thank you for any help.
m
(author of the meltanolabs variant here) For the meltanolabs variant you’ll want to set the mongodb_connection_string config property. With the config you’ve provided it might look like this:
config:
  mongodb_connection_string: mongodb+srv://<user>:<pass>@<cluster>.oji0mra.mongodb.net:27017/?retryWrites=true&w=majority&appName=<cluster>
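
A minimal sketch of assembling a connection string like this in Python, assuming the credentials may contain characters that need percent-encoding. The helper name `build_srv_uri`, the hostname, and the credentials are hypothetical and not part of tap-mongodb. The sketch leaves the port out, since the `mongodb+srv://` scheme resolves hosts and ports through DNS and drivers such as PyMongo reject an explicit port in that form.

```python
from urllib.parse import quote_plus, urlencode


def build_srv_uri(user: str, password: str, host: str, options: dict) -> str:
    """Assemble a mongodb+srv:// URI (hypothetical helper, for illustration only)."""
    # Percent-encode credentials so characters like '@' or ':' do not break the URI.
    credentials = f"{quote_plus(user)}:{quote_plus(password)}"
    # No port here: with the SRV scheme the driver discovers hosts and ports via DNS.
    return f"mongodb+srv://{credentials}@{host}/?{urlencode(options)}"


uri = build_srv_uri(
    user="app_user",                      # placeholder, not a real credential
    password="p@ss:word",                 # placeholder with characters that need escaping
    host="cluster0.example.mongodb.net",  # hypothetical Atlas hostname
    options={"retryWrites": "true", "w": "majority"},
)
print(uri)
```

The resulting string is what would go into `mongodb_connection_string`; whether the tap needs any further options is not something this sketch covers.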
j
My bad. The cluster is the same as the user, so I have it correct. The thing is that I have to follow the existing structure, and the config might be completely different; I’m unable to find one that works. Originally, it was configured like this:
mongo:
        host: <cluster>.oji0mra.mongodb.net
        port: 27017
        username: <user>
        password: <pass>
m
The meltanolabs variant of tap-mongodb has never supported that configuration format. It’s only ever used the connection string form. The z3z1ma variant (which is the default one in the meltano hub) does support that format, though: https://github.com/z3z1ma/tap-mongodb/blob/main/meltano.yml#L63-L72 - is that what you are after?
j
Yeah, I use this one.
I somehow need to set the host and reach the cluster on Atlas.
I instead went with dockerized Mongo, and it works fine.
b
@Matt Menzenski I've been using the MongoDB tap with good results for the past few months, but I'd really like to figure out how to get updates/deletes to propagate from the connector. I've seen this issue (https://github.com/MeltanoLabs/tap-mongodb/issues/33), and there is a draft PR that seems to address it. When using LOG_BASED replication, does the connector currently not perform an initial backfill of records? I was considering patching this myself to get unblocked if it is a clear change to make.
m
The tap in log-based mode will not do an initial backfill of records, but that’s an interesting idea - I don’t see any reason why it couldn’t do that (although I’d want to put that behind an opt-in config setting). Would you open a GitHub issue for that?
Regarding the log-based replication, I need to close that PR and open a new one. I’ve been iterating on a private fork of that tap and have rewritten the whole log-based implementation. I need to contribute that back to the public tap.
b
That would be immensely helpful. I'm a bit swamped with other misc tasks at the moment, but I'll try to get a ticket up on the repo.
@Matt Menzenski No rush, but I've had a sudden influx of new customers on our platform using MongoDB, and they have been hitting the limitations of the current tap implementation. It would be incredibly handy if I could test your private fork, if you have any capacity to open a PR against the public tap. Happy to help out with contributing too, but honestly Mongo is a big blind spot for me vs. other databases.
m
Yeah, I can try to get a PR together. We have been running the private fork for a while now and it has been doing great. It is a significant change for log-based mode though, as it now opens one cluster-wide change stream rather than one change stream on each individual collection. I am not sure that other users will want this behavior, as you either need to define inline stream maps to route records to collection-specific sinks or land the raw records in one enormous table.
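
To make the routing concern concrete, here is a rough sketch of splitting a cluster-wide stream of change events back into per-collection groups. It assumes each event carries the `ns` document (`db` and `coll`) that MongoDB change events normally include. The function name `route_by_collection`, the `db-coll` stream naming, the `_unrouted` fallback, and the sample events are all made up for illustration; this is not how the tap or the private fork does it.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List

Event = Dict[str, Any]


def route_by_collection(events: Iterable[Event]) -> Dict[str, List[Event]]:
    """Group change events by source namespace (hypothetical helper)."""
    routed: Dict[str, List[Event]] = defaultdict(list)
    for event in events:
        ns = event.get("ns") or {}
        if "db" in ns and "coll" in ns:
            # Assumed naming scheme: one stream per database/collection pair.
            stream = f"{ns['db']}-{ns['coll']}"
        else:
            # Events without a collection (e.g. invalidate) go to a catch-all.
            stream = "_unrouted"
        routed[stream].append(event)
    return dict(routed)


# Sample events shaped like change stream documents (values are invented).
events = [
    {"operationType": "insert", "ns": {"db": "shop", "coll": "orders"}, "documentKey": {"_id": 1}},
    {"operationType": "delete", "ns": {"db": "shop", "coll": "users"}, "documentKey": {"_id": 7}},
    {"operationType": "update", "ns": {"db": "shop", "coll": "orders"}, "documentKey": {"_id": 1}},
    {"operationType": "invalidate"},
]

routed = route_by_collection(events)
for stream, batch in routed.items():
    print(stream, [e["operationType"] for e in batch])
```

With PyMongo, a cluster-wide stream would come from `MongoClient.watch()` and a per-collection one from `Collection.watch()`; the grouping above only matters in the cluster-wide case.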
b
For better or worse, we are mapping one Meltano process to write to a single ClickHouse table instead of trying to use Meltano to multiplex, so I think I could create some mapping rules on our side to handle that generically.