# getting-started
r
Hello! I'm starting out in data engineering and I need to integrate a MongoDB database with BigQuery. I found that Meltano offers a solution for this, but I'm having problems: when I try to test the connection (`meltano config tap-mongodb test`), I get this message:
m-meltano:~/prj-mdb-gbq$ meltano config tap-mongodb test
2025-04-28T17:50:03.990046Z [info     ] The default environment 'dev' will be ignored for `meltano config`. To configure a specific environment, please use the option `--environment=<environment name>`.
2025-04-28T18:03:11.496374Z [warning  ] Stream `classe` was not found in the catalog
Need help fixing this problem? Visit http://melta.no/ for troubleshooting steps, or to join our friendly Slack community.
Plugin configuration is invalid
No RECORD or BATCH message received. Verify that at least one stream is selected using 'meltano select tap-mongodb --list'.
The meltano.yml looks like this:
version: 1
default_environment: dev
project_id: c1ac854b-545d
environments:
- name: dev
plugins:
  extractors:
  - name: tap-mongodb
    variant: z3z1ma
    pip_url: git+https://github.com/z3z1ma/tap-mongodb.git
    config:
      mongo:
        host: 12.34.5.678
        port: 27017
        directConnection: true
        readPreference: primary
        username: datalake
        password: ****
        authSource: db
        tls: false
      strategy: infer
    select:
    - classe.*
    metadata:
      dbprocapi_classe:
        replication_key: replication_key
        replication-method: LOG_BASED
For testing purposes I am trying to load only the "classe" collection (`- classe.*`) from the `db` database. When I run `meltano select tap-mongodb --list --all`, I get:
Enabled patterns: classe.*
but the collection's fields are also listed as excluded:
[excluded   ] db_classe.field1
[excluded   ] db_classe.field2
[excluded   ] db_classe.field3
It is important to note that this MongoDB instance does not have replicas. I'm using:
• a VM on Google Cloud to access MongoDB, both on the same network;
• the tap-mongodb extractor (z3z1ma).
Could someone please help me? Thank you.
e
Should the selection pattern be `db_classe.*` instead?
r
Hello, @Edgar Ramírez (Arch.dev). I apologize if I didn't understand your question, but I don't think so, because in other topics I see that it returns `collection.fields`. I don't know if I filled in something wrong in the yml. I imagine the correct pattern would be `classe.*`, not `db_classe.*`.
Update: I changed the `select` entry in meltano.yml to `db_classe.*`, and the test now returns `Plugin configuration is valid.`
I will continue the configuration, but for now I appreciate your attention, @Edgar Ramírez (Arch.dev). 😃
🙌 1
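Editor's note: the fix above works because the select pattern has to match the stream name the tap actually reports, which here is the collection prefixed with the database name (`db_classe`). A minimal sketch of that matching, assuming shell-style glob semantics and approximating them with Python's `fnmatch`; this is not Meltano's own code, and the helper `selected` is hypothetical:

```python
from fnmatch import fnmatch

# Property names as listed by `meltano select tap-mongodb --list --all`
# in this thread: the stream is `db_classe`, not `classe`.
stream_properties = ["db_classe.field1", "db_classe.field2", "db_classe.field3"]


def selected(pattern, properties):
    """Return the properties matched by a glob-style select pattern (illustrative helper)."""
    return [p for p in properties if fnmatch(p, pattern)]


# The original pattern matches nothing, so no stream is selected.
print(selected("classe.*", stream_properties))

# The corrected pattern matches every field of the stream.
print(selected("db_classe.*", stream_properties))
```

Under that assumption, `classe.*` selects no properties at all, which is consistent with the "No RECORD or BATCH message received" error and the `[excluded   ]` entries shown earlier.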
Hello, @Edgar Ramírez (Arch.dev)! Could you help me again, please? 😅 I'm trying to establish a connection between MongoDB and BigQuery. The tap-mongodb (z3z1ma) configuration is ready and tested, and now I'm configuring target-bigquery (z3z1ma). After updating meltano.yml with the BigQuery parameters and running `meltano run tap-mongodb target-bigquery`, I get no response. Here is my meltano.yml:
version: 1
default_environment: dev
project_id: c1ac854b-545d-468c-9c87-8c24f0c089b3
environments:
- name: dev
plugins:
  extractors:
  - name: tap-mongodb
    variant: z3z1ma
    pip_url: git+https://github.com/z3z1ma/tap-mongodb.git
    config:
      mongo:
        host: 10.20.2.333
        port: 27017
        directConnection: true
        readPreference: primary
        username: username
        password: password
        authSource: db_mongo
        tls: false
      strategy: infer
      database_includes:
      - db_mongo
    select:
    - db_mongo_processo.*
    metadata:
      db_mongo_processo:
        replication_key: replication_key
        replication-method: LOG_BASED
  loaders:
  - name: target-bigquery
    variant: z3z1ma
    pip_url: git+https://github.com/z3z1ma/target-bigquery.git
    config:
      dataset: dataset
      project: project
      column_name_transforms:
        add_underscore_when_invalid: true
      #credentials_json: ''
      denormalized: true
      flattening_enabled: false
      dedupe_before_upsert: 'false'
      location: southamerica-east1
      credentials_path: ${PWD}/.gcp/service_account_key.json
      method: storage_write_api
      overwrite: 'false'
      upsert: 'true'
Thank you!
{5189AC12-796E-408D-A9D6-8E216E958705}.png
e
Have you tried a different target, e.g. `meltano run tap-mongodb target-jsonl`? I'm trying to rule out other types of issues.
r
Ok. I'll try. Thanks!
👍 1