Hi! It’s my first time using Meltano to perform an...
# docker
j
Hi! It’s my first time using Meltano to perform an EL workflow, and I must admit, putting this together has been quite challenging. I’d love for this EL pipeline to be reviewed and to hear your suggestions on how it can be improved. Your recommendations would be greatly appreciated!
Copy code
version: 1
default_environment: dev
project_id: 751cca76-711b-46ec-8e5c-26afb7f94623
environments:
  - name: dev
  - name: staging
  - name: prod
plugins:
  extractors:
    - name: tap-mysql
      variant: transferwise
      pip_url: git+<https://github.com/transferwise/pipelinewise.git#subdirectory=singer-connectors/tap-mysql>
      config:
        database: ${TAP_MYSQL_DATABASE} 
        user: ${TAP_MYSQL_USER} 
        port: ${TAP_MYSQL_PORT} 
        host: ${TAP_MYSQL_HOST} 
      select:
        '*.*': true  # Select all tables by default
      metadata:
        '*.*':
          replication-method: INCREMENTAL  # Use INCREMENTAL replication for all tables
          replication_key: update_time    # Replace with your timestamp column
          key_properties:
            - id                         # Replace with your primary key column
        '*.*_audit':
          selected: false  # Exclude tables ending with "_audit"

  loaders:
  - name: target-bigquery
    variant: z3z1ma
    pip_url: git+<https://github.com/z3z1ma/target-bigquery.git>
    config:
      dataset: ${TARGET_BIGQUERY_DATASET}
      location: ${TARGET_BIGQUERY_LOCATION}
      project: ${TARGET_BIGQUERY_PROJECT} 
      credentials_json: ${TARGET_BIGQUERY_CREDENTIALS_JSON}
r
Welcome! meltano First thing I notice is how you are mapping env vars to config here
Copy code
database: ${TAP_MYSQL_DATABASE} 
        user: ${TAP_MYSQL_USER} 
        port: ${TAP_MYSQL_PORT} 
        host: ${TAP_MYSQL_HOST}
and here
Copy code
dataset: ${TARGET_BIGQUERY_DATASET}
      location: ${TARGET_BIGQUERY_LOCATION}
      project: ${TARGET_BIGQUERY_PROJECT} 
      credentials_json: ${TARGET_BIGQUERY_CREDENTIALS_JSON}
You don't actually need to do this as Meltano automatically understands config from env vars in the format you are already providing (i.e.
<PLUGIN_NAME>_<SETTING_NAME>
). So you can remove the
config
blocks for both plugins entirely if you are providing all config via env. This
Copy code
select:
        '*.*': true  # Select all tables by default
is not a valid selection rule, and should be
Copy code
select:
      - '*.*'
In fact I believe Meltano will select all streams by default, so you should be able to omit
select
entirely if you want to be less explicit. You may want to exclude
_audit
post-fixed tables with
select
rather than `metadata`:
Copy code
select:
      - '!*.*_audit'
Everything else looks good to me!
1
j
@Reuben (Matatika) thanks. Much appreciated.
Here's is the revised code but the logic to exclude
_audit
isn't working as expected. Please, I'd love you to help with this. Thanks:
Copy code
version: 1
default_environment: dev
project_id: 751cca76-711b-46ec-8e5c-26afb7f94623

environments:
  - name: dev
  - name: staging
  - name: prod

plugins:
  extractors:
    - name: tap-mysql
      variant: transferwise
      pip_url: git+<https://github.com/transferwise/pipelinewise.git#subdirectory=singer-connectors/tap-mysql>
      select:
        - '*.*'  # Select all tables first
        - '!*.*_audit'  # Then exclude audit tables
      metadata:
        select:
          '*.*':  # Apply metadata to all non-excluded tables
            replication-method: INCREMENTAL
            replication_key: update_time
            key_properties:
              - id  

  loaders:
    - name: target-bigquery
      variant: z3z1ma
      pip_url: git+<https://github.com/z3z1ma/target-bigquery.git>
r
What do you see when you run
Copy code
meltano select tap-mysql --list --all
? Also, I think you have a new issue in that
select
has appeared under
metadata
- the structure in your first version was correct.
j
meltano select tap-mysql --list --all
is displayed a list of excluded data points for both tables contain
_audit
and not
_audit
e.g
Copy code
[excluded   ] subject_choice_slot_audit.updated_by
[excluded   ] term_cost.classroom_level_id
Yea, the
select
statement for
metadata
has been corrected.
r
Can you try
!*_audit.*
and then check the output of
meltano select
again?
j
Thanks @Reuben (Matatika) this
- '!*_audit*'
worked
🙌 1
What is the best way to replicate approximately 100 tables, each with an average size of 100GB, without causing the database to time out?
r
Might be best to ask that in a separate thread in #C06A1MD6A6L or #C069A0GE129, depending on which db is causing you problems (I don't have much experience with either of the plugins you're using).
j
Alright, thanks. You've been very helpful. Thanks again
r
No problem 🙂
🙌 1