Is the best way to replicate data from multiple MS...
# best-practices
i
Is the best way to replicate data from multiple MSSQL databases with the
tap-mssql
connector to create multiple taps that inherit from the same base `tap-mssql`tap or is there a simpler way to do it? The connection to MSSQL will be the same of course across all db's.
e
Hey @Ian OLeary! If the database name is a setting of the tap, then most likely yes you need to create multiple inherited taps. You could probably use a single plugin and manage different dbs with environment variables, but you'll lose track of all the dbs in version control.
1
i
Hey @Edgar Ramírez (Arch.dev), after reconsideration I'm wondering if I would have to do this - since in MSSQL you can query data from one DB when connected to another DB. If I'm connected to DEV and I want to query a table from PROD I could just include PROD.dbo.table. What I'm wondering is if adding the DB to the beginning of my object in my meltano.yml will work instead of just
schema-table
. Since the tables I'm pulling are on different DBs but all share the same table name and columns, would meltano just ignore the fact the table already exists when it moves on to subsequent replications of the same table which exists on the other DBs? Again, all the columns/datatypes are the same. Current:
Copy code
- name: tap-mssql
    variant: buzzcutnorman
    pip_url: git+<https://github.com/BuzzCutNorman/tap-mssql.git>
    config:
      host: <our_host>
      port: 1433
      database: DB1
    select:
    - dbo-table.*
    metadata:
      DB1-dbo-table_name:
        replication-method: INCREMENTAL
        replication-key: REPLICATION_KEY
Proposed:
Copy code
- name: tap-mssql
    variant: buzzcutnorman
    pip_url: git+<https://github.com/BuzzCutNorman/tap-mssql.git>
    config:
      host: <our_host>
      port: 1433
      database: DB1
    select:
    - DB1-dbo-table_name.*
    - DB2-dbo-table_name.*
    - DB3-dbo-table_name.*
    metadata:
      DB1-dbo-table_name:
        replication-method: INCREMENTAL
        replication-key: REPLICATION_KEY
      DB2-dbo-table_name:
        replication-method: INCREMENTAL
        replication-key: REPLICATION_KEY
      DB3-dbo-table_name:
        replication-method: INCREMENTAL
        replication-key: REPLICATION_KEY
Are there any glaring issues with this?
Copy code
- name: tap-mssql
    variant: buzzcutnorman
    pip_url: git+<https://github.com/BuzzCutNorman/tap-mssql.git>
    config:
      host: <host_ip>
      port: 1433
  - name: tap-mssql--db1
    inherit_from: tap-mssql
    config:
      database: DB1
    select:
    - dbo-table_1_name.*
    metadata:
      dbo-table_1_name:
        replication-method: INCREMENTAL
        replication-key: <REPLICATION_KEY>
  - name: tap-mssql--db2
    inherit_from: tap-mssql
    config:
      database: DB2
    select:
    - dbo-table_1_name.*
    metadata:
      dbo-table_1_name:
        replication-method: INCREMENTAL
        replication-key: <REPLICATION_KEY>
Otherwise, If I'm doing the inheritance method would this be the rough setup?
ok rant over - Ended up just doing the inheritance - worked fine. Probably should map the streams to separate tables in case one of my source tables on one of the dbs changes, but since all the columns are the same it's working fine and replicating the "table_1" tables across my multiple source dbs into one table in my snowflake DW. Queried the rowcounts and did some other checks and it's all there. I don't think mapping the streams to different tables is really going to matter though because it's a legacy transactional system so the likelihood of column changes is extremely low. We'll see though
👍 1
e
tap-mssql probably only extracts from a single db at a time, so a selection like
DB1-dbo-table_name.*
would certainly not work and the inheritance approach is the best.
1