https://linen.dev logo
#plugins-general
Title
# plugins-general
p

par_degerman

06/09/2023, 8:27 PM
Hey all. Getting confused here. I have a mariadb source database that I'm extracting data from and pushing to a snowflake warehouse. I have around 45 databases in the mariadb instance and those are each pushed into different schemas. All good so far. All databases have the same tables inside of them, so I'm using wildcards to select the tables and columns that I want to replicate. There are also some columns that I don't want to extract as they contain sensitive information, so those are de-selected again using negated wildcard selects. It looks something like this:
Copy code
plugins:
  extractors:
    - name: source-db
      inherit_from: tap-mysql
      variant: transferwise
      pip_url: pipelinewise-tap-mysql
      config:
        host: localhost
        user: meltano
        engine: mariadb
        use_gtid: true
      select:
        - "*-Orders.*"
        - "*-Payments.*"
        - "!*-Payments.CreditCardNumber"
        - "*-Customers.*"
        - "!*-Customers.owner*"
        - "!*-Customers.secondOwner*"
        - "*-EventDataKey.*"
        - "*-Export.*"
        - "*-Note.*"
        - "*-LogItem.*"
        - "!*-LogItem.data"
      metadata:
        "*":
          replication-method: LOG_BASED
Problem is now that the de-selected columns show up from time to time in the snowflake warehouse. The pipelines all run orchetrated by airflow, and the de-selected columns shows up only for some of the databases (seemingly random) and only at certain syncs. So two things; A. Is the above not the correct way to select "all but a few" columns out of a large number of databases? B. Is there a better way? Perhaps using a stream mapping to transform away the unwanted columns?