Hey :wave: I have the following meltano.yml file d...
# getting-started
b
Hey đź‘‹ I have the following meltano.yml file definition:
Copy code
- name: dev
  config:
    plugins:
      extractors:
      - name: tap-mysql
        select:
        - db-tableA.*
        - db-tableB.*
        - db-tableC.*
        metadata:
          db-tableA:
            replication-method: INCREMENTAL
            replication-key: updated_at
          db-tableB:
            replication-method: INCREMENTAL
            replication-key: updated_at
          db-tableC:
            replication-method: INCREMENTAL
            replication-key: updated_at
It works perfectly .. I added env vars to use in the yml file as following (I want to build the select tables config dynamically using the env var value):
Copy code
- name: dev
  config:
    plugins:
      extractors:
      - name: tap-mysql
        select:
        - "${DB_NAME}-tableA.*"
        - "${DB_NAME}-tableB.*"
        - "${DB_NAME}-tableC.*"
        metadata:
          "${DB_NAME}-tableA":
            replication-method: INCREMENTAL
            replication-key: updated_at
          "${DB_NAME}-tableB":
            replication-method: INCREMENTAL
            replication-key: updated_at
          "${DB_NAME}-tableC":
            replication-method: INCREMENTAL
            replication-key: updated_at
But, its not working (not getting anything extracted - no errors in the logs) .. any ideas or thoughts?
a
I think this is a known issue, using env vars in an array (under the select): https://github.com/meltano/meltano/issues/3171
b
thanks .. but it seems not resolved yet!
u
yeah its not solved yet, can you put a comment in the issue to explain your use case to help get it moved to higher priority. A work around is to put those configurations in environment variables but thats obviously more work and not ideal
d
@bassel As a workaround, you can: • Remove
select
from
meltano.yml
• Update the
metadata
keys to have
*
instead of
${DB_NAME}
• When you run
meltano elt tap-mysql target-<foo>
, pass
--select "${DB_NAME}-tableA" --select "${DB_NAME}-tableB" --select "${DB_NAME}-tableC"
That requires you to able to specify the env var at time of execution, if that’s an option
b
@douwe_maan what about the metadata field? how can I specify the replication method/key?
i.e.
Copy code
metadata:
          ${DB_NAME}-tableA:
            replication-method: INCREMENTAL
            replication-key: updated_at
d
@bassel There, you can use
*
instead of
${DB_NAME}
to apply the rules to all streams. Then when you use
--select
, only a subset of those streams will actually be pulled so the metadata options for the other streams are ignored
b
hmm .. each table has a different replication key / field @douwe_maan so the generic one will not work. or do you mean something like this:
Copy code
metadata:
          *-tableA:
            replication-method: INCREMENTAL
            replication-key: updated_at
u
Yeah that’s the syntax I mean, and you can provide a
*-tableB
with different properties as well, assuming that the DB_NAME will be different but they’ll all have the same tables with the same settings
b
@douwe_maan What if I did not specify the db name in the select statement or put
*-dbname.*
? Also, can I force it to select the default Db assigned to the user used in the connection string?
@douwe_maan Can you help me identifying where the
select
part is passed to
tap-mysql's
code? maybe we can make a workaround
u
What if I did not specify the db name in the select statement or put
*-dbname.*
?
That will work, if you then limit the DBs in the connection string.
Can you help me identifying where the
select
part is passed to
tap-mysql's
code? maybe we can make a workaround
What part are you looking for exactly? Meltano interprets the
select
property (and
--select
flag) and
metadata
property itself to generate a catalog that is then passed to the tap: https://docs.meltano.com/guide/integration#extractor-catalog-generation. The tap then interprets the
selected
,
replication-key
, and
replication-method
catalog metadata. It’ll be cleanest (not requiring changes to tap-mysql) if we can get this to work using the `select`/`metadata` property and/or a way to limit the DB in the connection string passed to
tap-mysql
.
u
Looks like there’s a
database
config: https://hub.meltano.com/extractors/tap-mysql/#database-setting
u
Setting that will be the easiest way to limit then 🙂 And the
select
and
metadata
rules in
meltano.yml
can include
*
where the DB name goes
u
b
What I was thinking of .. modifying the part where tap-mysql gets the select_sql generated and modify it a bit so it gets the
escaped_db
replaced with the
database_name
value in the connection string ..
Copy code
select_sql = f'SELECT {",".join(escaped_columns)} FROM {escaped_db}.{escaped_table}'
https://github.com/transferwise/pipelinewise-tap-mysql/blob/master/tap_mysql/sync_strategies/common.py#L85 @douwe_maan
u
Wouldn’t the database or filter_dbs setting work? That can easily be set to $DB_NAME.
b
not sure honestly .. what I was planning to do in more details:
Copy code
- name: tap-mysql
        config:
          host: $DB_HOST
          port: $DB_PORT
          user: $DB_USER
          database: $DB_NAME
          password: $DB_PASSWORD
        select:
        - "placholder-tableA.*"
        metadata:
          "placeholder-tableA":
            replication-method: INCREMENTAL
            replication-key: updated_at
I would take the value from
database: $DB_NAME
and replace the
placeholder
in the
select
and
metadata
fields .. @douwe_maan
will give it a try using this
Copy code
filter_dbs: $DB_NAME
        select:
        - "*-tableA.*"
        metadata:
          "*-tableA":
            replication-method: INCREMENTAL
            replication-key: updated_at
Copy code
filter_dbs: $DB_NAME
Worked! @douwe_maan Thanks