Hey everyone .. I used <tap-mysql> to test the ssh...
# troubleshooting
b
Hey everyone .. I used tap-mysql to test the ssh connection support, I ran into couple issues: 1. I configured it this way:
Copy code
version: 1
default_environment: dev
project_id: b5338889-5f01-402a-bfe5-77bc53628db9
environments:
- name: dev
  config:
    plugins:
      extractors:
      - name: tap-mysql
        select:
        - '*-table_A.*'
        metadata:
          '*-table_A':
            replication-method: INCREMENTAL
            replication-key: updated_at
- name: staging
- name: prod
plugins:
  extractors:
  - name: tap-mysql
    namespace: tap_mysql
    pip_url: git+<https://github.com/MeltanoLabs/tap-mysql.git>
    capabilities:
    - state
    - catalog
    - discover
    - about
    - stream-maps
    settings:
    - name: host
    - name: port
      kind: integer
    - name: user
    - name: password
      kind: password
    - name: database
    - name: sqlalchemy_url
      kind: password
    - name: ssh_tunnel.private_key
      kind: password
    - name: ssh_tunnel.private_key_password
      kind: password
    - name: ssh_tunnel.host
    - name: ssh_tunnel.username
    - name: ssh_tunnel.port
    config:
      host: <HOST>
      port: 3306
      password: <PASS>
      user: <USER>
      database: <DB>
      ssh_tunnel:
        enable: true
        host: <SSH-HOST>
        username: <SSH-USER>
        port: 22
  loaders:
  - name: target-s3-csv
    variant: transferwise
    pip_url: git+<https://github.com/transferwise/pipelinewise-target-s3-csv.git>
◦ Ran the following commands with no output (hangs/stays forever): a.
meltano --environment=dev select --list tap-mysql
b.
meltano --environment=dev config tap-mysql test
The old way I used tap-mysql (Transferwise variant) I used
filter_dbs: $DB_NAME
because on the DB name I have
-
like
db-name
and this way it worked when I defined the table name in the
select
this way
- "*-game_sessions.*"
.. not sure if the new tap-mysql (with ssh support) supports
filter_dbs
or not, if not, any other ways to deal with the db name issue? CC @visch @Will Da Silva (Arch) @taylor
v
hmm, it's probably during discovery you must have a ton of databases/tables and the information schema query takes a long time
b
that's why I guess db_filters was helping (The old tap-mysql) to only target specific db
v
We are swapping from
pymyssql
to the standard myssql cllient in the latest PR which will speed up things, but I"m guessing we'll still have this issue
How long have you left it running for?
b
6 mins .. still going
I'll run the etl command then, and see what happens .. but is the select part correct? I guess its not .. since I have a
-
in the table name
I went with this way:
Copy code
select:
        - "*-table_A.*"
But I had to add filter_dbs .. not sure if this version supports this or not?
v
First, thank you for running the tap, we'll try to figure out what we can do here! Ideally I could test for this kind of thing too not sure how to write a quick test for that so we'll have to just make up a solution (if filter_dbs work we should probably go after that)
I have to check, one sec @bassel
b
Take your time
v
Copy code
visch@DESKTOP-9BDPA9T:~/git/tap-mysql$ meltano select tap-mysql --list
2023-08-16T15:12:20.323842Z [info     ] The default environment 'test' will be ignored for `meltano select`. To configure a specific environment, please use the option `--environment=<environment name>`.
Legend:
        selected
        excluded
        automatic

Enabled patterns:
        *-test*.*

Selected attributes:
        [automatic] melty-test_replication_key.id
        [selected ] melty-test_replication_key.name
        [selected ] melty-test_replication_key.updated_at
visch@DESKTOP-9BDPA9T:~/git/tap-mysql$ cat meltano.yml
version: 1
send_anonymous_usage_stats: true
project_id: tap-mysql
default_environment: test
environments:
- name: test
plugins:
  extractors:
  - name: tap-mysql
    namespace: tap_mysql
    pip_url: -e .
    capabilities:
    - state
    - catalog
    - discover
    - about
    - stream-maps
    settings:
    - name: host
    - name: port
      kind: integer
    - name: user
    - name: password
      kind: password
    - name: database
    - name: sqlalchemy_url
      kind: password
    - name: ssh_tunnel.private_key
      kind: password
    - name: ssh_tunnel.private_key_password
      kind: password
    - name: ssh_tunnel.host
    - name: ssh_tunnel.username
    - name: ssh_tunnel.port
    config:
      sqlalchemy_url: "<mysql+pymysql://root:password@localhost:3306/melty>"
    select:
    - "*-test*.*"
  loaders:
  - name: target-jsonl
    variant: andyh1203
    pip_url: target-jsonl
Yes the dbname is included in the select so your current setup should eventually work. By default we pull from every database so you're almost certainly right that filter_dbs would help us here
b
filter_dbs can be used in this version? If not, then our case will not work 😕😕😕
My current stup references to the db name as * so filter_dbs will help to target the db I want
v
I'm looking into this couple things in the SDK I'm trying to work through
I'll have to come back to this, I'll put in an issue though into tap-mysql still curious how long it takes to run
I can verify we pull the catalog for all DB's right now though regardless of your select / sqlalchemy setup