# troubleshooting
m
Happy Monday! I have some challenges with my meltano job. I'm running a relatively simple tap-mysql → target-snowflake task for 4 different tables. For some reason a big table with 30k records is not fully copied: only about 1.5k records come through. What might be the issue? The same job run with target-jsonl copies all 30k records just fine. TA!
In the log I found that everything below the selected line is present in the target, and everything above it is missing. So effectively I only have the ~1.5k most recent records.
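For reference, the invocation is just a plain tap-to-target run, roughly this (no named job or schedule, plugin names as in my config):
meltano run tap-mysql target-snowflake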
a
I'm not familiar with snowflake, and I'm almost certain you will have checked this, but is this possibly a primary key issue? Do you have a unique key for all 30k rows? target-jsonl will just append all records regardless of PK.
e
Might also be worth trying to sync with --full-refresh.
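i.e. something along these lines, assuming you're invoking the pipeline with meltano run:
meltano run --full-refresh tap-mysql target-snowflake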
m
No luck so far: as per the log, all the records are listed, but only the last 1,500 are present in snowflake. I wonder if it's somehow related to the issue described here: https://community.snowflake.com/s/article/No-data-visible-in-destination-table-after-successful-ingestion-job
a
Do you have some other kind of db you could test with? Could you spin up postgres locally via docker?
m
So you suggest sending the data to postgres rather than to snowflake? Sounds like a good idea, I'll try that next.
a
Yes, sorry, I mean using a different target. Testing with target-jsonl is helpful, but testing with another database closer to snowflake will at least let you eliminate PK/state/upsert-vs-append issues and narrow it down to snowflake itself.
Some docs to get local postgres running on docker: https://docs.meltano.com/getting-started/part2/#getting-your-target-ready
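If you don't want to follow the whole guide, a throwaway container along these lines should be enough (credentials here are placeholders, not taken from the linked docs):
docker run --name meltano-postgres -e POSTGRES_USER=meltano -e POSTGRES_PASSWORD=meltano -p 5432:5432 -d postgres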
m
So, I added target-postgres as per the instructions, and I got the expected 31,500 records. It all points to a problem with target-snowflake.
Looks like the setting I was looking for was hard_delete: false. Once added to my config, everything works as expected. The question remains: what exactly does this setting do? It's not well documented for meltano's target-snowflake. (I'm using full table replication.)
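In case it's useful for anyone else: instead of editing meltano.yml by hand, the setting can presumably also be applied from the CLI, something like:
meltano config target-snowflake set hard_delete false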
a
Are you using the Wise (transferwise) variant of target-snowflake then? I should have asked: https://hub.meltano.com/loaders/target-snowflake--transferwise/#hard_delete-setting
m
No, I'm using the meltanolabs variant, but this setting still made a difference. Here's my config:
environments:
- name: dev
  config:
    plugins:
      extractors:
      - name: tap-mysql
        config:
          host: localhost
          port: 3306
          user: loader
          database: generated_data
        metadata:
          '*':
            replication-method: FULL_TABLE
            # replication-method: INCREMENTAL
            # replication-key: id
        select:
        - generated_data-flight.*
      loaders:
      - name: target-snowflake
        config:
          account: *********
          database: DWH_SALESDEMO
          schema: RAW_TECHLOG
          user: MELTANO
          warehouse: SALESDEMO_ELT
          role: DATALOADER
          default_target_schema: RAW_TECHLOG
          file_format: DWH_SALESDEMO.RAW_TECHLOG.MELTANO_CSV
          hard_delete: false
- name: staging
- name: prod
plugins:
  extractors:
  - name: tap-mysql
    variant: transferwise
    pip_url: pipelinewise-tap-mysql
  loaders:
  - name: target-snowflake
    variant: meltanolabs
    pip_url: meltanolabs-target-snowflake
  - name: target-jsonl
    variant: andyh1203
    pip_url: target-jsonl
  - name: target-postgres
    variant: meltanolabs
    pip_url: meltanolabs-target-postgres~=0.0.7
    config:
      user: meltano
      database: postgres
      add_record_metadata: true
      host: localhost
a
That might be a generic feature of the target SDK then: https://sdk.meltano.com/en/latest/capabilities.html#singer_sdk.helpers.capabilities.TargetCapabilities.HARD_DELETE. But it's not documented on the Hub, by the looks of it.
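If you want to confirm it comes from the SDK rather than from the target itself, you could list the capability enum in whatever environment has singer-sdk installed, e.g. (just a sanity check, names taken from the docs link above):
python -c "from singer_sdk.helpers.capabilities import TargetCapabilities; print(list(TargetCapabilities))"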