hi i m little confused what the chunk relation to record cou Meltano #troubleshooting

Michal Ondras

04/17/2024, 12:54 AM

hi i'm a little confused about how the chunk relates to record count, etc. ultimately i don't want to only pick and send one event here

visch

04/17/2024, 1:40 AM

Depends on what you want. One is from the tap, one is from the target

Michal Ondras

04/17/2024, 2:10 AM

ic. okay right on the logs there. lol. ideally i want to ingest the events from the postgres tap to target snowflake faster than its current intake. the data have id and created_at columns. i tried changing the incremental load from created_at to id but it really hasn't changed anything. any tips? @visch ; i also changed the batch_size on the snowflake end and that's reflected on the current_size in the logs; thanks

visch

04/17/2024, 2:12 AM

I say it because it's not clear what you're after. So you're saying running meltano is too slow. https://github.com/meltano/meltano/issues/6613#issuecomment-1215074973 has good steps for figuring out what exactly is too slow. Then you can come back here with the data you get and we can dive deeper

visch

04/17/2024, 2:17 AM

We can make more generalized guesses if you share your meltano.yml but the steps there are more concrete

Michal Ondras

04/17/2024, 2:22 AM

yeah, it's taking over 4 hours to run a single job

version: 1
default_environment: beta

plugins:
  extractors:
  - name: tap-postgres
    variant: meltanolabs
    pip_url: git+https://github.com/MeltanoLabs/tap-postgres.git@v0.0.8
    config:
      ssl_enable: true
      ssl_certificate_authority: '/ssl-certs/global-bundle.pem'
      default_replication_method: INCREMENTAL
      filter_schemas: [public]
      start_date: 2024-01-01
    select: [public-events.*, public-failed_events.*]
    schema:
      public-events:
        metadata:
          type:
          - object
          - "null"
      public-failed_events:
        event:
          type:
          - object
    metadata:
      'public-events.*':
        replication-method: INCREMENTAL
        replication-key: id
      'public-failed_events.*':
        replication-method: INCREMENTAL
        replication-key: id

  loaders:
  - name: target-snowflake
    variant: meltanolabs
    config:
      database: MAIN
      default_target_schema: SCRATCH
      batch_size_rows: 150000
      batch_config:
        batch_size: 150000
        encoding:
          format: jsonl
          compression: gzip
        storage:
          root: "file://"

environments:
- name: beta
  config:
    plugins:
      loaders:
      - name: target-snowflake
        config:
          default_target_schema: SCRATCH
- name: production
  config:
    plugins:
      loaders:
      - name: target-snowflake
        config:
          default_target_schema: RAW_PROJECT

jobs:
- name: project
  tasks:
  - tap-postgres target-snowflake

visch

04/17/2024, 10:49 AM

Solid project, very clean! I'm curious about the performance numbers now. My guess is your slowness is on the postgres side, but getting the data from those steps will tell us more concretely. If it's tap-postgres then almost certainly it's coming from the query itself, so then we'd want to figure out what's taking so long for the query to return data. Is it the database causing slow response times? If so then maybe we are missing some index. Is it the sheer amount of data? Then we need to look at the network and decide if we can move the data differently. Etc.
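
One way to get a rough number for the tap side on its own is to count the RECORD messages it emits and divide by the elapsed time. The sketch below is not taken from the linked GitHub comment. It only assumes that the tap writes Singer messages as one JSON object per line; the script name `count_records.py` and the function names are made up for illustration.

```python
import json
import sys
import time
from collections import Counter


def count_records(lines):
    """Count Singer RECORD messages per stream in an iterable of JSON lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            # Skip anything that is not a JSON message (e.g. stray log output).
            continue
        if isinstance(message, dict) and message.get("type") == "RECORD":
            counts[message.get("stream", "<unknown>")] += 1
    return counts


def main(path):
    """Read Singer output from a file (or stdin when path is "-") and report counts."""
    start = time.monotonic()
    if path == "-":
        counts = count_records(sys.stdin)
    else:
        with open(path, encoding="utf-8") as handle:
            counts = count_records(handle)
    elapsed = time.monotonic() - start

    total = sum(counts.values())
    for stream, n in sorted(counts.items()):
        print(f"{stream}: {n} records")
    # When reading from stdin, elapsed time includes time spent waiting on the tap.
    rate = total / elapsed if elapsed > 0 else float("nan")
    print(f"total: {total} records in {elapsed:.1f}s ({rate:.0f} records/s)")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Piping the tap into it, for example `meltano invoke tap-postgres | python count_records.py -`, gives a records-per-second figure for the extract alone, without target-snowflake in the loop. Note that an invoke run like this may not pick up the job's saved state, so it could start again from start_date.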

Michal Ondras

04/18/2024, 10:32 PM

so we indexed the postgres database but i still see chunk = 1 on the tap and i dunno what that refers to other than it's pulling one record, which is disheartening if it
