hi i'm little confused what the chunk relation to ...
# troubleshooting
m
hi, i'm a little confused about how the chunk relates to record count, etc. ultimately i don't want to pick and send only one event here
v
Depends on what you want. One is from the tap, one is from the target
m
ic. okay, right, it's in the logs there. lol. ideally i want to ingest the events from the postgres tap into target snowflake faster than its current intake. the data has `id` and `created_at` columns. i tried changing the incremental load from `created_at` to `id` but it really hasn't changed anything. any tips? @visch ; i also changed the batch_size on the snowflake end and that's reflected in the current_size in the logs; thanks
v
I say it because it's not clear what you're after. So you're saying running meltano is too slow. https://github.com/meltano/meltano/issues/6613#issuecomment-1215074973 has good steps for figuring out what exactly is too slow. Then you can come back here with the data you get and we can dive deeper.
We can make more generalized guesses if you share your meltano.yml, but the steps there are more concrete.
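For what it's worth, here is a minimal sketch of one way to turn "too slow" into a number for the tap side. It assumes you have already captured the tap's Singer output to a file on its own (for example by redirecting `meltano invoke tap-postgres` to a file; the name `tap_output.jsonl` below is hypothetical) and that you timed that run yourself. The helper names are made up for illustration and are not part of Meltano.

```python
import json
from collections import Counter


def count_records(lines):
    """Count Singer RECORD messages per stream from an iterable of JSON lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip any non-JSON log noise mixed into the output
        if isinstance(message, dict) and message.get("type") == "RECORD":
            counts[message.get("stream")] += 1
    return counts


def records_per_second(total_records, elapsed_seconds):
    """Throughput for one stage, given a wall-clock time you measured yourself."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return total_records / elapsed_seconds


if __name__ == "__main__":
    # Hypothetical file name: the tap's stdout captured from a tap-only run.
    with open("tap_output.jsonl", encoding="utf-8") as handle:
        per_stream = count_records(handle)
    for stream, total in per_stream.items():
        print(stream, total)
```

Dividing the per-stream totals by the measured run time (via `records_per_second`) gives a tap-only rate to compare against the same measurement for the target reading from that file.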
m
yeah, it's taking over 4 hours to run a single job
version: 1
default_environment: beta

plugins:
  extractors:
  - name: tap-postgres
    variant: meltanolabs
    pip_url: git+https://github.com/MeltanoLabs/tap-postgres.git@v0.0.8
    config:
      ssl_enable: true
      ssl_certificate_authority: '/ssl-certs/global-bundle.pem'
      default_replication_method: INCREMENTAL
      filter_schemas: [public]
      start_date: 2024-01-01
    select: [public-events.*, public-failed_events.*]
    schema:
      public-events:
        metadata:
          type:
          - object
          - "null"
      public-failed_events:
        event:
          type:
          - object
    metadata:
      'public-events.*':
        replication-method: INCREMENTAL
        replication-key: id
      'public-failed_events.*':
        replication-method: INCREMENTAL
        replication-key: id

  loaders:
  - name: target-snowflake
    variant: meltanolabs
    config:
      database: MAIN
      default_target_schema: SCRATCH
      batch_size_rows: 150000
      batch_config:
        batch_size: 150000
        encoding:
          format: jsonl
          compression: gzip
        storage:
          root: "file://"

environments:
- name: beta
  config:
    plugins:
      loaders:
      - name: target-snowflake
        config:
          default_target_schema: SCRATCH
- name: production
  config:
    plugins:
      loaders:
      - name: target-snowflake
        config:
          default_target_schema: RAW_PROJECT

jobs:
- name: project
  tasks:
  - tap-postgres target-snowflake
v
Solid project, very clean! I'm curious about the performance numbers now. My guess is your slowness is on the postgres side, but getting the data from those steps will tell us more concretely. If it's tap-postgres, then almost certainly it's coming from the query itself, so we'd want to figure out what's taking so long for the query to return data. Is it the database causing slow response times? If so, maybe we're missing an index. Is it the sheer amount of data? Then we need to look at the network and decide if we can move the data differently. Etc.
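To illustrate the "missing index" point, here is a rough sketch using SQLite from the Python standard library as a stand-in, not Postgres itself. It assumes the incremental query has roughly the shape `WHERE id >= <bookmark> ORDER BY id`; that shape, the table layout, and the index name are assumptions for the demo. On the real database the equivalent check would be running `EXPLAIN` on the actual query.

```python
import sqlite3

# Assumed shape of an incremental extract keyed on `id` (illustrative only).
INCREMENTAL_SQL = "SELECT * FROM events WHERE id >= ? ORDER BY id"


def query_plan(conn, sql, params=()):
    """Return SQLite's EXPLAIN QUERY PLAN detail strings for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # last column is the readable detail


def build_demo_db():
    """Create an in-memory table with no index on the replication key."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER, created_at TEXT, payload TEXT)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?, ?)",
        [(i, f"2024-01-{(i % 28) + 1:02d}", "{}") for i in range(1000)],
    )
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    print("before index:", query_plan(conn, INCREMENTAL_SQL, (500,)))
    conn.execute("CREATE INDEX idx_events_id ON events (id)")
    print("after index: ", query_plan(conn, INCREMENTAL_SQL, (500,)))
```

If the plan only mentions the index after it is created, the earlier plan was scanning and sorting the whole table, which is the kind of thing worth ruling out on the Postgres side.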
m
so we indexed the postgres database but i still see chunk = 1 on the tap and i dunno what that refers to other than it's pulling one record, which is disheartening if it