https://linen.dev logo
#plugins-general
Title
# plugins-general
b

binoy_shah

09/08/2023, 2:23 PM
This question is regarding
tap-postgres
We are trying to pull data from 3rd party Postgres Database and have tried
tap-postgres:meltanolabs
as well as
tap-postgres:wise
The 3rd party table has 100s of columns and we only need ~50 of them but no-matter which version we use, the data pull is always includes all columns Below is the
meltano.yml
section
Copy code
- name: pdr
    inherit_from: tap-postgres
    variant: transferwise
    pip_url: git+<https://github.com/MeltanoLabs/tap-postgres.git>
    config:
      filter_schemas: 'pdr'
    select:
      - "pdr-mv_participant_all.id"
      - "pdr-mv_participant_all.modified"
      - "pdr-mv_participant_all.participant_id"
      - "pdr-mv_participant_all.participant_origin"
      - "pdr-mv_participant_all.is_ghost_id"
      - "pdr-mv_participant_all.withdrawal_status"
      - "pdr-mv_participant_all.suspension_status"
      - "pdr-mv_participant_all.deceased_status"
      - "pdr-mv_participant_all.deceased_status_date"
      - "pdr-mv_participant_all.test_participant"
      - "pdr-mv_participant_all.addr_state"
      - "pdr-mv_participant_all.hpo"
      - "pdr-mv_participant_all.hpo_id"
      - "pdr-mv_participant_all.organization_id"
      - "pdr-mv_participant_all.site_id"
      - "pdr-mv_participant_all.enrollment_status"
      - "pdr-mv_participant_all.withdrawal_status"
      - "pdr-mv_participant_all.withdrawal_date"
      - "pdr-mv_participant_all.suspension_time"
      - "pdr-mv_participant_all.sign_up_time"
      - "pdr-mv_participant_all.enrollment_member"
      - "pdr-mv_participant_all.enrollment_core_stored"
      - "pdr-mv_participant_all.enrollment_core_ordered"
      - "pdr-mv_participant_all.ehr_status"
      - "pdr-mv_participant_all.ehr_receipt"
      - "pdr-mv_participant_all.ehr_update"
      - "pdr-mv_participant_all.biobank_id"
      - "pdr-mv_participant_all.ubr_overall"
      - "pdr-mv_participant_all.ubr_ethnicity"
      - "pdr-mv_participant_all.ubr_age_at_consent"
      - "pdr-mv_participant_all.ubr_sex"
      - "pdr-mv_participant_all.ubr_sexual_gender_minority"
      - "pdr-mv_participant_all.ubr_income"
      - "pdr-mv_participant_all.ubr_education"
      - "pdr-mv_participant_all.ubr_geography"
      - "pdr-mv_participant_all.ubr_disability"
      - "pdr-mv_participant_module.*"
      - "pdr-mv_participant_biobank_order.*"
      - "pdr-mv_participant_biobank_sample.*"
      - "pdr-mv_participant_race.*"
      - "pdr-mv_mod_overallhealth.*"
      - "pdr-mv_participant_pm.*"
      - "pdr-mv_color_user_event_metrics.*"
      - "pdr-mv_genomic_informing_loop.*"
    metadata:
      '*':
        replication-method: INCREMENTAL
        replication-key: "modified"
        key_properties: "id"
        view-key-properties: [ "id" ]
Important Aspects of ☝️ 1. Only filtered Schema to
pdr
2. Only selected columns for
mv_participant_all
3.
mv_participant_all
is a view in 3rd party database But when we run it via
meltano --log-level=debug elt --transform=skip --state-id=pdr-data --select=pdr-mv_participant_all pdr data-warehouse-pdr --force
It dumps all columns of
mv_participant_all
in snowflake.