binoy_shah
09/08/2023, 2:23 PMtap-postgres
We are trying to pull data from 3rd party Postgres Database and have tried tap-postgres:meltanolabs
as well as tap-postgres:wise
The 3rd party table has 100s of columns and we only need ~50 of them but no-matter which version we use, the data pull is always includes all columns
Below is the meltano.yml
section
- name: pdr
inherit_from: tap-postgres
variant: transferwise
pip_url: git+<https://github.com/MeltanoLabs/tap-postgres.git>
config:
filter_schemas: 'pdr'
select:
- "pdr-mv_participant_all.id"
- "pdr-mv_participant_all.modified"
- "pdr-mv_participant_all.participant_id"
- "pdr-mv_participant_all.participant_origin"
- "pdr-mv_participant_all.is_ghost_id"
- "pdr-mv_participant_all.withdrawal_status"
- "pdr-mv_participant_all.suspension_status"
- "pdr-mv_participant_all.deceased_status"
- "pdr-mv_participant_all.deceased_status_date"
- "pdr-mv_participant_all.test_participant"
- "pdr-mv_participant_all.addr_state"
- "pdr-mv_participant_all.hpo"
- "pdr-mv_participant_all.hpo_id"
- "pdr-mv_participant_all.organization_id"
- "pdr-mv_participant_all.site_id"
- "pdr-mv_participant_all.enrollment_status"
- "pdr-mv_participant_all.withdrawal_status"
- "pdr-mv_participant_all.withdrawal_date"
- "pdr-mv_participant_all.suspension_time"
- "pdr-mv_participant_all.sign_up_time"
- "pdr-mv_participant_all.enrollment_member"
- "pdr-mv_participant_all.enrollment_core_stored"
- "pdr-mv_participant_all.enrollment_core_ordered"
- "pdr-mv_participant_all.ehr_status"
- "pdr-mv_participant_all.ehr_receipt"
- "pdr-mv_participant_all.ehr_update"
- "pdr-mv_participant_all.biobank_id"
- "pdr-mv_participant_all.ubr_overall"
- "pdr-mv_participant_all.ubr_ethnicity"
- "pdr-mv_participant_all.ubr_age_at_consent"
- "pdr-mv_participant_all.ubr_sex"
- "pdr-mv_participant_all.ubr_sexual_gender_minority"
- "pdr-mv_participant_all.ubr_income"
- "pdr-mv_participant_all.ubr_education"
- "pdr-mv_participant_all.ubr_geography"
- "pdr-mv_participant_all.ubr_disability"
- "pdr-mv_participant_module.*"
- "pdr-mv_participant_biobank_order.*"
- "pdr-mv_participant_biobank_sample.*"
- "pdr-mv_participant_race.*"
- "pdr-mv_mod_overallhealth.*"
- "pdr-mv_participant_pm.*"
- "pdr-mv_color_user_event_metrics.*"
- "pdr-mv_genomic_informing_loop.*"
metadata:
'*':
replication-method: INCREMENTAL
replication-key: "modified"
key_properties: "id"
view-key-properties: [ "id" ]
Important Aspects of ☝️
1. Only filtered Schema to pdr
2. Only selected columns for mv_participant_all
3. mv_participant_all
is a view in 3rd party database
But when we run it via meltano --log-level=debug elt --transform=skip --state-id=pdr-data --select=pdr-mv_participant_all pdr data-warehouse-pdr --force
It dumps all columns of mv_participant_all
in snowflake.