Syed Hamza Raza Kazmi
05/08/2024, 8:18 AMEdgar Ramírez (Arch.dev)
05/13/2024, 2:21 PMLOG_BASED
replication, the row deletion will not be communicated to the loader
8. The row will be upserted or appended in the destination
9. https://docs.meltano.com/concepts/plugins/#select-extrahaleemur_ali
05/13/2024, 3:43 PMwhat will happen if a new table gets added?
what will happen if a new column gets added?
How to select and unselect columns and tablesit depends. if your tap's select block looks like below:
- name: my-tap
select:
- *
then all tables will get selected.
however, if you have the select block specified as below:
- name: my-tap
select:
- table_1.*
- table_2.id
- table_2.name
- table_2.email
- -table_2.social_insurance_number
Then only table_1
& specified fields from table_2
will be selected. New columns added to table_1
will automatically get added. However, new columns added to table_2
will not be added automatically. The wildcard matching is very powerful. You can automatically select specific columns based on their name pattern by judicious use of the wildcard. For instance, user_*
will automatically select fields user_name
, user_email
& user_phone
.
Another thing shown above is the -
operator. In our example, we are explicitly stating that table_2.social_insurance_number
is not to be transferred.
Hopefully this answers your 1st, 2nd & last questions as well.
Renaming Things:
what will happen if a table is renamed?
what will happen if a column is renamed?Meltano won't automatically detect & handle this for you. There isn't a "standard" way to detect & handle scenarios where tables / columns get renamed. Meltano provides abstractions that allow you to handle these scenarios gracefully. Taps built on the meltano-sdk have the ability to map data at the stream level or the column level. For older taps, you can also use the standalone mapper and pipe the data through it. Rather than
meltano run my-tap my-target
you'd invoke meltano as meltano run my-tap my-mapper my-target
You can I won't repeat what's already been stated in the official documentation. Please see Inline Data Mapping | Meltano Documentation.
what will happen if a column is removed?
what will happen if a table is removed?Meltano would think of the existence of tables / columns in the source system, but not the removal interaction (i.e. table/column exists -> table/column does not exist) Meltano performs a discovery step before extracting data from a source system. This discovery step allows meltano to discover the schema of the source system. It will extract the intersection of the set of streams / columns that are present in both the discovered schema as well as the selection specification, so the removed columns will show up as nulll and the removed tables will stop getting extracted. The very recent versions of meltano will also warn you if you are selecting attributes that are not present in the discovered schema (humble brag: I added that functionality)
what will happen if a row is deleted?
what will happen if a row is updated?This depends on how you are configuring your data ingestions & your source systems. Incremental ingestion will allow you to select records that have been updated since the last ingestion, and meltano can support incremental ingestion if the source system (and the tap) supports it. However, incremental ingestion will not just by itself be able to detect deletions. Some auxiliary mechanism must be implemented to handle them. A few popular ones are: • soft deletion: the source system has a column
deleted_at
that tracks when the record is deleted. This is integrated to the target system. The source system can then delete this record. Typically the source system would keep the deleted record for a sufficiently long bufffer period for all applications feeding on its data to be able to reflect deletions.
• deletion streams: whenever a record is deleted, the deletion is recorded in a separate table, and that is integrated into the downstream table.
if you're performing a full replication, then you are able to determine deleted / updated records by comparing the most recent ingested data vs the prior ingested data.
If you're doing log based replication, then you can have the benefit of both incremental integration while being able to track deletions.
Hope this helps.Syed Hamza Raza Kazmi
05/14/2024, 5:32 AMSyed Hamza Raza Kazmi
06/05/2024, 10:07 AMRun invocation could not be completed as block failed: Extractor failed
• what will happen if a column is renamed?
◦ removed the data from existing column only data for deleted row was not removed. New column with rename was not added
• what will happen if a column is removed
◦ removed the existing data from the column, only data for deleted row was not removed.
• what will happen if a table is removed?
• what will happen if a row is deleted?
◦ row not deleted nor any soft deletes
• what will happen if a row is updated?
◦ It will updated the existing record and won’t create new record
My meltano.yml
file:
version: 1
default_environment: dev
project_id: 1773fe3c-5afb-4699-8776-2fd9fd5ee05e
environments:
- name: dev
- name: staging
- name: prod
plugins:
extractors:
- name: tap-postgres
variant: meltanolabs
pip_url: git+<https://github.com/MeltanoLabs/tap-postgres.git>
config:
host: localhost
port: 5432
database: postgres
user: myuser
default_replication_method: INCREMENTAL
filter_schemas:
- public
loaders:
- name: target-postgres
variant: meltanolabs
pip_url: meltanolabs-target-postgres
config:
database: crediblex
host: localhost
user: myuser
port: 5432
jobs:
- name: el
tasks:
- tap-postgres target-postgres
I am using this command to run
meltano run el
am I missing something ?visch
08/07/2024, 12:53 PMhaleemur_ali
08/07/2024, 1:30 PMselect
parameter in your tap config. Really hoping you were able to unblock yourself
plugins:
extractors:
- name: tap-postgres
variant: meltanolabs
pip_url: git+<https://github.com/MeltanoLabs/tap-postgres.git>
config:
host: localhost
port: 5432
database: postgres
user: myuser
default_replication_method: INCREMENTAL
filter_sc
select:
- public-my_table.* # this selects everything from table public.my_table, note the dash (-) between the schema and table name.