Hi Team, I am looking for documentation from wher...
# documentation
s
Hi Team, I am looking for documentation from where I can find out how Meltano handle schema changes, need answers to below mentioned questions? • what will happen if a new table gets added? • what will happen if a new column gets added? • what will happen if a table is renamed? • what will happen if a column is renamed? • what will happen if a column is removed? • what will happen if a table is removed? • what will happen if a row is deleted? • what will happen if a row is updated? • How to select and unselect columns and tables
e
Depends on the extractor and loader, but 1. A new stream will be added to the catalog and synced 2. It will be added to the catalog too and synced accordingly 3. It will be added as a new stream and synced to a table with the new name. No automatic renaming in the destination would occur. 4. Similar to above 5. The column will no longer be synced and it will not be removed in the destination 6. Similar to above 7. Unless the tap supports
LOG_BASED
replication, the row deletion will not be communicated to the loader 8. The row will be upserted or appended in the destination 9. https://docs.meltano.com/concepts/plugins/#select-extra
h
@Edgar Ramírez (Arch.dev) answer is correct. I just wanted to provide a slightly more detailed reply: In meltano you have the option to integrate data with wildcard matching as well as explicit mapping. There is also facility to rename stream elements (stream itself or message attributes) & perform transformations on individual messages (stream maps). Combining all of the above, you'll be able to handle scenarios listed above. Adding things / Selecting things:
what will happen if a new table gets added?
what will happen if a new column gets added?
How to select and unselect columns and tables
it depends. if your tap's select block looks like below:
Copy code
- name: my-tap
  select:
    - *
then all tables will get selected. however, if you have the select block specified as below:
Copy code
- name: my-tap
  select:
    - table_1.*
    - table_2.id
    - table_2.name
    - table_2.email
    - -table_2.social_insurance_number
Then only
table_1
& specified fields from
table_2
will be selected. New columns added to
table_1
will automatically get added. However, new columns added to
table_2
will not be added automatically. The wildcard matching is very powerful. You can automatically select specific columns based on their name pattern by judicious use of the wildcard. For instance,
user_*
will automatically select fields
user_name
,
user_email
&
user_phone
. Another thing shown above is the
-
operator. In our example, we are explicitly stating that
table_2.social_insurance_number
is not to be transferred. Hopefully this answers your 1st, 2nd & last questions as well. Renaming Things:
what will happen if a table is renamed?
what will happen if a column is renamed?
Meltano won't automatically detect & handle this for you. There isn't a "standard" way to detect & handle scenarios where tables / columns get renamed. Meltano provides abstractions that allow you to handle these scenarios gracefully. Taps built on the meltano-sdk have the ability to map data at the stream level or the column level. For older taps, you can also use the standalone mapper and pipe the data through it. Rather than
meltano run my-tap my-target
you'd invoke meltano as
meltano run my-tap my-mapper my-target
You can I won't repeat what's already been stated in the official documentation. Please see Inline Data Mapping | Meltano Documentation.
what will happen if a column is removed?
what will happen if a table is removed?
Meltano would think of the existence of tables / columns in the source system, but not the removal interaction (i.e. table/column exists -> table/column does not exist) Meltano performs a discovery step before extracting data from a source system. This discovery step allows meltano to discover the schema of the source system. It will extract the intersection of the set of streams / columns that are present in both the discovered schema as well as the selection specification, so the removed columns will show up as nulll and the removed tables will stop getting extracted. The very recent versions of meltano will also warn you if you are selecting attributes that are not present in the discovered schema (humble brag: I added that functionality)
what will happen if a row is deleted?
what will happen if a row is updated?
This depends on how you are configuring your data ingestions & your source systems. Incremental ingestion will allow you to select records that have been updated since the last ingestion, and meltano can support incremental ingestion if the source system (and the tap) supports it. However, incremental ingestion will not just by itself be able to detect deletions. Some auxiliary mechanism must be implemented to handle them. A few popular ones are: • soft deletion: the source system has a column
deleted_at
that tracks when the record is deleted. This is integrated to the target system. The source system can then delete this record. Typically the source system would keep the deleted record for a sufficiently long bufffer period for all applications feeding on its data to be able to reflect deletions. • deletion streams: whenever a record is deleted, the deletion is recorded in a separate table, and that is integrated into the downstream table. if you're performing a full replication, then you are able to determine deleted / updated records by comparing the most recent ingested data vs the prior ingested data. If you're doing log based replication, then you can have the benefit of both incremental integration while being able to track deletions. Hope this helps.
👍 1
s
thank you guys for detailed reply
np 1
@Edgar Ramírez (Arch.dev), @haleemur_ali I tried meltano and results weren't that you described Please look at my findings • what will happen if a new table gets added? ◦ new table was not added in the target • what will happen if a new column gets added? ◦ column was not added in target • what will happen if a table is renamed? • gave we error when table rename ◦
Run invocation could not be completed as block failed: Extractor failed
• what will happen if a column is renamed? ◦ removed the data from existing column only data for deleted row was not removed. New column with rename was not added • what will happen if a column is removed ◦ removed the existing data from the column, only data for deleted row was not removed. • what will happen if a table is removed? • what will happen if a row is deleted? ◦ row not deleted nor any soft deletes • what will happen if a row is updated? ◦ It will updated the existing record and won’t create new record My
meltano.yml
file:
Copy code
version: 1
default_environment: dev
project_id: 1773fe3c-5afb-4699-8776-2fd9fd5ee05e
environments:
- name: dev
- name: staging
- name: prod
plugins:
  extractors:
  - name: tap-postgres
    variant: meltanolabs
    pip_url: git+<https://github.com/MeltanoLabs/tap-postgres.git>
    config:
      host: localhost
      port: 5432
      database: postgres
      user: myuser
      default_replication_method: INCREMENTAL
      filter_schemas:
      - public
  loaders:
  - name: target-postgres
    variant: meltanolabs
    pip_url: meltanolabs-target-postgres
    config:
      database: crediblex
      host: localhost
      user: myuser
      port: 5432
jobs:
- name: el
  tasks:
  - tap-postgres target-postgres
I am using this command to run
meltano run el
am I missing something ?
v
yes, you don't have anything selected
h
sorry i forgot about this thread - was on holidays most of June, But yes, it looks like you're missing a
select
parameter in your tap config. Really hoping you were able to unblock yourself
Copy code
plugins:
  extractors:
  - name: tap-postgres
    variant: meltanolabs
    pip_url: git+<https://github.com/MeltanoLabs/tap-postgres.git>
    config:
      host: localhost
      port: 5432
      database: postgres
      user: myuser
      default_replication_method: INCREMENTAL
      filter_sc
    select:
    - public-my_table.* # this selects everything from table public.my_table, note the dash (-) between the schema and table name.