How can I replicate a table with some offset I have a very l Meltano #singer-taps

How can I replicate a table with some offset? I ha...

Pawel Plaszczak

01/10/2025, 8:45 AM

How can I replicate a table with some offset? I have a very large table and for test purposes would like to only the recent records. I use tap-oracle. I think that replication_key_value variable is for this. For test purposes, I created table PERSONS, and with three records PERSON_ID with values 1,2,3. I wanted to set offset in such a way, that only the record 3 would be replicated. I first tried to set up replication-key-value in meltano.yml, but the variable kept being ignored by meltano. I then tried another method: I saw that I could pass a STATE object to meltano invoke command. So I captured the state (json) from the previous meltano run in a file state.log:

"type":"STATE",

"value":{

"bookmarks":{

"PLASZPA-PERSONS":{

"last_replication_method":"INCREMENTAL",

"replication_key":"PERSON_ID",

"replication_key_value":3

and then I invoked tap-oracle with this state:

meltano invoke tap-oracle -s state.log

But again, the replication-key-value was ignored - all three records with PERSON_ID values 1,2, and 3 got replicated. Is this because this variable is not supported by tap-oracle?

👀 1

Edgar Ramírez (Arch.dev)

01/10/2025, 8:06 PM

I first tried to set up replication-key-value in meltano.yml

what does that

meltano.yml

look like?

Pawel Plaszczak

01/13/2025, 5:55 PM

Here is the content of meltano.yml, with some sensitive data replaced by *: version: 1 default_environment: dev project_id: e3cb5953-cc6c-4546-8311-a4d841f8d5b8 environments: - name: dev - name: staging - name: prod plugins: extractors: - name: tap-github variant: meltanolabs pip_url: git+https://github.com/MeltanoLabs/tap-github.git - name: tap-oracle variant: s7clarke10 pip_url: git+https://github.com/s7clarke10/pipelinewise-tap-oracle.git config: host: * port: 1597 common_service_name: * user: PLASZPA service_name: * # default_replication_method: FULL_TABLE filter_schemas: PLASZPA filter_tables: - PLASZPA-PERSONS metadata: PLASZPA-PERSONS: replication-method: INCREMENTAL replication-key: KW_UPDATE_DATE

Edgar Ramírez (Arch.dev)

01/13/2025, 9:05 PM

I can't see anything wrong with this config or with the tap code. Maybe @Steve Clarke knows.

Pawel Plaszczak

01/14/2025, 8:24 AM

Thank you Edgar. Meanwhile, can you at least confirm that this functionality and these variables (replication-key, replication-key-value) are supported by tap-oracle extractor, and that the expected result should be what I think? (I want to note that in the log I saw an ambiguous warning, which was not an error, and read something like "method possibly not supported by plugin".

Steve Clarke

01/14/2025, 8:46 AM

Hi @Pawel Plaszczak , like @edgar_ramirez_mondragon mentioned I am unsure why the incremental method would not work. You can see by this line https://github.com/s7clarke10/pipelinewise-tap-oracle/blob/53bb75ed27d7796d2f492e74cec87f10f1bce4d4/tap_oracle/sync_strategies/incremental.py#L62 that it supports bookmarks against the stored the last replication key value. Is it a table or a view? Sometimes it doesn’t like it if it is a view. I have seen this with tap sybase. I have sometimes had to override and provide config to say the object is a table to get the incremental working (even though it is a view). I have mentioned this about a year or two ago in posts, a search on the correct config for that might be worth it. Sorry answering this on my phone with limited access to a device to provide a more specific answer.

Steve Clarke

01/14/2025, 8:49 AM

It may be this setting, "is-view": false

Pawel Plaszczak

01/14/2025, 4:35 PM

Hello, Steve, Edgar and all, and thank you so much for the help so far. I have a nice update. It works. I seriously don't know why, because meanwhile I changed my entire configuration. Now I am copying data from remote Oracle to Postgres, and now that I revisited this issue, it works. However, I still have question: With meltano invoke, I can predefine the required state in json file (like shown earlier in my message) and then this works correctly:

$ meltano invoke tap-oracle -s mystate.json

However, if I want to run "meltano run", there is no -s option, so I cannot pass the state. I was only able to ignore the state altogether, like this, and this works:

$ meltano run tap-oracle target-postgres --no-state-update

So, how could I define the required state for the entire etl? Also, alternatively, how does meltano/singer store the state and how could I reset it? I found the file ./.meltano/run/tap-oracle/state.json that looked like the state storage, but deleting this file did not help - obviously the state is stored elsewhere. I also tried various combinations of following two variables in the meltano.yml, like you suggested,: replication_key_value and is_view, but both variables seem to be ignored, no matter what I put there:

metadata:

PLASZPA-PERSONS2:

replication-method: INCREMENTAL

replication-key: KW_UPDATE_DATE

replication_key_value: 2022-12-30T00:00:00.00+00:00

is-view: false

So what would be the correct and recommended way to define the state for the oracle-postgres ETL, so that I can replicate, for instance, only the last five thousand rows from a table of 10 million rows?

Pawel Plaszczak

01/15/2025, 1:13 PM

Update: I found the solution. This command-line option is indeed missing in run command, but it exists in invoke command. Hence, I succeeded with the expected funtionality using this command syntax:

Copy code

$ meltano invoke tap-oracle -s mystate.json|meltano invoke target-postgres

The only caveat is that target-postgres, if invoked like this with invoke, will write to schema melty while, if invoked with "run", it will write to schema tap-oracle. To enforce consistent target schema across both commands, I had to add this variable to meltano.yml:

default_target_schema: tap_oracle

And then finally everything works as expected: I can define the requested start state in mystate.json, and the target table, in the requested schema tap_oracle, gets populated with the data younger than the state. @Steve Clarke and @Edgar Ramírez (Arch.dev), thank you for your support, I am moving on with the project!

Edgar Ramírez (Arch.dev)

01/16/2025, 7:42 PM

I'm glad you figured it out @Pawel Plaszczak! The fact that

-s <state file>

works but

meltano run

doesn't makes me think the plugin in

meltano.yml

is missing

capabilities: [state]

🤔

👍 1

Steve Clarke

01/16/2025, 7:46 PM

Glad to hear you are on the right track 😊 .

13 Views

Open in Slack

Previous Next