Hey everyone :wave: Happy Monday! Question regarding the... | Meltano #troubleshooting

Stéphane Burwash

09/30/2024, 3:40 PM

Hey everyone 👋 Happy Monday! Question regarding the `metadata` parameter. Currently, we're trying to set flexible replication methods on our taps (full-table or incremental, depending on the use case). As mentioned in the docs, this can be done by using the `metadata` parameter. However, we found that within the `tap-stream` definition itself, the `replication_method` parameter took precedence over the `metadata` config, meaning that we could not set a default `replication-method` and `replication-key`. My question is: is it possible to set a default replication-method and replication-key which could then be overridden in the parameters? Thanks 😄

Stéphane Burwash

09/30/2024, 3:54 PM

Update: this is linked to the logic at https://github.com/meltano/sdk/blob/f49f25b1cd530c01d885c50ffac9656ae4ea9c9a/singer_sdk/streams/core.py#L260, where no timestamp is returned in incremental syncs.

Edgar Ramírez (Arch.dev)

09/30/2024, 3:59 PM

Hey Stéphane!

Where no timestamp is returned in incremental syncs

Hmm, is that the behavior you're seeing? I think on the contrary, we return `None` only on Full-Table syncs. Are you hardcoding the `replication_method` attribute in your streams? If not, the default replication method is derived from the presence or absence of a replication key: https://github.com/meltano/sdk/blob/f49f25b1cd530c01d885c50ffac9656ae4ea9c9a/singer_sdk/streams/core.py#L658-L660 Otherwise, the catalog override is respected: https://github.com/meltano/sdk/blob/f49f25b1cd530c01d885c50ffac9656ae4ea9c9a/singer_sdk/streams/core.py#L656-L657 The replication key also respects any overrides: https://github.com/meltano/sdk/blob/f49f25b1cd530c01d885c50ffac9656ae4ea9c9a/singer_sdk/streams/core.py#L1270-L1271

Stéphane Burwash

09/30/2024, 4:01 PM

We hardcode the replication method in our streams ✅ yes, as a default:

    name = "tasks"
    path = "/v4/tasks"
    primary_keys = ["id"]
    records_jsonpath = "$.data[*]"
    replication_method = "INCREMENTAL"
    replication_key = "updatedDate"
    schema = Tasks.schema

And wanted a quick way to modify to `FULL_TABLE` while getting the starting_timestamp from the `start_date` IF possible. Is it not best practice to hardcode the replication method?

Edgar Ramírez (Arch.dev)

09/30/2024, 4:09 PM

Gotcha. So yeah, if you hardcode the `replication_method` attribute, you won't be able to override it with the catalog because you lose these: https://github.com/meltano/sdk/blob/f49f25b1cd530c01d885c50ffac9656ae4ea9c9a/singer_sdk/streams/core.py#L656-L657
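
To illustrate why the hardcoded attribute wins, here is a minimal sketch. `BaseStream` is a simplified stand-in written for this example, not the actual `singer_sdk` code, and the `forced_replication_method` name and constructor argument are assumptions standing in for the catalog override. The idea is that a plain class attribute on a subclass shadows a property defined on the base class, so the override check never runs.

```python
class BaseStream:
    """Simplified stand-in for an SDK-style stream base class (assumption)."""

    replication_key = None

    def __init__(self, forced_replication_method=None):
        # Stand-in for a replication method coming from the catalog/metadata.
        self.forced_replication_method = forced_replication_method

    @property
    def replication_method(self):
        # 1. A catalog override wins if present.
        if self.forced_replication_method:
            return self.forced_replication_method
        # 2. Otherwise derive the method from the replication key.
        if self.replication_key:
            return "INCREMENTAL"
        return "FULL_TABLE"


class HardcodedTasks(BaseStream):
    replication_key = "updatedDate"
    # This plain attribute shadows the property above,
    # so the catalog override is never consulted.
    replication_method = "INCREMENTAL"


class FlexibleTasks(BaseStream):
    # Only the key is set; the method is left to the property.
    replication_key = "updatedDate"


hardcoded = HardcodedTasks(forced_replication_method="FULL_TABLE")
flexible = FlexibleTasks(forced_replication_method="FULL_TABLE")
default = FlexibleTasks()

print(hardcoded.replication_method)
print(flexible.replication_method)
print(default.replication_method)
```

In this sketch, dropping the hardcoded `replication_method` and keeping only `replication_key` preserves incremental as the default while still letting an override switch the stream to full-table.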

And wanted a quick way to modify to FULL_TABLE while getting the starting_timestamp from the start_date IF possible.

I believe a `--full-refresh` might give you that: state is ignored and the configured `start_date` is used. But you might've already explored that option.

Stéphane Burwash

09/30/2024, 4:10 PM

Sadly `--full-refresh` doesn't work that well in production for me 😅 so I'll check removing the hard-code, thanks!

Edgar Ramírez (Arch.dev)

09/30/2024, 4:13 PM

Sadly --full-refresh doesn't work that well in production for me

For my curiosity: is it due to constraints in your prod environment, or is it some aspect of Meltano that prohibits it? Gotta say, even if you remove the hard-code, the default behavior is to ignore `start_date` for `FULL_TABLE`, so you might wanna override `get_starting_replication_key_value`.
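
A rough sketch of what such an override could look like. The base class here is a stand-in written for this example and only assumes the behavior described in the thread (no starting value on full-table syncs). The `context` argument, the `state` dictionary layout, and the `TasksStream` name are assumptions for illustration, so check them against the SDK version you run.

```python
class BaseStream:
    """Stand-in base class; assumes FULL_TABLE syncs return no starting value."""

    replication_method = "FULL_TABLE"

    def __init__(self, config, state=None):
        self.config = config
        self.state = state or {}

    def get_starting_replication_key_value(self, context):
        # Assumed default: full-table syncs ignore state and start_date.
        if self.replication_method == "FULL_TABLE":
            return None
        return self.state.get("starting_replication_value")


class TasksStream(BaseStream):
    def get_starting_replication_key_value(self, context):
        # Keep the default behavior when it yields a value...
        value = super().get_starting_replication_key_value(context)
        if value is not None:
            return value
        # ...and otherwise fall back to the configured start_date, if any.
        return self.config.get("start_date")


stream = TasksStream(config={"start_date": "2024-01-01T00:00:00Z"})
start = stream.get_starting_replication_key_value(None)
print(start)
```

With this fallback, a full-table run would still begin from `start_date` instead of having no starting value.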

Stéphane Burwash

09/30/2024, 4:15 PM

For my curiosity: is it due to constraints in your prod environment, or is it some aspect of Meltano that prohibits it?

It requires that we update the CLI command structure when running in production, which at this time is tricky. We'll be working to make it more flexible in the future however 😄

👍 1

Stéphane Burwash

09/30/2024, 4:16 PM

Thanks for all the help! I should have all the information I need to create a half-decent production workflow then 😉

🙌 1

Edgar Ramírez (Arch.dev)

09/30/2024, 4:16 PM

It requires that we update the CLI command structure when running in production, which at this time is tricky. We'll be working to make it more flexible in the future however

Would it help if there was an env var, e.g. `MELTANO_RUN_FULL_REFRESH`, that you could use instead of the CLI flag?

Stéphane Burwash

09/30/2024, 4:17 PM

Yes that would definitely be cool for us

Edgar Ramírez (Arch.dev)

09/30/2024, 4:28 PM

https://github.com/meltano/meltano/issues/8816
