# troubleshooting
s
Hey everyone 👋 Happy Monday! Question regarding the `metadata` parameter. Currently, we're trying to make replication flexible on our taps (full-table or incremental, depending on the use case). As mentioned in the docs, this can be done by using the `metadata` parameter. However, we found that within the tap stream definition itself, the `replication_method` parameter took precedence over the `metadata` config, meaning that we could not set a default `replication-method` and `replication-key`. My question is: is it possible to set a default replication-method & replication-key which could then be overridden in the parameters? Thanks 😄
Update: this is linked to the logic at https://github.com/meltano/sdk/blob/f49f25b1cd530c01d885c50ffac9656ae4ea9c9a/singer_sdk/streams/core.py#L260, where no timestamp is returned in incremental syncs
e
Hey Stéphane!
Where no timestamp is returned in incremental syncs
Hmm, is that the behavior you're seeing? I think on the contrary, we return `None` only on Full-Table syncs. Are you hardcoding the `replication_method` attribute in your streams? If not, the default replication method is derived from the presence or absence of a replication key: https://github.com/meltano/sdk/blob/f49f25b1cd530c01d885c50ffac9656ae4ea9c9a/singer_sdk/streams/core.py#L658-L660
Otherwise, the catalog override is respected: https://github.com/meltano/sdk/blob/f49f25b1cd530c01d885c50ffac9656ae4ea9c9a/singer_sdk/streams/core.py#L656-L657
The replication key also respects any overrides: https://github.com/meltano/sdk/blob/f49f25b1cd530c01d885c50ffac9656ae4ea9c9a/singer_sdk/streams/core.py#L1270-L1271
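The precedence described here can be sketched roughly as below. This is a simplified, stand-alone model written for illustration, not the SDK's actual code: `resolve_replication_method` and its parameter names are hypothetical, and it assumes the stream class does not hardcode `replication_method`.

```python
from typing import Optional

REPLICATION_FULL_TABLE = "FULL_TABLE"
REPLICATION_INCREMENTAL = "INCREMENTAL"


def resolve_replication_method(
    catalog_override: Optional[str],
    replication_key: Optional[str],
) -> str:
    """Hypothetical helper mirroring the precedence described in the thread."""
    # A replication method forced through the catalog metadata wins if present.
    if catalog_override:
        return catalog_override
    # Otherwise the method is inferred from whether a replication key is set.
    if replication_key:
        return REPLICATION_INCREMENTAL
    return REPLICATION_FULL_TABLE
```

With a replication key and no override this yields `INCREMENTAL`; adding a `FULL_TABLE` override in the catalog flips the result without touching the stream class.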
s
We hardcode the replication method in our streams, yes, as a default:
    name = "tasks"
    path = "/v4/tasks"
    primary_keys = ["id"]
    records_jsonpath = "$.data[*]"
    replication_method = "INCREMENTAL"
    replication_key = "updatedDate"
    schema = Tasks.schema
And we wanted a quick way to switch to `FULL_TABLE` while getting the starting_timestamp from the `start_date`, if possible. Is it not best practice to hardcode the replication method?
e
Gotcha. So yeah, if you hardcode the `replication_method` attribute, you won't be able to override it with the catalog because you lose these: https://github.com/meltano/sdk/blob/f49f25b1cd530c01d885c50ffac9656ae4ea9c9a/singer_sdk/streams/core.py#L656-L657
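To illustrate why the hard-coded attribute wins, here is a toy model in plain Python. `BaseStream` is a stand-in written for this example, not the real `singer_sdk` class, and the `forced_replication_method` attribute is assumed to be where a catalog override would land.

```python
class BaseStream:
    """Toy stand-in for an SDK stream base class (not the real singer_sdk.Stream)."""

    replication_key = None
    forced_replication_method = None  # assumed to be filled from catalog metadata

    @property
    def replication_method(self):
        # Catalog override first, then inference from the replication key.
        if self.forced_replication_method:
            return self.forced_replication_method
        return "INCREMENTAL" if self.replication_key else "FULL_TABLE"


class HardcodedStream(BaseStream):
    replication_key = "updatedDate"
    # A plain class attribute replaces the inherited property entirely,
    # so the override logic above never runs for this stream.
    replication_method = "INCREMENTAL"


class FlexibleStream(BaseStream):
    # Only the key is declared; the method is inferred and stays overridable.
    replication_key = "updatedDate"


hardcoded = HardcodedStream()
hardcoded.forced_replication_method = "FULL_TABLE"

flexible = FlexibleStream()
flexible.forced_replication_method = "FULL_TABLE"
```

In this sketch `hardcoded.replication_method` stays `"INCREMENTAL"` despite the override, while `flexible.replication_method` becomes `"FULL_TABLE"`.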
And wanted a quick way to modify to `FULL_TABLE` while getting the starting_timestamp from the `start_date` IF possible.
I believe a `--full-refresh` might give you that: state is ignored and the configured `start_date` is used. But you might've already explored that option.
s
Sadly `--full-refresh` doesn't work that well in production for me 😅 so I'll check removing the hard-code, thanks!
e
Sadly `--full-refresh` doesn't work that well in production for me
For my curiosity: is it due to constraints in your prod environment, or is it some aspect of Meltano that prohibits it? Gotta say, even if you remove the hard-code, the default behavior is to ignore `start_date` for `FULL_TABLE`, so you might wanna override `get_starting_replication_key_value`.
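A rough sketch of what such an override could look like. The base class here is a toy stand-in so the example runs on its own; the real SDK method reads from stored state and its exact signature may differ. The `start_date` config key and the `TasksStream` name are taken from the thread; everything else is an assumption for illustration.

```python
from typing import Optional


class BaseStream:
    """Toy stand-in; the real SDK resolves this value from stored state."""

    def __init__(self, config: dict, state_value: Optional[str] = None):
        self.config = config
        self._state_value = state_value

    def get_starting_replication_key_value(
        self, context: Optional[dict]
    ) -> Optional[str]:
        # Returns None when there is no bookmark (e.g. a full-table sync).
        return self._state_value


class TasksStream(BaseStream):
    def get_starting_replication_key_value(
        self, context: Optional[dict]
    ) -> Optional[str]:
        # Prefer whatever the base class resolves; otherwise fall back
        # to the configured start_date so full-table runs still get a bound.
        value = super().get_starting_replication_key_value(context)
        if value is not None:
            return value
        return self.config.get("start_date")
```

For example, `TasksStream({"start_date": "2023-01-01T00:00:00Z"}).get_starting_replication_key_value(None)` returns the configured start date when no state value exists.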
s
For my curiosity: is it due to constraints in your prod environment, or is it some aspect of Meltano that prohibits it?
It requires that we update the CLI command structure when running in production, which at this time is tricky. We'll be working to make it more flexible in the future however 😄
Thanks for all the help! I should have all the information I need to create a half-decent production workflow then 😉
e
It requires that we update the CLI command structure when running in production, which at this time is tricky. We'll be working to make it more flexible in the future however
Would it help if there was an env var, e.g. `MELTANO_RUN_FULL_REFRESH`, that you could use instead of the CLI flag?
s
Yes that would definitely be cool for us