John Doe
02/08/2024, 7:52 PM
STATE_MSG_FREQUENCY = 10
is_sorted = True
I'm not using the get_starting_replication_key_value method, since I'm filtering directly in the params with a date from a configuration file:
params: dict = {
    "f_creationDate": f"creationDate:[{start_date} TO {end_date}]",
    "orderBy": f"{self.replication_key},asc",
}
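A minimal sketch of the same params pulled into a standalone helper, so the window bounds are arguments rather than fixed config values (which matters once the start date has to move, as discussed later in the thread). The name build_order_params is hypothetical; the filter and orderBy strings are copied from the snippet above.

```python
from typing import Dict


def build_order_params(replication_key: str, start_date: str, end_date: str) -> Dict[str, str]:
    """Hypothetical helper: VTEX order-list query params for one date window."""
    return {
        # Date-range filter on the order creation date, same syntax as above.
        "f_creationDate": f"creationDate:[{start_date} TO {end_date}]",
        # Ascending order on the replication key, matching is_sorted = True.
        "orderBy": f"{replication_key},asc",
    }


print(build_order_params("creationDate", "2023-01-01", "2023-06-30"))
```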
Although my Meltano project is printing STATE messages, no state is being saved:
{
  "type": "STATE",
  "value": {
    "bookmarks": {
      "orders": {
        "replication_key_signpost": "2024-02-08T18:43:24.747564+00:00",
        "starting_replication_value": "2023-01-01",
        "replication_key": "creationDate",
        "replication_key_value": "2023-01-12T03:13:20.0000000+00:00"
      }
    }
  }
}
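For reference, the resume point in a message like this is the replication_key_value under the stream's bookmark. The sketch below (hypothetical helper bookmark_for, standard-library json only) just extracts that field from one STATE line, using the message above as sample input; it does not save or persist anything.

```python
import json
from typing import Optional

# The STATE message above, as a single line like the tap writes to stdout.
STATE_LINE = (
    '{"type": "STATE", "value": {"bookmarks": {"orders": {'
    '"replication_key_signpost": "2024-02-08T18:43:24.747564+00:00", '
    '"starting_replication_value": "2023-01-01", '
    '"replication_key": "creationDate", '
    '"replication_key_value": "2023-01-12T03:13:20.0000000+00:00"}}}}'
)


def bookmark_for(state_line: str, stream: str) -> Optional[str]:
    """Return the bookmarked replication_key_value for `stream`, if present."""
    message = json.loads(state_line)
    if message.get("type") != "STATE":
        return None
    bookmarks = message.get("value", {}).get("bookmarks", {})
    return bookmarks.get(stream, {}).get("replication_key_value")


print(bookmark_for(STATE_LINE, "orders"))
```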
I'm running it locally with: meltano --log-level=debug el tap-vtex target-jsonl --state-id=tap-vtex-to-jsonl
Any insights or suggestions on resolving this issue would be greatly appreciated. Thanks!
Edgar Ramírez (Arch.dev)
02/08/2024, 8:27 PM
What do the capabilities of your tap look like?
John Doe
02/08/2024, 8:31 PM
plugins:
  extractors:
  - name: "tap-vtex"
    namespace: "tap_vtex"
    pip_url: -e .
    capabilities:
    - state
    - catalog
    - discover
Edgar Ramírez (Arch.dev)
02/08/2024, 8:34 PM
The state capability is there, and I also see you mentioned it above. Sorry about that 😅.
If you're not using get_starting_replication_key_value, how are you ensuring the tap uses the latest bookmark in the next sync?
John Doe
02/08/2024, 8:53 PM
John Doe
02/08/2024, 10:33 PM
I'm not using the get_starting_replication_key_value method because of a very particular aspect of how the VTEX API works. If we search a date range (for example, 2023-01-01 to 2023-06-30) and it returns more than 30 pages, we have to replace start_date with the date of the last record on page 30, since the API only allows retrieving 30 pages. This way we "subtract" the pages we have already read from what is left.
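The window-shifting approach described above could look roughly like this. It is a sketch under stated assumptions, not the tap's actual code: fetch_page stands in for the real VTEX request, orderId is assumed to be a unique key on each order, and the 30-page cap is a parameter so the logic can be exercised with small fake data. Because each new window starts at the last record's creationDate, that record comes back on the next window's first page, so already-emitted ids are skipped.

```python
from typing import Callable, Dict, Iterator, List, Optional, Set

MAX_PAGES = 30  # page cap per query, as described above

Record = Dict[str, str]
# Hypothetical page fetcher: (start_date, end_date, page_number) -> records,
# assumed sorted ascending by creationDate (the orderBy param above).
FetchPage = Callable[[str, str, int], List[Record]]


def iter_orders(
    fetch_page: FetchPage,
    start_date: str,
    end_date: str,
    max_pages: int = MAX_PAGES,
) -> Iterator[Record]:
    """Yield every order in the date range, shifting the window at the page cap."""
    seen: Set[str] = set()  # orderIds already emitted (assumed unique per order)
    while True:
        last_record: Optional[Record] = None
        for page in range(1, max_pages + 1):
            records = fetch_page(start_date, end_date, page)
            if not records:
                return  # window exhausted before reaching the cap: done
            for record in records:
                if record["orderId"] not in seen:  # boundary record reappears
                    seen.add(record["orderId"])
                    yield record
            last_record = records[-1]
        assert last_record is not None  # max_pages >= 1 and every page was non-empty
        # Cap reached: restart the window at the last record of the last page.
        new_start = last_record["creationDate"]
        if new_start == start_date:
            # More than max_pages of orders share one timestamp; cannot advance.
            raise RuntimeError(f"window stuck at {start_date}")
        start_date = new_start


# Fake data: 7 orders, 2 per page, cap of 2 pages, to exercise the shift.
DATA = [{"orderId": f"o{i}", "creationDate": f"2023-01-0{i}"} for i in range(1, 8)]


def fake_fetch(start: str, end: str, page: int) -> List[Record]:
    window = [r for r in DATA if start <= r["creationDate"] <= end]
    return window[(page - 1) * 2 : page * 2]


orders = list(iter_orders(fake_fetch, "2023-01-01", "2023-01-31", max_pages=2))
print([o["orderId"] for o in orders])  # ['o1', 'o2', 'o3', 'o4', 'o5', 'o6', 'o7']
```

If more than a full cap's worth of orders share a single creationDate, the start cannot move forward, so the sketch raises instead of looping forever.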
With this in mind, get_starting_replication_key_value always returns the date we initially used, which raises InvalidStreamSortException. That's why I was avoiding it: I don't see a way to make it return the date I actually need at that moment. Anyway, I've attached an image of what it looks like; I hope that explains it a bit more.
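One plausible reading of why reusing the initial date clashes with is_sorted = True: a stream that promises ascending replication keys must never emit a value lower than one it has already emitted, and restarting a window from the original start date does exactly that. The illustration below is simplified; InvalidStreamSortError is a local stand-in, not the SDK's class, and the check is not the SDK's implementation.

```python
from typing import Iterable, Optional


class InvalidStreamSortError(Exception):
    """Local stand-in for the SDK's InvalidStreamSortException."""


def check_sorted(values: Iterable[str]) -> Optional[str]:
    """Raise if replication-key values ever decrease; return the final bookmark.

    Values are compared as strings, which orders ISO-8601 timestamps correctly
    only when they all share the same format.
    """
    bookmark: Optional[str] = None
    for value in values:
        if bookmark is not None and value < bookmark:
            raise InvalidStreamSortError(f"{value} is older than bookmark {bookmark}")
        bookmark = value
    return bookmark


# Window moved forward to the last record of the capped page: still ascending.
print(check_sorted(["2023-01-10", "2023-01-12", "2023-01-12", "2023-01-15"]))  # 2023-01-15

# Next window restarted from the original start date: values go backwards.
try:
    check_sorted(["2023-01-10", "2023-01-12", "2023-01-01"])
except InvalidStreamSortError as exc:
    print("raised:", exc)  # raised: 2023-01-01 is older than bookmark 2023-01-12
```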
Even without the method, the STATE message shows that it's bookmarking the date of the record emitted just before the message, but I'm not sure that's correct.
This is a very particular situation haha 😪
Edgar Ramírez (Arch.dev)
02/09/2024, 5:05 AM
Could you compare starting_date and self.current_date and use the greater one?
John Doe
02/09/2024, 3:54 PM
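Edgar's suggestion (compare the two dates and resume from the later one) could be sketched as follows. The roles are assumptions, since the tap's code isn't shown: starting_date stands for what get_starting_replication_key_value returns (the bookmark, or the configured start on a first run) and current_date for the window start the tap moves forward after page 30. The helper names parse_ts and pick_window_start are hypothetical. Both values are parsed into timezone-aware datetimes first, because the config date (2023-01-01) and the VTEX timestamps (seven fractional digits) don't share a format.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# VTEX timestamps carry 7 fractional digits; trim to the 6 that datetime supports.
_EXTRA_FRACTION = re.compile(r"(\.\d{6})\d+")


def parse_ts(value: str) -> datetime:
    """Parse an ISO-8601 date or timestamp into a timezone-aware datetime.

    Date-only strings (like the config start date) become midnight; naive
    values are assumed to be UTC.
    """
    parsed = datetime.fromisoformat(_EXTRA_FRACTION.sub(r"\1", value))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def pick_window_start(starting_date: str, current_date: Optional[str]) -> datetime:
    """Return the later of the starting date and the in-memory window start."""
    start = parse_ts(starting_date)
    if current_date is None:
        return start
    return max(start, parse_ts(current_date))


print(pick_window_start("2023-01-01", "2023-01-12T03:13:20.0000000+00:00").isoformat())
```

Taking the maximum means a stale starting date can never pull the window backwards, which is the situation that trips the sorted-stream check.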