# troubleshooting
j
Hi all, I'm currently using my custom tap-vtex (from the Vtex API) extractor in my Meltano project and I've encountered an issue with saving state. It appears that the state is only saved at the end of a successful execution. However, when simulating a network failure by disconnecting my PC from the network, the state isn't saved, and the next run starts from the beginning. It's worth mentioning that I'm using a System Database (PostgreSQL with Docker). My tap has the state set in the capabilities, and I'm also using the following properties in my tap:
STATE_MSG_FREQUENCY = 10
is_sorted = True
I'm not utilizing the `get_starting_replication_key_value` method, as I'm filtering directly in the params with a date from a configuration file:
params: dict = {
    "f_creationDate": f"creationDate:[{start_date} TO {end_date}]",
    "orderBy": f"{self.replication_key},asc",
}
Although my Meltano project is printing STATE messages, no state is being saved:
{
  "type": "STATE",
  "value": {
    "bookmarks": {
      "orders": {
        "replication_key_signpost": "2024-02-08T18:43:24.747564+00:00",
        "starting_replication_value": "2023-01-01",
        "replication_key": "creationDate",
        "replication_key_value": "2023-01-12T03:13:20.0000000+00:00"
      }
    }
  }
}
I'm using this command to run it locally:
meltano --log-level=debug el tap-vtex target-jsonl --state-id=tap-vtex-to-jsonl
Any insights or suggestions on resolving this issue would be greatly appreciated. Thanks!
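For reference, the value a later run would need to resume from is the `replication_key_value` inside the STATE payload shown above. The snippet below is only a minimal sketch of where that bookmark lives in the message; `bookmark_value` is a hypothetical helper written for illustration, not part of Meltano or the Singer SDK.

```python
from typing import Any, Dict, Optional


def bookmark_value(state_message: Dict[str, Any], stream: str) -> Optional[str]:
    """Return the stream's replication_key_value, or None if it has no bookmark.

    Hypothetical helper: it only walks the STATE message structure shown above.
    """
    bookmarks = state_message.get("value", {}).get("bookmarks", {})
    return bookmarks.get(stream, {}).get("replication_key_value")


# The STATE message from the post above, as a Python dict.
state_message = {
    "type": "STATE",
    "value": {
        "bookmarks": {
            "orders": {
                "replication_key_signpost": "2024-02-08T18:43:24.747564+00:00",
                "starting_replication_value": "2023-01-01",
                "replication_key": "creationDate",
                "replication_key_value": "2023-01-12T03:13:20.0000000+00:00",
            }
        }
    },
}

resume_from = bookmark_value(state_message, "orders")
```

Note that `starting_replication_value` is still the configured `2023-01-01`, while `replication_key_value` has advanced; the latter is the one that matters for resuming.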
e
hi @John Doe! what do the `capabilities` of your tap look like?
j
Hey @Edgar Ramírez (Arch.dev), are you referring to the meltano.yml configuration for the tap? If so, here's the relevant section:
plugins:
  extractors:
  - name: "tap-vtex"
    namespace: "tap_vtex"
    pip_url: -e .
    capabilities:
    - state
    - catalog
    - discover
e
Yeah. I see `state` is there and I also see you mentioned it above, sorry about that 😅. If you're not using `get_starting_replication_key_value`, how are you ensuring the tap uses the latest bookmark in the next sync?
j
That's a good question. I understood that the method only served to build filters for the API (in my case, the start_date in f_creationDate). Let me add it, test it, and let you know what happens 😅
I hope I can explain why I was not using the `get_starting_replication_key_value` method, since it comes down to a very particular quirk of how the Vtex API works. If we search a date range (for example, 2023-01-01 to 2023-06-30) and it returns more than 30 pages, we have to replace the start_date with the date of the last record on page 30, since the API only allows retrieving 30 pages; in this way we "subtract" the pages we have left. Taking this into consideration, the `get_starting_replication_key_value` method always returns the date we initially used, which causes the error InvalidStreamSortException. That's why I was avoiding it, since I don't see a way for it to return the date I really need at that moment. Anyway, I've attached an image of what it looks like; I hope that explains a little more. Even without the method, I can see from the STATE message that it is taking the date of the record preceding the message, but I'm not sure if that's correct. This is a very particular situation haha 😪
e
Ah gotcha. Yeah, date range pagination has been consistently troublesome and I've yet to come up with the right abstractions to make devs' lives easier. Maybe you can compare `starting_date` and `self.current_date` and use the greatest?
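One way to read that suggestion is sketched below: take whichever is later, the saved bookmark or the current window start, and use it as the lower bound of the date filter. `parse_ts` and `effective_start` are hypothetical helpers, not Singer SDK functions, and treating timezone-less values as UTC is an assumption of this sketch.

```python
import re
from datetime import datetime, timezone
from typing import Optional


def parse_ts(value: str) -> datetime:
    """Parse an ISO-8601 date or timestamp into an aware datetime.

    Fractional seconds are trimmed to 6 digits, since the API returns 7
    (e.g. "2023-01-12T03:13:20.0000000+00:00"). Naive values are assumed UTC.
    """
    trimmed = re.sub(r"(\.\d{6})\d+", r"\1", value)
    parsed = datetime.fromisoformat(trimmed)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def effective_start(bookmark: Optional[str], window_start: str) -> datetime:
    """Return the later of the saved bookmark and the current window start."""
    candidates = [parse_ts(window_start)]
    if bookmark:
        candidates.append(parse_ts(bookmark))
    return max(candidates)


# First run: no bookmark yet, so the configured start date is used.
first_run = effective_start(None, "2023-01-01")

# Later run: the bookmark from the STATE message is newer, so it wins.
resumed = effective_start("2023-01-12T03:13:20.0000000+00:00", "2023-01-01")
```

In the tap, the bookmark argument would presumably come from `get_starting_replication_key_value`, and the window start from the value the pagination logic has advanced to.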
j
Hey @Edgar Ramírez (Arch.dev)! I did what you said and it looks like we have good news 👀 Thanks!!