Hi All, I'm wondering if it is possible to resume a job from Meltano #troubleshooting

drew_ipson

12/09/2021, 4:42 PM

Hi All, I’m wondering if it is possible to resume a job from the latest recorded state. A pod died in the middle of a job run, and I’m looking at the state recorded in the system db, which still shows the job status as running. I know I can pass a state file for a job run using the latest state, but I’m unsure about the structure of the state that was recorded in the system db. My state file would look like this:

{
  "singer_state": {
    "bookmarks": {
      "public-cost_usage_info": {
        "last_replication_method": "FULL_TABLE",
        "version": 1638986978643,
        "xmin": null
      },
      "public-document_status": {
        "last_replication_method": "INCREMENTAL",
        "replication_key": "status_time",
        "version": 1638986979029,
        "replication_key_value": "2021-12-08T18:09:38.997474+00:00"
      },
      "public-export_job_status": {
        "last_replication_method": "FULL_TABLE",
        "version": 1638993360229,
        "xmin": null
      },
      "public-job_run_status": {
        "last_replication_method": "FULL_TABLE",
        "version": 1638993360541,
        "xmin": null
      },
      "public-post_step_info": {
        "last_replication_method": "FULL_TABLE",
        "version": 1638993361040,
        "xmin": null
      },
      "public-query_job_status": {
        "last_replication_method": "FULL_TABLE",
        "version": 1638993361335,
        "xmin": null
      },
      "public-step_info": {
        "last_replication_method": "FULL_TABLE",
        "version": 1638993361635,
        "xmin": 5978580
      }
    },
    "currently_syncing": "public-step_info"
  }
}

My question is focused on the `public-step_info` table. The pod died with this table marked as `currently_syncing`, but its replication method is still FULL_TABLE. Is the `version` field a timestamp of where it left off with the last recorded batch upload? Or is the `xmin` field used to calculate the last upload time? Will it resume from there? It is a large table and I would hate to have to do a refresh, but I also don’t want to duplicate data.

Context: tap-postgres, target-snowflake.

Any help on the logic regarding state would be extremely helpful!
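A minimal sketch for reading a payload like the one above, offered as an aid rather than a description of how Meltano or the tap works. It assumes, without confirmation in this thread, that `version` is a millisecond Unix timestamp (the values decode to 8 December 2021, the same day as the `replication_key_value` shown) and that a non-null `xmin` marks a full-table sync that had not finished. The function name `summarize_state` is hypothetical.

```python
from datetime import datetime, timezone


def summarize_state(payload):
    """Return one summary dict per stream bookmark in a job payload."""
    # Accept either the wrapped payload ({"singer_state": {...}}) or a bare state.
    state = payload.get("singer_state", payload)
    current = state.get("currently_syncing")
    summary = []
    for stream, bookmark in sorted(state.get("bookmarks", {}).items()):
        version = bookmark.get("version")
        # ASSUMPTION: version is milliseconds since the Unix epoch.
        version_utc = (
            datetime.fromtimestamp(version / 1000, tz=timezone.utc)
            if version is not None
            else None
        )
        summary.append({
            "stream": stream,
            "method": bookmark.get("last_replication_method"),
            "version_utc": version_utc,
            # ASSUMPTION: a non-null xmin means the full-table sync was interrupted.
            "interrupted": bookmark.get("xmin") is not None,
            "currently_syncing": stream == current,
        })
    return summary


if __name__ == "__main__":
    sample = {
        "singer_state": {
            "bookmarks": {
                "public-step_info": {
                    "last_replication_method": "FULL_TABLE",
                    "version": 1638993361635,
                    "xmin": 5978580,
                },
            },
            "currently_syncing": "public-step_info",
        }
    }
    for row in summarize_state(sample):
        print(row)
```

Under those assumptions, `public-step_info` is the only stream flagged as interrupted, which matches it being the `currently_syncing` stream when the pod died.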

drew_ipson

12/09/2021, 7:21 PM

I was able to pass in the state file per the documentation and it appears to have picked up from the right place. If there are any explanations on how the state logic works, that would still be appreciated! 😁

edgar_ramirez_mondragon

12/09/2021, 7:59 PM

Hi @drew_ipson! Thanks for sharing and letting us know the kludge worked for you. We are certainly aware that our current state docs are a bit lacking in detail. We do have an issue to improve that, so feel free to leave your 👍, comments, or even an MR to the docs (it would be nice to let people know of this workaround).

drew_ipson

12/09/2021, 8:11 PM

@edgar_ramirez_mondragon Thank you for this. I’d be happy to contribute. Do you know how the xmin field and version field are used to calculate where the tap should start?
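
The thread does not answer this question. As a hedged sketch only, here is one way the decision could be modeled, based on a recollection of tap-postgres's full-table sync code rather than anything confirmed above: a bookmarked `xmin` signals an interrupted run, so the tap keeps the existing `version` and continues from that `xmin`; with no `xmin`, it starts a fresh full-table pass under a new `version`. The function `plan_full_table_sync` and its return shape are hypothetical and are not part of the tap's API. Check the source of the tap version you run before relying on this.

```python
import time


def plan_full_table_sync(bookmark, now_ms=None):
    """Model how a FULL_TABLE stream might choose between resuming and restarting."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    xmin = bookmark.get("xmin")
    if xmin is not None:
        # ASSUMPTION: an xmin bookmark means the previous run was interrupted,
        # so the same table version is reused and reading continues from xmin.
        return {"mode": "resume", "version": bookmark.get("version"), "from_xmin": xmin}
    # ASSUMPTION: without an xmin, every run is a fresh full-table pass
    # tagged with a new millisecond-timestamp version.
    return {"mode": "fresh", "version": now_ms, "from_xmin": None}


if __name__ == "__main__":
    print(plan_full_table_sync({"version": 1638993361635, "xmin": 5978580}))
    print(plan_full_table_sync({"version": 1638993360541, "xmin": None}))
```

In this model, `version` identifies a whole table snapshot rather than a position within it, and `xmin` is the value that carries the resume position.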
