Hello! I'd love some help debugging a state issue....
# singer-tap-development
c
Hello! I'd love some help debugging a state issue. I maintain tap-sleeper (built with the SDK) and just started using it with a new Meltano project. The first time I run `meltano run tap-sleeper target-duckdb`, it all works flawlessly. On subsequent runs, however, I get the following error: `ValueError: State file contains duplicate entries for partition: {state_partition_context}`, where the matching state values are:
[
    {
        "context": {
            "current_week": 0,
            "current_season": "2024",
            "league_id": "<league_id>",
            "max_week": 17,
            "replication_week": 17,
        }
    },
    {
        "context": {
            "current_week": 0,
            "current_season": "2024",
            "league_id": "<league_id>",
            "max_week": 17,
            "replication_week": 17,
        },
        "starting_replication_value": None,
    },
]
The issue seems to be with `starting_replication_value` 🤔 Anyone know how to fix this?
e
Is `league_id` different between those two context dictionaries?
c
`league_id` is the same in both context dictionaries. Apologies, I should have made that clearer in my description!
e
Ok so I think the issue is then exactly what the exception describes. The state for the stream in question has duplicate contexts. I'd look at the generated messages on a sync without state to see if indeed the stream is synced more than once with that context.
There may be a missing identifier that needs to come from the parent class
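One way to run the check suggested above is to capture the tap's output from a sync without state and look for repeated contexts in the final STATE message. The sketch below assumes the SDK's usual state layout (`bookmarks` → stream name → `partitions`, each with a `context`). The function names and the `output.jsonl` file are hypothetical, not part of tap-sleeper or the SDK.

```python
import json
from collections import Counter


def last_state_from_output(lines) -> dict:
    """Return the value of the last STATE message in an iterable of output lines."""
    state = {}
    for line in lines:
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip log lines and anything else that is not JSON
        if isinstance(message, dict) and message.get("type") == "STATE":
            state = message.get("value", {})
    return state


def find_duplicate_contexts(state: dict) -> dict:
    """Return {stream_name: [context, ...]} for contexts that appear more than once."""
    duplicates = {}
    for stream_name, stream_state in state.get("bookmarks", {}).items():
        # Serialize with sorted keys so that equal contexts compare equal,
        # regardless of other keys such as starting_replication_value.
        counts = Counter(
            json.dumps(partition.get("context", {}), sort_keys=True)
            for partition in stream_state.get("partitions", [])
        )
        repeated = [json.loads(ctx) for ctx, n in counts.items() if n > 1]
        if repeated:
            duplicates[stream_name] = repeated
    return duplicates


# Hypothetical usage, after something like:
#   tap-sleeper --config config.json > output.jsonl
# with open("output.jsonl") as f:
#     print(find_duplicate_contexts(last_state_from_output(f)))
```

If this reports a context on the first run, the duplicate is being written by the tap itself rather than introduced when the state is read back.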
c
Thanks for the response! This sounds promising
There may be a missing identifier that needs to come from the parent class
I'm fairly confident this is user error, since I created the tap, I'm using parent/child streams, and I'm relying pretty heavily on `context`. I'll take a closer look at adding parent identifiers to the child stream. Thanks!
Hello @Edgar Ramírez (Arch.dev) - I have one more follow-up question for you. I've been thinking on your comment and looking at my code, but can't seem to identify the issue. This is the problematic stream - are you suggesting that I need to add an identifier from the parent context to each row in the child stream? Appreciate the help!
e
I need to add an identifier from the parent context to each row in the child stream?
Rather, add a parent ID to the context. But I see you're already passing `league_id` from `LeagueStream`, so it makes me think that the `league` stream is syncing some records twice. Could you share the complete state generated on the first run?
Oh, I see you shipped some state changes!
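To test the theory that the `league` stream emits some records twice, a rough sketch like the one below counts RECORD messages per key value in captured tap output. It assumes the `league` records carry a `league_id` field, which the thread does not confirm. The function name and the `output.jsonl` file are hypothetical.

```python
import json
from collections import Counter


def count_duplicate_records(lines, stream_name: str, key: str) -> dict:
    """Return {key_value: count} for key values seen more than once in a stream."""
    counts = Counter()
    for line in lines:
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip log lines and anything else that is not JSON
        if not isinstance(message, dict):
            continue
        if message.get("type") == "RECORD" and message.get("stream") == stream_name:
            counts[message.get("record", {}).get(key)] += 1
    return {value: n for value, n in counts.items() if n > 1}


# Hypothetical usage on output captured from a sync without state:
# with open("output.jsonl") as f:
#     print(count_duplicate_records(f, stream_name="league", key="league_id"))
```

An empty result would suggest the parent stream is not the source of the repeated context, and the child stream's own context handling is the next place to look.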
c
Indeed, I gave it a try, but I'm still not sure it's quite right 🤔 I'm going to inspect the `league` stream a bit closer to exhaustively verify that it is not syncing some records twice. If I'm still having trouble, I'll share the state with you. Thank you!