# troubleshooting
p
Hi, I'm facing an error with a child stream in `tap-jira`. While the parent stream returns unique records, the child stream is fetching duplicate records. How can I tackle this?
r
Can you share some logs or the code?
p
Here you can see, from the URL, that the record is duplicated.
r
You might have to filter duplicates out in `parse_response` if the parent or child stream is erroneously returning multiple records with the same ID, although you might want to consider why this is happening (e.g. is the API response wrong? Is a comment with the same ID as another actually an edit of the same comment?).
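Roughly speaking (this is just a sketch, not the actual tap code), deduplicating by `id` in `parse_response` on a Singer SDK stream could look like:

```python
from typing import Iterable

import requests
from singer_sdk.helpers.jsonpath import extract_jsonpath


class IssueComments(JiraStream):
    # ... stream attributes (name, path, schema, records_jsonpath, ...) ...

    def parse_response(self, response: requests.Response) -> Iterable[dict]:
        """Yield each comment at most once per response, keyed on its ``id``."""
        seen_ids: set[str] = set()
        for record in extract_jsonpath(self.records_jsonpath, input=response.json()):
            if record["id"] in seen_ids:
                continue  # drop duplicates within this response
            seen_ids.add(record["id"])
            yield record
        # Note: duplicates that span *separate* requests (i.e. pagination) would
        # need a set kept on the stream instance rather than a local one.
```

That only papers over the symptom, though; it's worth finding out why the duplicates appear in the first place.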
p
Thanks for the reply! The parent stream is working alright, with non-duplicated records. The API endpoint is also giving the right response, but the child stream is duplicating records.
r
Can you share your child stream class then?
p
```python
class IssueComments(JiraStream):
    """Comments on a Jira issue (child stream of IssueStream).

    https://developer.atlassian.com/cloud/jira/platform/rest/v3/api-group-issue-comments/#api-rest-api-3-issue-issueidorkey-comment-get

    name: stream name
    path: path which will be added to the API URL in client.py
    schema: stream schema
    primary_keys: primary keys for the table
    replication_key: datetime key for replication
    records_jsonpath: JSONPath to the records in the response body
    """

    name = "issue_comments"

    parent_stream_type = IssueStream

    ignore_parent_replication_keys = True

    path = "/issue/{issue_id}/comment"

    primary_keys = ["id"]

    records_jsonpath = "$[comments][*]"

    instance_name = "comments"

    schema = PropertiesList(
        Property("id", StringType),
        Property("issueId", StringType),
        Property("self", StringType),
        Property(
            "author",
            ObjectType(
                Property("accountId", StringType),
                Property("self", StringType),
                Property("displayName", StringType),
                Property("active", BooleanType),
            ),
        ),
        Property("created", DateTimeType),
        Property("updated", DateTimeType),
        Property(
            "body",
            ObjectType(
                Property("type", StringType),
                Property("version", IntegerType),
                Property(
                    "content",
                    ArrayType(
                        ObjectType(
                            Property("type", StringType),
                            Property(
                                "content",
                                ArrayType(
                                    ObjectType(
                                        Property("type", StringType),
                                        Property("text", StringType),
                                    )
                                ),
                            ),
                        )
                    ),
                ),
            ),
        ),
        Property(
            "updateAuthor",
            ObjectType(
                Property("accountId", StringType),
                Property("self", StringType),
                Property("displayName", StringType),
                Property("active", BooleanType),
            ),
        ),
    ).to_dict()

    def post_process(self, row: dict, context: dict) -> dict:
        # Attach the parent issue's ID from the child context to each comment record.
        row["issueId"] = context["issue_id"]
        return row
```
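(Aside: the `{issue_id}` placeholder in `path` and the `context["issue_id"]` lookup in `post_process` are filled in by the parent stream's child context. A minimal sketch of what that looks like on the parent side, assuming the issue records expose an `id` field; the actual `IssueStream` isn't shown in this thread:)

```python
from typing import Optional


class IssueStream(JiraStream):
    """Parent stream (illustrative sketch; the real class is not shown here)."""

    name = "issues"
    primary_keys = ["id"]

    def get_child_context(self, record: dict, context: Optional[dict]) -> dict:
        # The returned dict is passed to child streams as `context`: it fills the
        # `{issue_id}` placeholder in the child's `path` and is what the child's
        # `post_process` reads via context["issue_id"].
        return {"issue_id": record["id"]}
```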
I also observed now that the 1st comment appears once, the 2nd comment is duplicated twice, the 3rd comment three times, and so on. I think something is wrong in the child stream itself, but I don't have that much clarity around Singer.
Here you can see the behaviour:
640851 -- occurs once (1st comment)
641018 -- occurs twice (2nd comment)
641099 -- occurs thrice (3rd comment)
r
Can I see the `JiraStream` implementation too?
p
The same behaviour occurs across all the child streams.
r
I wonder if it's something to do with this: https://github.com/prakharcode/tap-jira/blob/0eaac5921dac8ef2efbf300305f0cc29b8dc95ae/tap_jira/client.py#L89-L90. Perhaps the `get_next_page_token` logic is causing the increasing duplication. I would have a look into that in more detail.
p
I'm also exploring that, essentially.
Will update, but thanks for the time, much appreciated.
r
No problem! Good luck! 😁
p
solved it
Indeed, the problem was the pagination logic: by default the API returns 100 records per page, but the pagination logic advanced per record rather than per page, so it made one request per object, each of which re-fetched largely the same 100 records, over and over until the end of the page.
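For anyone who hits the same thing, here's a rough sketch of page-based (offset) pagination using the `startAt`, `maxResults`, and `total` fields Jira returns for this endpoint; the names and structure are illustrative, not the exact code from the repo:

```python
from typing import Any, Optional

import requests
from singer_sdk.streams import RESTStream


class JiraStream(RESTStream):
    """Base stream sketch showing offset pagination that advances a full page at a time."""

    def get_next_page_token(
        self, response: requests.Response, previous_token: Optional[Any]
    ) -> Optional[int]:
        # Advance by a whole page (maxResults), not by a single record.
        data = response.json()
        start_at = data.get("startAt", 0)
        max_results = data.get("maxResults", 100)
        total = data.get("total", 0)
        next_start = start_at + max_results
        return next_start if next_start < total else None

    def get_url_params(
        self, context: Optional[dict], next_page_token: Optional[int]
    ) -> dict:
        params: dict = {"maxResults": 100}
        if next_page_token:
            params["startAt"] = next_page_token
        return params
```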
thanks for the direction :)
r
Nice! Glad you figured it out. 😄