Somewhat urgent question For child streams does an increment Meltano #troubleshooting

fred_reimer

04/08/2022, 9:31 PM

Somewhat urgent question: For child streams, does an incremental sync continue to query child streams if the parent stream query no longer returns the record that would produce a context from get_child_context? We have deleted an account, and it is not returned by the API anymore, hence get_child_context is not even called, let alone the tap returning anything, as the record is simply not there. However, on incrementals for a child stream, it is still querying for that old account_id. I checked in the meltano DB, and in the job table the account_id is still in the state info (in the payload field/column). Do we really have to manually edit the job table to remove old account_ids?

aaronsteers

04/08/2022, 10:40 PM

Hi, @fred_reimer. This is an interesting use case. Is this a publicly available tap I could look at? What comes to mind is this line of code, which falls back to `partitions` if `context` is not set. The `partitions` list is seeded from the last `STATE` message, so the behavior you describe would make sense, but only if `context` is missing or empty. That said, as long as you still have the parent-child relationship intact, I don't know why `context` from the parent would not be used.

context_list = [context] if context is not None else self.partitions
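To make the fallback concrete, here is a minimal standalone sketch of that selection logic. It does not import the SDK. `resolve_contexts` and the sample account IDs are hypothetical names used only for illustration, and the sketch models just the one quoted line.

```python
from typing import Optional


def resolve_contexts(context: Optional[dict], partitions: Optional[list]) -> Optional[list]:
    """Mimic the quoted line: use the given context, else fall back to partitions."""
    return [context] if context is not None else partitions


# Partitions as they might be seeded from a previous STATE message (made-up values).
stale_partitions = [{"account_id": "A1"}, {"account_id": "A2"}]

# Called with a parent-supplied context: only that one account is synced.
with_parent = resolve_contexts({"account_id": "A1"}, stale_partitions)

# Called without a context: every partition remembered in STATE is synced,
# including accounts the parent no longer returns.
without_parent = resolve_contexts(None, stale_partitions)
```

If the child sync were ever entered without a context, the second case would reproduce the symptom described above, which is why whether `context` is actually being passed matters here.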

aaronsteers

04/08/2022, 10:45 PM

Would be helpful to look at the code, but short of that, can you help me understand the parent-child relationship and whether you have set any value for `ignore_parent_replication_keys` and/or `state_partitioning_keys`? I would not expect the old `STATE` partitions to be cleaned out, but I also would not expect the parent's children to be continually queried when the parent does not exist.

aaronsteers

04/08/2022, 10:47 PM

If the parent count is not very large, you can avoid partition-level bookmarks by setting `state_partitioning_keys` to a higher-level granularity or to `[]` to track just a single stream per key.

fred_reimer

04/08/2022, 11:07 PM

The tap is not public, but it's not particularly proprietary. Just a tap for a SAAS solution that we utilize as a customer/partner. Basic structure is:
• accounts stream
  ◦ primary_keys ["account_id"]
  ◦ get_child_context - return {"account_id": record["account_id"]}
• account_info stream
  ◦ parent_stream_type accounts stream
  ◦ ignore_parent_replication_keys True
  ◦ primary_keys ["account_id"]
  ◦ replication_key "timestamp"
  ◦ uses context to access account_id to make queries, set account_id in record, etc.

So it's all fairly straightforward. The count is not very large now; accounts is maybe a dozen or two, but it will grow (not to thousands). We are not doing anything fancy here. It's just that when accounts no longer processes a record for a deleted account_id, the child stream still tries to do an incremental sync for it. That is, until we manually edited the job record in the DB and updated the payload for the last id for the job_id, which worked. But this can't be a manual process. This has to work automatically....

fred_reimer

04/08/2022, 11:15 PM

@aaronsteers done, but if you are on your weekend enjoy. I appreciate the second set of eyes, and any recommendations you may have. Thanks!

fred_reimer

04/08/2022, 11:18 PM

I can look into publishing this tap. Like I said, it's not particularly proprietary, and is for a third-party SAAS service. I'll let you know.

aaronsteers

04/08/2022, 11:49 PM

Thanks for sharing this detail. For the short term, I do think there's a workaround: set `state_partitioning_keys = []` on the account_info child stream. Do you mind testing this if feasible to do so? And also, could you open an issue so we can look into the root cause?
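As a rough illustration of why an empty `state_partitioning_keys` list would avoid per-account bookmarks, the sketch below models the idea in plain Python. It is a simplified model under my own assumptions, not the SDK's code: `state_partition_context` and `bookmark_for` are hypothetical helpers, and the state layout (a `partitions` list of entries with a `context` key) is assumed.

```python
from typing import Optional


def state_partition_context(context: Optional[dict], keys: Optional[list]) -> Optional[dict]:
    """Reduce a context to only the keys used for state partitioning."""
    if context is None:
        return None
    if keys is None:  # assumed default: partition state by the full context
        return context
    return {k: v for k, v in context.items() if k in keys}


def bookmark_for(stream_state: dict, partition_context: Optional[dict]) -> dict:
    """Return the dict in which the bookmark for this context would be stored."""
    if not partition_context:  # None or {} -> one stream-level bookmark
        return stream_state
    partitions = stream_state.setdefault("partitions", [])
    for entry in partitions:
        if entry["context"] == partition_context:
            return entry
    entry = {"context": partition_context}
    partitions.append(entry)
    return entry


ctx = {"account_id": "A1"}  # made-up account ID

# Default behavior in this model: one bookmark per account_id.
default_state: dict = {}
bookmark_for(default_state, state_partition_context(ctx, None))["replication_key_value"] = "2022-04-08"

# With state_partitioning_keys = []: no partitions list is written at all.
flat_state: dict = {}
bookmark_for(flat_state, state_partition_context(ctx, []))["replication_key_value"] = "2022-04-08"
```

One consequence under this model is that all accounts share a single bookmark, so no deleted account_id can linger in a partitions list, at the cost of losing per-account progress tracking.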

aaronsteers

04/08/2022, 11:51 PM

The account id key sticking around in the state partitions list in the job record is expected. But that partition definition driving the list of account IDs is not expected.

aaronsteers

04/08/2022, 11:52 PM

(As a one-time operation, you may also need to remove the old partitions, or just start a new job id for this test.)
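For that one-time cleanup, a small script is less error-prone than hand-editing the payload. The sketch below is hypothetical: `prune_partitions` is an illustrative helper, and it assumes the state is laid out as `bookmarks` → stream name → `partitions`, each entry carrying a `context` dict. If the job payload wraps the Singer state under another key, pass in that inner dict instead.

```python
import copy


def prune_partitions(state: dict, stream_name: str, key: str, active_ids: set) -> dict:
    """Return a copy of state without partitions whose context[key] is inactive."""
    new_state = copy.deepcopy(state)  # leave the caller's state untouched
    stream_state = new_state.get("bookmarks", {}).get(stream_name, {})
    if "partitions" in stream_state:
        stream_state["partitions"] = [
            entry
            for entry in stream_state["partitions"]
            if entry.get("context", {}).get(key) in active_ids
        ]
    return new_state


# Made-up state with one live account (A1) and one deleted account (A2).
state = {
    "bookmarks": {
        "account_info": {
            "partitions": [
                {"context": {"account_id": "A1"}, "replication_key": "timestamp",
                 "replication_key_value": "2022-04-01T00:00:00Z"},
                {"context": {"account_id": "A2"}, "replication_key": "timestamp",
                 "replication_key_value": "2022-03-15T00:00:00Z"},
            ]
        }
    }
}

pruned = prune_partitions(state, "account_info", "account_id", {"A1"})
```

The set of active IDs would come from the current output of the parent accounts stream. Writing the pruned state back to the job record is left out here because it depends on how the Meltano database is accessed.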

fred_reimer

04/09/2022, 12:56 AM

I can likely test this next week. Stay tuned...
