stephen_bailey
10/07/2021, 12:00 PM
`id`, for example), but the source system apparently does not have strict limitations on them, or has special cases where they might be omitted.
The tap completes successfully, but there are errors on load due to the null primary keys. I'd like to just remove these from the yielded records, and was wondering if others have suggestions. Right now, I'm just adding a filter into `parse_response`:
def parse_response(self, response: requests.Response) -> Iterable[dict]:
    records = extract_jsonpath(self.records_jsonpath, input=response.json())
    yield from [
        row for row in records
        if all(row.get(k) is not None for k in self.primary_keys)
    ]
But wondering if others have tackled this before, or if this would be a generally useful tap feature?
stephen_bailey
10/07/2021, 12:03 PM
`required=True` attribute in the catalog may make sense, with something like a `filter_records_with_schema_violations=True` flag
edgar_ramirez_mondragon
10/07/2021, 4:10 PM
stephen_bailey
10/07/2021, 4:12 PM
stephen_bailey
10/07/2021, 4:15 PM
stephen_bailey
10/07/2021, 4:15 PM
stephen_bailey
10/07/2021, 4:22 PM
`stream_maps` + the catalog `required=True` to do this automatically.
aaronsteers
10/07/2021, 4:26 PM
`post_process()` to just always filter out certain records based on a condition.
aaronsteers
10/07/2021, 4:27 PM
`post_process()` is that you can alter the record or just not return it.
stephen_bailey
10/07/2021, 4:28 PM
`None` and that simply omits the record, that would be perfect
aaronsteers
10/07/2021, 4:29 PM
`None` has the effect of filtering out the record.
aaronsteers
10/07/2021, 4:32 PM
stephen_bailey
10/07/2021, 4:55 PM
def post_process(self, row: dict, context: Optional[dict] = None) -> Optional[dict]:
    # Drop the record (return None) if any primary key is missing or null.
    if any(row.get(k) is None for k in self.primary_keys):
        return None
    return row
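
For anyone who wants to try the filtering logic outside a tap, here is a minimal standalone sketch of the same idea. It does not use the SDK: `PRIMARY_KEYS`, the module-level `post_process`, and the `emit_records` helper are hypothetical stand-ins, and `emit_records` only imitates the behavior described above (a `None` return means the record is skipped). It is not the SDK's actual implementation.

```python
from typing import Iterable, Iterator, Optional

# Assumption for this sketch: a single primary key named "id".
PRIMARY_KEYS = ["id"]


def post_process(row: dict, primary_keys: Iterable[str] = PRIMARY_KEYS) -> Optional[dict]:
    """Return the row unchanged, or None if any primary key is missing or null."""
    if any(row.get(k) is None for k in primary_keys):
        return None
    return row


def emit_records(rows: Iterable[dict]) -> Iterator[dict]:
    """Hypothetical caller: yield only the rows that post_process does not drop."""
    for row in rows:
        processed = post_process(row)
        if processed is None:
            continue  # record filtered out
        yield processed


sample = [
    {"id": 1, "name": "a"},
    {"id": None, "name": "b"},  # null primary key
    {"name": "c"},              # primary key omitted entirely
    {"id": 0, "name": "d"},     # falsy but valid key
]
kept = list(emit_records(sample))
print(kept)
```

Checking `is None` rather than truthiness matters here: a key of `0` or an empty string is kept, and only missing or null keys are dropped.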