andrio_frizon
08/09/2023, 6:03 PM
… false.
What is the “most correct” way to handle a case like this?
I ran some tests with stream_maps to try to map the false rows to an empty array, but simpleeval does not allow operations on lists.
As a partial solution, I’m doing this check in post_process, but as I have many similar fields I’m not sure this is the way to go.
andrio_frizon
08/09/2023, 6:03 PM
{
  "result": [
    {
      "ID": "59",
      "UF_CRM_1689847377": [
        "one value",
        "another value"
      ]
    },
    {
      "ID": "73",
      "UF_CRM_1689847377": false
    },
    {
      "ID": "77",
      "UF_CRM_1689847377": []
    }
  ]
}
visch
08/09/2023, 6:41 PM
I think you can do type: ["array", "boolean"]
visch
08/09/2023, 6:42 PM
If you have an opinion on how it should be modeled then I think post_process is a good option as well (make false be an empty array or something)
andrio_frizon
08/09/2023, 6:54 PM
> I think you can do type: ["array", "boolean"]
I considered this option, it seems all records were then mapped to either true or false, so even the records that had the proper array turned out with wrong values 😕
andrio_frizon
08/09/2023, 6:55 PM
> If you have an opinion on how it should be modeled then I think post_process is a good option as well (make false be an empty array or something)
Yeah, that’s the way I’m headed right now
visch
08/09/2023, 6:55 PM
> I considered this option, it seems all records were then mapped to either true or false, so even the records that had the proper array turned out with wrong values 😕
That doesn't make sense, can you elaborate?
andrio_frizon
08/09/2023, 6:55 PM
andrio_frizon
08/09/2023, 7:06 PM
> That doesn’t make sense, can you elaborate?
Sure! Let me try to give more context. I’m attaching an image of the API response I’m getting using Postman, showing some entities, including examples with the array of strings, empty array, and boolean. When I use the schema for the tap with this configuration:
{
  "properties": {
    "ID": {
      "type": [
        "string"
      ],
      "description": "ID"
    },
    "UF_CRM_1689847377": {
      "type": [
        "array",
        "boolean"
      ],
      "items": {
        "type": [
          "string"
        ]
      }
    }
  }
}
This is how it is being mapped after going through target-jsonl:
{"ID": "59", "UF_CRM_1689847377": true}
{"ID": "73", "UF_CRM_1689847377": false}
{"ID": "77", "UF_CRM_1689847377": true}
{"ID": "111", "UF_CRM_1689847377": false}
{"ID": "129", "UF_CRM_1689847377": false}
{"ID": "191", "UF_CRM_1689847377": false}
{"ID": "305", "UF_CRM_1689847377": false}
-------
When I go back to using only "type": ["array"] instead of "type": ["array", "boolean"]:
{
  "properties": {
    "ID": {
      "type": [
        "string"
      ],
      "description": "ID"
    },
    "UF_CRM_1689847377": {
      "type": [
        "array"
      ],
      "items": {
        "type": [
          "string"
        ]
      }
    }
  }
}
And with post-processing:
def post_process(self, row: dict, context: Optional[dict]) -> dict:
    # The API sends false instead of [] for this field; normalize it to an empty list.
    if row.get('UF_CRM_1689847377') is False:
        row['UF_CRM_1689847377'] = []
    return row
The result is as I expected:
{"ID": "59", "UF_CRM_1689847377": ["one value", "another value"]}
{"ID": "73", "UF_CRM_1689847377": []}
{"ID": "77", "UF_CRM_1689847377": []}
{"ID": "111", "UF_CRM_1689847377": []}
{"ID": "129", "UF_CRM_1689847377": []}
{"ID": "191", "UF_CRM_1689847377": []}
{"ID": "305", "UF_CRM_1689847377": []}
visch
08/09/2023, 7:13 PM
> When I use the schema for the tap with this configuration:
How does that tap read that? Are you passing in via the catalog or directly as the schema for your stream?
visch
08/09/2023, 7:13 PM
edgar_ramirez_mondragon
08/09/2023, 7:21 PM
{
  "properties": {
    "ID": {
      "type": [
        "string"
      ],
      "description": "ID"
    },
    "UF_CRM_1689847377": {
      "type": [
        "array",
        "boolean"
      ],
      "items": {
        "type": [
          "string"
        ]
      }
    }
  }
}
results in boolean values because of how we conform any boolean property in the schema: https://github.com/meltano/sdk/blob/f6bbf0c5ddba689ee1ab6df1f110529f35adf12f/singer_sdk/helpers/_typing.py#L470-L492.
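To make the effect concrete, here is a minimal sketch of that kind of boolean coercion. The function name conform_boolean is hypothetical and this is a simplified stand-in, not the SDK's code. It assumes that once "boolean" is among a property's types, None stays None, anything equal to 0 becomes false, and every other value becomes true. Under that assumption it reproduces the values reported above for IDs 59, 73 and 77:

```python
from typing import Any, Optional


def conform_boolean(value: Any) -> Optional[bool]:
    """Hypothetical stand-in for boolean conforming (not the SDK's code)."""
    if value is None:
        return None
    # Assumption: only values equal to 0 (which includes False) map to False.
    if value == 0:
        return False
    # Everything else, including lists (empty or not), maps to True.
    return True


rows = [
    {"ID": "59", "UF_CRM_1689847377": ["one value", "another value"]},
    {"ID": "73", "UF_CRM_1689847377": False},
    {"ID": "77", "UF_CRM_1689847377": []},
]

# Apply the coercion to the mixed array/boolean field of each row.
conformed = [
    {**row, "UF_CRM_1689847377": conform_boolean(row["UF_CRM_1689847377"])}
    for row in rows
]
# Both lists become True (a list never equals 0); False stays False.
```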
One solution for your tap may be to disable conforming, i.e. setting TYPE_CONFORMANCE_LEVEL to TypeConformanceLevel.NONE in the stream(s): https://sdk.meltano.com/en/v0.31.0/classes/singer_sdk.Stream.html#singer_sdk.Stream.TYPE_CONFORMANCE_LEVEL
I personally like the post_process approach better, especially if there's a way to tell which props need to be processed beforehand, e.g. they have a special prefix.
andrio_frizon
08/09/2023, 7:37 PM
> How does that tap read that? Are you passing in via the catalog or directly as the schema for your stream?
Right now I’m using it directly as the schema of the stream, but eventually I’ll extract it to be passed via the catalog.
> especially if there’s a way to tell which props need to be processed beforehand
Unfortunately I don’t think that’s the case, as I’m extracting data from a highly customizable CRM, so I don’t think I can assume a prefix will always be present. I was only concerned about doing it in post_process because there can be dozens (even hundreds) of fields in this condition 😕
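With dozens or hundreds of such fields, one option is to derive the affected property names from the stream schema instead of hard-coding each one. A minimal sketch, assuming every top-level property typed as "array" should receive [] whenever the API sends false; the helper names array_fields and false_to_empty_list are hypothetical and not part of the SDK:

```python
from typing import Any, Dict, Set


def array_fields(schema: Dict[str, Any]) -> Set[str]:
    """Return the names of top-level properties whose type includes "array"."""
    fields: Set[str] = set()
    for name, prop in schema.get("properties", {}).items():
        types = prop.get("type", [])
        # JSON Schema allows "type" to be a single string or a list of strings.
        if isinstance(types, str):
            types = [types]
        if "array" in types:
            fields.add(name)
    return fields


def false_to_empty_list(row: Dict[str, Any], fields: Set[str]) -> Dict[str, Any]:
    """Replace a literal False with [] for the given fields, in place."""
    for name in fields:
        # "is False" avoids touching 0, None, or missing keys.
        if row.get(name) is False:
            row[name] = []
    return row


schema = {
    "properties": {
        "ID": {"type": ["string"], "description": "ID"},
        "UF_CRM_1689847377": {"type": ["array"], "items": {"type": ["string"]}},
    }
}
fields = array_fields(schema)  # compute once, reuse for every row

rows = [
    {"ID": "59", "UF_CRM_1689847377": ["one value", "another value"]},
    {"ID": "73", "UF_CRM_1689847377": False},
    {"ID": "77", "UF_CRM_1689847377": []},
]
cleaned = [false_to_empty_list(dict(row), fields) for row in rows]
```

In a stream, the set could be computed once from the stream's schema and false_to_empty_list called from post_process, so new custom fields are picked up without code changes.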
andrio_frizon
08/09/2023, 7:38 PM
Reuben (Matatika)
08/09/2023, 8:53 PM
… post_process, as we found there isn't really a better solution available at the moment. Check out this PR to tap-auth0 for context.
I also made this issue, which contains links off to some Slack discussions and related issues - I'd be interested in your thoughts after having a look through those. 🙂