niall_woodward
08/24/2021, 9:04 AM
[
  {
    "id":250616,
    "field_id":337,
    "list_entry_id":null,
    "entity_id":38706,
    "value":{
      "city":"San Francisco",
      "state":"California",
      "country":"United States",
      "continent":null,
      "street_address":null
    }
  },
  {
    "id":250615,
    "field_id":1284,
    "list_entry_id":null,
    "entity_id":38706,
    "value":"Computer Software"
  },
  {
    "id":32760,
    "field_id":198,
    "list_entry_id":null,
    "entity_id":38706,
    "value":38659
  },
  {
    "id":177634,
    "field_id":751,
    "list_entry_id":605,
    "entity_id":38706,
    "value":{
      "id":71,
      "text":"Low",
      "rank":1,
      "color":4
    }
  },
  ...
]
where value can be an integer, string or object. Is this possible?
visch
08/24/2021, 12:42 PM
visch
08/24/2021, 12:43 PM
aaronsteers
08/24/2021, 4:43 PM
object types - meaning an object node with no defined subproperties. Unfortunately, there is not (yet!) consistent support for this across a wide variety of targets. I think your best bet might be to either create a few different nullable fields, each strongly typed, or else fall back to a string representation which the transform downstream can clean up later.
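Both options above can be sketched in plain Python. This is only an illustration under assumptions: the function names and the value_integer / value_string / value_object field names are hypothetical, and the sample rows are copied from the payload at the top of the thread.

```python
import json
from typing import Any, Dict


def value_to_typed_fields(row: Dict[str, Any]) -> Dict[str, Any]:
    """Option 1: replace `value` with nullable, strongly typed fields.

    The value_* field names are hypothetical, chosen for illustration.
    """
    value = row.get("value")
    out = {key: val for key, val in row.items() if key != "value"}
    # bool is a subclass of int in Python, so exclude it explicitly.
    is_int = isinstance(value, int) and not isinstance(value, bool)
    out["value_integer"] = value if is_int else None
    out["value_string"] = value if isinstance(value, str) else None
    out["value_object"] = value if isinstance(value, dict) else None
    return out


def value_to_json_string(row: Dict[str, Any]) -> Dict[str, Any]:
    """Option 2: serialize `value` to a JSON string, whatever its type."""
    out = dict(row)
    if out.get("value") is not None:
        out["value"] = json.dumps(out["value"], sort_keys=True)
    return out


rows = [
    {"id": 250615, "field_id": 1284, "list_entry_id": None,
     "entity_id": 38706, "value": "Computer Software"},
    {"id": 32760, "field_id": 198, "list_entry_id": None,
     "entity_id": 38706, "value": 38659},
    {"id": 177634, "field_id": 751, "list_entry_id": 605,
     "entity_id": 38706,
     "value": {"id": 71, "text": "Low", "rank": 1, "color": 4}},
]

typed_rows = [value_to_typed_fields(r) for r in rows]
string_rows = [value_to_json_string(r) for r in rows]
```

With option 2 every non-null value is JSON-encoded, plain strings included, so a downstream transform can parse the column uniformly. With option 1 the stream schema would declare each value_* field as nullable with a single type.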
And to be fair, the Singer spec would allow what you are proposing, using an anyOf JSON Schema construct, but the reality is that many targets would likely not know how to handle this, so it's safer if you can simplify to commonly used types.
aaronsteers
08/24/2021, 4:43 PM
niall_woodward
08/24/2021, 4:53 PM
niall_woodward
08/24/2021, 4:53 PM
aaronsteers
08/24/2021, 4:54 PM
post_process() would be a good place to put this. 👍
niall_woodward
08/24/2021, 4:54 PM
aaronsteers
08/24/2021, 4:56 PM
josh_lloyd
08/25/2021, 2:32 AM
tap-pendo. 2 relevant things to this thread:
1. Apparently, target-snowflake (pipelinewise variant) accounts for the anyOf feature in JSON Schemas
2. Since the SDK’s typing doesn’t allow for anyOf, I used the following hack to get the Snowflake target to load my variable data. (note that it is loaded as a varchar, not variant, but it serves my purposes just fine)
class PollEventsStream(PendoStream):
    name = "pollEvents"
    ...
    schema = th.PropertiesList(
        ...
    ).to_dict()
    del schema['properties']['pollResponse']['type']
    schema['properties']['pollResponse']['anyOf'] = [{"type": "string"}, {"type": "integer"}]
Not ideal or robust probably, but it’ll get me through till we can add this feature into the SDK
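For readers without the tap at hand, the same schema edit can be reproduced on a plain dictionary. This is a sketch, not the SDK's API: the hand-written base_schema stands in for the output of th.PropertiesList(...).to_dict(), the type declared for pollResponse before the edit is a guess, and allow_any_of is a hypothetical helper.

```python
from copy import deepcopy
from typing import Any, Dict, List


def allow_any_of(
    schema: Dict[str, Any], prop: str, types: List[str]
) -> Dict[str, Any]:
    """Return a copy of `schema` in which `prop` accepts any of `types`.

    Hypothetical helper: it drops the single "type" keyword and
    replaces it with an "anyOf" list, as in the hack above.
    """
    result = deepcopy(schema)
    prop_schema = result["properties"][prop]
    prop_schema.pop("type", None)
    prop_schema["anyOf"] = [{"type": t} for t in types]
    return result


# Stand-in for the dict produced by the SDK's typing helpers (assumed shape).
base_schema = {
    "type": "object",
    "properties": {
        "pollResponse": {"type": ["string", "null"]},
    },
}

schema = allow_any_of(base_schema, "pollResponse", ["string", "integer"])
```

Working on a copy keeps the original dictionary intact, which makes it easier to compare the schema before and after the edit.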
I should also clarify that when I tried to set the field to th.StringType the tap would error out. Hence the necessity for this hack
aaronsteers
08/25/2021, 4:33 AM
typing helper module also contains this CustomType class which can be used to create custom JSONSchema objects where no other pre-built type fits the requirement:
class CustomType(JSONTypeHelper):
    """Accepts an arbitrary JSON Schema dictionary."""

    def __init__(self, jsonschema_type_dict: dict) -> None:
        """Initialize JSONTypeHelper by importing an existing JSON Schema type."""
        self._jsonschema_type_dict = jsonschema_type_dict

    @property
    def type_dict(self) -> dict:  # type: ignore  # OK: @classproperty vs @property
        """Return dict describing the type."""
        return self._jsonschema_type_dict
This (👆) only solves for what you can do with the Tap SDK - it doesn't necessarily answer what targets can handle. In general, we're still learning and observing what mainstream targets are able to handle.
aaronsteers
08/25/2021, 4:40 AM
capabilities taxonomy which proposes a datatype-failsafe capability as discussed+described here.
The point with this capability is to ensure that the target does not crash regardless of whether or not the json schema is correctly parsed and even if the type is not contained in the list of expected/handled cases.
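One way to read the failsafe idea is as a type-mapping function with a catch-all branch. The sketch below is hypothetical and not taken from any existing target: the function name, the column type names, and the choice of VARCHAR as the fallback are all assumptions.

```python
from typing import Any, Dict

# Hypothetical mapping from JSON Schema types to column types.
KNOWN_TYPES = {
    "string": "VARCHAR",
    "integer": "BIGINT",
    "number": "DOUBLE",
    "boolean": "BOOLEAN",
}
FALLBACK_TYPE = "VARCHAR"  # assumed catch-all column type


def column_type_for(prop_schema: Dict[str, Any]) -> str:
    """Pick a column type, falling back instead of raising.

    Anything that is not a single recognized type (anyOf, missing
    "type", multiple non-null types, unknown names) gets the fallback.
    """
    declared = prop_schema.get("type")
    if isinstance(declared, str):
        declared = [declared]
    if isinstance(declared, list):
        non_null = [t for t in declared if t != "null"]
        if len(non_null) == 1 and isinstance(non_null[0], str):
            return KNOWN_TYPES.get(non_null[0], FALLBACK_TYPE)
    return FALLBACK_TYPE


any_of_column = column_type_for(
    {"anyOf": [{"type": "string"}, {"type": "integer"}]}
)
```

A target taking this approach would also need to serialize non-string values before loading them into the fallback column.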