# singer-tap-development
n
Hey team - I'm currently developing a tap using the SDK (which is superb), using pipelinewise-target-snowflake as the target. I'm wondering how I can define the tap schema where a field has no strict schema. To be more specific, I'm building a stream for this resource which has a property 'value', for example:
[
  {
    "id":250616,
    "field_id":337,
    "list_entry_id":null,
    "entity_id":38706,
    "value":{
      "city":"San Francisco",
      "state":"California",
      "country":"United States",
      "continent":null,
      "street_address":null
    }
  },
  {
    "id":250615,
    "field_id":1284,
    "list_entry_id":null,
    "entity_id":38706,
    "value":"Computer Software"
  },
  {
    "id":32760,
    "field_id":198,
    "list_entry_id":null,
    "entity_id":38706,
    "value":38659
  },
  {
    "id":177634,
    "field_id":751,
    "list_entry_id":605,
    "entity_id":38706,
    "value":{
      "id":71,
      "text":"Low",
      "rank":1,
      "color":4
    }
  },
  ...
]
where `value` can be an integer, string or object. Is this possible?
v
(I don't know Snowflake, so take this with a grain of salt.) I think you want a JSON object for `value`, as it's not consistent. For the values that are not coming back as JSON objects, maybe add a small transform in your tap to turn them into one.
Looks like there is also a column type called VARIANT.
a
@niall_woodward - We have had some active discussions this past week on "variant" `object` types - meaning an object node with no defined subproperties. Unfortunately, there is not (yet!) consistent support for this across a wide variety of targets. I think your best bet might be to either create a few different nullable fields, each strongly typed, or else fall back to a string representation which a downstream transform can clean up later. And to be fair, the Singer spec would allow what you are proposing, using an `anyOf` JSON Schema construct, but the reality is that many targets would likely not know how to handle this, so it's safer if you can simplify to commonly used types.
Does that help?
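(Editor's sketch of the first option above, "a few different nullable fields, each strongly typed". This is only an illustration under assumptions: the column names `value_string`, `value_integer` and `value_object` and the helper `split_value` are hypothetical, and the object case is kept as a JSON string because its subproperties are not fixed.)

```python
import json
from typing import Any, Dict


def split_value(row: Dict[str, Any]) -> Dict[str, Any]:
    """Replace the loosely typed "value" key with three nullable, typed keys.

    The key names here are hypothetical, not taken from the thread.
    """
    value = row.pop("value", None)
    # bool is a subclass of int in Python, so exclude it explicitly.
    is_int = isinstance(value, int) and not isinstance(value, bool)
    row["value_string"] = value if isinstance(value, str) else None
    row["value_integer"] = value if is_int else None
    # Objects have no fixed subproperties, so store them as JSON text.
    row["value_object"] = json.dumps(value) if isinstance(value, dict) else None
    return row
```

Each record then populates exactly one of the three columns (or none, if `value` is null), so the schema can declare each one with a single nullable type.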
n
Thanks for your replies, both. @aaronsteers, I think so, it sounds like my only solution for now is to just use a string representation. So in that case, do you suggest I do some type checking in the stream and always force the "value" key into a string so that it complies with the schema?
In a post_process perhaps?
a
Yes, `post_process()` would be a good place to put this. 👍
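(Editor's sketch of the string fallback discussed above. It is a minimal illustration, not the tap's actual code: `stringify_value` is a hypothetical helper, written as a plain function so it runs without the SDK. In a real stream it would presumably be called from `post_process()` and the `value` property declared as a nullable string.)

```python
import json
from typing import Any, Dict


def stringify_value(row: Dict[str, Any], key: str = "value") -> Dict[str, Any]:
    """Coerce row[key] to a string so it matches a string-typed schema.

    Strings and nulls pass through unchanged; integers and objects are
    serialized as JSON text that a downstream transform can parse later.
    """
    value = row.get(key)
    if value is not None and not isinstance(value, str):
        row[key] = json.dumps(value, sort_keys=True)
    return row


# Hypothetical use inside a stream class (assumes the SDK's post_process hook):
#     def post_process(self, row, context=None):
#         return stringify_value(row)
```

One design choice to be aware of: plain strings are left as-is rather than JSON-quoted, so downstream code cannot blindly `json.loads` every value and has to tell the two cases apart.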
n
Cool. Thanks a lot!
a
No problem at all! I see we don't have an Affinity tap yet in the Hub. When you feel this is close to being ready, feel free to post to #C01UGBSJNG5 and we can help point you in the right direction to get it listed!
j
This is a rather timely question for me. I was running into virtually the exact same issue developing an SDK version of `tap-pendo`. Two things relevant to this thread:
1. Apparently, target-snowflake (pipelinewise variant) accounts for the `anyOf` feature in JSON Schema.
2. Since the SDK's typing doesn't allow for `anyOf`, I used the following hack to get the Snowflake target to load my variable data. (Note that it is loaded as a `varchar`, not a `variant`, but it serves my purposes just fine.)
class PollEventsStream(PendoStream):
    name = "pollEvents"
    ...
    schema = th.PropertiesList(
        ...
    ).to_dict()
    del schema['properties']['pollResponse']['type']
    schema['properties']['pollResponse']['anyOf'] = [{"type": "string"}, {"type": "integer"}]
Not ideal or robust, probably, but it'll get me through until we can add this feature to the SDK. I should also clarify that when I tried to set the field to `th.StringType`, the tap would error out, hence the need for this hack.
a
@josh_lloyd and @niall_woodward - a couple quick points... First, the workaround described by @josh_lloyd will work - since the type helpers only facilitate the creation of the JSON Schema dict objects, you're free to modify those any way you'd like. Second, the `typing` helper module also contains this `CustomType` class, which can be used to create custom JSON Schema objects where no other pre-built type fits the requirement:
class CustomType(JSONTypeHelper):
    """Accepts an arbitrary JSON Schema dictionary."""

    def __init__(self, jsonschema_type_dict: dict) -> None:
        """Initialize JSONTypeHelper by importing an existing JSON Schema type."""
        self._jsonschema_type_dict = jsonschema_type_dict

    @property
    def type_dict(self) -> dict:  # type: ignore  # OK: @classproperty vs @property
        """Return dict describing the type."""
        return self._jsonschema_type_dict
This (👆) only solves for what you can do with the Tap SDK - it doesn't necessarily answer what targets can handle. In general, we're still learning and observing what mainstream targets are able to handle.
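(Editor's sketch of the kind of dictionary one might hand to `CustomType` for the `value` field from the original question. The schema contents and the `matches_any_of` checker are assumptions for illustration only. The checker is a toy that handles just these four simple types, not a real JSON Schema validator, and it says nothing about what a given target will accept.)

```python
from typing import Any, Dict

# Hypothetical schema for "value": any one of these simple types is allowed.
VALUE_SCHEMA: Dict[str, Any] = {
    "anyOf": [
        {"type": "object"},
        {"type": "string"},
        {"type": "integer"},
        {"type": "null"},
    ]
}

# With the SDK installed, this dict could presumably be wrapped as
# th.CustomType(VALUE_SCHEMA) when declaring the property.

_PY_TYPES = {"object": dict, "string": str, "integer": int, "null": type(None)}


def matches_any_of(value: Any, schema: Dict[str, Any]) -> bool:
    """Toy check: does value match at least one simple type under anyOf?"""
    if isinstance(value, bool):
        return False  # bool is an int subclass in Python but not a JSON integer
    return any(
        isinstance(value, _PY_TYPES[option["type"]]) for option in schema["anyOf"]
    )
```

All three shapes of `value` in the sample records (object, string, integer) satisfy this schema, which is the property the `anyOf` construct is meant to express.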
Related: in #C01QS0RV78D today (Wednesday) we'll discuss a `capabilities` taxonomy which proposes a `datatype-failsafe` capability, as discussed and described here. The point of this capability is to ensure that the target does not crash, regardless of whether the JSON Schema is correctly parsed, even if the type is not in the list of expected or handled cases.