Hey folks, what's the best way to specify the json...
# best-practices
a
Hey folks, what's the best way to specify the jsonschema for JSON fields whose structure is not known in advance? Like an additional_information field that contains a bunch of keys which may not be standardized? If it's relevant, I plan on loading the data to BigQuery. We're also doing inline schema definitions like the attached code. Thanks y'all!
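from singer_sdk import typing as th  # provides the `th` alias used below

schema = th.PropertiesList(
    th.Property(
        "id",
        th.StringType,
    ),
    th.Property(
        "conversationId",
        th.StringType,
    ),
    th.Property(
        "customerId",
        th.StringType,
    ),
    th.Property(
        "timestamp",
        th.DateTimeType,
    ),
    # ... (remaining properties elided in the original message)
).to_dict()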
e
Hi @avishua_stein! You could use ObjectType without arguments:
th.Property(
    "additional_information",
    th.ObjectType(),
)
a
Thank you! Do you know if I can specify some keys and bring in the other unspecified keys?
e
Yes, you could pass the known properties as positional arguments and use additional_properties for the rest, e.g.
th.Property(
    "additional_information",
    th.ObjectType(
        th.Property("known", th.StringType),
        additional_properties=th.StringType,
    ),
)
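For reference, here's a rough sketch of the JSON Schema I'd expect that definition to describe, written as a plain dict so it runs without the SDK. This is hand-written, not actual SDK output, and the exact dict may differ by version (for example in how nullable types are written). `split_known_and_extra` is a hypothetical helper for illustration, not part of singer_sdk.

```python
# Assumed JSON Schema equivalent of the ObjectType above (hand-written, not SDK output).
additional_information_schema = {
    "type": "object",
    "properties": {
        "known": {"type": ["string", "null"]},
    },
    # Any key not listed under "properties" must match this sub-schema.
    "additionalProperties": {"type": "string"},
}


def split_known_and_extra(record, schema):
    """Separate keys declared under "properties" from undeclared ones."""
    declared = schema.get("properties", {})
    known = {k: v for k, v in record.items() if k in declared}
    extra = {k: v for k, v in record.items() if k not in declared}
    return known, extra


known, extra = split_known_and_extra(
    {"known": "a", "color": "red", "source": "web"},
    additional_information_schema,
)
print(known)  # {'known': 'a'}
print(extra)  # {'color': 'red', 'source': 'web'}
```

The extra keys are the ones that additionalProperties covers; whether they survive loading depends on the target.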
d
Also wondering about support for patternProperties within singer_sdk.typing. Does it exist already and if not is support planned?
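For context, patternProperties is the JSON Schema keyword that maps a regular expression to the sub-schema applied to every key matching it. A minimal stdlib-only sketch of the idea follows. The schema, the `^custom_` pattern, and `matching_subschemas` are made up for illustration and say nothing about how singer_sdk.typing does or would expose this.

```python
import re

# Hypothetical schema: every key starting with "custom_" must be a string.
schema = {
    "type": "object",
    "patternProperties": {
        "^custom_": {"type": "string"},
    },
}


def matching_subschemas(key, schema):
    """Return the sub-schemas whose pattern matches the given key."""
    return [
        sub
        for pattern, sub in schema.get("patternProperties", {}).items()
        # JSON Schema patterns are not implicitly anchored, hence re.search.
        if re.search(pattern, key)
    ]


print(matching_subschemas("custom_color", schema))  # [{'type': 'string'}]
print(matching_subschemas("id", schema))  # []
```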
a
Looks like that didn't work, but I think it's an issue with how BQ/target-bigquery is set up. The target I'm using wants to convert all Object types to the BQ RECORD type. I think the issue is that for RECORD, BQ expects a schema and drops any keys which aren't specified. Outputting to JSONL returned the desired data.
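A toy illustration of the behaviour described above, assuming the loader keeps only the fields declared in the RECORD schema. `project_to_record` is a hypothetical stand-in, not code from target-bigquery or BigQuery itself.

```python
def project_to_record(value, declared_fields):
    """Keep only declared fields, mimicking a RECORD column with a fixed schema."""
    return {name: value.get(name) for name in declared_fields}


row = {"known": "a", "color": "red"}

# Only "known" is declared, so "color" is dropped.
print(project_to_record(row, ["known"]))  # {'known': 'a'}

# With no declared fields (as with a bare ObjectType()), nothing survives.
print(project_to_record(row, []))  # {}
```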
e
@avishua_stein yeah, each target handles struct types differently. I know jmriego’s variant does support unstructured objects. @daniel_luftspring there’s no official plan for that. It would be an easy add, if you’re willing to contribute an issue, a PR, or both 🙂
d
Happy to try contributing if you could point me in the right direction 🙃
a
Thanks! I'm using this variant and it's working well! It ignores the schema entirely but it's better than nothing! I'll check the one you linked!
e
@daniel_luftspring I’ve logged https://github.com/meltano/sdk/issues/1189