# best-practices
haleemur_ali
Hi folks, I'm looking for guidance on the best way / place to implement the following functionality. I'm building a Singer tap for a REST API using the Meltano SDK, where one of the endpoints returns data in the following structure, and I'll be executing it through Meltano.
```json
{
  "className": "entityName",
  "keyName": "primaryKeyName",
  "keyValue": "primaryKeyValue",
  "fieldRows": [
    {
      "fieldName": "fieldName1",
      "fieldValue": "fieldValue1"
    },
    {
      "fieldName": "customFieldName2",
      "fieldValue": "customFieldValue2"
    }
  ]
}
```
The number of fields in "fieldRows" is unknown and can evolve. I'd like to reshape the data as follows before sending it to the target, and have the target schema evolve when new fields are added.
```json
{
  "primaryKeyName": "primaryKeyValue",
  "fieldName1": "fieldValue1",
  "customFieldName2": "customFieldValue2"
}
```
Essentially, the transformation logic would be something like the following, applied to each record returned by this endpoint.
```python
def flatten(row: dict) -> dict:
    """Flatten one raw record: the key/value pair plus each fieldRows entry."""
    flat = {row["keyName"]: row["keyValue"]}
    for item in row["fieldRows"]:
        flat[item["fieldName"]] = item["fieldValue"]
    return flat
```
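For reference, in the Meltano SDK this kind of per-record reshape would typically go in the stream's `post_process` hook. A minimal sketch, where the class name, stream name, path, and base URL are placeholders and the schema definition is omitted:

```python
from singer_sdk import RESTStream


class EntityStream(RESTStream):
    """Hypothetical stream for the endpoint above; names and URL are placeholders."""

    name = "entities"
    path = "/entities"
    url_base = "https://api.example.com"  # placeholder; schema definition omitted

    def post_process(self, row: dict, context: dict | None = None) -> dict | None:
        # Same reshaping as flatten() above, applied to each record before emit.
        flat = {row["keyName"]: row["keyValue"]}
        for item in row["fieldRows"]:
            flat[item["fieldName"]] = item["fieldValue"]
        return flat
```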
This is the schema of what the endpoint returns:
```json
{
  "$schema": "http://json-schema.org/schema#",
  "type": "object",
  "key_properties": [
    "className",
    "keyName",
    "keyValue"
  ],
  "bookmark_properties": [],
  "properties": {
    "fieldRows": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "fieldName": {
            "type": "string"
          },
          "fieldValue": {
            "type": "string"
          }
        }
      }
    },
    "className": {
      "type": "string"
    },
    "keyName": {
      "type": "string"
    },
    "keyValue": {
      "type": "string"
    }
  }
}
```
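For completeness, the same raw schema could be declared in the tap with the SDK's typing helpers rather than hand-written JSON; a sketch:

```python
from singer_sdk import typing as th  # JSON Schema typing helpers

# The raw endpoint schema, expressed with the SDK's typing helpers.
# (key_properties correspond to the stream's primary_keys attribute in the SDK.)
schema = th.PropertiesList(
    th.Property("className", th.StringType),
    th.Property("keyName", th.StringType),
    th.Property("keyValue", th.StringType),
    th.Property(
        "fieldRows",
        th.ArrayType(
            th.ObjectType(
                th.Property("fieldName", th.StringType),
                th.Property("fieldValue", th.StringType),
            )
        ),
    ),
).to_dict()
```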
Pat
Hey @haleemur_ali - all of what you're trying to do is possible, but one tricky part I'm not quite sure what I'd recommend for is the fact that your schema could theoretically change mid-sync. The way this works is that before the sync is executed, Meltano runs a `--discover` call against the tap to generate a JSON schema (in your case probably a dynamic schema based on a recent sample of the fieldRows data) that's then passed to the target before starting the sync. In most implementations the target will assert that every record matches that schema before writing to the destination. If you started the sync expecting only `fieldName1` to exist and then `fieldName2` showed up, you'd get an error.
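To sketch that dynamic-discovery idea: at discovery time the tap could derive a flat schema from a sample of recent records, along these lines (the sampling call itself is omitted, and treating every field as a nullable string is an assumption):

```python
def schema_from_sample(sample: list[dict]) -> dict:
    """Derive a flat JSON schema from a sample of raw records."""
    properties: dict = {}
    for record in sample:
        # One column for the primary key, plus one per observed field name.
        properties[record["keyName"]] = {"type": ["string", "null"]}
        for item in record["fieldRows"]:
            properties[item["fieldName"]] = {"type": ["string", "null"]}
    return {"type": "object", "properties": properties}
```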
There's also the concept of record flattening, which many targets implement to unnest nested JSON into columns. So if I had to make a recommendation, I'd say: have your tap translate the `fieldRows` into one nested object field like
```json
{
  "primaryKeyName": "primaryKeyValue",
  "fieldRows": {
    "fieldName1": "fieldValue1",
    "fieldName2": "fieldValue2"
  }
}
```
then use the target's record-flattening setting to unnest those for you, creating a table structure like:
```
primaryKeyName, fieldRows.fieldName1, fieldRows.fieldName2
primaryKeyValue, fieldValue1, fieldValue2
```
Then if that object grows, the target will append a new column based on the key that was added. This bypasses the schema-change issue: all records pass validation because the schema is static and models `fieldRows` as an object of key-value pairs.
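On the tap side, that translation is again a natural fit for `post_process`; a minimal sketch (the helper name is mine). On the target side, SDK-based targets typically expose the built-in `flattening_enabled` and `flattening_max_depth` settings to do the unnesting.

```python
def to_nested(row: dict) -> dict:
    """Reshape a raw record into the static nested shape shown above."""
    return {
        row["keyName"]: row["keyValue"],
        "fieldRows": {
            item["fieldName"]: item["fieldValue"] for item in row["fieldRows"]
        },
    }
```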
Another alternative is to take this same approach but let the target write the nested JSON object into your destination as-is and do the flattening there. This follows the ELT approach more closely by keeping your data basically in raw format.
haleemur_ali
Thank you Pat. I unfortunately had to focus on a different problem for a week, but I'll take another stab keeping your comments in mind. I appreciate the support.