# best-practices
haleemur_ali
Hi folks, I'm looking for guidance on the best way / place to implement the following functionality. I'm building a Singer tap for a REST API using the Meltano SDK, where one of the endpoints returns data in the following structure, and I'll be executing it through Meltano.
```json
{
  "className": "entityName",
  "keyName": "primaryKeyName",
  "keyValue": "primaryKeyValue",
  "fieldRows": [
    {
      "fieldName": "fieldName1",
      "fieldValue": "fieldValue1"
    },
    {
      "fieldName": "customFieldName2",
      "fieldValue": "customFieldValue2"
    }
  ]
}
```
The number of fields in "fieldRows" is unknown and can evolve. I'd like to reshape the data as follows before sending it to the target, and have the target schema evolve when new fields are added.
```json
{
  "primaryKeyName": "primaryKeyValue",
  "fieldName1": "fieldValue1",
  "customFieldName2": "customFieldValue2"
}
```
Essentially, the transformation logic would be something like the following, applied to each record returned by this endpoint.
```python
def flatten(row: dict) -> dict:
    """Flatten one raw record: the key/value pair plus each fieldRows entry."""
    flat = {row["keyName"]: row["keyValue"]}
    for item in row["fieldRows"]:
        flat[item["fieldName"]] = item["fieldValue"]
    return flat
```
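For reference, in the Meltano SDK this kind of per-record reshape would typically go in the stream's `post_process` hook. A minimal sketch, where the class name, stream name, path, and base URL are placeholders and the schema definition is omitted:

```python
from singer_sdk import RESTStream


class EntityStream(RESTStream):
    """Hypothetical stream for the endpoint above; names and URL are placeholders."""

    name = "entities"
    path = "/entities"
    url_base = "https://api.example.com"  # placeholder; schema definition omitted

    def post_process(self, row: dict, context: dict | None = None) -> dict | None:
        # Same reshaping as flatten() above, applied to each record before emit.
        flat = {row["keyName"]: row["keyValue"]}
        for item in row["fieldRows"]:
            flat[item["fieldName"]] = item["fieldValue"]
        return flat
```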
This is the schema of what the endpoint returns:
```json
{
  "$schema": "http://json-schema.org/schema#",
  "type": "object",
  "key_properties": [
    "className",
    "keyName",
    "keyValue"
  ],
  "bookmark_properties": [],
  "properties": {
    "fieldRows": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "fieldName": {
            "type": "string"
          },
          "fieldValue": {
            "type": "string"
          }
        }
      }
    },
    "className": {
      "type": "string"
    },
    "keyName": {
      "type": "string"
    },
    "keyValue": {
      "type": "string"
    }
  }
}
```
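For completeness, the same raw schema could be declared in the tap with the SDK's typing helpers rather than hand-written JSON; a sketch:

```python
from singer_sdk import typing as th  # JSON Schema typing helpers

# The raw endpoint schema, expressed with the SDK's typing helpers.
# (key_properties correspond to the stream's primary_keys attribute in the SDK.)
schema = th.PropertiesList(
    th.Property("className", th.StringType),
    th.Property("keyName", th.StringType),
    th.Property("keyValue", th.StringType),
    th.Property(
        "fieldRows",
        th.ArrayType(
            th.ObjectType(
                th.Property("fieldName", th.StringType),
                th.Property("fieldValue", th.StringType),
            )
        ),
    ),
).to_dict()
```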
Pat
Hey @haleemur_ali - all of what you're trying to do is possible, but one tricky part I'm not quite sure what I'd recommend for is the fact that your schema could theoretically change mid-sync. The way this works is that before the sync is executed, Meltano runs a `--discover` call against the tap to generate a JSON schema (in your case probably a dynamic schema based on a recent sample of the fieldRows data) that's then passed to the target before starting the sync. In most implementations the target will assert that every record matches that schema before writing to the destination. If you started the sync expecting only `fieldName1` to exist and then `fieldName2` showed up, you'd get an error.
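To sketch that dynamic-discovery idea: at discovery time the tap could derive a flat schema from a sample of recent records, along these lines (the sampling call itself is omitted, and treating every field as a nullable string is an assumption):

```python
def schema_from_sample(sample: list[dict]) -> dict:
    """Derive a flat JSON schema from a sample of raw records."""
    properties: dict = {}
    for record in sample:
        # One column for the primary key, plus one per observed field name.
        properties[record["keyName"]] = {"type": ["string", "null"]}
        for item in record["fieldRows"]:
            properties[item["fieldName"]] = {"type": ["string", "null"]}
    return {"type": "object", "properties": properties}
```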
There's also the concept of record flattening, which many targets implement to unnest nested JSON into columns. So if I had to make a recommendation, I'd say: have your tap translate the `fieldRows` into one nested object field like
```json
{
  "primaryKeyName": "primaryKeyValue",
  "fieldRows": {
    "fieldName1": "fieldValue1",
    "fieldName2": "fieldValue2"
  }
}
```
then use the target's record-flattening setting to unnest those for you, creating a table structure like:
```
primaryKeyName, fieldRows.fieldName1, fieldRows.fieldName2
primaryKeyValue, fieldValue1, fieldValue2
```
Then if that object grows, the target will append a new column based on the key that was added. This bypasses the schema-change issue: all records pass validation because the schema is static and models `fieldRows` as an object of key-value pairs.
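On the tap side, that translation is again a natural fit for `post_process`; a minimal sketch (the helper name is mine). On the target side, SDK-based targets typically expose the built-in `flattening_enabled` and `flattening_max_depth` settings to do the unnesting.

```python
def to_nested(row: dict) -> dict:
    """Reshape a raw record into the static nested shape shown above."""
    return {
        row["keyName"]: row["keyValue"],
        "fieldRows": {
            item["fieldName"]: item["fieldValue"] for item in row["fieldRows"]
        },
    }
```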
Another alternative is to take this same approach but let the target write the nested JSON object into your destination as-is and do the flattening there. This follows the ELT approach more closely by keeping your data basically in raw format.
haleemur_ali
Thank you Pat. I unfortunately had to focus on a different problem for a week, but I'll take another stab keeping your comments in mind. I appreciate the support.