# troubleshooting
a
Hello, I am trying to make a simple API call and save the result as a CSV file. It works when I don't specify the schema. However, when I do specify one, it throws an error saying `data should not be empty`. The response I want to save contains 30 JSON records. 15 of them contain 22 keys, whilst the other half contain one key fewer. I list all 22 keys in the schema, including the nested ones. Does anyone have any idea why it might be breaking? Should I add something to accept the missing key in some of the records? I'll add the schema and the yml file in the thread 🧵
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "id": {
      "type": "integer"
    },
    "title": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ]
    },
    "description": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ]
    },
    "category": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ]
    },
    "price": {
      "anyOf": [
        {
          "type": "number"
        },
        {
          "type": "null"
        }
      ]
    },
    "discountPercentage": {
      "anyOf": [
        {
          "type": "number"
        },
        {
          "type": "null"
        }
      ]
    },
    "rating": {
      "anyOf": [
        {
          "type": "number"
        },
        {
          "type": "null"
        }
      ]
    },
    "stock": {
      "anyOf": [
        {
          "type": "integer"
        },
        {
          "type": "null"
        }
      ]
    },
    "tags": {
      "anyOf": [
        {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        {
          "type": "null"
        }
      ]
    },
    "brand": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "required": []
    },
    "sku": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ]
    },
    "weight": {
      "anyOf": [
        {
          "type": "integer"
        },
        {
          "type": "null"
        }
      ]
    },
    "dimensions": {
      "type": "object",
      "properties": {
        "width": {
          "anyOf": [
            {
              "type": "integer"
            },
            {
              "type": "null"
            }
          ]
        },
        "height": {
          "anyOf": [
            {
              "type": "integer"
            },
            {
              "type": "null"
            }
          ]
        },
        "depth": {
          "anyOf": [
            {
              "type": "integer"
            },
            {
              "type": "null"
            }
          ]
        }
      },
      "required": []
    },
    "warrantyInformation": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ]
    },
    "shippingInformation": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ]
    },
    "availabilityStatus": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ]
    },
    "reviews": {
      "type": "array",
      "items": {
        "type": "object"
      }
    },
    "returnPolicy": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ]
    },
    "minimumOrderQuantity": {
      "anyOf": [
        {
          "type": "integer"
        },
        {
          "type": "null"
        }
      ]
    },
    "meta": {
      "type": "object",
      "properties": {
        "createdAt": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ]
        },
        "updatedAt": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ]
        },
        "barcode": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ]
        },
        "qrCode": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ]
        }
      },
      "required": []
    },
    "images": {
      "anyOf": [
        {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        {
          "type": "null"
        }
      ]
    },
    "thumbnail": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ]
    }
  },
  "required": [
    "id"
  ]
}
plugins:
  extractors:
  - name: tap-rest-api-msdk
    variant: widen
    pip_url: tap-rest-api-msdk
    config:
      api_url: https://dummyjson.com/products
      streams:
        - name: products_etl
          primary_keys:
            - id
          records_path: $.products[*]
          schema: extract/products_schema.json
          schema-flattening: False
          limit: 15


  loaders:
  - name: target-csv
    variant: hotgluexyz
    pip_url: git+https://github.com/hotgluexyz/target-csv.git
    config:
      destination_path: raw_data
      delimiter: ","
      quotechar: '"'
      doublequote: true
      allow_missing_keys: true
e
The tap config you shared works for me, with only small changes to the schema:
The reason for the changes is that tap-rest-api-msdk flattens JSON arrays into strings, so schema validation was failing.
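To illustrate why that trips validation, here is a minimal Python sketch. It assumes the flattening serializes an array to its JSON string form (the exact serialization the tap uses is an assumption), `matches_type` is a hypothetical helper that mimics only a tiny part of JSON Schema's `type` keyword, and the sample `tags` value is made up.

```python
import json


def matches_type(value, json_type):
    """Hypothetical helper: a tiny subset of JSON Schema's `type` keyword."""
    checks = {
        "array": lambda v: isinstance(v, list),
        "string": lambda v: isinstance(v, str),
        "null": lambda v: v is None,
    }
    return checks[json_type](value)


# Made-up value, shaped like the "tags" field in the schema above.
tags = ["beauty", "mascara"]

# Assumption: flattening turns the array into a JSON-encoded string.
flattened = json.dumps(tags)

print(matches_type(tags, "array"))        # True: the raw value is a list
print(matches_type(flattened, "array"))   # False: the flattened value is a str
print(matches_type(flattened, "string"))  # True
```

So a property declared as `"type": "array"` would reject the flattened value, while one declared as `"type": "string"` would accept it.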
a
Oh... thank you so much. I didn't realize that. I was going crazy that my schema wasn't working. I even checked the auto-generated one to see the difference, but I still missed it. (I just started using it.) Thank you again 🙏
e
Np!
a
Actually, I just tested it and it still breaks when I run target-csv. If I only invoke tap-rest-api-msdk, it runs fine. It breaks unless I specify the keys of the nested dictionary as individual columns, e.g.:
    "meta_createdAt": {
      "type": "string"
    },
    "meta_updatedAt": {
      "type": "string"
    },
    "meta_barcode": {
      "type": "string"
    },
    "meta_qrCode": {
      "type": "string"
    }
Or am I missing something? 🤔
e
Hmm, I don't get the same result; it works fine with the schema I shared above. What are your Python and Meltano versions, and your operating system?
a
meltano==3.4.2, python==3.10.6, macOS Sonoma 14.4.1
e
Ok, can you try this command:
$ .meltano/loaders/target-csv/venv/bin/pip list                                                                                                                               
Package         Version
--------------- -----------
backoff         1.8.0
ciso8601        2.3.1
jsonschema      2.6.0
pip             24.1b1
python-dateutil 2.9.0.post0
pytz            2018.4
setuptools      70.0.0
simplejson      3.11.1
singer-python   5.12.1
six             1.16.0
target-csv      0.3.6
wheel           0.43.0
a
Screenshot 2024-06-20 at 11.48.42.png
e
Can't see any differences that would cause schema validation to work differently 🤔
a
I've upgraded setuptools now. I'll test it again.
In any case, target-csv behaves oddly, even when I don't specify my own schema. Where a key is missing, it doesn't write a NaN (null, etc.). Instead, it writes the values from the following keys in its place. In the end, it looks like this. (This is just an example: the column that is supposed to hold the NaN is a different one; this one just contains info from another key.)
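For comparison, the behaviour I'd expect can be sketched with Python's standard `csv.DictWriter`, which fills an empty cell for a missing key instead of shifting the later values. This is only an illustration of the expected output, not how target-csv is implemented; the records and column names are made up.

```python
import csv
import io

# Hypothetical columns and records; the second record has no "brand" key.
fieldnames = ["id", "title", "brand", "sku"]
records = [
    {"id": 1, "title": "Mascara", "brand": "Essence", "sku": "A1"},
    {"id": 2, "title": "Apple", "sku": "B2"},
]

buffer = io.StringIO()
# restval is the value written for any fieldname missing from a record.
writer = csv.DictWriter(
    buffer, fieldnames=fieldnames, restval="", lineterminator="\n"
)
writer.writeheader()
writer.writerows(records)

output = buffer.getvalue()
print(output)
# id,title,brand,sku
# 1,Mascara,Essence,A1
# 2,Apple,,B2
```

Because each value is looked up by column name, "B2" stays under `sku` and the `brand` cell is left empty.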
e
Gotcha. Can you try a different variant perhaps?
meltano remove loader target-csv
meltano add loader target-csv --variant meltanolabs
https://hub.meltano.com/loaders/target-csv--meltanolabs/
a
Hello, sorry it took a while. Now the schema works and it writes the NaN in the right place. It is just not writing the nested dictionaries, neither as new columns nor as a string representation. 😄 So two columns are empty. I probably have to change something.
e
Hey! You mean using the `meltanolabs` variant?
a
Yes
Screenshot 2024-06-20 at 20.11.18.png,Screenshot 2024-06-20 at 20.11.41.png
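For anyone landing here with the same empty nested columns: one way to get `meta_createdAt`-style columns is to flatten each record before it is written. Below is a minimal Python sketch of that idea. `flatten` is a hypothetical helper, the sample record is made up, and this is not the loader's own code. Singer SDK-based plugins generally expose `flattening_enabled` and `flattening_max_depth` settings for this purpose; whether this particular variant honours them is worth checking on its hub page.

```python
def flatten(record, parent_key="", sep="_"):
    """Hypothetical helper: turn nested dicts into parent_child keys."""
    flat = {}
    for key, value in record.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse so that {"meta": {"barcode": ...}} becomes "meta_barcode".
            flat.update(flatten(value, new_key, sep))
        else:
            flat[new_key] = value
    return flat


# Made-up record shaped like the products in the schema above.
record = {
    "id": 1,
    "meta": {"createdAt": "2024-05-23", "barcode": "123"},
    "dimensions": {"width": 23},
}

print(flatten(record))
# {'id': 1, 'meta_createdAt': '2024-05-23', 'meta_barcode': '123', 'dimensions_width': 23}
```

Each flattened key then maps to its own CSV column, matching the `meta_*` properties listed earlier in the thread.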