Can someone explain to me the intended purpose of ...
# troubleshooting
c
Can someone explain to me the intended purpose of the metadata in a tap's schema file (catalog.json file)? For example, we're working with the square tap, which has the schema for each endpoint followed by the metadata, like the below:
"stream": "items",
      "metadata": [
        {
          "breadcrumb": [],
          "metadata": {
            "table-key-properties": [
              "id"
            ],
            "forced-replication-method": "INCREMENTAL",
            "inclusion": "available",
            "valid-replication-keys": [
              "updated_at"
            ],
            "selected": true
          }
        },
        {
          "breadcrumb": [
            "properties",
            "id"
          ],
          "metadata": {
            "inclusion": "automatic",
            "selected": true
          }
        },
...
The table properties make sense, but the field properties don't seem to have any impact on what data is extracted. For example, setting "selected" to true vs. false at the column level still includes that column during an extraction. It doesn't even seem to matter whether I include the metadata for a column at all: I could entirely omit the metadata for the id property, and that data would still be extracted as long as it's included in the schema section. Does the metadata for fields serve any purpose? Or is it just the table-level metadata that's important (where you configure keys, replication method, and replication keys)?
Some additional context: the square tap in particular seems to ignore selection/exclusion rules defined in the meltano.yml file as well, so perhaps this is a problem with the square tap specifically rather than general behavior.
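For reference, here is a minimal sketch of how a tap that honors field-level metadata might filter a record before emitting it. It follows a common Singer convention: fields with "inclusion": "automatic" are always kept, and other fields are dropped when "selected" is explicitly false or "inclusion" is "unsupported". This is an illustration under those assumptions, not the square tap's actual code; the function name filter_record, the "name" field, and the sample record values are all hypothetical. Whether a given tap applies this filtering at all is up to that tap's implementation.

```python
from typing import Any


def filter_record(record: dict[str, Any], metadata: list[dict[str, Any]]) -> dict[str, Any]:
    """Drop fields that field-level catalog metadata marks as deselected.

    Hypothetical helper: assumes fields with no metadata entry are kept.
    """
    # Index metadata entries by breadcrumb, e.g. ("properties", "id").
    by_breadcrumb = {tuple(e["breadcrumb"]): e["metadata"] for e in metadata}

    kept: dict[str, Any] = {}
    for name, value in record.items():
        md = by_breadcrumb.get(("properties", name), {})
        inclusion = md.get("inclusion")
        if inclusion == "automatic":
            # Automatic fields (e.g. keys) are emitted regardless of "selected".
            kept[name] = value
        elif inclusion == "unsupported" or md.get("selected") is False:
            # Explicitly deselected or unsupported fields are dropped.
            continue
        else:
            kept[name] = value
    return kept


# Made-up catalog metadata and record, shaped like the snippet above.
catalog_metadata = [
    {"breadcrumb": [], "metadata": {"selected": True}},
    {"breadcrumb": ["properties", "id"],
     "metadata": {"inclusion": "automatic", "selected": False}},
    {"breadcrumb": ["properties", "name"],
     "metadata": {"inclusion": "available", "selected": False}},
]
record = {"id": "abc", "name": "Coffee", "updated_at": "2023-01-01T00:00:00Z"}

filtered = filter_record(record, catalog_metadata)
print(filtered)
```

In this sketch, id survives even with "selected": false because its inclusion is automatic, name is dropped, and updated_at is kept because it has no field-level entry. A tap that never runs a step like this would emit every field in the schema regardless of the metadata, which would match the behavior described above.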
p
@chris_schmid which variant are you using? Are you running taps without meltano? Meltano usually manages discovering the metadata, selecting/deselecting, and passing the catalog to the tap, and it sounds like you're manually editing the catalog, so I'm wondering how exactly you're testing. From a quick skim of the hotgluexyz variant, it doesn't look like it's doing anything custom and it's using the SDK, so selection criteria should work as expected; it would be a bug if they didn't.
c
We're using the singer variant, which seems to be out of date and based on an older version of Square's API. I don't remember seeing the hotgluexyz variant when we set this up
But it sounds like it's worth looking into migrating over to the hotglue version
p
@chris_schmid yeah it was only added like 3 weeks ago, so it's possible you installed singer before it was added. From a quick skim of the code, it looks like hotglue doesn't support as many streams as singer-io does (although singer-io might have old streams that aren't relevant anymore). I'm sure hotglue would accept a PR to get those streams added though; with the SDK it's usually pretty straightforward once you have some streams already set up
c
Thanks @pat_nadolny. I'll look into it!
Hmmm...you're right. It looks like the hotglue variant only includes 2 of the 10 streams we need
p
Which 10 do you need? One thing I noticed with a second look is that it seems like hotglue implements the catalog streams as 1 stream, whereas singer splits them into 5 streams... I think:
• Items
• Categories
• Discounts
• Taxes
• ModifierLists
c
Where are you seeing the catalog streams? I'm looking here, but didn't see any of the catalog related streams. These are the streams we need:
• Categories
• Customers
• Discounts
• Items
• Locations
• ModifierLists
• Orders
• Payments
• Refunds
• Taxes
Oh, I was looking at v2. I assume you're looking at v1: https://gitlab.com/hotglue/tap-square/-/blob/main/tap_square/streams.py