# plugins-general
paul_tiplady
Is there a mechanism for conveying the max digits of a field in tap schemas (e.g. tap-mysql)? I fixed target-bigquery to differentiate DECIMAL from FLOAT: https://github.com/adswerve/target-bigquery/issues/22 But if a MySQL column is defined as `` `amount` numeric(65, 2) NOT NULL ``, then I need a BIGDECIMAL, not a DECIMAL, in the target. I don't see a way to convey the max digits in JSON Schema (short of a `maximum`/`minimum` pair, but a JSON number doesn't have enough precision to hold the actual max). So that `65` is currently omitted from the tap's Singer schema as far as I can see:
"amount": {
            "inclusion": "available",
            "multipleOf": 0.01,
            "type": [
              "null",
              "number"
            ]
          },
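A sketch of the underlying constraint, assuming BigQuery's documented limits (NUMERIC/DECIMAL: 38 digits total, scale up to 9; BIGNUMERIC/BIGDECIMAL: up to 76 digits): a JSON number is parsed as a double, which can't hold a 65-digit max exactly, so a hypothetical mapping from `(precision, scale)` to a BigQuery type needs the precision conveyed some other way. The `bq_decimal_type` helper below is illustrative, not part of target-bigquery:

```python
# Illustrative only: not part of target-bigquery.

def bq_decimal_type(precision: int, scale: int) -> str:
    """Choose a BigQuery type for a MySQL numeric(precision, scale) column.

    NUMERIC (alias DECIMAL) holds at most 38 digits: scale <= 9 and
    up to 29 integer digits. Anything wider needs BIGNUMERIC (BIGDECIMAL).
    """
    if scale <= 9 and precision - scale <= 29:
        return "NUMERIC"
    return "BIGNUMERIC"

# A double (what most JSON parsers give you for a number) cannot
# represent the max of numeric(65, 2) exactly, so a "maximum" keyword
# would silently lose digits:
exact_max = 10**65 - 1
print(float(exact_max) == exact_max)  # False

print(bq_decimal_type(65, 2))  # BIGNUMERIC
print(bq_decimal_type(18, 2))  # NUMERIC
```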
One other solution is just to flag “use BIGDECIMAL instead of DECIMAL” for all columns created in BQ by the target. But that’s obviously not a universal solution.
Maybe the actual solution in Meltano is to somehow map the schema types; the target does support a `bq-bignumeric` type if we can map the tap's schema fields accordingly. But this seems unsatisfying from a Singer-ecosystem perspective.
edgar_ramirez_mondragon
@paul_tiplady have you tried https://docs.meltano.com/concepts/plugins#schema-extra? It should address this if the tap supports schema overrides via catalog.
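For anyone following along, an override via the `schema` extra looks roughly like this (the stream and property names here are made up for illustration):

```yaml
# meltano.yml (sketch) -- stream/property names are illustrative
plugins:
  extractors:
  - name: tap-mysql
    schema:
      orders:            # stream name
        amount:          # property whose inferred type we override
          type: ["null", "string"]
          # a string preserves all 65 digits; a target-specific
          # annotation would be needed to get BIGNUMERIC specifically
```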
p
@edgar_ramirez_mondragon yeah, that was the direction I was gesturing towards with the link above. It's certainly an option that works. However, I think it's at best a band-aid, not a general approach:
1. It doesn't work if you want to composably pipe from one tap to multiple targets. Not sure if that's explicitly contemplated in Meltano (ELT would have everything going to the warehouse first), but if that's a generic use-case, then schema annotations using target-specific fields like `bq-bigint` are problematic.
2. It means I need to manually annotate every field to override the inferred type, and schema inference is one of the big selling points of using Singer in the first place. These schema annotations become annoying to maintain; I already have a Python script to generate my `meltano.yml` file for 50 source tables, and I'm trying to make that layer thinner, not thicker.
3. I think it's probably a footgun for new users, too. The destination schema will look correct at a glance, and work for some data, until you happen to load some data that doesn't fit in the non-big DECIMAL. Basically, the schema the tap generates is subtly incorrect, and if you don't know to look for it, you could easily miss the problem.

If the goal is to have taps/targets plug together cleanly and composably, then I think this is a fundamental problem with the Singer usage of JSONSchema. Interested in everyone else's thoughts on whether this should be something that "just works", or if it's reasonable to have the long-term solution be to require schema overrides here.
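The footgun in point 3 could at least be surfaced earlier with a pre-flight range check. A minimal sketch, assuming BigQuery NUMERIC's documented limits (29 integer digits, scale 9); `fits_bq_numeric` is a hypothetical helper, not an existing API:

```python
from decimal import Decimal, getcontext

getcontext().prec = 80  # enough precision for the comparisons below

# Largest value BigQuery NUMERIC/DECIMAL can hold: 29 integer digits, scale 9.
NUMERIC_MAX = Decimal("9" * 29 + "." + "9" * 9)

def fits_bq_numeric(value: Decimal) -> bool:
    """True if value fits BigQuery NUMERIC (hypothetical pre-flight check)."""
    in_range = abs(value) <= NUMERIC_MAX
    scale_ok = value == value.quantize(Decimal("1E-9"))  # at most 9 decimal places
    return in_range and scale_ok

print(fits_bq_numeric(Decimal("123.45")))  # True
print(fits_bq_numeric(Decimal("1" * 30)))  # False: 30 integer digits
```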