# singer-targets
d
So I'm not sure if this is a limitation of the Singer spec or an issue with target-snowflake, but I'm trying to figure out how to get my data in as a number/decimal instead of a double. My data type in SQL Server is decimal(38,20). Here is my output data type:
"val2":{"inclusion":"available","multipleOf":1e-20,"type":["null","number"]}}
However, this comes out in Snowflake as a double/float and loses precision. A potential fix is to set singer_decimal on the tap, in which case the data type becomes:
"val2":{"inclusion":"available","format":"singer.decimal","type":["null","number","string"],"additionalProperties":{"scale_precision":"(38,20)"}}}
The problem with this approach is that it ends up in Snowflake as a varchar. Is this because target-snowflake doesn't properly support singer_decimal?
For what it's worth, regardless of the singer_decimal setting, the actual value in the tap's output file has the correct precision. It's just lost somewhere when pushing it to Snowflake.
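For illustration, here is a minimal Python sketch of the failure mode being described, using a made-up value rather than the actual data: the text in the tap's output file keeps every digit, but any step that round-trips the value through a 64-bit float, as a Snowflake double/float column does, can only hold roughly 15-17 significant digits.
```python
# Illustrative only: a made-up value with 20 decimal places, standing in
# for the decimal(38,20) column discussed above.
import json
from decimal import Decimal

raw = '{"val2": 12345678901234567.89012345678901234567}'

as_float = json.loads(raw)["val2"]                         # default: 64-bit float
as_decimal = json.loads(raw, parse_float=Decimal)["val2"]  # exact

print(as_float)    # 1.2345678901234568e+16 -- digits beyond ~17 are gone
print(as_decimal)  # 12345678901234567.89012345678901234567
```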
v
The place to look is target-snowflake; my guess is it doesn't support decimals right now, but that's just a guess!
e
So what's the Snowflake type you'd expect to be created for that field? i.e. which one of https://docs.snowflake.com/en/sql-reference/data-types-numeric?
d
number(38,20)
e
Ok, and does the tap emit this?
```json
"val2":{"inclusion":"available","format":"singer.decimal","type":["null","number","string"],"additionalProperties":{"scale_precision":"(38,20)"}}}
```
I can't find a tap-mysql in MeltanoHub that supports that, so I'm curious what variant this is. FWIW, I've reopened https://github.com/meltano/sdk/issues/1890.
d
Yes, that's from the tap. It's the wintersrd variant, which is the default on Meltano Hub.
e
Oh, this is tap-mssql, I thought it was tap-mysql. Makes sense then that it matches Steve's proposal in the issue.
v
personally I think the multipleOf works just fine 🤷
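For illustration, one way a target could act on the multipleOf hint instead of defaulting to a double; `snowflake_type_for` is a hypothetical helper, not code from target-snowflake. A decimal multipleOf such as 1e-20 pins down the scale, and Snowflake's NUMBER caps precision at 38 digits, so a plausible mapping is:
```python
# Hypothetical helper: derive a Snowflake NUMBER type from a JSON-schema
# "number" property that carries a decimal multipleOf hint.
from decimal import Decimal

def snowflake_type_for(prop: dict) -> str:
    multiple_of = prop.get("multipleOf")
    if multiple_of is None:
        return "double"  # no hint: fall back to today's behavior
    # 1e-20 -> Decimal("1E-20") -> exponent -20 -> scale 20
    scale = max(0, -Decimal(str(multiple_of)).as_tuple().exponent)
    return f"number(38,{scale})"  # 38 = Snowflake's max precision

print(snowflake_type_for({"type": ["null", "number"], "multipleOf": 1e-20}))
# number(38,20)
```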
d
I haven't finished reading here, but why would this be an SDK issue rather than just Snowflake? Or is it to make it easier to implement?
I'm not sure on the implementation details for multipleOf, but I did notice that precision is lost at the target. Again, not sure if that's the tap's fault or not, because the output file doesn't lose any precision.
That said, when testing with 37 decimal places instead of 20, I did get a division error, which I'm assuming was a divide-by-zero due to rounding.
e
> I haven't finished reading here, but why would this be an SDK issue rather than just Snowflake? Or is it to make it easier to implement?
I think it makes sense to upstream it, but I'm happy to review a target-snowflake PR, and just an issue is also fine. target-snowflake has a lot of customization anyway, though I do want to reduce it at some point.
> I'm not sure on the implementation details for multipleOf, but I did notice that precision is lost at the target. Again, not sure if that's the tap's fault or not, because the output file doesn't lose any precision.
I 100% agree. The next version of the SDK will make it easier for SQL taps to customize the SQL->JSON schema mapping, and I have just that as an example: https://sdk.meltano.com/en/latest/guides/sql-tap.html#custom-type-mapping
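For reference, the pattern that guide describes looks roughly like the sketch below. The exact hook names have shifted across SDK versions, so treat this as illustrative rather than as the guide's verbatim example: it overrides the connector's SQL-to-JSON-schema mapping to emit NUMERIC columns as strings, mirroring the singer_decimal approach.
```python
# Sketch of overriding the SQL-to-JSON-schema mapping in a singer-sdk
# SQL connector. `to_jsonschema_type` is the hook in SDK versions I've
# seen; check the linked guide for the current API.
import sqlalchemy
from singer_sdk import typing as th
from singer_sdk.connectors import SQLConnector

class DecimalAwareConnector(SQLConnector):
    @staticmethod
    def to_jsonschema_type(sql_type) -> dict:
        if isinstance(sql_type, sqlalchemy.types.Numeric):
            # Emit DECIMAL/NUMERIC as strings so every digit survives
            # transit; a fuller version would also attach a
            # scale/precision hint for the target to use.
            return th.StringType.type_dict
        return SQLConnector.to_jsonschema_type(sql_type)
```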
s
Jumping in here. When we transfer data from databases like SQL Server, Sybase, DB2, and Oracle, we contributed changes to the Meltano Hub default variants of these taps that let you emit numeric data (e.g. decimal/float) as strings, i.e. not in scientific notation; this requires the singer_decimal config item to be set to true. When it's set, logic in the tap also provides the additional information needed to convert the string back into a number in the target loader without any loss of precision, by emitting hints about the original column definition in the source system, e.g. "additionalProperties":{"scale_precision":"(38,20)"}.
We have our own variant of the original pipelinewise target-snowflake, patched with a few bugs removed: https://github.com/mjsqu/pipelinewise-target-snowflake. This variant has logic to recognize singer_decimal and make use of the additional properties. If you use it, it will automatically convert and store the data as a number with the correct scale and precision as per the source system. This has been very important for us: during testing we found a loss of precision whenever numeric decimals and floats were not emitted as strings via singer_decimal. I have been meaning to raise a PR to add our logic to the MeltanoLabs version of target-snowflake so it could be our preferred loader.
It is worth noting that for Oracle databases singer_decimal is harder, as Oracle doesn't really set a scale and precision; a column is often just NUMBER and can store a mix of integers and decimals. In that case we don't emit any precision and scale, and the data is transferred as a string. It's then the responsibility of the data transform (e.g. dbt) to set an appropriate data type, as it is too dangerous to guess what is correct. We still transfer it as a string to avoid rounding issues from Python and scientific notation.
This section in our variant of target-snowflake sets the correct datatype in Snowflake for singer.decimal columns (a sketch of the same idea follows below): https://github.com/mjsqu/pipelinewise-target-snowflake/blob/e2a216c3cfa3d56fe66902e611812fe146528996/target_snowflake/db_sync.py#L99-L100. The actual PR is here: https://github.com/mjsqu/pipelinewise-target-snowflake/commit/558a0d6bb0ba4166f188cd1aac0c970bb9ed36a1. A few notable enhancements we made to the original pipelinewise version:
1. Fixed an issue where NULLs in an INT column led to conversion to a float datatype, causing rounding: https://github.com/transferwise/pipelinewise-target-snowflake/issues/404
2. Added the ability to set no proxy, so internal routing within AWS avoids going out via the public network: https://github.com/mjsqu/pipelinewise-target-snowflake/pull/4
3. Ability to set a retention period for the table: https://github.com/mjsqu/pipelinewise-target-snowflake/pull/27
4. Ability to retain the column order as per the source system: https://github.com/mjsqu/pipelinewise-target-snowflake/pull/3
5. Ability to have append and truncate options for loaded data: https://github.com/mjsqu/pipelinewise-target-snowflake/pull/7
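For readers who don't want to follow the links, the idea behind those db_sync.py lines is roughly the following. This is a hedged reconstruction, not a copy of the linked code, and `column_type_for` is a hypothetical name:
```python
# Hedged reconstruction: map a singer.decimal property (plus its
# scale_precision hint) to a Snowflake NUMBER, falling back to varchar
# when no hint was emitted (e.g. Oracle bare NUMBER columns).
import re

def column_type_for(prop: dict) -> str:
    if prop.get("format") == "singer.decimal":
        hint = (prop.get("additionalProperties") or {}).get("scale_precision", "")
        match = re.fullmatch(r"\((\d+),(\d+)\)", hint)  # e.g. "(38,20)"
        if match:
            precision, scale = match.groups()
            return f"number({precision},{scale})"
        return "varchar"  # no hint: keep as text, let dbt decide the type
    return "float"  # stand-in for the rest of the real type mapping

print(column_type_for({
    "format": "singer.decimal",
    "type": ["null", "number", "string"],
    "additionalProperties": {"scale_precision": "(38,20)"},
}))  # -> number(38,20)
```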
d
Thanks for all the detail on that. I will probably see if some of that can be ported over to the MeltanoLabs variant if the Meltano team doesn't get to it first.