# singer-targets
d
So I'm not sure if this is a limitation of the Singer spec or an issue with target-snowflake, but I'm trying to figure out how to get my data in as a number/decimal instead of a double. My data type in SQL Server is decimal(38,20). Here is my output data type:
"val2":{"inclusion":"available","multipleOf":1e-20,"type":["null","number"]}}
However, this comes out in Snowflake as a double/float and loses precision. A potential fix is to set singer_decimal on the tap, in which case the data type becomes:
"val2":{"inclusion":"available","format":"singer.decimal","type":["null","number","string"],"additionalProperties":{"scale_precision":"(38,20)"}}}
The problem with this approach is that it ends up in Snowflake as a varchar. Is this because target-snowflake doesn't properly support singer_decimal?
For what it's worth, regardless of the singer_decimal setting, the actual value in the tap's output file has the correct precision. It's just lost somewhere when pushing it to Snowflake.
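For illustration, here is a minimal Python sketch of the failure mode being described, using a made-up value rather than the actual data: the text in the tap's output file keeps every digit, but any step that round-trips the value through a 64-bit float, as a Snowflake double/float column does, can only hold roughly 15-17 significant digits.
```python
# Illustrative only: a made-up value with 20 decimal places, standing in
# for the decimal(38,20) column discussed above.
import json
from decimal import Decimal

raw = '{"val2": 12345678901234567.89012345678901234567}'

as_float = json.loads(raw)["val2"]                         # default: 64-bit float
as_decimal = json.loads(raw, parse_float=Decimal)["val2"]  # exact

print(as_float)    # 1.2345678901234568e+16 -- digits beyond ~17 are gone
print(as_decimal)  # 12345678901234567.89012345678901234567
```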
v
The place to look is target-snowflake; my guess is it doesn't support decimals right now, but that's just a guess!
e
So what's the Snowflake type you'd expect to be created for that field? i.e. which one of https://docs.snowflake.com/en/sql-reference/data-types-numeric?
d
number(38,20)
e
Ok, and does the tap emit this?
```json
"val2":{"inclusion":"available","format":"singer.decimal","type":["null","number","string"],"additionalProperties":{"scale_precision":"(38,20)"}}}
```
I can't find a tap-mysql in MeltanoHub that supports that, so I'm curious what variant this is. FWIW, I've reopened https://github.com/meltano/sdk/issues/1890.
d
Yes, that's from the tap. It's the wintersrd variant, which is the default on Meltano Hub.
e
Oh, this is tap-mssql, I thought it was tap-mysql. Makes sense then that it matches Steve's proposal in the issue.
v
personally I think the multipleOf works just fine 🤷
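For illustration, one way a target could act on the multipleOf hint instead of defaulting to a double; `snowflake_type_for` is a hypothetical helper, not code from target-snowflake. A decimal multipleOf such as 1e-20 pins down the scale, and Snowflake's NUMBER caps precision at 38 digits, so a plausible mapping is:
```python
# Hypothetical helper: derive a Snowflake NUMBER type from a JSON-schema
# "number" property that carries a decimal multipleOf hint.
from decimal import Decimal

def snowflake_type_for(prop: dict) -> str:
    multiple_of = prop.get("multipleOf")
    if multiple_of is None:
        return "double"  # no hint: fall back to today's behavior
    # 1e-20 -> Decimal("1E-20") -> exponent -20 -> scale 20
    scale = max(0, -Decimal(str(multiple_of)).as_tuple().exponent)
    return f"number(38,{scale})"  # 38 = Snowflake's max precision

print(snowflake_type_for({"type": ["null", "number"], "multipleOf": 1e-20}))
# number(38,20)
```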
d
I haven't finished reading here, but why would this be an SDK issue rather than just Snowflake? Or is it to make it easier to implement?
I'm not sure on the implementation details for multipleOf, but I did notice that precision is lost at the target. Again, not sure if that's the tap's fault or not, because the output file doesn't lose any precision.
That said, when testing with 37 decimal places instead of 20, I did get a division error, which I'm assuming was a divide-by-zero due to rounding.
e
> I haven't finished reading here, but why would this be an SDK issue rather than just Snowflake? Or is it to make it easier to implement?
I think it makes sense to upstream it, but I'm happy to review a target-snowflake PR, and just an issue is also fine. target-snowflake has a lot of customization anyway, though I do want to reduce it at some point.
> I'm not sure on the implementation details for multipleOf, but I did notice that precision is lost at the target. Again, not sure if that's the tap's fault or not, because the output file doesn't lose any precision.
I 100% agree. The next version of the SDK will make it easier for SQL taps to customize the SQL->JSON schema mapping, and I have just that as an example: https://sdk.meltano.com/en/latest/guides/sql-tap.html#custom-type-mapping
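For reference, the pattern that guide describes looks roughly like the sketch below. The exact hook names have shifted across SDK versions, so treat this as illustrative rather than as the guide's verbatim example: it overrides the connector's SQL-to-JSON-schema mapping to emit NUMERIC columns as strings, mirroring the singer_decimal approach.
```python
# Sketch of overriding the SQL-to-JSON-schema mapping in a singer-sdk
# SQL connector. `to_jsonschema_type` is the hook in SDK versions I've
# seen; check the linked guide for the current API.
import sqlalchemy
from singer_sdk import typing as th
from singer_sdk.connectors import SQLConnector

class DecimalAwareConnector(SQLConnector):
    @staticmethod
    def to_jsonschema_type(sql_type) -> dict:
        if isinstance(sql_type, sqlalchemy.types.Numeric):
            # Emit DECIMAL/NUMERIC as strings so every digit survives
            # transit; a fuller version would also attach a
            # scale/precision hint for the target to use.
            return th.StringType.type_dict
        return SQLConnector.to_jsonschema_type(sql_type)
```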
s
Jumping in here. When we transfer data from databases like SQL Server, Sybase, DB2, and Oracle, we contributed changes to the Meltano Hub default variants of these taps that let you emit numeric data (e.g. decimal/float) as strings, i.e. not in scientific notation; this requires the singer_decimal config item to be set to true. When it's set, logic in the tap also provides the additional information needed to convert the string back into a number in the target loader without any loss of precision, by emitting hints about the original column definition in the source system, e.g. "additionalProperties":{"scale_precision":"(38,20)"}.
We have our own variant of the original pipelinewise target-snowflake, patched with a few bugs removed: https://github.com/mjsqu/pipelinewise-target-snowflake. This variant has logic to recognize singer_decimal and make use of the additional properties. If you use it, it will automatically convert and store the data as a number with the correct scale and precision as per the source system. This has been very important for us: during testing we found a loss of precision whenever numeric decimals and floats were not emitted as strings via singer_decimal. I have been meaning to raise a PR to add our logic to the MeltanoLabs version of target-snowflake so it could be our preferred loader.
It is worth noting that for Oracle databases singer_decimal is harder, as Oracle doesn't really set a scale and precision; a column is often just NUMBER and can store a mix of integers and decimals. In that case we don't emit any precision and scale, and the data is transferred as a string. It's then the responsibility of the data transform (e.g. dbt) to set an appropriate data type, as it is too dangerous to guess what is correct. We still transfer it as a string to avoid rounding issues from Python and scientific notation.
This section in our variant of target-snowflake sets the correct datatype in Snowflake for singer.decimal columns (a sketch of the same idea follows below): https://github.com/mjsqu/pipelinewise-target-snowflake/blob/e2a216c3cfa3d56fe66902e611812fe146528996/target_snowflake/db_sync.py#L99-L100. The actual PR is here: https://github.com/mjsqu/pipelinewise-target-snowflake/commit/558a0d6bb0ba4166f188cd1aac0c970bb9ed36a1. A few notable enhancements we made to the original pipelinewise version:
1. Fixed an issue where NULLs in an INT column led to conversion to a float datatype, causing rounding: https://github.com/transferwise/pipelinewise-target-snowflake/issues/404
2. Added the ability to set no proxy, so internal routing within AWS avoids going out via the public network: https://github.com/mjsqu/pipelinewise-target-snowflake/pull/4
3. Ability to set a retention period for the table: https://github.com/mjsqu/pipelinewise-target-snowflake/pull/27
4. Ability to retain the column order as per the source system: https://github.com/mjsqu/pipelinewise-target-snowflake/pull/3
5. Ability to have append and truncate options for loaded data: https://github.com/mjsqu/pipelinewise-target-snowflake/pull/7
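For readers who don't want to follow the links, the idea behind those db_sync.py lines is roughly the following. This is a hedged reconstruction, not a copy of the linked code, and `column_type_for` is a hypothetical name:
```python
# Hedged reconstruction: map a singer.decimal property (plus its
# scale_precision hint) to a Snowflake NUMBER, falling back to varchar
# when no hint was emitted (e.g. Oracle bare NUMBER columns).
import re

def column_type_for(prop: dict) -> str:
    if prop.get("format") == "singer.decimal":
        hint = (prop.get("additionalProperties") or {}).get("scale_precision", "")
        match = re.fullmatch(r"\((\d+),(\d+)\)", hint)  # e.g. "(38,20)"
        if match:
            precision, scale = match.groups()
            return f"number({precision},{scale})"
        return "varchar"  # no hint: keep as text, let dbt decide the type
    return "float"  # stand-in for the rest of the real type mapping

print(column_type_for({
    "format": "singer.decimal",
    "type": ["null", "number", "string"],
    "additionalProperties": {"scale_precision": "(38,20)"},
}))  # -> number(38,20)
```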
d
Thanks for all the detail on that. I will probably see if some of that can be ported over to the MeltanoLabs variant if the Meltano team doesn't get to it first.