aaron_phethean
02/25/2022, 9:20 AMSELECT
to_timestamp(created_at, 'YYYY-MM-DDTHH:MI:SS')::date "orders.created_at"
, SUM(total_price::DECIMAL) "orders.sum_total_price"
FROM orders
WHERE to_timestamp(created_at, 'YYYY-MM-DDTHH:MI:SS')::date >= current_date - interval '30 day'
GROUP BY "orders.created_at"
ORDER BY "orders.created_at" DESC
I see the following possible options:
• tap transforms the types, before serialising the json output (e.g. 5.99, five dollars and ninety nine cents becomes a number type in the json payload, the risk is that some targets do not use a type with enough bits to store the value correctly)
• target transforms the types, with the “format” hint from the tap the target can use the correct type in the database
• transform in a data warehouse prepare stage, with dbt staging models
Taking ‘created_at’ as an example ‘format’ hint is the best option -
"created_at": {
"format": "date-time",
"type": "string"
},
with this implemented the SQL aggregation is simply
SELECT
created_at::date "orders.created_at"
, SUM(total_price::DECIMAL) "orders.sum_total_price"
FROM orders
WHERE created_at::date >= current_date - interval '30 day'
GROUP BY created_at::date
ORDER BY created_at::date DESC
My understanding is that a similar hint to ‘total_price’ should yield a decimal type in postgres from that target.
"total_price": {
"format": "singer.decimal",
"type": "string"
},
And a query of
SELECT
created_at::date "orders.created_at"
, SUM(total_price) "orders.sum_total_price"
FROM orders
WHERE created_at::date >= current_date - interval '30 day'
GROUP BY created_at::date
ORDER BY created_at::date DESC
However, ‘singer.decimal’ in the transferwise variant doesn’t seem to be implemented.visch
02/25/2022, 1:23 PMvisch
02/25/2022, 1:29 PMvisch
02/25/2022, 1:34 PMtap transforms the types, before serialising the json output (e.g. 5.99, five dollars and ninety nine cents becomes a number type in the json payload, the risk is that some targets do not use a type with enough bits to store the value correctly)Who defines the type? Support more types to map to? Trade off is that we add subjectivity to typing, but is that valuable enough?
target transforms the types, with the “format” hint from the tap the target can use the correct type in the databaseSame problem, who define the type? Just implemented a bit different
transform in a data warehouse prepare stage, with dbt staging modelsThe answer most everyone goes with right now that I know of
visch
02/25/2022, 1:37 PMaaron_phethean
02/25/2022, 3:16 PMaaronsteers
02/25/2022, 4:51 PMaaronsteers
02/25/2022, 4:53 PMaaron_phethean
02/26/2022, 6:02 PMaaron_phethean
03/01/2022, 2:40 PMdef post_process(self, row: dict, context: Optional[dict] = None) -> Optional[dict]:
super().post_process(row, context)
row["subtotal_price"] = Decimal(row["subtotal_price"])
row["total_price"] = Decimal(row["total_price"])
return row
2. 'multipleOf'
"total_price": {
"type": "number",
"minimum": 0,
"multipleOf": 0.01
},
"subtotal_price": {
"type": "number"
},
The validation of 0.01 multiple seems to work ok.
However, the column type in postgres is float8. This needs to be a decimal or numeric type to accurately store monetary values.
You may see issues with a single cent lost here and there when the amounts are small. You'll see the issue a lot more often in Indonesian Rupiah where 1 Rupiah equals 0.000070 United States Dollars
This leads my to conclude that there needs to be better handling of decimal types in the default postgres loader (transferwise variant). When the default loader was the meltano variant, all number types were stored as numeric.
Thoughts?
• just patch the transferwise variant to default to numeric for number types perhaps?
• implement handling for 'singer.decimal'
• implement all loaders with good tests in the SDK?visch
03/01/2022, 3:13 PMYou may see issues with a single cent lost here and there when the amounts are small. You'll see the issue a lot more often in Indonesian Rupiah where 1 Rupiah equals 0.000070 United States Dollars
Can you make a more concrete example? Do you mean that the tap would send the wrong information due to precision issues?
Like if the number is 0.000070
value: "0.000070"
you're saying this wouldn't work?visch
03/01/2022, 3:14 PM.00001
visch
03/01/2022, 3:16 PMaaron_phethean
03/01/2022, 4:43 PMvisch
03/01/2022, 4:55 PMNUMERIC
correct?
I want to start at some solid ground instead of pointing at the json schema for singer. To me solid ground is the data from the tap, and how the data is passed into the target.
Focusing on the inbetween layer comes after we define what we want imoaaron_phethean
03/02/2022, 10:50 AMmeltano run tap-shopify target-postgres
SELECT
created_at::date "orders.created_at"
, SUM(total_price) "orders.sum_total_price"
FROM orders
WHERE created_at::date >= current_date - interval '30 day'
GROUP BY created_at::date
ORDER BY created_at::date DESC
visch
03/02/2022, 2:26 PMnumber
with multipleOf
, minimum
and maximum
)
That also answers what the singer representation is for this target (We'll hold aside the argument about what should/shouldn't be there as this target forces our hand)
---
tap-shopify now needs to provide this data in the correct format.
I don't know much about tap-shopify, took a quick peak at docs here https://shopify.dev/api/admin-rest/2022-01/resources/order#resource-object. Found price json example here
"current_total_price": "10.00",
"current_total_price_set": {
"current_total_price_set": {
"shop_money": {
"amount": "30.00",
"currency_code": "CAD"
},
"presentment_money": {
"amount": "20.00",
"currency_code": "EUR"
}
}
}
In this example the price is given as a string. This makes all of the target-postgres stuff we did ahead of this moot as the API provides the data as a string. Now we could argue that we should convert from a string to currency in the tap (I think that's a stretch, as dbt is the normal spot for this but I'll go with it), we'd need to add some subjectivity here and pick the fields with _price and then create the corresponding number
with multipleOf
, maximum
, minimum
values for that.aaron_phethean
03/02/2022, 5:33 PMaaronsteers
03/02/2022, 6:02 PMaaron_phethean
03/02/2022, 9:35 PMvisch
03/02/2022, 9:38 PM