https://linen.dev logo
#singer-tap-development
Title
# singer-tap-development
a

aaron_phethean

02/25/2022, 9:20 AM
Q.  How do we get a decimal type into postgres? First some background - Consider the orders source in tap-shopify, where there are two properties that need types for simple aggregations. created_at - timestamp total_price - monetary amount / decimal I think the follow principles should apply to all taps: • data should arrive unchanged from the tap -> target • data should arrive with an appropriate type in the target To explain further - With no type information, the orders table will have strings as follows. created_at(varchar) | total_price(varchar) 2022-02-25T083132 | 198.00 2022-02-24T163754 | 99.00 2022-02-23T191802 | 99.00 2022-02-23T191727 | 594.00 2022-02-22T123629 | 198.00 2022-02-22T120013 | 99.00 To perform a simple aggregation by day would require the following and is particularly problematic considering support for multiple database types (postgres, snowflake, big query, etc):
Copy code
SELECT
    to_timestamp(created_at, 'YYYY-MM-DDTHH:MI:SS')::date "orders.created_at"
    , SUM(total_price::DECIMAL) "orders.sum_total_price"
  FROM orders
  WHERE to_timestamp(created_at, 'YYYY-MM-DDTHH:MI:SS')::date >= current_date - interval '30 day'
  GROUP BY "orders.created_at"
  ORDER BY "orders.created_at" DESC
I see the following possible options: • tap transforms the types, before serialising the json output  (e.g. 5.99, five dollars and ninety nine cents becomes a number type in the json payload, the risk is that some targets do not use a type with enough bits to store the value correctly) • target transforms the types, with the “format” hint from the tap the target can use the correct type in the database • transform in a data warehouse prepare stage, with dbt staging models Taking ‘created_at’ as an example ‘format’ hint is the best option -
Copy code
"created_at": {
      "format": "date-time",
      "type": "string"
    },
with this implemented the SQL aggregation is simply
Copy code
SELECT
    created_at::date "orders.created_at"
    , SUM(total_price::DECIMAL) "orders.sum_total_price"
  FROM orders
  WHERE created_at::date >= current_date - interval '30 day'
  GROUP BY created_at::date
  ORDER BY created_at::date DESC
My understanding is that a similar hint to ‘total_price’ should yield a decimal type in postgres from that target.
Copy code
"total_price": {
      "format": "singer.decimal",
      "type": "string"
    },
And a query of
Copy code
SELECT
    created_at::date "orders.created_at"
    , SUM(total_price) "orders.sum_total_price"
  FROM orders
  WHERE created_at::date >= current_date - interval '30 day'
  GROUP BY created_at::date
  ORDER BY created_at::date DESC
However, ‘singer.decimal’ in the transferwise variant doesn’t seem to be implemented.