<@U06CE9BN604> Thoughts on handling json schema co...
# announcements
n
@douwe_maan Thoughts on handling json schema combinations when targeting postgres, or do you have a strong recommendation for a different data warehouse?
d
@nevin_morgan You're referring to columns that can hold different types, per the discovered JSON schema? In general, Snowflake is more optimized for data warehouse tasks than Postgres, which is "just" an RDBMS, but I don't know how best to deal with such columns in either. The specific target you'll use (different versions of target-postgres and target-snowflake exist) will also could also make a difference.
n
specifically the anyof / oneof combos choke with https://github.com/meltano/target-postgres.git
@douwe_maan I have been considering snowflake, but I can't figure out for the life of me the cost of it vs something like bigquery or redshift
d
@nevin_morgan An
anyOf
bug was actually fixed by @julian_knight just last week, so that may help you as well: https://github.com/meltano/target-postgres/pull/5
n
would you happen to have any insight on how to figure out the cost of snowflake? I was only looking at postgres because it can be run on my fleet of servers in house relatively easily
that pricing page makes no sense lol
j
In the process of that fix, I considered the possibility of target-postgres treating any type that didn't map to a normal postgres type to a json column. It would be a bit of a pain to query, but you'd be able to support situations like
{"type": ["string", "number"]}
as well as more complicated ones
d
@nevin_morgan Lol, they have an entire Pricing Guide that requires you to register 🤦‍♂️ https://www.snowflake.com/pricing-page-registration-page/ I don't have experience with Snowflake pricing myself though; I'd suggest asking @adam_roderick since you're already working together, I'm sure he'll be able to advise you better on what will be most appropriate in your situation 🙂
n
👍 thanks for that info. I will see if Adam has any insight. I just started a trial to see how fast I burn the credits they give me lol. Meltano has really helped me get the data flowing, but I need to figure out the nexus of all of this which is the warehouse.
n
Hi @nevin_morgan I use Snowflake a lot. I can help you figure out pricing
n
I am in a bit of a unique position. I would need snowflake to come in under the cost of my in house compute resources, or be so much better that it justifies the cost which has been the rub
n
What is your cloud provider? what are is the biggest source of data for your setup? how many users/bi tools will query snowflake?
Snowflake pricing is a combination of storage + compute. In my opinion, storage is very very cheap. It's all about the compute. I think its similar for big query as well. Your storage comes down to storing raw data on gcs vs s3 vs azure blob store for both so dont worry about that
n
I really doubt it will cost me less than 400 a month for 6 or 8 clients pulling data from GA, adwords, bing, facebook, and criteo, plus an eventual snowplow setup
n
compute cost = credits consumed x cost per credit. cost per credit you can assume to be 2 CAD.
if you are going to move to snowplow data I dont think postgres will cut if for you
unless you are okay with migrating in the middle of the project which is usually not a great idea
you can definitely bring the snowflake cost to be under 400 for initial months but eventually you might exceed 400 as more and more data is being crunched
n
that depends on the sort of hardware you have to throw at a problem. I am eyeballing 20 cores / 40 threads of spare capacity with another 48 cores / 96 threads that aren't online yet. with that much compute on tap I can probably get away with https://www.citusdata.com
it's what makes snowflake a tough sell for me. I have an in house rack full of idle servers, so snowflake needs to show value for me that once I did figure out the math on costs earlier today hasn't quite made sense.
n
mmm, if have db experts and resources for maintaining a performant postgresql citusdata does look interesting but still looks like a OLTP to me
n
that has been my exact concern. My ideal scenario is to find an open source OLAP platform I can run on these machines.
snowflake would be my ideal solution, but I am a bit hamstrung by a shoestring budget for outside resources courtesy of good old Covid
n
its all pay per use for snowflake and big query but if you still dont like it. then you can try setting up druid or presto