Hello, I was wondering if anyone has used target-...
# plugins-general
g
Hello, I was wondering if anyone has used target-redshift (not the pipelinewise variant) and has found a way to adjust column sizing (rather than having a general column size for all columns). Thanks!
@kai_yokoyama @edward_ryan @corneille_ombang
a
@gunnar - Do you mean for string columns? Storage should be basically identical regardless of the length (since those strings are compressed columnarly) so there may not be much upside to customizing the column size. My understanding is that a million rows of
varchar(MAX)
should take the same amount of space as
varchar(10)
, and the MAX version is less likely to fail inserts if the column gets expanded in the future or if there are outliers in string length.
e
Thanks AJ So it makes most sense to use VarChar(MAX) as standard across all columns? There is no savings by declaring a column for fewer characters (example 50 character first name column vs a 5000 character landing page URL column) @kai_yokoyama @corneille_ombang
a
In the landing table, there should be no difference really in overall storage usage. However, it could have some probably minimal negative impact on downstream queries. According to this documentation page, is that it sounds like downstream temp tables or interim results for complex operations may be impacted since interim tables don’t apply the same compression as the permanent storage tables. Given this, it might be beneficial to try to honor the original string length - but you could also mitigate this by having your next layer of the data pipeline perform the casting.
e
Thanks AJ!!
a
Sure thing - happy to help!