# best-practices
a
Hello guys! Does anyone have some experience loading JSON fields into Redshift? Our idea is to load these fields as the SUPER type: https://docs.aws.amazon.com/redshift/latest/dg/r_SUPER_type.html We’re using pipelinewise’s target-redshift: https://github.com/transferwise/pipelinewise-target-redshift JSON fields are loaded as `varchar` by default, which I think is not ideal. Maybe there is a config option using `schema_mapping` to parse these to SUPER? Or should we do this as a transform step?
m
We’ve just started experimenting with this. We added the following change to our fork of this target just yesterday:
```diff
---
 target_redshift/db_sync.py | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/target_redshift/db_sync.py b/target_redshift/db_sync.py
index 0c4f0a4..fd00a7c 100644
--- a/target_redshift/db_sync.py
+++ b/target_redshift/db_sync.py
@@ -51,7 +51,9 @@ def column_type(schema_property, with_length=True):
     varchar_length = DEFAULT_VARCHAR_LENGTH
     if schema_property.get('maxLength', 0) > varchar_length:
         varchar_length = LONG_VARCHAR_LENGTH
-    if 'object' in property_type or 'array' in property_type:
+    if 'object' in property_type:
+        column_type = 'super'
+    if 'array' in property_type:
         column_type = 'character varying'
         varchar_length = LONG_VARCHAR_LENGTH
```
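For context, here's a minimal standalone Python sketch of how the patched branching behaves (the constant values and the simplified signature are my assumptions; the real `column_type` in `db_sync.py` handles more JSON-schema types):

```python
# Sketch of the patched type mapping. The constant values below are
# assumed, not copied from the target's source.
DEFAULT_VARCHAR_LENGTH = 10000
LONG_VARCHAR_LENGTH = 65535

def column_type(schema_property):
    """Map a JSON-schema property to a Redshift column type (sketch only)."""
    property_type = schema_property['type']
    varchar_length = DEFAULT_VARCHAR_LENGTH
    if schema_property.get('maxLength', 0) > varchar_length:
        varchar_length = LONG_VARCHAR_LENGTH
    if 'object' in property_type:
        return 'super'  # nested documents now become SUPER
    if 'array' in property_type:
        # arrays still land in a long varchar, as in the diff above
        return f'character varying({LONG_VARCHAR_LENGTH})'
    return f'character varying({varchar_length})'

print(column_type({'type': ['object', 'null']}))  # super
print(column_type({'type': ['array']}))           # character varying(65535)
print(column_type({'type': ['string']}))          # character varying(10000)
```

Note the diff only reroutes `object` properties to SUPER; `array` properties keep the old long-varchar behavior.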
This follows the comment in https://github.com/transferwise/pipelinewise-target-redshift/issues/119. We haven’t really tested this yet, but our source data is in MongoDB, and our current process loads each entire MongoDB document into one Redshift column as VARCHAR. Some of our documents are so large that they exceed the VARCHAR max limit, so we’re looking into alternative approaches.
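Worth noting: Redshift’s VARCHAR limit (65535) is in bytes, not characters, so multi-byte UTF-8 content hits it sooner than you’d expect. One way to spot the offending documents before load is to check the serialized byte length (a sketch; the document fields here are made up):

```python
import json

# Redshift's maximum VARCHAR length is 65535 *bytes*.
REDSHIFT_VARCHAR_MAX_BYTES = 65535

def exceeds_varchar_max(document: dict) -> bool:
    """Return True if the JSON-serialized document won't fit in VARCHAR(65535)."""
    serialized = json.dumps(document, separators=(',', ':'))
    return len(serialized.encode('utf-8')) > REDSHIFT_VARCHAR_MAX_BYTES

# Hypothetical documents for illustration
small = {"_id": "abc", "payload": "x" * 100}
huge = {"_id": "def", "payload": "y" * 70000}
print(exceeds_varchar_max(small))  # False
print(exceeds_varchar_max(huge))   # True
```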
fwiw my fork is here: https://github.com/transferwise/pipelinewise-target-redshift/compare/master...menzenski:pipelinewise-target-redshift:py-311# It doesn’t have many changes, but the maintainers of the upstream target seem to have abandoned it 😢
a
wow that is super cool, thanks! 🙇 we also had some problems with the project being abandoned, and had to switch to a Python 3.11 compatible fork