# best-practices
a
Hello guys! Does anyone have some experience loading JSON fields into Redshift? Our idea is to load these fields as the SUPER type: https://docs.aws.amazon.com/redshift/latest/dg/r_SUPER_type.html We’re using pipelinewise’s target-redshift: https://github.com/transferwise/pipelinewise-target-redshift JSON fields are loaded as `varchar` by default, which I think is not ideal. Maybe there is a config option using `schema_mapping` to parse these to SUPER? Or should we do this as a transform step?
m
We’ve just started experimenting with this. We added the following change to our fork of this target just yesterday:
```diff
---
 target_redshift/db_sync.py | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/target_redshift/db_sync.py b/target_redshift/db_sync.py
index 0c4f0a4..fd00a7c 100644
--- a/target_redshift/db_sync.py
+++ b/target_redshift/db_sync.py
@@ -51,7 +51,9 @@ def column_type(schema_property, with_length=True):
     varchar_length = DEFAULT_VARCHAR_LENGTH
     if schema_property.get('maxLength', 0) > varchar_length:
         varchar_length = LONG_VARCHAR_LENGTH
-    if 'object' in property_type or 'array' in property_type:
+    if 'object' in property_type:
+        column_type = 'super'
+    if 'array' in property_type:
         column_type = 'character varying'
         varchar_length = LONG_VARCHAR_LENGTH
```
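For context, here's a minimal standalone Python sketch of how the patched branching behaves (the constant values and the simplified signature are my assumptions; the real `column_type` in `db_sync.py` handles more JSON-schema types):

```python
# Sketch of the patched type mapping. The constant values below are
# assumed, not copied from the target's source.
DEFAULT_VARCHAR_LENGTH = 10000
LONG_VARCHAR_LENGTH = 65535

def column_type(schema_property):
    """Map a JSON-schema property to a Redshift column type (sketch only)."""
    property_type = schema_property['type']
    varchar_length = DEFAULT_VARCHAR_LENGTH
    if schema_property.get('maxLength', 0) > varchar_length:
        varchar_length = LONG_VARCHAR_LENGTH
    if 'object' in property_type:
        return 'super'  # nested documents now become SUPER
    if 'array' in property_type:
        # arrays still land in a long varchar, as in the diff above
        return f'character varying({LONG_VARCHAR_LENGTH})'
    return f'character varying({varchar_length})'

print(column_type({'type': ['object', 'null']}))  # super
print(column_type({'type': ['array']}))           # character varying(65535)
print(column_type({'type': ['string']}))          # character varying(10000)
```

Note the diff only reroutes `object` properties to SUPER; `array` properties keep the old long-varchar behavior.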
This follows the comment in https://github.com/transferwise/pipelinewise-target-redshift/issues/119. We haven’t really tested this yet, but our source data is in MongoDB, and our current process loads each entire MongoDB document into one Redshift column as VARCHAR. Some of our documents are so large that they exceed the VARCHAR max limit, so we’re looking into alternative approaches.
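Worth noting: Redshift’s VARCHAR limit (65535) is in bytes, not characters, so multi-byte UTF-8 content hits it sooner than you’d expect. One way to spot the offending documents before load is to check the serialized byte length (a sketch; the document fields here are made up):

```python
import json

# Redshift's maximum VARCHAR length is 65535 *bytes*.
REDSHIFT_VARCHAR_MAX_BYTES = 65535

def exceeds_varchar_max(document: dict) -> bool:
    """Return True if the JSON-serialized document won't fit in VARCHAR(65535)."""
    serialized = json.dumps(document, separators=(',', ':'))
    return len(serialized.encode('utf-8')) > REDSHIFT_VARCHAR_MAX_BYTES

# Hypothetical documents for illustration
small = {"_id": "abc", "payload": "x" * 100}
huge = {"_id": "def", "payload": "y" * 70000}
print(exceeds_varchar_max(small))  # False
print(exceeds_varchar_max(huge))   # True
```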
fwiw my fork is here: https://github.com/transferwise/pipelinewise-target-redshift/compare/master...menzenski:pipelinewise-target-redshift:py-311# It doesn’t have many changes, but the maintainers of the upstream target seem to have abandoned it 😢
a
wow that is super cool, thanks! 🙇 we also had some problems with the project being abandoned, and had to switch to a Python 3.11 compatible fork