# troubleshooting
d
I’m using the transferwise Snowflake target and I’m having issues with really large JSON blobs. I’m bringing data in from a Postgres table with a jsonb column where some entries are > 16MB, and the size limit for both the VARIANT and VARCHAR types in Snowflake is 16MB. Can anyone think of a workaround? There are only a few rows this large, so I’d be happy with truncating them or maybe even excluding those rows entirely. Is this something I could configure in Meltano, or is it something I’d need to figure out with the target?
a
Hi, @dean_morin. No perfect ideas come to mind. Options I can think of:
1. Fork the tap or target to make this fail gracefully.
2. Override the tap schema in the hope that a forced shorter data type length coerces the data to be gracefully truncated (see the sketch after this list). Unclear if this would be fatal; it depends on implementation details in the tap and target.
3. Deselect the column (also sketched below).
4. Use a stream map transformation. (Unfortunately this probably won't work yet, because we don't have the standalone mapper built, but it would work if either the tap or the target were SDK-based.)
5. Create a view in your upstream (along with some type of casting/truncating) and pull from the view instead of the raw table.
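For illustration, options 2 and 3 might look roughly like this in meltano.yml. The stream and column names here are made up, and whether a forced `maxLength` gets you graceful truncation or a hard failure depends on the target:
```yaml
# meltano.yml — sketch only; `public-my_table` and `big_json_col` are hypothetical
plugins:
  extractors:
  - name: tap-postgres
    # Option 2: override the discovered JSON schema for one property.
    # Whether the target truncates or errors on oversized values is
    # implementation-dependent.
    schema:
      public-my_table:
        big_json_col:
          type: ["string", "null"]
          maxLength: 1000000
    # Option 3: deselect the oversized column entirely.
    select:
    - "*.*"
    - "!public-my_table.big_json_col"
```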
d
That last one is a really good thought, probably the easiest way of going about it…
How do I know if a tap/target is SDK-based? Would those have been developed from the get-go using the SDK?
Oh, it’s a really big table though, so a view probably won’t work because I’d need to do a full extraction every time 🤔
Man, those stream maps look really useful. Is making them standalone on the near-term roadmap? Maybe I could wait for that…
a
You should be able to use the same incremental replication key and primary key behaviors on a view as on a table, although both have to be set manually in meltano.yml. Only log-based replication doesn't work on views, as far as I know.
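As a minimal sketch, assuming the view is exposed as a `public-my_view` stream with an `updated_at` timestamp and an `id` column (your names will differ):
```yaml
# meltano.yml — illustrative; a view has no discovered primary key,
# so both the replication key and the key properties are declared by hand
plugins:
  extractors:
  - name: tap-postgres
    metadata:
      public-my_view:
        replication-method: INCREMENTAL
        replication-key: updated_at
        table-key-properties: [id]
```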
And thanks for the feedback on Stream Maps! We're extremely excited to bring these to "regular" taps and targets. This feature is on the 1-3 month horizon. (Being intentionally non-committal so as not to overpromise. 😅) Cc @taylor, as we were just discussing prioritization earlier today.
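To give a feel for it, on an SDK-based tap or target today, stream maps are plain config. A rough sketch, where `tap-mysource` and the stream/column names are all hypothetical:
```yaml
# Illustrative stream_maps config as supported by SDK-based plugins;
# `customers`, `email`, and `is_test` are made-up names
plugins:
  extractors:
  - name: tap-mysource
    config:
      stream_maps:
        customers:
          email: null              # drop a PII column outright
          email_hash: md5(email)   # or replace it with a hash
          __filter__: not is_test  # keep only non-test rows
```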
To your last question, @taylor and @pat_nadolny just announced a new flag in #C01UGBSJNG5 that shows whether a tap or target is built on the SDK. (Although from memory, Postgres and Snowflake are not yet ported.)
d
Thanks! I think I tried incremental loading with a view a few months back, but probably didn’t set the keys as you mentioned. Yeah, the filtering in stream maps looks awesome, but so is the ability to strip out PII!