# plugins-general
g
Hello all! I was wondering what Redshift loader people are using / would recommend? I have used this one in the past: https://pypi.org/project/target-redshift/ Let me know! Thanks!
d
@gunnar That one should work, but @ingo_klose has also had success with https://github.com/transferwise/pipelinewise-target-redshift and is in the process of contributing out-of-the-box support to Meltano: https://gitlab.com/meltano/meltano/-/merge_requests/2047
g
Thank you! I think I will continue with my setup using target-redshift rather than pipelinewise-target-redshift. Thanks for the input!!
d
Sounds good, if that one works for you there's no reason to switch 🙂
g
@douwe_maan, Seemed to work fine (connection established in the beginning) until I ran into an error when it began uploading. They all relate to target-redshift:
CRITICAL cursor already closed
psycopg2.OperationalError: SSL SYSCALL error: EOF detected
During handling of the above exception, another exception occurred:
psycopg2.InterfaceError: cursor already closed
During handling of the above exception, another exception occurred:
psycopg2.InterfaceError: connection already closed
d
This is the same issue you ran into last week, right? https://meltano.slack.com/archives/CFG3C3C66/p1613518350206800 Did you try the solution we discussed then?
g
Unfortunately, the solution did not work. However, I will attempt to add a handler to re-establish the connection.
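A handler like that could follow a generic psycopg2 retry pattern. A minimal sketch, assuming psycopg2 is available; the function name, DSN, and retry values are placeholders rather than anything in target-redshift itself:

```python
import time

import psycopg2


def run_with_reconnect(dsn, fn, retries=3, delay=5):
    """Run fn(cursor), reopening the connection if it drops mid-operation."""
    last_error = None
    for _ in range(retries):
        conn = None
        try:
            conn = psycopg2.connect(dsn)          # fresh connection on every attempt
            with conn, conn.cursor() as cur:      # commits on success, rolls back on error
                return fn(cur)
        except (psycopg2.OperationalError, psycopg2.InterfaceError) as exc:
            last_error = exc                      # e.g. "SSL SYSCALL error" / "cursor already closed"
            time.sleep(delay)
        finally:
            if conn is not None and not conn.closed:
                conn.close()
    raise last_error
```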
n
Late to this thread, but noting that the transferwise variant has been working well for me so far!
g
@nick_hamlin Hello! I was curious about a couple of things regarding the transferwise variant:
• What properties did you use in your config file (i.e. the settings when adding a custom loader to Meltano)?
• The executable for the transferwise variant seems to be target-redshift as well; is that correct?
n
@gunnar, yep - here’s what’s in my meltano.yml for this:
loaders:
  - name: pipelinewise-target-redshift
    namespace: pipelinewise_target_redshift
    pip_url: pipelinewise-target-redshift
    executable: target-redshift
    config:
      host: <REDACTED>
      port: <REDACTED>
      user: <REDACTED>
      password: <REDACTED>
      dbname: <REDACTED>
      aws_access_key_id: <REDACTED>
      aws_secret_access_key: <REDACTED>
      s3_bucket: <REDACTED>
      default_target_schema: <REDACTED>
      primary_key_required: false
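A quick way to sanity-check those same credentials before kicking off a long run. A rough sketch, assuming boto3 and psycopg2 are installed; the redacted values mirror the config above and 5439 is just Redshift's usual default port:

```python
import boto3
import psycopg2

# Preflight check: confirm Redshift and S3 are both reachable with the
# (redacted) credentials from meltano.yml before starting a long ELT run.
redshift = psycopg2.connect(host="<REDACTED>", port=5439, dbname="<REDACTED>",
                            user="<REDACTED>", password="<REDACTED>",
                            connect_timeout=10)
with redshift, redshift.cursor() as cur:
    cur.execute("SELECT 1")              # cheap round trip to the cluster
    print("Redshift OK:", cur.fetchone())
redshift.close()

s3 = boto3.client("s3",
                  aws_access_key_id="<REDACTED>",
                  aws_secret_access_key="<REDACTED>")
s3.head_bucket(Bucket="<REDACTED>")      # raises ClientError if the bucket is missing or inaccessible
print("S3 bucket OK")
```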
g
@nick_hamlin Amazing! Thanks for your help!
n
no problem, let me know if you need anything else!
g
So far so good! Other than a few permission issues, it seems to be working great. I'll confirm once my test run has fully completed!
@nick_hamlin I noticed that during table creation it creates some very large varchar() column types, e.g. varchar(10000) and varchar(65535) in my Redshift table. Have you had any experience with adjusting how these column types are created, or any idea why these large values might be auto-generated? I can elaborate if needed. Thanks!!
n
I’m definitely seeing the same behavior, but haven’t yet tried to optimize it. I’m assuming it’s just giving itself as much space as it thinks it might possibly need (since it doesn’t necessarily know the max length of the data in a text column at the time that it creates the tables).
My first inclination would be to run the pipeline once, let it make the tables, then manually adjust the column sizes and rerun. It doesn’t do DROP/CREATE operations if the table already exists, so I’d think that’d probably work?
(full disclosure that I haven’t actually tried that yet, though)
Update for you @gunnar: I still haven’t tried this exact thing, but I did just do something similar. The target incorrectly assumed that a particular column should be NOT NULL, causing the pipeline to fail when I tried to load the data (which has nulls). I was able to drop and re-add the column as nullable (Redshift doesn’t support ALTERing a column’s nullability directly, but this gets the same result). Works fine now, so I’d think you could do the same thing to modify VARCHAR lengths as needed.
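Resizing a VARCHAR would follow the same rebuild pattern. A sketch with psycopg2, where the schema, table, and column names are made up for illustration; each statement runs with autocommit, and the UPDATE will fail if any existing value is longer than the new length:

```python
import psycopg2

# Hypothetical names: "analytics.orders" and "note" stand in for whichever
# table/column the target created with an oversized VARCHAR.
REBUILD_STATEMENTS = [
    "ALTER TABLE analytics.orders ADD COLUMN note_tmp VARCHAR(256)",
    "UPDATE analytics.orders SET note_tmp = note",
    "ALTER TABLE analytics.orders DROP COLUMN note",
    "ALTER TABLE analytics.orders RENAME COLUMN note_tmp TO note",
]

conn = psycopg2.connect(host="<REDACTED>", port=5439, dbname="<REDACTED>",
                        user="<REDACTED>", password="<REDACTED>")
conn.autocommit = True                   # run each DDL/DML statement on its own
try:
    with conn.cursor() as cur:
        for statement in REBUILD_STATEMENTS:
            cur.execute(statement)
finally:
    conn.close()
```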
g
Very helpful @nick_hamlin! Thanks for the input! I'll for sure give this a go, and fill you in if I come across anything interesting!
d
Please don't forget to file issues for any bugs or other oddities you run into with the target!
g
@nick_hamlin Have you run into any issues with long-running ELT pipelines, e.g. the connection closing or a message saying there was too much time between the initial request and the upload? Curious if I need to put anything in place to avoid wasting time, since some of the ELT processes I will be running could take days or more.
I came across this error when running an ELT (it happened with a run that took 24 hours, as well as a fresh one that took only approx. 30 minutes). I am using the Shopify tap and pipelinewise-target-redshift:
Loading failed (1): boto3.exceptions.S3UploadFailedError: Failed to upload /tmp/orders_yd41kqxp.csv.1 to meltano-bucket/pipelinewise_orders_20210304-161342-663010.csv.1: An error occurred (RequestTimeTooSkewed) when calling the CreateMultipartUpload operation: The difference between the request time and the current time is too large.
meltano                      | ELT could not be completed: Loader failed
Please let me know if anyone has had any similar experiences.
d
@gunnar That's weird! Did you see https://stackoverflow.com/questions/25964491/aws-s3-upload-fails-requesttimetooskewed? The first answer suggests running
curl http://s3.amazonaws.com -v; date -u
and checking the time drift.
g
I'll check that out now, thanks for the help as always @douwe_maan!
As an update for anyone interested:
• I checked the timing as laid out in that link and it looks like there is no time drift (there may have been some the other day, prior to me restarting my computer).
• I found on that Stack Overflow link that some people resolved the issue simply by restarting Docker or their computer.
• I am currently loading a test using pipelinewise-target-redshift to see if the error is resolved.
If I come across the error again, I will continue to work through it and will post the solution once I resolve it.
All set! It worked and loaded into Redshift! A nice touch was the auto-cleaning of the S3 bucket it initially loads into.
@nick_hamlin Do you have any suggestions for the data_flattening_max_level configuration? I noticed it defaults to 0; however, if I want to ensure that all table columns will always be flattened, I assume I can just put a large integer in there.
n
Haven’t done anything with that, but that seems like a reasonable idea?
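For reference, data_flattening_max_level controls how many levels of nested JSON objects get expanded into separate columns. A toy illustration of the idea, not the target's actual code; the "__" separator and the sample record are assumptions:

```python
# Toy sketch of max-depth flattening; not taken from pipelinewise-target-redshift.
def flatten(record, max_level, level=0, parent_key=""):
    out = {}
    for key, value in record.items():
        new_key = f"{parent_key}__{key}" if parent_key else key
        if isinstance(value, dict) and level < max_level:
            out.update(flatten(value, max_level, level + 1, new_key))
        else:
            out[new_key] = value          # anything left nested lands in a single column
    return out

record = {"id": 1, "customer": {"name": "Ada", "address": {"city": "Oslo"}}}
print(flatten(record, max_level=0))
# {'id': 1, 'customer': {'name': 'Ada', 'address': {'city': 'Oslo'}}}
print(flatten(record, max_level=10))
# {'id': 1, 'customer__name': 'Ada', 'customer__address__city': 'Oslo'}
```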