# plugins-general
g
Hello all! I was wondering what Redshift loader people are using / would recommend? I have used this one in the past: https://pypi.org/project/target-redshift/ Let me know! Thanks!
d
@gunnar That one should work, but @ingo_klose has also had success with https://github.com/transferwise/pipelinewise-target-redshift and is in the process of contributing out-of-the-box support to Meltano: https://gitlab.com/meltano/meltano/-/merge_requests/2047
g
Thank you! I think I will continue with my setup using target-redshift rather than pipelinewise-target-redshift. Thanks for the input!!
d
Sounds good, if that one works for you there's no reason to switch 🙂
g
@douwe_maan, Seemed to work fine (connection established in the beginning) until I ran into an error when it began uploading. They all relate to target-redshift:
CRITICAL cursor already closed
psycopg2.OperationalError: SSL SYSCALL error: EOF detected
During handling of the above exception, another exception occurred:
psycopg2.InterfaceError: cursor already closed
During handling of the above exception, another exception occurred:
psycopg2.InterfaceError: connection already closed
d
This is the same issue you ran into last week, right? https://meltano.slack.com/archives/CFG3C3C66/p1613518350206800 Did you try the solution we discussed then?
g
Unfortunately, the solution did not work. However, I will attempt to add a handler to re-establish the connection.
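A handler like that could follow a generic psycopg2 retry pattern. A minimal sketch, assuming psycopg2 is available; the function name, DSN, and retry values are placeholders rather than anything in target-redshift itself:

```python
import time

import psycopg2


def run_with_reconnect(dsn, fn, retries=3, delay=5):
    """Run fn(cursor), reopening the connection if it drops mid-operation."""
    last_error = None
    for _ in range(retries):
        conn = None
        try:
            conn = psycopg2.connect(dsn)          # fresh connection on every attempt
            with conn, conn.cursor() as cur:      # commits on success, rolls back on error
                return fn(cur)
        except (psycopg2.OperationalError, psycopg2.InterfaceError) as exc:
            last_error = exc                      # e.g. "SSL SYSCALL error" / "cursor already closed"
            time.sleep(delay)
        finally:
            if conn is not None and not conn.closed:
                conn.close()
    raise last_error
```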
n
Late to this thread, but noting that the transferwise variant has been working well for me so far!
g
@nick_hamlin Hello! I was curious about a couple of things regarding the transferwise variant:
• What properties did you use in your config file (i.e. the settings when adding a custom loader to Meltano)?
• The executable for the transferwise variant seems to be target-redshift as well; is that correct?
n
@gunnar, yep - here’s what’s in my meltano.yml for this:
loaders:
  - name: pipelinewise-target-redshift
    namespace: pipelinewise_target_redshift
    pip_url: pipelinewise-target-redshift
    executable: target-redshift
    config:
      host: <REDACTED>
      port: <REDACTED>
      user: <REDACTED>
      password: <REDACTED>
      dbname: <REDACTED>
      aws_access_key_id: <REDACTED>
      aws_secret_access_key: <REDACTED>
      s3_bucket: <REDACTED>
      default_target_schema: <REDACTED>
      primary_key_required: false
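A quick way to sanity-check those same credentials before kicking off a long run. A rough sketch, assuming boto3 and psycopg2 are installed; the redacted values mirror the config above and 5439 is just Redshift's usual default port:

```python
import boto3
import psycopg2

# Preflight check: confirm Redshift and S3 are both reachable with the
# (redacted) credentials from meltano.yml before starting a long ELT run.
redshift = psycopg2.connect(host="<REDACTED>", port=5439, dbname="<REDACTED>",
                            user="<REDACTED>", password="<REDACTED>",
                            connect_timeout=10)
with redshift, redshift.cursor() as cur:
    cur.execute("SELECT 1")              # cheap round trip to the cluster
    print("Redshift OK:", cur.fetchone())
redshift.close()

s3 = boto3.client("s3",
                  aws_access_key_id="<REDACTED>",
                  aws_secret_access_key="<REDACTED>")
s3.head_bucket(Bucket="<REDACTED>")      # raises ClientError if the bucket is missing or inaccessible
print("S3 bucket OK")
```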
g
@nick_hamlin Amazing! Thanks for your help!
n
no problem, let me know if you need anything else!
g
So far so good! Other than a few permission issues, it seems to be working great. I'll confirm once my test run has fully completed!
@nick_hamlin I noticed that during table creation it creates some very large varchar() column types, e.g. varchar(10000) and varchar(65535) in my Redshift table. Have you had any experience with adjusting how these column types are created, or any idea why these large values might be auto-generated? I can elaborate if needed. Thanks!!
n
I’m definitely seeing the same behavior, but haven’t yet tried to optimize it. I’m assuming it’s just giving itself as much space as it thinks it might possibly need (since it doesn’t necessarily know the max length of the data in a text column at the time that it creates the tables).
My first inclination would be to run the pipeline once, let it make the tables, then manually adjust the column sizes and rerun. It doesn’t do DROP/CREATE operations if the table already exists, so I’d think that’d probably work?
(full disclosure that I haven’t actually tried that yet, though)
Update for you @gunnar: I still haven’t tried this exact thing, but I did just do something similar. The target incorrectly assumed that a particular column should be NOT NULL, causing the pipeline to fail when I tried to load the data (which has nulls). I was able to drop and re-add the column as nullable (Redshift doesn’t support ALTERing a column’s nullability directly, but this gets the same result). Works fine now, so I’d think you could do the same thing to modify VARCHAR lengths as needed.
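Resizing a VARCHAR would follow the same rebuild pattern. A sketch with psycopg2, where the schema, table, and column names are made up for illustration; each statement runs with autocommit, and the UPDATE will fail if any existing value is longer than the new length:

```python
import psycopg2

# Hypothetical names: "analytics.orders" and "note" stand in for whichever
# table/column the target created with an oversized VARCHAR.
REBUILD_STATEMENTS = [
    "ALTER TABLE analytics.orders ADD COLUMN note_tmp VARCHAR(256)",
    "UPDATE analytics.orders SET note_tmp = note",
    "ALTER TABLE analytics.orders DROP COLUMN note",
    "ALTER TABLE analytics.orders RENAME COLUMN note_tmp TO note",
]

conn = psycopg2.connect(host="<REDACTED>", port=5439, dbname="<REDACTED>",
                        user="<REDACTED>", password="<REDACTED>")
conn.autocommit = True                   # run each DDL/DML statement on its own
try:
    with conn.cursor() as cur:
        for statement in REBUILD_STATEMENTS:
            cur.execute(statement)
finally:
    conn.close()
```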
g
Very helpful @nick_hamlin! Thanks for the input! I'll for sure give this a go, and fill you in if I come across anything interesting!
d
Please don't forget to file issues for any bugs or other oddities you run into with the target!
g
@nick_hamlin Have you run into any issues with long-running ELT pipelines, e.g. the connection closing or a message saying there was too much time between the initial request and the upload? Curious if I need to put anything in place to avoid wasting time, since some of the ELT processes I will be running could take days or more.
I came across this error when running an ELT (it happened with a run that took 24 hours, as well as a fresh one that took only approx. 30 minutes). I am using the Shopify tap and pipelinewise-target-redshift:
Loading failed (1): boto3.exceptions.S3UploadFailedError: Failed to upload /tmp/orders_yd41kqxp.csv.1 to meltano-bucket/pipelinewise_orders_20210304-161342-663010.csv.1: An error occurred (RequestTimeTooSkewed) when calling the CreateMultipartUpload operation: The difference between the request time and the current time is too large.
meltano                      | ELT could not be completed: Loader failed
Please let me know if anyone has had any similar experiences.
d
@gunnar That's weird! Did you see https://stackoverflow.com/questions/25964491/aws-s3-upload-fails-requesttimetooskewed? The first answer suggests running
curl http://s3.amazonaws.com -v; date -u
and checking the time drift.
g
I'll check that out now, thanks for the help as always @douwe_maan!
As an update for anyone interested:
• I checked the timing as laid out in that link and it looks like there is no time drift (there may have been some the other day, prior to me restarting my computer).
• I found on that Stack Overflow link that some people resolved the issue simply by restarting Docker or their computer.
• I am currently loading a test using pipelinewise-target-redshift to see if the error is resolved.
If I come across the error again, I will continue to work through it and will post the solution once I resolve it.
All set! It worked and loaded into Redshift! A nice touch was the auto-cleaning of the S3 bucket it initially loads into.
@nick_hamlin Do you have any suggestions for the data_flattening_max_level configuration? I noticed it defaults to 0; however, if I want to ensure that all table columns will always be flattened, I assume I can just put a large integer in there.
n
Haven’t done anything with that, but that seems like a reasonable idea?
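For reference, data_flattening_max_level controls how many levels of nested JSON objects get expanded into separate columns. A toy illustration of the idea, not the target's actual code; the "__" separator and the sample record are assumptions:

```python
# Toy sketch of max-depth flattening; not taken from pipelinewise-target-redshift.
def flatten(record, max_level, level=0, parent_key=""):
    out = {}
    for key, value in record.items():
        new_key = f"{parent_key}__{key}" if parent_key else key
        if isinstance(value, dict) and level < max_level:
            out.update(flatten(value, max_level, level + 1, new_key))
        else:
            out[new_key] = value          # anything left nested lands in a single column
    return out

record = {"id": 1, "customer": {"name": "Ada", "address": {"city": "Oslo"}}}
print(flatten(record, max_level=0))
# {'id': 1, 'customer': {'name': 'Ada', 'address': {'city': 'Oslo'}}}
print(flatten(record, max_level=10))
# {'id': 1, 'customer__name': 'Ada', 'customer__address__city': 'Oslo'}
```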