# troubleshooting
c
We're trying out `target-redshift` to pull in some data to a database in our Redshift cluster.
• In our test VPC it creates schemas/tables/columns with lowercase.
• In our prod VPC it creates schemas/tables/columns with uppercase.
Any ideas for the difference in behaviour? Trying to nail it down so that I can run the dbt transforms that were expecting lowercase.
j
That's strange... Do you have the same `target-redshift` variant in both environments? What are the variant(s)?
c
Both are using the `transferwise` variant. The only config changes were `host`/`s3_bucket` in `meltano.yml` and `TARGET_REDSHIFT_PASSWORD`/`MELTANO_STATE_BACKEND_URI` in `.env` to swap between the test/prod VPCs.
With the above, it's the same Docker container, re-used with `--environment` plus the `.env` file.
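For context, a hedged sketch of what that per-environment override can look like in `meltano.yml` (hosts and bucket names are placeholders, not the real config):

```yaml
# meltano.yml (illustrative fragment; values are assumptions)
environments:
  - name: test
    config:
      plugins:
        loaders:
          - name: target-redshift
            config:
              host: test-cluster.example.eu-west-1.redshift.amazonaws.com
              s3_bucket: my-test-staging-bucket
  - name: prod
    config:
      plugins:
        loaders:
          - name: target-redshift
            config:
              host: prod-cluster.example.eu-west-1.redshift.amazonaws.com
              s3_bucket: my-prod-staging-bucket
```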
j
Another thing... I thought Redshift is case-insensitive, hence this would work no problem:
```sql
create temporary table MY_TABLE (
    id int
);

select ID from my_table;
```
This is not the case, however, if you quote the identifiers in dbt. Perhaps an option could be to disable the quoting 🤷
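If disabling quoting is worth a try, dbt exposes this via the `quoting` config in `dbt_project.yml` (a sketch; whether it is safe depends on the models):

```yaml
# dbt_project.yml (fragment): stop dbt from double-quoting identifiers
quoting:
  database: false
  schema: false
  identifier: false
```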
c
That makes sense on the dbt side. I was thinking of trying lookups without quotes. However, I'd like to figure out the cause of the difference in behaviour on the Meltano side.
j
For some reason I feel like the transferwise target always lowercases the identifiers, but just in case it does not: do you have exactly the same taps/config for the different envs?
Might it be that the transferwise target re-uses the identifiers as-is and it is the source that is creating the problem? 🤷
c
I guess that's the only difference between the two. I'm using `tap-spreadsheets-anywhere` to pull in CSVs. I did it iteratively on the test VPC, with new CSVs being dropped in regularly, and a few more files have been added since the last clean run. The prod VPC has all of the files being pulled in in one go.
j
Hmm... I don't think the tap is a variable here. For what it's worth, transferwise's `target-redshift` will lowercase the identifiers; here is the one for the schema: https://github.com/transferwise/pipelinewise-target-redshift/blob/80796b6c6e4647a85ddc9492e343547a027258e4/target_redshift/db_sync.py#L593 where you can see the use of the `.lower()` string method. Are you certain the table objects in PROD were not created by something else, with the target simply re-using them? I can also confirm this by looking at our Redshift pipeline: in the exporter's (`mssql`) stream I have `dbo-Myschema.MyTable`, which is then translated to and loaded into Redshift as `<default_schema>.mytable`.
I am also not able to create uppercase table names in Redshift 😄 (Serverless):
```sql
CREATE TABLE "UPPER_TABLE_NAME"
(
    id INT
);

CREATE TABLE lower_table_name
(
    id INT
);

SELECT
    table_name
FROM information_schema.tables
WHERE table_name ILIKE '%table_name';

DROP TABLE lower_table_name;
DROP TABLE upper_table_name;
```
The lookup on `information_schema` yields:
```
+------------------+
| table_name       |
+------------------+
| upper_table_name |
| lower_table_name |
+------------------+
```
🤷
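As a minimal sketch (not the actual pipelinewise code), the folding behaviour described above boils down to lowercasing identifiers before emitting DDL, so mixed-case stream names still land as lowercase Redshift objects:

```python
# Sketch of the identifier folding discussed above; the real target's
# logic lives in db_sync.py (linked), this just mimics the .lower() call.

def fold_identifier(identifier: str) -> str:
    """Lowercase an identifier the way a case-folding target would."""
    return identifier.lower()

print(fold_identifier("Myschema.MyTable"))  # myschema.mytable
```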
c
On the prod VPC, I created the database in Redshift after a failed Meltano run (I keep forgetting that `target-redshift` doesn't create the database if it doesn't exist), then did another run to create and populate the schema. On the test VPC, to do a "clean" run, I usually go through a cycle of dropping the schemas and deleting the S3 state before doing a run.
j
If you are able to produce an MRE for this, I'd love to try to reproduce it 😕
c
I'll have a play with the prod environment and let you know in a bit...
Hmmm... well, it looks like not all Redshifts are configured the same:
• Prod Redshift:
```
table_name
UPPER_TABLE_NAME
lower_table_name
```
• Test Redshift:
```
table_name
upper_table_name
lower_table_name
```
I think I need to dig into the Redshift docs and pass a config change on to the Ops team. Thanks for helping me dig, @janis_puris!!
j
c
^^ awesome! Thanks for the link. You're definitely making my afternoon debug session a lot easier!! 😄
j
I was wrong about PW's `target-redshift` emitting only lowercase identifiers! Here is where it retrieves the identifier for the `CREATE TABLE ...` statement:
1. It will lowercase the table name from the config.
2. Then, when it assembles the "path", i.e. `[<schema>.]<table>`, it will uppercase the path with `.upper()`.
Additionally, it will quote the table name in `"`. Coupled with the preservation of identifier casing in the prod VPC Redshift's config, this would result in the disparity you're observing.
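The steps above can be sketched roughly like this (a paraphrase of what the thread describes, not the verbatim target code):

```python
# Paraphrase of the sequence described above: lowercase the configured
# table name, assemble the [<schema>.]<table> path, uppercase the path,
# then double-quote the table name.

def table_path(table: str, schema: str = "") -> str:
    table = table.lower()                            # step 1: lowercase from config
    path = f"{schema}.{table}" if schema else table  # step 2: assemble the path
    path = path.upper()                              # ... and uppercase it
    schema_part, _, table_part = path.rpartition(".")
    quoted = f'"{table_part}"'                       # quote the table name
    return f"{schema_part}.{quoted}" if schema_part else quoted

print(table_path("MyTable", "analytics"))  # ANALYTICS."MYTABLE"
```

With `enable_case_sensitive_identifier` on, that quoted uppercase name is preserved verbatim; on a case-insensitive cluster it is folded back to lowercase, which matches the test/prod split seen in the thread.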
c
Confirmed:
SHOW enable_case_sensitive_identifier;
is
on
in our Production Redshift, but
off
in our Test Redshift. Will chat with our Data Team to figure out the best way of cleaning up. --- Had also found the
.upper()
that you had just found, but now all of this is making more sense around why there is a difference.
From quick testing, I'm still getting the same behaviour on Our Prod Redshift after: • Dropping schemas in Redshift. • Deleting Meltano state from S3. •
SET enable_case_sensitive_identifier to off;
in Prod database. • Re-run meltano extraction/load. So it looks like there is a bit more to chip away at.
j
You are not able to reproduce this, right? https://meltano.slack.com/archives/C01TCRBBJD7/p1689692478586699?thread_ts=1689689366.984449&cid=C01TCRBBJD7 In other words, both TEST and PROD produce the same table names?
c
With your code above plus `SET enable_case_sensitive_identifier to off;`, the upper table is created as `upper_table_name`, so that's good. Now I'm just trying to figure out what `target-redshift` is playing at.
j
Are you sure that `SET` does not simply change the setting for your session only?
Can you do the test again in another session, just to be sure?
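For reference, a sketch of the scope difference (the user name is illustrative): `SET` only affects the current session, while `ALTER USER ... SET` changes the default for that user's future sessions.

```sql
-- Session scope: lost when the connection closes
SET enable_case_sensitive_identifier TO off;

-- Per-user default: applies to new sessions for that user
ALTER USER some_user SET enable_case_sensitive_identifier TO off;

-- Inspect the value in effect for the current session
SHOW enable_case_sensitive_identifier;
```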
c
Doh! Yep, I'm only setting it for my session. Good catch!
i
The `safe_column_name` function seems to be the root of the issue, and there's no way to circumvent it: https://github.com/transferwise/pipelinewise-target-redshift/blob/80796b6c6e4647a85ddc9492e343547a027258e4/target_redshift/db_sync.py#L94
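A paraphrase of what a `safe_column_name`-style helper does, as described in this thread (an assumption based on the discussion, not the verbatim source): wrap the name in double quotes and uppercase it, with no flag to skip the uppercasing.

```python
# Sketch of the quote-then-uppercase pattern discussed above.

def safe_column_name(name: str) -> str:
    """Double-quote a column name and force it to uppercase."""
    return f'"{name}"'.upper()

print(safe_column_name("my_column"))  # "MY_COLUMN"
```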
c
Learning Redshift. Looks like I need to chat to Our Ops team to set it permanently (https://docs.aws.amazon.com/redshift/latest/dg/cm_chap_ConfigurationRef.html#t_Modifying_the_default_settings).
Did a quick proof of concept with:
ALTER USER <user> SET enable_case_sensitive_identifier to off;
, and that now sets my DB to be the same as Our Test instance. So Meltano + DBT runs complete. That's good enough to get it off my plate for a little while and pass it over with a lovely list of caveats šŸ˜„ Thanks again @janis_puris for your help!!!