craig_astill
07/18/2023, 2:09 PMtarget-redshift
to pull in some data to a database in our Redshift cluster.
⢠In our test VPC it creates schemas/tables/columns with lowercase.
⢠In our Prod VPC it creates schemas/tables/columns with uppercase.
Any ideas for the difference in behaviour? Trying to nail it down so that I can run the DBT transforms that were expecting lowercase.janis_puris
07/18/2023, 2:25 PMtarget-redshift
variant on both environments?
What are the variant(s)?craig_astill
07/18/2023, 2:30 PMtransferwise
variant.
The only config change was: `host`/`s3_bucket` in meltano.yml
and `TARGET_REDSHIFT_PASSWORD`/`MELTANO_STATE_BACKEND_URI` in .env
to swap between the test/prod VPC's.craig_astill
07/18/2023, 2:31 PM--environment
+ .env
file to re-use it.janis_puris
07/18/2023, 2:32 PMcreate temporary table MY_TABLE (
id int
);
select ID from my_table;
This is not however the case, if you quote the identifiers in dbt. Perhaps an option could be to disable the quotations š¤·craig_astill
07/18/2023, 2:33 PMjanis_puris
07/18/2023, 2:34 PMjanis_puris
07/18/2023, 2:35 PMcraig_astill
07/18/2023, 2:39 PMtap-spreadsheets-anywhere
to pull in CSV's. Did it iteratively on the Test VPC, with new CSV's being dropped in regularly. Had a few more files added since the last clean run.
Prod VPC has all of the files being pulled-in in one go.janis_puris
07/18/2023, 2:44 PMtarget_redshift
will lowercase the identifiers
here is the one for schema
https://github.com/transferwise/pipelinewise-target-redshift/blob/80796b6c6e4647a85ddc9492e343547a027258e4/target_redshift/db_sync.py#L593
you can see the use of .lower()
String method.
Are you certain the tables objects were not created in PROD by something else and the target simply re-using them?
Also can confirm looking at our redshift pipeline.
In exporter's (mssql
) stream I have a
dbo-Myschema.MyTable
which then is translated to and sinked into Redshift's
<default_schema>.mytable
janis_puris
07/18/2023, 2:49 PMCREATE TABLE "UPPER_TABLE_NAME"
(
id INT
);
CREATE TABLE lower_table_name
(
id INT
);
SELECT
table_name
FROM information_schema.tables
WHERE table_name ILIKE '%table_name';
DROP TABLE lower_table_name;
DROP TABLE upper_table_name;
The lookup on information_schema yields
+------------------+
| table_name |
+------------------+
| upper_table_name |
| lower_table_name |
+------------------+
š¤·craig_astill
07/18/2023, 2:49 PMtarget-redshift
doesn't create the database if it doesn't exist), then did another run to create & populate the schema.
On the Test VPC, to do a "clean" run, I usually do a cycle of dropping schemas + deleting the S3 state before doing a run.janis_puris
07/18/2023, 2:50 PMcraig_astill
07/18/2023, 2:52 PMcraig_astill
07/18/2023, 3:01 PMtable_name
UPPER_TABLE_NAME
lower_table_name
⢠Test Redshift:
table_name
upper_table_name
lower_table_name
Think I need to dig into the Redshift docs and pass on a config change to the Ops team.
Thanks for helping me dig @janis_puris!!janis_puris
07/18/2023, 3:04 PMcraig_astill
07/18/2023, 3:08 PMjanis_puris
07/18/2023, 3:13 PMtarget-redshift
emitting only lowercase identifiers!
Here is where it retrieves the identifier for CREATE TABLE ...
statement
1. It will lowercase the table name from config
2. Then when it assembles the "path" i.e. [<schema>.]<table>
it will uppercase the path with .upper()
Additionally it will quote the table name in "
This coupled with the preservation of the identifier casing on PROD VPC Redshift's config, would then result in the disparity you're observing.craig_astill
07/18/2023, 3:15 PMSHOW enable_case_sensitive_identifier;
is on
in our Production Redshift, but off
in our Test Redshift.
Will chat with our Data Team to figure out the best way of cleaning up.
---
Had also found the .upper()
that you had just found, but now all of this is making more sense around why there is a difference.craig_astill
07/18/2023, 3:53 PMSET enable_case_sensitive_identifier to off;
in Prod database.
⢠Re-run meltano extraction/load.
So it looks like there is a bit more to chip away at.janis_puris
07/18/2023, 3:54 PMcraig_astill
07/18/2023, 3:57 PMSET enable_case_sensitive_identifier to off;
, the Upper table is created as: upper_table_name
. So that's good.
Just trying to figure out what target-redshift
is playing at now.janis_puris
07/18/2023, 3:58 PMSET
does not simply change the setting for your session only?janis_puris
07/18/2023, 3:58 PMcraig_astill
07/18/2023, 4:00 PMcraig_astill
07/18/2023, 4:00 PMian_lewis
07/18/2023, 4:01 PMsafe_column_name
function is the root of the issue it seems, and there's no way to circumvent it.
https://github.com/transferwise/pipelinewise-target-redshift/blob/80796b6c6e4647a85ddc9492e343547a027258e4/target_redshift/db_sync.py#L94craig_astill
07/18/2023, 4:03 PMcraig_astill
07/18/2023, 4:11 PMALTER USER <user> SET enable_case_sensitive_identifier to off;
, and that now sets my DB to be the same as Our Test instance. So Meltano + DBT runs complete.
That's good enough to get it off my plate for a little while and pass it over with a lovely list of caveats š
Thanks again @janis_puris for your help!!!