New thread to talk about the <SQL Server collation...
# singer-target-development
a
New thread to talk about the SQL Server collation mismatches issue. 🧵
@visch, @Henning Holgersen @ken_payne 👆
h
Some thinking out loud: This is based on reading the
singer_sdk/connectors/sql.py
file, I see that the comparison in the merge and in the adapt_column_type methods are a little different, but it looks like they both render the collation as part of the type comparison. Column types are returned from the database, and explicitly cast by the sdk into a
sqlalchemy.types.TypeEngine
type. This is then compared to a
sqlalchemy.types.TypeEngine
that has been created by the
to_sql_type
method from a json-schema definition (i.e. never touched the database). The comparison is in the form of
if str(sql_type) == str(current_type):
, and the
str
function prints not just VARCHAR() but also the collation, as a string, so it doesn’t match. That is in the adapt_column_type method. somewhat different in the merge method. Two ways to solve this: Generic in case other databases have similar issues when the dialect prints something more: use some method other than
str
for comparison. Sqlalchemy has some abstract comparator method but that might not be the way to go, idk, we might have to create our own little column comparator method. Specific for mssql, we could make sure the varchar/nvarchar types generated from the singer schema is given the correct collation, which is for most cases should be easy enough to do by checking the collation in the database. As of yet, I don’t know why this isn’t a problem for Derek’s mssql target.
a
Sounds like a good summary and good path forward. 👍
I originally thought that sql alchemy would have built-in understanding of type compatibility or perhaps type inheritance. Short of some built-in compatibility checks from that library, I think just more advanced comparison or compatibility checks on the SDK side are a good path forward.
h
Derek mentioned this might be very mssql specific and I think he’s right, sorry I dragged this up as a generic issue without checking more.
v
@Henning Holgersen I know I don't do any merges at all in that target-mssql so I bypass this issue by not having to deal with old tables. I do merges but it's in DBT on my side, those are very regular merge statements in sql (no collation specific stuff) so it's sounding more and more like a sqlalchemy specefic thing
/ sdk specefic thing str(type) looks pretty close 😄
b
I don't know if this will help you guys but I was able to recreate the issue on my mssql server with two databases and these SQL command.
Copy code
/*********************************
Create the simple test table
in an MSSQL database
*********************************/
CREATE TABLE SampleData.dbo.testme (
    Id int IDENTITY(1,1) PRIMARY KEY,
    LastName varchar(255) NOT NULL,
    FirstName varchar(255),
    Age int
);
go
/*********************************
Insert some test data into the
test table
*********************************/
INSERT INTO [SampleData].[dbo].[testme]
           ([LastName]
           ,[FirstName]
           ,[Age])
     VALUES
           ('Norman','BuzzCut',20)
           ,('Duck','Donald',90)
		   ,('Mouse','Mickey',100)
go
/********************************
Create another database as a target
in MSSQL and test meltano run,
then truncate and run again
********************************/
truncate table datawarehouse.dbo.testme
/******************************
Truncate the target table again
then run this and you will get the
error if you do a meltano run
*******************************/
ALTER TABLE datawarehouse.dbo.testme
    ALTER COLUMN LastName VARCHAR(255) COLLATE Latin1_General_100_CI_AI_SC;
c
Silly question: Why
varchar()
and not
nvarchar()
?
b
I didn't think of it when creating the repo script. I will give it a try and see if that works differently.
h
I have not seen any difference with nvarchar. I’ll type up some hopefully reproducible examples tomorrow.
b
Just gave it a try it and like @Henning Holgersen said no difference between
nvarchar()
and
varchar()
you end up with the same error.
I can say that it only matter if the target column has collation set. If I set it on the tap column it doesn't come over. I don't get the error until I set collation on the target column.
c
Make sense. Didn't think it would make a difference. Was just curious.
v
Yeah and it seems that if we wanted to go down this rabbit hole and make choices for folks since server 2019 varchar supports UTF8 collation, so that's probably the best way to go now. Probably not something we want to push people on though https://stackoverflow.com/questions/612430/when-must-we-use-nvarchar-nchar-instead-of-varchar-char-in-sql-server#:~:text=You%20should%20use%20NVARCHAR%20%2F%20NCHAR,t%20support%20the%20characters%20needed. Good writeup on all this. target-mssql should probably stay agnostic to all of it though
2nd answer is a really good write up
c
The marvellous world of Microsoft and backward compatibility .... everybody else pretty much just moved on and forgot about the time before Unicode ... 😂
That 2nd answer is indeed a very good write up! https://stackoverflow.com/a/63637996/6056177
b