#random

Henning Holgersen

09/25/2022, 6:58 PM
This is in some ways tangential to Meltano, but I have been working on a Snowflake -> SQL Server pipeline, and when sketching the SQL Server target it appeared that bulk load via Azure storage would be the most efficient approach. Since Snowflake can dump directly to Azure storage, it would be inefficient to pipe each record through in this case. But I like the Singer/Meltano scaffolding, so I think it would be feasible to write a tap+sink pair that uses records of references to the blobs written, instead of records of data. Has anyone done anything like this? I'm a little stuck on the metadata part. The "record" would be something like
```
{
  "type": "RECORD",
  "stream": "ECON_DATASETS",
  "record": {
    "FILE_NAME": "<azure://abc.blob.core.windows.net/export/<...>.csv>",
    "SIZE_BYTES": 851,
    "MD5": "b67e6ac5d0e68ffb3b4feb192bcd58f7",
    "LAST_MODIFIED": "Sun, 25 Sep 2022 14:00:52 GMT"
  },
  "version": 1,
  "time_extracted": "2022-09-25T15:31:53.209274Z"
}
```
but I would like to append the schema of the actual data too, especially since plain SQL Server seems to prefer CSVs.
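For what it's worth, one way the idea above could be sketched: the tap emits a normal Singer SCHEMA message describing the blob-reference stream, and each RECORD carries an extra `data_schema` key holding the schema of the data inside the blob. Note that `data_schema`, the `reference_messages` helper, and the example field names are my assumptions for illustration; they are not part of the Singer spec or any existing tap.

```python
import json

# Schema of the *reference* stream: each record points at a blob, it does not
# carry the data itself. The "data_schema" property is a hypothetical extension
# so the target can create the destination table before bulk-loading the CSV.
REFERENCE_SCHEMA = {
    "type": "object",
    "properties": {
        "FILE_NAME": {"type": "string"},
        "SIZE_BYTES": {"type": "integer"},
        "MD5": {"type": "string"},
        "LAST_MODIFIED": {"type": "string"},
        "data_schema": {"type": "object"},  # schema of the data inside the blob
    },
}

def emit(message: dict) -> str:
    """Serialize one Singer message as a line of JSON (normally printed to stdout)."""
    return json.dumps(message)

def reference_messages(stream, blobs, data_schema):
    """Yield a SCHEMA message, then one RECORD per blob reference."""
    yield emit({
        "type": "SCHEMA",
        "stream": stream,
        "schema": REFERENCE_SCHEMA,
        "key_properties": ["FILE_NAME"],
    })
    for blob in blobs:
        # Append the schema of the actual data to each blob reference.
        record = dict(blob, data_schema=data_schema)
        yield emit({"type": "RECORD", "stream": stream, "record": record})

if __name__ == "__main__":
    # Example blob reference and an assumed schema for the CSV it contains.
    blobs = [{
        "FILE_NAME": "azure://example.blob.core.windows.net/export/part0.csv",
        "SIZE_BYTES": 851,
        "MD5": "b67e6ac5d0e68ffb3b4feb192bcd58f7",
        "LAST_MODIFIED": "Sun, 25 Sep 2022 14:00:52 GMT",
    }]
    data_schema = {
        "type": "object",
        "properties": {"year": {"type": "integer"}, "value": {"type": "number"}},
    }
    for line in reference_messages("ECON_DATASETS", blobs, data_schema):
        print(line)
```

The matching sink would then read `FILE_NAME` from each record, build the table from `data_schema`, and issue the bulk-load (e.g. `BULK INSERT` from Azure blob storage) instead of inserting row by row.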