# troubleshooting
j
Hello again, I have a potential use case where I want Meltano (tap-mssql => target-snowflake) to establish the table/column definitions inside Snowflake, but I don't want to extract or load any data, nor update any state files. Is that possible through a command? (Kind of like BACPAC vs. DACPAC in MSSQL, or pg_dump with --schema-only in Postgres.)
Would that be something funky like setting the batch size to 0 so no rows are ever extracted?
👀 1
I need to create an "empty shell" of our raw data for DBT to create models on with no data in it. Kind of like an EMPTY_DB
@BuzzCutNorman pinging you in case you have any knowledge on how to do this with the tap
b
I don't know of a way to do this with a command or config. Honestly, I think this would be a new SDK/tap feature. cc @Edgar Ramírez (Arch.dev)
"I need to create an "empty shell" of our raw data for DBT to create models"
I am a novice DBT user, so this might be a "well, duh" question. How would the built-out tables assist in creating models? Is there a discovery command that populates the models for you? Just trying to better understand what the blank tables achieve.
j
it's a weird intersection of our business intelligence software and how our upstream data is handled. we need to template reports with no data to give to our clients to publish. the rub is that each client is on a different version of our database which we import data from using tap-mssql and those versions may have schema differences. we can pin a client to a specific version of our elt pipeline but that results in potentially different tables (a column added, renamed...)
the blank tables would allow us to update the templating of the data being ingested, as we can change the column definitions
b
Just trying to make sure I have this straight: the BI software needs blank template reports that are based on DBT models generated from a discovery or run against the client database tables. I am guessing that is a gross simplification, but I'm just trying to understand the flow.
I am seeing this as: you create a blank table set based on the client database, then run DBT against it to create the tables the reports need, which are also blank, so you can create the template reports.
j
Essentially that's the crux of it yes
b
The schema messages that a target uses to create a table are generated when the stream class's sync method is run, which is kicked off by the tap class's sync_all method. sync_all is called when the tap class's invoke method is called. I think invoke is what meltano invoke and meltano run both start. I am guessing you are envisioning something like meltano --environment prod invoke tap-mssql--client --schema-only. That way you can script out the creation of the template reports, which I am guessing is kind of a half-manual, half-scripted workflow at the moment. Am I still on the correct train of thought?
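To make the call chain described here concrete, below is a toy sketch of a tap whose sync_all calls each stream's sync, which emits a SCHEMA message before any RECORD messages. ToyTap, ToyStream and run_tap are made-up stand-ins for illustration, not the Singer SDK's actual classes, and the message shapes are simplified.

```python
import io
import json
from contextlib import redirect_stdout


class ToyStream:
    """Stand-in for an SDK stream; not the real singer_sdk.Stream."""

    def __init__(self, name, schema, rows):
        self.name = name
        self.schema = schema
        self.rows = rows

    def sync(self):
        # The SCHEMA message goes out first; a target uses it to create the table.
        print(json.dumps({"type": "SCHEMA", "stream": self.name,
                          "schema": self.schema, "key_properties": []}))
        # RECORD messages follow, one per extracted row.
        for row in self.rows:
            print(json.dumps({"type": "RECORD", "stream": self.name, "record": row}))


class ToyTap:
    """Stand-in for an SDK tap; not the real singer_sdk.Tap."""

    def __init__(self, streams):
        self.streams = streams

    def sync_all(self):
        # The tap simply walks its streams and syncs each one in turn.
        for stream in self.streams:
            stream.sync()


def run_tap(tap):
    """Run the tap and return its emitted messages as parsed dicts."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        tap.sync_all()
    return [json.loads(line) for line in buffer.getvalue().splitlines()]


users = ToyStream("users", {"properties": {"id": {"type": "integer"}}},
                  [{"id": 1}, {"id": 2}])
messages = run_tap(ToyTap([users]))
```

In this toy model, a schema-only run would amount to skipping the loop over rows while still printing the SCHEMA message.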
j
100% yeah, a --schema-only kind of flag would be what i'm looking for
b
Looking at Stream.sync() and Tap.sync_all(), they are decorated with @t.final, so I cannot override them in tap-mssql. The next options for an override would be Stream.sync_batches and Stream.sync_records. I will have to play around with this a little and see what comes of it. I am not sure off the top of my head how a target will deal with only schema messages.
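As a rough illustration of the override idea, here is a toy sketch in which a final sync method emits the schema and then delegates to a record-level hook that a subclass replaces with one yielding nothing. BaseStream, SchemaOnlyStream and the hook named sync_records are stand-ins that only mirror the names in this discussion; they are not the Singer SDK's implementation. One related fact worth keeping in mind: typing.final is checked by static type checkers only, and Python does not enforce it at runtime.

```python
import typing as t


class BaseStream:
    """Toy base class; method names only mirror the discussion above."""

    def __init__(self, name, schema):
        self.name = name
        self.schema = schema

    def sync_records(self):
        """Record-level hook that subclasses are expected to override."""
        raise NotImplementedError

    @t.final
    def sync(self):
        # Emit the schema first, then whatever the record hook yields.
        messages = [{"type": "SCHEMA", "stream": self.name, "schema": self.schema}]
        for record in self.sync_records():
            messages.append({"type": "RECORD", "stream": self.name, "record": record})
        return messages


class SchemaOnlyStream(BaseStream):
    """Hypothetical subclass that never yields rows."""

    def sync_records(self):
        # An empty iterator leaves sync() with only the SCHEMA message.
        return iter(())


stream = SchemaOnlyStream("orders", {"properties": {"id": {"type": "integer"}}})
emitted = stream.sync()
```

Whether the real SDK hooks behave the same way when they yield nothing would need to be checked against the SDK version in use.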
❤️ 1
e
By all means log a feature request in the SDK repo!
j
I will do so this week. 😊 Also thanks for taking a deep dive Norman.
b
A quick update: as a test I overwrote Stream.sync() in tap-mssql's MSSQLStream class to yield a blank dictionary {}, which allowed tap-mssql to run and emit only SCHEMA and STATE messages. I sent that to a target, and it happily accepted the schema and state messages and created blank tables.
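A different way to get a similar result without touching the tap, sketched here as an assumption and not tested against tap-mssql or target-snowflake, is to filter the tap's output before it reaches the target. Singer messages are JSON lines with a "type" field, so a small filter can keep SCHEMA messages and drop RECORD messages (and, by default, STATE messages, since the original ask was to leave state untouched). The function name schema_only and its keep_state parameter are made up for this example.

```python
import json


def schema_only(lines, keep_state=False):
    """Yield only the Singer messages needed to create empty tables."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between messages
        message_type = json.loads(line).get("type")
        # Keep SCHEMA always; keep STATE only when explicitly requested.
        if message_type == "SCHEMA" or (keep_state and message_type == "STATE"):
            yield line


sample = [
    json.dumps({"type": "SCHEMA", "stream": "users",
                "schema": {"properties": {"id": {"type": "integer"}}},
                "key_properties": ["id"]}),
    json.dumps({"type": "RECORD", "stream": "users", "record": {"id": 1}}),
    json.dumps({"type": "STATE", "value": {"bookmarks": {}}}),
]

kept = [json.loads(line)["type"] for line in schema_only(sample)]
kept_with_state = [json.loads(line)["type"]
                   for line in schema_only(sample, keep_state=True)]
```

To use it in a pipe between a tap and a target, the same function could be fed sys.stdin and each yielded line printed to stdout.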
💯 1
🪄 1