Daniel Luo
06/17/2024, 9:12 PM
1. How do I use TAP_MSSQL__SELECT_FILTER? I have tried specifying the tap_stream_id as output by dumping the tap, which is in the format schema-tablename. However, when running meltano run tap-mssql target-snowflake with set TAP_MSSQL__SELECT_FILTER='["schema-tablename"]', it doesn't detect anything to sync. If I specify schema-tablename.* instead, it runs everything I have selected in my meltano.yml file instead of just the single table. I cannot run meltano el --select since I'm on Windows, apparently.
2. Is there a way to keep the source column order when the destination table gets created? It's inconvenient when everything is jumbled up, since many source tables order their columns deliberately so that a select * from table gives easily readable data.
3. Is it possible to output the .gz archives to a different directory?
Edgar Ramírez (Arch.dev)
06/18/2024, 11:32 AM
1. TAP_MSSQL__SELECT_FILTER='["schema-tablename"]' should work; meltano install extractor tap-mssql --clean should do the trick. If that works, it means you're running into a cached catalog. We have an upcoming CLI option to help with that: https://github.com/meltano/meltano/pull/8580
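(A minimal sketch of the sequence being suggested, assuming the cached catalog is the culprit; the commands are the ones from this thread, shown for Windows cmd since that's the shell in use here:)
:: reinstall the extractor; per the suggestion above this should also drop the cached catalog
meltano install extractor tap-mssql --clean
:: re-apply the filter and run the pipeline again
:: (no quotes around the JSON here: cmd's set keeps quote characters literally, see later in the thread)
set TAP_MSSQL__SELECT_FILTER=["schema-tablename"]
meltano run tap-mssql target-snowflake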
2. Not currently, but targets usually process columns in the order they appear in the SCHEMA message, so you may be able to influence it by providing a custom order via the schema extra, with the inconvenience that you'd have to be exhaustive:
plugins:
  extractors:
  - name: tap-mssql
    schema:
      schema-tablename:     # the schema extra is keyed by stream, then column
        my_column_1: {}
        my_column_2: {}
I'm not sure if SQLAlchemy (as used in the SDK) can guarantee that columns will be extracted in the same order as in the source db, but I invite you to test it out, and if that's the case we can work on a PR to preserve that order when generating the catalog.
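(One way to check, sketched here rather than taken from the thread: dump the catalog Meltano generates and look at the property order for the stream; jq's keys_unsorted keeps the keys in the order they appear in the file. The stream id is the placeholder used throughout this thread.)
# dump the discovered catalog, then print the column order of one stream
meltano invoke --dump=catalog tap-mssql > catalog.json
jq '.streams[] | select(.tap_stream_id == "schema-tablename") | .schema.properties | keys_unsorted' catalog.json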
3. Which .gz file do you mean?
Daniel Luo
06/18/2024, 1:14 PM
1. Does it matter that the select in the yml file is schema-tablename.*? Is there a particular reason why el isn't supported on Windows, or why run can't have a filter parameter instead of using the env var? Running el --select schema-tablename has the expected behavior on Linux, but it's not convenient for development since our developers are working on Windows.
2. I can't say for sure, but for what it's worth, I have used dlt, which also has a SQLAlchemy backend, and it didn't have the column-ordering problem even though I've created the table multiple times through it. I've since moved on because it didn't support log-based replication.
3. I have a bunch of files in my root directory with the format target-snowflake--schema-tablename-guid-1.json.gz. They contain the data that I'm presuming is being inserted.
Edgar Ramírez (Arch.dev)
06/18/2024, 4:30 PM
> Does it matter that the select in the yml file is schema-tablename.*?
I'd need to confirm, but you could try commenting it out for testing purposes.
> Is there a particular reason why el isn't supported on Windows?
https://github.com/meltano/meltano/issues/2841
> why run can't have a filter parameter instead of using the env var?
Since run can run an arbitrary number of taps, it'd be impossible to determine which tap the parameter applies to 🫤
For column order I've logged https://github.com/meltano/sdk/issues/2489.
> I have a bunch of files in my root directory with the format target-snowflake--schema-tablename-guid-1.json.gz. They contain the data that I'm presuming is being inserted.
You might be able to change that by passing a custom batch config. The default is defined in https://github.com/MeltanoLabs/target-snowflake/blob/0dbfd522347affdd35547a13c40cf20f29bd26d1/target_snowflake/sinks.py#L26-L29.
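(For reference, a sketch of what that might look like in meltano.yml, assuming target-snowflake accepts a batch_config with the SDK's standard shape; the output directory is made up:)
plugins:
  loaders:
  - name: target-snowflake
    config:
      batch_config:
        encoding:
          format: jsonl
          compression: gzip
        storage:
          root: file://output/batches   # hypothetical directory for the .json.gz staging files
          prefix: ""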
Daniel Luo
06/18/2024, 6:07 PM
Edgar Ramírez (Arch.dev)
06/19/2024, 10:39 AM
Daniel Luo
06/20/2024, 2:07 PM
Edgar Ramírez (Arch.dev)
06/20/2024, 5:44 PM
Does the catalog end up with "select": true generated by select_filter?
I know other taps do. For example
$ TAP_GETPOCKET__SELECT_FILTER='["items"]' meltano --log-level=warning invoke --dump=catalog tap-getpocket --about | jq '.streams[0].selected'
true
$ TAP_GETPOCKET__SELECT_FILTER='["items"]' meltano --log-level=warning invoke tap-getpocket | wc -l
752
$ TAP_GETPOCKET__SELECT_FILTER='["items2"]' meltano --log-level=warning invoke --dump=catalog tap-getpocket --about | jq '.streams[0].selected'
false
$ TAP_GETPOCKET__SELECT_FILTER='["items"]' meltano --log-level=warning invoke tap-getpocket | wc -l
1
Can you try TAP_MSSQL__SELECT='["schema-tablename.*"]'?
Daniel Luo
06/20/2024, 5:50 PM
2024-06-20T17:46:50.712865Z [warning ] Stream `[` was not found in the catalog
2024-06-20T17:46:50.714869Z [warning ] Stream `"` was not found in the catalog
2024-06-20T17:46:50.714869Z [warning ] Stream `d` was not found in the catalog
2024-06-20T17:46:50.715874Z [warning ] Stream `i` was not found in the catalog
2024-06-20T17:46:50.715874Z [warning ] Stream `m` was not found in the catalog
I also tried several variations of it with/without single/double quotes, or without the square brackets. Haven't figured it out yet.
Daniel Luo
06/20/2024, 5:56 PM
set TAP_MSSQL__SELECT=["schema-TableName"]
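(The difference appears to be quoting: cmd's set keeps quote characters as part of the value, so the single-quoted JSON above was likely never parsed as a list. A sketch of both shells, values from this thread:)
:: Windows cmd: leave the quotes off, otherwise they become part of the value
set TAP_MSSQL__SELECT=["schema-TableName"]
# bash (as in the getpocket examples above): quote the whole JSON array
export TAP_MSSQL__SELECT='["schema-TableName"]'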
Daniel Luo
06/20/2024, 5:57 PM
What's the difference between SELECT and SELECT_FILTER? Surprised there's two that do what should be the same thing.
Daniel Luo
06/20/2024, 5:58 PM
I also see there's --merge-state. If I'm planning on running filtered runs, should I always be passing that arg?
Edgar Ramírez (Arch.dev)
06/20/2024, 6:16 PM
> What's the difference between SELECT and SELECT_FILTER? Surprised there's two that do what should be the same thing
Yeah. A bit before my time, but some spelunking brought back the commit where it was added in the GitLab MR:
> Add select_filter extractor extra to allow extracting a subset of selected entities
So, not entirely sure 😅 but it does seem to manage the catalog in a subtly different way that may or may not be supported by all taps.
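(A rough illustration of where each one lives in meltano.yml, using the placeholder names from this thread: select defines which streams and attributes are in the catalog selection, while the select_filter extra narrows an already-selected set at run time.)
plugins:
  extractors:
  - name: tap-mssql
    select:
    - schema-tablename.*        # selection rules: streams/columns to extract
    - schema-othertable.*       # hypothetical second table
    select_filter:
    - schema-tablename          # run-time filter: only this stream out of the selection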
Daniel Luo
06/20/2024, 6:32 PM
Can you expand on --merge-state? The documentation isn't completely clear to me, though that may also be my limited understanding of how Meltano works behind the scenes. It sounds like I should always be setting this, though I'm not sure what happens to the state tracking if I don't specify it. Does the filtered run just maintain its own state somewhere outside of the centralized location? If I only ever run it filtered, would it not matter? Is it only important if I run a mix of everything and a filtered subset?
Edgar Ramírez (Arch.dev)
06/20/2024, 6:47 PM
The point of --merge-state is ensuring that pipelines that run with non-overlapping select criteria still end up with a single, combined state. If you always run with different select criteria, then simply using a --state-id-suffix is enough to keep their state separate. On the other hand, if you want to, for example, run a backfill for a subset of the streams using a select rule, in the end you want the states to be merged. The --merge-state flag was added because some extractors read the current state but write a totally new one at the end of the run, without merging the dictionaries, so the older bookmarks would be lost.
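(A sketch of the two approaches with the commands from this thread, assuming a Meltano version that has the --merge-state flag discussed here; the suffix value is just the thread's placeholder table:)
:: keep a separate state per filtered pipeline
set TAP_MSSQL__SELECT=["schema-TableName"]
meltano run tap-mssql target-snowflake --state-id-suffix=schema-TableName

:: or fold the filtered run's bookmarks back into the shared state
meltano run tap-mssql target-snowflake --merge-state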
Edgar Ramírez (Arch.dev)
06/20/2024, 6:47 PM
Daniel Luo
06/20/2024, 7:01 PM
1. Without --merge-state, it's possible for two selectors to select the same stream twice?
2. Some extractors override the state with what's in the current filtered run, ignoring the rest of the streams that weren't run and effectively removing them.
If I'm just running single streams, would it make sense to just set --state-id-suffix={tap_stream_id}?
Is merging state an expensive operation? Does it make sense to just always specify the flag so that we maintain a single state regardless of which streams we run?
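(One way to see what actually gets written under each state ID; the ID shown is a hypothetical default of the form <environment>:<tap>-to-<target>:)
# list all stored state IDs, then inspect one of them
meltano state list
meltano state get dev:tap-mssql-to-target-snowflake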
Daniel Luo
06/20/2024, 7:10 PM
When I do a meltano run, I imagine it looks at the default state (if no --state-id-suffix is provided), and then it will update the default state or the provided state (and this would be where the --merge-state flag comes into play). If --state-id-suffix is provided, then it would look only at that particular state and update that specific one. I'm guessing there might be issues if I run everything with a unique --state-id-suffix, then decide I want to run multiple streams together? Is that correct?
What happens if the schema of the source data changes and no longer matches the destination?