I'm using the buzzcutnorman variant of tap-mssql w...
# singer-taps
a
I'm using the buzzcutnorman variant of tap-mssql which is based on the SDK. There is an enormous list of tables in the DB but I only need access to a handful. I think that by default
discover
is generating a catalog entry for each of the 1000s of tables. Is there any way to avoid this? I have a
select
in my meltano.yml already but that doesn't seem to prevent it, only stops the records from non-required tables getting emitted.
b
Sorry to say that I don't know of a way nor have I implemented a way to avoid/disable the catalog discovery from running. The buzzcutnorman tap-mssql is using the SDK default catalog discovery. I know selective catalog discovery has been a feature request for sometime. Here is the link to the issue: https://github.com/meltano/sdk/issues/1234 . The question I have is what problem is the avoid/selective discovery going to solve for you if it was present?
v
Our "workaround" in tap-postgres is this https://github.com/MeltanoLabs/tap-postgres/pull/218 add a filter_schemas config
a
Nice try, but they all sit in
dbo
๐Ÿ™„
๐Ÿ˜ญ 1
v
filter_tables
๐Ÿ˜ฎ
b
I am still curious to know what problem is the avoid/selective discovery going to solve for you if it were present? I have a guess but don't want to assume what the problem you want solved is.
v
If it's the problem I think it is the discovery process takes 5-30 minutes to run (I've heard longer for some people) , it can reduce that down to seconds to minutes
๐Ÿ‘ 1
Where I've seen it for folks is when they have something like 10k tables
b
Thank you both for the clarification.
c
can you make another schema that has views that point to only the tables youโ€™re interested in? I know its not great modifying the source system but it might be a shortcut for saving time during iterative development, at least.
a
@visch does
filter_tables
already exist on a SQL tap somewhere I could pinch an implementation from? I'm guessing I'm going to have to fork here.
I'm guessing override
discover_catalog_entries
like in this PR, but probably for a include rather than exclude list https://github.com/meltano/sdk/pull/2482/files
v
@Andy Carter I haven't written one ๐Ÿ˜• there looks good I think I'd start here https://github.com/meltano/sdk/blob/main/singer_sdk/connectors/sql.py#L502 which leads here https://github.com/meltano/sdk/blob/main/singer_sdk/connectors/sql.py#L384 and this looks like a winner!
๐Ÿ™ 1
I also would look at what just querying sys.table / information schema and seeing how long it takes to run that query. If it's not so bad I might just convert the function to use that instead ๐Ÿคท
a
Copy code
def get_object_names(
        self, engine, inspected, schema_name: str
    ) -> list[tuple[str, bool]]:
        # Get list of tables and views
        if self.config.get('fake_it', False):
            table_names = ['my_table']
            view_names = []
            return [(t, False) for t in table_names] + [(v, True) for v in view_names]
        else:
            super().get_object_names(engine, inspected, schema_name)

    def get_schema_names(self, engine, inspected) -> list[str]:
        return ['dbo']
Hacky but I like it ๐Ÿ™‚
v
fake_it
๐Ÿ˜†
a
Copy code
finally:
    make_it()
๐Ÿ˜‚ 1
I managed to get the tap running directly but now all my streams are not selected, and even if I force
selected
by hardcoding
True
in the SDK, there are no fields in the stream (presumably they are all deselected too). Any pointers? Or do I just need to run it in meltano now rather than CLI directly? If I look at the stream mask, everything is coming back unselected
b
I find it is easier to run it via meltano to test changes. Just a test project and I put this in for tap-mssql in the plugins. The pip url points to my local git repository. Makes running things easier. Run
meltano install --clean
once you get the `meltano.yml' file all set and saved.
Copy code
- name: tap-mssql
    namespace: tap_mssql
    pip_url: -e ..\\..\\buzzcutnorman\\tap-mssql
    capabilities:
    - catalog
    - state
    - discover
    - about
    - stream-maps
    - schema-flattening
    settings:
    - name: dialect
    - name: driver_type
    - name: host
    - name: port
      kind: integer
    - name: user
    - name: password
      kind: string
      sensitive: true
    - name: database
    - name: driver
    - name: sqlalchemy_eng_params
      kind: object
    - name: fast_executemany
    - name: future
    - name: sqlalchemy_url_query
      kind: object
    - name: driver
    - name: TrustServerCertificate
    - name: MultiSubnetFailover
    - name: batch_config
      kind: object
    - name: encoding
      kind: object
    - name: format
    - name: compression
    - name: storage
      kind: object
    - name: root
    - name: prefix
    - name: start_date
    - name: hd_jsonschema_types
      kind: boolean
    config:
      dialect: mssql
      driver_type: pyodbc
      host: [you mssql server]
      user: [your username]
      database: [your-datebase]
      sqlalchemy_eng_params:
        fast_executemany: 'True'
      sqlalchemy_url_query:
        driver: ODBC Driver 18 for SQL Server
        TrustServerCertificate: yes
        MultiSubnetFailover: yes
a
Thanks. I am still getting empty records though, lots of messages like this.
2024-07-17 15:25:40,156 Selection metadata omitted for '()'. Using parent value of selected=None.
v
I thought your code was psuedo code, can you push up a draft PR @Andy Carter and link it I'll take a peek
b
I am sorry I missed that. I will add the psedo code to the issue and take a look at it later. Please no PR at this point.
๐Ÿ‘ 1
a
I'm just trying to hack things about in a fork at the mo, nowhere near PR stage yet.
b
Oh then the easiest thing is to edit the tap-mssql and/or SDK code in the project directly. I do this from time to time. If you don't run a meltano install the changes stay. Here is were you can find the code in a project. your_project_folder\.meltano\extractors\tap-mssql\venv\Lib\site-packages\singer_sdk\ your_project_folder\.meltano\extractors\tap-mssql\venv\Lib\site-packages\tap_mssql\
By project I mean a test one obviously not a production one. I bet you figured that but just needed to say it in case I made it sound as if I edit my production projects this way.๐Ÿ˜…
๐Ÿ™Œ 1
a
I assume I'm doing something very wrong, but I had to hack
Copy code
def _breadcrumb_is_selected(self, breadcrumb: Breadcrumb) -> bool:
    return True
in the SDK to get any records out via the cli. Will try now in a fuller meltano project and see what happens.
๐Ÿ‘ 1
๐Ÿคž 1
@BuzzCutNorman Running my forked tap in my meltano project selects the relevant records just fine. So no idea why the standalone tap via cli didn't select them.