Hi community What is the purpose of the discover command if Meltano #singer-tap-development


michel_ebner

11/19/2021, 9:10 AM

Hi community, what is the purpose of the discover command if the JSON schema is built and shipped with the tap? Why not directly provide the complete catalog itself? I get the use if the API exposes its endpoints and returned fields in some way, but this is pretty rare. APIs are often poorly documented, fields are missing from the docs, etc. I personally skip the discovery part, as it does not help me in any way, not even to check for changes in the API 😞 Happy to hear your thoughts and be convinced otherwise 😉

taylor

11/19/2021, 3:38 PM

It’s most useful when introspecting databases, object stores, and files. Agreed though that for some APIs it isn’t always useful, but it would be great if APIs were able to declare their catalogs and data available.

aaronsteers

11/19/2021, 5:03 PM

@michel_ebner, many taps which have static schemas actually do have those JSON schemas as artifacts within their code repos. Example here. However, as @taylor mentions, there are tons of taps which can have dynamic schemas. Databases fit this category, but also systems like Salesforce where there's a large amount of user customization available.
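To make the "dynamic schema" case a bit more concrete, here is a minimal sketch of what discovery against a database could look like. It uses Python's built-in `sqlite3` module purely for illustration (the taps mentioned here target other systems), and the function name `discover_sqlite` and the type mapping are hypothetical, not taken from any real tap. The point is that the schema is read from the source at discovery time rather than shipped as a file.

```python
import json
import sqlite3

# Illustrative (and deliberately incomplete) mapping from SQLite declared types
# to JSON Schema types; unknown types fall back to "string".
TYPE_MAP = {
    "INTEGER": "integer",
    "REAL": "number",
    "TEXT": "string",
    "BOOLEAN": "boolean",
}


def discover_sqlite(conn: sqlite3.Connection) -> dict:
    """Build a Singer-style catalog by introspecting every user table in a SQLite database."""
    streams = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        properties = {}
        pk_columns = []
        # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk),
        # where pk is the 1-based position in the primary key, or 0 if not part of it.
        for _cid, name, decl_type, notnull, _default, pk in conn.execute(
            f'PRAGMA table_info("{table}")'
        ):
            json_type = TYPE_MAP.get(decl_type.upper(), "string")
            # Columns not declared NOT NULL (and not in the primary key) get a ["null", <type>] union.
            nullable = not notnull and not pk
            properties[name] = {"type": ["null", json_type] if nullable else json_type}
            if pk:
                pk_columns.append((pk, name))
        key_properties = [name for _, name in sorted(pk_columns)]
        streams.append(
            {
                "tap_stream_id": table,
                "stream": table,
                "schema": {"type": "object", "properties": properties},
                # Stream-level metadata (empty breadcrumb) carries the primary key.
                "metadata": [
                    {"breadcrumb": [], "metadata": {"table-key-properties": key_properties}}
                ],
            }
        )
    return {"streams": streams}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL, score REAL)")
    print(json.dumps(discover_sqlite(conn), indent=2))
```

If a user adds a column to the table, running discovery again picks it up with no change to the tap, which is exactly what a shipped JSON file cannot do.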

michel_ebner

11/22/2021, 8:22 AM

Thanks @taylor and @aaronsteers for your input! I have mostly seen repos with JSON schemas as artifacts, which is why I did not really understand the reason for discovery. If the schema is static, there is no real use for discover in my opinion. Do you think it would be useful to query the information from Swagger, if available, and show that?

taylor

11/22/2021, 3:18 PM

The other use would be for programmatic access. If you can make a consistent request for discovery, even if it’s currently static it could be updated to be more dynamic in the future. So yeah, updating to query from swagger would make a lot of sense I think!

aaronsteers

11/23/2021, 9:48 PM

@michel_ebner - As @taylor mentions, programmatic access is the main purpose: regardless of the exact means the developer uses to declare the schema, the reflection of that schema back to the calling application always follows the same format. So the `--discover` (aka "discovery") output provides consistent machine-readable output whether the input is static (JSON files), declarative (Python code), or dynamic, using either inline schema recognition (such as in the generic `tap-rest-api`) or dynamic schema discovery from the source (such as `tap-redshift` and `tap-salesforce`).

There's also one very important additional benefit of the discovery process, which is to provide metadata that is not part of the core JSON Schema spec: specifically primary key info, incremental key info, and default select/deselect patterns. (Those additional metadata elements are described in more detail here.)
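As a rough illustration of that extra metadata, here is a sketch of how a static JSON Schema (like the files shipped in many tap repos) could be wrapped in a Singer catalog entry. The metadata keys used (`table-key-properties`, `valid-replication-keys`, `selected-by-default`, `inclusion`) come from the Singer catalog spec; the helper `build_catalog_entry` and the example `users` stream are hypothetical. Whether the schema came from a file, Python code, or live introspection, the emitted structure is the same.

```python
from __future__ import annotations

import json


def build_catalog_entry(
    stream: str,
    schema: dict,
    key_properties: list[str],
    replication_keys: list[str] | None = None,
    selected_by_default: bool = True,
) -> dict:
    """Wrap a stream's JSON Schema in a Singer catalog entry with discovery metadata."""
    replication_keys = replication_keys or []
    # Stream-level metadata uses the empty breadcrumb.
    metadata = [
        {
            "breadcrumb": [],
            "metadata": {
                "table-key-properties": key_properties,
                "valid-replication-keys": replication_keys,
                "selected-by-default": selected_by_default,
            },
        }
    ]
    # Property-level metadata: primary and replication keys are always emitted
    # ("automatic"); everything else can be selected or deselected ("available").
    for prop in schema.get("properties", {}):
        automatic = prop in key_properties or prop in replication_keys
        metadata.append(
            {
                "breadcrumb": ["properties", prop],
                "metadata": {
                    "inclusion": "automatic" if automatic else "available",
                    "selected-by-default": selected_by_default,
                },
            }
        )
    return {"tap_stream_id": stream, "stream": stream, "schema": schema, "metadata": metadata}


if __name__ == "__main__":
    # A static schema, as it might be shipped in a tap's repo (hypothetical stream).
    users_schema = {
        "type": "object",
        "properties": {
            "id": {"type": "integer"},
            "updated_at": {"type": "string", "format": "date-time"},
            "name": {"type": ["null", "string"]},
        },
    }
    entry = build_catalog_entry("users", users_schema, ["id"], replication_keys=["updated_at"])
    print(json.dumps({"streams": [entry]}, indent=2))
```

None of the primary key, replication key, or selection information can be expressed in the JSON Schema file alone, which is why a calling application reads it from the discovery output instead.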

aaronsteers

11/23/2021, 9:49 PM

There's been some discussion of trying to get more data from swagger definitions, when available, but I haven't yet seen a POC on this. Would be very interesting to explore more!

matt_arderne

11/28/2022, 4:34 PM

> Agreed though that for some APIs it isn’t always useful, but it would be great if APIs were able to declare their catalogs and data available.

Sorry for the massive thread revival, but has there by any chance been any progress on this? I imagine there is real overhead across all of the teams building API integrations, and something like a reverse-Swagger-docs tool would be quite a useful bit of kit, at least to reverse engineer the catalog and data available from APIs.

aaronsteers

11/28/2022, 5:36 PM

@edgar_ramirez_mondragon - Do I remember correctly that you had created a tap's `schema` declaration by referencing a Swagger/OpenAPI spec? Re: https://github.com/meltano/meltano/issues/2289

matt_arderne

11/28/2022, 5:41 PM

Thanks @aaronsteers, that is a useful place for me to start

edgar_ramirez_mondragon

11/28/2022, 5:47 PM

@aaronsteers yup, tap-shortcut references the API’s swagger endpoint to extract the schemas
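For anyone exploring the Swagger idea, here is a minimal sketch of pulling named schemas out of an already-parsed OpenAPI 3 or Swagger 2 document and inlining local `$ref`s. This is not taken from tap-shortcut (its implementation may differ); the function name `extract_openapi_schemas` and the example spec are hypothetical. Fetching the spec from the API's swagger endpoint and mapping schemas to endpoints/streams are left out, and since OpenAPI 3.0 schema objects are a JSON Schema variant with differences such as `nullable`, a real tap would likely need to translate some keywords.

```python
from __future__ import annotations

import json


def extract_openapi_schemas(spec: dict) -> dict[str, dict]:
    """Return named schemas from a parsed OpenAPI 3 or Swagger 2 document, with local $refs inlined."""
    # OpenAPI 3 keeps reusable schemas under components/schemas; Swagger 2 uses definitions.
    if "openapi" in spec:
        prefix = "#/components/schemas/"
        schemas = spec.get("components", {}).get("schemas", {})
    else:
        prefix = "#/definitions/"
        schemas = spec.get("definitions", {})

    def resolve(node, seen: frozenset):
        if isinstance(node, dict):
            ref = node.get("$ref")
            if isinstance(ref, str) and ref.startswith(prefix):
                name = ref[len(prefix):]
                if name in seen:
                    # Recursive reference: use an unconstrained schema instead of looping forever.
                    return {}
                return resolve(schemas[name], seen | {name})
            return {key: resolve(value, seen) for key, value in node.items()}
        if isinstance(node, list):
            return [resolve(item, seen) for item in node]
        return node

    return {name: resolve(schema, frozenset({name})) for name, schema in schemas.items()}


if __name__ == "__main__":
    spec = {
        "openapi": "3.0.0",
        "components": {
            "schemas": {
                "Label": {"type": "object", "properties": {"name": {"type": "string"}}},
                "Story": {
                    "type": "object",
                    "properties": {
                        "id": {"type": "integer"},
                        "labels": {"type": "array", "items": {"$ref": "#/components/schemas/Label"}},
                    },
                },
            }
        },
    }
    print(json.dumps(extract_openapi_schemas(spec)["Story"], indent=2))
```

Inlining the references matters because a Singer stream schema is typically expected to be self-contained, while OpenAPI specs lean heavily on shared, referenced components.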

matt_arderne

11/28/2022, 5:49 PM

That is cool, thanks @edgar_ramirez_mondragon

aaronsteers

11/28/2022, 6:39 PM

Thanks, @edgar_ramirez_mondragon! Much appreciated. I also pasted that link into the GitHub issue for future ref.
