Hi community, What is the purpose of the discover ...
# singer-tap-development
m
Hi community, what is the purpose of the discover command if the JSON schema is built and shipped with the tap? Why not provide the complete catalog directly? I see the use if the API exposes its endpoints and returned fields in some way, but this is pretty rare. APIs are often not well documented, fields are missing from the docs, etc. I personally skip the discover part, as it does not help me in any way. Not even to check for changes in the API 😞 Happy to hear your thoughts and be convinced of the opposite 😉
t
It’s most useful when introspecting databases, object stores, and files. Agreed though that for some APIs it isn’t always useful, but it would be great if APIs were able to declare their catalogs and data available.
a
@michel_ebner, many taps which have static schemas actually do have those JSON schemas as artifacts within their code repos. Example here. However, as @taylor mentions, there are tons of taps which can have dynamic schemas. Databases fit this category, but also systems like Salesforce where there's a large amount of user customization available.
m
Thanks @taylor and @aaronsteers for your input! I have mostly seen repos with JSON schemas as artifacts, which is why I did not really understand the reason for those. If the schema is static, there is no real use for discover in my opinion. Do you think it would be useful to query the information from swagger, if available, and show that?
t
The other use would be for programmatic access. If you can make a consistent request for discovery, even if it’s currently static it could be updated to be more dynamic in the future. So yeah, updating to query from swagger would make a lot of sense I think!
a
@michel_ebner - As @taylor mentions, programmatic access is the main purpose: regardless of the exact means the developer uses to declare the schema, the reflection of that schema back to the calling application will always follow the same format. So the `--discover` (aka "discovery") output provides consistent machine-readable output whether the input is static as JSON files, declarative in Python code, or dynamic, using inline schema recognition (such as in the generic `tap-rest-api`) or dynamic schema discovery from the source (such as `tap-redshift` or `tap-salesforce`). There's also one very important additional benefit of the discovery process, which is to provide metadata that is not part of the core JSON Schema spec: specifically primary key info, incremental key info, and default select/deselect patterns. (Those additional metadata elements are described in more detail here.)
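To make the point about metadata concrete, here is a minimal sketch of how a tap with a static schema might still wrap it in a discovery catalog. The `users` stream, its fields, and the helper names `build_catalog_entry` and `discover` are hypothetical; the metadata keys follow the Singer discovery conventions as commonly documented, so treat the exact layout as an assumption rather than a reference.

```python
import json

# Hypothetical static schema, like one a tap might ship as a JSON file.
USERS_SCHEMA = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "name": {"type": ["null", "string"]},
        "updated_at": {"type": ["null", "string"], "format": "date-time"},
    },
}


def build_catalog_entry(stream_name, schema, key_properties, replication_key=None):
    """Wrap a plain JSON Schema in a Singer-style catalog entry (illustrative)."""
    # Stream-level metadata: information JSON Schema itself cannot express.
    stream_metadata = {
        "table-key-properties": key_properties,
        "inclusion": "available",
        "selected-by-default": True,
    }
    if replication_key:
        stream_metadata["valid-replication-keys"] = [replication_key]
        stream_metadata["forced-replication-method"] = "INCREMENTAL"

    metadata = [{"breadcrumb": [], "metadata": stream_metadata}]
    for prop in schema["properties"]:
        # Key fields are always emitted; everything else can be deselected.
        is_key = prop in key_properties or prop == replication_key
        metadata.append(
            {
                "breadcrumb": ["properties", prop],
                "metadata": {"inclusion": "automatic" if is_key else "available"},
            }
        )

    return {
        "tap_stream_id": stream_name,
        "stream": stream_name,
        "schema": schema,
        "key_properties": key_properties,
        "metadata": metadata,
    }


def discover():
    """Return the catalog a tap would print when run in discovery mode."""
    return {
        "streams": [
            build_catalog_entry("users", USERS_SCHEMA, ["id"], "updated_at"),
        ]
    }


if __name__ == "__main__":
    print(json.dumps(discover(), indent=2))
```

Even though the schema here never changes, the calling application gets the primary key, the incremental key, and the selection defaults in the same shape it would get from a database tap.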
There's been some discussion of trying to get more data from swagger definitions, when available, but I haven't yet seen a POC on this. Would be very interesting to explore more!
m
> Agreed though that for some APIs it isn’t always useful, but it would be great if APIs were able to declare their catalogs and data available.
Sorry for the massive thread revival, but has there perchance been any improvement on this? I imagine some real overhead across all of the teams building API integrations, and something like a reverse-Swagger-docs would be quite a useful bit of kit, to at least reverse engineer the catalog and data available from APIs
a
@edgar_ramirez_mondragon - Do I remember correctly that you had created a tap's `schema` declaration by referencing a Swagger/OpenAPI spec? Re: https://github.com/meltano/meltano/issues/2289
m
Thanks @aaronsteers, that is a useful place for me to start
e
@aaronsteers yup, tap-shortcut references the API’s swagger endpoint to extract the schemas
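For anyone wanting to try the same idea, here is a rough sketch of pulling stream schemas out of an OpenAPI/Swagger document. This is not tap-shortcut's actual code: `SAMPLE_SPEC`, `resolve_refs`, and `schemas_from_openapi` are hypothetical names, and the spec is a hand-written stand-in for what a real swagger endpoint might return.

```python
# Hypothetical, hand-written spec standing in for a real swagger endpoint response.
SAMPLE_SPEC = {
    "openapi": "3.0.0",
    "components": {
        "schemas": {
            "Member": {
                "type": "object",
                "properties": {"id": {"type": "string"}},
            },
            "Story": {
                "type": "object",
                "properties": {
                    "id": {"type": "integer"},
                    "owner": {"$ref": "#/components/schemas/Member"},
                },
            },
        }
    },
}


def resolve_refs(node, spec):
    """Recursively inline local '#/...' references.

    Limitations of this sketch: circular references would recurse forever,
    and JSON Pointer escapes (~0, ~1) are not handled.
    """
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str) and ref.startswith("#/"):
            target = spec
            for part in ref[2:].split("/"):
                target = target[part]
            return resolve_refs(target, spec)
        return {key: resolve_refs(value, spec) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve_refs(item, spec) for item in node]
    return node


def schemas_from_openapi(spec):
    """Map each named schema in the spec to a self-contained JSON Schema."""
    # OpenAPI 3 keeps schemas under components.schemas; Swagger 2 under definitions.
    named = spec.get("components", {}).get("schemas") or spec.get("definitions", {})
    return {name: resolve_refs(schema, spec) for name, schema in named.items()}


if __name__ == "__main__":
    for name, schema in schemas_from_openapi(SAMPLE_SPEC).items():
        print(name, sorted(schema["properties"]))
```

The resulting schemas would still need primary key and replication key metadata added by hand, since an OpenAPI spec does not normally declare those.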
m
That is cool, thanks @edgar_ramirez_mondragon
a
Thanks, @edgar_ramirez_mondragon! Much appreciated. I pasted that link also into the GitHub issue for future ref.