#announcements

brief-accountant-53906

04/15/2021, 1:50 PM
We are off to build a new tap for Hubspot and decided to go with the Meltano Singer SDK. It seems the schema has to be either built programmatically with entries in PropertiesList() or written out manually as a JSON file. My question: is there an operation native to the Meltano SDK that works like singer-infer-schema? That tool takes the output of Singer's write_records() and outputs a schema. Thanks!
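The basic idea of inferring a schema from records can be sketched in plain Python by collecting the JSON types seen for each key. This is a simplified illustration, not `singer-infer-schema`'s actual algorithm: `infer_schema` and `_json_type` are hypothetical names, only top-level keys are typed, and string formats such as date-time are not detected.

```python
import json
from typing import Any, Dict, Iterable, Set


def _json_type(value: Any) -> str:
    """Map a JSON-decoded Python value to a JSON Schema type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"unsupported value type: {type(value).__name__}")


def infer_schema(records: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Infer a flat JSON Schema by collecting the types seen for each key.

    Simplification: nested objects and arrays are reported as
    "object"/"array" without describing their contents.
    """
    seen: Dict[str, Set[str]] = {}
    for record in records:
        for key, value in record.items():
            seen.setdefault(key, set()).add(_json_type(value))

    properties: Dict[str, Any] = {}
    for key, types in seen.items():
        # Every integer is also a valid number, so a mix collapses to "number".
        if {"integer", "number"} <= types:
            types = types - {"integer"}
        # Draft 4 allows "type" to be a list, e.g. ["null", "string"].
        properties[key] = {"type": sorted(types)}
    return {"type": "object", "properties": properties}


if __name__ == "__main__":
    # Hypothetical sample records, e.g. the "record" payloads of RECORD messages.
    sample = [
        {"id": 1, "email": "a@example.com"},
        {"id": 2, "email": None, "score": 3.5},
    ]
    print(json.dumps(infer_schema(sample), indent=2))
```

Note that the function has to see every record before it can return, which is the same property that makes this kind of inference hard to run inline in a tap.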

salmon-salesclerk-77709

04/15/2021, 1:58 PM
Hey Ayo! @salmon-actor-23953 is the most knowledgeable about the SDK and will be better suited to answer your question. Also note that we have the #sdk channel too for discussion 🙂

blue-continent-72423

04/15/2021, 2:01 PM
Hello @brief-accountant-53906, I also created a new tap for Hubspot. Unfortunately the SDK was not ready yet when I started. I'll be interested in your version. As for your question, @salmon-actor-23953 has shared a tool to infer a JSON schema elsewhere.

brief-accountant-53906

04/15/2021, 2:08 PM
Thanks @blue-continent-72423, I was just curious whether the SDK has one built in. Currently I use the Singer tool singer-infer-schema, which can sometimes cause dependency collisions with singer-sdk's dependencies 😞. It lets me auto-update the schema before any tap -> target operation, especially for full-table loads.

salmon-actor-23953

04/15/2021, 3:32 PM
Hi @brief-accountant-53906, thanks for your inquiry! I have a few ideas that might help, but first, to answer your question: no, the SDK does not have an 'infer schema' function, and if we do add one, we will likely start from the existing `singer-infer-schema` tool you found (the same tool referenced in my comments to @blue-continent-72423).

To your concern about version conflicts, I highly recommend installing singer-tools with pipx instead of pip, as that will completely eliminate version-conflict issues. (Rule of thumb: I recommend installing any executable Python library with pipx, and the singer-tools package fits that case.)

The reason we have not moved faster to add singer-infer-schema to the SDK is that inference requires the full dataset in order to work, while the SCHEMA message must be sent to the target before any RECORD messages are sent. (Also, the detection is not perfect, as I've noted here.) The upshot is that inline auto-schema detection is unlikely to be a realistic possibility any time in the near future. If you wouldn't mind logging an issue for us here in the SDK issue tracker, it's certainly an area we can try to streamline over time.

One more point: you can actually use any tool to generate the schema, as long as it produces a valid JSON Schema (Draft 4) definition. The `PropertiesList` helper we built into the SDK essentially just streamlines building the JSON Schema so you don't have to write it by hand. What do you think? Does this help?
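Both points (a schema is just a JSON Schema document, and SCHEMA must precede RECORD) can be illustrated with a minimal sketch. It assumes a hypothetical `contacts` stream with made-up fields and a hypothetical `write_stream` helper that is not part of the SDK; messages are written by hand here only to make the ordering required by the Singer spec visible.

```python
import io
import json
import sys
from typing import Any, Dict, Iterable, List, Optional, TextIO

# Hypothetical hand-written JSON Schema (Draft 4) for an illustrative "contacts"
# stream. PropertiesList, singer-infer-schema, or any other generator could
# produce a similar dict; the target only ever sees the resulting JSON.
CONTACTS_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "id": {"type": ["string"]},
        "email": {"type": ["null", "string"]},
        "updated_at": {"type": ["null", "string"], "format": "date-time"},
    },
}


def write_stream(
    stream: str,
    schema: Dict[str, Any],
    key_properties: List[str],
    records: Iterable[Dict[str, Any]],
    out: Optional[TextIO] = None,
) -> None:
    """Write one Singer SCHEMA message, then one RECORD message per record."""
    if out is None:
        out = sys.stdout
    # The SCHEMA message goes first, so the schema must be final before any
    # record is written; inferring it from the data would mean buffering all records.
    schema_msg = {
        "type": "SCHEMA",
        "stream": stream,
        "schema": schema,
        "key_properties": key_properties,
    }
    out.write(json.dumps(schema_msg) + "\n")
    for record in records:
        out.write(json.dumps({"type": "RECORD", "stream": stream, "record": record}) + "\n")


if __name__ == "__main__":
    buf = io.StringIO()
    write_stream(
        "contacts",
        CONTACTS_SCHEMA,
        ["id"],
        [{"id": "1", "email": "a@example.com", "updated_at": None}],
        out=buf,
    )
    print(buf.getvalue(), end="")
```

Swapping `CONTACTS_SCHEMA` for a dict produced by another tool changes nothing downstream, as long as that dict is valid JSON Schema.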

brief-accountant-53906

04/15/2021, 3:37 PM
Thanks a lot @salmon-actor-23953! My flow was to `pip install -e` the tap package (which depends on singer-sdk), `pipenv install` singer-tools, and then use pipx for target-stitch, since the latter had a conflict with attrs 🙂

salmon-actor-23953

04/15/2021, 3:55 PM
@brief-accountant-53906 Yeah, that makes sense. If you run `pipx install singer-tools`, it will at least make those helper tools easily available for any of your projects:
```
ajsteers@ajs-macbook-pro ~ % pipx install singer-tools
  installed package singer-tools 0.4.1, Python 3.8.7
  These apps are now globally available
    - diff-jsonl
    - singer-check-tap
    - singer-infer-schema
    - singer-release
done! ✨ 🌟 ✨
```
I logged this for tracking just now.

brief-accountant-53906

04/15/2021, 4:00 PM
Perfect! Thanks a lot!