# singer-tap-development
Hello, I looked at what you did with the SDK, and great job! The API is quite lean 😃 Hiding almost all of the state management and network calls is a nice achievement 👍

Based on my recent experience (or "trauma" 😅) writing a tap for Hubspot, I did a thought experiment on how I could have used the SDK instead, and I have some questions 🤔

Hubspot is a CRM that offers a REST API. Its streams are Contacts / Deals / Deal Pipeline / Deal Pipeline Stage / Owners (e.g. salespeople). The APIs are quite a mess, so it may not be a good example.

a. To extract some Hubspot streams (e.g. Contacts, anywhere from 0 to 200k records), you have to call 2 endpoints in sequence:

1. The first one fetches the IDs of the records you want to extract. This one is paginated.
2. The second one fetches the details of each record in the stream (a contact with its "custom properties"). It only works if you pass the list of IDs fetched by call 1 in the query string. (And it's not possible to consume this endpoint directly in a paginated manner, because it wouldn't be funny otherwise…)

I'm not sure how this could work with `RESTStream`. The closest thing I saw was https://gitlab.com/meltano/singer-sdk/-/blob/development/singer_sdk/samples/sample_tap_gitlab/gitlab_rest_streams.py#L160, where 2 streams are used and a sort of state (is it the same as the tap state?) passes data from one stream to the other. In Hubspot, thousands or hundreds of thousands of Contact IDs would have to be passed between the 2 streams. Would the recommended solution be the same as the one implemented in the GitLab tap?

b. Some Hubspot streams are nested in a single API endpoint. E.g. Deal Pipeline and Deal Pipeline Stage are returned as 2 nested lists by one endpoint. How would that be mapped with the SDK? Is it possible to create a Stream that consumes the results of another Stream's API calls without making API calls of its own?

c. What is the SDK's concurrency model between Streams? Are all streams synced in sequence, or is there some sort of concurrency? In my experience, Hubspot's paginated APIs have high latency, so the total sync time can be quite long: we spend a lot of time waiting for the next page (and when people have 100k records, that's a lot of pages…). Syncing different paginated streams (e.g. Contacts and Deals) at the same time saves time in this case. How would the SDK have helped? 😃
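To make question a more concrete, here is a minimal plain-Python sketch of the two-call pattern, independent of the SDK. The endpoints are faked in memory: `fetch_id_page`, `fetch_details`, `PAGE_SIZE` and the batch size are all hypothetical stand-ins, not Hubspot's or the SDK's API. The idea it illustrates is that IDs can be streamed through a generator and resolved in batches, so the full list of IDs never has to be held in state.

```python
from typing import Dict, Iterator, List, Optional, Tuple

# Hypothetical in-memory stand-in for the paginated "list IDs" endpoint (call 1).
FAKE_IDS = list(range(1, 251))
PAGE_SIZE = 100


def fetch_id_page(offset: int) -> Tuple[List[int], Optional[int]]:
    """Return one page of IDs and the next offset (None when exhausted)."""
    page = FAKE_IDS[offset:offset + PAGE_SIZE]
    next_offset = offset + PAGE_SIZE if offset + PAGE_SIZE < len(FAKE_IDS) else None
    return page, next_offset


def fetch_details(ids: List[int]) -> List[Dict]:
    """Hypothetical stand-in for the detail endpoint (call 2), which takes IDs in the query string."""
    return [{"id": i, "properties": {"email": f"user{i}@example.com"}} for i in ids]


def iter_ids() -> Iterator[int]:
    """Walk the paginated ID endpoint lazily, one page at a time."""
    offset: Optional[int] = 0
    while offset is not None:
        page, offset = fetch_id_page(offset)
        yield from page


def iter_contacts(batch_size: int = 50) -> Iterator[Dict]:
    """Collect IDs as they arrive and resolve each full batch with one detail call."""
    batch: List[int] = []
    for contact_id in iter_ids():
        batch.append(contact_id)
        if len(batch) == batch_size:
            yield from fetch_details(batch)
            batch = []
    if batch:  # flush the final partial batch
        yield from fetch_details(batch)


if __name__ == "__main__":
    for contact in iter_contacts(batch_size=50):
        print(contact["id"], contact["properties"]["email"])
```

Whether this chaining should live inside one `RESTStream` or be split across two streams is exactly the open question; the sketch only shows the data flow.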
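For question b, a minimal sketch of what "two streams from one response" means in practice: a single API response is split into parent (pipeline) and child (stage) records, with the parent key copied onto each child. The response shape and the field names `pipelineId`, `stageId`, `label` and `stages` are assumptions for illustration and may not match the real Hubspot payload; `split_pipelines` is a hypothetical helper, not an SDK function.

```python
from typing import Dict, List, Tuple

# Hypothetical shape of the single "pipelines" response; real field names may differ.
SAMPLE_RESPONSE = {
    "results": [
        {"pipelineId": "default", "label": "Sales", "stages": [
            {"stageId": "s1", "label": "Qualified"},
            {"stageId": "s2", "label": "Closed won"},
        ]},
        {"pipelineId": "p2", "label": "Renewals", "stages": [
            {"stageId": "s3", "label": "Negotiation"},
        ]},
    ]
}


def split_pipelines(response: Dict) -> Tuple[List[Dict], List[Dict]]:
    """Turn one nested response into records for two streams, with no extra API calls."""
    pipelines: List[Dict] = []
    stages: List[Dict] = []
    for pipeline in response["results"]:
        # Parent record: every field except the nested list.
        pipelines.append({k: v for k, v in pipeline.items() if k != "stages"})
        # Child records: copy the parent key so stages can be joined back to pipelines.
        for stage in pipeline.get("stages", []):
            stages.append({**stage, "pipelineId": pipeline["pipelineId"]})
    return pipelines, stages


if __name__ == "__main__":
    pipeline_records, stage_records = split_pipelines(SAMPLE_RESPONSE)
    print(pipeline_records)
    print(stage_records)
```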
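For question c, a rough sketch of why syncing streams concurrently helps with high-latency pagination: each stream walks its own pages in order, but the streams run in separate threads (`concurrent.futures.ThreadPoolExecutor`), so their waits overlap. The latency is simulated with `time.sleep`; `fetch_page`, `sync_stream`, `sync_concurrently` and `PAGE_LATENCY_S` are hypothetical names, and this says nothing about how the SDK actually schedules streams. It also does not address how records or STATE messages would be written safely from several threads.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

PAGE_LATENCY_S = 0.05  # hypothetical per-page latency of a slow paginated API


def fetch_page(stream: str, page: int) -> List[Dict]:
    """Stand-in for one slow HTTP call returning a page of 10 records."""
    time.sleep(PAGE_LATENCY_S)
    return [{"stream": stream, "page": page, "n": i} for i in range(10)]


def sync_stream(stream: str, pages: int) -> List[Dict]:
    """Walk one stream's pages sequentially, so pages within a stream stay ordered."""
    records: List[Dict] = []
    for page in range(pages):
        records.extend(fetch_page(stream, page))
    return records


def sync_concurrently(streams: Dict[str, int]) -> Dict[str, List[Dict]]:
    """Sync each stream in its own worker thread so their page waits overlap."""
    with ThreadPoolExecutor(max_workers=max(1, len(streams))) as pool:
        futures = {name: pool.submit(sync_stream, name, pages) for name, pages in streams.items()}
        return {name: fut.result() for name, fut in futures.items()}


if __name__ == "__main__":
    start = time.perf_counter()
    results = sync_concurrently({"contacts": 4, "deals": 4})
    elapsed = time.perf_counter() - start
    # With overlapping waits this should be closer to 4 pages of latency than to 8.
    print({name: len(recs) for name, recs in results.items()}, f"{elapsed:.2f}s")
```

Threads are a reasonable fit here because the work is I/O-bound (waiting on HTTP responses), not CPU-bound.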