# singer-tap-development
s
Happy weekend! I had a quick question (my 5th this week, I'm going for a record). Is it possible to specify that we want to query the data that is in our schema? I'm creating a custom schema with about 500 properties (for a HubSpot tap) and I would like to get all the associated information, but if I simply concatenate all the params I want into the request, I get a 414.
a
Hi, @Stéphane Burwash! I'm not sure I follow your use case. Can you say a bit more about what you are trying to do?
...my 5th this week, I'm going for a record...
Nice 😅 🎖️ 🎉
s
Hi @aaronsteers, thanks for the answer! Use case: I'm trying to improve the native HubSpot tap (reading it makes my brain hurt), but HubSpot now comes with many additional properties, some of which are volatile (dependent on specific values), so it's hard to create a schema for them. Currently, the only way to get all of these properties using v3 of the HubSpot API is to specify all of them as parameters in the URL (like any normal query). Except now I get a 414 "URI too long" response because I have too many characters (29000 vs 16000 allowed). So here are my 2 options (as I see it):
1. Meltano has an under-the-hood way of adding params to a query WITHOUT specifying them in the URL, just basing itself off the schema.
2. Meltano offers a re-query module to make 2-n queries with different params to get all the data.
Makes sense?
a
Yeah, I think that makes sense! Do you mind pasting a link to the API docs?
If the API accepts params in the request body rather than in the url, perhaps there's a path forward with option 1...
s
New API: https://developers.hubspot.com/docs/api/crm/deals#endpoint?spec=POST-/crm/v3/objects/deals/merge
Legacy (legacy works, but it comes with its headaches): https://legacydocs.hubspot.com/docs/overview
Currently I'm syncing deals, but it applies to many other endpoints. Here is also the repo for my tap, if you're interested in seeing the code. The SDK version is in the cookie_cutter branch: https://github.com/potloc/tap-hubspot
a
Their `list` API seems to send all args through URL params.
But it appears the `bulk` API (a `POST`) allows mix-and-match of params in the URL and also in the request body (`--data` in the below).
I wonder if there's a path forward in either (a) sending params in the request body for the `list` API (`/crm/v3/objects/deals`) instead of passing them in the URL, or (b) using the batch API if it can meet the needed use cases. Some testing with `postman` (or `curl` or thunder client) may be helpful in determining if the API can support (a) or not.
Do you have any thoughts on best/preferred approach given what the API can support? The SDK should be able to support any approach the API itself supports. If there's a gap or lack of examples with a specific usage pattern, I/we can help work through those issues.
(cc @edgar_ramirez_mondragon who likely will be interested in this thread and may have additional insight.)
Hmmm... looks like the bulk `search` API is recommended here, for the same reasons as above. The `bulk`/`POST` versions of the API appear to support passing params in the body, whereas it appears the basic endpoints do not support any way of bypassing the 414 issue ("URI too long for the server to process").
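A rough sketch of the difference described above, using only the standard library and sending nothing over the network. The host `https://api.hubapi.com`, the `/search` path suffix, the body keys `properties`, `limit`, and `after`, and the generated property names are all assumptions for illustration; check them against the HubSpot docs before relying on them.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://api.hubapi.com"  # assumed API host


def list_request_url(properties):
    """GET-style request: every property name ends up in the URL."""
    query = urlencode({"properties": ",".join(properties), "limit": 100})
    return f"{BASE_URL}/crm/v3/objects/deals?{query}"


def search_request(properties, after=None, limit=100):
    """POST-style request: the URL stays fixed, properties move to the body."""
    body = {"properties": list(properties), "limit": limit}
    if after is not None:
        body["after"] = after  # assumed name of the paging cursor field
    return f"{BASE_URL}/crm/v3/objects/deals/search", json.dumps(body)


# Hypothetical property names, only to show how the sizes scale.
props = [f"hypothetical_property_{i:03d}" for i in range(500)]
get_url = list_request_url(props)
post_url, post_body = search_request(props)
print(len(get_url), len(post_url), len(post_body))
```

The GET URL grows with every property added, while the POST URL has a constant length, which is why only the body-based variant avoids the 414.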
If the bulk API endpoints are not workable (i.e., if the API just can't support getting all properties in one call), there's another workaround, which is to send multiple calls, splitting the properties across URL strings that each stay under the max character limit, similar to what you suggested above in your option 2. However, this would be messy and should probably be the last resort.
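For reference, the last-resort workaround could look roughly like the sketch below: a hypothetical helper (`chunk_properties` is not part of any SDK) that groups property names so each comma-joined string fits a character budget. It assumes the budget is measured on the raw joined string; URL-encoding the commas would make the real URL longer, so the budget should leave some headroom.

```python
def chunk_properties(properties, max_chars):
    """Group property names so each comma-joined chunk fits within max_chars."""
    chunks, current, current_len = [], [], 0
    for name in properties:
        if len(name) > max_chars:
            raise ValueError(f"property name longer than max_chars: {name!r}")
        # +1 accounts for the comma separator when the chunk is non-empty.
        added = len(name) + (1 if current else 0)
        if current and current_len + added > max_chars:
            # Current chunk is full: close it and start a new one.
            chunks.append(current)
            current, current_len = [], 0
            added = len(name)
        current.append(name)
        current_len += added
    if current:
        chunks.append(current)
    return chunks


for chunk in chunk_properties(["amount", "dealname", "dealstage", "closedate"], 20):
    print(",".join(chunk))
```

Each chunk would become its own request, and the partial records would then have to be merged by record id, which is a large part of why this approach gets messy.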
s
Omg thank you so much for the response! I will definitely look into this on Monday (I think taking Sunday off is a healthy choice 😉 ) and I will get back to you with how it went!
Update: HOLY SHIT IT WORKS @aaronsteers YOU'RE A GENIUS (and yes, I made the unhealthy choice of working on Sunday, I was too excited to try this out)
And now, as in all things beautiful, another problem has come up, so yay 😉 The search API has a 10000 element limit per specific query, so I'll need to play with their filters to be able to sync all contacts.
a
Blast. 😭
s
Are taps always this much of a pain in my ass?
p
You can use page tokens to handle that right?
s
@pablo_seibelt For the HubSpot v3 API specifically, there are 2 endpoints to get objects (e.g. deals): the deals endpoint itself and the search endpoint. The deals endpoint takes its parameters in the URL, so with 500+ properties this gives you an automatic 414. On the flip side, there is the search API. It takes properties in the body, so it's technically a dream, but it has a 10000-row cap per query even with paging, which is the issue I'm currently trying to solve.
a
Are taps always this much of a pain in my ass?
Only when APIs are poorly designed, IMHO 😭
Salesforce has a very similar and extensible model but its API is much more friendly for data retrieval and integration.
Troubling that the "official" answer from HubSpot was "check if there are properties they can leave out" (HubSpot Community thread, solved: "Re: GET all contacts endpoint returning 414").
@Stéphane Burwash - We might be coming back to the option of having to make multiple calls... But before we do that, what are your thoughts on still using the search API, but making each call specific to a time period (if the API supports it) and then looping through those periods? If we can be sure that a specific period will not overflow the results, then potentially the search API could still be viable. (A stretch, but I think worth considering.)
It appears that the search API does support returning results incrementally (sorted) and that `hs_lastmodifieddate` might be able to drive the time constraint and perhaps also the sorting. (You'd need to confirm that this is also inclusive of `createdate` and that newly created items don't have a null modified date.)
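A sketch of what a time-constrained, sorted search body might look like. The `filterGroups`/`filters`/`sorts` layout, the `GTE`/`LT` operators, the `ASCENDING` direction value, and passing the dates as epoch-millisecond strings are assumptions about the search API's request format that should be verified against the HubSpot documentation; `windowed_search_body` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def to_epoch_ms(dt):
    """Convert a timezone-aware datetime to epoch milliseconds."""
    return int(dt.timestamp() * 1000)


def windowed_search_body(properties, window_start, window_end,
                         after=None, limit=100, direction="ASCENDING"):
    """Build a search body restricted to [window_start, window_end)."""
    date_field = "hs_lastmodifieddate"
    body = {
        "properties": list(properties),
        "limit": limit,
        # Filters inside one group are assumed to be combined with AND.
        "filterGroups": [{"filters": [
            {"propertyName": date_field, "operator": "GTE",
             "value": str(to_epoch_ms(window_start))},
            {"propertyName": date_field, "operator": "LT",
             "value": str(to_epoch_ms(window_end))},
        ]}],
        "sorts": [{"propertyName": date_field, "direction": direction}],
    }
    if after is not None:
        body["after"] = after  # paging cursor within this window
    return body


body = windowed_search_body(
    ["dealname", "amount"],
    datetime(2022, 1, 1, tzinfo=timezone.utc),
    datetime(2022, 1, 31, tzinfo=timezone.utc),
)
print(body["filterGroups"][0]["filters"])
```

As long as each window holds fewer than 10000 matching records, paging within the window should stay under the search cap.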
s
Yeah, that's a great idea! Based on that, here's my plan:
1. Sort in descending order by lastmodified
2. Create a query based on a filter of 30 days, sorted descending
3. If a query returns a result size of 0 (passed all queries), we set our pagination token to None
Makes sense?
a
Yes, sounds great. Just one suggested tweak: I think sorting ascending may be a bit better for resumability. So, start with the date of the bookmark, or the default `start_date` value if no bookmark exists, and then you can mark your stream as sorted (`is_sorted = True`) and potentially benefit from resume-on-interrupt.
This also prevents the case where a record can be missed from extraction if it gets updated while the sync is running and "moves" from an older time window (which has not yet been queried) into a newer time window (which had already been queried).
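The ascending approach can be sketched as a generator of consecutive time windows, starting at the bookmark (or `start_date`) and walking forward to "now". The 30-day width comes from the plan above; `iter_windows` is a hypothetical helper, not an SDK function.

```python
from datetime import datetime, timedelta, timezone


def iter_windows(start, end, days=30):
    """Yield consecutive [window_start, window_end) pairs in ascending order."""
    step = timedelta(days=days)
    cursor = start
    while cursor < end:
        # Clamp the final window so it never runs past the end of the sync.
        window_end = min(cursor + step, end)
        yield cursor, window_end
        cursor = window_end


start = datetime(2022, 1, 1, tzinfo=timezone.utc)  # bookmark or start_date
end = datetime(2022, 3, 15, tzinfo=timezone.utc)   # e.g. the sync start time
windows = list(iter_windows(start, end))
for window_start, window_end in windows:
    print(window_start.date(), "->", window_end.date())
```

Because the windows are half-open and contiguous, a record modified during the sync can only move into a window that has not been queried yet, so it is picked up later rather than missed.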
s
Great idea, thanks!
Update: I have now integrated the Meltano SDK into our HubSpot tap, and am successfully querying meetings, companies, deals and owners, as well as the properties of these tables. Here is the link: https://github.com/potloc/tap-hubspot
Next steps: For elements with larger URIs that exceed the character limit (e.g. emails), implement recurrent filtering. The next_page_token will be changed from a single value to a dict, with the appropriate filter being its second value (therefore bypassing an issue where multiple iterations with only 1 page would give us pagination issues).
If anyone has any questions / feedback, I'd love to hear it! Once this plugin is more robustly tested, I'll probably post it in #C013EKWA2Q1
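One possible shape for the dict-based token described in the next steps, as a sketch only: the key names `after` and `window_start`, the `next_token` helper, and the 30-day step are illustrative assumptions, not the code in the linked repo.

```python
from datetime import datetime, timedelta, timezone


def next_token(previous, response_after, sync_end, days=30):
    """Return the next page token dict, or None when the sync is finished.

    previous: {"after": str or None, "window_start": datetime}
    response_after: paging cursor from the last response, None on the last page
    """
    if response_after is not None:
        # More pages in the current window: keep the filter, advance the cursor.
        return {"after": response_after, "window_start": previous["window_start"]}
    # Window exhausted: move the filter forward and reset the cursor.
    next_start = previous["window_start"] + timedelta(days=days)
    if next_start >= sync_end:
        return None
    return {"after": None, "window_start": next_start}


first = {"after": None, "window_start": datetime(2022, 1, 1, tzinfo=timezone.utc)}
sync_end = datetime(2022, 3, 1, tzinfo=timezone.utc)
second = next_token(first, None, sync_end)  # single-page window, move on
print(second)
```

Since the window start is part of the token, two consecutive single-page windows never produce identical tokens, which is the pagination issue the dict is meant to sidestep.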