Meltano #singer-tap-development

Stéphane Burwash

04/16/2022, 6:17 PM

Happy weekend! I had a quick question (my 5th this week, I'm going for a record). Is it possible to specify that we want to query the data that is in our schema? I'm creating a custom schema with about 500 properties (for a hubspot tap) and I would like to get all the information associated, but if I simply add a concatenation of all the params I want to the request, I get a 414 (URI Too Long).

aaronsteers

04/16/2022, 8:38 PM

Hi, @Stéphane Burwash! I'm not sure I follow your use case. Can you say a bit more about what you are trying to do?

aaronsteers

04/16/2022, 8:39 PM

...my 5th this week, I'm going for a record...

Nice 😅 🎖️ 🎉

Stéphane Burwash

04/16/2022, 9:07 PM

Hi @aaronsteers, thanks for the answer! Use case: I'm trying to improve the native hubspot tap (reading it makes my brain hurt), but Hubspot now comes with many additional properties, some of which are volatile (dependent on specific values), and it's therefore hard to create a schema for them. Currently, the only way to get all of these properties using v3 of the hubspot api is to specify all of them as parameters in the URL (like any normal query). Except now I get a 414 URI Too Long response because I have too many characters (29000 vs 16000 allowed). So here are my 2 options (as I see it):
1. Meltano has an under-the-hood way of adding params to a query WITHOUT specifying them in the URL, just basing itself off the schema
2. Meltano offers a re-query module to make 2-n queries with different params to get all the data.
Makes sense?
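To make the size problem concrete, here is a minimal sketch that builds a list-endpoint URL with every property in the query string and checks it against the limit quoted above. The host `api.hubapi.com`, the `properties` and `limit` parameter names, and the generated property names are assumptions for illustration; the 16000 figure is the one from the message and may not be exact.

```python
from __future__ import annotations

from urllib.parse import urlencode

# Assumed host; the path is the deals list endpoint discussed in the thread.
BASE_URL = "https://api.hubapi.com/crm/v3/objects/deals"
# Limit quoted in the thread; treat it as approximate.
MAX_URL_LENGTH = 16_000


def build_list_url(properties: list[str]) -> str:
    """Build a list-endpoint URL with every property in the query string."""
    query = urlencode({"properties": ",".join(properties), "limit": 100})
    return f"{BASE_URL}?{query}"


def exceeds_limit(url: str, max_length: int = MAX_URL_LENGTH) -> bool:
    """Return True when the URL is longer than the assumed server limit."""
    return len(url) > max_length


if __name__ == "__main__":
    # Hypothetical property names, only to reach a realistic URL size.
    names = [f"hypothetical_property_name_number_{i:04d}_with_long_suffix" for i in range(500)]
    url = build_list_url(names)
    print(len(url), exceeds_limit(url))
```

Note that `urlencode` percent-encodes each comma as `%2C`, so the encoded URL is longer than the raw comma-joined property list.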

aaronsteers

04/16/2022, 9:10 PM

Yeah, I think that makes sense! Do you mind pasting a link to the API docs?

aaronsteers

04/16/2022, 9:11 PM

If the API accepts params in the request body rather than in the url, perhaps there's a path forward with option 1...

Stéphane Burwash

04/16/2022, 9:16 PM

New api: https://developers.hubspot.com/docs/api/crm/deals#endpoint?spec=POST-/crm/v3/objects/deals/merge
Legacy (legacy works, but it comes with its headaches): https://legacydocs.hubspot.com/docs/overview
Currently I'm syncing deals, but it applies to many other endpoints. Here is also the repo to my tap, if you're interested in seeing the code. The sdk version is in the cookie_cutter branch: https://github.com/potloc/tap-hubspot

aaronsteers

04/16/2022, 10:46 PM

Their `list` API seems to send all args through url params.

aaronsteers

04/16/2022, 10:49 PM

But it appears the `bulk` API (a `POST`) allows mix-and-match of params in the URL and also in the request body (`--data` in the below).

aaronsteers

04/16/2022, 10:49 PM

I wonder if there's a path forward in either (a) sending params in the request body of the `list` API (`/crm/v3/objects/deals`) instead of passing them in the url, or (b) using the batch API if it can meet the needed use cases. Some testing with `postman` (or `curl` or thunder client) may be helpful in determining if the API can support (a) or not.

aaronsteers

04/16/2022, 10:50 PM

Do you have any thoughts on best/preferred approach given what the API can support? The SDK should be able to support any approach the API itself supports. If there's a gap or lack of examples with a specific usage pattern, I/we can help work through those issues.

aaronsteers

04/16/2022, 10:51 PM

(cc @edgar_ramirez_mondragon who likely will be interested in this thread and may have additional insight.)

aaronsteers

04/16/2022, 11:02 PM

Hmmm... looks like the bulk `search` API is recommended here, for the same reasons as above. The `bulk` `POST` versions of the API appear to support passing params in the body, whereas it appears the basic endpoints do not support any way of bypassing the 414 issue ("URI too long for the server to process").
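A minimal sketch of the idea above: with a `POST` search request, the property list travels in the JSON body, so the URL stays short no matter how many properties are requested. The path `/crm/v3/objects/deals/search` and the body keys `properties`, `limit`, and `after` reflect my understanding of the v3 search API and should be checked against the HubSpot docs; `build_search_request` is a hypothetical helper, and nothing here sends a request.

```python
from __future__ import annotations

import json

# Assumed search path for deals; other object types would use their own path.
SEARCH_PATH = "/crm/v3/objects/deals/search"


def build_search_request(
    properties: list[str],
    limit: int = 100,
    after: str | None = None,
) -> dict:
    """Describe a POST search request whose property list lives in the body."""
    body: dict = {"properties": list(properties), "limit": limit}
    if after is not None:
        # Paging cursor returned by the previous response, if any.
        body["after"] = after
    return {"method": "POST", "path": SEARCH_PATH, "body": json.dumps(body)}


if __name__ == "__main__":
    request = build_search_request([f"prop_{i}" for i in range(500)])
    # The path length is constant; only the body grows with the property count.
    print(len(request["path"]), len(request["body"]))
```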

aaronsteers

04/16/2022, 11:05 PM

In the case that the BULK API endpoints are not workable (i.e., if the API just can't support getting all properties in one call), there's another workaround, which is to basically send multiple calls, breaking the calls into URL strings each less than the max characters, similar to what you suggested above in your option 2. However, this would be messy and should probably be the last resort.
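As a rough sketch of that last-resort workaround, the property list can be split greedily into chunks whose comma-joined length stays under a character budget, with one call per chunk. `chunk_properties` and the budget value are hypothetical; the budget would need to leave room for the base URL, other parameters, and percent-encoding of the commas.

```python
from __future__ import annotations


def chunk_properties(properties: list[str], max_chars: int) -> list[list[str]]:
    """Greedily split names so each comma-joined chunk is at most max_chars long.

    A single name longer than max_chars still gets a chunk of its own.
    """
    chunks: list[list[str]] = []
    current: list[str] = []
    current_len = 0
    for name in properties:
        # A non-empty chunk needs one extra character for the comma separator.
        added = len(name) + (1 if current else 0)
        if current and current_len + added > max_chars:
            chunks.append(current)
            current, current_len = [], 0
            added = len(name)
        current.append(name)
        current_len += added
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    names = [f"prop_{i:03d}" for i in range(10)]
    for chunk in chunk_properties(names, max_chars=26):
        print(",".join(chunk))
```

Each chunk returns a partial view of the same records, so the per-chunk results would presumably have to be merged back together by record id, which is part of why this route is messy.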

Stéphane Burwash

04/16/2022, 11:51 PM

Omg thank you so much for the response! I will definitely look into this on Monday (I think taking Sunday off is a healthy choice 😉 ) and I will get back to you with how it went!

Stéphane Burwash

04/17/2022, 4:15 PM

Update: HOLY SHIT IT WORKS @aaronsteers YOU'RE A GENIUS (and yes, I made the unhealthy choice of working on sunday, I was too excited to try this out)

Stéphane Burwash

04/17/2022, 5:06 PM

And now, as in all things beautiful, another problem has come up, so yay 😉 The search api has a 10,000 element limit per specific query, so I'll need to play with their filters to be able to sync all contacts.

aaronsteers

04/19/2022, 4:07 AM

Blast. 😭

Stéphane Burwash

04/19/2022, 2:30 PM

Are taps always this much of a pain in my ass?

pablo_seibelt

04/19/2022, 3:56 PM

You can use page tokens to handle that right?

Stéphane Burwash

04/19/2022, 5:08 PM

@pablo_seibelt for the hubspot v3 api specifically, there are 2 endpoints to get objects (ex: deals): the deals endpoint itself, and the search endpoint. The deals endpoint takes its parameters in the url, so with 500+ properties, this gives you an automatic 414. On the flip side, there is the search api. This takes properties in the body, so it's technically a dream, but it has a cap of 10,000 rows per query, even with paging, which is the current issue I'm trying to solve.

aaronsteers

04/19/2022, 5:26 PM

Are taps always this much of a pain in my ass?

Only when APIs are poorly designed, IMHO 😭

aaronsteers

04/19/2022, 5:28 PM

Salesforce has a very similar and extensible model but its API is much more friendly for data retrieval and integration.

aaronsteers

04/19/2022, 5:30 PM

Troubling that the "official" answer from Hubspot was "check if there are properties they can leave out" (HubSpot Community thread: "Re: GET all contacts endpoint returning 414").

aaronsteers

04/19/2022, 5:35 PM

@Stéphane Burwash - We might be coming back to the option of making multiple calls... But before we do that, what are your thoughts on still using the search API but making each call specific to a time period (if the API supports it) and then looping through those periods? If we can be sure that a specific period will not overflow the results, then potentially the Search API could still be viable. (A stretch, but I think worth considering.)

aaronsteers

04/19/2022, 5:38 PM

It appears that the search API does support returning results incrementally (sorted) and that `hs_lastmodifieddate` might be able to drive the time constraint and perhaps also the sorting. (You'd need to confirm that this is also inclusive of `createdate` and that newly created items don't have a null modified date.)
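A sketch of what such a time-constrained, sorted search body might look like. The `filterGroups`/`filters`/`sorts` shape, the `GTE` and `LT` operators, and the epoch-millisecond string values reflect my understanding of the v3 search API and are assumptions to verify against the HubSpot docs; `build_window_body` is a hypothetical helper.

```python
from __future__ import annotations

from datetime import datetime, timezone

# Property named in the thread as the candidate for filtering and sorting.
MODIFIED_PROPERTY = "hs_lastmodifieddate"


def to_epoch_ms(moment: datetime) -> int:
    """Convert a timezone-aware datetime to epoch milliseconds."""
    return int(moment.timestamp() * 1000)


def build_window_body(
    properties: list[str],
    window_start: datetime,
    window_end: datetime,
    limit: int = 100,
) -> dict:
    """Search body for records modified in [window_start, window_end), oldest first."""
    filters = [
        {"propertyName": MODIFIED_PROPERTY, "operator": "GTE", "value": str(to_epoch_ms(window_start))},
        {"propertyName": MODIFIED_PROPERTY, "operator": "LT", "value": str(to_epoch_ms(window_end))},
    ]
    return {
        # Filters inside one group are assumed to be combined with AND.
        "filterGroups": [{"filters": filters}],
        "sorts": [{"propertyName": MODIFIED_PROPERTY, "direction": "ASCENDING"}],
        "properties": list(properties),
        "limit": limit,
    }


if __name__ == "__main__":
    start = datetime(2022, 4, 1, tzinfo=timezone.utc)
    end = datetime(2022, 5, 1, tzinfo=timezone.utc)
    print(build_window_body(["dealname", "amount"], start, end))
```

Using a half-open window (`GTE` start, `LT` end) means adjacent windows do not return the same record twice at the boundary.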

Stéphane Burwash

04/19/2022, 5:49 PM

Yeah, that's a great idea! Based on that, here's my plan:
1. Sort in descending order by lastmodified
2. Create a query based on a filter of 30 days, sorted descending
3. If a query returns a result size of 0 (passed all queries), we set our pagination token to None
Makes sense?

aaronsteers

04/19/2022, 5:52 PM

Yes, sounds great. Just one suggested tweak: I think sorting ascending may be a bit better for resumability. So, start with the date of the bookmark, or the default `start_date` value if no bookmark exists, and then you can mark your stream as sorted=True and potentially benefit from resume-on-interrupt.

aaronsteers

04/19/2022, 5:55 PM

This also prevents the case where a record can be missed from extraction if it gets updated while the sync is running and "moves" from an older time window (which has not yet been queried) into a newer time window (which had already been queried).
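The ascending approach can be sketched as a generator of consecutive 30-day windows, starting from the bookmark when one exists and from `start_date` otherwise. `first_window_start` and `iter_windows` are hypothetical helpers, not SDK functions, and the 30-day size is simply the figure from the plan above.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone
from typing import Iterator


def first_window_start(bookmark: datetime | None, start_date: datetime) -> datetime:
    """Resume from the bookmark if there is one, else from the configured start date."""
    return bookmark if bookmark is not None else start_date


def iter_windows(
    start: datetime,
    end: datetime,
    days: int = 30,
) -> Iterator[tuple[datetime, datetime]]:
    """Yield consecutive half-open (lower, upper) windows in ascending order."""
    step = timedelta(days=days)
    cursor = start
    while cursor < end:
        upper = min(cursor + step, end)
        yield cursor, upper
        cursor = upper


if __name__ == "__main__":
    start = first_window_start(None, datetime(2022, 1, 1, tzinfo=timezone.utc))
    end = datetime(2022, 3, 15, tzinfo=timezone.utc)
    for lower, upper in iter_windows(start, end):
        print(lower.date(), "->", upper.date())
```

Because windows are visited oldest first, a record modified during the sync can only move into a window that has not been queried yet, which is the missed-record case described above.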

Stéphane Burwash

04/19/2022, 6:12 PM

Great idea, thanks!

Stéphane Burwash

04/19/2022, 6:14 PM

For posterity: https://community.hubspot.com/t5/APIs-Integrations/Getting-error-while-searching-filtering-contact-object-with/td-p/377931

Stéphane Burwash

04/22/2022, 3:45 PM

Update: I have now integrated the meltano-sdk into our hubspot-tap, and am successfully querying meetings, companies, deals and owners, as well as the properties of these tables. Here is the link: https://github.com/potloc/tap-hubspot
Next steps: For elements with larger URIs that exceed the character limit (ex: emails), implement recurrent filtering. The next_page_token will be changed from a single value to a dict, with the appropriate filter being its second value (therefore bypassing an issue where multiple iterations with only 1 page would give us pagination issues).
If anyone has any questions / feedback I'd love to hear it! Once this plugin is more robustly tested, I'll probably post it in #C013EKWA2Q1
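One way the dict-shaped token could work, as a sketch only: the token carries both the paging cursor and the index of the current filter window, so a window with a single page still advances to the next window instead of ending the sync. The key names `after` and `window` and the helper `advance_token` are hypothetical and are not taken from the tap's code.

```python
from __future__ import annotations

from typing import Any


def advance_token(
    token: dict[str, Any],
    response_after: str | None,
    window_count: int,
) -> dict[str, Any] | None:
    """Compute the next token from the current one and the response's paging cursor.

    token: {"after": cursor or None, "window": index of the current filter window}
    response_after: cursor from the latest response, None when the window is exhausted.
    """
    if response_after is not None:
        # More pages remain in the current window.
        return {"after": response_after, "window": token["window"]}
    next_window = token["window"] + 1
    if next_window >= window_count:
        # Every window has been read; None signals the end of pagination.
        return None
    # Start the next window from its first page.
    return {"after": None, "window": next_window}


if __name__ == "__main__":
    token = {"after": None, "window": 0}
    # Simulated cursors: two pages in window 0, a single page in window 1.
    for cursor in ["100", None, None]:
        token = advance_token(token, cursor, window_count=2)
        print(token)
        if token is None:
            break
```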
