I am trying to run the `tap-salesforce` with `meltano elt` | Meltano #announcements

rahul_anand

07/21/2020, 9:02 PM

I am trying to run the `tap-salesforce` with `meltano elt`, and I see it takes a long time to parse the catalog in Meltano. I see many messages in the debug output similar to:

Visiting metadata node for tap_stream_id 'ServiceContract', breadcrumb '['properties', 'AccountId']'
Setting '.streams[0].metadata[12].metadata.selected' to 'False'
Skipping node at '.streams[0].metadata[12].breadcrumb[0]'
Skipping node at '.streams[0].metadata[12].breadcrumb[1]'
Skipping node at '.streams[0].metadata[12].metadata.inclusion'
Skipping node at '.streams[0].metadata[12].metadata.selected-by-default'
Skipping node at '.streams[0].metadata[12].metadata.selected'

It seems to take more than a few minutes (~5 min) before the request is made to Salesforce. Are there any plans to improve this parsing performance?

douwe_maan

07/21/2020, 9:13 PM

@rahul_anand Is the slow part actually parsing the discovered catalog file and applying selection and metadata rules, or discovering the catalog file itself (`meltano invoke tap-salesforce --discover`)?

douwe_maan

07/21/2020, 9:14 PM

Applying the selection rules involves a single pass over the entire discovered data structure, which I'd expect to be relatively fast. How big a catalog file are we talking here? 🙂
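As a rough illustration of the single pass described above, here is a minimal sketch. It is not Meltano's actual code. It assumes a Singer-style catalog shaped like the paths in the debug output (`streams[i].metadata[j]` with `breadcrumb` and `metadata` keys), and the function name `apply_selection` and the `Stream.attribute` pattern format are hypothetical.

```python
from fnmatch import fnmatchcase


def apply_selection(catalog, patterns):
    """Walk every metadata node once and set its `selected` flag in place.

    Hypothetical helper: `patterns` are glob strings such as "ServiceContract.Name".
    """
    for stream in catalog["streams"]:
        stream_id = stream["tap_stream_id"]
        for node in stream["metadata"]:
            breadcrumb = node["breadcrumb"]
            # Only property nodes (breadcrumb ['properties', <name>]) are handled here.
            if len(breadcrumb) != 2 or breadcrumb[0] != "properties":
                continue
            name = f"{stream_id}.{breadcrumb[1]}"
            node["metadata"]["selected"] = any(
                fnmatchcase(name, pattern) for pattern in patterns
            )
    return catalog


# Tiny made-up catalog for illustration only.
catalog = {
    "streams": [
        {
            "tap_stream_id": "ServiceContract",
            "metadata": [
                {"breadcrumb": [], "metadata": {"inclusion": "available"}},
                {"breadcrumb": ["properties", "AccountId"], "metadata": {"inclusion": "available"}},
                {"breadcrumb": ["properties", "Name"], "metadata": {"inclusion": "available"}},
            ],
        }
    ]
}

apply_selection(catalog, ["ServiceContract.Name"])
```

Each node is visited once, but every visit still tests the node against the full pattern list, so the cost grows with the number of patterns as well as the catalog size.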

rahul_anand

07/21/2020, 9:15 PM

I will investigate it further. Most likely it seems related to the parsing.

rahul_anand

07/21/2020, 9:15 PM

Catalog is close to 10 MB.

rahul_anand

07/21/2020, 9:16 PM

I remember that in one of my runs I kept seeing these parsing-related debug messages for several minutes.

douwe_maan

07/21/2020, 9:16 PM

Interesting... Enabling logging may actually be slowing things down too, since it involves IO writes. But yeah, the code is probably not optimized for traversing a 10MB data structure

rahul_anand

07/21/2020, 9:19 PM

Maybe it is some specific case. I just ran it without discover, and it ran OK. In the previous run I had close to 2000 Salesforce objects in the meltano.yml and it took a long time. I will try reproducing that scenario.

douwe_maan

07/21/2020, 9:22 PM

Ah yeah, that definitely wouldn't help. If there are N attributes in the catalog, and M selection rules, you'd see `N*M` "does this rule match this attribute?" checks
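To make the N*M growth concrete, here is a minimal sketch that counts the comparisons when every rule is tested against every attribute. The names, the `Stream.attribute` format, and the glob-style rules are assumptions for illustration and are not taken from Meltano's code.

```python
from fnmatch import fnmatchcase


def count_naive_checks(attributes, rules):
    """Test every rule against every attribute and count the comparisons."""
    checks = 0
    selected = set()
    for attr in attributes:
        for rule in rules:
            checks += 1
            if fnmatchcase(attr, rule):
                selected.add(attr)
    return checks, selected


# Hypothetical catalog: 3 streams with 4 attributes each (N = 12).
attributes = [f"Stream{s}.field{f}" for s in range(3) for f in range(4)]
# Hypothetical selection rules, one per stream (M = 3).
rules = [f"Stream{s}.*" for s in range(3)]

naive_checks, naive_selected = count_naive_checks(attributes, rules)
print(naive_checks)  # 36, i.e. N * M
```

With roughly 2000 rules and a 10 MB catalog, the same product becomes very large, which is consistent with the long runtimes reported in this thread.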

douwe_maan

07/21/2020, 9:22 PM

We could be a little smarter about how those rules are stored and tested against to prevent that explosion of checks
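One possible way to be "smarter" is to index rules by stream name so each attribute is only tested against rules for its own stream. This is a sketch of that idea under stated assumptions, not a description of what Meltano does or later did: names are `Stream.attribute`, rules are glob patterns, and `build_rule_index` and `select_indexed` are hypothetical helpers.

```python
from collections import defaultdict
from fnmatch import fnmatchcase

WILDCARDS = set("*?[")


def build_rule_index(rules):
    """Group rules by literal stream name; keep wildcard-stream rules separately."""
    exact = defaultdict(list)
    generic = []
    for rule in rules:
        stream, _, attr_pattern = rule.partition(".")
        if WILDCARDS & set(stream):
            generic.append(rule)  # stream part has a wildcard, must test everywhere
        else:
            exact[stream].append(attr_pattern)
    return exact, generic


def select_indexed(attributes, rules):
    """Match each attribute only against rules for its stream, plus generic rules."""
    exact, generic = build_rule_index(rules)
    checks = 0
    selected = set()
    for name in attributes:
        stream, _, attr = name.partition(".")
        for pattern in exact.get(stream, ()):
            checks += 1
            if fnmatchcase(attr, pattern):
                selected.add(name)
        for rule in generic:
            checks += 1
            if fnmatchcase(name, rule):
                selected.add(name)
    return checks, selected


# Same hypothetical data shape as a naive N*M comparison: N = 12, M = 3.
attributes = [f"Stream{s}.field{f}" for s in range(3) for f in range(4)]
rules = [f"Stream{s}.*" for s in range(3)]

indexed_checks, indexed_selected = select_indexed(attributes, rules)
print(indexed_checks)  # 12 instead of 36
```

The saving depends on most rules naming their stream literally; rules with a wildcard in the stream part still fall back to being tested against every attribute.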

rahul_anand

07/21/2020, 9:26 PM

Sounds like the right explanation of the problem. I saw this debug output for a few minutes, left it running, and if I am not wrong these checks actually ran for more than 30 minutes. But I would like to re-test the scenario to rule out any other issue.

douwe_maan

07/21/2020, 9:43 PM

@rahul_anand Please do, and file an issue once you're sure you've identified the culprit 🙂
