# announcements
r
I am trying to run `tap-salesforce` with `meltano elt`, and I see it takes a long time to parse the catalog in Meltano. I see many messages in the debug output similar to:
```
Visiting metadata node for tap_stream_id 'ServiceContract', breadcrumb '['properties', 'AccountId']'
Setting '.streams[0].metadata[12].metadata.selected' to 'False'
Skipping node at '.streams[0].metadata[12].breadcrumb[0]'
Skipping node at '.streams[0].metadata[12].breadcrumb[1]'
Skipping node at '.streams[0].metadata[12].metadata.inclusion'
Skipping node at '.streams[0].metadata[12].metadata.selected-by-default'
Skipping node at '.streams[0].metadata[12].metadata.selected'
```
It seems to take more than a few minutes (~5 min) before the request is made to Salesforce. Are there any plans to improve the performance of this parsing?
d
@rahul_anand Is the slow part actually parsing the discovered catalog file and applying the selection and metadata rules, or discovering the catalog file itself (`meltano invoke tap-salesforce --discover`)?
Applying the selection rules involves a single pass over the entire discovered data structure, which I'd expect to be relatively fast. How big a catalog file are we talking here? 🙂
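For context, the discovered catalog follows the Singer layout visible in the debug output: each stream has a `metadata` list whose entries carry a `breadcrumb` and a `metadata` dict (`inclusion`, `selected-by-default`, `selected`). Below is a minimal sketch of what a single pass looks like, not Meltano's actual code. It assumes, hypothetically, that the selection is given as a set of exact `Stream.property` names with no wildcards or exclusions, and it ignores `inclusion`/`selected-by-default`. Each entry costs one set lookup, so the work grows linearly with the size of the catalog.

```python
def apply_selection(catalog: dict, selected: set[str]) -> int:
    """Walk every metadata entry once and set its 'selected' flag.

    Simplifying assumption (hypothetical format): `selected` holds exact
    'Stream.property' names. Returns the number of metadata entries visited.
    """
    # Precompute which streams have at least one selected property,
    # so the stream-level check is also a constant-time lookup.
    selected_streams = {name.split(".", 1)[0] for name in selected}
    visited = 0
    for stream in catalog["streams"]:
        stream_id = stream["tap_stream_id"]
        for entry in stream["metadata"]:
            visited += 1
            breadcrumb = entry["breadcrumb"]
            if len(breadcrumb) == 2:  # property entry: ['properties', '<name>']
                is_selected = f"{stream_id}.{breadcrumb[1]}" in selected
            else:  # stream-level entry: empty breadcrumb
                is_selected = stream_id in selected_streams
            entry["metadata"]["selected"] = is_selected
    return visited


catalog = {
    "streams": [
        {
            "tap_stream_id": "ServiceContract",
            "metadata": [
                {"breadcrumb": [], "metadata": {"inclusion": "available"}},
                {"breadcrumb": ["properties", "AccountId"],
                 "metadata": {"inclusion": "available", "selected-by-default": True}},
                {"breadcrumb": ["properties", "Name"],
                 "metadata": {"inclusion": "available"}},
            ],
        }
    ]
}
visited = apply_selection(catalog, {"ServiceContract.Name"})
print(visited, [e["metadata"]["selected"] for e in catalog["streams"][0]["metadata"]])
```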
r
I will investigate further. It most likely seems related to the parsing.
The catalog is close to 10 MB.
I remember that in one of my runs I kept seeing these parsing-related debug messages for several minutes.
d
Interesting... Enabling logging may actually be slowing things down too, since it involves IO writes. But yeah, the code is probably not optimized for traversing a 10 MB data structure.
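One rough way to gauge the logging overhead is to run the same traversal with debug output on and off. The sketch below uses only the standard `logging` module; the logger name, `traverse`, and `timed_run` are hypothetical stand-ins, and writing into an in-memory `StringIO` understates the cost of a real terminal or log file. With one debug line per node, the number of writes grows with the catalog size, while at `INFO` level the `logger.debug` calls return early and nothing is formatted or written.

```python
import io
import logging
import time

logger = logging.getLogger("catalog_sketch")  # hypothetical logger name
logger.propagate = False  # keep records away from any root handlers


def traverse(num_nodes: int) -> None:
    """Visit `num_nodes` fake metadata nodes, emitting one debug line per node."""
    for i in range(num_nodes):
        # %-style arguments are only formatted if the record is actually emitted.
        logger.debug("Visiting metadata node %d", i)


def timed_run(level: int, num_nodes: int = 20_000) -> tuple[float, int]:
    """Return (elapsed seconds, number of lines written) at the given log level."""
    stream = io.StringIO()  # stands in for a log file or terminal
    handler = logging.StreamHandler(stream)
    logger.addHandler(handler)
    logger.setLevel(level)
    try:
        start = time.perf_counter()
        traverse(num_nodes)
        elapsed = time.perf_counter() - start
    finally:
        logger.removeHandler(handler)
    return elapsed, stream.getvalue().count("\n")


debug_time, debug_lines = timed_run(logging.DEBUG)
info_time, info_lines = timed_run(logging.INFO)
print(f"DEBUG: {debug_lines} lines in {debug_time:.3f}s")
print(f"INFO:  {info_lines} lines in {info_time:.3f}s")
```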
r
Maybe it is a specific case. I just ran it without discovery and it ran OK. In the previous run I had close to 2000 Salesforce objects in `meltano.yml`, and it took a long time. I will try reproducing that scenario.
d
Ah yeah, that definitely wouldn't help. If there are N attributes in the catalog and M selection rules, you'd see N × M "does this rule match this attribute?" checks.
We could be a little smarter about how those rules are stored and tested, to prevent that explosion of checks.
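A sketch of that idea, under stated assumptions: rules are hypothetical `stream.property` glob patterns (no `!` exclusions or other Meltano-specific semantics), and `naive_select` / `indexed_select` are illustrative names, not Meltano functions. The naive version tests every rule against every attribute, giving exactly N × M checks. The indexed version groups rules by stream name once, so each attribute is only tested against the rules that can apply to its stream; rules with a wildcard in the stream part are still matched against each stream name, but only once per stream thanks to a cache.

```python
from collections import defaultdict
from fnmatch import fnmatchcase

Rule = tuple[str, str]  # (stream pattern, property pattern)


def parse_rule(rule: str) -> Rule:
    """Split a hypothetical 'Stream.property' glob rule into its two parts."""
    stream_pat, _, prop_pat = rule.partition(".")
    return stream_pat, prop_pat or "*"


def naive_select(attrs, rules):
    """Test every rule against every attribute: exactly N * M checks."""
    checks, selected = 0, set()
    parsed = [parse_rule(r) for r in rules]
    for stream, prop in attrs:
        for stream_pat, prop_pat in parsed:
            checks += 1
            if fnmatchcase(stream, stream_pat) and fnmatchcase(prop, prop_pat):
                selected.add((stream, prop))
    return selected, checks


def indexed_select(attrs, rules):
    """Group rules by stream first so each attribute sees only relevant rules."""
    exact = defaultdict(list)  # literal stream name -> property patterns
    wildcard = []              # rules whose stream pattern contains glob characters
    for stream_pat, prop_pat in map(parse_rule, rules):
        if any(ch in stream_pat for ch in "*?["):
            wildcard.append((stream_pat, prop_pat))
        else:
            exact[stream_pat].append(prop_pat)

    checks, selected = 0, set()
    per_stream = {}  # cache: stream name -> applicable property patterns
    for stream, prop in attrs:
        if stream not in per_stream:
            checks += len(wildcard)  # each wildcard stream pattern tested once per stream
            per_stream[stream] = exact.get(stream, []) + [
                p for s, p in wildcard if fnmatchcase(stream, s)
            ]
        for prop_pat in per_stream[stream]:
            checks += 1
            if fnmatchcase(prop, prop_pat):
                selected.add((stream, prop))
    return selected, checks


# Synthetic data: 100 streams x 20 properties, one rule per stream plus one wildcard rule.
attrs = [(f"Stream{s}", f"Field{f}") for s in range(100) for f in range(20)]
rules = [f"Stream{s}.Field0" for s in range(100)] + ["Stream1*.Field1"]
naive_sel, naive_checks = naive_select(attrs, rules)
indexed_sel, indexed_checks = indexed_select(attrs, rules)
print(naive_checks, indexed_checks, naive_sel == indexed_sel)
```

On this synthetic input (2000 attributes, 101 rules) the naive loop performs 202,000 checks and the indexed one 2,320, with identical selections. How much this helps in practice depends on how many rules use wildcard stream patterns.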
r
That sounds like the right explanation of the problem. I saw this debug output for a few minutes, left it running, and if I am not mistaken these checks actually ran for more than 30 minutes. But I would like to re-test the scenario to rule out any other issue.
d
@rahul_anand Please do, and file an issue once you're sure you've identified the culprit 🙂