# troubleshooting
j
tap-salesforce -> target-snowflake fails with `KeyError: 'type'` on this line: https://github.com/transferwise/pipelinewise-target-snowflake/blob/master/target_snowflake/stream_utils.py#L69

Interestingly, it passes when tapping the test (sandbox) Salesforce, but it fails with this error when running against the production Salesforce instance. It looks like some data fields in our production env cause this issue. I'm trying to switch from the default (transferwise) variant to meltanolabs (based on the SDK). Can we use meltanolabs (v0.1.0) in production?
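For reference, the failing line indexes the property's `type` key directly, so any schema property emitted without `type` raises `KeyError`. A minimal sketch of the failure mode and a tolerant guard; the helper name and sample schemas are illustrative, not the actual pipelinewise code:

```python
# Illustrative sketch, not the actual pipelinewise-target-snowflake code.

def is_datetime_property(property_schema: dict) -> bool:
    """Guess whether a JSON Schema property describes a datetime.

    Indexing property_schema['type'] directly raises KeyError when a tap
    emits a property without a 'type' key; .get() makes that non-fatal.
    """
    prop_type = property_schema.get("type", [])  # tolerate a missing 'type'
    if isinstance(prop_type, str):
        prop_type = [prop_type]
    return "string" in prop_type and property_schema.get("format") == "date-time"


# A well-formed property:
ok = {"type": ["null", "string"], "format": "date-time"}
# A property like the ones production Salesforce appears to emit:
broken = {"format": "date-time"}

print(is_datetime_property(ok))      # True
print(is_datetime_property(broken))  # False, instead of KeyError
```

With the guard, a field lacking `type` is simply not treated as a datetime rather than crashing the whole load.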
Hm, so meltanolabs target-snowflake keeps failing with this error:
```
2023-03-06T08:03:44.213924Z [info     ] sqlalchemy.exc.ProgrammingError: (snowflake.connector.errors.ProgrammingError) 002002 (42710): 01aac2a3-0001-008b-0001-aed20033302a: SQL compilation error: cmd_type=elb consumer=True name=target-snowflake-sfdc producer=False stdio=stderr string_id=target-snowflake-sfdc
2023-03-06T08:03:44.214106Z [info     ] Object 'SFDC_INPUT_STAGE' already exists. cmd_type=elb consumer=True name=target-snowflake-sfdc producer=False stdio=stderr string_id=target-snowflake-sfdc
2023-03-06T08:03:44.214274Z [info     ] [SQL: CREATE SCHEMA "SFDC_INPUT_STAGE"] cmd_type=elb consumer=True name=target-snowflake-sfdc producer=False stdio=stderr string_id=target-snowflake-sfdc
2023-03-06T08:03:44.214440Z [info     ] (Background on this error at: https://sqlalche.me/e/14/f405) cmd_type=elb consumer=True name=target-snowflake-sfdc producer=False stdio=stderr string_id=target-snowflake-sfdc
```
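The failing statement is a plain `CREATE SCHEMA`, which Snowflake rejects with error 002002 when the schema already exists. Snowflake's standard `IF NOT EXISTS` clause makes the DDL idempotent; a small sketch of the statement the target could emit instead (plain string building here, purely to illustrate the DDL):

```python
def create_schema_ddl(schema_name: str, if_not_exists: bool = True) -> str:
    """Build a CREATE SCHEMA statement; IF NOT EXISTS makes it idempotent."""
    clause = "IF NOT EXISTS " if if_not_exists else ""
    # Quote the identifier and upper-case it, matching Snowflake conventions.
    return f'CREATE SCHEMA {clause}"{schema_name.upper()}"'


print(create_schema_ddl("sfdc_input_stage"))
# CREATE SCHEMA IF NOT EXISTS "SFDC_INPUT_STAGE"
```

Whether the target should emit this form (or check the information schema first) is a design choice for the maintainers; this only shows why the existing `CREATE SCHEMA "SFDC_INPUT_STAGE"` fails on re-runs.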
I dropped the schema manually and re-executed `meltano run`, but it is still failing.
With this patch, I can confirm it fully works with the production Salesforce instance.
s
Hey @jan_soubusta, on the transferwise variants, merges typically take some time (from my experience). Could you create the same fix on the meltanolabs variant? There we can merge within hours 😄 @visch @pat_nadolny maybe you can shine some light on the problem.
j
I can fix the meltanolabs variant, but that variant does not work for me, as I described a few comments above (CREATE SCHEMA fails).
a
Hi, @jan_soubusta. For context here, can you provide a bit more info on what, if anything, the tap is generating for `type` on this property? The error implies that there is a missing type definition on the property. I agree that it is better if we can make it non-fatal, but I'm curious if you could provide info on how/if the type is being declared by tap-salesforce. Second question: which tap-salesforce is this one?
j
```yaml
- name: tap-salesforce
  variant: meltanolabs
  pip_url: git+https://github.com/meltanolabs/tap-salesforce.git
  config:
    api_type: "BULK"
    select_fields_by_default: false
    start_date: "2021-01-01T00:00:00Z"
```
My guess is that tap-salesforce almost always generates `type`, but in rare cases it is missing. Most likely our RevOps team (they are really inventive) created something rare in our production Salesforce which, when crawled by tap-salesforce, does not contain `type`. My understanding of the affected code is that with my fix, some fields may not be converted to datetime format, so they are loaded into the target as VARCHAR. Because I almost always post-process the loaded data with dbt, I can CAST such a column. Once we finalize our solution, if I find the affected field, I will share it here. For now, I do not know which field is affected.
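The downstream CAST mentioned above can be sketched as follows. This builds the kind of expression a dbt model might use to recover a datetime from a column that landed as VARCHAR; the table and column names are hypothetical, and `TRY_CAST` is Snowflake's non-failing cast:

```python
def cast_to_timestamp_sql(table: str, column: str) -> str:
    """Build a Snowflake SELECT casting a VARCHAR column back to TIMESTAMP_TZ.

    TRY_CAST yields NULL instead of erroring on unparseable values.
    Table/column names here are illustrative only.
    """
    return (
        f"SELECT TRY_CAST({column} AS TIMESTAMP_TZ) AS {column}\n"
        f"FROM {table}"
    )


print(cast_to_timestamp_sql("sfdc_input_stage.opportunity", "closedate"))
```

In practice this would live directly in a dbt model's SQL; the Python wrapper is only for illustration.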
a
Thank you!
j
Btw, an incremental tap-salesforce run takes 6 minutes in my pipeline, of which discovery (at the beginning) takes over 4 minutes. Once the list of entities to be tapped is stable, may I add a cached file (the result of discovery) to the Docker image, so discovery is not triggered? Is there any other solution?
a
> may I add a cached file (the result of the discovery) into the Docker image, so discovery is not triggered?

Yep!

> discovery (in the beginning) takes over 4 minutes

Discovery on Salesforce is classically pretty slow in general. Caching makes sense.
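The caching idea above can be sketched like this: run Singer discovery once, persist the catalog, and reuse it on later runs. `--discover` and `--catalog` are standard Singer tap flags; the file names and the direct `subprocess` invocation are illustrative (with Meltano you would typically bake the catalog into the image and point the plugin at it):

```python
import json
import subprocess
from pathlib import Path

CATALOG = Path("catalog.json")  # baked into the Docker image once stable


def load_or_discover(config_path: str = "config.json") -> dict:
    """Reuse a cached Singer catalog if present; otherwise run discovery.

    Any Singer tap supports --discover to emit a catalog and accepts the
    (possibly edited) result back via --catalog on subsequent runs.
    """
    if CATALOG.exists():
        return json.loads(CATALOG.read_text())
    result = subprocess.run(
        ["tap-salesforce", f"--config={config_path}", "--discover"],
        check=True, capture_output=True, text=True,
    )
    CATALOG.write_text(result.stdout)  # cache for the next run
    return json.loads(result.stdout)
```

The trade-off is that a cached catalog goes stale when Salesforce objects or fields change, so it needs to be regenerated whenever the schema evolves.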