# troubleshooting
j
tap-salesforce -> target-snowflake fails with `KeyError: 'type'` on this line: https://github.com/transferwise/pipelinewise-target-snowflake/blob/master/target_snowflake/stream_utils.py#L69

Interestingly, it passes when tapping the test (sandbox) Salesforce, but it fails with this error when running against the production Salesforce instance. It looks like some data fields in our production env cause this issue. I'm trying to switch from the default (transferwise) variant to meltanolabs (based on the SDK). Can we use meltanolabs (v0.1.0) in production?
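For reference, the failing line indexes the property's `type` key directly, so any schema property emitted without `type` raises `KeyError`. A minimal sketch of the failure mode and a tolerant guard; the helper name and sample schemas are illustrative, not the actual pipelinewise code:

```python
# Illustrative sketch, not the actual pipelinewise-target-snowflake code.

def is_datetime_property(property_schema: dict) -> bool:
    """Guess whether a JSON Schema property describes a datetime.

    Indexing property_schema['type'] directly raises KeyError when a tap
    emits a property without a 'type' key; .get() makes that non-fatal.
    """
    prop_type = property_schema.get("type", [])  # tolerate a missing 'type'
    if isinstance(prop_type, str):
        prop_type = [prop_type]
    return "string" in prop_type and property_schema.get("format") == "date-time"


# A well-formed property:
ok = {"type": ["null", "string"], "format": "date-time"}
# A property like the ones production Salesforce appears to emit:
broken = {"format": "date-time"}

print(is_datetime_property(ok))      # True
print(is_datetime_property(broken))  # False, instead of KeyError
```

With the guard, a field lacking `type` is simply not treated as a datetime rather than crashing the whole load.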
Hm, so meltanolabs target-snowflake keeps failing with this error:
```
2023-03-06T08:03:44.213924Z [info     ] sqlalchemy.exc.ProgrammingError: (snowflake.connector.errors.ProgrammingError) 002002 (42710): 01aac2a3-0001-008b-0001-aed20033302a: SQL compilation error: cmd_type=elb consumer=True name=target-snowflake-sfdc producer=False stdio=stderr string_id=target-snowflake-sfdc
2023-03-06T08:03:44.214106Z [info     ] Object 'SFDC_INPUT_STAGE' already exists. cmd_type=elb consumer=True name=target-snowflake-sfdc producer=False stdio=stderr string_id=target-snowflake-sfdc
2023-03-06T08:03:44.214274Z [info     ] [SQL: CREATE SCHEMA "SFDC_INPUT_STAGE"] cmd_type=elb consumer=True name=target-snowflake-sfdc producer=False stdio=stderr string_id=target-snowflake-sfdc
2023-03-06T08:03:44.214440Z [info     ] (Background on this error at: https://sqlalche.me/e/14/f405) cmd_type=elb consumer=True name=target-snowflake-sfdc producer=False stdio=stderr string_id=target-snowflake-sfdc
```
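The failing statement is a plain `CREATE SCHEMA`, which Snowflake rejects with error 002002 when the schema already exists. Snowflake's standard `IF NOT EXISTS` clause makes the DDL idempotent; a small sketch of the statement the target could emit instead (plain string building here, purely to illustrate the DDL):

```python
def create_schema_ddl(schema_name: str, if_not_exists: bool = True) -> str:
    """Build a CREATE SCHEMA statement; IF NOT EXISTS makes it idempotent."""
    clause = "IF NOT EXISTS " if if_not_exists else ""
    # Quote the identifier and upper-case it, matching Snowflake conventions.
    return f'CREATE SCHEMA {clause}"{schema_name.upper()}"'


print(create_schema_ddl("sfdc_input_stage"))
# CREATE SCHEMA IF NOT EXISTS "SFDC_INPUT_STAGE"
```

Whether the target should emit this form (or check the information schema first) is a design choice for the maintainers; this only shows why the existing `CREATE SCHEMA "SFDC_INPUT_STAGE"` fails on re-runs.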
I dropped the schema manually and re-executed `meltano run`, but it is still failing.
With this patch, I can confirm it fully works with the production Salesforce instance.
s
Hey @jan_soubusta, on the transferwise variants, merges typically take some time (from my experience). Could you create the same fix on the meltanolabs variant? There we can merge within hours 😄 @visch @pat_nadolny maybe you can shine some light on the problem.
j
I can fix the meltanolabs variant, but that variant does not work for me, as I described a few comments above (CREATE SCHEMA fails).
a
Hi, @jan_soubusta. For context here, can you provide a bit more info on what, if anything, the tap is generating for `type` on this property? The error implies that there is a missing type definition on the property. I agree that it is better if we can make it non-fatal, but I'm curious if you could provide info on how/if the type is being declared by tap-salesforce. Second question: which tap-salesforce is this one?
j
```yaml
- name: tap-salesforce
  variant: meltanolabs
  pip_url: git+https://github.com/meltanolabs/tap-salesforce.git
  config:
    api_type: "BULK"
    select_fields_by_default: false
    start_date: "2021-01-01T00:00:00Z"
```
My guess is that tap-salesforce almost always generates `type`, but in rare cases it is missing. Most likely our RevOps team (they are really inventive) created something rare in our production Salesforce which, when crawled by tap-salesforce, does not contain `type`. My understanding of the affected code is that with my fix, some fields may not be converted to datetime format, so they are loaded into the target as VARCHAR. Because I almost always post-process the loaded data with dbt, I can CAST such a column. Once we finalize our solution, if I find the affected field, I will share it here. For now, I do not know which field is affected.
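The downstream CAST mentioned above can be sketched as follows. This builds the kind of expression a dbt model might use to recover a datetime from a column that landed as VARCHAR; the table and column names are hypothetical, and `TRY_CAST` is Snowflake's non-failing cast:

```python
def cast_to_timestamp_sql(table: str, column: str) -> str:
    """Build a Snowflake SELECT casting a VARCHAR column back to TIMESTAMP_TZ.

    TRY_CAST yields NULL instead of erroring on unparseable values.
    Table/column names here are illustrative only.
    """
    return (
        f"SELECT TRY_CAST({column} AS TIMESTAMP_TZ) AS {column}\n"
        f"FROM {table}"
    )


print(cast_to_timestamp_sql("sfdc_input_stage.opportunity", "closedate"))
```

In practice this would live directly in a dbt model's SQL; the Python wrapper is only for illustration.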
a
Thank you!
j
Btw, an incremental tap-salesforce run takes 6 minutes in my pipeline, of which discovery (at the beginning) takes over 4 minutes. Once the list of entities to be tapped is stable, may I add a cached file (the result of discovery) to the Docker image, so discovery is not triggered? Is there any other solution?
a
> may I add a cached file (the result of the discovery) into the Docker image, so discovery is not triggered?

Yep!

> discovery (in the beginning) takes over 4 minutes

Discovery on Salesforce is classically pretty slow in general. Caching makes sense.
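The caching idea above can be sketched like this: run Singer discovery once, persist the catalog, and reuse it on later runs. `--discover` and `--catalog` are standard Singer tap flags; the file names and the direct `subprocess` invocation are illustrative (with Meltano you would typically bake the catalog into the image and point the plugin at it):

```python
import json
import subprocess
from pathlib import Path

CATALOG = Path("catalog.json")  # baked into the Docker image once stable


def load_or_discover(config_path: str = "config.json") -> dict:
    """Reuse a cached Singer catalog if present; otherwise run discovery.

    Any Singer tap supports --discover to emit a catalog and accepts the
    (possibly edited) result back via --catalog on subsequent runs.
    """
    if CATALOG.exists():
        return json.loads(CATALOG.read_text())
    result = subprocess.run(
        ["tap-salesforce", f"--config={config_path}", "--discover"],
        check=True, capture_output=True, text=True,
    )
    CATALOG.write_text(result.stdout)  # cache for the next run
    return json.loads(result.stdout)
```

The trade-off is that a cached catalog goes stale when Salesforce objects or fields change, so it needs to be regenerated whenever the schema evolves.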