tap-salesforce strikes back. I have: ``` - name: ...
# troubleshooting
j
tap-salesforce strikes back. I have:
Copy code
- name: tap-salesforce
    variant: meltanolabs
    pip_url: git+<https://github.com/meltanolabs/tap-salesforce.git>
    select:
    - LeadHistory.LeadId
    - LeadHistory.CreatedDate
    - LeadHistory.Field
    - LeadHistory.NewValue
    - LeadHistory.OldValue
Columns corresponding to
NewValue
and
OldValue
are not created in the target table. Loading to Snowflake.
meltano select tap-salesforce --list
shows these two properties as available and selected. Any clue what am I doing wrong or how to debug it?
v
to debug I"d run
meltano invoke tap-salesforce
and see if there is any data with oldvalue / newvalue first
j
So: • LeadHistory, ContactHistory - New/OldValue fields are tapped OK, target-jsonl proofs it, but they are not loaded into Snowflake • OpportunityHIstory - New/OldValue fields are NOT tapped at all,
meltano select
does not show them as selected. Not sure if they should exist.
v
LeadHistory, ContactHistory - New/OldValue fields are tapped OK, target-jsonl proofs it, but they are not loaded into Snowflake
Hmm, does the target have a bug or something? Which snowflake target. I'm not a snowflake expert but maybe someone else will be able to see it or maybe I'll get lucky and see it. Another thing you could do just to rule out any caching or something odd is run
meltano invoke tap-salesforce > out
(with leadhistory new/old values, verify they are in this output file)
cat out | meltano invoke target-snowflake
If you still don't get the data into snowflake that you know for sure it's the target causing the issue
j
Thanks! Will try tomorrow, it's quite late in Prague šŸ˜‰
Strange it is.
out
contains:
Copy code
{
  "type": "RECORD",
  "stream": "LeadHistory",
  "record": {
    "Id": "0178J00003gKvuIQAS",
    "LeadId": "00Q8J000002GdFZUA0",
    "CreatedDate": "2023-01-01T19:34:24.000000Z",
    "Field": "LeadSource_Original_Text__c",
    "OldValue": null,
    "NewValue": "GD Cloud Trial Registration"
  },
  "version": 1680246159725,
  "time_extracted": "2023-03-31T07:02:39.734217Z"
}
So Old/NewValue are there. Then I drop
leadhistory
table and run:
Copy code
cat out | meltano invoke target-postgres
debug:
Copy code
time=2023-03-31 10:13:50 name=target_postgres level=INFO message=Table '"leadhistory"' does not exist. Creating... CREATE TABLE IF NOT EXISTS sfdc_input_stage."leadhistory" ("createddate" timestamp without time zone, "field" character varying, "id" character varying, "leadid" character varying, "_sdc_batched_at" timestamp without time zone, "_sdc_deleted_at" character varying, "_sdc_extracted_at" timestamp without time zone, PRIMARY KEY ("id"))
Columns Old/NewValue are not created and obviously not loaded.
Bumping versions of everything, recreating venv, recreating docker volumes/containers.
Did not help ;-( Not sure what more I can do....
OK, I checked the context of the out file (result of tap invoke):
Copy code
{
    "type": "SCHEMA",
    "stream": "LeadHistory",
    "schema": {
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "Id": {
          "type": "string"
        },
        "LeadId": {
          "type": [
            "null",
            "string"
          ]
        },
        "CreatedDate": {
          "anyOf": [
            {
              "type": "string",
              "format": "date-time"
            },
            {
              "type": [
                "string",
                "null"
              ]
            }
          ]
        },
        "Field": {
          "type": [
            "null",
            "string"
          ]
        },
        "OldValue": {},
        "NewValue": {}
      }
    },
  },
The schema for OldValues and NewValue does not contain a definition of type! Is it a bug in the tap?
I forked the tap. Trying to find out why this is happening...
v
Glad you found the root cause! Yeah it's a tap issue, I didn't think about the schema just being wrong! Could fake this with meltano's schema override if you just need a fix, but the tap really should do this right as you say!
j
Well, not sure. I dived deeper and ended up in Salesforce SDK.
Copy code
def field_to_property_schema(field, mdata):
    ......
    elif sf_type in LOOSE_TYPES:
        return property_schema, mdata  # No type = all types
Old/NewValue properties are of LOOSE_TYPE, they are
anyType
. The tap calls this method:
property_schema, mdata = salesforce.field_to_property_schema(field, mdata)
So the result is produced by the Salesforce SDK.
These properties seem to be anyType by design
v
Coerce it to a string or an object maybe? That's the best idea I have! General fix is probably that, specefic fix for you is probably just meltano schema override the schema with strings and call it good?
j
I am going to create a pull request, and add something more there. I am going to inject "object" type and test it with Postgres and SNowflake
Well, too simple šŸ˜‰
Copy code
NewValue: 13000 does not match {'type': ['object', 'null']}
Have to list all possible types
Eh, it does not work like this
It picks the last type from the list and applies it on target - now my Old/NewValues columns exist but they are BOOLEAN šŸ˜‰
Hm, not sure how to do this, going to ask in the main channel
b
hey @jan_soubusta did you ever get this resolved? My teammate is working with tap-salesforce adn running into the same issue
j
Well, yes and no. I created a pull request handling anyType as strings, but it was not accepted, because it could break something else (I agree). Only the speed-up of the discovery was merged (a new property streams_to_discover introduced). Instead, we agreed on another solution - override schema, define exact types there. Here is the beginning of Salesforce definition in our meltano.yml:
Copy code
plugins:
  extractors:
  - name: tap-salesforce
    variant: meltanolabs
    pip_url: git+<https://github.com/MeltanoLabs/tap-salesforce.git>
    config:
      api_type: BULK
      is_sandbox: ${SALESFORCE_SANDBOX}
      select_fields_by_default: false
      start_date: "2022-01-01T00:00:00Z"
      streams_to_discover:
        - Account
        - Lead
        - LeadHistory
        - Contact
        - ContactHistory
        - Opportunity
        - OpportunityHistory
        - OpportunityFieldHistory
        - Campaign
        - CampaignMember
        - OpportunityContactRole
        - User
        - UserRole
        - Event
        - Task
    schema:
      LeadHistory:
        NewValue:
          type: [ "string", "null" ]
        OldValue:
          type: [ "string", "null" ]
      ContactHistory:
        NewValue:
          type: [ "string", "null" ]
        OldValue:
          type: [ "string", "null" ]
      OpportunityFieldHistory:
        NewValue:
          type: [ "string", "null" ]
        OldValue:
          type: [ "string", "null" ]
b
thanks @jan_soubusta i actually found your PR and added those changes to our forked target-snowflake, which did seem to fix the issue, but I have not done enough testing to see if it will impact anything else. I went ahead with this solution you posted and that seems to get us through the issue. Thank you!!!