# troubleshooting
d
hey everyone, my team and I are running into an issue with our `tap-salesforce` (meltanolabs variant). We recently added a new field (`Owner_Team__c`) on the salesforce side but keep getting this error when we run our meltano EL jobs for the salesforce pipeline. See Error below `WARNING Removed paths list: ['Owner_Team__c\\']` We think it might be an issue with discovery but one thing we notice is that when we dump the `catalog.json` into a local .json file we see the new field. Followed steps in this link
```
2024-06-18T20:32:11.296315Z [info     ] 	Owner_Team__c\                cmd_type=extractor name=tap-salesforce run_id=965dedeb-d6c1-4347-9c3c-4372848192d4 state_id=2024-06-18T202910--tap-salesforce--target-duckdb stdio=stderr
2024-06-18T20:32:11.296491Z [info     ] WARNING Removed paths list: ['Owner_Team__c\\'] cmd_type=extractor name=tap-salesforce run_id=965dedeb-d6c1-4347-9c3c-4372848192d4 state_id=2024-06-18T202910--tap-salesforce--target-duckdb stdio=stderr
2024-06-18T20:32:11.297983Z [info     ] WARNING Removed 1 paths during transforms: cmd_type=extractor name=tap-salesforce run_id=965dedeb-d6c1-4347-9c3c-4372848192d4 state_id=2024-06-18T202910--tap-salesforce--target-duckdb stdio=stderr
2024-06-18T20:32:11.298118Z [info     ] 	Owner_Team__c\                cmd_type=extractor name=tap-salesforce run_id=965dedeb-d6c1-4347-9c3c-4372848192d4 state_id=2024-06-18T202910--tap-salesforce--target-duckdb stdio=stderr
2024-06-18T20:32:11.298240Z [info     ] WARNING Removed paths list: ['Owner_Team__c\\'] cmd_type=extractor name=tap-salesforce run_id=965dedeb-d6c1-4347-9c3c-4372848192d4 state_id=2024-06-18T202910--tap-salesforce--target-duckdb stdio=stderr
2024-06-18T20:32:11.299635Z [info     ] WARNING Removed 1 paths during transforms: cmd_type=extractor name=tap-salesforce run_id=965dedeb-d6c1-4347-9c3c-4372848192d4 state_id=2024-06-18T202910--tap-salesforce--target-duckdb stdio=stderr
```
e
Can you try re-installing the tap with `meltano install --clean`?
d
i will try that and get back to you, thanks Edgar
@Edgar Ramírez (Arch.dev) i'm still running into the issue after running that command
e
What does your `meltano.yml` look like?
d
```yaml
version: 1
send_anonymous_usage_stats: false
project_id: 7d46dd40-62fc-40ce-9425-900b7dc1970a
include_paths:
- ./config/**/*.yml
plugins:
  loaders:
  - name: target-duckdb
    namespace: target_duckdb
    pip_url: target-duckdb
    executable: target-duckdb
  utilities:
  - name: dbt-redshift
    variant: dbt-labs
    pip_url: dbt-core==1.7.13 dbt-redshift==1.7.7 pytz==2021.1 git+https://github.com/meltano/dbt-ext.git@main
    settings:
    - name: target_schema
      label: Target Schema
      env: DBT_TARGET_REDSHIFT_SCHEMA
      value: ds
    - name: target_schema_prefix
      label: Target Schema PREFIX
      env: DBT_TARGET_SCHEMA_PREFIX
      value: ${USER_PREFIX}
    config:
      target: redshift
  - name: airflow
    variant: apache
    pip_url: git+https://github.com/meltano/airflow-ext.git@f763fd788b2d10c57f25132adc635583a85a7c05 apache-airflow==2.6.3 --constraint https://raw.githubusercontent.com/apache/airflow/constraints-2.6.3/constraints-${MELTANO__PYTHON_VERSION}.txt apache-airflow-providers-postgres apache-airflow-providers-amazon apache-airflow-providers-slack pandas boto3 requests
    settings:
    - name: core.executor
      label: Core Executor
      value: LocalExecutor
      env: AIRFLOW__CORE__EXECUTOR
    - name: core.dags_folder
      label: Dags Folder
      value: $MELTANO_PROJECT_ROOT/orchestrate/dags
      env: AIRFLOW__CORE__DAGS_FOLDER
    - name: database.sql_alchemy_conn
      label: Database SQL Alchemy Connection
      value: ${AIRFLOW__DATABASE__SQL_ALCHEMY_CONN}
      env: AIRFLOW__DATABASE__SQL_ALCHEMY_CONN
    - name: smtp_host
      label: SMTP Host
      value: ${AIRFLOW__SMTP__SMTP_HOST}
      env: AIRFLOW__SMTP__SMTP_HOST
    - name: smtp_starttls
      label: SMTP Starttls
      value: ${AIRFLOW__SMTP__SMTP_STARTTLS}
      env: AIRFLOW__SMTP__SMTP_STARTTLS
    - name: smtp_ssl
      label: SMTP SSL
      value: ${AIRFLOW__SMTP__SMTP_SSL}
      env: AIRFLOW__SMTP__SMTP_SSL
    - name: smtp_user
      label: SMTP User
      value: ${AIRFLOW__SMTP__SMTP_USER}
      env: AIRFLOW__SMTP__SMTP_USER
    - name: smtp_password
      label: SMTP Password
      value: ${AIRFLOW__SMTP__SMTP_PASSWORD}
      env: AIRFLOW__SMTP__SMTP_PASSWORD
    - name: smtp_port
      label: SMTP Port
      value: ${AIRFLOW__SMTP__SMTP_PORT}
      env: AIRFLOW__SMTP__SMTP_PORT
    - name: smtp_mail_from
      label: SMTP Mail From
      value: ${AIRFLOW__SMTP__SMTP_MAIL_FROM}
      env: AIRFLOW__SMTP__SMTP_MAIL_FROM
    - name: webserver.base_url
      label: Base URL of Website
      value: ${AIRFLOW__WEBSERVER__BASE_URL}
      env: AIRFLOW__WEBSERVER__BASE_URL
  - name: sqlfluff
    variant: sqlfluff
    pip_url: sqlfluff==2.3.5 sqlfluff-templater-dbt==2.3.5 dbt-core==1.7.0 dbt-redshift==1.7.7
    settings:
    - name: target_schema_prefix
      label: Target Schema PREFIX
      env: DBT_TARGET_SCHEMA_PREFIX
      value: ${USER_PREFIX}
    - name: target_schema
      label: Target Schema
      env: DBT_TARGET_REDSHIFT_SCHEMA
      value: ds
    - name: user
      env: DBT_REDSHIFT_USER
      value: ${DBT_REDSHIFT_USER}
    - name: host
      env: TARGET_REDSHIFT_HOST
    - name: port
      env: TARGET_REDSHIFT_PORT
    - name: password
      env: DBT_REDSHIFT_PASSWORD
      value: ${DBT_REDSHIFT_PASSWORD}
    - name: dbname
      env: TARGET_REDSHIFT_DBNAME
schedules:
- name: postgres-salesforce-to-redshift-portal
  extractor: tap-salesforce
  loader: target-redshift--salesforce
  transform: skip
  interval: '@hourly'
  start_date: 2021-09-24 17:39:28.048203
```
our specific `salesforce.yml` is this
```yaml
plugins:
  extractors:
  - name: tap-salesforce
    pip_url: -e extract/tap-salesforce
    config:
      api_type: BULK
      start_date: '2019-01-01T00:00:00Z'
    capabilities:
    - properties
    - discover
    - state
    settings:
    - name: client_id
      env: TAP_SALESFORCE_CLIENT_ID
    - name: client_secret
      env: TAP_SALESFORCE_CLIENT_SECRET
      kind: password
    - name: refresh_token
      env: TAP_SALESFORCE_REFRESH_TOKEN
      kind: password
    select:
    - Account.*
    - AccountHistory.*
```
we forked the meltano labs project
e
Gotcha. Can you try explicitly selecting the new field?
```yaml
select:
    - Account.*
    - Account.Owner_Team__c
    - AccountHistory.*
```
d
ok let me give that a go too
i tried and still encountering the same issue 😞
• `meltano install --clean`
• updated `salesforce.yml` with `Account.Owner_Team__c`
• ran my meltano elt command but still encountering issues
• warning - `WARNING Removed paths list: ['Owner_Team__c\\']`; `WARNING Removed 1 paths during transforms: cmd_type=extractor name=tap-salesforce run_id=224a0b3a-1df0-4530-86f3-9a7864d1dbb4 state_id=2024-06-20T132724--tap-salesforce--target-duckdb stdio=stderr`
• `2024-06-20T13:29:55.142462Z [info     ] 	Owner_Team__c\`
e
Ok, can you share here the catalog output:
```
meltano invoke tap-salesforce > catalog.json
```
d
ok give me a sec
e
It may have to be `meltano invoke --dump=catalog tap-salesforce > catalog.json`
d
here you go @Edgar Ramírez (Arch.dev)
catalog2.json
hey so that catalog.json has a bunch of customer data so best not to share that but i believe this part is what you are looking for ?
does the above `.json` work ?
e
Yeah. So the `Owner_Team__c` field does seem to be present there but it's the `Opportunity` stream? 🤔
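A quick way to answer "which stream is this field actually in?" without pasting the whole catalog is to scan the dump locally. This is only a sketch: it assumes the dump follows the usual Singer catalog layout (a top-level `streams` list, each entry with `tap_stream_id` and `schema.properties`), and `streams_with_field` and `sample_catalog` are made-up names for illustration.

```python
import json

def streams_with_field(catalog: dict, field: str) -> list:
    """Return the tap_stream_id of every stream whose schema declares `field`."""
    hits = []
    for stream in catalog.get("streams", []):
        properties = stream.get("schema", {}).get("properties", {})
        if field in properties:
            hits.append(stream.get("tap_stream_id"))
    return hits

# Made-up miniature catalog, for illustration only (not the real dump).
sample_catalog = json.loads("""
{"streams": [
  {"tap_stream_id": "Account",
   "schema": {"properties": {"Id": {"type": "string"}}}},
  {"tap_stream_id": "Opportunity",
   "schema": {"properties": {"Id": {"type": "string"},
                             "Owner_Team__c": {"type": ["string", "null"]}}}}
]}
""")

print(streams_with_field(sample_catalog, "Owner_Team__c"))  # ['Opportunity']
```

With a real dump you would load the file instead, e.g. `json.load(open("catalog.json"))`, and pass the result to the same function.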
d
yep it's for the opportunity stream
and yes also noticed it's in the stream
e
You may have to select that then
```yaml
select:
    - Account.*
    - AccountHistory.*
    - Opportunity.*
```
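To see why the stream name in a `select` rule matters here, the sketch below approximates the glob-style matching with Python's `fnmatch`. It is an approximation, not Meltano's actual implementation: it ignores exclusion rules (`!...`) and nested properties, and `is_selected` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

def is_selected(patterns: list, stream: str, prop: str) -> bool:
    """True if any glob pattern matches the '<stream>.<property>' entry."""
    entry = f"{stream}.{prop}"
    return any(fnmatchcase(entry, pattern) for pattern in patterns)

# Patterns from the original salesforce.yml plus the explicit Account rule.
before = ["Account.*", "Account.Owner_Team__c", "AccountHistory.*"]
# The same list with the Opportunity stream added.
after = ["Account.*", "AccountHistory.*", "Opportunity.*"]

print(is_selected(before, "Opportunity", "Owner_Team__c"))  # False
print(is_selected(after, "Opportunity", "Owner_Team__c"))   # True
```

Under this reading, `Account.Owner_Team__c` can never pick up a field that lives on `Opportunity`; only a rule that names the `Opportunity` stream can.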
d
oh i did, i only sent you a sample before
e
gotcha
d
😞
e
let's try
```yaml
plugins:
  extractors:
  - name: tap-salesforce
    schema:
      Opportunity:
        Owner_Team__c:
          type: ["string", "null"]
```
d
ok let me try
do you think this is a data type issue from the salesforce side ? maybe singer is expecting a particular length for string data type ?
e
> do you think this is a data type issue from the salesforce side ? maybe singer is expecting a particular length for string data type ?
I don't think it's anything like that. Sounds more like a caching issue. Especially if the tap generates a SCHEMA message with the field but it's still removed from the records
• https://github.com/singer-io/singer-python/blob/d6f0d2026645d7cc45b01a6116701e3564b42628/singer/transform.py#L216-L222
• https://github.com/singer-io/singer-python/blob/d6f0d2026645d7cc45b01a6116701e3564b42628/singer/transform.py#L111-L116
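For context on what "removed paths" means, here is a simplified stand-in for the filtering step, not the actual `singer-python` code: record keys that are not declared in the stream's schema get dropped and reported. One detail from the logs that may be worth checking is that the removed name is printed as `Owner_Team__c\`, with a trailing backslash (the `\\` inside the list is just how Python displays a single backslash). The record below is hypothetical and only shows that a key with such a stray character would not match the schema's `Owner_Team__c`; whether that is what happens in this pipeline is not confirmed anywhere in the thread.

```python
def drop_unknown_keys(record: dict, schema: dict):
    """Keep keys declared in schema['properties']; collect the rest as removed."""
    properties = schema.get("properties", {})
    kept, removed = {}, []
    for key, value in record.items():
        if key in properties:
            kept[key] = value
        else:
            removed.append(key)
    return kept, removed

schema = {
    "properties": {
        "Id": {"type": "string"},
        "Owner_Team__c": {"type": ["string", "null"]},
    }
}

# Hypothetical record: the key ends with one literal backslash.
record = {"Id": "006xx0000000001", "Owner_Team__c\\": "West"}

kept, removed = drop_unknown_keys(record, schema)
print(removed)  # ['Owner_Team__c\\']
print(kept)     # {'Id': '006xx0000000001'}
```

If the real record keys do carry an extra character, no amount of selecting the field or overriding its type would help, since the comparison is on the exact key name.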
d
still same issue after trying what you recommended above btw
let me take a look at that link
yeah we've actually seen that link too but unsure how to 'bust' the cache
e
Do you see a file in your local `.meltano/run/tap-salesforce` directory?
d
yep here's the json in that file
e
Ok the field is there and it seems to be selected. I'm really at a loss here 😕
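The "is it there and is it selected" check can also be done with a few lines instead of reading the JSON by eye. This sketch assumes the file in the run directory is a Singer catalog whose per-field metadata entries use a `["properties", "<field>"]` breadcrumb; `field_metadata` and the sample data are made up for illustration.

```python
def field_metadata(catalog: dict, stream_id: str, field: str):
    """Return the metadata dict for one field of one stream, or None if absent."""
    for stream in catalog.get("streams", []):
        if stream.get("tap_stream_id") != stream_id:
            continue
        for entry in stream.get("metadata", []):
            if entry.get("breadcrumb") == ["properties", field]:
                return entry.get("metadata", {})
    return None

# Made-up miniature catalog, for illustration only.
sample_catalog = {
    "streams": [
        {
            "tap_stream_id": "Opportunity",
            "schema": {"properties": {"Owner_Team__c": {"type": ["string", "null"]}}},
            "metadata": [
                {"breadcrumb": [], "metadata": {"selected": True}},
                {
                    "breadcrumb": ["properties", "Owner_Team__c"],
                    "metadata": {"inclusion": "available", "selected": True},
                },
            ],
        }
    ]
}

meta = field_metadata(sample_catalog, "Opportunity", "Owner_Team__c")
print(meta)
```

A result of `None` would mean the field has no metadata entry at all, which is a different situation from an entry that exists but is not selected.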
d
i know 😞