# getting-started
l
Hi all, I've just started using Meltano and testing it, but I'm at a loss. I'm using https://github.com/MeltanoLabs/tap-salesforce and have been able to set it up and get it to run, but not the way I expect, and I can't figure out why. Please see my meltano.yml for this tap below. Using it as is, without a catalog, results in an empty run, as if nothing is selected. If I point it to a catalog by uncommenting the catalog line, it tells me the tap doesn't have the capabilities, which is why I added them manually. Running a simple el still gives me nothing, and nothing is selected from the catalog. Only when I point it at a different catalog I made, named filtered_tap-salesforce.catalog.json, and run el does it give me output, but not as expected, because it outputs both Account and Opportunity. The difference between the two catalogs is that the filtered one only includes Account and Opportunity and has "selected": true added in the metadata for each property, whereas the other catalog doesn't. I'm very confused about why only this setup lets me extract data. My understanding is that I should be able to use the select block and have Meltano deal with discovery, but it doesn't select anything if I run it this way. And even when I use the overall catalog, why does it ignore the select block and return nothing? I'd like to be able to run the tap without having to define "selected": true for all 1000+ objects and many more properties, and to be able to handle new fields. Thanks!
# --- Salesforce ---
  - name: tap-salesforce
    namespace: tap_salesforce_meltanolabs
    variant: meltanolabs
    pip_url: git+https://github.com/MeltanoLabs/tap-salesforce.git@v1.9.0
    capabilities:
    - catalog
    - discover
    - state
    #catalog: catalogs/tap-salesforce.catalog.json
    config:
      username: ${SF_USERNAME}
      password: ${SF_PASSWORD}
      security_token: ${SF_SECURITY_TOKEN}
      api_type: REST
      start_date: '2025-07-01T00:00:00Z'
      select_fields_by_default: true
      #streams_to_discover:
      #- Account
      #- Opportunity
      select:
      #- Account.*
      - Opportunity.*
m
If you want all fields and objects, you don't need select_fields_by_default; just use:
select:
  - '*.*'
There will be some objects that Salesforce includes in the metadata catalog that won't properly sync via the tap, so you can exclude them manually, or just exclude them if you don't want them:
select:
  - '*.*'
  - '!IgnoredObject.*'
  - '!IncludedObject.IgnoredField'
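A rough sketch of how patterns like these could resolve, assuming glob-style matching where a leading ! marks an exclusion and exclusions win over inclusions. Meltano's own resolution may differ in detail, and is_selected is a hypothetical helper, not part of Meltano:

```python
from fnmatch import fnmatchcase


def is_selected(entry: str, patterns: list[str]) -> bool:
    """Return True if "Stream.field" matches an include pattern and no '!' exclusion."""
    includes = [p for p in patterns if not p.startswith("!")]
    excludes = [p[1:] for p in patterns if p.startswith("!")]
    # Assumption: an exclusion always overrides an inclusion.
    if any(fnmatchcase(entry, p) for p in excludes):
        return False
    return any(fnmatchcase(entry, p) for p in includes)


# Same patterns as in the select block above.
patterns = ["*.*", "!IgnoredObject.*", "!IncludedObject.IgnoredField"]

for name in ("Account.Name", "IgnoredObject.Id", "IncludedObject.IgnoredField"):
    print(name, is_selected(name, patterns))
```

With only Opportunity.* in the list, anything under Account would come back unselected in this sketch.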
Also keep in mind as you're testing that the tap does incremental syncs, so if you change things and re-run without clearing your state, it might not pull in records simply because no new records are available between your last run and the current one. You can use --full-refresh to have it ignore the previous run state while you're testing.
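To illustrate why a re-run can come back empty, here is a minimal sketch of bookmark-based filtering. It is not the tap's actual code: records_to_sync, the sample records, and the use of SystemModstamp as the replication key are assumptions for illustration only.

```python
from datetime import datetime


def records_to_sync(records, bookmark, replication_key="SystemModstamp"):
    """Keep only records newer than the saved bookmark.

    A bookmark of None stands in for a run that ignores previous state.
    """
    if bookmark is None:
        return list(records)
    cutoff = datetime.fromisoformat(bookmark)
    return [r for r in records if datetime.fromisoformat(r[replication_key]) > cutoff]


# Hypothetical records and a bookmark saved by an earlier run.
records = [
    {"Id": "001", "SystemModstamp": "2025-07-10T00:00:00+00:00"},
    {"Id": "002", "SystemModstamp": "2025-07-16T00:00:00+00:00"},
]
saved_bookmark = "2025-07-16T12:00:00+00:00"

print(len(records_to_sync(records, saved_bookmark)))  # nothing newer than the bookmark
print(len(records_to_sync(records, None)))            # everything, as with a full refresh
```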
l
Hi Mark, Thank you for your reply! Sorry for the confusion and to be clear I don't want all objects, I want to be able to simply select the tables using the below
select:
- Account.*
- Opportunity.*
And not having to create a custom catalog. I have run it where it selected the complete catalog, but my problem is that I want this select part to work and can't figure out why it doesn't for me. I've been using meltano el to make sure that state doesn't play a role, or have I also not understood this right?
m
I'd try it with just the essential fields (the pip URL is only needed if you want to pin the version):
  - name: tap-salesforce
    variant: meltanolabs
    pip_url: git+https://github.com/MeltanoLabs/tap-salesforce.git@v1.9.0
    config:
      username: ${SF_USERNAME}
      password: ${SF_PASSWORD}
      security_token: ${SF_SECURITY_TOKEN}
      start_date: '2025-07-01T00:00:00Z'
    select:
    - Account.*
    - Opportunity.*
And run it with:
meltano run --full-refresh tap-salesforce target-[...]
Oh I just noticed... select isn't indented correctly 😅
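For what it's worth, this kind of slip could be caught with a small check on the parsed plugin definition. A minimal sketch, assuming the YAML has already been loaded into a dict, and assuming select, metadata and schema belong at the plugin level next to config. misplaced_extras is a hypothetical helper, not a Meltano feature:

```python
# Keys assumed to belong beside `config`, not inside it.
PLUGIN_LEVEL_KEYS = {"select", "metadata", "schema"}


def misplaced_extras(plugin: dict) -> list[str]:
    """Return plugin-level keys that were nested under `config` by mistake."""
    config = plugin.get("config") or {}
    return sorted(key for key in config if key in PLUGIN_LEVEL_KEYS)


# Shape of the first YAML in this thread: `select` sits inside `config`.
plugin = {
    "name": "tap-salesforce",
    "variant": "meltanolabs",
    "config": {
        "api_type": "REST",
        "select_fields_by_default": True,
        "select": ["Opportunity.*"],
    },
}

# Corrected shape: `select` is a sibling of `config`.
fixed_plugin = {
    "name": "tap-salesforce",
    "variant": "meltanolabs",
    "config": {"api_type": "REST", "select_fields_by_default": True},
    "select": ["Opportunity.*"],
}

print(misplaced_extras(plugin))
print(misplaced_extras(fixed_plugin))
```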
l
Thank you for noticing, that was a rookie mistake haha. Even after I changed this and ran it with the settings below, it's still not giving me anything; see the output below. Only when I run meltano --environment=prod invoke tap-salesforce --discover with capabilities: defined and then look at select tap-salesforce --list does it give me a big list of all objects and properties, but all selected. So when I run el or run, it then ignores the select: again and starts syncing all 1100 objects. I'm wondering what is going on, as from looking around here and in the docs, it should work with the config below. I'm very much open to suggestions on next steps, as I really hope there is a more dynamic way of selecting than a manual catalog. PS: the only dev difference is an override for the date, nothing else.
# --- Salesforce ---
  - name: tap-salesforce
    namespace: tap_salesforce_meltanolabs
    variant: meltanolabs
    pip_url: git+https://github.com/MeltanoLabs/tap-salesforce.git@v1.9.0
    #capabilities:
    #- catalog
    #- discover
    #- state
    #catalog: catalogs/tap-salesforce.catalog.json
    config:
      username: ${SF_USERNAME}
      password: ${SF_PASSWORD}
      security_token: ${SF_SECURITY_TOKEN}
      # Use BULK for large historical loads, REST for smaller incrementals
      api_type: REST
      start_date: '2025-07-01T00:00:00Z'
      select_fields_by_default: true
      #streams_to_discover:
      #- Account
      #- Opportunity
    select:
      #- Account.*
      - Opportunity.*
(.wsl-venv) lennartklomp@UK-L-LKLOMP:/mnt/c/Users/LennartKlomp/data-platform$ meltano --environment=dev el tap-salesforce target-jsonl
2025-07-17T08:20:46.519741Z [info     ] Environment 'dev' is active   
2025-07-17T08:20:47.841309Z [info     ] Running extract & load...      name=meltano run_id=fee0a499-7e2d-41f4-a725-16395a453b70 state_id=2025-07-17T082047--tap-salesforce--target-jsonl
2025-07-17T08:20:51.818638Z [info     ] INFO Parsed start date '2025-07-15T00:00:00+00:00' from value '2025-07-15' cmd_type=extractor name=tap-salesforce run_id=fee0a499-7e2d-41f4-a725-16395a453b70 state_id=2025-07-17T082047--tap-salesforce--target-jsonl stdio=stderr
2025-07-17T08:20:52.643465Z [info     ] Extract & load complete!       name=meltano run_id=fee0a499-7e2d-41f4-a725-16395a453b70 state_id=2025-07-17T082047--tap-salesforce--target-jsonl
2025-07-17T08:20:52.644263Z [info     ] Transformation skipped.        name=meltano run_id=fee0a499-7e2d-41f4-a725-16395a453b70 state_id=2025-07-17T082047--tap-salesforce--target-jsonl
(.wsl-venv) lennartklomp@UK-L-LKLOMP:/mnt/c/Users/LennartKlomp/data-platform$ meltano run --full-refresh tap-salesforce target-jsonl
2025-07-17T08:22:16.446775Z [info     ] Environment 'dev' is active   
2025-07-17T08:22:17.483212Z [info     ] Marked stale run that started at 2025-07-16 15:35:05.243537+00:00 as failed: No heartbeat recorded for 5 minutes. The process was likely killed unceremoniously.
2025-07-17T08:22:17.809863Z [warning  ] A catalog file was found, but it will be ignored as the extractor does not advertise the `catalog` or `properties` capability
2025-07-17T08:22:21.213447Z [info     ] INFO Parsed start date '2025-07-15T00:00:00+00:00' from value '2025-07-15' cmd_type=elb consumer=False job_name=dev:tap-salesforce-to-target-jsonl name=tap-salesforce producer=True run_id=4241f369-7795-4c6e-a84b-a1cac3a26fa7 stdio=stderr string_id=tap-salesforce
2025-07-17T08:22:22.054044Z [info     ] Block run completed.           block_type=ExtractLoadBlocks err=None set_number=0 success=True
(.wsl-venv) lennartklomp@UK-L-LKLOMP:/mnt/c/Users/LennartKlomp/data-platform$ meltano --environment=dev select tap-salesforce --list 
2025-07-17T09:11:07.908188Z [info     ] Environment 'dev' is active   
2025-07-17T09:11:08.997655Z [warning  ] A catalog file was found, but it will be ignored as the extractor does not advertise the `catalog` or `properties` capability
Legend:
        selected
        excluded
        automatic
        unsupported

Enabled patterns:
        Opportunity.*

Selected attributes:
m
Well, I'm out of ideas, other than that I've used this config before and it works just fine. It's curious, though, that your log says INFO Parsed start date '2025-07-15T00:00:00+00:00' from value '2025-07-15' when your start date is not that. If you run meltano config tap-salesforce list, do you see all the expected settings from your YML file, or is it pulling some from elsewhere (DB or environment)?
l
Interesting. I really hope someone can help me understand the select behavior and how I can find out what it should be doing. I might also try a complete clean reinstall of everything to rule anything else out. The date difference is because of the dev env override, so yes, this is as expected.
What were the exact steps you took and settings you used to get this running with select:? I'd like to test the same thing. Did you run any discovery, or did it work without running that?
m
I run Meltano in a Docker container so I can just spin up a copy, drop in the YML (I have our secrets in a .env that just uses the tap default names), and then run it from the CLI. I don't use a central DB so containers will make their own local meltano.db for state.
Also my work PC is cursed with Windows and trying to run Meltano natively usually runs into tons of weird problems, so Docker is also a great way to get consistent behavior between my local tests and running it on my infrastructure 😅
e
maybe
meltano --environment=dev select tap-salesforce --list --all
reveals a bit more of the stream name patterns
l
@Edgar Ramírez (Arch.dev) after I did a clean install it did give me a list, see below. When I then use meltano --environment=dev run tap-salesforce target-jsonl it again ignores select: and starts syncing all tables. I think it is because of:
Enabled patterns:
	Opportunity.*
	*.*
I ran meltano --environment=dev select tap-salesforce --exclude "*.*", resulting in the attached new list. It has excluded everything as expected, and I thought this would maybe run the actual Opportunity.*, but meltano --environment=dev run tap-salesforce target-jsonl gave me a completely empty run this time, again ignoring my select:. I'm at a complete loss here! I also noticed others don't seem to use select_fields_by_default: true in config, but if I leave it out I get the error below.
(.wsl-venv) lennartklomp@UK-L-LKLOMP:/mnt/c/Users/LennartKlomp/data-platform$ meltano --environment=dev el tap-salesforce target-jsonl 
2025-07-18T09:50:03.415432Z [info     ] Environment 'dev' is active   
2025-07-18T09:50:04.452826Z [info     ] Running extract & load...      name=meltano run_id=493954d4-43ff-479b-b99a-c28fffaf7a97 state_id=2025-07-18T095004--tap-salesforce--target-jsonl
2025-07-18T09:50:07.439879Z [info     ] CRITICAL Config is missing required keys: ['select_fields_by_default']
Another experiment was trying it with and without the block below, as I don't see it anywhere else either.
capabilities:
    - catalog
    - discover
    - state
Running it without these capabilities gives me an empty run like above, but running it with them does select only Opportunity, though not all fields. Then running it again with
select:
    - Account.*
    #- Opportunity.*
It does the same run on Opportunity again, thus ignoring my select: again. meltano --environment=dev select tap-salesforce --list shows why, see 'current_selected_fields'. I've made a mess of the selection here with my commands, but I don't know how to clean this up again. More crucially, this is still not ideal behavior for prod, as there I can't manually change the selection. I would love to hear your thoughts on how to get select: to work, or how to work with this limitation in prod. I guess I would need to keep a filtered catalog up to date, which is not ideal if fields are changed or added (which happens a lot in Salesforce). Is there any other dynamic way to deal with this in a catalog, for example "selected": true for the whole table rather than for every separate property? Thanks again!
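If a filtered catalog does end up being the fallback, it could be regenerated from a fresh discovery output instead of edited by hand, so new fields get picked up. A minimal sketch, assuming the usual Singer catalog layout (a top-level streams list, each stream carrying a metadata list of breadcrumb/metadata entries). build_filtered_catalog and the sample data are hypothetical and not part of Meltano or the tap:

```python
import copy


def build_filtered_catalog(catalog: dict, wanted: set[str]) -> dict:
    """Keep only the wanted streams and mark every metadata entry as selected."""
    filtered = {"streams": []}
    for stream in catalog.get("streams", []):
        if stream.get("stream") not in wanted:
            continue
        stream = copy.deepcopy(stream)  # leave the discovery output untouched
        for entry in stream.get("metadata", []):
            entry.setdefault("metadata", {})["selected"] = True
        filtered["streams"].append(stream)
    return filtered


# Tiny stand-in for a discovery catalog.
full_catalog = {
    "streams": [
        {
            "stream": "Account",
            "tap_stream_id": "Account",
            "metadata": [
                {"breadcrumb": [], "metadata": {}},
                {"breadcrumb": ["properties", "Id"], "metadata": {}},
            ],
        },
        {
            "stream": "Contact",
            "tap_stream_id": "Contact",
            "metadata": [{"breadcrumb": [], "metadata": {}}],
        },
    ]
}

filtered = build_filtered_catalog(full_catalog, {"Account", "Opportunity"})
print([s["stream"] for s in filtered["streams"]])
```

This mirrors the hand-made filtered catalog described at the top of the thread, where each property carries "selected": true.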
Managed to solve it, was a mistake in the yml on my side. Thanks all
m
Glad you sorted it out! This made me think of this meme:
e
I'm curious what the yml mistake was. I'm sure Meltano could do a better job of detecting it 🙂