Rafał
06/05/2025, 10:36 AMdef discover_streams
unclear, if it's called even if --catalog is passed. I'd expect it to be called only without the catalog or with --discover, when a discovery is actually neededEdgar Ramírez (Arch.dev)
06/05/2025, 1:59 PMregister_streams
.Rafał
06/05/2025, 2:40 PMEdgar Ramírez (Arch.dev)
06/05/2025, 2:44 PMdiscover_streams
when --catalog
is passed that you're noticing? We have to register the streams at some point, either to generate the catalog for --discover
or to sync them.Edgar Ramírez (Arch.dev)
06/05/2025, 2:44 PMRafał
06/05/2025, 2:48 PMregister_streams
suggest a side effect. Is it to pass its streams somewhere?
I thought the tap would do actual discovery when either in discovery mode or sync mode without input catalog, but I find it's called in sync mode with catalog, in which case the streams are not really discovered, but instead tap must return it.
I thought in this case SDK would infer them from catalogEdgar Ramírez (Arch.dev)
06/05/2025, 3:03 PMdiscover_streams
has to run regardless of whether the tap is running in DISCOVERY or SYNC mode.
But I'm curious if there's any unexpected behavior or bug that you're running into. Perhaps you're expecting streams that are not in the catalog to not be instantiated?Rafał
06/05/2025, 4:30 PMsync_all
(either directly or via -> _set_compatible_replication_methods
) -> streams
-> load_streams
-> discover_streams
I don't see any conditions in that call hierarchy that would avoid itEdgar Ramírez (Arch.dev)
06/05/2025, 5:59 PMRafał
06/05/2025, 6:05 PMReuben (Matatika)
06/05/2025, 6:12 PMdiscover_streams
isn't actually responsible for running the discovery process, despite the method name (IIRC that falls to discover_catalog_entries
). I do agree it is a bit confusing.Rafał
06/05/2025, 6:14 PMReuben (Matatika)
06/05/2025, 6:49 PMRafał
06/05/2025, 6:50 PMReuben (Matatika)
06/05/2025, 6:52 PMstreams.py
). I guess you could think of it as a reflection of the upstream source.Rafał
06/05/2025, 6:54 PMRafał
06/05/2025, 6:54 PMReuben (Matatika)
06/05/2025, 7:06 PMExcelStream
(or whatever client stream class you have) for each sheet in the file. I imagine it would be very similar to our tap-google-sheets
implementation: https://github.com/Matatika/tap-google-sheets/blob/7994d3ad40c0bb5f1400b1539a4cf4a8d5f59880/tap_google_sheets/tap.py#L94:L125Rafał
06/05/2025, 7:07 PMReuben (Matatika)
06/05/2025, 7:22 PMRafał
06/05/2025, 7:29 PMReuben (Matatika)
06/05/2025, 8:07 PMdef discover_streams(self) -> list[streams.ExcelStream]:
if self.input_catalog:
streams = []
for name, entry in self.input_catalog.items():
stream = streams.ExcelStream(self, name, entry.schema)
stream.apply_catalog(self.input_catalog)
streams.append(stream)
return streams
# else dynamic discovery logic
Rafał
06/05/2025, 8:16 PMif self.input_catalog:
# build streams from catalog
else:
# discover streams
looks to me it should be done in SDK and discover_streams
only used for discoveryReuben (Matatika)
06/05/2025, 8:17 PMEdgar Ramírez (Arch.dev)
06/05/2025, 8:28 PMRafał
06/09/2025, 6:11 PM