# singer-tap-development
h
If a data source has columns that vary per caller of the API (specifically, per account), what is the best way to handle this in a Stream implementation? Some options I've thought of:
• Make the user specify all account-specific columns as part of the configuration
• Dynamically produce the schema based on the response from the data source
d
The simplest example that comes to mind is tap-csv. A CSV file's structure by its nature depends on the content itself, and each file can be different, so the tap generates the schema each time it sees a new file.
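A minimal sketch of that idea, assuming only the header row is available and every column is treated as a nullable string. The function name `schema_from_header` is hypothetical and this is not tap-csv's actual code:

```python
import csv
import io


def schema_from_header(csv_text: str) -> dict:
    """Build a JSON-Schema-style dict from the header row of CSV text."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # only the first row is read
    # A header carries no type information, so every column is a nullable string.
    return {
        "type": "object",
        "properties": {name: {"type": ["null", "string"]} for name in header},
    }


sample = "id,name,custom_field\n1,Ada,x\n"
schema = schema_from_header(sample)
print(list(schema["properties"]))
```

Each new file would go through the same function, so two files with different headers simply yield two different schemas.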
h
In the case of tap-csv, it gets the header row by opening the file and checking the data. In my case this would be an additional API call. Is it not possible to just produce the schema upon the first real API call, or does the schema have to be returned before any call to `get_records` is made?
d
The `schema` property is called during `__init__` of the `Stream` (https://github.com/meltano/sdk/blob/main/singer_sdk/streams/core.py#L200). That's why `tap-csv` calls `get_rows()` in `schema` just to get the first row and return a schema based on the file's header (https://github.com/MeltanoLabs/tap-csv/blob/main/tap_csv/client.py#L132). There are several other approaches: https://github.com/search?o=desc&q=tap+singer+sdk&s=updated&type=Repositories
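One way to avoid paying for an extra API call might be to cache the first page: the schema is inferred from it at init time, and `get_records` later replays the same cached page before fetching more. The sketch below is a stand-in, not the real `singer_sdk` `Stream` class; `DynamicSchemaStream`, `fetch_page`, and `fake_fetch` are hypothetical names, and the paging scheme (an empty list ends the stream) is an assumption.

```python
from functools import cached_property
from typing import Any, Callable, Dict, Iterator, List

Record = Dict[str, Any]


class DynamicSchemaStream:
    """Stand-in for an SDK stream; NOT the real singer_sdk Stream class."""

    def __init__(self, fetch_page: Callable[[int], List[Record]]) -> None:
        # Hypothetical callable: returns one page of records, [] when exhausted.
        self._fetch_page = fetch_page
        # Mirror the behaviour described above: the schema is needed at init time.
        if not self.schema["properties"]:
            raise ValueError("could not infer a schema from the first page")

    @cached_property
    def _first_page(self) -> List[Record]:
        # Fetched once; reused by both `schema` and `get_records`.
        return self._fetch_page(0)

    @cached_property
    def schema(self) -> dict:
        keys: List[str] = []
        for record in self._first_page:
            for key in record:
                if key not in keys:
                    keys.append(key)
        # An empty property schema ({}) accepts any JSON value.
        return {"type": "object", "properties": {key: {} for key in keys}}

    def get_records(self) -> Iterator[Record]:
        yield from self._first_page  # replay the cached page, no second call
        page = 1
        while True:
            rows = self._fetch_page(page)
            if not rows:
                break
            yield from rows
            page += 1


calls: List[int] = []


def fake_fetch(page: int) -> List[Record]:
    """Fake API for the demo; records which pages were requested."""
    calls.append(page)
    pages = [[{"id": 1, "acct_field": "a"}], [{"id": 2, "acct_field": "b"}]]
    return pages[page] if page < len(pages) else []


stream = DynamicSchemaStream(fake_fetch)
records = list(stream.get_records())
```

The trade-off is that columns which first appear on a later page would be missing from a schema inferred this way, so a dedicated metadata endpoint or user-supplied column list may still be the safer choice where one exists.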