I have been thinking about this one a bit, and a couple of things occur to me. One option would be to accept an optional “schema” config setting that would basically be the output of the discover command. But I’m not sure that gets us where we want to be.
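A minimal sketch of the optional-schema idea, assuming `config` is a plain dict. The `schema` key and the `fetch_schema_from_api` callable are hypothetical names for illustration, not existing tap settings or Singer SDK APIs. Note that skipping the schema call also skips the “is there new data?” check that this API bundles with it.

```python
from typing import Any, Callable, Optional


# Hypothetical: `config["schema"]` and `fetch_schema_from_api` are illustrative
# names, not real settings or Singer SDK functions.
def resolve_schema(
    config: dict[str, Any],
    fetch_schema_from_api: Callable[[], dict[str, Any]],
) -> dict[str, Any]:
    """Return the user-supplied schema if present, otherwise ask the API."""
    provided: Optional[dict[str, Any]] = config.get("schema")
    if provided is not None:
        # A schema was supplied in config, so the schema request is skipped
        # and this run makes one fewer API call.
        return provided
    # No schema in config: fall back to discovering it from the API.
    return fetch_schema_from_api()
```

The trade-off is that a stale schema in config can drift from the real dataset, so this mostly makes sense for datasets whose columns rarely change.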
The thing with this API is that schema discovery also tells us whether there is new data to load, so if we drop the schema API call, we would probably have to request the actual data on every run instead. In the current implementation I haven’t found a way to stop the run from inside the schema method, so it ends up loading at least the latest batch of data, but if we figure that one out, we would substantially reduce the number of API requests. Maybe @Edgar Ramírez (Arch.dev) has some ideas?
tl;dr: today, every invocation makes two API calls: one to “get the schema and find out if there is new data”, the other to get the new data. If we can skip fetching data when there is nothing new, that would help with the rate limiting.
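A rough sketch of the control flow we would want, assuming the schema response carries some freshness marker (here a hypothetical `last_updated` field) that can be compared with the value saved in state from the previous run. `SchemaResponse`, `get_schema`, and `get_data` are illustrative stand-ins, not the real API client, and this does not solve the open question of how to stop a run from inside the schema method.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterator


@dataclass
class SchemaResponse:
    schema: dict[str, Any]
    # Hypothetical freshness marker returned alongside the schema.
    last_updated: str


def sync(
    get_schema: Callable[[], SchemaResponse],
    get_data: Callable[[], Iterator[dict[str, Any]]],
    state: dict[str, Any],
) -> Iterator[dict[str, Any]]:
    # Call 1: schema plus "is there new data?"
    meta = get_schema()
    if state.get("last_updated") == meta.last_updated:
        # Nothing new since the last run: skip call 2 entirely.
        return
    # Call 2: only made when the marker has changed.
    yield from get_data()
    # Record the marker only after all records have been consumed.
    state["last_updated"] = meta.last_updated
```

Because state is updated only after the data iterator is exhausted, a run that fails midway will re-fetch on the next invocation rather than silently skipping data.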
Update: having read through the discussion about select and schema discovery: yes, schema discovery in the sense of discovering which tables are available takes a looong time, given the way it has to be done. It hadn’t really occurred to me to do it like that. So I was really addressing a different and much smaller issue: discovering the columns in a given dataset.