My main concern is this part that Chip brings in:
“Moving prediction to streaming processing while leaving model training as a batch process, however, means that there are now two data pipelines which must be maintained, and this is a common source of errors. Huyen pointed out that because static data is bounded, it can be considered a subset of streaming data, and so can also be handled by streaming processing; thus the two pipelines can be unified. To support this, she recommended an event-driven microservice architecture, where instead of using REST API calls to communicate, microservices use a centralized event bus or stream to send and receive messages.
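To make the event-bus idea concrete, here is a minimal sketch of the pattern: services never call each other directly, they only publish to and subscribe from a shared bus. This is a toy in-memory stand-in I wrote for illustration (a real deployment would use something like Kafka or Pulsar); the `EventBus` class, topic names, and the hard-coded score are all my own assumptions, not from Huyen's talk.

```python
from collections import defaultdict
from typing import Any, Callable


class EventBus:
    """Toy in-memory stand-in for a centralized event stream (e.g. Kafka)."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Any], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: Any) -> None:
        # Every subscriber to the topic receives the event; services talk
        # only to the bus, never to each other via REST calls.
        for handler in self._subscribers[topic]:
            handler(event)


bus = EventBus()
results = []

# A hypothetical "prediction service" consumes feature events and emits
# prediction events back onto the bus (score is hard-coded for the demo).
bus.subscribe("features", lambda e: bus.publish("predictions", {"user": e["user"], "score": 0.9}))
# A downstream consumer collects the predictions.
bus.subscribe("predictions", results.append)

bus.publish("features", {"user": "u1", "clicks": 3})
print(results)
```

Because both batch and streaming producers can write to the same topics, the same consumers (training and prediction services alike) can read from one unified pipeline, which is the point of the unification argument above.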
Once model training is also converted to a streaming process, the stage is set for continual learning. Huyen pointed out several advantages of frequent model updates. First, it is well known that model accuracy in production tends to degrade over time: as real-world conditions change, data distributions drift. This can be due to seasonal factors, such as holidays, or sudden worldwide events such as the COVID-19 pandemic. There are many available solutions for monitoring accuracy in production, but Huyen claimed these are often “shallow”; they may point out a drop in accuracy but they do not provide a remedy. The solution is to continually update and deploy new models.”
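The continual-learning remedy can be sketched in a few lines: instead of retraining from scratch on a batch schedule, the model takes a small gradient step on each mini-batch arriving from the stream, so it tracks drift as it happens. The following is a toy NumPy-only online logistic model of my own construction, with a simulated mid-stream drift; none of the class names, learning rate, or drift setup come from the talk.

```python
import numpy as np


class OnlineLogisticModel:
    """Toy online learner: each mini-batch from the stream updates the
    weights, so the deployed model tracks drift instead of going stale."""

    def __init__(self, n_features: int, lr: float = 0.1) -> None:
        self.w = np.zeros(n_features)
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, X: np.ndarray) -> np.ndarray:
        return 1.0 / (1.0 + np.exp(-(X @ self.w + self.b)))

    def partial_fit(self, X: np.ndarray, y: np.ndarray) -> None:
        # One SGD step on the incoming mini-batch (log-loss gradient).
        err = self.predict_proba(X) - y
        self.w -= self.lr * (X.T @ err) / len(y)
        self.b -= self.lr * err.mean()


rng = np.random.default_rng(0)
model = OnlineLogisticModel(n_features=2)

# Simulated stream: the true decision boundary drifts halfway through,
# standing in for a seasonal or world-event distribution shift.
for step in range(200):
    X = rng.normal(size=(32, 2))
    w_true = np.array([2.0, -1.0]) if step < 100 else np.array([-1.0, 2.0])
    y = (X @ w_true > 0).astype(float)
    model.partial_fit(X, y)

# The continual updates should have re-aligned the weights with the
# post-drift boundary direction.
print(model.w)
```

A batch-trained model frozen after step 100 would keep scoring against the old boundary; the monitoring tools Huyen calls “shallow” would flag the accuracy drop, but only the continual updates above actually fix it.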