If you want to run multiple
models with different logics at once, it makes sense to use an
orchestrated container platform like Kubernetes. It allows you to handle
multiple working streams at once.
The key benefit is scaling: If you take a typical
model, such as a fraud detection model, and the model is deployed in a
banking business process, the volume of transactions that are coming in
for scoring may be very high during the day. However, the volume of
transactions may drop during the night. So you need computational
resources that can handle both peak loads while not costing you an arm
and a leg when the load goes down.
Kubernetes is a great container orchestration platform that can take a
specific container and seamlessly scale up and down with a few
command-line inputs. For example, serving 10 instances of the banking
model on a regular day and then scaling it down to maybe 5 or 6
instances over the weekend.
Orchestration also aids with zero-downtime deployment: there are many
scenarios when your model must be operating 24 hours a day, seven days a
week, and 365 days a year, and you may have a new version of the model
that you want to update with no downtime. What Kubernetes provides by
default is a multi-deployment model, which includes blue and green
deployment (either green or blue serving the production version,
allowing you to deploy your new version to the other environment, and
then switch which production environment serves production) and canary