Last year I began toying around with scikit-learn. I'm far from a statistician, though; my aim was largely to understand the workflows required to support machine-learning work. It was fun and I learnt a lot, but one particular idea still intimidated me: putting a model into production.
The stateful nature of an ML model sounded like a nightmare, and work commitments changed, so I barely gave it much thought... until now. So let's look at how I'd design a deployable machine learning model!
In this example we're going to write a service that's capable of making predictions based upon the Wisconsin Breast Cancer Database, a readily available dataset that's commonly used for experimenting with machine-learning techniques.
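Conveniently, scikit-learn bundles a copy of this dataset, so no download step is needed to start experimenting. A quick sketch of loading it and inspecting its shape (the variable names here are just illustrative):

```python
from sklearn.datasets import load_breast_cancer

# scikit-learn ships the Wisconsin Breast Cancer dataset out of the box
data = load_breast_cancer()

# 569 samples, each with 30 numeric features derived from cell-nucleus images
print(data.data.shape)

# Two diagnosis classes: malignant and benign
print(list(data.target_names))
```

Each row of `data.data` is the feature vector our service will eventually accept as input to a prediction request.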
The API will be defined using OpenAPI, and implemented using the connexion library for Python. This drastically reduces the amount of boilerplate required, and comes with validation and failure handling out of the box.
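With connexion, the OpenAPI document is the source of truth: each operation's `operationId` names the Python function that handles the request, and connexion validates payloads against the declared schema before your code ever sees them. A minimal sketch of what such a spec could look like (the `/predict` path, the request shape, and the `predictor.predict` module path are illustrative assumptions, not the actual spec for this project):

```yaml
openapi: "3.0.0"
info:
  title: Breast Cancer Prediction API   # illustrative title
  version: "1.0"
paths:
  /predict:                             # hypothetical endpoint
    post:
      operationId: predictor.predict    # connexion routes to this Python function
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              properties:
                features:
                  type: array           # the numeric feature vector
                  items:
                    type: number
      responses:
        "200":
          description: Predicted diagnosis
```

Because the schema lives in the spec, a malformed request is rejected with a 400 response automatically, which is much of the boilerplate reduction mentioned above.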
Simple BDD-style (i.e. Gherkin) tests will be written against the API to verify the overall functionality, such as: