Praedictio is developed as a cloud-native application. Its container-based approach makes the system easy to scale and enables a cloud-neutral deployment pattern that can be deployed on Kubernetes.

Deployment Architecture

The Events Engine is designed to provide actionable insights from the predictions. It provides a simple interface for defining actionable-insight alerts.

Events Engine

Following is the design diagram of the Prediction Services Engine. The Python Django framework will be used to implement the component. The Prediction Services Engine handles the following functions:
- Server logging
- API security
- Praedictio user management
- Prediction API request serving

API Gateway
Praedictio Services Engine

To achieve process isolation, each model is managed in a separate Docker container. Using this mechanism, it is expected that the performance variability and instability of novel and immature machine learning frameworks will not interfere with the overall availability of Clipper.
The state of a model, such as its parameters, is provided to the container at initialization time, and the container itself is stateless afterwards. Hence, machine learning frameworks identified as resource intensive can be replicated across multiple machines or given GPU access. Model containers enable the encapsulation of a range of machine learning frameworks and models behind a single API. To add a new type of model to Clipper, model builders only need to implement the standard batch prediction interface. Clipper supports language-specific container bindings for Python, Java, and C++. Building a model container is straightforward: inherit from the base container, add the required dependencies, and wrap the prediction invocation with the common wrapper function.
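As a rough illustration of this pattern, a new model type can be added by implementing the batch prediction interface over an existing framework model. This is a minimal sketch; the names `BaseModelContainer` and `predict_batch` are assumptions for illustration, not Clipper's actual class or method names.

```python
# Sketch of the model-container pattern described above.
# BaseModelContainer and predict_batch are illustrative assumptions,
# not Clipper's actual API.

class BaseModelContainer:
    """Common wrapper: receives a batch of inputs, returns a batch of outputs."""

    def predict_batch(self, inputs):
        raise NotImplementedError


class SklearnModelContainer(BaseModelContainer):
    """Encapsulates a Scikit-Learn-style model behind the batch interface."""

    def __init__(self, model):
        # Model state (e.g. fitted parameters) is supplied at initialization;
        # the container itself holds no other mutable state afterwards.
        self.model = model

    def predict_batch(self, inputs):
        # Delegate to the framework's own predict call and normalize output.
        return [float(p) for p in self.model.predict(inputs)]
```

The container is stateless after construction, so it can be replicated freely across machines, matching the replication strategy described above.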
Model Containers

Clipper also supports several of the most widely used machine learning frameworks: Apache Spark MLlib, Scikit-Learn, Caffe, TensorFlow, and HTK. While these frameworks span multiple application domains, programming languages, and system requirements, each was added using fewer than 25 lines of code. Consequently, models can be modified or swapped transparently to the application. To achieve low-latency, high-throughput predictions, Clipper implements a range of optimizations. In the model abstraction layer, Clipper caches predictions on a per-model basis and implements adaptive batching to maximize throughput given a query latency target. In the model selection layer, Clipper implements techniques to improve prediction accuracy and latency. The prediction pipeline uses the Clipper prediction-serving system as its core technology. Clipper has a model abstraction layer responsible for providing a common prediction interface, ensuring resource isolation, and optimizing the query workload for batch-oriented machine learning frameworks.
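The per-model prediction caching described above can be sketched in simplified form. This is a toy stand-in for illustration only, assuming a `PredictionCache` keyed by model name and input; it is not Clipper's actual implementation.

```python
# Simplified per-model prediction cache, illustrating the caching
# optimization described above (not Clipper's actual implementation).

class PredictionCache:
    def __init__(self):
        self._cache = {}   # (model_name, input) -> cached prediction
        self.hits = 0
        self.misses = 0

    def predict(self, model_name, x, predict_fn):
        """Return a cached prediction if available, else compute and store it."""
        key = (model_name, x)
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        self.misses += 1
        result = predict_fn(x)   # fall through to the model container
        self._cache[key] = result
        return result
```

Repeated queries for the same input then skip the model entirely, which is how caching trades memory for latency; adaptive batching would sit below this layer, grouping the cache misses before they reach the container.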
The first layer exposes a common API that abstracts away the heterogeneity of existing ML frameworks and models.

Prediction Pipeline

The training pipeline composes data through the ETL engine, and the models are trained on it. The trained models are updated frequently.
A model repository is maintained to provide versioning of the models. Each model is serialized, and its hyperparameters and accuracies are also logged for analyzing experiments. We will discuss the training pipeline extensively in the Alpha release.

Training Pipeline

Figure 1 shows a high-level component overview and architecture of the machine learning platform and highlights the components discussed in the following sections:

Architecture and Overview

Beta – Action Engine
Alpha – Training pipeline, Admin panel
MVP – Prediction Serving System and API gateway

The Praedictio platform's road map has been envisioned to deliver the core components in an iterative manner.
Product Road Map

Only a small fraction of a machine learning platform is the actual code implementing the training algorithm. If the platform handles and encapsulates the complexity of machine learning deployment, engineers and scientists have more time to focus on modeling tasks.

Production-level reliability and scalability.

Providing an admin and configuration framework is only possible if components also share utilities that allow them to communicate and share assets. A Praedictio user is exposed to a single admin panel for managing all components.

Easy-to-use configuration and tools.
Most machine learning pipelines execute their components sequentially, causing all components to be re-executed as the volume of incoming data grows. This becomes a bottleneck, since most real-world use cases require continuous training. Praedictio supports several continuation strategies that result from the interaction between data-visitation and warm-starting options.
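The warm-starting option mentioned above can be sketched as an incremental trainer that keeps its learned state between batches instead of retraining from scratch. This is a toy illustration; `IncrementalMeanModel` and `continuous_training` are hypothetical names, and a real trainer would use a framework's own incremental-fit mechanism.

```python
# Simplified warm-start continuous-training loop: the model keeps its
# learned state between batches rather than being re-trained from scratch.

class IncrementalMeanModel:
    """Toy model: predicts the running mean of all targets seen so far."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def partial_fit(self, targets):
        # Warm start: fold only the new batch into the existing state.
        self.total += sum(targets)
        self.count += len(targets)

    def predict(self):
        return self.total / self.count if self.count else 0.0


def continuous_training(model, batch_stream):
    for batch in batch_stream:       # new data arrives over time
        model.partial_fit(batch)     # only the new batch is visited
    return model
```

The key property is that each batch is visited once; without warm starting, the full history would have to be re-processed on every update, which is exactly the bottleneck the text describes.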
Continuous training.

We chose TensorFlow and Scikit-Learn as the trainers, but the platform design is not limited to these specific libraries. One factor in choosing (or dismissing) a machine learning platform is its coverage of existing algorithms.
Scikit-Learn provides a wide variety of pre-implemented ML algorithms, and TensorFlow provides full flexibility for implementing any type of model architecture. There is a large and growing number of machine learning frameworks. Each framework has strengths and weaknesses, and many are optimized for specific models or application domains (e.g., computer vision). Thus, there is no dominant framework, and often multiple frameworks may be used for a single application. When training data grows, the need for a framework with distributed training can force a change of framework even after one was selected as the best available. Although common model exchange formats have been introduced in the past, these formats did not gain popularity, owing to rapid technological advancements and to the additional errors arising from maintaining parallel implementations for training and serving.
One machine learning platform for many learning tasks.

The Praedictio design adopts the following principles:

Platform Design and Anatomy

Praedictio introduces a modular architecture to simplify model development and deployment across frameworks and applications. Furthermore, by introducing caching, batching, and adaptive model selection techniques, Praedictio reduces prediction latency and improves prediction throughput, accuracy, and robustness without modifying the underlying machine learning frameworks. The platform can also be integrated with enterprise systems while satisfying stringent data security, privacy, and regulatory requirements. Machine learning is growing in popularity across a wide spectrum of business domains to meet the need for customer-focused, accurate, and robust business insights.
One of the biggest challenges in creating and maintaining a machine-learning-based prediction system is orchestrating model creation, learning, model validation, deployment, and infrastructure maintenance in a production environment. With the high volatility of data and continually improving learning models, deploying fresh models becomes trickier. Most machine learning frameworks and systems address only model training or deployment, and connectivity between different components is established ad hoc via glue code or custom scripts. Praedictio integrates the aforementioned components into one platform, simplifying platform configuration and reducing time to production while increasing scalability.

Introduction

Praedictio can run on-premise or on any cloud platform and serve highly accurate business predictions that enable business owners and decision makers to make timely decisions about their business.
Praedictio is a business predictions framework that provides powerful predictive analytics by analysing business data scattered across different repositories in an organization. The Praedictio framework will enable developers and data scientists to integrate data-driven ML models into business applications quickly and easily, with powerful tools for aggregating data, data modeling, training, and deployment.

Abstract

Praedictio: Machine Learning Platform