QoA4ML consists of a set of measurement probes, utilities, and specifications for supporting quality of analytics (QoA) in ML and data-intensive (micro)services. In particular, we focus on services, and systems of services, across the edge-cloud continuum that are built as compositions of (micro)services.
The design of the QoA4ML specification language is described in `language`.
QoA4ML includes probes for measuring, for example, data quality and computing resource performance.
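As an illustration of what a data-quality probe computes, the following is a minimal sketch of a completeness metric: the fraction of expected fields that are present and non-null in a batch of records. The function name `completeness` and the record layout are hypothetical and are not part of the QoA4ML API; the actual probes may define data quality differently.

```python
from typing import Any, Iterable, Mapping, Sequence


def completeness(records: Sequence[Mapping[str, Any]], fields: Iterable[str]) -> float:
    """Return the fraction of (record, field) cells that are present and not None.

    Hypothetical data-quality probe for illustration only; not the QoA4ML API.
    """
    fields = list(fields)
    total = len(records) * len(fields)
    if total == 0:
        return 1.0  # treat an empty batch as trivially complete
    present = sum(
        1
        for record in records
        for field in fields
        if record.get(field) is not None  # missing keys and None both count as absent
    )
    return present / total


batch = [
    {"temperature": 21.5, "humidity": 40},
    {"temperature": None, "humidity": 42},
    {"temperature": 22.1},
]
print(completeness(batch, ["temperature", "humidity"]))  # 4 of 6 cells present
```

A probe like this would typically run on each incoming batch, with its result attached to a quality report.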
Developers can call functions of a QoAClient and of QoA4ML's utilities to evaluate and report ML-specific attributes (e.g., data quality, inference performance), build quality reports, and send them to observation services. A QoAClient can be initialized with configurations, in different formats (e.g., JSON and YAML), that specify the observation server and the communication protocol (e.g., messaging).
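To make the idea of a client configuration concrete, here is a minimal sketch that parses and validates a JSON configuration naming an observation server and a protocol. The class `ClientConfig`, the keys `client_id`, `observation_server`, and `protocol`, and the set of supported protocols are all assumptions for illustration; they are not QoA4ML's actual configuration schema.

```python
import json
from dataclasses import dataclass

# Hypothetical set of protocols; QoA4ML's real options may differ.
SUPPORTED_PROTOCOLS = {"amqp", "mqtt", "http"}


@dataclass(frozen=True)
class ClientConfig:
    """Hypothetical client configuration; field names are illustrative only."""

    client_id: str
    observation_server: str
    protocol: str

    @classmethod
    def from_json(cls, text: str) -> "ClientConfig":
        data = json.loads(text)
        protocol = data["protocol"].lower()  # normalize, e.g. "AMQP" -> "amqp"
        if protocol not in SUPPORTED_PROTOCOLS:
            raise ValueError(f"unsupported protocol: {protocol}")
        return cls(
            client_id=data["client_id"],
            observation_server=data["observation_server"],
            protocol=protocol,
        )


config_text = """
{
  "client_id": "inference-service-01",
  "observation_server": "amqp://localhost:5672",
  "protocol": "AMQP"
}
"""
config = ClientConfig.from_json(config_text)
print(config.protocol)  # amqp
```

A YAML configuration could be handled the same way by first loading it into a dictionary with a YAML parser (for example PyYAML's `yaml.safe_load`) and then applying the same validation.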
QoA reports are implemented in QoA4ML Utilities as an object that helps developers report metrics, computation graphs, and inference graphs of ML services in a concrete format.
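The following minimal sketch shows one way such a report object could collect metrics and an inference graph and serialize them for an observation service. The class `QualityReport`, its methods, and the edge-list representation of the graph are hypothetical; QoA4ML's own report format may be structured differently.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Tuple


@dataclass
class QualityReport:
    """Hypothetical QoA report: named metrics plus an inference graph as an edge list."""

    service_name: str
    metrics: Dict[str, float] = field(default_factory=dict)
    # Directed edges (upstream, downstream) between inference stages.
    inference_graph: List[Tuple[str, str]] = field(default_factory=list)
    timestamp: float = field(default_factory=time.time)

    def add_metric(self, name: str, value: float) -> None:
        self.metrics[name] = value

    def add_edge(self, upstream: str, downstream: str) -> None:
        self.inference_graph.append((upstream, downstream))

    def to_json(self) -> str:
        # Tuples become JSON arrays; keys are sorted for stable output.
        return json.dumps(asdict(self), sort_keys=True)


report = QualityReport("object-detector")
report.add_metric("data_completeness", 0.98)
report.add_metric("inference_latency_ms", 35.2)
report.add_edge("preprocessor", "object-detector")
payload = report.to_json()
```

Keeping the report as plain data that serializes to JSON makes it independent of the transport the client uses to send it.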
Examples are in `examples`.
The code is in `observability`.
QoA4ML Monitor is a component that monitors the QoA of an ML model deployed on a serving platform. It consists of:
- Monitoring Service: a third-party monitoring service used for managing monitoring data.
  - We use Prometheus and other services; information on how to configure them is to be provided.
- QoA4MLObservabilityService: a service that reads QoA4ML specifications and real-time monitoring data and detects whether any violation occurs.
  - The OPA engine is used to implement violation checking, under `qoa4mlopa`.
  - A new engine is currently being developed under `rohe_ObService`.
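The core idea of violation detection, comparing observed metrics against the thresholds declared in a contract, can be sketched in plain Python. This is not the OPA-based implementation in `qoa4mlopa`; it is an illustrative stand-in, and the contract format (metric name mapped to a comparison and a threshold) and the function `find_violations` are assumptions rather than the QoA4ML specification language.

```python
import operator
from typing import Callable, Dict, List, Mapping, Tuple

# Hypothetical contract format: metric name -> (comparison, threshold).
# A clause is satisfied when comparison(observed_value, threshold) is True.
COMPARATORS: Dict[str, Callable[[float, float], bool]] = {
    "<=": operator.le,
    ">=": operator.ge,
    "<": operator.lt,
    ">": operator.gt,
}

Contract = Mapping[str, Tuple[str, float]]


def find_violations(contract: Contract, observed: Mapping[str, float]) -> List[str]:
    """Return human-readable descriptions of violated contract clauses.

    A metric that the contract constrains but that is missing from the
    observed data is reported as a violation, since it cannot be verified.
    """
    violations = []
    for metric, (op_symbol, threshold) in contract.items():
        if metric not in observed:
            violations.append(f"{metric}: no observation")
            continue
        value = observed[metric]
        if not COMPARATORS[op_symbol](value, threshold):
            violations.append(f"{metric}: {value} violates {op_symbol} {threshold}")
    return violations


contract = {
    "inference_latency_ms": ("<=", 50.0),
    "data_completeness": (">=", 0.95),
    "accuracy": (">=", 0.9),
}
observed = {"inference_latency_ms": 62.0, "data_completeness": 0.97}
for line in find_violations(contract, observed):
    print(line)
# inference_latency_ms: 62.0 violates <= 50.0
# accuracy: no observation
```

Treating an unobserved metric as a violation is a design choice in this sketch; a real engine might instead report it separately as "unknown".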
- Hong-Linh Truong, Minh-Tri Nguyen, "QoA4ML - A Framework for Supporting Contracts in Machine Learning Services", The 2021 IEEE International Conference on Web Services (ICWS 2021), to appear.
- Minh-Tri Nguyen, Hong-Linh Truong, "Demonstration Paper: Monitoring Machine Learning Contracts with QoA4ML", Companion of the 2021 ACM/SPEC International Conference on Performance Engineering (ICPE '21), Apr. 19-23, 2021.
- "R3E - An Approach to Robustness, Reliability, Resilience and Elasticity Engineering for End-to-End Machine Learning Systems": https://www.researchgate.net/publication/341762862_R3E_-An_Approach_to_Robustness_Reliability_Resilience_and_Elasticity_Engineering_for_End-to-End_Machine_Learning_Systems