People | Locations | Statistics |
---|---|---|
Naji, M. | | |
Motta, Antonella | | |
Aletan, Dirar | | |
Mohamed, Tarek | | |
Ertürk, Emre | | |
Taccardi, Nicola | | |
Kononenko, Denys | | |
Petrov, R. H. | Madrid | |
Alshaaer, Mazen | Brussels | |
Bih, L. | | |
Casati, R. | | |
Muller, Hermance | | |
Kočí, Jan | Prague | |
Šuljagić, Marija | | |
Kalteremidou, Kalliopi-Artemi | Brussels | |
Azam, Siraj | | |
Ospanova, Alyiya | | |
Blanpain, Bart | | |
Ali, M. A. | | |
Popa, V. | | |
Rančić, M. | | |
Ollier, Nadège | | |
Azevedo, Nuno Monteiro | | |
Landes, Michael | | |
Rignanese, Gian-Marco | | |
Yan, Feng
University of Manchester
Cooperation score: 37%
Publications
- 2024: Understanding the Surface Chemistry of SnO2 Nanoparticles for High Performance and Stable Organic Solar Cells
- 2024: Use of carbon electrodes to reduce mobile ion concentration and improve reliability of metal halide perovskite photovoltaics
- 2023: Temperature-responsive and biocompatible nanocarriers based on clay nanotubes for controlled anti-cancer drug release
- 2023: Effect of intermolecular interactions on the glass transition temperature of chemically modified alternating polyketones
- 2022: Optimizing inference serving on serverless platforms
- 2018: p‐Doping of Copper(I) Thiocyanate (CuSCN) Hole‐Transport Layers for High‐Performance Transistors and Organic Solar Cells
- 2014: Physicochemical properties of 1,2,4-triazolium perfluorobutanesulfonate as an archetypal pure protic organic ionic plastic crystal electrolyte
Optimizing inference serving on serverless platforms
Abstract
Serverless computing is gaining popularity for machine learning (ML) serving workloads due to its autonomous resource scaling, ease of use, and pay-per-use cost model. Existing serverless platforms work well for image-based ML inference, where requests are homogeneous in their service demands. Recent advances in natural language processing, however, cannot fully benefit from existing serverless platforms because their requests are intrinsically heterogeneous.

Batching requests for processing can significantly increase ML serving efficiency while reducing monetary cost, thanks to the pay-per-use pricing model adopted by serverless platforms. Yet batching heterogeneous ML requests leads to additional computation overhead, as small requests need to be "padded" to the same size as the large requests within the same batch. Reaching effective batching decisions (i.e., which requests should be batched together and why) is non-trivial: the padding overhead coupled with serverless auto-scaling forms a complex optimization problem.

To address this, we develop Multi-Buffer Serving (MBS), a framework that optimizes the batching of heterogeneous ML inference serving requests to minimize their monetary cost while meeting their service level objectives (SLOs). The core of MBS is a performance and cost estimator driven by analytical models and supercharged by a Bayesian optimizer. MBS is prototyped and evaluated on AWS using bursty workloads. Experimental results show that MBS preserves SLOs while outperforming the state of the art by up to 8x in cost savings, reducing padding overhead by up to 37x, and issuing 3x fewer serverless function invocations.