Hydra: Virtualized Multi-Language Runtime for High-Density Serverless Platforms (2024)


Serhii Ivanenko (INESC-ID/Técnico, ULisboa, Lisbon, Portugal; serhii.ivanenko@tecnico.ulisboa.pt), Jovan Stevanovic (Oracle Labs, Belgrade, Serbia; jovan.j.stevanovic@oracle.com), Vojin Jovanovic (Oracle Labs, Zurich, Switzerland; vojin.jovanovic@oracle.com), and Rodrigo Bruno (INESC-ID/Técnico, ULisboa, Lisbon, Portugal; rodrigo.bruno@tecnico.ulisboa.pt)

Abstract.

Serverless is an attractive computing model that offers seamless scalability and elasticity; it takes the infrastructure management burden away from users and enables a pay-as-you-use billing model. As a result, serverless is becoming increasingly popular for supporting highly elastic and bursty workloads. However, existing platforms are supported by bloated virtualization stacks which, combined with bursty and irregular invocations, lead to high memory and latency overheads.

To reduce the virtualization stack bloat, we propose Hydra, a virtualized multi-language serverless runtime capable of handling multiple invocations of functions written in different languages. To measure its impact on large platforms, we build a serverless platform that optimizes scheduling decisions to take advantage of Hydra by consolidating function invocations on a single instance, reducing the total infrastructure tax. Hydra improves the overall function density (ops/GB-sec) by 4.47× on average compared to NodeJS, the JVM, and CPython, the state-of-the-art single-language runtimes used in most serverless platforms. When reproducing the Azure Functions trace, Hydra reduces the overall memory footprint by 2.1× and reduces the number of cold starts by 4 to 48×.

1. Introduction

Serverless has emerged as an attractive computing model in which applications are composed of lightweight and fast-executing snippets of code commonly referred to as functions. Serverless platforms offer great scalability and elasticity, taking the infrastructure management burden away from users, and enabling a pay-as-you-use billing model where users pay only for the time the function is running (catro:2019; smith:2021). As a result, serverless is becoming increasingly popular to deploy elastic and bursty workloads such as video and image processing (fouladi:2017), machine learning (carreira:2019), linear algebra (shankar:2020), data analytics (perron:2020; mueller:2020), and web-apps/microservices (gan:2019). However, as the utilization of serverless grows, platforms are faced with high memory overheads and long tail latencies. We attribute these to a combination of two factors:


1. Serverless runs on bloated virtualization stacks. The most commonly used virtualization stack is a container (e.g., Docker (docker)) or a Virtual Machine (e.g., QEMU (qemu), Firecracker (agache:2020)) with a language runtime (e.g., NodeJS, CPython, or the JVM) running on top of it. For each invocation being handled by the platform, a separate virtualization stack is required. State-of-the-art container/VM implementations require a long startup time and a high memory footprint compared to the mean function invocation latency and memory footprint (sharad:2020) (see Figure 1). The same applies to language runtimes such as CPython, the JVM, and NodeJS, the three most popular native language implementations that, according to a recent study (newrelic), account for approximately 90% of the total number of function invocations in Amazon Lambda (lambda). These runtimes were developed to host long-running applications and, therefore, employ effective but time- and resource-consuming techniques to optimize for long-term performance (carreira:2021). The combined memory footprint and startup latency of the container/VM and runtime can easily dominate the total memory footprint and startup latency of lightweight serverless functions.

2. Serverless workloads are bursty. Such workloads include periods of very low activity with only a few invocations, followed by periods of higher activity with multiple function invocations (sharad:2020). Figure 2 shows how sparse the invocations of each function are; only 11% of the functions receive, on average, at least one invocation per minute, meaning that most virtualization stacks are highly under-utilized. As a result of these factors, platforms resort to a tradeoff between high memory overheads to cache virtualization stacks (waiting for the next invocation) and long tail invocation latencies inflicted by the cost of launching a new virtualization stack (cold start). Figure 3 depicts the effect of caching functions that have been executed recently. Due to the sparsity of invocations of each function, the number of cached functions is much higher than the number of invocations.


Checkpoint/Restore (C/R) (shin:2022; ustiugov:2021; silva:2020) and runtime forking (akkus:2018; oakes:2018) have been extensively studied as ways to reduce memory footprint and tail latency. However, while both C/R and runtime forking reduce memory and startup latency by sharing a common stack (see Figure 1), both techniques only optimize invocations of a single function, requiring separate virtualization stacks when multiple functions are invoked. Other techniques supporting lightweight sandboxes (dukic:2020) still only allow invocations of a single function to co-execute on the same runtime, therefore requiring separate virtualization stacks for different functions.

We propose Hydra (named after the multi-headed creature of Greek mythology, an analogy for its multi-function and multi-language support), a runtime capable of sharing a single virtualization stack across concurrent invocations of multiple functions, even if written in different languages. Hydra supports both AOT-compiled and JIT-compiled functions, and function invocations are executed in a lightweight sandbox that can be snapshotted and restored. As opposed to previous VM or process C/R approaches that snapshot the entire system or process, Hydra supports C/R of a single sandbox, allowing the cached state and compiled code of a single function to be restored at any point.

On a number of established serverless function benchmarks (dukic:2020; copik:2021), we demonstrate how Hydra's lightweight sandboxes increase function density (ops/GB-sec) in single-function scenarios compared to VM C/R, process forking, and running functions in separate stacks. To study the impact of Hydra on real-world platforms, we built a platform that schedules invocations of each tenant to the same Hydra instance. We show that a tenant-oriented scheduling algorithm significantly reduces overall memory consumption by 2.1× and reduces the number of cold starts by 4 to 48× when reproducing a public serverless trace (sharad:2020).


In summary, our contributions are the following:

  • we demonstrate that serverless cache utilization can be improved by re-using the same virtualization stack for different functions (Section 1);

  • we build the first (to the best of our knowledge) serverless runtime capable of sharing a single virtualization stack across multiple functions written in different languages while offering C/R support for individual sandboxes (Section 3);

  • we build a serverless platform that incorporates a tenant-aware scheduling algorithm to show how Hydra could be deployed at scale (Section 4);

  • we measure the performance impact of Hydra both at a micro-scale, by comparing its density with single-language runtimes, and at a macro-scale, by measuring its impact while replaying the Azure Functions trace on a realistic platform (Section 5).

2. Background

Serverless platforms inherit many of the virtualization techniques that were initially designed to support long-running monolithic applications and microservices. We argue that this infrastructure is not a good fit for lightweight, short-running, and bursty serverless functions.

2.1. Traditional Virtualization Stacks

Existing virtualization stacks can be divided into two main groups: system-level virtualization, i.e., Virtual Machines (such as Xen (barham:2003), QEMU (qemu), and Firecracker (agache:2020)), and Operating System-level virtualization, i.e., containers (such as Docker (docker) and gVisor (gvisor)). VMs and containers virtualize the hardware and the Operating System, respectively. These technologies were designed to host long-running applications with high memory footprint demands that do not match those of most serverless functions (sharad:2020). As a consequence, the slow start and high memory demands of such virtualization stacks easily dominate the function startup time and memory consumption.

There have been multiple attempts to fit traditional virtualization systems into serverless. Runtime reuse is a popular strategy to reduce the number of runtime startups by keeping a serverless worker (a virtualization stack composed of a VM/container and a runtime) alive after a function invocation finishes (aws:warmstart; azure:warmstart; wang:2018). If another invocation is received within a timeout limit, the worker is reused and no startup/initialization is required. A warm start refers to an invocation that is served by an already existing worker, as opposed to a cold start, which requires starting a new worker. Runtime reuse has, however, several limitations. First, it does not reduce resource redundancy, as multiple concurrent requests still need to be served by independent workers. Second, only a small portion of serverless functions will benefit from runtime reuse, as the vast majority of functions have very sparse invocations (see Figure 2). Finally, it creates additional memory pressure to keep workers in memory after an invocation finishes (see Figure 3).

Snapshotting and forking have also received much attention as promising techniques to reduce the memory footprint and startup latency of serverless functions. Restoring from a previous snapshot significantly reduces the startup time and memory footprint compared to a cold start (du:2020; ustiugov:2021; cadden:2020) but still requires a separate virtualization stack for each concurrent invocation and imposes storage/network overhead to manage snapshots (ao:2022). Similarly, runtime forking (oakes:2018; akkus:2018) also reduces both memory footprint, via copy-on-write, and startup time, by simply forking instead of launching a completely new virtualization stack. However, state-of-the-art runtimes such as CPython, NodeJS, and the JVM do not support forking out-of-the-box. For example, Garbage Collector and Just-In-Time compiler threads do not survive the fork and require careful revival in the child process.

2.2. Lightweight Execution Environment

Allowing multiple functions to run in a single runtime reduces the startup latency and memory footprint of serverless functions by using a single virtualization stack. However, it requires enforcing resource isolation. CPU, file system, and network isolation can be guaranteed using Operating System-level primitives such as Linux Namespaces and Control Groups (cgroups). Memory isolation requires function sandboxing, which can then be further strengthened with hardware features such as MPK (mpk; libmpk; erim; kotni:2021) and CHERI (woodruff:2014; vasily:2022). In this paper, we focus on how to sandbox functions efficiently.

There are two main widely available sandboxing techniques: WebAssembly and Memory Isolates. WebAssembly (haas:2017) is a binary-code format with memory safety guarantees that resulted from the evolution of previous Software Fault Isolation techniques such as NaCl (nacl). These guarantees are enforced by constraining memory access to a single linear byte array with efficient bounds checks at compile time and run time. Runtimes supporting the execution of WebAssembly functions could use it to run multiple functions in a memory-isolated environment (shillaker:2020). However, for functions that are interpreted (such as Python), the function code would still have to include an interpreter (CPython, for example), re-introducing resource redundancy as invocations do not share the same interpreter (shillaker:2020).

Language runtimes, such as V8 (v8) and GraalVM Native Image (graalvm), offer Memory Isolates, memory segments that can be used to host the execution of a function. Similarly to WebAssembly sandboxes, an isolate is a linear memory segment used to keep application objects. References are resolved within the linear memory segment, and the sandbox limits are enforced by the compiler and runtime. Isolates are, however, restricted to executing functions written in the language supported by the runtime: JavaScript for V8 and Java for GraalVM Native Image.

2.3. Multi-Language Support

Executing functions developed in different languages on top of a single runtime instance is also a key ingredient to reducing virtualization stack redundancy. WebAssembly accomplishes this by allowing multiple languages to target its binary-code format. Memory isolates, on the other hand, cannot offer multi-language capabilities on their own. An alternative approach is enabled by Truffle (wuerthinger:2013), a language implementation framework that allows developers to quickly write interpreters for a specific language. Truffle interpreters are then automatically processed and optimized, taking full advantage of Truffle's underlying JIT compiler. Truffle is supported by recent advances in the language runtime and compiler literature (wimmer:2019; grimmer:2015a; latifi:2021; grimmer:2015b; zhang:2014; grimmer:2014; wuerthinger:2013; wimmer:2012; wuerthinger:2017; larisch:2018) and is designed to efficiently execute dynamic languages such as Python, JavaScript, and Ruby, but also has support for LLVM bitcode (rigger:2016) and even WebAssembly (salim:2020). Truffle allows functions written in different languages to execute in a Java runtime. However, Truffle code execution greatly suffers from long initialization times and high memory overhead (Section 5.1.3), which easily nullifies any benefit obtained by virtualizing the runtime.

3. Virtualized Multi-Language Runtime

Hydra is a serverless runtime capable of running multiple functions in isolation by confining each function inside a lightweight sandbox. Figure 4 depicts a Hydra instance with three active sandboxes belonging to two different functions. Each sandbox runs invocations of a single function, one invocation at a time. At startup, Hydra opens a network socket to serve REST requests to register, deregister, and invoke functions (Section 3.1). Upon receiving an invocation request, Hydra loads the function into memory if not already loaded, creates a new sandbox if no warm sandbox is available, and invokes the function inside the sandbox. Sandboxes of the same function share the same code cache and initial heap state (Section 3.2). Hydra introduces a function framework that automatically uses Truffle (Section 3.3), and supports launching new sandboxes or restoring them from a previous checkpoint (Section 3.4).


3.1. Function Interface

Hydra exposes a minimal canonical interface to the outside through a single end-point. Only three methods are required: function registration, invocation, and deregistration:

boolean regFunction(byte[] code, String functionID, String functionEntrypoint);
String invkFunction(String functionID, String arguments);
boolean delFunction(String functionID);

int create_isolate(create_isolate_params_t* params, isolate_t** isolate);
int attach_thread(isolate_t* isolate, isolatethread_t** thread);
int copy(isolate_t* isolate, char* buffer, long size, handle_t** handle);
int invoke_function(isolate_t* guest, isolate_t* host, handle_t* entrypoint, handle_t* arguments, handle_t** result);
int detach_thread(isolate_t* isolate, isolatethread_t* thread);
int tear_down_isolate(isolate_t* isolate);

Listing 3.1: Hydra Sandbox ABI in C.

This function interface is heavily inspired by existing function interfaces available in both commercial (Amazon Lambda (lambda), Azure Functions (functions), and Google Cloud Functions (gcf)) and open-source serverless platforms (such as OpenFaaS (openfaas), OpenWhisk (openwhisk), and Knative (knative)). We expect that running existing serverless functions on Hydra requires minor to no source code modifications.

3.2. Function Sandboxes

Hydra sandboxes are based on GraalVM Native Image (wimmer:2019) isolates, which we extended to support on-demand sandbox creation, payload copying, function invocation, and checkpoint/restore. Native Image is an Ahead-Of-Time Java compiler that we use to compile user functions into shared libraries. During the Native Image compilation process (from Java bytecode into a shared native library), we introduce compiler extensions that inject additional Application Binary Interface (ABI) functions into the final shared library.

Upon receiving a request to invoke a function, Hydra loads the function shared library using dlopen, if not already loaded. The shared library loads into memory the function code cache and the initial heap state (see Figure 4). The code cache contains the compiled user code that will be invoked through the registered entrypoint, and the initial heap state contains a set of global values that are copied into the isolate memory heap upon modification (copy-on-write) (wimmer:2019). Both the code cache and the initial heap state are shared among all memory isolate sandboxes of a single function. Different functions have separate code caches and isolate heap states.
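
For illustration, loading a function library and resolving its ABI entry points could be sketched as follows. This is a minimal sketch following the symbol names of Listing 3.1; function_library_path stands in for the registered library, and error handling is elided:

#include <dlfcn.h>

typedef int (*create_isolate_fn_t)(create_isolate_params_t*, isolate_t**);
typedef int (*invoke_function_fn_t)(isolate_t*, isolate_t*, handle_t*, handle_t*, handle_t**);

/* load the AOT-compiled function library and resolve its injected ABI symbols */
void *lib = dlopen(function_library_path, RTLD_NOW | RTLD_LOCAL);
create_isolate_fn_t create_isolate_fn =
    (create_isolate_fn_t) dlsym(lib, "create_isolate");
invoke_function_fn_t invoke_function_fn =
    (invoke_function_fn_t) dlsym(lib, "invoke_function");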

Listing 3.1 shows the most relevant functions exposed by shared libraries. The create_isolate function creates an isolate sandbox and returns a pointer to the new isolate (through the second function argument). Once an isolate is created, threads can be attached to it through the attach_thread function. This function creates a new thread stack in the given isolate and returns a pointer to the thread data structure (which can later be used to detach the thread from the isolate by calling detach_thread). Note that threads can be created in any isolate and will start executing in the same isolate as the parent thread. To copy the invocation arguments and return values, we use the copy function, which copies a serialized buffer of characters into a destination isolate and returns a handle to it. This handle can later be used to retrieve the copied values once inside the destination isolate.

Functions are invoked through invoke_function, passing the target isolate where the request will be handled, the isolate where the result value should be copied into (the Hydra isolate), the entrypoint, and the arguments handle (previously copied with copy). The function returns a handle that can be used to retrieve the result value back in Hydra.
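
Putting the ABI together, the lifecycle of a single sandboxed invocation can be sketched as follows. This is an illustrative sequence only, assuming host (Hydra's own isolate), entry (a handle to the registered entrypoint), and the serialized payload were obtained beforehand; error handling is elided:

isolate_t *guest;
isolatethread_t *thread;
handle_t *args, *result;

create_isolate(NULL, &guest);              /* new sandbox with default parameters */
attach_thread(guest, &thread);             /* give the calling thread a stack in the sandbox */
copy(guest, payload, payload_len, &args);  /* copy serialized arguments into the guest heap */
invoke_function(guest, host, entry, args, &result); /* result handle lives in the host isolate */
detach_thread(guest, thread);
tear_down_isolate(guest);                  /* or keep the sandbox warm for the next invocation */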

3.3. Multi-Language Support in Hydra

To run functions written in languages other than Java, Hydra takes advantage of Truffle (wuerthinger:2017), a Java-based language implementation framework. We extend the Truffle framework to i) transparently load and configure the correct Truffle language, ii) load the function code into the Truffle engine, and iii) integrate it with Hydra's sandbox ABI (introduced in Listing 3.1). We AOT-compile the Truffle language implementation, bundled together with the function code, into a dynamic library that is then registered into Hydra. The resulting library supports Hydra's sandbox ABI. The function entry point jumps to a call gate, injected by our compiler extensions, that enforces the correct translation of Java types into the target language types (used to pass invocation arguments and return values).

Truffle language implementations are interpreted and Just-In-Time (JIT) compiled. As a consequence, the code cache will be updated by the Truffle JIT compiler that comes built into the shared library. Since multiple isolate sandboxes of the same function share the code cache, the optimized JIT code will be shared across all active isolate sandboxes for functions written in Truffle languages.

3.4. Function Snapshotting

Function sandboxes become bloated when JIT-compiled languages are utilized (via Truffle). To reduce the startup time and memory footprint of these sandboxes, we propose snapshotting function sandboxes after the first few requests (for example, when a sandbox is about to be shut down due to inactivity). The snapshot needs to include not only the memory heap but also the code cache and initial heap state. This way, all the code interpretation and JIT compilation can be bypassed when the sandbox is restored.

Snapshotting a single sandbox is not trivial, as it requires saving all memory regions and resources that pertain to a single sandbox. Taking CRIU (criu) as a reference, C/R is done at the process level by inspecting the entire set of resources utilized by a process and dumping them to one or more files to be later used to restore the entire process. In Hydra, this is not possible as we want to snapshot a single sandbox, and it is not possible to determine which memory segments and files belong to a particular sandbox at checkpoint time. Moreover, Hydra does not control where and how the sandbox memory heap grows, nor where the code cache lives, as this would require invasive changes to Native Image and Truffle.

To overcome this challenge, Hydra tracks the execution of a function invocation selected for checkpointing. This is done by intercepting system calls and tracking their output. By compiling a list of all the resources utilized by the sandbox, it is possible to dump its state and restore it later. Our current implementation focuses on tracking memory mappings and open files, which we found to be sufficient to support functions in Hydra. Nevertheless, the proposed design is generic and can easily be extended to monitor more system calls and to checkpoint/restore other resources.


3.4.1. Tracing and Dumping a Function

Upon checkpointing a function, Hydra attaches a seccomp-bpf (seccomp) filter to the thread handling the function invocation. After installing the filter, the thread loads the function library (dlopen), creates an isolate, and invokes the function code. During this entire process, another thread stays behind receiving system call notifications. We monitor system calls related to memory (such as mmap, munmap, and mprotect) and files (such as openat and dup).

Upon receiving a system call notification, the monitor thread registers the arguments and the return value. This information is kept in a list in memory. In addition to keeping track of the received system calls, the monitor thread also keeps in memory the current state of memory mappings and open file descriptors belonging to the target sandbox.
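
As an illustration, the monitor's notification loop could be built on Linux's seccomp user-notification mechanism, sketched below. This is a sketch under stated assumptions, not Hydra's exact implementation: it assumes the filter was installed with seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_NEW_LISTENER, ...), which yields the listener file descriptor used here, and record_syscall is a hypothetical helper that appends to the in-memory log. Note that with SECCOMP_USER_NOTIF_FLAG_CONTINUE the supervisor observes arguments but not return values, so capturing return values requires additional machinery:

#include <linux/seccomp.h>
#include <sys/ioctl.h>
#include <string.h>

void record_syscall(int nr, const unsigned long long *args); /* hypothetical */

void monitor_loop(int listener_fd) {
    /* production code should size these structs via SECCOMP_GET_NOTIF_SIZES */
    struct seccomp_notif req;
    struct seccomp_notif_resp resp;
    for (;;) {
        memset(&req, 0, sizeof(req));
        if (ioctl(listener_fd, SECCOMP_IOCTL_NOTIF_RECV, &req) < 0)
            break;                                    /* monitored thread exited */
        record_syscall(req.data.nr, req.data.args);   /* log syscall number + args */
        memset(&resp, 0, sizeof(resp));
        resp.id = req.id;
        resp.flags = SECCOMP_USER_NOTIF_FLAG_CONTINUE; /* let the kernel run the real call */
        ioctl(listener_fd, SECCOMP_IOCTL_NOTIF_SEND, &resp);
    }
}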

For memory-related operations, the monitor keeps a list of memory mappings, including not only the range but also the current permissions and whether the memory in that mapping was ever writable. We keep this information at page granularity. A similar table is kept for file-related operations, tracking which file descriptors are in use.
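
A possible in-memory layout for these tables is sketched below; the exact layout is hypothetical, and the real design tracks writability per page rather than per mapping:

#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>

struct mapping {
    uintptr_t start;          /* base address of the mapping */
    size_t len;               /* length in bytes */
    int prot;                 /* current permissions */
    bool ever_writable;       /* true if write permission was ever enabled */
    struct mapping *next;
};

struct open_file {
    int fd;                   /* descriptor currently in use */
    char path[4096];          /* path the descriptor was opened with */
    off_t offset;             /* current file offset */
    struct open_file *next;
};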

The monitor thread waits until the monitored thread exits the function code and detaches from the isolate to proceed with the checkpoint operation. At this moment, the monitor starts by iterating over the list of system calls to remove unnecessary entries. For example, a file may be opened, its contents read into memory, and then closed. In this case, we do not need to keep the open and close system calls, as we are only trying to restore the final stage of execution of the function code. The same principle can be applied to memory operations, where mmap system calls whose returned address range was already unmapped can be safely ignored. The final list of system calls (including arguments and return values) is then written to a file used during the restore operation.

The second checkpoint step is to iterate over all memory mappings and copy the memory contents to a file. We can also optimize this process by copying only the mappings that could have been modified (i.e., those whose write permission was enabled at some point in time). Note that all the read-only memory will be restored when the system calls that loaded that memory are replayed. Two files result from a checkpoint operation: i) a file containing the set of system calls that need to be replayed upon a restore, and ii) the memory contents that need to be restored.
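
For illustration, the dump step over the mapping table sketched earlier could look as follows, with mappings, dump_fd, and the write_all helper being hypothetical stand-ins:

/* Dump only mappings that could have been modified; read-only memory is
   recreated later by replaying the recorded system calls. */
for (struct mapping *m = mappings; m != NULL; m = m->next) {
    if (!m->ever_writable)
        continue;
    write_all(dump_fd, &m->start, sizeof(m->start)); /* where to restore */
    write_all(dump_fd, &m->len, sizeof(m->len));     /* how many bytes */
    write_all(dump_fd, (void *)m->start, m->len);    /* the page contents */
}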

3.4.2. Restoring a Function from a Snapshot

Hydra can restore a function sandbox (including the code cache and the initial heap state) from the snapshot produced during a checkpoint operation. Hydra starts by loading the file containing the system calls that need to be replayed. For each system call, the restoring thread replays the system call and confirms that the output is the one expected. Note that for mmap system calls, we force the address to be the same as the one returned by the original mmap system call by passing the target address and the MAP_FIXED flag. After replaying all system calls, the memory contents and open files should match the original state.

After this initial step, the restoring thread mmaps the memory mapping file contents to the expected memory positions. This technique significantly speeds up the restore operation at the expense of some additional page faults during the first invocation of the function (note that this could be further optimized by pre-fetching the pages estimated to be necessary, as done in previous work (ustiugov:2021)).
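
A sketch of this replay-and-map sequence is shown below, assuming rec is one recorded mmap entry and snapshot_fd is the open memory-contents file; the record layout and field names are hypothetical:

#include <stdlib.h>
#include <sys/mman.h>

/* Replay a recorded mmap at its original address. */
void *addr = mmap((void *)rec->addr, rec->len, rec->prot,
                  rec->flags | MAP_FIXED, rec->fd, rec->offset);
if (addr == MAP_FAILED || addr != (void *)rec->addr)
    abort(); /* the snapshot cannot be restored at a different address */

/* Back ever-writable ranges with the snapshot file: contents are paged in
   lazily on first access instead of being copied eagerly. */
mmap((void *)rec->addr, rec->len, rec->prot,
     MAP_PRIVATE | MAP_FIXED, snapshot_fd, rec->snapshot_offset);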

3.4.3. Supporting Multiple Functions

The C/R mechanism described so far can easily fail if two function snapshots happen to require the same virtual address range. This can happen, for example, if two calls to mmap return the same address in two different executions of Hydra where each function was individually checkpointed. Without a mechanism to address this issue, a restore operation fails if the required memory range is already occupied.

To prevent this scenario, Hydra controls which memory ranges are given to each sandbox during the checkpoint phase. This is possible by manipulating the mmap arguments and forcing each function to operate on disjoint memory ranges. Note that this mechanism works because most runtimes reserve large blocks of memory that are then used internally to feed allocation pools (e.g., memory heaps).

Hydra reserves 1 TB of virtual memory for each function. This should be enough to accommodate the memory needs of any serverless function and, given a 64-bit address space, should also not pose any restrictions on the number of functions that can be loaded at the same time. We use a seed value that is unique for each function and that is used to pick a 1 TB range that does not overlap with any other function's virtual memory range. Moreover, this mechanism can be extended to create multiple snapshots of the same function using different virtual memory ranges as a way to reintroduce ASLR. That is, when restoring a function snapshot, the platform could randomly decide which snapshot to use to load the same function into different memory ranges.
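
The range assignment itself is straightforward, as sketched below; the base offset and the seed-to-range mapping are illustrative assumptions:

#include <stdint.h>

#define FUNC_RANGE_SIZE (1ULL << 40) /* 1 TB of virtual memory per function */
#define FUNC_RANGE_BASE (1ULL << 42) /* illustrative: keeps clear of the default mmap area */

/* Each function has a unique seed (chosen so the range stays within the
   usable virtual address space), so ranges are disjoint by construction. */
static uint64_t range_base(uint32_t seed) {
    return FUNC_RANGE_BASE + (uint64_t)seed * FUNC_RANGE_SIZE;
}

/* During checkpoint, every mmap issued on behalf of the sandbox is rewritten to
   request an address inside [range_base(seed), range_base(seed) + FUNC_RANGE_SIZE). */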

3.5. Forked Function Sandbox

Hydra also supports creating sandboxes with stronger isolation by handling function invocations in sub-processes. Such sandboxes are necessary to isolate functions that execute unmanaged code, i.e., code that executes outside the sandbox and that may not be ready to receive multiple concurrent invocations. For example, multiple functions executing managed code (Python code, for example) can have their execution properly contained using memory isolates, as the runtime controls in which memory heap objects are allocated, which threads belong to which isolate, etc. If a particular thread invokes a native method, the isolation guarantees enforced by the runtime no longer apply, and the thread may be able to access memory that belongs to other isolates. Furthermore, external libraries may also contain state that is not properly isolated across concurrent function invocations. Deciding whether a function should execute in a forked sandbox is outside the scope of this work but could be done by checking if a function includes native dependencies at compile or deployment time.

Functions run in forked sandboxes are initially loaded into the parent Hydra process. This ensures that both the code cache and the initial heap state are shared in a copy-on-write manner with all sandboxes deployed in child processes. Upon creating a new sandbox, Hydra invokes the Native Sandbox Interface (NSI in Figure 4); if process isolation is enabled, a forked process is created. To communicate between the parent and child processes, we use two pipes (returned by pipe): one to send invocation requests from the parent to the child, and one to receive the invocation result from the child in the parent. Both pipes are created right before forking. Upon creating a new Fork sandbox, the child process closes parent file descriptors (e.g., the file descriptor used to receive REST requests) and resets signal handlers. Then it creates a new sandbox isolate and finally blocks on the reading pipe, waiting for the arguments of a function invocation. The parent side simply creates a process handle with references to both pipes.
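
The fork sequence can be sketched as follows; close_inherited_fds, reset_signal_handlers, and serve_invocations are hypothetical helpers standing in for the steps described above:

#include <unistd.h>

int req_pipe[2], res_pipe[2]; /* parent-to-child requests, child-to-parent results */
pipe(req_pipe);
pipe(res_pipe);

pid_t pid = fork();
if (pid == 0) {                          /* child: the forked sandbox */
    close(req_pipe[1]);                  /* keep only the ends this side needs */
    close(res_pipe[0]);
    close_inherited_fds();               /* hypothetical: e.g., the REST socket */
    reset_signal_handlers();             /* hypothetical */
    isolate_t *iso;
    create_isolate(NULL, &iso);          /* from the sandbox ABI (Listing 3.1) */
    serve_invocations(req_pipe[0], res_pipe[1], iso); /* hypothetical: blocks on reads */
    _exit(0);
}
/* parent: keep the request write end and the result read end */
close(req_pipe[0]);
close(res_pipe[1]);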

4. High-Density Serverless Platform

In addition to Hydra, we built a simple serverless platform to evaluate how co-location affects scheduling policies and deployment density. The top-level component of our platform is the scheduler, which acts as the entry point for the platform. The scheduler controls worker nodes running Node Managers, which in turn manage Hydra instances capable of executing the actual function code. Figure 6 shows the high-level architecture of the serverless platform.

4.1. Node Manager

The node manager governs the Hydra instances in a local node. It is responsible for managing the lifetime of Hydra instances, scheduling invocations across those instances, and forwarding invocation requests to them. When scheduling invocations, the node manager prioritizes co-locating invocations coming from the same tenant in a single Hydra instance to benefit from sharing runtime components and avoiding cold starts. If the instance cannot accommodate another incoming invocation, a new instance is picked to serve it. After an invocation finishes, the node manager keeps the Hydra instance alive for a configurable amount of time to keep it warm for a potential subsequent invocation.


Instead of creating a Hydra instance on demand right before serving an incoming invocation, the node manager fetches a Hydra instance from a configurable fixed-size pool of pre-allocated Hydra instances. The node manager is responsible for maintaining this pool. A periodic background job monitors the state of this pool to prevent it from becoming completely depleted. If the pool drops to 30% of its initial capacity, this job reclaims the 15% least-recently-used warm Hydra instances to replenish the pool with fresh instances. The rest of the warm Hydra instances remain untouched and ready to serve future invocations. Such an approach reduces the chances of experiencing a cold start on the critical path of a function invocation, thus reducing the tail latency of function requests.
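
In code, the maintenance job amounts to little more than the following sketch; the thresholds are from the text, while the pool structure and helper names are hypothetical:

/* Periodic pool maintenance with the 30%/15% thresholds described above. */
void maintain_pool(struct instance_pool *pool) {
    if (pool->available < 0.30 * pool->initial_capacity) {
        int n = (int)(0.15 * pool->initial_capacity);
        reclaim_lru_warm_instances(n); /* hypothetical: shut down the n LRU warm instances */
        refill_pool(pool, n);          /* hypothetical: start n fresh instances */
    }
}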

4.2. Scheduler

The scheduler governs a fixed number of nodes and keeps track of the state of each node to make appropriate scheduling decisions and balance the load. The state of a node includes up-to-date information about the node's active invocations and registered tenants and functions. The scheduler is aware of Hydra's co-location capabilities. Instead of merely balancing the load, it tries to group functions coming from the same tenant in the same nodes to make the best use of Hydra at the node level.

The scheduler estimates the memory utilization of each node and ensures that sending a new invocation to a node will not exceed the node's memory limit. The platform supports three main runtime modes: tenant-based co-location (default for Hydra), single-function co-location (only invocations of the same function can be co-located), and no co-location. For each mode, memory utilization is calculated in a different way. Due to space constraints, we focus on the tenant-based co-location mode, which can be calculated as follows:

$mem_{node} = \left( \sum_{t}^{T} \left\lceil \sum_{f}^{F} (size_f \times inv_f) / size_i \right\rceil \right) \times size_i$

where

$mem_{node}$ : memory consumption of the node
$size_f$ : maximum memory for function $f$
$inv_f$ : number of active invocations of function $f$
$size_i$ : maximum memory footprint of a Hydra instance

First, the overall memory consumption of active functions for each tenant t is calculated. Then, this value is divided by the size of the Hydra instance and rounded up to get the number of Hydra instances for each tenant t. Finally, the total number of all tenants’ instances is multiplied by the size of the instance to get the node’s memory consumption.
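
For illustration, a direct transcription of this computation into code could look as follows; the data layout is a hypothetical assumption:

#include <math.h>

double node_memory(const struct tenant *tenants, int num_tenants, double size_i) {
    double instances = 0; /* total Hydra instances needed on the node */
    for (int t = 0; t < num_tenants; t++) {
        double tenant_mem = 0; /* active memory of tenant t */
        for (int f = 0; f < tenants[t].num_functions; f++)
            tenant_mem += tenants[t].fn[f].size * tenants[t].fn[f].active_invocations;
        instances += ceil(tenant_mem / size_i); /* instances needed for tenant t */
    }
    return instances * size_i; /* mem_node */
}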

5. Evaluation

This section studies the performance of Hydra and compares it to other state-of-the-art runtimes. We start by measuring the density of different runtimes using a variety of serverless benchmarks (Section 5.1). Then, we replay a real serverless invocation trace on a local platform (Section 5.2) to assess the platform-wide impact of Hydra and our tenant-based scheduling algorithm. Table 1 lists the benchmarks used to evaluate Hydra. We combine benchmarks from SeBS (copik:2021) (Python and JavaScript benchmarks) and Photons (dukic:2020) (Java benchmarks).

Language | Benchmarks
JavaScript (js) | helloworld (hw), dynamic-html (html), upload, thumbnail
Python (py) | helloworld (hw), dynamic-html (html), thumbnail, upload, compress, video-processing (video), minimum spanning tree (mst), breadth-first search (bfs), pagerank (pr), dna
Java (jv) | helloworld (hw), filehashing (hash), restapi (rest), video-processing (video), classify

5.1. Function Sandbox Density


This experiment analyzes the density of four different types of serverless runtimes:

  • OpenWhisk (openwhisk) runtimes used to handle Java (JVM), JavaScript (NodeJS), and Python (CPython) invocations. These runtimes are state-of-the-art native language implementations. Each of these runtimes can handle invocations of a single function, one invocation at a time;

  • NIFT, a variant of Hydra that does not allow multiple functions to execute in the same instance. This baseline mimics what is possible to achieve today by combining GraalVM Native Image and Truffle. Similarly to OpenWhisk runtimes, this variant can handle invocations of a single function, one invocation at a time;

  • Forking, a variant of Hydra inspired by SAND (akkus:2018). In this system, multiple invocations of a single function can be served concurrently. Each invocation is handled in a fork of the parent process that pre-loaded the function code. We use this runtime to compare the overheads of process forking against isolate sandboxes;

  • Hydra, capable of handling invocations of multiple different functions at the same time. We limit Hydra to using memory isolate sandboxes.

Our goal is to measure how much throughput can be produced on a single core with 2 GB of memory (we use a 1:2 core-to-memory ratio inspired by the ratio used in Amazon Lambda (lambda)). Our target metric, density, is defined as the ratio of throughput to memory (ops/GB-sec). Since several benchmarks are I/O-bound (e.g., jv/hash, js/upload, py/hw), we allow concurrent requests to share those resources to improve efficiency. To do so, we add additional concurrent requests until the efficiency of the runtime stops increasing. For OpenWhisk and NIFT, handling multiple requests concurrently means having several instances running with a fraction of a core and a fraction of 2 GB. For Hydra and Forking, we use a single instance with multiple sandboxes.

Experiments are conducted on a single cluster node running Linux kernel 5.15.0, equipped with an Intel(R) Xeon(R) Gold 532 and 188 GB of DDR4 DRAM. CPU frequency scaling and hyper-threading are disabled, and runtimes are pinned to a single core. Each runtime runs inside a Docker container with the aforementioned CPU and memory limits. Throughput is measured using Apache Bench (https://httpd.apache.org/docs/2.4/programs/ab.html), and memory represents the sum of the RSS of the runtime (and child processes, if any). Results are presented in Figure 7. Each bar represents an average of 5 iterations, where each iteration includes 25 requests.

5.1.1. Java Benchmarks

Hydra presents higher efficiency compared to the other runtimes. Compared to OpenWhisk, for the Java I/O-bound benchmarks (hw, hash, and rest), Hydra reduces the number of runtime instances to one, therefore significantly reducing the memory footprint. In addition, CPU multiplexing improves by having a full core shared across several isolates, compared to having multiple quotas of a core for each container. Forking has the closest performance to Hydra. The slight density degradation comes mostly from frequent context switching and the increased memory footprint caused by duplicated pages in the child process.


5.1.2. JavaScript and Python Benchmarks

Similarly to the Java benchmarks, Hydra achieves higher throughput and a lower memory footprint in most benchmarks. It allows multiple isolates to share the same Truffle JIT compiler and benefits from sandbox snapshotting to bypass JIT-compilation overheads. Hydra is followed by Forking and by NIFT. In several Python benchmarks, however, OpenWhisk registers higher density. After analyzing these runs, we conclude that these results stem from performance problems in Truffle's Python implementation, which is still experimental. These problems lead mainly to low throughput and high memory footprint, in particular in py/html, py/thumbnail, py/compress, and py/video. Overall, Hydra improves the average density by 4.47× compared to the OpenWhisk runtimes.

5.1.3. Function Snapshotting Impact on Density

We now evaluate the impact of function snapshotting in Hydra by comparing the latency and memory footprint required to handle a cold start (launching a new sandbox and invoking the function). Note that for sandboxes created from scratch, this may require code interpretation and JIT-compilation.

Results are depicted in Figure 8. Both plots include Cold (vanilla sandboxes that suffer from a cold start) and Snap (restored sandboxes that resume execution past a cold start). The latency in Snap includes both the restore latency and the function request latency. We omit some benchmarks (such as jv/video and jv/classify) that require process forking, something that our snapshotting mechanism does not support yet. Also note the log scale on both y-axes. Overall, snapshotting reduces the latency to serve the first request by 36.62× and reduces the memory footprint by 3.15×.

5.2. Real-World Experiment

So far, we have analyzed the performance of different serverless runtimes when executing a single function. However, in real environments, platforms deal with concurrent requests to different functions. To recreate such an environment, we take advantage of the public Azure Functions trace (sharad:2020). The trace provides data about function invocations, including the function identifier, the duration, the memory footprint, and the tenant associated with the function.

5.2.1. Top-Level Scheduling Setup


As stated in Section 4.2, the top-level component of the platform schedules requests in an attempt to group functions from the same tenant in the same node. To show that grouping tenants is more beneficial when using Hydra compared to a traditional round-robin scheduler, we simulated the execution of the Azure trace for 10 minutes with 200 nodes; each node has the capacity to run 40 concurrent Hydra instances, and each instance can co-locate eight parallel invocations. Figure 9 depicts the difference between Hydra-aware and round-robin scheduling with regard to the number of cold starts and the memory consumed by the active invocations across all nodes. We observe that the number of cold starts decreases by 2.8 times, and the memory footprint of active invocations decreases by 3.6 times on average.

According to the Azure Functions study (sharad:2020), a minority of functions generates the vast majority of all invocations. Therefore, with tenant-grouping scheduling, the distribution of invocations and registered functions across the nodes of the serverless platform may be non-uniform, as some functions generate more load than others. Figure 10 demonstrates that the nodes with smaller indexes tend to have few functions registered, although these functions receive more invocations than others. The nodes with bigger indexes show the opposite situation: they tend to have more different functions registered, but these functions are invoked infrequently.

5.2.2. Reproducing a Real-World Trace

We select a 1-hour window that best represents the average user activity in serverless. We reproduce this segment on a local cluster machine running Ubuntu 18.04.6 LTS (Linux kernel 5.8.5-050805) equipped with an Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz and 128GB of DDR4 DRAM. CPU frequency scaling and hyper-threading are disabled. As the trace does not describe the actual function code executed, we use a Java-based function that performs hashing of the input data.

In this experiment, we replay the same serverless invocation trace with four execution modes: Hydra (co-locating all functions from the same tenant), Hydra single-function (co-locating only parallel invocations of the same function), Hydra single-invocation (not co-locating invocations), and the OpenWhisk (openwhisk) runtime. With the Hydra and Hydra single-function modes, each node of the serverless platform runs up to 40 Hydra instances, each instance can co-locate up to 8 concurrent isolated invocations, and each instance uses up to one core and 2 GB of memory. With the Hydra single-invocation and OpenWhisk modes, each node runs up to 320 Hydra or OpenWhisk instances, respectively, and each instance uses up to 1/8 of a core and 256 MB of memory.

Results in Figure 11 include the number of active instances (top left), the total memory consumption of the node (bottom left), the number of hot (picking a previously used instance), warm (fetching an instance from the pool described in Section 4.1), and cold (launching a new instance) starts of the instances (top right), and the user request latency (bottom right). The left part of each plot shows the results of the trace execution from the node that has a few frequently invoked functions (Node #1), and the right part shows the results from the node with hundreds of infrequently invoked functions (Node #2).

Overall, by consolidating invocations of different functions in Hydra instances, the platform significantly reduces the number of active instances without degrading throughput, leading to a memory reduction of 2.1 times on average compared to OpenWhisk in both nodes. In Node #2, the Hydra single-function mode requires more instances than plain Hydra, since the latter offers wider opportunities for co-location. The memory footprint of the Hydra single-invocation mode is comparable to the OpenWhisk execution, as these modes use the same number of instances, but Hydra instances have a smaller footprint.

In Node #1, plain Hydra and Hydra single-function have similar opportunities to co-locate, as there are only a few popular functions. Thus, these modes experience fewer cold starts compared to the execution modes that do not support co-location. However, since Node #2 has hundreds of infrequently invoked functions registered, the 40 Hydra instances are not fully utilized and are constantly re-created to accommodate all functions and tenants. The lower number of launched stacks results in reduced tail latency. Compared to OpenWhisk, Hydra reduces the number of cold starts by 48 times in Node #1 and by four times in Node #2. Figure 11 shows that user request latency is reduced by Hydra in both nodes.

5.3. Discussion

Hydra presents key advantages compared to other runtimes. First, it deploys AOT-compiled function code, which is faster to boot, has no warmup time, and requires a smaller memory footprint (Section 5.1.1). It also allows Truffle languages to reuse profiles and optimized code across invocations, further reducing redundant profiling and optimization work (Section 5.1.2). Finally, it can host multiple invocations concurrently, ensuring that the virtualization infrastructure is not duplicated, resulting in lower memory utilization and fewer cold starts (Section 5.2).

6. Related Work

Runtime Forking and Snapshotting have been widely studied in the context of serverless. SAND (akkus:2018) and SOCK (oakes:2018) rely on process forking to minimize cold start invocation latency and reduce memory footprint by sharing runtime libraries. Other systems, such as FaaSnap (ao:2022), SEUSS (cadden:2020), Catalyzer (du:2020), vHive (ustiugov:2021), Prebaking (silva:2020), Fireworks (shin:2022), and Medes (saxena:2022), proposed different snapshotting techniques to reduce cold start latency and memory footprint.

Hydra aggregates multiple function invocations inside lightweight sandboxes in a single virtualization stack. Hydra allows applications to use process-level isolation, but can also support in-process isolate sandboxes, which are faster to launch, use less memory, and deliver higher throughput (Section 5.1). Snapshotting is orthogonal to Hydra's contributions, and it can be used together with Hydra (as we do in Section 5.1) to reduce the cold start latency of a new Hydra runtime. Snapshotting alone, however, does not reduce resource redundancy, as multiple virtualization stacks are still launched from a common snapshot.

Runtime Pre-warmup has also been studied with the goal of launching runtimes just before invocations are received (mahgoub:2022), thus hiding the cold start latency. Others have studied how to reduce the pre-warmup cost by taking advantage of heterogeneous resources (roy:2022). Mohan et al. (mohan:2019) proposed a pool of containers that could be utilized when function invocations arrive. Runtime pre-warmup is yet another technique that could be used in combination with Hydra.

Replacing multiple runtimes that each implement a single language with a single runtime that implements multiple languages makes it easier to manage runtime images. By doing so, predicting which runtime to use for the next invocation becomes a lesser problem. In addition, a pool of Hydra runtimes could be used to accommodate any function invocation as long as there is a Truffle implementation for the language the function is implemented in.

Virtualizing Runtimes allows one or more functions to share a single runtime. Sharing, however, can be achieved with different levels of isolation. Crucial (pons:2019) and Nightcore (jia:2021) handle multiple function invocations in a single runtime using different threads. Photons (dukic:2020) also runs concurrent function invocations in a single runtime but applies Java bytecode transformations to ensure isolation (i.e., concurrent function invocations cannot interact with other co-executing invocations). Boucher et al. (boucher:2018) rely on Rust's memory safety guarantees to handle multiple invocations concurrently in isolation. Cloudflare Workers (cloudfare) rely on V8 memory isolates to run concurrent JavaScript functions in the same V8 runtime instance. Faasm (shillaker:2020) and Gackstatter et al. (gackstatter:2022) proposed using WebAssembly (haas:2017) to execute multiple invocations in isolation. WebAssembly, however, still presents several limitations that prevent modern high-level languages (Java, JavaScript, and Python) from being supported (see Section 2.2 for more details). As a result, runtimes such as Faasm need to compile a Python runtime such as CPython to WebAssembly in order to run Python functions. If concurrent invocations are to be handled, multiple runtimes will run concurrently in different WebAssembly execution environments, leading to resource redundancy.

Hydra enforces isolation through memory isolates. However, compared to Cloudflare Workers, Hydra is not restricted to JavaScript functions; it supports multi-threaded function code (in Cloudflare Workers, multi-threading was disabled as a security measure resulting from designing the system to accommodate multiple tenants (workerd)); it proposes AOT compilation for user functions; and it offers built-in process-level isolation. As opposed to Faasm, Hydra relies on Truffle for multi-language support and, therefore, concurrent invocations always share the underlying runtime. In addition, to further share JIT compilation and profiling, Hydra shares the code caches among concurrent invocations of the same function.

Minimal Runtimes. Although we use Firecracker VMs to isolate different tenants, the proposed techniques are not specific to Firecracker nor to system VMs. Thus, container-based solutions such as Cntr (thalheim:2018) and gVisor (gvisor) could also be used. Unikernel VMs (manco:2017; koller:2017; zhang:2018) are also a possible alternative to reduce Hydra's cold start. However, supporting functions whose dependencies are only known at run time creates additional challenges in reducing the components to include at VM image generation time. As a consequence, Hydra includes a full Linux kernel and requires functions to package any user-level library dependencies along with the function code.

Profile and Code Cache Sharing reduces code profiling and compilation overheads. Profile sharing has been proposed by Arnold et al. (arnold:2005) and Ottoni et al. (ottoni:2021) as a way to re-use profiling information. By doing so, future invocations do not need to repeat all the code profiling steps and can benefit from the profiles gathered by previous invocations. A similar technique has also been used to share Just-in-Time compiled code (khrabrov:2022; xu:2018) among different invocations of the same application/function. Such techniques could be integrated into Hydra to share Truffle code caches. However, solutions such as JITServer (khrabrov:2022) and ShareJIT (xu:2018) are designed to optimize long-running applications and are not prepared to deal with the challenges inherent to fast-executing and lightweight serverless functions (carreira:2021).

7. Conclusions

Hydra is a multi-language runtime designed to run serverless functions in ultra-lightweight sandboxes. By doing so, it reduces virtualization stack redundancy, minimizing the memory footprint and the number of cold starts. We demonstrate that real serverless workloads are extremely sparse and that aggregating functions from the same user can unlock significant savings in serverless platforms. We are collaborating with a large cloud provider to incorporate Hydra into their serverless platform.

References

  • [1] Understanding container reuse in AWS Lambda. https://aws.amazon.com/blogs/compute/container-reuse-in-lambda/, 2014. Accessed: 2023-04-20.
  • [2] Understanding serverless cold start. https://azure.microsoft.com/en-us/blog/understanding-serverless-cold-start/, 2018. Accessed: 2023-04-20.
  • [3] Amazon Lambda. https://aws.amazon.com/lambda/, 2022. Accessed: 2023-04-20.
  • [4] Azure Functions. https://azure.microsoft.com/en-us/services/functions, 2022. Accessed: 2023-04-20.
  • [5] Cloudflare Workers. https://workers.cloudflare.com/, 2022. Accessed: 2023-04-20.
  • [6] The Cloudflare Workers security model. https://blog.cloudflare.com/mitigating-spectre-and-other-security-threats-the-cloudflare-workers-security-model/, 2022. Accessed: 2023-04-20.
  • [7] Control groups. https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v1/cgroups.html, 2022. Accessed: 2023-04-20.
  • [8] Docker. https://www.docker.com/, 2022. Accessed: 2023-04-20.
  • [9] For the love of serverless: Lambda adoption by runtime. https://newrelic.com/resources/ebooks/serverless-benchmark-report-aws-lambda-2020, 2022. Accessed: 2023-04-20.
  • [10] Google Cloud Functions. https://cloud.google.com/functions, 2022. Accessed: 2023-04-20.
  • [11] GraalVM. https://www.graalvm.org/, 2022. Accessed: 2023-04-20.
  • [12] gVisor. https://gvisor.dev/, 2022. Accessed: 2023-04-20.
  • [13] Intel 64 and IA-32 architectures. https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software.pdf, 2022. Accessed: 2023-04-20.
  • [14] OpenFaaS - serverless functions, made simple. https://www.openfaas.com/, 2022. Accessed: 2023-04-20.
  • [15] OpenWhisk - open source serverless cloud platform. https://openwhisk.apache.org/, 2022. Accessed: 2023-04-20.
  • [16] QEMU. https://www.qemu.org/, 2022. Accessed: 2023-04-20.
  • [17] Serverless containers in Kubernetes environments. https://knative.dev/, 2022. Accessed: 2023-04-20.
  • [18] V8. https://v8.dev/, 2022. Accessed: 2023-04-20.
  • [19] CRIU. https://criu.org/, 2024. Accessed: 2024-03-02.
  • [20] Seccomp BPF (secure computing with filters). https://www.kernel.org/doc/html/v4.19/userspace-api/seccomp_filter.html, 2024. Accessed: 2024-03-02.
  • [21] Alexandru Agache, Marc Brooker, Alexandra Iordache, Anthony Liguori, Rolf Neugebauer, Phil Piwonka, and Diana-Maria Popa. Firecracker: Lightweight virtualization for serverless applications. In 17th USENIX Symposium on Networked Systems Design and Implementation (NSDI 20), pages 419–434, Santa Clara, CA, February 2020. USENIX Association.
  • [22] Istemi Ekin Akkus, Ruichuan Chen, Ivica Rimac, Manuel Stein, Klaus Satzke, Andre Beck, Paarijaat Aditya, and Volker Hilt. SAND: Towards high-performance serverless computing. In 2018 USENIX Annual Technical Conference (USENIX ATC 18), pages 923–935, Boston, MA, July 2018. USENIX Association.
  • [23] Lixiang Ao, George Porter, and Geoffrey M. Voelker. FaaSnap: FaaS made fast using snapshot-based VMs. In Proceedings of the Seventeenth European Conference on Computer Systems, EuroSys '22, pages 730–746, New York, NY, USA, 2022. Association for Computing Machinery.
  • [24] Matthew Arnold, Adam Welc, and V.T. Rajan. Improving virtual machine performance using a cross-run profile repository. In Proceedings of the 20th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA '05, pages 297–311, New York, NY, USA, 2005. Association for Computing Machinery.
  • [25] Daniel Barcelona-Pons, Marc Sánchez-Artigas, Gerard París, Pierre Sutra, and Pedro García-López. On the FaaS track: Building stateful distributed applications with serverless architectures. In Proceedings of the 20th International Middleware Conference, Middleware '19, pages 41–54, New York, NY, USA, 2019. Association for Computing Machinery.
  • [26] Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, Ian Pratt, and Andrew Warfield. Xen and the art of virtualization. In Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles, SOSP '03, pages 164–177, New York, NY, USA, 2003. Association for Computing Machinery.
  • [27] Sol Boucher, Anuj Kalia, David G. Andersen, and Michael Kaminsky. Putting the "micro" back in microservice. In 2018 USENIX Annual Technical Conference (USENIX ATC 18), pages 645–650, Boston, MA, July 2018. USENIX Association.
  • [28] James Cadden, Thomas Unger, Yara Awad, Han Dong, Orran Krieger, and Jonathan Appavoo. SEUSS: Skip redundant paths to make serverless fast. In Proceedings of the Fifteenth European Conference on Computer Systems, EuroSys '20, New York, NY, USA, 2020. Association for Computing Machinery.
  • [29] Joao Carreira, Pedro Fonseca, Alexey Tumanov, Andrew Zhang, and Randy Katz. Cirrus: A serverless framework for end-to-end ML workflows. In Proceedings of the ACM Symposium on Cloud Computing, SoCC '19, pages 13–24, New York, NY, USA, 2019. Association for Computing Machinery.
  • [30] Joao Carreira, Sumer Kohli, Rodrigo Bruno, and Pedro Fonseca. From warm to hot starts: Leveraging runtimes for the serverless era. In Proceedings of the Workshop on Hot Topics in Operating Systems, HotOS '21, pages 58–64, New York, NY, USA, 2021. Association for Computing Machinery.
  • [31] Paul Castro, Vatche Ishakian, Vinod Muthusamy, and Aleksander Slominski. The rise of serverless computing. Communications of the ACM, 62(12):44–54, November 2019.
  • [32] Marcin Copik, Grzegorz Kwasniewski, Maciej Besta, Michal Podstawski, and Torsten Hoefler. SeBS: A serverless benchmark suite for function-as-a-service computing. In Proceedings of the 22nd International Middleware Conference, Middleware '21, pages 64–78, New York, NY, USA, 2021. Association for Computing Machinery.
  • [33] Dong Du, Tianyi Yu, Yubin Xia, Binyu Zang, Guanglu Yan, Chenggang Qin, Qixuan Wu, and Haibo Chen. Catalyzer: Sub-millisecond startup for serverless computing with initialization-less booting. In Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS '20, pages 467–481, New York, NY, USA, 2020. Association for Computing Machinery.
  • [34] Vojislav Dukic, Rodrigo Bruno, Ankit Singla, and Gustavo Alonso. Photons: Lambdas on a diet. In Proceedings of the 11th ACM Symposium on Cloud Computing, SoCC '20, pages 45–59, New York, NY, USA, 2020. Association for Computing Machinery.
  • [35] Sadjad Fouladi, Riad S. Wahby, Brennan Shacklett, Karthikeyan Vasuki Balasubramaniam, William Zeng, Rahul Bhalerao, Anirudh Sivaraman, George Porter, and Keith Winstein. Encoding, fast and slow: Low-latency video processing using thousands of tiny threads. In Proceedings of the 14th USENIX Conference on Networked Systems Design and Implementation, NSDI '17, pages 363–376, USA, 2017. USENIX Association.
  • [36] Philipp Gackstatter, Pantelis A. Frangoudis, and Schahram Dustdar. Pushing serverless to the edge with WebAssembly runtimes. In 2022 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid), pages 140–149, 2022.
  • [37] Yu Gan, Yanqi Zhang, Dailun Cheng, Ankitha Shetty, Priyal Rathi, Nayan Katarki, Ariana Bruno, Justin Hu, Brian Ritchken, Brendon Jackson, Kelvin Hu, Meghna Pancholi, Yuan He, Brett Clancy, Chris Colen, Fukang Wen, Catherine Leung, Siyuan Wang, Leon Zaruvinsky, Mateo Espinosa, Rick Lin, Zhongling Liu, Jake Padilla, and Christina Delimitrou. An open-source benchmark suite for microservices and their hardware-software implications for cloud & edge systems. In ASPLOS '19, pages 3–18, New York, NY, USA, 2019. Association for Computing Machinery.
  • [37]YuGan, Yanqi Zhang, Dailun Cheng, Ankitha Shetty, Priyal Rathi, Nayan Katarki,Ariana Bruno, Justin Hu, Brian Ritchken, Brendon Jackson, Kelvin Hu, MeghnaPancholi, Yuan He, Brett Clancy, Chris Colen, f*ckang Wen, Catherine Leung,Siyuan Wang, Leon Zaruvinsky, Mateo Espinosa, Rick Lin, Zhongling Liu, JakePadilla, and Christina Delimitrou.An open-source benchmark suite for microservices and theirhardware-software implications for cloud & edge systems.ASPLOS ’19, page 3–18, New York, NY, USA, 2019. Association forComputing Machinery.
  • [38]Matthias Grimmer.High-performance language interoperability in multi-languageruntimes.SPLASH ’14, page 17–19, New York, NY, USA, 2014. Association forComputing Machinery.
  • [39]Matthias Grimmer, Roland Schatz, Chris Seaton, Thomas Würthinger, andHanspeter Mössenböck.Memory-safe execution of c on a java vm.In Proceedings of the 10th ACM Workshop on Programming Languagesand Analysis for Security, PLAS’15, page 16–27, New York, NY, USA, 2015.Association for Computing Machinery.
  • [40]Matthias Grimmer, Chris Seaton, Roland Schatz, Thomas Würthinger, andHanspeter Mössenböck.High-performance cross-language interoperability in a multi-languageruntime.In Proceedings of the 11th Symposium on Dynamic Languages, DLS2015, page 78–90, New York, NY, USA, 2015. Association for ComputingMachinery.
  • [41]Andreas Haas, Andreas Rossberg, DerekL. Schuff, BenL. Titzer, Michael Holman,Dan Gohman, Luke Wagner, Alon Zakai, and JFBastien.Bringing the web up to speed with webassembly.In Proceedings of the 38th ACM SIGPLAN Conference on ProgrammingLanguage Design and Implementation, PLDI 2017, page 185–200, New York, NY,USA, 2017. Association for Computing Machinery.
  • [42]Zhipeng Jia and Emmett Witchel.Nightcore: Efficient and scalable serverless computing forlatency-sensitive, interactive microservices.In Proceedings of the 26th ACM International Conference onArchitectural Support for Programming Languages and Operating Systems,ASPLOS ’21, page 152–166, New York, NY, USA, 2021. Association forComputing Machinery.
  • [43]Alexey Khrabrov, Marius Pirvu, Vijay Sundaresan, and Eyal deLara.JITServer: Disaggregated caching JIT compiler for the JVM inthe cloud.In 2022 USENIX Annual Technical Conference (USENIX ATC 22),pages 869–884, Carlsbad, CA, July 2022. USENIX Association.
  • [44]Ricardo Koller and Dan Williams.Will serverless end the dominance of linux in the cloud?In Proceedings of the 16th Workshop on Hot Topics in OperatingSystems, HotOS ’17, page 169–173, New York, NY, USA, 2017. Association forComputing Machinery.
  • [45]Swaroop Kotni, Ajay Nayak, Vinod Ganapathy, and Arkaprava Basu.Faastlane: Accelerating Function-as-a-Service workflows.In 2021 USENIX Annual Technical Conference (USENIX ATC 21),pages 805–820. USENIX Association, July 2021.
  • [46]James Larisch, James Mickens, and Eddie Kohler.Alto: Lightweight vms using virtualization-aware managed runtimes.In Proceedings of the 15th International Conference on ManagedLanguages and Runtimes, ManLang ’18, New York, NY, USA, 2018. Associationfor Computing Machinery.
  • [47]Florian Latifi, David Leopoldseder, Christian Wimmer, and HanspeterMössenböck.Compgen: Generation of fast jit compilers in a multi-language vm.In Proceedings of the 17th ACM SIGPLAN International Symposiumon Dynamic Languages, DLS 2021, page 35–47, New York, NY, USA, 2021.Association for Computing Machinery.
  • [48]Ashraf Mahgoub, EdgardoBarsallo Yi, Karthick Shankar, Sameh Elnikety, SomaliChaterji, and Saurabh Bagchi.ORION and the three rights: Sizing, bundling, and prewarming forserverless DAGs.In 16th USENIX Symposium on Operating Systems Design andImplementation (OSDI 22), pages 303–320, Carlsbad, CA, July 2022. USENIXAssociation.
  • [49]Filipe Manco, Costin Lupu, Florian Schmidt, Jose Mendes, Simon Kuenzer, SumitSati, Kenichi Yasukata, Costin Raiciu, and Felipe Huici.My vm is lighter (and safer) than your container.In Proceedings of the 26th Symposium on Operating SystemsPrinciples, SOSP ’17, page 218–233, New York, NY, USA, 2017. Associationfor Computing Machinery.
  • [50]Anup Mohan, Harshad Sane, Ksh*tij Doshi, Saikrishna Edupuganti, Naren Nayak,and Vadim Sukhomlinov.Agile cold starts for scalable serverless.In 11th USENIX Workshop on Hot Topics in Cloud Computing(HotCloud 19), Renton, WA, July 2019. USENIX Association.
  • [51]Ingo Müller, Renato Marroquín, and Gustavo Alonso.Lambada: Interactive data analytics on cold data using serverlesscloud infrastructure.In Proceedings of the 2020 ACM SIGMOD International Conferenceon Management of Data, SIGMOD ’20, page 115–130, New York, NY, USA, 2020.Association for Computing Machinery.
  • [52]Edward Oakes, Leon Yang, Dennis Zhou, Kevin Houck, Tyler Harter, AndreaArpaci-Dusseau, and Remzi Arpaci-Dusseau.SOCK: Rapid task provisioning with Serverless-Optimizedcontainers.In 2018 USENIX Annual Technical Conference (USENIX ATC 18),pages 57–70, Boston, MA, July 2018. USENIX Association.
  • [53]Guilherme Ottoni and Bin Liu.Hhvm jump-start: Boosting both warmup and steady-state performance atscale.In Proceedings of the 2021 IEEE/ACM International Symposium onCode Generation and Optimization, CGO ’21, page 340–350. IEEE Press, 2021.
  • [54]Soyeon Park, Sangho Lee, Wen Xu, HyunGon Moon, and Taesoo Kim.libmpk: Software abstraction for intel memory protection keys (intelMPK).In 2019 USENIX Annual Technical Conference (USENIX ATC 19),pages 241–254, Renton, WA, July 2019. USENIX Association.
  • [55]Matthew Perron, Raul CastroFernandez, David DeWitt, and Samuel Madden.Starling: A scalable query engine on cloud functions.In Proceedings of the 2020 ACM SIGMOD International Conferenceon Management of Data, SIGMOD ’20, page 131–141, New York, NY, USA, 2020.Association for Computing Machinery.
  • [56]Manuel Rigger, Matthias Grimmer, Christian Wimmer, Thomas Würthinger, andHanspeter Mössenböck.Bringing low-level languages to the jvm: Efficient execution of llvmir on truffle.In Proceedings of the 8th International Workshop on VirtualMachines and Intermediate Languages, VMIL 2016, page 6–15, New York, NY,USA, 2016. Association for Computing Machinery.
  • [57]RohanBasu Roy, Tirthak Patel, and Devesh Tiwari.Icebreaker: Warming serverless functions better with heterogeneity.In Proceedings of the 27th ACM International Conference onArchitectural Support for Programming Languages and Operating Systems,ASPLOS ’22, page 753–767, New York, NY, USA, 2022. Association forComputing Machinery.
  • [58]SalimS. Salim, Andy Nisbet, and Mikel Luján.Trufflewasm: A webassembly interpreter on graalvm.In Proceedings of the 16th ACM SIGPLAN/SIGOPS InternationalConference on Virtual Execution Environments, VEE ’20, page 88–100, NewYork, NY, USA, 2020. Association for Computing Machinery.
  • [59]VasilyA. Sartakov, Lluís Vilanova, David Eyers, Takahiro Shinagawa, andPeter Pietzuch.CAP-VMs: Capability-Based isolation and sharing in the cloud.In 16th USENIX Symposium on Operating Systems Design andImplementation (OSDI 22), pages 597–612, Carlsbad, CA, July 2022. USENIXAssociation.
  • [60]Divyanshu Saxena, Tao Ji, Arjun Singhvi, Junaid Khalid, and Aditya Akella.Memory deduplication for serverless computing with medes.In Proceedings of the Seventeenth European Conference onComputer Systems, EuroSys ’22, page 714–729, New York, NY, USA, 2022.Association for Computing Machinery.
  • [61]Johann Schleier-Smith, Vikram Sreekanti, Anurag Khandelwal, Joao Carreira,NeerajaJ. Yadwadkar, RalucaAda Popa, JosephE. Gonzalez, Ion Stoica, andDavidA. Patterson.What serverless computing is and should become: The next phase ofcloud computing.Commun. ACM, 64(5):76–84, apr 2021.
  • [62]Mohammad Shahrad, Rodrigo Fonseca, Inigo Goiri, Gohar Chaudhry, Paul Batum,Jason Cooke, Eduardo Laureano, Colby Tresness, Mark Russinovich, and RicardoBianchini.Serverless in the wild: Characterizing and optimizing the serverlessworkload at a large cloud provider.In 2020 USENIX Annual Technical Conference (USENIX ATC 20),pages 205–218. USENIX Association, July 2020.
  • [63]Vaishaal Shankar, Karl Krauth, Kailas Vodrahalli, Qifan Pu, Benjamin Recht, IonStoica, Jonathan Ragan-Kelley, Eric Jonas, and Shivaram Venkataraman.Serverless linear algebra.In Proceedings of the 11th ACM Symposium on Cloud Computing,SoCC ’20, page 281–295, New York, NY, USA, 2020. Association for ComputingMachinery.
  • [64]Simon Shillaker and Peter Pietzuch.Faasm: Lightweight isolation for efficient stateful serverlesscomputing.In Proceedings of the 2020 USENIX Conference on Usenix AnnualTechnical Conference, USENIX ATC’20, USA, 2020. USENIX Association.
  • [65]Wonseok Shin, Wook-Hee Kim, and Changwoo Min.Fireworks: A fast, efficient, and safe serverless framework usingvm-level post-jit snapshot.In Proceedings of the Seventeenth European Conference onComputer Systems, EuroSys ’22, page 663–677, New York, NY, USA, 2022.Association for Computing Machinery.
  • [66]Paulo Silva, Daniel Fireman, and ThiagoEmmanuel Pereira.Prebaking functions to warm the serverless cold start.In Proceedings of the 21st International Middleware Conference,Middleware ’20, page 1–13, New York, NY, USA, 2020. Association forComputing Machinery.
  • [67]Jörg Thalheim, Pramod Bhatotia, Pedro Fonseca, and Baris Kasikci.Cntr: Lightweight OS containers.In 2018 USENIX Annual Technical Conference (USENIX ATC 18),pages 199–212, Boston, MA, July 2018. USENIX Association.
  • [68]Dmitrii Ustiugov, Plamen Petrov, Marios Kogias, Edouard Bugnion, and BorisGrot.Benchmarking, analysis, and optimization of serverless functionsnapshots.In Proceedings of the 26th ACM International Conference onArchitectural Support for Programming Languages and Operating Systems,ASPLOS ’21, page 559–572, New York, NY, USA, 2021. Association forComputing Machinery.
  • [69]Anjo Vahldiek-Oberwagner, Eslam Elnikety, NunoO. Duarte, Michael Sammler,Peter Druschel, and Deepak Garg.ERIM: Secure, efficient in-process isolation with protection keys(MPK).In 28th USENIX Security Symposium (USENIX Security 19), pages1221–1238, Santa Clara, CA, August 2019. USENIX Association.
  • [70]Liang Wang, Mengyuan Li, Yinqian Zhang, Thomas Ristenpart, and Michael Swift.Peeking behind the curtains of serverless platforms.In 2018 USENIX Annual Technical Conference (USENIX ATC 18),pages 133–146, Boston, MA, July 2018. USENIX Association.
  • [71]Christian Wimmer, Codrut Stancu, Peter Hofer, Vojin Jovanovic, PaulWögerer, PeterB. Kessler, Oleg Pliss, and Thomas Würthinger.Initialize once, start fast: Application initialization at buildtime.Proc. ACM Program. Lang., 3(OOPSLA), oct 2019.
  • [72]Christian Wimmer and Thomas Würthinger.Truffle: A self-optimizing runtime system.In Proceedings of the 3rd Annual Conference on Systems,Programming, and Applications: Software for Humanity, SPLASH ’12, page13–14, New York, NY, USA, 2012. Association for Computing Machinery.
  • [73]Jonathan Woodruff, RobertN.M. Watson, David Chisnall, SimonW. Moore, JonathanAnderson, Brooks Davis, Ben Laurie, PeterG. Neumann, Robert Norton, andMichael Roe.The cheri capability model: Revisiting risc in an age of risk.ISCA ’14, page 457–468. IEEE Press, 2014.
  • [74]Thomas Würthinger, Christian Wimmer, Christian Humer, Andreas Wöß,Lukas Stadler, Chris Seaton, Gilles Duboscq, Doug Simon, and MatthiasGrimmer.Practical partial evaluation for high-performance dynamic languageruntimes.In Proceedings of the 38th ACM SIGPLAN Conference on ProgrammingLanguage Design and Implementation, PLDI 2017, page 662–676, New York, NY,USA, 2017. Association for Computing Machinery.
  • [75]Thomas Würthinger, Christian Wimmer, Andreas Wöß, Lukas Stadler,Gilles Duboscq, Christian Humer, Gregor Richards, Doug Simon, and MarioWolczko.One vm to rule them all.In Proceedings of the 2013 ACM International Symposium on NewIdeas, New Paradigms, and Reflections on Programming and Software, Onward!2013, page 187–204, New York, NY, USA, 2013. Association for ComputingMachinery.
  • [76]Xiaoran Xu, Keith Cooper, Jacob Brock, Yan Zhang, and Handong Ye.Sharejit: Jit code cache sharing across processes and its practicalimplementation.Proc. ACM Program. Lang., 2(OOPSLA), oct 2018.
  • [77]Bennet Yee, David Sehr, Gregory Dardyk, J.Bradley Chen, Robert Muth, TavisOrmandy, Shiki Okasaka, Neha Narula, and Nicholas Fullagar.Native client: A sandbox for portable, untrusted x86 native code.In 2009 30th IEEE Symposium on Security and Privacy, pages79–93, 2009.
  • [78]Wei Zhang, Per Larsen, Stefan Brunthaler, and Michael Franz.Accelerating iterators in optimizing ast interpreters.In Proceedings of the 2014 ACM International Conference onObject Oriented Programming Systems Languages & Applications, OOPSLA ’14,page 727–743, New York, NY, USA, 2014. Association for Computing Machinery.
  • [79]Yiming Zhang, Jon Crowcroft, Dongsheng Li, Chengfen Zhang, Huiba Li, YaozhengWang, Kai Yu, Yongqiang Xiong, and Guihai Chen.KylinX: A dynamic library operating system for simplified andefficient cloud virtualization.In 2018 USENIX Annual Technical Conference (USENIX ATC 18),pages 173–186, Boston, MA, July 2018. USENIX Association.