Monday, February 25
 

8:00am EST

Continental Breakfast
Monday February 25, 2019 8:00am - 9:00am EST
Grand Ballroom Foyer

9:00am EST

Morning Tutorial 1: Understanding Large Scale Storage Systems
This tutorial is oriented toward administrators and developers who manage and use large-scale storage systems. An important goal of the tutorial is to give the audience the foundation for effectively comparing different storage system options, as well as a better understanding of the systems they already have.

Cluster-based parallel storage technologies are used to manage millions of files, thousands of concurrent jobs, and performance that scales from 10s to 100s of GB/sec. This tutorial will examine current state-of-the-art high-performance file systems and the underlying technologies employed to deliver scalable performance across a range of scientific and industrial applications.

The tutorial starts with a look at storage devices including traditional hard drives, SSD, and new non-volatile memory devices. Next, we look at how a file system is put together, comparing and contrasting SAN file systems, scale-out NAS, object-based parallel file systems, and cloud-based storage systems.

Topics include:
  • SSD technology
  • NVRAM
  • Scaling the data path
  • Scaling metadata
  • Fault tolerance
  • Manageability
  • Cloud storage

Speakers
BW

Brent Welch

Google
Brent Welch is a senior staff software engineer at Google, where he works on their public cloud system. He was Chief Technology Officer at Panasas and has also worked at Xerox-PARC and Sun Microsystems Laboratories. Brent has experience building software systems from the device driver... Read More →


Monday February 25, 2019 9:00am - 12:30pm EST
Constitution Ballroom A

9:00am EST

Morning Tutorial 2: Blockchain and Storage
This tutorial will cover the basics of blockchain, the issues blockchain has concerning storage, database usage with blockchain, and solutions to those storage issues.

Topics include:
  • The Basics of Blockchain
  • Storage Issues for Blockchain
  • Using Databases for offchain storage
  • Blockchain deployment & Backup/Recovery

Speakers
MA

Mike Ault

IBM
Mike Ault began work in the nuclear navy and moved into the civilian nuclear field in 1979. He has been working with computers and databases since 1980. In 1990 Mike started working with the Oracle database system. Mike has worked with flash systems since 2007 when he began consulting... Read More →


Monday February 25, 2019 9:00am - 12:30pm EST
Constitution Ballroom B

9:00am EST

Introduction to Storage for Containers
Containers and related technologies make it possible to manage computational resources at fine granularity and to increase the pace of software development, testing, and deployment, while at the same time improving the efficiency of infrastructure utilization. Recognizing these benefits, many enterprises are upgrading their technology by incorporating containers in their infrastructure and workflows.

As containerization technologies enter the enterprise market, they meet new functional demands. Providing and managing persistent, highly-available, yet nimble storage is a particularly important requirement. A number of new and existing companies and open-source projects are aggressively entering this arena. We expect that in the coming years the demand for professionals who are fluent in storage for containers will rise dramatically.

In our tutorial we plan to cover all major topics of storage for containers. We will first describe the structure of Docker's layered images, its local CoW-based storage, and Docker registry. We will then present the concept of persistent volumes and dynamic provisioning in Kubernetes. As part of the tutorial, we will use the insights and examples that we accumulated while working on adapting IBM's Spectrum Scale for containerization environments.

Speakers
VT

Vasily Tarasov

IBM Research
Vasily Tarasov is a Research Staff Member at IBM. His current research projects include storage for containers and high-performance file systems as a service. Vasily worked extensively on storage, file systems, data deduplication, performance and workload analysis. Vasily is an author... Read More →
DS

Dimitris Skourtis

IBM Research
Dimitris Skourtis is a Research Staff Member at IBM. His current work is around cloud orchestrators and persistent storage for containers. Prior to IBM, he worked on resource management at VMware, where he prototyped and shipped SIOCv2, a policy-driven storage scheduling solution... Read More →
TA

Ted Anderson

IBM Research
Ted Anderson is a Senior Software Engineer with IBM Research. Ted has extensive experience with several distributed file systems, most recently Spectrum Scale/GPFS. His recent work utilizes concurrency, caching, and delegation that guarantee correctness using distributed coherency... Read More →
AA

Ali Anwar

IBM Research
Ali Anwar is a research staff member at IBM Research. He received his Ph.D. in Computer Science from Virginia Tech. In his earlier years he worked as a tools developer (GNU GDB) at Mentor Graphics. Ali's research interests are in distributed computing systems, cloud storage management... Read More →


Monday February 25, 2019 9:00am - 12:30pm EST
Commonwealth Room

10:30am EST

Break with Refreshments
Monday February 25, 2019 10:30am - 11:00am EST
Grand Ballroom Foyer

12:30pm EST

Tutorial Luncheon
Monday February 25, 2019 12:30pm - 1:30pm EST
Back Bay Ballroom

1:30pm EST

Managed File Services in the Cloud: What to Use, Where, and Why?
This tutorial is targeted towards administrators and developers who would like to understand the latest developments in cloud-based managed file services. We'll start off with an overview of available offerings; we'll cover intended use cases and the pros and cons of each of the offerings, and we'll make a comparison with self-managed offerings. We'll also talk about what tools are available to move on-premises file storage solutions to the cloud.

Speakers
JS

Jacob Strauss

Amazon Web Services
Jacob Strauss is a Principal Engineer at Amazon Web Services, currently working on the Amazon Elastic File System. He has been at AWS since 2013 building distributed systems under the guise of storage services, and growing Amazon's Boston-area engineering teams. He received PhD and... Read More →
GJ

Geert Jansen

Amazon Web Services
Geert Jansen is a Senior Product Manager at Amazon Web Services where he works on Amazon EFS. He was Product Owner for Red Hat CloudForms and also worked at Ravello Systems and Royal Dutch Shell. He received an M.Sc. in Applied Physics from the Eindhoven University of Technology... Read More →


Monday February 25, 2019 1:30pm - 3:00pm EST
Commonwealth Room

1:30pm EST

Afternoon Tutorial 1: Advanced Persistent Memory Programming
Persistent Memory (“PM”) support is becoming ubiquitous in today’s operating systems and computing platforms. From Windows to Linux to open source, and from NVDIMM, PCI Express, storage-attached and network-attached interconnect access, it is available broadly across the industry. Its byte-addressability and ultra-low latency, combined with its durability, promise a revolution in storage and applications as they evolve to take advantage of these new platform capabilities.

The tutorial explores the concepts and today’s programming methodologies for PM, including the SNIA NonVolatile Memory Programming Model architecture, open source and native APIs, operating system support for PM such as direct access filesystems, and via language and compiler approaches. The software PM landscape is already rich and growing.

Additionally, the tutorial will explore the considerations when PM access is extended across fabrics such as networks, I/O interconnects, and other non-local access. While the programming paradigms remain common, the implications on latency, protocols, and especially error recovery are critically important to both performance and correctness. Understanding these requirements is of interest to both the system and application developer or designer.

Specific programming examples, fully functional on today’s systems, will be shown and analyzed. Concepts for moving new applications and storage paradigms to PM will be motivated and explored. Application developers, system software developers, and network system designers will all benefit. Anyone interested in an in-depth introduction to PM in emerging software and hardware systems can also expect an illuminating and thought-provoking experience.

Topics include:
  • Persistent Memory
  • Persistent Memory Technologies
  • Remote Persistent Memory
  • Programming Interfaces
  • Operating Systems
  • Open Source Libraries
  • RDMA

Speakers
TT

Tom Talpey

Microsoft
Tom Talpey is an Architect in the Networking team at Microsoft Corporation in the Windows Devices Group. His current areas of focus include RDMA networking, remote filesharing, and persistent memory. He is especially active in bringing all three together into a new ultra-low-latency... Read More →
AR

Andy Rudoff

Intel
Andy Rudoff is a Principal Engineer at Intel Corporation, focusing on Non-Volatile Memory programming. He is a contributor to the SNIA NVM Programming Technical Work Group. His more than 30 years industry experience includes design and development work in operating systems, file systems... Read More →


Monday February 25, 2019 1:30pm - 5:00pm EST
Constitution Ballroom A

1:30pm EST

Afternoon Tutorial 2: Caches in the Modern Memory Hierarchy with Persistent Memory and Flash
For a very long time, practical scaling of every level in the computing hierarchy has required innovation and improvement in caches. This is as true for CPUs as it is for storage and networked, distributed systems. As such, research into cache efficiency and efficacy has been highly motivated and continues to yield strong improvements to this day. However, there are certain areas in cache algorithm optimization that have only recently experienced breakthroughs.

In this tutorial, we will start by reviewing the history of caching algorithm research and practice in industry. Of particular interest to us are multi-tier memory hierarchies that are getting more complex and deep due to hardware innovations. These hierarchies motivate revisiting multi-tier algorithms. We will then review key tools in cache research and management called cache utility curves, along with recent literature that has made them easier to compute. Using this tool, we will dig into caching policies and their trade-offs. We will also spend some time thinking about optimality for caches in modern memory hierarchies with DRAM, non-volatile/persistent memory, and flash.
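As background for the cache utility curves mentioned above: at small scale, an LRU miss-ratio curve can be computed in one pass over a trace with the classic Mattson stack-distance method. The sketch below is a toy illustration of that idea only, not one of the recent scalable techniques the tutorial covers; all names are hypothetical.

```python
def miss_ratio_curve(trace, max_size):
    """Compute an LRU miss-ratio curve via Mattson stack distances.

    A reference's stack distance is its depth in the LRU stack
    (1 = most recently used; infinite on first touch). An LRU cache
    of size c misses exactly when the distance exceeds c, so a single
    pass yields the miss ratio for every cache size at once.
    """
    stack = []                    # most-recently-used item at the end
    distances = []
    for item in trace:
        if item in stack:
            depth = len(stack) - stack.index(item)   # 1 = MRU
            stack.remove(item)
        else:
            depth = float('inf')                      # cold miss
        stack.append(item)
        distances.append(depth)
    # Miss ratio for cache size c: fraction of references deeper than c.
    return [sum(d > c for d in distances) / len(distances)
            for c in range(1, max_size + 1)]

mrc = miss_ratio_curve(list("abcabc"), 3)   # → [1.0, 1.0, 0.5]
```

The quadratic cost of the naive stack is exactly why the literature on cheaper approximations, discussed in the tutorial, matters for production-size traces.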

Topics include:
  • Overview and history of caching algorithm research and practice in industry
  • Introduction to new challenges posed by multi-tier memory hierarchies
  • Review of cache utility curves and recent literature
  • Experimenting with caching policies for production use cases
  • How to find the optimal cache

Speakers
IA

Irfan Ahmad

CachePhysics
Irfan Ahmad is the CEO and Cofounder of CachePhysics. Previously, he served as the CTO of CloudPhysics, pioneer in SaaS Virtualized IT Operations Management, which he cofounded in 2011. Irfan was at VMware for nine years, where he was R&D tech lead for the DRS team and co-inventor... Read More →
YV

Ymir Vigfusson

Emory University
Ymir Vigfusson is Assistant Professor of Mathematics and Computer Science at Emory University since 2014, Adjunct Assistant Professor at the School of Computer Science at Reykjavik University since 2011, and a co-founder and Chief Science Officer of the offensive security company... Read More →


Monday February 25, 2019 1:30pm - 5:00pm EST
Constitution Ballroom B

3:00pm EST

Break with Refreshments
Monday February 25, 2019 3:00pm - 3:30pm EST
Grand Ballroom Foyer

3:30pm EST

Performance Analysis in Linux Storage Stack with BPF
How can we deeply analyze and trace performance issues in the Linux Storage Stack?

Many monitoring and benchmark tools help us find bottlenecks and problems through system profiling. However, it is pretty tricky to dig deeper into the root cause at the code/function level because of complex execution flow (e.g. multiple contexts or async flow). In this tutorial, we introduce in-kernel BPF technology and practice analyzing performance issues in the Linux Storage Stack using several tracing tools (BPF, uftrace, ctracer, and perf) step by step with attendees. This session is targeted towards administrators, researchers, and developers.

BPF is a technology that allows safely injecting and executing custom code in the kernel at runtime, an unprecedented capability. By leveraging custom code injected into the kernel, BPF-based profiling and tracing incur low overhead and make much richer introspection available.

Speakers
TS

Taeung Song

KossLab
Taeung is a Software Engineer in KOSSLAB (Korea Opensource Software Developers Lab) and Opensource Contributor in regard to Tracing & Profiling technology such as perf, uftrace, BPF, etc.
DT

Daniel T. Lee

The University of Soongsil
Daniel T. Lee is a Bachelor's degree student at the University of Soongsil and has a deep enthusiasm for Linux. He has been contributing to uftrace: Function (graph) tracer since 2018. He is passionate about tracing and profiling, and he really loves cloud engineering. He has a deep... Read More →


Monday February 25, 2019 3:30pm - 5:00pm EST
Commonwealth Room

6:00pm EST

FAST '19 Happy Hour
Kick off the conference by meeting with your colleagues over snacks and drinks.

Monday February 25, 2019 6:00pm - 7:00pm EST
Back Bay Ballroom AB

7:00pm EST

All Things Ceph BoF
Moderators
RW

Ric Wheeler

Facebook

Monday February 25, 2019 7:00pm - 8:30pm EST
Commonwealth Room

7:00pm EST

NSDI '19 Preview Session
Are you new to NSDI? Are you a networking expert but feel bewildered when talk turns to security? Are you interested in engaging more deeply with paper presentations outside your research area? Join us for the NSDI preview session, where area experts will give short introductions to the Symposium's major technical sessions.

  • Host Networking: Brent Stephens, University of Illinois at Chicago
  • Distributed Systems: Aurojit Panda, New York University
  • Modern Network Hardware: Akshay Narayan, Massachusetts Institute of Technology
  • Analytics: Aurojit Panda, New York University
  • Data Center Network Architecture: Anirudh Sivaraman, New York University
  • Wireless Technologies: Dinesh Bharadia, University of California, San Diego
  • Operating Systems: Amy Ousterhout, Massachusetts Institute of Technology
  • Monitoring and Diagnosis: Anurag Khandelwal, University of California, Berkeley
  • Improving Machine Learning: Junchen Jiang, University of Chicago
  • Network Functions: Radhika Mittal, Massachusetts Institute of Technology and University of Illinois at Urbana–Champaign
  • Network Characterization: David Choffnes, Northeastern
  • Privacy and Security: Vyas Sekar, Carnegie Mellon University
  • Network Modeling: Costin Raiciu, University Politehnica of Bucharest
  • Wireless Applications: Shaddi Hasan, Facebook


Monday February 25, 2019 7:00pm - 9:30pm EST
Grand Ballroom

8:30pm EST

Kubernetes & Storage Oh My BoF
Moderators
Monday February 25, 2019 8:30pm - 10:00pm EST
Commonwealth Room
 
Tuesday, February 26
 

7:30am EST

Continental Breakfast
Tuesday February 26, 2019 7:30am - 8:30am EST
Grand Ballroom Foyer

7:30am EST

Continental Breakfast
Tuesday February 26, 2019 7:30am - 8:45am EST
Grand Ballroom Foyer

8:30am EST

Opening Remarks and Best Paper Awards
Speakers
JL

Jay Lorch

Microsoft Research
MY

Minlan Yu

Harvard University


Tuesday February 26, 2019 8:30am - 8:45am EST
Constitution Ballroom

8:30am EST

Making Ceph Fast in the Face of Failure
Ceph Luminous and Mimic improve the impact of recovery on client I/O. In this talk, we'll discuss the key features that affect this, and how Ceph users can take advantage of them.

Speakers
NO

Neha Ojha

Red Hat
Neha is a Senior Software Engineer at Red Hat. She is the project technical lead for the core team focusing on RADOS. Neha holds a Master's degree in Computer Science from the University of California, Santa Cruz. Her most recent talks have been at Mountpoint, co-located with Open... Read More →


Tuesday February 26, 2019 8:30am - 9:00am EST
Independence Ballroom

8:45am EST

Opening Remarks and Awards
Speakers
HW

Hakim Weatherspoon

Cornell University


Tuesday February 26, 2019 8:45am - 9:00am EST
Grand Ballroom

8:45am EST

Datacenter RPCs can be General and Fast
It is commonly believed that datacenter networking software must sacrifice generality to attain high performance. The popularity of specialized distributed systems designed specifically for niche technologies such as RDMA, lossless networks, FPGAs, and programmable switches testifies to this belief. In this paper, we show that such specialization is not necessary. eRPC is a new general-purpose remote procedure call (RPC) library that offers performance comparable to specialized systems, while running on commodity CPUs in traditional datacenter networks based on either lossy Ethernet or lossless fabrics. eRPC performs well in three key metrics: message rate for small messages; bandwidth for large messages; and scalability to a large number of nodes and CPU cores. It handles packet loss, congestion, and background request execution. In microbenchmarks, one CPU core can handle up to 10 million small RPCs per second, or send large messages at 75 Gbps. We port a production-grade implementation of Raft state machine replication to eRPC without modifying the core Raft source code. We achieve 5.5 microseconds of replication latency on lossy Ethernet, which is faster than or comparable to specialized replication systems that use programmable switches, FPGAs, or RDMA.

Speakers
AK

Anuj Kalia

Carnegie Mellon University
DA

David Andersen

Carnegie Mellon University


Tuesday February 26, 2019 8:45am - 9:10am EST
Constitution Ballroom

9:00am EST

CrashMonkey: Finding File System Crash-Consistency Bugs with Bounded Black-Box Testing
We present a new approach to testing file-system crash consistency: bounded black-box crash testing (B3). B3 tests the file system in a black-box manner using workloads of file-system operations. Since the space of possible workloads is infinite, B3 bounds this space based on parameters such as the number of file-system operations or which operations to include, and exhaustively generates workloads within this bounded space. Each workload is tested on the target file system by simulating power-loss crashes while the workload is being executed, and checking if the file system recovers to a correct state after each crash. B3 builds upon insights derived from our study of crash-consistency bugs reported in Linux file systems in the last five years. We observed that most reported bugs can be reproduced using small workloads of three or fewer file-system operations on a newly-created file system, and that all reported bugs result from crashes after fsync() related system calls. We built the tool CrashMonkey to demonstrate the effectiveness of this approach. CrashMonkey revealed 10 new crash-consistency bugs in widely-used, mature Linux file systems, seven of which existed in the kernel since 2014. It also revealed a data loss bug in a verified file system, FSCQ. The new bugs result in severe consequences like broken rename atomicity and loss of persisted files.
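The bounded, exhaustive enumeration behind B3 can be pictured in a few lines. The operation vocabulary and bound below are hypothetical simplifications for illustration (the real tool also bounds arguments, files, and initial file-system state):

```python
from itertools import product

def bounded_workloads(operations, max_len):
    """Exhaustively enumerate every workload of up to max_len
    file-system operations: the bounded space a B3-style tester
    explores, running each workload under simulated crashes."""
    for length in range(1, max_len + 1):
        for workload in product(operations, repeat=length):
            yield workload

# A toy operation vocabulary (illustrative only):
ops = ["creat", "write", "rename", "fsync"]
workloads = list(bounded_workloads(ops, 3))
# 4 + 4**2 + 4**3 = 84 bounded workloads to crash-test
```

The key point the abstract makes is that this space, though exponential in the bound, is small enough at three operations to cover exhaustively, and that bound suffices to reproduce most reported bugs.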

Speakers
JM

Jayashree Mohan

The University of Texas at Austin
Jayashree Mohan is a third year PhD student at the University of Texas at Austin. She works primarily on file and storage systems with a focus on testing the reliability of file systems. Prior to starting her PhD, she received a B.Tech in CS at the National Institute of Technology... Read More →


Tuesday February 26, 2019 9:00am - 9:25am EST
Independence Ballroom

9:00am EST

Keynote Address
NoSQL cloud database services, like Amazon DynamoDB, are popular for their simple key-value operations, unbounded scalability and predictable low-latency. Atomic transactions, while popular in relational databases, carry the specter of complexity and low performance, especially when used for workloads with high contention. Transactions often have been viewed as inherently incompatible with NoSQL stores, and the few commercial services that combine both come with limitations. This talk examines the tension between transactions and non-relational databases, and it recounts my journey of adding transactions to DynamoDB. I conclude that atomic transactions with full ACID properties can be supported without unduly compromising on performance, availability, self-management, or scalability.

Speakers
DT

Doug Terry

Senior Principal Technologist, Amazon Web Services
Doug Terry is a Senior Principal Technologist in the AWS Database Services team focusing on global databases (Amazon DynamoDB) and large-scale data warehouses (Amazon Redshift). Prior to joining Amazon in 2016, Doug led innovative research at Xerox PARC, Microsoft, and Samsung. He... Read More →


Tuesday February 26, 2019 9:00am - 10:00am EST
Grand Ballroom

9:10am EST

Eiffel: Efficient and Flexible Software Packet Scheduling
Packet scheduling determines the ordering of packets in a queuing data structure with respect to some ranking function that is mandated by a scheduling policy. It is the core component in many recent innovations in optimizing network performance and utilization. Packet scheduling is used for network resource allocation, meeting network-wide delay objectives, or providing isolation and differentiation of service. Our focus in this paper is on the design and deployment of packet scheduling in software. Software schedulers have several advantages, including a shorter development cycle and flexibility in functionality and deployment location. We substantially improve software packet scheduling performance, while maintaining its flexibility, by exploiting underlying features of packet ranking: the fact that packet ranks are integers that have predetermined ranges and that many packets will typically have equal rank. This allows us to rely on integer priority queues, unlike existing ranking algorithms, which rely on comparison-based priority queues that assume continuous ranks with infinite range. We introduce Eiffel, a novel programmable packet scheduling system. At the core of Eiffel is an integer priority queue based on the Find First Set (FFS) instruction and designed to support a wide range of policies and ranking functions efficiently. As an even more efficient alternative, we also propose a new approximate priority queue that can outperform FFS-based queues for some scenarios. To support flexibility, Eiffel introduces novel programming abstractions to express scheduling policies that cannot be captured by current, state-of-the-art scheduler programming models. We evaluate Eiffel in a variety of settings and in both kernel and userspace deployments. We show that it outperforms state-of-the-art systems by 3-40x in terms of either number of cores utilized for network processing or number of flows given fixed processing capacity.
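The integer-rank insight above can be made concrete with a toy bucket queue. This is an illustrative sketch only, not Eiffel's implementation: Python's bit tricks stand in for the hardware Find First Set instruction, and all names are hypothetical.

```python
from collections import deque

class FFSQueue:
    """Integer priority queue: one FIFO bucket per rank plus a bitmap
    of non-empty buckets. Extracting the minimum rank is a single
    find-first-set over the bitmap, O(1) rather than the O(log n) of
    a comparison-based heap."""

    def __init__(self, max_rank):
        self.buckets = [deque() for _ in range(max_rank)]
        self.bitmap = 0

    def push(self, rank, pkt):
        self.buckets[rank].append(pkt)
        self.bitmap |= 1 << rank          # mark bucket non-empty

    def pop(self):
        if not self.bitmap:
            return None
        # Isolate the lowest set bit: software stand-in for hardware FFS.
        rank = (self.bitmap & -self.bitmap).bit_length() - 1
        bucket = self.buckets[rank]
        pkt = bucket.popleft()
        if not bucket:
            self.bitmap &= ~(1 << rank)   # bucket drained
        return rank, pkt

q = FFSQueue(max_rank=64)
q.push(5, "a"); q.push(2, "b"); q.push(5, "c")
# Lowest rank drains first, FIFO within a rank: (2,"b"), (5,"a"), (5,"c")
```

The design works precisely because ranks are bounded integers and equal-rank packets need no further ordering, which is the property the abstract highlights.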

Speakers
AS

Ahmed Saeed

Georgia Institute of Technology
YZ

Yimeng Zhao

Georgia Institute of Technology
EZ

Ellen Zegura

Georgia Institute of Technology
MA

Mostafa Ammar

Georgia Institute of Technology
KH

Khaled Harras

Carnegie Mellon University
AV

Amin Vahdat

Google Inc.


Tuesday February 26, 2019 9:10am - 9:35am EST
Constitution Ballroom

9:25am EST

Experiences with Fuse in the Real World
The Filesystem in Userspace (FUSE) module provides a simple way to create user-space file systems. The shortcomings of this approach to implementing file systems have been debated many times in the past, a few times even with data to back up the arguments. In this talk, we will revisit the topic in the context of a distributed Software-Defined Storage (SDS) solution, gluster. We will present our experiences based on users deploying it in production over the years, with FUSE access as the primary interface. In this context, we will discuss some of the problem areas like memory management, and demonstrate trade-offs in implementing important caches in the user-space versus relying on kernel caches.

As gluster expands to newer use-cases like persistent storage for container platforms, it needs to efficiently handle a wide variety of workloads and more frequently handle smaller, single-client volumes. In this context, we see the need to absorb more recent FUSE performance enhancements like write-back caching, and we will present our characterization of the performance benefits obtained from these enhancements.

Speakers
MP

Manoj Pillai

Red Hat
Manoj Pillai is part of the Performance and Scale Engineering Group at Red Hat. His focus is on storage performance, particularly around gluster, and he has presented on these topics at Open Source Summit, FOSDEM, Vault 2017, Red Hat Summit and Gluster Summit.
RG

Raghavendra Gowdappa

Red Hat
Raghavendra Gowdappa is one of the maintainers of Glusterfs and is currently employed by Red Hat. He has worked on interfacing Glusterfs with FUSE, caching, network and file distribution aspects of Glusterfs. His earlier presentations were at FOSDEM, Vault 2017 and Gluster Summit... Read More →
CH

Csaba Henk

Red Hat
Csaba Henk has worked on the fuse layer of Glusterfs from the early times on. He has been involved in augmentative and integration projects, like geo-replication and OpenStack Manila glusterfs drivers. These days he's back at core Glusterfs and works on caches and fuse.


Tuesday February 26, 2019 9:25am - 9:50am EST
Independence Ballroom

9:35am EST

Loom: Flexible and Efficient NIC Packet Scheduling
In multi-tenant cloud data centers, operators need to ensure that competing tenants and applications are isolated from each other and fairly share limited network resources. With current NICs, operators must either 1) use a single NIC queue and enforce network policy in software, which incurs high CPU overheads and struggles to drive increasing line-rates (100Gbps), or 2) use multiple NIC queues and accept imperfect isolation and policy enforcement. These problems arise due to inflexible and static NIC packet schedulers and an inefficient OS/NIC interface.

To overcome these limitations, we present Loom, a new NIC design that moves all per-flow scheduling decisions out of the OS and into the NIC. The key aspects of Loom's design are 1) a new network policy abstraction: restricted directed acyclic graphs (DAGs), 2) a programmable hierarchical packet scheduler, and 3) a new expressive and efficient OS/NIC interface that enables the OS to precisely control how the NIC performs packet scheduling while still ensuring low CPU utilization. Loom is the only multiqueue NIC design that is able to efficiently enforce network policy. We find empirically that Loom lowers latency, increases throughput, and improves fairness for collocated applications and tenants.

Speakers

Tuesday February 26, 2019 9:35am - 10:00am EST
Constitution Ballroom

9:50am EST

SMB3 Linux/POSIX Protocol Extensions: Overview and Update on Current Implementations
The SMB3 POSIX Extensions, a set of protocol extensions to allow for optimal Linux and Unix interoperability with Samba, NAS and Cloud file servers, have evolved over the past year, with test implementations in Samba and now merged into the Linux kernel. These extensions address various compatibility problems for Linux and Unix clients (such as case sensitivity, locking, delete semantics and mode bits among others). This presentation will review the state of the protocol extensions, what was learned in the implementations in Samba and also in the Linux kernel (including from running exhaustive Linux file system functional tests to try to better match local file system behavior over SMB3 mounts) and what it means for real applications.

With the deprecation of older less secure dialects like CIFS (which had standardized POSIX Extensions documented by SNIA), these SMB3 POSIX Extensions are urgently needed to be more broadly deployed to avoid functional or security problems and to optimally access Samba from Linux.

Speakers
JA

Jeremy Allison

Samba Team, Google
Jeremy Allison is a frequent speaker at Storage, Linux and Samba events and is one of the original members of the Samba team.
SF

Steve French

Samba Team, Microsoft Azure Storage


Tuesday February 26, 2019 9:50am - 10:15am EST
Independence Ballroom

10:00am EST

Break with Refreshments
Tuesday February 26, 2019 10:00am - 10:30am EST
Grand Ballroom Foyer

10:15am EST

Break with Refreshments
Tuesday February 26, 2019 10:15am - 10:45am EST
Grand Ballroom Foyer

10:30am EST

Exploiting Commutativity For Practical Fast Replication
Traditional approaches to replication require client requests to be ordered before making them durable by copying them to replicas. As a result, clients must wait for two round-trip times (RTTs) before updates complete. In this paper, we show that this entanglement of ordering and durability is unnecessary for strong consistency. Consistent Unordered Replication Protocol (CURP) allows clients to replicate requests that have not yet been ordered, as long as they are commutative. This strategy allows most operations to complete in 1 RTT (the same as an unreplicated system). We implemented CURP in the Redis and RAMCloud storage systems. In RAMCloud, CURP improved write latency by ~2x (14us -> 7.1us) and write throughput by 4x. Compared to unreplicated RAMCloud, CURP's latency overhead for 3-way replication is just 1us (6.1us vs 7.1us). CURP transformed a non-durable Redis cache into a consistent and durable storage system with only a small performance overhead.
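The commutativity condition at the heart of CURP can be illustrated for a key-value store, where two writes commute exactly when they target different keys. The sketch below is a simplified illustration of a witness's accept/reject decision, not RAMCloud's implementation; all names are hypothetical.

```python
class Witness:
    """Accepts unordered updates only while they commute with every
    update it already holds (i.e., touch distinct keys). A rejection
    forces the client back onto the ordered 2-RTT path through the
    master; garbage collection clears entries once the master has
    synced them to backups."""

    def __init__(self):
        self.pending = {}              # key -> value of unsynced updates

    def record(self, key, value):
        if key in self.pending:        # same key: does not commute
            return False
        self.pending[key] = value      # commutes: durable in 1 RTT
        return True

    def gc(self, synced_keys):
        for k in synced_keys:
            self.pending.pop(k, None)

w = Witness()
assert w.record("x", 1)       # accepted: nothing to conflict with
assert w.record("y", 2)       # different key: still commutative
assert not w.record("x", 3)   # same key: rejected, needs ordering
w.gc(["x"])
assert w.record("x", 3)       # after the master syncs, x is free again
```

Because reordering two writes to different keys yields the same final state, the witness can make requests durable before they are ordered, which is how most operations complete in a single round trip.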

Speakers
SJ

Seo Jin Park

Stanford University
JO

John Ousterhout

Stanford University


Tuesday February 26, 2019 10:30am - 10:55am EST
Constitution Ballroom

10:30am EST

Reaping the performance of fast NVM storage with uDepot
Many applications require low-latency key-value storage, a requirement that is typically satisfied using key-value stores backed by DRAM. Recently, however, storage devices built on novel NVM technologies offer unprecedented performance compared to conventional SSDs. A key-value store that could deliver the performance of these devices would offer many opportunities to accelerate applications and reduce costs. Nevertheless, existing key-value stores, built for slower SSDs or HDDs, cannot fully exploit such devices.

In this paper, we present uDepot, a key-value store built bottom-up to deliver the performance of fast NVM block-based devices. uDepot is carefully crafted to avoid inefficiencies, uses a two-level indexing structure that dynamically adjusts its DRAM footprint to match the inserted items, and employs a novel task-based IO run-time system to maximize performance, enabling applications to use fast NVM devices at their full potential. As an embedded store, uDepot's performance nearly matches the raw performance of fast NVM devices both in terms of throughput and latency, while being scalable across multiple devices and cores. As a server, uDepot significantly outperforms state-of-the-art stores that target SSDs under the YCSB benchmark. Finally, using a Memcache service on top of uDepot we demonstrate that data services built on NVM storage devices can offer equivalent performance to their DRAM-based counterparts at a much lower cost. Indeed, using uDepot we have built a cloud Memcache service that is currently available as an experimental offering in the public cloud.

Speakers
KK

Kornilios Kourtis

IBM Research
NI

Nikolas Ioannou

IBM Research
IK

Ioannis Koltsidas

IBM Research


Tuesday February 26, 2019 10:30am - 11:00am EST
Grand Ballroom

10:45am EST

From Open-Channel SSDs to Zoned Namespaces
Open-Channel Solid State Drive architectures are being rapidly adopted by hyperscalers, all-flash array vendors, and large storage system vendors. The versatile storage interface allows a solid state drive to expose essential knobs for controlling latency, I/O predictability, and I/O isolation. This rapid adoption has created a diverse set of Open-Channel SSD drive specifications, each solving the needs of one or a few users. However, the specifications are yet to be standardized.

The Zoned Namespaces (ZNS) Technical Proposal in the NVMe workgroup is developing an industry standard for these types of interfaces, creating a foundation on which a robust software ecosystem can be built and implementation efforts streamlined.

This talk covers the motivation for and characteristics of Zoned Namespaces, possible software improvements, and early results that show the effectiveness of these types of drives.
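The core zone abstraction can be captured in a toy model. The sketch below is illustrative only, not the NVMe technical proposal: a zone accepts writes only at its write pointer and must be reset before it can be rewritten, mirroring how flash blocks must be erased before reuse.

```python
class Zone:
    """Toy model of a ZNS zone (illustrative, not the NVMe spec): host writes
    land only at the write pointer, and a zone must be reset before reuse."""
    def __init__(self, size):
        self.size = size
        self.wp = 0                       # write pointer advances sequentially
        self.data = bytearray(size)

    def append(self, buf):
        if self.wp + len(buf) > self.size:
            raise IOError("zone full: reset required before rewriting")
        off = self.wp
        self.data[off:off + len(buf)] = buf
        self.wp += len(buf)
        return off                        # placement chosen by the device side

    def reset(self):
        self.wp = 0                       # analogous to erasing the flash blocks
```

Because the host can never overwrite in place, the device needs no opaque garbage-collection layer for these writes, which is where the latency and I/O-predictability benefits come from.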

Speakers
MB

Matias Bjørling

Western Digital
Matias Bjørling is Director of Solid State System Software at Western Digital. He is the author of the Open-Channel SSD 1.2 and 2.0 specifications and maintainer of the Open-Channel SSD subsystem in the Linux kernel. Before joining the industry, he obtained a Ph.D. in operating systems... Read More →


Tuesday February 26, 2019 10:45am - 11:10am EST
Independence Ballroom

10:55am EST

Flashield: a Hybrid Key-value Cache that Controls Flash Write Amplification
As its price per bit drops, SSD is increasingly becoming the default storage medium for hot data in cloud application databases. Even though SSD's price per bit is more than 10× lower, and it provides sufficient performance (when accessed over a network) compared to DRAM, the limited write endurance of flash has restricted its adoption in write-heavy use cases, such as key-value caching. This is because key-value caches need to frequently insert, update, and evict small objects, causing excessive writes and erasures on flash storage, which significantly shorten the lifetime of flash. We present Flashield, a hybrid key-value cache that uses DRAM as a "filter" to control and limit writes to SSD. Flashield performs lightweight machine-learning admission control to predict which objects are likely to be read frequently without getting updated; these objects, which are prime candidates to be stored on SSD, are written to SSD sequentially in large chunks. In order to efficiently utilize the cache's available memory, we design a novel in-memory index for the variable-sized objects stored on flash that requires only 4 bytes per object in DRAM. We describe Flashield's design and implementation, and evaluate it on real-world traces from a widely used caching service, Memcachier. Compared to state-of-the-art systems that suffer a write amplification of 2.5× or more, Flashield maintains a median write amplification of 0.5× (since many filtered objects are never written to flash at all), without any loss of hit rate or throughput.
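The DRAM-filter idea can be sketched with a fixed rule in place of the paper's learned classifier. Everything below is a hedged illustration with invented names and thresholds: objects prove themselves in DRAM first, and only those read repeatedly without being updated become candidates to be flushed to flash in large sequential chunks.

```python
# Hedged sketch of the DRAM-filter idea (names and thresholds are invented,
# and the paper uses a learned classifier rather than this fixed rule):
# objects prove themselves in DRAM first; only those read repeatedly without
# being updated are flushed to flash, in large sequential chunks.
class DramFilter:
    def __init__(self, min_reads=2, chunk=4):
        self.min_reads, self.chunk = min_reads, chunk
        self.objs = {}                     # key -> [value, reads, updated]

    def put(self, key, value):
        ent = self.objs.get(key)
        if ent:
            ent[0], ent[2] = value, True   # updated in DRAM: poor flash candidate
        else:
            self.objs[key] = [value, 0, False]

    def get(self, key):
        ent = self.objs.get(key)
        if ent:
            ent[1] += 1
            return ent[0]

    def flush_candidates(self):
        # Read often and never updated while in DRAM: safe to move to SSD.
        keys = [k for k, (_, reads, updated) in self.objs.items()
                if reads >= self.min_reads and not updated]
        return keys[:self.chunk]           # written to SSD as one large write
```

Objects that never qualify are simply evicted from DRAM without ever touching flash, which is how the median write amplification can fall below 1.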

Speakers
AE

Assaf Eisenman

Stanford University
AC

Asaf Cidon

Stanford University and Barracuda Networks
EP

Evgenya Pergament

Stanford University
OH

Or Haimovich

Stanford University
RS

Ryan Stutsman

University of Utah
MA

Mohammad Alizadeh

Massachusetts Institute of Technology
SK

Sachin Katti

Stanford University


Tuesday February 26, 2019 10:55am - 11:20am EST
Constitution Ballroom

11:00am EST

Optimizing Systems for Byte-Addressable NVM by Reducing Bit Flipping
New byte-addressable non-volatile memory (BNVM) technologies such as phase change memory (PCM) enable the construction of systems with large persistent memories, improving reliability and potentially reducing power consumption. However, BNVM technologies only support a limited number of lifetime writes per cell and consume most of their power when flipping a bit's state during a write; thus, PCM controllers only rewrite a cell's contents when the cell's value has changed. Prior research has assumed that reducing the number of words written is a good proxy for reducing the number of bits modified, but a recent study has suggested that this assumption may not be valid. Our research confirms that approaches with the fewest writes often have more bit flips than those optimized to reduce bit flipping.

To test the effectiveness of bit flip reduction, we built a framework that uses the number of bits flipped over time as the measure of "goodness" and modified a cycle-accurate simulator to count bits flipped during program execution. We implemented several modifications to common data structures designed to reduce power consumption and increase memory lifetime by reducing the number of bits modified by operations on several data structures: linked lists, hash tables, and red-black trees. We were able to reduce the number of bits flipped by up to 3.56× over standard implementations of the same data structures with negligible overhead. We measured the number of bits flipped by memory allocation and stack frame saves and found that careful data placement in the stack can reduce bit flips significantly. These changes require no hardware modifications and neither significantly reduce performance nor increase code complexity, making them attractive for designing systems optimized for BNVM.
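The paper's cost metric is easy to state concretely. The sketch below (a minimal illustration, not the authors' framework) computes the write cost of a word update as the population count of the XOR of old and new values, which is exactly what a controller that only rewrites changed cells pays.

```python
# Minimal sketch of the paper's cost metric: a PCM-style controller only
# rewrites cells whose value changed, so the write cost of updating a word is
# popcount(old XOR new), not the number of words written.
def bits_flipped(old: int, new: int) -> int:
    return bin(old ^ new).count("1")

def total_flips(writes):
    """Sum bit flips over a trace of (old_word, new_word) updates."""
    return sum(bits_flipped(o, n) for o, n in writes)
```

Under this metric, a data-structure change that writes more words can still be cheaper, as long as the words it writes differ from their old contents in fewer bit positions.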

Speakers
DB

Daniel Bittman

UC Santa Cruz
DD

Darrell D. E. Long

UC Santa Cruz
PA

Peter Alvaro

UC Santa Cruz
EL

Ethan L. Miller

UC Santa Cruz


Tuesday February 26, 2019 11:00am - 11:30am EST
Grand Ballroom

11:10am EST

New Techniques to Improve Small I/O Workloads in Distributed File Systems
Distributed file systems work well with high-throughput applications that are parallelizable. Due to network overhead, they tend to perform less well with workloads that are metadata- or small-file-intensive. This problem has been closely studied, resulting in many innovative ideas. For example, researchers have proposed storing inodes in column-store databases to speed up directory reads. Another idea is to have file systems publish "snapshots" visible to a subset of clients during metadata creation, which are later subscribed to by the rest of the system.

Are these techniques practical outside university labs? To answer this question, we introduce software that makes the original implementations much easier to use, by acting as a layer on top of Ceph object storage. The talk will walk through how to set up and run the configuration in realistic environments. The original research will be described in detail, explaining how the improved performance comes with some loss of POSIX generality, along with a small number of new operational steps outside of traditional file system workflows. The talk will show how this solution could be a good fit for analytics use cases where file system semantics are needed and there is flexibility at the application level.

Speakers
DL

Dan Lambright

Huawei
Dan has worked in open source storage at Red Hat and also at AWS. Today he is building distributed storage at Huawei. He has spoken at Vault, LinuxCon, OpenStack, LISA, and other venues. He also enjoys teaching at the University of Massachusetts Lowell.


Tuesday February 26, 2019 11:10am - 11:35am EST
Independence Ballroom

11:20am EST

Size-aware Sharding For Improving Tail Latencies in In-memory Key-value Stores
This paper introduces the concept of size-aware sharding to improve tail latencies for in-memory key-value stores, and describes its implementation in the Minos key-value store. Tail latencies are crucial in distributed applications with high fan-out ratios, because overall response time is determined by the slowest response. Size-aware sharding distributes requests for keys to cores according to the size of the item associated with the key. In particular, requests for small and large items are sent to disjoint subsets of cores. Size-aware sharding improves tail latencies by avoiding head-of-line blocking, in which a request for a small item gets queued behind a request for a large item. Alternative size-unaware approaches to sharding such as keyhash-based sharding, request dispatching and stealing do not avoid head-of-line blocking, and therefore exhibit worse tail latencies. The challenge in implementing size-aware sharding is to maintain high throughput by avoiding the cost of software dispatching and by achieving load balancing between different cores. Minos uses hardware dispatch for all requests for small items, which form the very large majority of all requests. It achieves load balancing by adapting the number of cores handling requests for small and large items to their relative presence in the workload. We compare Minos to three state-of-the-art designs of in-memory KV stores. Compared to its closest competitor, Minos achieves a 99th percentile latency that is up to two orders of magnitude lower. Put differently, for a given value for the 99th percentile latency equal to 10 times the mean service time, Minos achieves a throughput that is up to 7.4 times higher.
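The dispatch rule itself is simple to illustrate. The following is a hedged sketch, not the Minos implementation (the size cutoff and round-robin policy are assumptions, and Minos additionally adapts the core split to the workload): requests for small items go to one pool of cores and large items to a disjoint pool, so a small request can never queue behind a large one.

```python
from collections import deque

# Illustrative sketch (not the Minos implementation): requests for small items
# go to one pool of cores, large items to a disjoint pool, so a small request
# never queues behind a large one (no head-of-line blocking).
SIZE_CUTOFF = 1024                         # bytes; an assumed threshold

class SizeAwareSharder:
    def __init__(self, small_cores, large_cores):
        self.queues = {c: deque() for c in small_cores + large_cores}
        self.small, self.large = small_cores, large_cores
        self.rr = {"small": 0, "large": 0}

    def dispatch(self, key, item_size):
        pool_name = "small" if item_size <= SIZE_CUTOFF else "large"
        pool = self.small if pool_name == "small" else self.large
        core = pool[self.rr[pool_name] % len(pool)]   # round-robin in the pool
        self.rr[pool_name] += 1
        self.queues[core].append(key)
        return core
```

Load balancing then reduces to periodically resizing the two pools according to the fraction of small and large requests observed in the workload.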

Speakers
WZ

Willy Zwaenepoel

EPFL and University of Sydney


Tuesday February 26, 2019 11:20am - 11:45am EST
Constitution Ballroom

11:30am EST

Write-Optimized Dynamic Hashing for Persistent Memory
Low-latency storage media such as byte-addressable persistent memory (PM) require rethinking how various data structures are optimized. One of the main challenges in implementing hash-based indexing structures on PM is how to achieve efficiency by making effective use of cachelines while guaranteeing failure-atomicity for dynamic hash expansion and shrinkage. In this paper, we present Cacheline-Conscious Extendible Hashing (CCEH), which reduces the overhead of dynamic memory block management while guaranteeing constant hash table lookup time. CCEH guarantees failure-atomicity without making use of explicit logging. Our experiments show that CCEH effectively adapts its size as demand increases under the fine-grained failure-atomicity constraint, and reduces the maximum query latency by over two-thirds compared to state-of-the-art hashing techniques.
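The directory/segment split that CCEH builds on is textbook extendible hashing. The sketch below shows only that skeleton, with none of CCEH's PM specifics (cacheline-sized buckets, fence-based failure-atomicity, lazy deletion): a directory of 2^G entries points to segments; a full segment splits, and the directory doubles only when the splitting segment's local depth equals the global depth.

```python
import hashlib

CAP = 4  # items per segment; a stand-in for a fixed number of cachelines

class Segment:
    def __init__(self, depth):
        self.depth = depth            # local depth
        self.slots = {}               # stand-in for cacheline-sized buckets

class ExtendibleHash:
    """Textbook extendible hashing in the spirit of CCEH's directory/segment
    design (illustrative only; no failure-atomicity or PM specifics here)."""
    def __init__(self):
        self.g = 1                    # global depth
        self.dir = [Segment(1), Segment(1)]

    def _h(self, key):
        d = hashlib.sha1(str(key).encode()).digest()
        return int.from_bytes(d[:4], "little")

    def _idx(self, key):
        return self._h(key) & ((1 << self.g) - 1)   # low bits pick the segment

    def get(self, key):
        return self.dir[self._idx(key)].slots.get(key)

    def put(self, key, val):
        while True:
            seg = self.dir[self._idx(key)]
            if key in seg.slots or len(seg.slots) < CAP:
                seg.slots[key] = val
                return
            self._split(seg)          # segment full: split, then retry

    def _split(self, seg):
        if seg.depth == self.g:       # directory must double first
            self.dir = self.dir + self.dir
            self.g += 1
        d = seg.depth
        a, b = Segment(d + 1), Segment(d + 1)
        for k, v in seg.slots.items():
            (b if (self._h(k) >> d) & 1 else a).slots[k] = v
        for i, s in enumerate(self.dir):            # repoint directory entries
            if s is seg:
                self.dir[i] = b if (i >> d) & 1 else a
```

Lookups stay constant-time (one directory probe plus one segment probe), which is the property CCEH preserves while adding cacheline-conscious layout and failure-atomic splits.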

Speakers
MN

Moohyeon Nam

UNIST (Ulsan National Institute of Science and Technology)
HC

Hokeun Cha

Sungkyunkwan University
YC

Young-ri Choi

UNIST (Ulsan National Institute of Science and Technology)
BN

Beomseok Nam

Sungkyunkwan University


Tuesday February 26, 2019 11:30am - 12:00pm EST
Grand Ballroom

11:35am EST

Optimizing Storage Performance for 4–5 Million IOPS
New workloads and Storage Class Memory (SCM) are demanding a new level of IOPS, bandwidth, and driver optimization in Linux for storage networks. James Smart will discuss how the lpfc driver was recently reworked to achieve a new level of driver performance, reaching 5+ million IOPS. James will discuss hardware parallelization, per-core WQs, interrupt handling, and shared resource management that will benefit both SCSI and NVMe over Fabrics performance. James will show performance curves, discuss Linux OS issues encountered, and describe work yet to be done in Linux to improve performance even more.

Speakers
JS

James Smart

Broadcom
James Smart is currently a Distinguished Engineer at Broadcom responsible for the architecture of Broadcom's Fibre Channel Linux stack. James has worked in storage software and firmware development for 32 years. James is a member of T11 and the NVM Express standards groups. James... Read More →


Tuesday February 26, 2019 11:35am - 12:00pm EST
Independence Ballroom

11:45am EST

Monoxide: Scale Out Blockchain with Asynchronized Consensus Zones
Cryptocurrencies have provided a promising infrastructure for pseudonymous online payments. However, low throughput has significantly hindered the scalability and usability of cryptocurrency systems for increasing numbers of users and transactions. Another obstacle to achieving scalability is that every node is required to duplicate the communication, storage, and state representation of the entire network.

In this paper, we introduce Asynchronous Consensus Zones, which scale blockchain systems linearly without compromising decentralization or security. We achieve this by running multiple independent and parallel instances of single-chain consensus ("zones"). Consensus happens independently within each zone with minimized communication, which partitions the workload of the entire network and ensures a moderate burden for each individual node as the network grows. We propose eventual atomicity to ensure transaction atomicity across zones, which guarantees the efficient completion of transactions without the overhead of a two-phase commit protocol. We also propose Chu-ko-nu mining to ensure that the effective mining power in each zone is at the same level as that of the entire network, making an attack on any individual zone as hard as one on the entire network. Our experimental results show the effectiveness of our work: on a test-bed of 1,200 virtual machines worldwide supporting 48,000 nodes, our system delivers 1,000× the throughput and capacity of the Bitcoin and Ethereum networks.
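Eventual atomicity can be illustrated with a toy two-zone payment. This is a hypothetical sketch only (Monoxide's actual relay mechanism also involves mining, validation, and proofs, all omitted here): the debit settles immediately in the payer's zone, and the credit travels as a "relay transaction" that the payee's zone applies later, so no two-phase commit spans the zones.

```python
# Toy sketch of eventual atomicity across zones (hypothetical names; the real
# relay mechanism also handles mining and validation, omitted here).
class ConsensusZone:
    def __init__(self):
        self.balances = {}
        self.inbox = []                    # relay transactions awaiting apply

def zone_of(zones, addr):
    return zones[addr[0]]                  # toy sharding by address prefix

def pay(zones, src, dst, amount):
    origin = zone_of(zones, src)
    assert origin.balances.get(src, 0) >= amount
    origin.balances[src] -= amount         # debit settles in the payer's zone
    zone_of(zones, dst).inbox.append((dst, amount))  # credit ships as a relay

def settle(zone):
    # Applying queued relays later yields atomicity eventually, with no 2PC.
    for dst, amount in zone.inbox:
        zone.balances[dst] = zone.balances.get(dst, 0) + amount
    zone.inbox.clear()
```

The invariant is that a relay is only ever emitted after its debit is final, so the total supply is conserved even though the credit lands asynchronously.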

Speakers
JW

Jiaping Wang

ICT/CAS, Sinovation AI Institute
HW

Hao Wang

Ohio State University


Tuesday February 26, 2019 11:45am - 12:10pm EST
Constitution Ballroom

12:00pm EST

Software Wear Management for Persistent Memories
The commercial release of byte-addressable persistent memories (PMs) is imminent. Unfortunately, these devices suffer from limited write endurance—without any wear management, PM lifetime might be as low as 1.1 months. Existing wear-management techniques introduce an additional indirection layer to remap memory across physical frames and require hardware support to track fine-grain wear. These mechanisms incur storage overhead and increase access latency and energy consumption.

We present Kevlar, an OS-based wear-management technique for PM that requires no new hardware. Kevlar uses existing virtual memory mechanisms to remap pages, enabling it to perform both wear leveling—shuffling pages in PM to even wear; and wear reduction—transparently migrating heavily written pages to DRAM. Crucially, Kevlar avoids the need for hardware support to track wear at fine grain. Instead, it relies on a novel wear estimation technique that builds upon Intel's Precise Event Based Sampling to approximately track processor cache contents via a software-maintained Bloom filter and estimate write-back rates at fine grain. We implement Kevlar in Linux and demonstrate that it achieves lifetime improvement of 18.4x (avg.) over no wear management while incurring 1.2% performance overhead.
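The Bloom-filter side of the wear estimator can be sketched compactly. The code below is a loose, hypothetical model of the idea (the real system samples stores via PEBS and handles filter aging, neither of which appears here): sampled store addresses feed a Bloom filter that approximates cache contents, and an address not yet present is treated as a newly dirtied line that will eventually be written back to its page.

```python
import hashlib

class Bloom:
    """Small Bloom filter approximating the set of cache-resident lines."""
    def __init__(self, nbits=1 << 16, k=3):
        self.nbits, self.k = nbits, k
        self.bits = bytearray(nbits // 8)

    def _positions(self, item):
        for i in range(self.k):
            h = hashlib.sha1(("%d:%s" % (i, item)).encode()).digest()
            yield int.from_bytes(h[:4], "big") % self.nbits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))

class WearEstimator:
    """Hypothetical sketch of Kevlar-style wear estimation: an address absent
    from the filter is counted as a newly dirtied line destined for its page."""
    def __init__(self, page_size=4096):
        self.page_size = page_size
        self.cache = Bloom()
        self.writes = {}              # page number -> estimated write-backs

    def sampled_store(self, addr):
        page = addr // self.page_size
        if addr not in self.cache:
            self.cache.add(addr)
            self.writes[page] = self.writes.get(page, 0) + 1
```

Pages whose estimated write-back counts grow fastest are the ones a wear-leveling policy would remap or migrate to DRAM.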

Speakers
VG

Vaibhav Gogte

University of Michigan
AK

Aasheesh Kolli

Pennsylvania State University and VMware Research
PM

Peter M. Chen

University of Michigan
SN

Satish Narayanasamy

University of Michigan
TF

Thomas F. Wenisch

University of Michigan


Tuesday February 26, 2019 12:00pm - 12:30pm EST
Grand Ballroom

12:00pm EST

Conference Luncheon
Tuesday February 26, 2019 12:00pm - 1:30pm EST
Back Bay Ballroom D

12:30pm EST

Lunch (on your own)
Tuesday February 26, 2019 12:30pm - 2:00pm EST
N/A

1:30pm EST

FreeFlow: Software-based Virtual RDMA Networking for Containerized Clouds
Many popular large-scale cloud applications are increasingly using containerization for high resource efficiency and lightweight isolation. In parallel, many data-intensive applications (e.g., data analytics and deep learning frameworks) are adopting or looking to adopt RDMA for high networking performance. Industry trends suggest that these two approaches are on an inevitable collision course. In this paper, we present FreeFlow, a software-based RDMA virtualization framework designed for containerized clouds. FreeFlow realizes virtual RDMA networking purely with a software-based approach using commodity RDMA NICs. Unlike existing RDMA virtualization solutions, FreeFlow fully satisfies the requirements from cloud environments, such as isolation for multi-tenancy, portability for container migrations, and controllability for control and data plane policies. FreeFlow is also transparent to applications and provides networking performance close to bare-metal RDMA with low CPU overhead. In our evaluations with TensorFlow and Spark, FreeFlow provides almost the same application performance as bare-metal RDMA.

Speakers
DK

Daehyeok Kim

Carnegie Mellon University
TY

Tianlong Yu

Carnegie Mellon University
YZ

Yibo Zhu

Microsoft and ByteDance
JP

Jitu Padhye

Microsoft
VS

Vyas Sekar

Carnegie Mellon University
SS

Srinivasan Seshan

Carnegie Mellon University


Tuesday February 26, 2019 1:30pm - 1:55pm EST
Constitution Ballroom

1:30pm EST

Design of a Composable Infrastructure Platform
Composable infrastructure, as discussed in this talk, is a method for dynamically creating secure application clusters from disaggregated compute, storage, and networking. The problems facing such a solution are ones of availability, durability, scalability, performance, and, most importantly, correctness.

The target applications are widely deployed data analytics and NoSQL database applications that can consist of hundreds to thousands of compute nodes with tens of thousands of disks for each application in a secure cluster instance.

The talk consists of five parts. We present a very brief description of the user view of creating virtual clusters on a composable infrastructure platform. We follow this with a short description of the problems and requirements for the platform. That motivates the bulk of the presentation, which describes the state machine design for a correct and durable orchestration platform that scales to hundreds of thousands of managed elements. Select code and data structures are used to point out implementation details. The fourth part of the talk describes how standard Linux networking and storage subsystems are managed and used to create virtual clusters (including NVMe over Fabrics), and the open source components used by the platform to achieve scale, availability, and security. The final part of the talk details key failure scenarios and the recovery mechanisms that maintain correctness and availability.

Speakers
BP

Brian Pawlowski

Drivescale Inc.
Brian Pawlowski is currently CTO of Drivescale Inc. where he is involved in the design of software to support cluster computing and developing a platform for composable infrastructure.As Vice President and Chief Architect at Pure Storage, he was focused on product and architecture... Read More →


Tuesday February 26, 2019 1:30pm - 1:55pm EST
Independence Ballroom

1:55pm EST

Direct Universal Access: Making Data Center Resources Available to FPGA
FPGAs have been deployed at massive scale in data centers. The currently available communication architectures, however, make it very difficult for FPGAs to utilize data center resources. In this paper, we present Direct Universal Access (DUA), a communication architecture that provides FPGAs with uniform access to heterogeneous data center resources. Without considering machine boundaries, DUA provides global names and a common interface for communicating with various resources, where the underlying network automatically routes traffic and manages resource multiplexing. Our benchmarks show that DUA provides simple and fair-share resource access with small logic area overhead (<10%) and negligible latency (<0.2 μs). We also build two practical multi-FPGA applications, deep crossing and regular expression matching, on top of DUA to demonstrate its usability and efficiency.

Speakers
RS

Ran Shu

Microsoft Research
PC

Peng Cheng

Microsoft Research
GC

Guo Chen

Microsoft Research & Hunan University
ZG

Zhiyuan Guo

Microsoft Research & Beihang University
LQ

Lei Qu

Microsoft Research
YX

Yongqiang Xiong

Microsoft Research
DC

Derek Chiou

Microsoft Azure
TM

Thomas Moscibroda

Microsoft Azure


Tuesday February 26, 2019 1:55pm - 2:20pm EST
Constitution Ballroom

1:55pm EST

The Storage Architecture of Intel's Data Management Platform (DMP)
This talk will discuss the storage architecture employed by Intel's Data Management Platform (DMP). The DMP is a rack-centric cluster design that employs an Ethernet-based fabric as its cluster interconnect; the default is a 3-stage Clos topology. The cluster's storage provides no redundancy and instead puts the burden on stateful micro-services to deal with their own redundancy requirements.

We will provide an overview of the DMP. Next, we'll drill into the details of the storage subsystem, which is composed of Intel's RSD Pod Manager along with LINBIT's LINSTOR storage orchestrator. In this section of the talk, we will include a performance characterization of the two volume types using FIO.

A DMP cluster is managed by Kubernetes, with network and storage resources managed by Container Network and Storage Interface (CNI/CSI) providers. While DMP volumes provide no redundancy, they are persistent and have a zone label attached to them. This use of the Kubernetes zone label concept is a key aspect of the DMP storage implementation, as it ensures that stateful micro-services hosted on the platform are distributed across the cluster's fault domains. The stateful micro-service is then responsible for providing sufficient data redundancy to satisfy its availability and durability requirements.

(i) NVMe-over-Fabric (NVMe-oF) Based Remote Logical Volumes Optimized for Large Sequential I/O

The DMP disaggregates physical storage devices from compute servers to allow storage capacity to scale independent of compute. The disaggregated storage devices are then pooled by an open-source, cluster-wide volume manager called LINSTOR. LINBIT's framework is integrated with the cluster's k8s-based orchestration/scheduler function via LINBIT's Container Storage Interface (CSI) implementation. Logical volumes are provisioned from this pool and made available via NVMe-oF to k8s-managed Pods running on the compute servers. These logical volumes are optimized for large sequential I/Os and are used to replace HDDs.

(ii) Local Logical Volumes Optimized for Optane DC Persistent Memory (DCPM)

Compute servers in the DMP are outfitted with Optane DCPM. These persistent DIMMs are also pooled by LINSTOR and made available within Kubernetes as logical volumes. In the case of Optane DCPM, LINSTOR uses LVM to carve/provision logical volumes out of an NVDIMM namespace.

After we review the storage subsystem, we will provide overviews of two workloads that are priorities for initial DMP deployments. The first is a Spark-based AI/analytics pipeline that uses Minio's S3-compatible object store as a replacement for HDFS. The second is a MySQL/MariaDB transactional database on shared storage. To the best of our knowledge, this is the first open source transactional database that supports shared storage.

Finally, we'll conclude with an update on the status of the DMP effort, review preliminary performance results, and provide a few parting thoughts on the next steps for the DMP.

Speakers

Tuesday February 26, 2019 1:55pm - 2:20pm EST
Independence Ballroom

2:00pm EST

Storage Gardening: Using a Virtualization Layer for Efficient Defragmentation in the WAFL File System
As a file system ages, it can experience multiple forms of fragmentation. Fragmentation of the free space in the file system can lower write performance and subsequent read performance. Client operations as well as internal operations, such as deduplication, can fragment the layout of an individual file, which also impacts file read performance. File systems that allow sub-block granular addressing can gather intra-block fragmentation, which leads to wasted free space. This paper describes how the NetApp® WAFL® file system leverages a storage virtualization layer for defragmentation techniques that physically relocate blocks efficiently, including those in read-only snapshots. The paper analyzes the effectiveness of these techniques at reducing fragmentation and improving overall performance across various storage media.


Tuesday February 26, 2019 2:00pm - 2:30pm EST
Grand Ballroom

2:20pm EST

Stardust: Divide and Conquer in the Data Center Network
Building scalable data centers, and network devices that fit within these data centers, has become increasingly hard. With modern switches pushing the boundary of manufacturing feasibility, being able to build suitable and scalable network fabrics becomes critically important. We introduce Stardust, a fabric architecture for data-center-scale networks, inspired by network-switch systems. Stardust combines packet switches at the edge with disaggregated cell switches in the network fabric, using scheduled traffic. Stardust is a distributed solution that addresses the scale limitations of network-switch design, while also offering improved performance and power savings compared with traditional solutions. With ever-increasing networking requirements, Stardust predicts the elimination of packet switches, replaced by cell switches in the network and smart network hardware at the hosts.

Speakers
NZ

Noa Zilberman

University of Cambridge
GB

Gabi Bracha

Broadcom


Tuesday February 26, 2019 2:20pm - 2:45pm EST
Constitution Ballroom

2:20pm EST

scoutfs: Large Scale POSIX Archiving
scoutfs is an open source clustered POSIX file system built to support archiving of very large file sets. This talk will quickly summarize the challenges faced by sites that are managing large archives. We'll then explore the technical details of the persistent structures and network protocols that allow scoutfs to efficiently update and index file system metadata concurrently across a cluster. We'll see the interfaces that scoutfs provides on top of these mechanisms which allow management software to track the life cycle of billions of archived files.

Speakers
ZB

Zach Brown

Versity, Inc.
Zach Brown has been working on the Linux kernel for a while now and has most recently focused on file systems, particularly Lustre, OCFS2, and btrfs. He's also helped organize previous Linux storage workshops and has given talks at Linux conferences including OLS, LCA, and LinuxT... Read More →


Tuesday February 26, 2019 2:20pm - 2:45pm EST
Independence Ballroom

2:30pm EST

Pay Migration Tax to Homeland: Anchor-based Scalable Reference Counting for Multicores
The operating system community has been combating scalability bottlenecks for the past 10 years, with victories over all the then-new multicore hardware. File systems, however, are still in the midst of that turmoil. One of the culprits behind performance degradation is the reference counting widely used for managing data and metadata: scalability is badly impacted under loads with little or no logical contention, exactly where scalability is desperately needed. To address this, we propose PAYGO, a reference counting technique that combines a per-core hash of local reference counters with an anchor counter to make concurrent counting scalable as well as space-efficient, without any other delay for managing counters. PAYGO imposes the restriction that a decrement must be performed on the original local counter where the increment occurred, so that reclaiming zero-valued local counters can be done immediately. To this end, we enforce that processes which have migrated to different cores update the anchor counter associated with the original local counter. We implemented PAYGO in the Linux page cache, so our implementation is transparent to the file system. Experimental evaluation with underlying file systems (ext4, F2FS, btrfs, and XFS) demonstrated that PAYGO scales file systems better than other state-of-the-art techniques.
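The counting discipline can be shown on a single object. This is a simplified, hypothetical sketch (the real system keeps a per-core hash of counters and runs lock-free in the kernel): increments always hit the current core's local counter, while a decrement from a process that has since migrated pays the "migration tax" by updating the anchor counter instead of touching a remote core's local counter.

```python
# Simplified single-object sketch of PAYGO-style counting (invented names):
# increments go to the current core's local counter; a migrated process's
# decrement updates the anchor counter rather than a remote local counter.
class PaygoRef:
    def __init__(self, ncores):
        self.local = [0] * ncores
        self.anchor = 0

    def inc(self, core):
        self.local[core] += 1
        return core                        # the origin to remember for dec

    def dec(self, core, origin):
        if core == origin:
            self.local[origin] -= 1        # fast path: no migration happened
        else:
            self.anchor -= 1               # migrated: pay the migration tax

    def value(self):
        return sum(self.local) + self.anchor

    def can_reclaim(self, core):
        # A zero local counter is reclaimable immediately: decrements never
        # land on another core's local counter.
        return self.local[core] == 0
```

Because remote decrements are diverted to the anchor, a local counter that reaches zero can be freed without scanning or synchronizing with other cores.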

Speakers
SJ

Seokyong Jung

Hanyang University
JK

Jongbin Kim

Hanyang University
MR

Minsoo Ryu

Hanyang University
SK

Sooyong Kang

Hanyang University
HJ

Hyungsoo Jung

Hanyang University


Tuesday February 26, 2019 2:30pm - 3:00pm EST
Grand Ballroom

2:45pm EST

Blink: Fast Connectivity Recovery Entirely in the Data Plane
In this paper, we explore new possibilities, created by programmable switches, for fast rerouting upon signals triggered by Internet traffic disruptions. We present Blink, a data-driven system exploiting TCP-induced signals to detect failures. The key intuition behind Blink is that a TCP flow exhibits a predictable behavior upon disruption: retransmitting the same packet over and over, at epochs exponentially spaced in time. When compounded over multiple flows, this behavior creates a strong and characteristic failure signal. Blink efficiently analyzes TCP flows, at line rate, to: (i) select flows to track; (ii) reliably and quickly detect major traffic disruptions; and (iii) recover data-plane connectivity, via next-hops compatible with the operator’s policies.

We present an end-to-end implementation of Blink in P4 together with an extensive evaluation on real and synthetic traffic traces. Our results indicate that Blink: (i) can achieve sub-second rerouting for realistic Internet traffic; (ii) prevents unnecessary traffic shifts, in the presence of noise; and (iii) scales to protect large fractions of realistic Internet traffic, on existing hardware. We further show the feasibility of Blink by running our system on a real Tofino switch.
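The failure signal can be approximated in a few lines of host code. This is a rough Python sketch of the logic only (Blink itself runs at line rate in P4 on a switch, and the threshold here is invented): a tracked flow that resends the same sequence number is retransmitting, and when enough tracked flows retransmit at once, the monitored next-hop is deemed broken.

```python
# Rough sketch of Blink's failure signal (threshold invented; the real system
# runs in P4 at line rate): a flow resending the same sequence number is
# retransmitting; many simultaneous retransmitting flows signal a failure.
class BlinkDetector:
    def __init__(self, flow_fraction=0.5):
        self.last_seq = {}                 # flow id -> last sequence seen
        self.retransmitting = set()
        self.threshold = flow_fraction

    def packet(self, flow, seq):
        if self.last_seq.get(flow) == seq: # same seq again: a retransmission
            self.retransmitting.add(flow)
        else:
            self.retransmitting.discard(flow)
            self.last_seq[flow] = seq

    def failure(self):
        tracked = len(self.last_seq)
        return (tracked > 0 and
                len(self.retransmitting) / tracked >= self.threshold)
```

Compounding the signal over many flows is what makes it robust: a single flow retransmitting is noise, while most tracked flows retransmitting together is characteristic of a shared path failure.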

Speakers

Tuesday February 26, 2019 2:45pm - 3:10pm EST
Constitution Ballroom

2:45pm EST

Skyhook: Programmable Storage for Databases
Ceph is an open source distributed storage system that is object-based and massively scalable. Ceph provides developers with the capability to create data interfaces that can take advantage of local CPU and memory on the storage nodes (Ceph Object Storage Devices). These interfaces are powerful for application developers and can be created in C, C++, and Lua.

Skyhook is an open source storage and database project in the Center for Research in Open Source Software at UC Santa Cruz. Skyhook uses these capabilities in Ceph to create specialized read/write interfaces that leverage IO and CPU within the storage layer toward database processing and management. Specifically, we develop methods to apply predicates locally as well as additional metadata and indexing capabilities using Ceph's internal indexing mechanism built on top of RocksDB.

Skyhook's approach helps to enable scale-out of a single node database system by scaling out the storage layer. Our results show the performance benefits for some queries indeed scale well as the storage layer scales out.
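The predicate-pushdown idea can be sketched independently of Ceph. The code below is a hedged illustration, not Ceph's object-class API (real Skyhook interfaces are written in C++ or Lua against the cls mechanism): the filter function runs where the object lives, so only matching rows cross the network back to the client.

```python
# Hedged sketch of predicate pushdown (not Ceph's actual cls API): the filter
# executes on the storage side, so only matching rows travel to the client.
def storage_side_scan(rows, predicate):
    """Pretend this executes inside the OSD hosting the object."""
    return [r for r in rows if predicate(r)]

def client_query(osd_objects, predicate):
    result = []
    for rows in osd_objects:               # each element = one object's rows
        result.extend(storage_side_scan(rows, predicate))
    return result
```

Scaling the storage layer then scales the filtering work too, which is why queries whose selectivity is high benefit the most.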

Speakers
JL

Jeff LeFevre

University of California, Santa Cruz
Jeff LeFevre is an Assistant Adjunct Professor of Computer Science and Engineering at UC Santa Cruz where he does data management research and leads the Skyhook project within the Center for Research on Open Source Software (CROSS). He received his PhD from UC Santa Cruz with work... Read More →
NW

Noah Watkins

Red Hat
Noah Watkins is a software engineer at Red Hat. He received his PhD from UC Santa Cruz in 2018 where he focused his research on the programmability of distributed storage systems.


Tuesday February 26, 2019 2:45pm - 3:10pm EST
Independence Ballroom

3:00pm EST

Speculative Encryption on GPU Applied to Cryptographic File Systems
Due to the processing of cryptographic functions, Cryptographic File Systems (CFSs) may require significant processing capacity. Parallel processing techniques on CPUs or GPUs can be used to meet this demand. The CTR mode has two particularly useful features: it is fully parallelizable, and the initial step of the encryption process can be performed ahead of time, generating encryption masks. This work presents an innovative approach in which CTR mode is applied in the context of CFSs to exploit these characteristics, including the anticipated production of cipher masks (speculative encryption) on GPUs. We present techniques for the generation, storage, and management of nonces, an essential component of CTR-mode operation in the context of CFSs. For GPU processing, our methods handle the encryption contexts and control the production of masks, aiming to produce them sufficiently in advance to overcome the extra latency due to encryption tasks. The techniques were applied in the implementation of EncFS++, a user-space CFS. Performance analyses showed that it is possible to achieve significant gains in throughput and CPU efficiency in several scenarios. They also demonstrated that GPU processing can be efficiently applied to the CFS encryption workload even when encrypting small amounts of data (4 KiB), and in scenarios where higher-speed/lower-latency storage devices are used, such as SSDs or memory.
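The property that enables speculation is visible in CTR mode's structure: the masks depend only on the key, nonce, and counter, never on the data. The sketch below uses SHA-256 as a stand-in PRF purely for illustration (a real CFS would use a block cipher such as AES, typically on the GPU), but the shape of the computation is the same.

```python
import hashlib

BLOCK = 16  # bytes per mask, matching a 128-bit block cipher

def ctr_masks(key: bytes, nonce: bytes, nblocks: int):
    """Speculative step: masks depend only on key, nonce, and counter, so they
    can be produced (e.g., on a GPU) before the plaintext even exists.
    SHA-256 is a stand-in PRF here; a real CFS would use AES."""
    return [hashlib.sha256(key + nonce + i.to_bytes(8, "big")).digest()[:BLOCK]
            for i in range(nblocks)]

def xor_with_masks(data: bytes, masks):
    """Once masks are ready, encryption and decryption are the same XOR."""
    out = bytearray(data)
    for i in range(0, len(data), BLOCK):
        m = masks[i // BLOCK]
        for j in range(min(BLOCK, len(data) - i)):
            out[i + j] ^= m[j]
    return bytes(out)
```

Because the XOR is trivial compared to mask generation, producing masks ahead of time moves essentially all of the cryptographic latency off the file system's critical write path.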

Speakers
WM

Wagner M. Nunan Zola

Federal University of Paraná
LC

Luis C. Erpen de Bona

Federal University of Paraná
VE

Vandeir Eduardo

Federal University of Paraná and University of Blumenau


Tuesday February 26, 2019 3:00pm - 3:30pm EST
Grand Ballroom

3:10pm EST

Break with Refreshments
Tuesday February 26, 2019 3:10pm - 3:40pm EST
Grand Ballroom Foyer

3:10pm EST

Break with Refreshments
Tuesday February 26, 2019 3:10pm - 3:45pm EST
Grand Ballroom Foyer

3:30pm EST

Break with Refreshments
Tuesday February 26, 2019 3:30pm - 4:00pm EST
Grand Ballroom Foyer

3:40pm EST

Hydra: a federated resource manager for data-center scale analytics
Microsoft's internal data lake processes exabytes of data over millions of cores daily on behalf of thousands of tenants. Scheduling this workload requires 10x to 100x more decisions per second than existing, general-purpose resource management frameworks are known to handle. In 2013, we were faced with a growing demand for workload diversity and richer sharing policies that our legacy system could not meet. In this paper, we present Hydra, the resource management infrastructure we built to meet these requirements.

Hydra leverages a federated architecture, in which a cluster is comprised of multiple, loosely coordinating subclusters. This allows us to scale by delegating placement of tasks on machines to each sub-cluster, while centrally coordinating only to ensure that tenants receive the right share of resources. To adapt to changing workload and cluster conditions promptly, Hydra's design features a control plane that can push scheduling policies across tens of thousands of nodes within seconds. This feature combined with the federated design allows for great agility in developing, evaluating, and rolling out new system behaviors.

We built Hydra by leveraging, extending, and contributing our code to Apache Hadoop YARN. Hydra is currently the primary big-data resource manager at Microsoft. Over the last few years, Hydra has scheduled nearly one trillion tasks that manipulated close to a Zettabyte of production data.


Tuesday February 26, 2019 3:40pm - 4:05pm EST
Constitution Ballroom

3:45pm EST

Deep Dive into Ceph Block Storage
Ceph's object storage system allows users to mount Ceph as a thin-provisioned block device known as the RADOS Block Device (RBD). This talk delves deep into RBD, its design, and its features. In this session, we will discuss:
  • What creating an RBD image entails—RBD data and metadata
  • Prominent features like striping, snapshots, and cloning
  • How RBD is configured in a virtualized setup using libvirt/qemu

Speakers
MC

Mahati Chamarthy

Intel
Mahati Chamarthy has been contributing to storage technologies for the past few years. She was a core developer for OpenStack Object Storage (Swift) and now an active contributor to Ceph. She works as a Cloud Software Engineer with Intel's Open Source Technology Center focusing on... Read More →


Tuesday February 26, 2019 3:45pm - 4:10pm EST
Independence Ballroom

4:00pm EST

Sketching Volume Capacities in Deduplicated Storage
The adoption of deduplication in storage systems has introduced significant new challenges for storage management. Specifically, the physical capacities associated with volumes are no longer readily available. In this work we introduce a new approach to analyzing capacities in deduplicated storage environments. We provide sketch-based estimations of fundamental capacity measures required for managing a storage system: How much physical space would be reclaimed if a volume or group of volumes were to be removed from a system (the "reclaimable" capacity) and how much of the physical space should be attributed to each of the volumes in the system (the "attributed" capacity). Our methods also support capacity queries for volume groups across multiple storage systems, e.g., how much capacity would a volume group consume after being migrated to another storage system? We provide analytical accuracy guarantees for our estimations as well as empirical evaluations. Our technology is integrated into a prominent all-flash storage array and exhibits high performance even for very large systems. We also demonstrate how this method opens the door for performing placement decisions at the data center level and obtaining insights on deduplication in the field.
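Both capacity measures are straightforward to define exactly; the paper's contribution is estimating them from small sketches instead of full chunk maps, with accuracy guarantees. A hypothetical exact baseline over per-volume sets of chunk fingerprints (counting unit-sized chunks for simplicity):

```python
def reclaimable(volumes: dict, group: set) -> int:
    """Physical space freed if every volume in `group` were deleted:
    the chunks referenced only by volumes inside the group.
    (Counts unit-sized chunks; a real system would weigh by chunk size.)"""
    inside, outside = set(), set()
    for vol, chunks in volumes.items():
        (inside if vol in group else outside).update(chunks)
    return len(inside - outside)

def attributed(volumes: dict) -> dict:
    """Split each chunk's space evenly among the volumes referencing it,
    so the per-volume shares sum to the total physical capacity."""
    owners = {}
    for vol, chunks in volumes.items():
        for c in chunks:
            owners.setdefault(c, []).append(vol)
    share = {vol: 0.0 for vol in volumes}
    for c, vols in owners.items():
        for vol in vols:
            share[vol] += 1.0 / len(vols)
    return share

# Volumes mapped to the fingerprints of their deduplicated chunks.
vols = {"A": {1, 2, 3}, "B": {2, 3, 4}, "C": {4, 5}}
assert reclaimable(vols, {"A"}) == 1        # only chunk 1 is exclusive to A
assert reclaimable(vols, {"A", "B"}) == 3   # chunks 1, 2, 3
assert abs(sum(attributed(vols).values()) - 5) < 1e-9  # 5 physical chunks
```

A sketch-based estimator would replace these full fingerprint sets with fixed-size samples of the fingerprint space, trading exactness for memory.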

Speakers
DH

Danny Harnik

IBM Research
YS

Yosef Shatsky

IBM Systems
AE

Amir Epstein

Citi Innovation Lab TLV
RK

Ronen Kat

IBM Research


Tuesday February 26, 2019 4:00pm - 4:30pm EST
Grand Ballroom

4:05pm EST

Shuffling, Fast and Slow: Scalable Analytics on Serverless Infrastructure
Serverless computing is poised to fulfill the long-held promise of transparent elasticity and millisecond-level pricing. To achieve this goal, service providers impose a fine-grained computational model where every function has a maximum duration, a fixed amount of memory, and no persistent local storage. We observe that the fine-grained elasticity of serverless is key to achieving high utilization for general computations such as analytics workloads, but that resource limits make it challenging to implement such applications as they need to move large amounts of data between functions that don't overlap in time. In this paper, we present Locus, a serverless analytics system that judiciously combines (1) cheap but slow storage with (2) fast but expensive storage, to achieve good performance while remaining cost-efficient. Locus applies a performance model to guide users in selecting the type and the amount of storage to achieve the desired cost-performance trade-off. We evaluate Locus on a number of analytics applications including TPC-DS, CloudSort, and the Big Data Benchmark, and show that Locus can navigate the cost-performance trade-off, leading to 4×–500× performance improvements over a slow-storage-only baseline and reducing resource usage by up to 59% while achieving comparable performance to a cluster of virtual machines, and running at most 1.99× slower than Redshift.
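At its core, Locus's performance model is a search over storage mixes under a cost-time trade-off. A deliberately toy version (the prices, bandwidths, and linear model are all invented for illustration; the paper's model is far more detailed):

```python
def shuffle_cost(data_gb: float, fast_gb: float,
                 slow_price=0.01, fast_price=0.15,
                 slow_bw=1.0, fast_bw=10.0):
    """Toy cost/time model: data staged on fast storage moves at fast_bw
    GB/s, the remainder at slow_bw GB/s; prices are per GB staged.
    All numbers here are made up."""
    fast_gb = min(fast_gb, data_gb)
    time = fast_gb / fast_bw + (data_gb - fast_gb) / slow_bw
    cost = fast_gb * fast_price + (data_gb - fast_gb) * slow_price
    return time, cost

def cheapest_config(data_gb: float, deadline: float):
    """Least-cost amount of fast storage that still meets the deadline,
    or None if no configuration can."""
    best = None
    for fast_gb in range(0, int(data_gb) + 1, 10):
        t, c = shuffle_cost(data_gb, fast_gb)
        if t <= deadline and (best is None or c < best[1]):
            best = (fast_gb, c)
    return best

fast_gb, cost = cheapest_config(100, deadline=20)
# With these made-up numbers, 90 GB of fast storage is the cheapest
# configuration that meets the 20-second target.
assert fast_gb == 90
```

The design point this illustrates: rather than always buying fast storage, the model finds the smallest fast-storage footprint that keeps the shuffle within the performance target.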

Speakers
QP

Qifan Pu

UC Berkeley
SV

Shivaram Venkataraman

University of Wisconsin, Madison
IS

Ion Stoica

UC Berkeley


Tuesday February 26, 2019 4:05pm - 4:30pm EST
Constitution Ballroom

4:10pm EST

Mindcastle.io: Secure Distributed Block Device for Edge and Cloud
Camera-based smart IoT sensors will soon be everywhere. The recent success of Deep Neural Networks (DNNs) has opened the door to new computer vision and AI applications. While initial deployments use high-end server-class hardware with expensive and power-hungry GPUs, optimizations and algorithmic improvements will soon make running the inference side of DNNs on low-cost Edge Computing devices commonplace. These devices will need software, and this software needs to be continually updated, both to keep pace with the rapid development of machine learning/AI methods and datasets, and to keep their operating system and middleware installs tamper-proof and secure. To this end, we have been building Mindcastle, a serverless distributed block storage system with strong cryptographic integrity, built-in compression, and incremental atomic updates. Mindcastle is based on a highly performant and flash-friendly LSM-like data structure, first developed at Bromium where it served as the storage foundation of Bromium's Xen-derived uXen hypervisor, which has hosted millions of strongly isolated Micro-VMs across many security-sensitive installations worldwide.

Speakers
JG

Jacob Gorm Hansen

Vertigo.ai
Jacob Gorm Hansen is the founder of Vertigo.ai, an AI startup that focuses on AI for Edge computing. Jacob has a long track record of innovative computer systems development and research. After cutting his teeth as a senior programmer on the Hitman games franchise, he returned to academia... Read More →


Tuesday February 26, 2019 4:10pm - 4:35pm EST
Independence Ballroom

4:30pm EST

dShark: A General, Easy to Program and Scalable Framework for Analyzing In-network Packet Traces
Distributed, in-network packet capture is still the last resort for diagnosing network problems. Despite recent advances in collecting packet traces scalably, effectively utilizing pervasive packet captures still poses important challenges. Arbitrary combinations of middleboxes which transform packet headers make it challenging to even identify the same packet across multiple hops; packet drops in the collection system create ambiguities that must be handled; the large volume of captures, and their distributed nature, make it hard to do even simple processing; and the one-off and urgent nature of problems tends to generate ad-hoc solutions that are not reusable and do not scale. In this paper we propose dShark to address these challenges. dShark allows intuitive groupings of packets across multiple traces that are robust to header transformations and capture noise, offering simple streaming data abstractions for network operators. Using dShark on production packet captures from a major cloud provider, we show that dShark makes it easy to write concise and reusable queries against distributed packet traces that solve many common problems in diagnosing complex networks. Our evaluation shows that dShark can analyze production packet traces with more than 10 Mpps throughput on a commodity server, and has near-linear speedup when scaling out on multiple servers.

Speakers
DY

Da Yu

Brown University
YZ

Yibo Zhu

Microsoft and ByteDance
BA

Behnaz Arzani

Microsoft
RF

Rodrigo Fonseca

Brown University
KD

Karl Deng

Microsoft
LY

Lihua Yuan

Microsoft


Tuesday February 26, 2019 4:30pm - 4:55pm EST
Constitution Ballroom

4:30pm EST

Finesse: Fine-Grained Feature Locality based Fast Resemblance Detection for Post-Deduplication Delta Compression
In storage systems, delta compression is often used as a complementary data reduction technique for data deduplication because it is able to eliminate redundancy among the non-duplicate but highly similar chunks. Currently, what we call 'N-transform Super-Feature' (N-transform SF) is the most popular and widely used approach to computing data similarity for detecting delta compression candidates. But our observations suggest that the N-transform SF is compute-intensive: it needs to linearly transform each Rabin fingerprint of the data chunks N times to obtain N features, and it can be simplified by exploiting the fine-grained feature locality existing among highly similar chunks to eliminate the time-consuming linear transformations. Therefore, we propose Finesse, a fine-grained feature-locality-based fast resemblance detection approach that divides each chunk into several fixed-sized subchunks, computes features from these subchunks individually, and then groups the features into super-features. Experimental results show that, compared with the state-of-the-art N-transform SF approach, Finesse accelerates the similarity computation for resemblance detection by 3.2× ~ 3.5× and increases the final throughput of a deduplicated and delta compressed prototype system by 41% ~ 85%, while achieving comparable compression ratios.
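The subchunk-feature idea can be sketched in a few lines. This is an illustrative stand-in, not the paper's code: SHA-1 replaces Rabin fingerprints, and the subchunk/super-feature counts are arbitrary choices:

```python
import hashlib

def superfeatures(chunk: bytes, n_subchunks: int = 12, n_sf: int = 3) -> list:
    """Finesse-style resemblance features (illustrative sketch): split the
    chunk into fixed-size subchunks, take one feature per subchunk (the
    max of a sliding 8-byte hash; SHA-1 stands in for Rabin fingerprints),
    then group consecutive features into super-features. Chunks sharing a
    super-feature are candidates for delta compression."""
    sub_len = max(1, len(chunk) // n_subchunks)
    features = []
    for i in range(n_subchunks):
        sub = chunk[i * sub_len:(i + 1) * sub_len]
        features.append(max(
            int.from_bytes(hashlib.sha1(sub[j:j + 8]).digest()[:8], "big")
            for j in range(0, max(1, len(sub) - 7), 8)
        ))
    per_sf = n_subchunks // n_sf
    sfs = []
    for g in range(n_sf):
        group = features[g * per_sf:(g + 1) * per_sf]
        sfs.append(hashlib.sha1(repr(group).encode()).hexdigest())
    return sfs

a = bytes(range(256)) * 48            # a 12 KiB chunk
b = bytes([a[0] ^ 0xFF]) + a[1:]      # highly similar: one byte differs
sa, sb = superfeatures(a), superfeatures(b)
# The edit can disturb at most the first subchunk's feature, so the
# other super-features still match and b is detected as resembling a.
assert sum(x == y for x, y in zip(sa, sb)) >= 2
```

The saving over N-transform SF is visible here: each feature is read directly from one subchunk, with no per-fingerprint linear transformations.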

Speakers
YZ

Yucheng Zhang

Hubei University of Technology
WX

Wen Xia

Harbin Institute of Technology, Shenzhen & Peng Cheng Laboratory
DF

Dan Feng

WNLO, School of Computer, Huazhong University of Science and Technology
HJ

Hong Jiang

University of Texas at Arlington
YH

Yu Hua

WNLO, School of Computer, Huazhong University of Science and Technology
QW

Qiang Wang

WNLO, School of Computer, Huazhong University of Science and Technology


Tuesday February 26, 2019 4:30pm - 5:00pm EST
Grand Ballroom

4:35pm EST

IO and cgroups, the Current and Future Work
Resource isolation for IO has been incomplete for years, making it very hard to build fully isolated containers on Linux. With the recent development of the blk-iolatency controller this has started to change, hopefully marking the start of being able to build systems with complete resource isolation.
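For context, the blk-iolatency controller is configured through the cgroup-v2 io.latency interface file. A minimal sketch, per the kernel's cgroup-v2 documentation (the cgroup path, device number 8:0, and target value are placeholders):

```shell
# Requires a unified (cgroup v2) hierarchy mounted at /sys/fs/cgroup.
mkdir -p /sys/fs/cgroup/container1

# io.latency takes "MAJOR:MINOR target=<latency in microseconds>".
# If this group's I/O completion latency on the device exceeds the
# target, the kernel throttles peer cgroups until it recovers.
echo "8:0 target=10000" > /sys/fs/cgroup/container1/io.latency

# Move the container's workload into the protected group.
echo "$$" > /sys/fs/cgroup/container1/cgroup.procs
```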

Tuesday February 26, 2019 4:35pm - 5:00pm EST
Independence Ballroom

4:55pm EST

Short Break
Tuesday February 26, 2019 4:55pm - 5:10pm EST
Grand Ballroom Foyer

5:00pm EST

Self-Encrypting Drive (SED) Standardization Proposal for NVDIMM-N Devices
A non-volatile DIMM (NVDIMM) is a Dual In-line Memory Module (DIMM) that maintains the contents of Synchronous Dynamic Random Access Memory (SDRAM) during power loss. An NVDIMM-N class device can be integrated into standard compute or storage platforms to provide non-volatility of the data in the DIMM. An NVDIMM-N relies on a byte-addressable energy-backed function to preserve the data in case of power failure. A Byte-Addressable Energy Backed Function is backed by a combination of SDRAM and non-volatile memory (e.g., NAND flash) on the NVDIMM-N. JESD245C Byte-Addressable Energy Backed Interface (BAEBI) defines the programming interface for the NVDIMM-N class of devices.

An NVDIMM-N achieves non-volatility by:
  • performing a Catastrophic Save operation to copy SDRAM contents into NVM when host power is lost using an Energy Source managed by either the module or the host
  • performing a Restore operation to copy contents from the NVM to SDRAM when power is restored

An NVDIMM-N device may be of the self-encrypting device (SED) type, which protects data at rest. This means the NVDIMM-N controller:
  • encrypts data during a Catastrophic Save operation
  • decrypts data during a Restore operation and the data is:
    • plaintext while sitting in SDRAM
    • ciphertext while sitting in NVM (e.g., flash memory)

Typically, an NVDIMM-N device may be used within a storage controller for performance acceleration of storage workloads, or as sundry storage to preserve debug information in case of power failure. When an NVDIMM-N device is used as a caching layer, transient data is staged in the NVDIMM-N device before the data is persisted/committed to the storage media. NVDIMM-N devices are also used as persistent storage media for staging memory dump files when critical failures occur at the storage subsystem level before the system goes down.

The NVDIMM-N encryption standardization proposal involves cross-pollination between JEDEC (proposed BAEBI extensions to define security protocols in conjunction with encryption capability on the device) and TCG standards (proposed TCG Storage Interface Interactions Specifications content for handling self-encrypting NVDIMM-Ns plus adapting TCG Ruby SSC for NVDIMM-N devices) with industry sponsorship from HPE and NetApp.

The talk will begin with a brief overview of NVDIMM-N devices and associated storage-centric use cases, followed by an overview of the NVDIMM-N encryption scheme and the proposed self-encrypting device standardization approach for NVDIMM-N devices, which involves the following:

  1. Extensions to the BAEBI specification to accommodate security protocol definitions in conjunction with the encryption capability in NVDIMM-N devices
  2. Extensions to TCG Storage Interface Specifications defining the Security Protocol Typed Block for handling interactions with NVDIMM-N devices
  3. Adapting TCG Ruby SSC standard for accommodating NVDIMM-N class devices

The talk will conclude by summarizing the current state of the standardization proposal and the approval process with the JEDEC and TCG working groups.

Speakers
FK

Frederick Knight

NetApp
Frederick Knight is a Principal Standards Technologist at NetApp Inc. Fred has over 40 years of experience in the computer and storage industry. He currently represents NetApp in several National and International Storage Standards bodies and industry associations, including T10 (SCSI... Read More →
SB

Sridhar Balasubramanian

NetApp
Sridhar Balasubramanian is a Principal Security Architect within Product Security Group @ NetApp RTP. With over 25 years in the software industry, Sridhar is inventor/co-inventor for 16 US Patents and published 5 Conference papers till date. Sridhar's area of expertise includes Storage... Read More →


Tuesday February 26, 2019 5:00pm - 5:25pm EST
Independence Ballroom

5:00pm EST

Sliding Look-Back Window Assisted Data Chunk Rewriting for Improving Deduplication Restore Performance
Data deduplication is an effective way of improving storage space utilization. The data generated by deduplication is persistently stored in data chunks or data containers (a container consists of a few hundred to a few thousand data chunks). The data restore process is rather slow due to data fragmentation and read amplification. To speed up the restore process, data chunk rewrite schemes (a rewrite stores a duplicate data chunk) have been proposed to effectively improve data chunk locality and reduce the number of container reads required to restore the original data. However, rewrites decrease the deduplication ratio, since more storage space is used to store the duplicate data chunks.

To remedy this, we focus on reducing the data fragmentation and read amplification of container-based deduplication systems. We first propose a flexible container reference-count-based rewrite scheme, which can make a better tradeoff between the deduplication ratio and the number of required container reads than capping, an existing rewrite scheme. To further improve the rewrite candidate selection accuracy, we propose a sliding look-back window based design, which can make more accurate rewrite decisions by considering the caching effect, data chunk localities, and data chunk closeness in the current and future windows. According to our evaluation, our proposed approach always achieves higher restore performance than capping, especially when the reduction in deduplication ratio is small.
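As a concrete (and much simplified) illustration of the underlying trade-off, a capping-style rewrite decision can be sketched in a few lines. This toy is an illustrative stand-in, not the paper's sliding-window algorithm:

```python
from collections import Counter

def rewrite_decisions(chunk_containers, cap):
    """Capping-style rewrite sketch: within one segment of the backup
    stream, keep duplicate references only to the `cap` containers
    holding the most of this segment's chunks; duplicates in the
    remaining containers are rewritten, so a restore of the segment
    touches at most `cap` old containers."""
    counts = Counter(chunk_containers)
    kept = {c for c, _ in counts.most_common(cap)}
    return ["refer" if c in kept else "rewrite" for c in chunk_containers]

# Containers holding each duplicate chunk of one segment, in stream order.
segment = ["c1", "c1", "c2", "c3", "c1", "c2", "c4"]
decisions = rewrite_decisions(segment, cap=2)
# c1 (3 refs) and c2 (2 refs) are kept; chunks in c3 and c4 are rewritten,
# trading some deduplication ratio for fewer container reads on restore.
assert decisions == ["refer", "refer", "refer", "rewrite",
                     "refer", "refer", "rewrite"]
```

The sliding look-back window design in the paper refines exactly this decision by also accounting for the restore cache and for chunk closeness across neighboring windows.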

Speakers
ZC

Zhichao Cao

University of Minnesota
SL

Shiyong Liu

Ocean University of China
FW

Fenggang Wu

University of Minnesota
GW

Guohua Wang

South China University of Technology
BL

Bingzhe Li

University of Minnesota
DH

David H.C. Du

University of Minnesota


Tuesday February 26, 2019 5:00pm - 5:30pm EST
Grand Ballroom

5:10pm EST

Minimal Rewiring: Efficient Live Expansion for Clos Data Center Networks
Clos topologies have been widely adopted for large-scale data center networks (DCNs), but it has been difficult to support incremental expansions for Clos DCNs. Some prior work has claimed that the structure of Clos topologies hinders incremental expansion.

We demonstrate that it is indeed possible to design expandable Clos DCNs, and to expand them while they are carrying live traffic, without incurring packet loss. We use a layer of patch panels between blocks of switches in a Clos DCN, which makes physical rewiring feasible, and we describe how to use integer linear programming (ILP) to minimize the number of patch-panel connections that must be changed, which makes expansions faster and cheaper. We also describe a block-aggregation technique that makes our ILP approach scalable. We tested our "minimal-rewiring" solver on two kinds of fine-grained expansions using 2250 synthetic DCN topologies, and found that the solver can handle 99% of these cases while changing under 25% of the connections. Compared to prior approaches, this solver (on average) reduces the average number of "stages" per expansion from 4 to 1.29, and reduces the number of wires changed by an order of magnitude or more—a significant improvement to our operational costs, and to our exposure (during expansions) to capacity-reducing faults.

Speakers
SZ

Shizhen Zhao

Google, Inc.
RW

Rui Wang

Google, Inc.
JZ

Junlan Zhou

Google, Inc.
JO

Joon Ong

Google, Inc.
JC

Jeffrey C. Mogul

Google, Inc.
AV

Amin Vahdat

Google Inc.


Tuesday February 26, 2019 5:10pm - 5:35pm EST
Constitution Ballroom

5:25pm EST

Dinner (on your own)
Tuesday February 26, 2019 5:25pm - 7:00pm EST
N/A

5:35pm EST

Understanding Lifecycle Management Complexity of Datacenter Topologies
Most recent datacenter topology designs have focused on performance properties such as latency and throughput. In this paper, we explore a new dimension, life cycle management, which attempts to capture operational costs of topologies. Specifically, we consider costs associated with deployment and expansion of topologies and explore how structural properties of two different topology families (Clos and expander graphs as exemplified by Xpander) affect these. We also develop a new topology that has the wiring simplicity of Clos and the expandability of expander graphs using the insights from our study.

Speakers
MZ

Mingyang Zhang

University of Southern California
SS

Sucha Supittayapornpong

University of Southern California
RG

Ramesh Govindan

University of Southern California


Tuesday February 26, 2019 5:35pm - 6:00pm EST
Constitution Ballroom

6:00pm EST

Shoal: A Network Architecture for Disaggregated Racks
Disaggregated racks comprise a dense cluster of separate pools of compute, memory and storage blades, all inter-connected through an internal network within a single rack. However, their density poses a unique challenge for the rack's network: it needs to connect an order of magnitude more nodes than today's racks without exceeding the rack's fixed power budget and without compromising on performance. We present Shoal, a power-efficient yet performant intra-rack network fabric built using fast circuit switches. Such switches consume less power as they have no buffers and no packet inspection mechanism, yet can be reconfigured in nanoseconds. Rack nodes transmit according to a static schedule such that there is no in-network contention without requiring a centralized controller. Shoal's congestion control leverages the physical fabric to achieve fairness and both bounded worst-case network throughput and queuing. We use an FPGA-based prototype, testbed experiments, and simulations to show that Shoal's mechanisms are practical, and can simultaneously achieve high density and high performance: 71% lower power and comparable or higher performance than today's network designs.

Speakers
VS

Vishal Shrivastav

Cornell University
AV

Asaf Valadarsky

Hebrew University of Jerusalem
HB

Hitesh Ballani

Microsoft Research
PC

Paolo Costa

Microsoft Research
KS

Ki Suh Lee

Waltz Networks
HW

Han Wang

Barefoot Networks
RA

Rachit Agarwal

Cornell University
HW

Hakim Weatherspoon

Cornell University


Tuesday February 26, 2019 6:00pm - 6:25pm EST
Constitution Ballroom

6:00pm EST

Poster Session and Reception
Check out the cool new ideas and the latest preliminary research on display at the Poster Session and Reception. Take part in discussions with your colleagues over complimentary food and drinks. View the complete list of accepted posters.

Tuesday February 26, 2019 6:00pm - 7:30pm EST
Back Bay Ballroom

7:00pm EST

NVM and Related Fancy BoF
Moderators
Tuesday February 26, 2019 7:00pm - 8:30pm EST
Independence Ballroom

7:30pm EST

Amazon Vendor BoF: Round Table Discussion
Join Amazon leaders for a roundtable-style discussion on hot topics in the storage field. Get Amazon's top storage leaders' perspectives on the industry and how Amazon is continuing to invent and push boundaries. Pizza and beer will be available throughout the event.

Tuesday February 26, 2019 7:30pm - 8:30pm EST
Gardner Room B

7:30pm EST

Hands On With NetApp
Get hands-on with some NetApp hardware. Have you ever *seen* a filer, let alone unboxed, installed, or configured one for client I/O? Now's your chance! Beer, wine, soft drinks and snacks provided.

Tuesday February 26, 2019 7:30pm - 8:30pm EST
Grand Ballroom

7:30pm EST

USENIX Women in Advanced Computing (WiAC BoF)
Let’s talk about women in advanced computing. All registered attendees—of all genders—are welcome to attend this BoF.

Tuesday February 26, 2019 7:30pm - 8:30pm EST
Gardner Room A

7:30pm EST

Intel Vendor BoF: Building the Future Network with Intel
Networking and computing are converging in the era of edge computing, and Intel is at the center of driving that technology innovation. Come join and socialize with Intel experts to learn about state-of-the-art networking and systems technologies. We will walk you through a broad range of new technologies, from silicon innovations in CPUs, I/O, and accelerators to open-source software. Software-defined infrastructure on Intel platforms is accelerating 5G networks, edge services, immersive video, and confidential computing. Technology challenges remain in AI, network analytics, and achieving deterministic performance and low latency. We want to hear your big ideas for addressing these challenges and to build joint success through technology collaboration and investment.

The future begins now, build your future success with Intel.


Tuesday February 26, 2019 7:30pm - 8:30pm EST
Jefferson Room

8:00pm EST

A Paradigm Shift in Storage by Bringing Compute to Data with Computational Storage (BoF)
The industry is seeing an increase in customer requirements to move compute closer to traditional storage devices and systems. In response, a growing number of data-driven applications have demonstrated that adding computation to the normal storage features of devices and systems can realize a significant performance and infrastructure scaling advantage. Computational Storage solutions typically target applications where the demand to process ever-growing storage workloads is outpacing traditional compute server architectures. These applications include AI, big data, content delivery, database, machine learning and many others that are used industry-wide. SNIA has established a new technical workgroup to facilitate the use of computational storage in mainstream application environments. This session will discuss their work activities and plans to promote interoperability of devices, and standards for system deployment, provisioning, management and security.

Tuesday February 26, 2019 8:00pm - 9:00pm EST
Grand Ballroom

8:30pm EST

Data Storage Research Vision 2025 (BoF)
With the rapidly evolving computing paradigms and storage hardware, there is an urgent need for a consolidated effort to identify and establish a vision for storage systems research. The National Science Foundation's (NSF) "Visioning Workshop on Data Storage Research 2025" brought together a number of storage researchers from academia, industry, national laboratories, and federal agencies to develop a collective vision for future storage research (https://sites.google.com/vt.edu/data-storage-research/home). This BoF will share the findings from this visioning workshop with the community.

Tuesday February 26, 2019 8:30pm - 9:30pm EST
Gardner Room B

8:30pm EST

Students and Young Professionals Meetup
Come for the refreshments, stay for the opportunity to meet and network with other students and young professionals attending FAST, NSDI, and Vault.

Tuesday February 26, 2019 8:30pm - 9:30pm EST
Gardner Room A

8:30pm EST

SMB/NFS/NMLOP BoF
Speakers
RW

Ric Wheeler

Facebook


Tuesday February 26, 2019 8:30pm - 10:00pm EST
Independence Ballroom

9:30pm EST

Board Game Night
Join FAST, NSDI, and Vault attendees for some good old-fashioned board games. We'll have some on hand, but bring your own games, too!


Tuesday February 26, 2019 9:30pm - 10:30pm EST
Gardner Room A
 
Wednesday, February 27
 

7:30am EST

Continental Breakfast
Wednesday February 27, 2019 7:30am - 8:30am EST
Grand Ballroom Foyer

8:00am EST

Continental Breakfast
Wednesday February 27, 2019 8:00am - 9:00am EST
Grand Ballroom Foyer

8:30am EST

NetScatter: Enabling Large-Scale Backscatter Networks
We present the first wireless protocol that scales to hundreds of concurrent transmissions from backscatter devices. Our key innovation is a distributed coding mechanism that works below the noise floor, operates on backscatter devices and can decode all the concurrent transmissions at the receiver using a single FFT operation. Our design addresses practical issues such as timing and frequency synchronization as well as the near-far problem. We deploy our design using a testbed of backscatter hardware and show that our protocol scales to concurrent transmissions from 256 devices using a bandwidth of only 500 kHz. Our results show throughput and latency improvements of 14–62x and 15–67x over existing approaches and 1–2 orders of magnitude higher transmission concurrency.

Speakers
MH

Mehrdad Hessar

University of Washington
AN

Ali Najafi

University of Washington
SG

Shyamnath Gollakota

University of Washington


Wednesday February 27, 2019 8:30am - 8:55am EST
Constitution Ballroom

8:55am EST

Towards Programming the Radio Environment with Large Arrays of Inexpensive Antennas
Conventional thinking treats the wireless channel as a given constraint. Therefore, wireless network designs to date center on the problem of the endpoint optimization that best utilizes the channel, for example, via rate and power control at the transmitter or sophisticated decoding mechanisms at the receiver. We instead explore whether it is possible to reconfigure the environment itself to facilitate wireless communication. In this work, we instrument the environment with a large array of inexpensive antennas (LAIA) and design algorithms to configure them in real time. Our system achieves this level of programmability through rapid adjustments of an on-board phase shifter in each LAIA device. We design a channel decomposition algorithm to quickly estimate the wireless channel due to the environment alone, which leads us to a process to align the phases of the array elements. Variations of our core algorithm can then optimize wireless channels on the fly for single- and multi-antenna links, as well as nearby networks operating on adjacent frequency bands. We design and deploy a 36-element passive array in a real indoor home environment. Experiments with this prototype show that, by reconfiguring the wireless environment, we can achieve a 24% TCP throughput improvement on average and a median improvement of 51.4% in Shannon capacity over the baseline single-antenna links. Over the baseline multi-antenna links, LAIA achieves an improvement of 12.23% to 18.95% in Shannon capacity.

Speakers
ZL

Zhuqi Li

Princeton University
YX

Yaxiong Xie

Princeton University
LS

Longfei Shangguan

Princeton University
RI

Rotman Ivan Zelaya

Yale University
JG

Jeremy Gummeson

UMass Amherst
WH

Wenjun Hu

Yale University
KJ

Kyle Jamieson

Princeton University


Wednesday February 27, 2019 8:55am - 9:20am EST
Constitution Ballroom

9:00am EST

DistCache: Provable Load Balancing for Large-Scale Storage Systems with Distributed Caching
Load balancing is critical for distributed storage to meet strict service-level objectives (SLOs). It has been shown that a fast cache can guarantee load balancing for a clustered storage system. However, when the system scales out to multiple clusters, the fast cache itself would become the bottleneck. Traditional mechanisms like cache partition and cache replication either result in load imbalance between cache nodes or have high overhead for cache coherence.

We present DistCache, a new distributed caching mechanism that provides provable load balancing for large-scale storage systems. DistCache co-designs cache allocation with cache topology and query routing. The key idea is to partition the hot objects with independent hash functions between cache nodes in different layers, and to adaptively route queries with the power-of-two-choices. We prove that DistCache enables the cache throughput to increase linearly with the number of cache nodes, by unifying techniques from expander graphs, network flows, and queuing theory. DistCache is a general solution that can be applied to many storage systems. We demonstrate the benefits of DistCache by providing the design, implementation, and evaluation of the use case for emerging switch-based caching.
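The core routing idea can be sketched directly. This toy is illustrative, not the paper's switch-based implementation: each hot object hashes to one candidate cache node per layer under independent hash functions, and each query goes to the less-loaded candidate:

```python
import hashlib

def h(layer: int, key: str, n: int) -> int:
    """One independent hash function per layer (illustrative stand-in)."""
    d = hashlib.sha256(f"{layer}:{key}".encode()).digest()
    return int.from_bytes(d[:8], "big") % n

def route(key: str, load: dict, n_nodes: int):
    """Power-of-two-choices across two cache layers: each hot object has
    one candidate node per layer; send the query to whichever candidate
    is currently less loaded."""
    a = (0, h(0, key, n_nodes))
    b = (1, h(1, key, n_nodes))
    pick = a if load[a] <= load[b] else b
    load[pick] += 1
    return pick

n = 4  # cache nodes per layer
load = {(layer, i): 0 for layer in (0, 1) for i in range(n)}
for q in range(1000):
    route(f"hot-object-{q % 16}", load, n)

# Queries spread across both layers instead of hammering one node,
# which is what lets aggregate cache throughput scale with node count.
assert sum(load.values()) == 1000
assert max(load.values()) < 1000
```

The independent hashes ensure that objects colliding on one layer are very unlikely to collide on the other, which is what the paper's expander-graph analysis formalizes.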

Speakers
ZL

Zaoxing Liu

Johns Hopkins University
ZB

Zhihao Bai

Johns Hopkins University
ZL

Zhenming Liu

College of William and Mary
XL

Xiaozhou Li

Celer Network
CK

Changhoon Kim

Barefoot Networks
VB

Vladimir Braverman

Johns Hopkins University
XJ

Xin Jin

Johns Hopkins University
IS

Ion Stoica

UC Berkeley


Wednesday February 27, 2019 9:00am - 9:30am EST
Grand Ballroom

9:20am EST

Pushing the Range Limits of Commercial Passive RFIDs
This paper asks: “Can we push the prevailing range limits of commercial passive RFIDs?”. Today’s commercial passive RFIDs report ranges of 5-15 meters at best. This constrains RFIDs to be detected only at specific checkpoints in warehouses, stores and factories today, leaving them outside of communication range beyond these spaces. State-of-the-art approaches to improve the range of RFIDs develop new tag hardware that necessarily sacrifices some of the most attractive features of passive RFIDs such as their low cost, small form-factor or the absence of a battery.

We present PushID, a system that exploits collaboration between readers to enhance the range of commercial passive RFID tags, without altering the tags whatsoever. PushID uses distributed MIMO to coherently combine signals across geographically separated RFID readers at the tags. In doing so, it resolves the chicken-or-egg problem of inferring the optimal beamforming parameters to beam energy to a tag without any feedback from the tag itself, which needs this energy to respond in the first place. A prototype evaluation of PushID with 8 distributed RFID readers reveals a range of 64 meters to the closest reader, a 7.4×, 1.2× and 1.6× improvement in range compared to state-of-the-art commercial readers and two other schemes [10, 31].

Speakers
JW

Jingxian Wang

Carnegie Mellon University
JZ

Junbo Zhang

Tsinghua University
RS

Rajarshi Saha

IIT Kharagpur
HJ

Haojian Jin

Carnegie Mellon University
SK

Swarun Kumar

Carnegie Mellon University


Wednesday February 27, 2019 9:20am - 9:45am EST
Constitution Ballroom

9:30am EST

GearDB: A GC-free Key-Value Store on HM-SMR Drives with Gear Compaction
Host-managed shingled magnetic recording drives (HM-SMR) give a capacity advantage to harness the explosive growth of data. Applications where data is sequentially written and randomly read make the HM-SMR an ideal solution due to its capacity, predictable performance, and economical cost. Key-value stores based on the Log-Structured Merge Tree (LSM-tree) data structure are a particularly good fit due to their batched sequential writes. However, building an LSM-tree based KV store on HM-SMR drives presents severe challenges in maintaining performance and space efficiency due to the redundant cleaning processes for applications and storage devices (i.e., compaction and garbage collection). To eliminate the overhead of on-disk garbage collection (GC) and improve compaction efficiency, this paper presents GearDB, a GC-free KV store tailored for HM-SMR drives, with three new techniques: a new on-disk data layout, compaction windows, and a novel gear compaction algorithm. We implement GearDB and evaluate it against LevelDB on a real HM-SMR drive. Our extensive experiments show that GearDB achieves good performance and space efficiency, i.e., on average 1.71x faster than LevelDB for random writes with a space efficiency of 89.9%.

Speakers
TY

Ting Yao

Huazhong University of Science and Technology and Temple University
JW

Jiguang Wan

Huazhong University of Science and Technology
PH

Ping Huang

Temple University
YZ

Yiwen Zhang

Huazhong University of Science and Technology
ZL

Zhiwen Liu

Huazhong University of Science and Technology
CX

Changsheng Xie

Huazhong University of Science and Technology
XH

Xubin He

Temple University


Wednesday February 27, 2019 9:30am - 10:00am EST
Grand Ballroom

9:45am EST

SweepSense: Sensing 5 GHz in 5 Milliseconds with Low-cost SDRs
Wireless transmissions occur intermittently across the entire spectrum. For example, WiFi and Bluetooth devices transmit frames across the 100 MHz-wide 2.4 GHz band, and LTE devices transmit frames between 700 MHz and 3.7 GHz. Today, only high-cost radios can sense across the spectrum with sufficient temporal resolution to observe these individual transmissions.

We present “SweepSense”, a low-cost radio architecture that senses the entire spectrum with high-temporal resolution by rapidly sweeping across it. Sweeping introduces new challenges for spectrum sensing: SweepSense radios only capture a small number of distorted samples of transmissions. To overcome this challenge, we correct the distortion with self-generated calibration data, and classify the protocol that originated each transmission with only a fraction of the transmission’s samples. We demonstrate that SweepSense can accurately identify four protocols transmitting simultaneously in the 2.4 GHz unlicensed band. We also demonstrate that it can simultaneously monitor the load of several LTE base stations operating in disjoint bands.

Speakers
YG

Yeswanth Guddeti

UC San Diego
MK

Moein Khazraee

UC San Diego
AS

Aaron Schulman

UC San Diego
DB

Dinesh Bharadia

UC San Diego


Wednesday February 27, 2019 9:45am - 10:10am EST
Constitution Ballroom

10:00am EST

SPEICHER: Securing LSM-based Key-Value Stores using Shielded Execution
We introduce Speicher, a secure storage system that not only provides strong confidentiality and integrity properties, but also ensures data freshness to protect against rollback/forking attacks. Speicher exports a Key-Value (KV) interface backed by Log-Structured Merge Tree (LSM) for supporting secure data storage and query operations. Speicher enforces these security properties on an untrusted host by leveraging shielded execution based on a hardware-assisted trusted execution environment (TEE)—specifically, Intel SGX. However, the design of Speicher extends the trust in shielded execution beyond the secure SGX enclave memory region to ensure that the security properties are also preserved in the stateful (or non-volatile) setting of an untrusted storage medium, including system crash, reboot, or migration.

More specifically, we have designed an authenticated and confidentiality-preserving LSM data structure. We have further hardened the LSM data structure to ensure data freshness by designing asynchronous trusted counters. Lastly, we designed a direct I/O library for shielded execution based on Intel SPDK to overcome the I/O bottlenecks in the SGX enclave. We have implemented Speicher as a fully-functional storage system by extending RocksDB, and evaluated its performance using the RocksDB benchmark. Our experimental evaluation shows that Speicher incurs reasonable overheads for providing strong security guarantees, while keeping the trusted computing base (TCB) small.

Speakers
MB

Maurice Bailleu

The University of Edinburgh
JT

Jörg Thalheim

The University of Edinburgh
PB

Pramod Bhatotia

The University of Edinburgh
MH

Michio Honda

NEC Laboratories Europe
KV

Kapil Vaswani

Microsoft Research


Wednesday February 27, 2019 10:00am - 10:30am EST
Grand Ballroom

10:10am EST

Break with Refreshments
Wednesday February 27, 2019 10:10am - 10:40am EST
Grand Ballroom Foyer

10:30am EST

Break with Refreshments
Wednesday February 27, 2019 10:30am - 11:00am EST
Grand Ballroom Foyer

10:40am EST

Slim: OS Kernel Support for a Low-Overhead Container Overlay Network
Containers have become the de facto method for hosting large-scale distributed applications. Container overlay networks are essential to providing portability for containers, yet they impose significant overhead in terms of throughput, latency, and CPU utilization. The key problem is a reliance on packet transformation to implement network virtualization. As a result, each packet has to traverse the network stack twice in both the sender and the receiver’s host OS kernel. We have designed and implemented Slim, a low-overhead container overlay network that implements network virtualization by manipulating connection-level metadata. Our solution maintains compatibility with today’s containerized applications. Evaluation results show that Slim improves the throughput of an in-memory key-value store by 66% while reducing the latency by 42%. Slim reduces the CPU utilization of the in-memory key-value store by 54%. Slim also reduces the CPU utilization of a web server by 28%-40%, a database server by 25%, and a stream processing framework by 11%.
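The contrast Slim draws, per-packet transformation versus one-time manipulation of connection-level metadata, can be modeled in a few lines. The address mapping and class names below are invented for illustration; Slim itself operates on real socket state in the kernel, not a Python table.

```python
# Made-up mapping from a container's virtual endpoint to a host endpoint.
VIRT_TO_HOST = {("10.0.0.2", 80): ("192.168.1.7", 30080)}

class PerPacketOverlay:
    """Conventional overlay in miniature: every packet is transformed,
    so each one traverses the network stack twice per host."""
    def __init__(self):
        self.transformations = 0
    def send(self, vaddr, payload):
        haddr = VIRT_TO_HOST[vaddr]        # rewritten on every single packet
        self.transformations += 1
        return (haddr, payload)

class ConnectionLevelOverlay:
    """Slim's approach in miniature: translate the connection metadata once
    at connect() time; subsequent packets already carry host addresses."""
    def __init__(self):
        self.transformations = 0
        self.haddr = None
    def connect(self, vaddr):
        self.haddr = VIRT_TO_HOST[vaddr]   # one translation per connection
        self.transformations += 1
    def send(self, payload):
        return (self.haddr, payload)

per_packet = PerPacketOverlay()
slim_like = ConnectionLevelOverlay()
slim_like.connect(("10.0.0.2", 80))
for _ in range(1000):
    per_packet.send(("10.0.0.2", 80), b"payload")
    slim_like.send(b"payload")
```

The point of the sketch: for a 1000-packet connection, the per-packet design pays the translation cost 1000 times, the connection-level design exactly once.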

Speakers
DZ

Danyang Zhuo

University of Washington
KZ

Kaiyuan Zhang

University of Washington
YZ

Yibo Zhu

Microsoft and ByteDance
MR

Matthew Rockett

University of Washington
AK

Arvind Krishnamurthy

University of Washington
TA

Thomas Anderson

University of Washington


Wednesday February 27, 2019 10:40am - 11:05am EST
Constitution Ballroom

11:00am EST

SLM-DB: Single-Level Key-Value Store with Persistent Memory
This paper investigates how to leverage emerging byte-addressable persistent memory (PM) to enhance the performance of key-value (KV) stores. We present a novel KV store, the Single-Level Merge DB (SLM-DB), which takes advantage of both the B+-tree index and the Log-Structured Merge Tree (LSM-tree) approach by making the best use of fast persistent memory. Our proposed SLM-DB achieves high read performance as well as high write performance with low write amplification and near-optimal read amplification. In SLM-DB, we exploit persistent memory to maintain a B+-tree index and adopt an LSM-tree approach to stage inserted KV pairs in a PM-resident memory buffer. SLM-DB has a single-level organization of KV pairs on disks and performs selective compaction for the KV pairs, collecting garbage and keeping the KV pairs sorted sufficiently for range query operations. Our extensive experimental study demonstrates that, in our default setup, compared to LevelDB, SLM-DB provides 1.07 - 1.96 and 1.56 - 2.22 times higher read and write throughput, respectively, as well as comparable range query performance.
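The single-level organization can be sketched with a sorted in-memory index standing in for the PM-resident B+-tree and an append-only list standing in for the single on-disk level. This is an illustrative model of the split, not the system's design: real SLM-DB persists the index in PM and garbage-collects the log via selective compaction.

```python
import bisect

class SingleLevelSketch:
    """Toy model of SLM-DB's split: a fast persistent index maps keys to
    offsets in a single-level, append-only store, so a point lookup never
    walks multiple LSM levels."""
    def __init__(self):
        self.index_keys = []   # stands in for the PM-resident B+-tree
        self.index_offs = []
        self.log = []          # stands in for the single on-disk level

    def put(self, key, value):
        off = len(self.log)
        self.log.append((key, value))
        i = bisect.bisect_left(self.index_keys, key)
        if i < len(self.index_keys) and self.index_keys[i] == key:
            self.index_offs[i] = off       # index now points at newest version
        else:
            self.index_keys.insert(i, key)
            self.index_offs.insert(i, off)

    def get(self, key):
        i = bisect.bisect_left(self.index_keys, key)
        if i < len(self.index_keys) and self.index_keys[i] == key:
            return self.log[self.index_offs[i]][1]
        return None

    def range(self, lo, hi):
        i = bisect.bisect_left(self.index_keys, lo)
        j = bisect.bisect_right(self.index_keys, hi)
        return [(k, self.log[o][1]) for k, o in
                zip(self.index_keys[i:j], self.index_offs[i:j])]

db = SingleLevelSketch()
for k in ["b", "a", "d", "c"]:
    db.put(k, k.upper())
db.put("a", "A2")  # update: old version lingers until selective compaction
```

Note that the sorted index is what keeps range queries cheap even though the log itself is unsorted.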

Speakers
BN

Beomseok Nam

Sungkyunkwan University
YC

Young-ri Choi

UNIST (Ulsan National Institute of Science and Technology)


Wednesday February 27, 2019 11:00am - 11:30am EST
Grand Ballroom

11:05am EST

Shinjuku: Preemptive Scheduling for μsecond-scale Tail Latency
The recently proposed dataplanes for microsecond scale applications, such as IX and ZygOS, use non-preemptive policies to schedule requests to cores. For the many real-world scenarios where request service times follow distributions with high dispersion or a heavy tail, they allow short requests to be blocked behind long requests, which leads to poor tail latency.

Shinjuku is a single-address space operating system that uses hardware support for virtualization to make preemption practical at the microsecond scale. This allows Shinjuku to implement centralized scheduling policies that preempt requests as often as every 5µsec and work well for both light and heavy tailed request service time distributions. We demonstrate that Shinjuku provides significant tail latency and throughput improvements over IX and ZygOS for a wide range of workload scenarios. For the case of a RocksDB server processing both point and range queries, Shinjuku achieves up to 6.6× higher throughput and 88% lower tail latency.
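A toy single-worker simulation shows why frequent preemption helps under heavy-tailed service times. The quantum and job sizes below are made up, and plain round-robin merely stands in for Shinjuku's centralized preemptive scheduling.

```python
def fcfs_latencies(jobs):
    """Non-preemptive run-to-completion in arrival order, in the spirit of
    the non-preemptive policies the paper compares against."""
    t, lat = 0, []
    for arrival, size in jobs:
        t = max(t, arrival) + size
        lat.append(t - arrival)
    return lat

def preemptive_latencies(jobs, quantum=5):
    """Round-robin with a small quantum on one worker, standing in for
    frequent preemption. Assumes all jobs arrive at t=0."""
    remaining = {i: size for i, (_, size) in enumerate(jobs)}
    t, done = 0, {}
    while remaining:
        for i in list(remaining):
            run = min(quantum, remaining[i])
            t += run
            remaining[i] -= run
            if remaining[i] == 0:
                done[i] = t - jobs[i][0]
                del remaining[i]
    return [done[i] for i in range(len(jobs))]

# one heavy request (think: a range query) ahead of nine light ones
jobs = [(0, 1000)] + [(0, 10)] * 9
fcfs = fcfs_latencies(jobs)
preemptive = preemptive_latencies(jobs)
```

In this toy run, FCFS leaves the short requests' worst latency at 1090 time units (they all wait out the heavy request), while the 5-unit quantum bounds it at 100, at the cost of delaying the heavy request.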

Speakers
KK

Kostis Kaffes

Stanford University
TC

Timothy Chong

Stanford University
JT

Jack Tigar Humphries

Stanford University
AB

Adam Belay

MIT CSAIL
DM

David Mazières

Stanford University
CK

Christos Kozyrakis

Stanford University


Wednesday February 27, 2019 11:05am - 11:30am EST
Constitution Ballroom

11:30am EST

Shenango: Achieving High CPU Efficiency for Latency-sensitive Datacenter Workloads
Datacenter applications demand microsecond-scale tail latencies and high request rates from operating systems, and most applications handle loads that have high variance over multiple timescales. Achieving these goals in a CPU-efficient way is an open problem. Because of the high overheads of today's kernels, the best available solution to achieve microsecond-scale latencies is kernel-bypass networking, which dedicates CPU cores to applications for spin-polling the network card. But this approach wastes CPU: even at modest average loads, one must dedicate enough cores for the peak expected load.

Shenango achieves comparable latencies but at far greater CPU efficiency. It reallocates cores across applications at very fine granularity (every 5 µs), enabling cycles unused by latency-sensitive applications to be used productively by batch processing applications. It achieves such fast reallocation rates with (1) an efficient algorithm that detects when applications would benefit from more cores, and (2) a privileged component called the IOKernel that runs on a dedicated core, steering packets from the NIC and orchestrating core reallocations. When handling latency-sensitive applications, such as memcached, we found that Shenango achieves tail latency and throughput comparable to ZygOS, a state-of-the-art, kernel-bypass network stack, but can linearly trade latency-sensitive application throughput for batch processing application throughput, vastly increasing CPU efficiency.

Speakers

Wednesday February 27, 2019 11:30am - 11:55am EST
Constitution Ballroom

11:30am EST

Ziggurat: A Tiered File System for Non-Volatile Main Memories and Disks
Emerging fast, byte-addressable Non-Volatile Main Memory (NVMM) provides huge increases in storage performance compared to traditional disks. We present Ziggurat, a tiered file system that combines NVMM and slow disks to create a storage system with near-NVMM performance and large capacity. Ziggurat steers incoming writes to NVMM, DRAM, or disk depending on application access patterns, write size, and the likelihood that the application will stall until the write completes. Ziggurat profiles the application's access stream online to predict the behavior of individual writes. In the background, Ziggurat estimates the "temperature" of file data, and migrates the cold file data from NVMM to disks. To fully utilize disk bandwidth, Ziggurat coalesces data blocks into large, sequential writes. Experimental results show that with a small amount of NVMM and a large SSD, Ziggurat achieves up to 38.9x and 46.5x throughput improvement compared with EXT4 and XFS running on an SSD alone, respectively. As the amount of NVMM grows, Ziggurat's performance improves until it matches the performance of an NVMM-only file system.
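The steering and migration logic described above can be sketched as two small policy functions. The threshold, timings, and file-metadata layout are assumptions for illustration, not Ziggurat's actual parameters or profiling machinery.

```python
def steer_write(size, synchronous, small_threshold=64 * 1024):
    """Placement sketch: writes the application will stall on go to
    byte-addressable NVMM; large asynchronous writes go straight to disk,
    whose sequential bandwidth handles them well; the rest stage in DRAM.
    The 64 KB threshold is made up."""
    if synchronous:
        return "nvmm"
    return "disk" if size >= small_threshold else "dram"

def migrate_cold(files, now, cold_after=60.0):
    """Background step: file data untouched for a while is 'cold' and moves
    from NVMM to disk (the real system coalesces these migrations into
    large sequential writes to use full disk bandwidth)."""
    moved = []
    for name, meta in files.items():
        if meta["tier"] == "nvmm" and now - meta["last_access"] > cold_after:
            meta["tier"] = "disk"
            moved.append(name)
    return moved

files = {
    "journal": {"tier": "nvmm", "last_access": 95.0},  # hot: stays in NVMM
    "archive": {"tier": "nvmm", "last_access": 10.0},  # cold: migrates
}
moved = migrate_cold(files, now=100.0)
```

The real system predicts synchronicity per write by profiling the application's access stream online; here it is simply passed in as a flag.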

Speakers
SZ

Shengan Zheng

Shanghai Jiao Tong University
MH

Morteza Hoseinzadeh

University of California, San Diego
SS

Steven Swanson

UC San Diego


Wednesday February 27, 2019 11:30am - 12:00pm EST
Grand Ballroom

11:55am EST

Lunch (on your own)
Wednesday February 27, 2019 11:55am - 1:30pm EST
N/A

12:00pm EST

Orion: A Distributed File System for Non-Volatile Main Memory and RDMA-Capable Networks
High-performance, byte-addressable non-volatile main memories (NVMMs) force system designers to rethink trade-offs throughout the system stack, often leading to dramatic changes in system architecture. Conventional distributed file systems are a prime example. When faster NVMM replaces block-based storage, the dramatic improvement in storage performance makes networking and software overhead a critical bottleneck.

In this paper, we present Orion, a distributed file system for NVMM-based storage. By taking a clean slate design and leveraging the characteristics of NVMM and high-speed, RDMA-based networking, Orion provides high-performance metadata and data access while maintaining the byte addressability of NVMM. Our evaluation shows Orion achieves performance comparable to local NVMM file systems and outperforms existing distributed file systems by a large margin.

Speakers
JY

Jian Yang

UC San Diego
SS

Steven Swanson

UC San Diego


Wednesday February 27, 2019 12:00pm - 12:30pm EST
Grand Ballroom

1:30pm EST

End-to-end I/O Monitoring on a Leading Supercomputer
This paper presents an effort to overcome the complexities of production-use I/O performance monitoring. We design Beacon, an end-to-end I/O resource monitoring and diagnosis system, for the 40960-node Sunway TaihuLight supercomputer, currently ranked world No. 3. It simultaneously collects and correlates I/O tracing/profiling data from all the compute nodes, forwarding nodes, storage nodes, and metadata servers. With mechanisms such as aggressive online+offline trace compression and distributed caching/storage, it delivers scalable, low-overhead, and sustainable I/O diagnosis under production use. Higher-level per-application I/O performance behaviors are reconstructed from system-level monitoring data to reveal correlations between system performance bottlenecks, utilization symptoms, and application behaviors. Beacon further provides query, statistics, and visualization utilities to users and administrators, allowing comprehensive and in-depth analysis without requiring any code/script modification.

With its deployment on TaihuLight for around 18 months, we demonstrate Beacon's effectiveness with a collection of real-world use cases for I/O performance issue identification and diagnosis. It has successfully helped center administrators identify obscure design or configuration flaws, system anomaly occurrences, I/O performance interference, and resource under- or over-provisioning problems. Several of the exposed problems have already been fixed, with the others currently being addressed. In addition, we demonstrate Beacon's generality through its recent extension to monitor interconnection networks, another contention point on supercomputers. Finally, both the code and the collected data are to be released.

Speakers
BY

Bin Yang

Shandong University, National Supercomputing Center in Wuxi
XJ

Xu Ji

Tsinghua University, National Supercomputing Center in Wuxi
XM

Xiaosong Ma

Qatar Computing Research institute, HBKU
XW

Xiyang Wang

National Supercomputing Center in Wuxi
TZ

Tianyu Zhang

Shandong University, National Supercomputing Center in Wuxi
XZ

Xiupeng Zhu

Shandong University, National Supercomputing Center in Wuxi
NE

Nosayba El-Sayed

Emory University
HL

Haidong Lan

Shandong University
YY

Yibo Yang

Shandong University
JZ

Jidong Zhai

Tsinghua University
WL

Weiguo Liu

Shandong University, National Supercomputing Center in Wuxi
WX

Wei Xue

Tsinghua University, National Supercomputing Center in Wuxi


Wednesday February 27, 2019 1:30pm - 1:55pm EST
Constitution Ballroom

1:55pm EST

Zeno: Diagnosing Performance Problems with Temporal Provenance
When diagnosing a problem in a distributed system, it is sometimes necessary to explain the timing of an event—for instance, why a response has been delayed, or why the network latency is high. Existing tools offer some support for this, typically by tracing the problem to a bottleneck or to an overloaded server. However, locating the bottleneck is merely the first step: the real problem may be some other service that is sending traffic over the bottleneck link, or a misbehaving machine that is overloading the server with requests. These off-path causes do not appear in a conventional trace and will thus be missed by most existing diagnostic tools.

In this paper, we introduce a new concept we call temporal provenance that can help with diagnosing timing-related problems. Temporal provenance is inspired by earlier work on provenance-based network debugging; however, in addition to the functional problems that can already be handled with classical provenance, it can also diagnose problems that are related to timing. We present an algorithm for generating temporal provenance and an experimental debugger called Zeno; our experimental evaluation shows that Zeno can successfully diagnose several realistic performance bugs.

Speakers
YW

Yang Wu

Facebook
AC

Ang Chen

Rice University
LT

Linh Thi Xuan Phan

University of Pennsylvania


Wednesday February 27, 2019 1:55pm - 2:20pm EST
Constitution Ballroom

2:00pm EST

INSTalytics: Cluster Filesystem Co-design for Big-data Analytics
We present the design, implementation, and evaluation of Instalytics, a co-designed stack of a cluster file system and the compute layer, for efficient big data analytics in large-scale data centers. Instalytics amplifies the well-known benefits of data partitioning in analytics systems; instead of traditional partitioning on one dimension, Instalytics enables data to be simultaneously partitioned on four different dimensions at the same storage cost, enabling a larger fraction of queries to benefit from partition filtering and joins without network shuffle.

To achieve this, Instalytics uses compute-awareness to customize the 3-way replication that the cluster file system employs for availability. A new heterogeneous replication layout enables Instalytics to preserve the same recovery cost and availability as traditional replication. Instalytics also uses compute-awareness to expose a new sliced-read API that improves performance of joins by enabling multiple compute nodes to read slices of a data block efficiently via coordinated request scheduling and selective caching at the storage nodes.

We have implemented Instalytics in a production analytics stack, and show that recovery performance and availability is similar to physical replication, while providing significant improvements in query performance, suggesting a new approach to designing cloud-scale big-data analytics systems.
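The benefit of partitioning the same data on several dimensions can be sketched as follows. Note the hedge in the code: this toy simply pays one full replica per dimension, whereas the paper's heterogeneous layout extracts four partitioning dimensions from the cost of ordinary 3-way replication; column names and data are made up.

```python
from collections import defaultdict

def build_replicas(rows, dims):
    """Keep one full copy of the data per dimension, each copy partitioned
    by a different column (a deliberately simplified stand-in for the
    paper's heterogeneous replication layout)."""
    replicas = {}
    for dim in dims:
        parts = defaultdict(list)
        for row in rows:
            parts[row[dim]].append(row)
        replicas[dim] = parts
    return replicas

def filtered_scan(replicas, dim, value):
    """Route a filter to the replica partitioned on its column, if any;
    otherwise fall back to scanning a full copy."""
    if dim in replicas:
        return replicas[dim].get(value, []), "partition pruning"
    any_replica = next(iter(replicas.values()))
    rows = [r for part in any_replica.values() for r in part]
    return [r for r in rows if r[dim] == value], "full scan"

rows = [{"user": u, "region": reg, "day": d, "hour": d % 2}
        for u in range(4) for reg in ("us", "eu") for d in range(3)]
replicas = build_replicas(rows, ["user", "region", "day"])
```

A query filtering on `user`, `region`, or `day` reads only the matching partition; a filter on the unpartitioned `hour` column still has to touch every row, which is exactly the gap that having more partitioning dimensions closes.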

Speakers
MS

Muthian Sivathanu

Microsoft Research India
MV

Midhul Vuppalapati

Microsoft Research India
BG

Bhargav Gulavani

Microsoft Research India
KR

Kaushik Rajan

Microsoft Research India
JL

Jyoti Leeka

Microsoft Research India
JM

Jayashree Mohan

Univ. of Texas Austin
PK

Piyus Kedia

IIIT Delhi


Wednesday February 27, 2019 2:00pm - 2:30pm EST
Grand Ballroom

2:20pm EST

Confluo: Distributed Monitoring and Diagnosis Stack for High-speed Networks
Confluo is an end-host stack that can be integrated with existing network management tools to enable monitoring and diagnosis of network-wide events using telemetry data distributed across end-hosts, even for high-speed networks. Confluo achieves these properties using a new data structure—Atomic MultiLog—that supports highly-concurrent read-write operations by exploiting two properties specific to telemetry data: (1) once processed by the stack, the data is neither updated nor deleted; and (2) each field in the data has a fixed pre-defined size. Our evaluation results show that, for packet sizes 128B or larger, Confluo executes thousands of triggers and tens of filters at line rate (for 10Gbps links) using a single core.
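The two telemetry-data properties the Atomic MultiLog exploits translate naturally into an append-only buffer of fixed-size records coordinated by an atomic tail. The following is a greatly simplified, single-process sketch of that idea, not the actual data structure, which publishes concurrent writes far more carefully.

```python
import itertools
import struct
import threading

class FixedSizeLog:
    """Toy model of the two enabling properties: records are never updated
    or deleted, and every field has a fixed size. Writers reserve slots with
    an atomic counter; readers only trust slots below the published tail."""
    RECORD = struct.Struct("<QQ")           # (timestamp, value): fixed 16 bytes

    def __init__(self, capacity):
        self.buf = bytearray(capacity * self.RECORD.size)
        self._next_slot = itertools.count()  # stands in for atomic fetch-and-add
        self._lock = threading.Lock()
        self.tail = 0

    def append(self, ts, value):
        slot = next(self._next_slot)         # reservation: writers never contend
        self.RECORD.pack_into(self.buf, slot * self.RECORD.size, ts, value)
        with self._lock:                     # publish the completed write
            self.tail = max(self.tail, slot + 1)
        return slot

    def read(self, slot):
        if slot >= self.tail:
            raise IndexError("record not yet published")
        return self.RECORD.unpack_from(self.buf, slot * self.RECORD.size)

log = FixedSizeLog(capacity=1024)
first = log.append(ts=1, value=42)
second = log.append(ts=2, value=99)
```

Because record offsets are a pure function of the slot number (fixed sizes) and data is immutable once written, readers need no locks at all on the data path, only the tail check.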

Speakers
RA

Rachit Agarwal

Cornell University
IS

Ion Stoica

UC Berkeley


Wednesday February 27, 2019 2:20pm - 2:45pm EST
Constitution Ballroom

2:30pm EST

GraphOne: A Data Store for Real-time Analytics on Evolving Graphs
There is a growing need to perform real-time analytics on evolving graphs in order to deliver the value of big data to users. The key requirement from such applications is to have a data store to support their diverse data access efficiently, while concurrently ingesting fine-grained updates at a high velocity. Unfortunately, current graph systems, either graph databases or analytics engines, are not designed to achieve high performance for both operations. To address this challenge, we have designed and developed GraphOne, a graph data store that combines two complementary graph storage formats (edge list and adjacency list), and uses dual versioning to decouple graph computations from updates. Importantly, it presents a new data abstraction, GraphView, to enable data access at two different granularities with only a small data duplication. Experimental results show that GraphOne achieves an ingestion rate of two to three orders of magnitude higher than graph databases, while delivering algorithmic performance comparable to a static graph system. GraphOne is able to deliver a 5.36x higher update rate and over 3x better analytics performance compared to a state-of-the-art dynamic graph system.
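The complementary-format idea can be sketched in a few lines: fine-grained updates land in an append-only edge list, a background step periodically archives them into an adjacency list, and reads merge the two views. The archive threshold and structure below are illustrative, not GraphOne's (which also versions the archived view for concurrent analytics).

```python
from collections import defaultdict

class HybridGraphStore:
    """Two-format sketch: a fast append-only edge list absorbs updates,
    while a compacted adjacency list serves analytics-style scans."""
    def __init__(self, archive_threshold=4):
        self.edge_log = []                    # recent, unarchived updates
        self.adjacency = defaultdict(list)    # compacted historical edges
        self.archive_threshold = archive_threshold

    def add_edge(self, u, v):
        self.edge_log.append((u, v))          # ingestion is a cheap append
        if len(self.edge_log) >= self.archive_threshold:
            self.archive()

    def archive(self):
        """Fold logged edges into the adjacency list, then reset the log."""
        for u, v in self.edge_log:
            self.adjacency[u].append(v)
        self.edge_log.clear()

    def neighbors(self, u):
        # a read merges the compacted view with any not-yet-archived edges
        return self.adjacency[u] + [v for (s, v) in self.edge_log if s == u]

g = HybridGraphStore()
for edge in [(1, 2), (1, 3), (2, 3), (3, 1), (1, 4)]:
    g.add_edge(*edge)
```

After the fourth edge triggers an archive, the fifth sits only in the edge log, yet `neighbors(1)` still returns all three of vertex 1's neighbors by merging both views.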

Speakers
PK

Pradeep Kumar

George Washington University
HH

H. Howie Huang

George Washington University


Wednesday February 27, 2019 2:30pm - 3:00pm EST
Grand Ballroom

2:45pm EST

DETER: Deterministic TCP Replay For Performance Diagnosis
TCP performance problems are notoriously tricky to diagnose because a subtle choice of TCP parameters or features may lead to completely different performance. A gold standard for diagnosis is to collect packet traces and trace TCP executions. However, it is not easy to use such tools in large-scale data centers where many TCP connections interact with each other. In this paper, we introduce DETER, a deterministic TCP replay tool, which runs lightweight recording all the time at all the hosts and then replays selected collections where operators can collect packet traces and trace TCP executions for diagnosis. The key challenge for deterministic TCP replay is the butterfly effect---a small timing variation causes a chain reaction between TCP and the network that drives the system to a completely different state in the replay. To eliminate the butterfly effect, we propose to replay each TCP connection separately and capture all the interactions between a connection and the applications and the network. Our evaluation shows that DETER has low recording overhead and can help diagnose many TCP performance problems, from long latency related to zero-window probes, late fast retransmissions, and frequent retransmission timeouts, to problems related to the switch's shared buffer.

Speakers
YL

Yuliang Li

Harvard University
RM

Rui Miao

Alibaba Group
MA

Mohammad Alizadeh

Massachusetts Institute of Technology
MY

Minlan Yu

Harvard University


Wednesday February 27, 2019 2:45pm - 3:10pm EST
Constitution Ballroom

3:00pm EST

Automatic, Application-Aware I/O Forwarding Resource Allocation
The I/O forwarding architecture is widely adopted on modern supercomputers, with a layer of intermediate nodes sitting between the many compute nodes and backend storage nodes. This allows compute nodes to run more efficiently and stably with a leaner OS, offloads I/O coordination and communication with the backend from the compute nodes, maintains fewer concurrent connections to storage systems, and provides additional resources for effective caching, prefetching, write buffering, and I/O aggregation. However, on many existing machines, these forwarding nodes are assigned to serve a fixed set of compute nodes.

We explore an automatic mechanism, DFRA, for application-adaptive dynamic forwarding resource allocation, using I/O monitoring data that is affordable to acquire in real time and to maintain for long-term history analysis. Upon each job's dispatch, DFRA conducts a history-based study to determine whether the job should be granted more forwarding resources or given dedicated forwarding nodes. Such customized I/O forwarding lets the small fraction of I/O-intensive applications achieve higher I/O performance and scalability, while effectively isolating disruptive I/O activities. We implemented, evaluated, and deployed DFRA on Sunway TaihuLight, the current No. 2 supercomputer in the world. It improves applications' I/O performance by up to 16.0x, eliminates most inter-application I/O interference, and has saved over 200 million core-hours during its 8-month deployment on TaihuLight. Finally, our proposed DFRA design is not platform-dependent, making it applicable to the management of existing and future I/O forwarding or burst buffer resources.

Speakers
XJ

Xu Ji

Tsinghua University, National Supercomputing Center in Wuxi
BY

Bin Yang

Shandong University, National Supercomputing Center in Wuxi
TZ

Tianyu Zhang

Shandong University, National Supercomputing Center in Wuxi
XM

Xiaosong Ma

Qatar Computing Research institute, HBKU
XZ

Xiupeng Zhu

Shandong University, National Supercomputing Center in Wuxi
XW

Xiyang Wang

National Supercomputing Center in Wuxi
NE

Nosayba El-Sayed

Emory University
JZ

Jidong Zhai

Tsinghua University
WL

Weiguo Liu

Shandong University, National Supercomputing Center in Wuxi
WX

Wei Xue

Tsinghua University, National Supercomputing Center in Wuxi


Wednesday February 27, 2019 3:00pm - 3:30pm EST
Grand Ballroom

3:10pm EST

Break with Refreshments
Wednesday February 27, 2019 3:10pm - 3:40pm EST
Grand Ballroom Foyer

3:30pm EST

Break with Refreshments
Wednesday February 27, 2019 3:30pm - 4:00pm EST
Grand Ballroom Foyer

3:40pm EST

JANUS: Fast and Flexible Deep Learning via Symbolic Graph Execution of Imperative Programs
The rapid evolution of deep neural networks is demanding deep learning (DL) frameworks not only to satisfy the requirement of quickly executing large computations, but also to support straightforward programming models for quickly implementing and experimenting with complex network structures. However, existing frameworks fail to excel in both departments simultaneously, leading to diverged efforts for optimizing performance and improving usability.

This paper presents JANUS, a system that combines the advantages from both sides by transparently converting an imperative DL program written in Python, the de-facto scripting language for DL, into an efficiently executable symbolic dataflow graph. JANUS can convert various dynamic features of Python, including dynamic control flow, dynamic types, and impure functions, into the symbolic graph operations. Experiments demonstrate that JANUS can achieve fast DL training by exploiting the techniques imposed by symbolic graph-based DL frameworks, while maintaining the simple and flexible programmability of imperative DL frameworks at the same time.

Speakers
EJ

Eunji Jeong

Seoul National University
SC

Sungwoo Cho

Seoul National University
GY

Gyeong-In Yu

Seoul National University
JS

Joo Seong Jeong

Seoul National University
DS

Dong-Jin Shin

Seoul National University
BC

Byung-Gon Chun

Seoul National University


Wednesday February 27, 2019 3:40pm - 4:05pm EST
Constitution Ballroom

4:05pm EST

BLAS-on-flash: An Efficient Alternative for Large Scale ML Training and Inference?
Many large scale machine learning training and inference tasks are memory-bound rather than compute-bound. That is, on large data sets, the working set of these algorithms does not fit in memory for jobs that could run overnight on a few multi-core processors. This often forces an expensive redesign of the algorithm to distributed platforms such as parameter servers and Spark.

We propose an inexpensive and efficient alternative based on the observation that many ML tasks admit algorithms that can be programmed with linear algebra subroutines. A library that supports BLAS and sparseBLAS interface on large SSD-resident matrices can enable multi-threaded code to scale to industrial scale data sets on a single workstation.

We demonstrate that not only can such a library provide near in-memory performance for BLAS, but can also be used to write implementations of complex algorithms such as eigensolvers that outperform in-memory (ARPACK) and distributed (Spark) counterparts.

Existing multi-threaded in-memory code can link to our library with minor changes and scale to hundreds of Gigabytes of training or inference data at near in-memory processing speeds. We demonstrate this with two industrial scale use cases arising in ranking and relevance pipelines: training large scale topic models and inference for extreme multi-label learning.

This suggests that our approach could be an efficient alternative to expensive big-data compute systems for scaling up structurally complex machine learning tasks.
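The out-of-core principle behind such a library can be sketched with a tiled matrix multiply that fetches operand tiles on demand. The callable-based tile interface below is an invention for illustration, not the library's API; in the paper's setting the tile reads would come from SSD-resident matrices rather than in-memory lists.

```python
def tile_of(matrix, r0, c0, tile):
    """Fetch one tile of a matrix stored as a list of lists; an SSD-backed
    implementation would issue a block read here instead."""
    return [row[c0:c0 + tile] for row in matrix[r0:r0 + tile]]

def blocked_matmul(get_a, get_b, n, m, p, tile):
    """Tiled GEMM (C = A @ B, A is n x m, B is m x p) where operand tiles
    arrive through callables, so only O(tile^2) elements of each operand
    are resident at any moment."""
    c = [[0.0] * p for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, p, tile):
            for k0 in range(0, m, tile):
                a = get_a(i0, k0)          # tile of A at (i0, k0)
                b = get_b(k0, j0)          # tile of B at (k0, j0)
                for i in range(len(a)):
                    for j in range(min(tile, p - j0)):
                        acc = 0.0
                        for k in range(len(b)):
                            acc += a[i][k] * b[k][j]
                        c[i0 + i][j0 + j] += acc
    return c

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
C = blocked_matmul(lambda r, c0: tile_of(A, r, c0, 1),
                   lambda r, c0: tile_of(B, r, c0, 1),
                   n=2, m=2, p=2, tile=1)
```

Swapping the tile callables for reads against memory-mapped files keeps the resident working set at a few tiles regardless of matrix size, which is the essence of running BLAS over SSD-resident data.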

Speakers
SJ

Suhas Jayaram Subramanya

Microsoft Research India
HV

Harsha Vardhan Simhadri

Microsoft Research India
SG

Srajan Garg

IIT Bombay
AK

Anil Kag

Microsoft Research India
VB

Venkatesh Balasubramanian

Microsoft Research India


Wednesday February 27, 2019 4:05pm - 4:30pm EST
Constitution Ballroom

4:30pm EST

Tiresias: A GPU Cluster Manager for Distributed Deep Learning
Distributed training of deep learning (DL) models on GPU clusters is becoming increasingly popular. Existing cluster managers face some unique challenges from DL training jobs, such as unpredictable training times, an all-or-nothing execution model, and inflexibility in GPU sharing. Our analysis of a large GPU cluster in production shows that existing big data schedulers – coupled with a consolidated job placement constraint, whereby GPUs for the same job must be allocated in as few machines as possible – cause long queueing delays and low overall performance.

We present Tiresias, a GPU cluster resource manager tailored for distributed DL training jobs, which efficiently schedules and places DL jobs to reduce their job completion times (JCT). Given that a DL job’s execution time is often unpredictable, we propose two scheduling algorithms – Discretized Two-Dimensional Gittins Index, which relies on partial information, and Discretized Two-Dimensional LAS, which is information-agnostic – that aim to minimize the average JCT. Additionally, we describe when the consolidated placement constraint can be relaxed and present a placement algorithm to leverage these observations without any user input. Experiments on a cluster with 60 P100 GPUs – and large-scale trace-driven simulations – show that Tiresias improves the average JCT by up to 5.5× over an Apache YARN-based resource manager used in production. More importantly, Tiresias’s performance is comparable to that of solutions assuming perfect knowledge.
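The information-agnostic scheduler can be caricatured with a toy sketch of the least-attained-service idea (this is not the paper's implementation; the threshold values are illustrative): a job's priority decays as its two-dimensional attained service (GPUs × time served) grows, discretized into a small number of priority queues so that short jobs finish ahead of long-running ones.

```python
def priority_queue_index(num_gpus, seconds_served, thresholds=(3600, 36000)):
    """Map a job to a priority queue: 0 is highest priority,
    len(thresholds) is lowest. Thresholds are in GPU-seconds."""
    attained = num_gpus * seconds_served  # two-dimensional attained service
    for i, t in enumerate(thresholds):
        if attained < t:
            return i
    return len(thresholds)
```

A scheduler built on this index would serve queue 0 first; a job that has consumed little GPU time so far (whatever its eventual length) stays in a high-priority queue, which is why no advance knowledge of job duration is needed.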

Speakers
JG

Juncheng Gu

University of Michigan, Ann Arbor
MC

Mosharaf Chowdhury

University of Michigan, Ann Arbor
KG

Kang G. Shin

University of Michigan, Ann Arbor
YZ

Yibo Zhu

Microsoft and ByteDance
MJ

Myeongjae Jeon

Microsoft and UNIST
JQ

Junjie Qian

Microsoft


Wednesday February 27, 2019 4:30pm - 4:55pm EST
Constitution Ballroom

4:55pm EST

Short Break
Wednesday February 27, 2019 4:55pm - 5:10pm EST
Grand Ballroom Foyer

5:10pm EST

Correctness and Performance for Stateful Chained Network Functions
Network functions virtualization (NFV) allows operators to employ NF chains to realize custom policies, and dynamically add instances to meet demand or for failover. NFs maintain detailed per- and cross-flow state that needs careful management, especially during dynamic actions. Crucially, state management must: (1) ensure NF chain-wide correctness and (2) have good performance. To this end, we built CHC, an NFV framework that leverages an external state store coupled with state management algorithms and metadata maintenance for correct operation even under a range of failures. Our evaluation shows that CHC can support ~10 Gbps per-NF throughput with less than a 0.6 μs increase in median per-NF packet processing latency, while ensuring chain-wide correctness at little additional cost.

Speakers
JK

Junaid Khalid

University of Wisconsin - Madison
AA

Aditya Akella

UW-Madison


Wednesday February 27, 2019 5:10pm - 5:35pm EST
Constitution Ballroom

5:35pm EST

Performance contracts for software network functions
While software network functions (NFs) promise great flexibility and easy deployment of network services, they face the challenge of unpredictable performance. We propose Bolt, a technique and tool for predicting the performance of the entire software stack of an NF comprising the core NF logic, DPDK packet processing framework, and the NIC driver. Bolt takes as input the NF implementation and generates a performance contract that provides, for any arbitrary packet scenario, a precise characterization of the NF's performance. Under the covers, Bolt leverages a state-based demarcation of NFs and combines a pre-analysis of stateful data structures with automated symbolic execution of the stateless NF code. Performance contracts allow scrutiny of NF performance with a fine level of granularity, enabling network developers and operators to understand the performance of the NF in the face of any workload, whether typical, exceptional, or adversarial. We evaluate Bolt on four realistic NFs – a NAT, a Maglev-like load balancer, an LPM Router, and a MAC bridge – and show that Bolt's performance contracts predict the dynamic instruction count and memory accesses of the NF to within a maximum of 7% of real executions, for all NFs and traffic classes analyzed.


Wednesday February 27, 2019 5:35pm - 6:00pm EST
Constitution Ballroom

6:00pm EST

FlowBlaze: Stateful Packet Processing in Hardware
Programmable NICs allow for better scalability to handle growing network workloads; however, providing an expressive, yet simple, abstraction to program stateful network functions in hardware remains a research challenge. We address the problem with FlowBlaze, an open abstraction for building stateful packet processing functions in hardware. The abstraction is based on Extended Finite State Machines and introduces the explicit definition of flow state, allowing FlowBlaze to leverage flow-level parallelism. FlowBlaze is expressive, supporting a wide range of complex network functions, and easy to use, hiding low-level hardware implementation issues from the programmer. Our implementation of FlowBlaze on a NetFPGA SmartNIC achieves very low latency (on the order of a few microseconds), consumes relatively little power, can hold per-flow state for hundreds of thousands of flows, and yields speeds of 40 Gb/s, allowing for even higher speeds on newer FPGA models. Both hardware and software implementations of FlowBlaze are publicly available.

Speakers
RB

Roberto Bifulco

NEC Laboratories Europe
MB

Marco Bonola

Axbryd/CNIT
CC

Carmelo Cascone

Open Networking Foundation
MS

Marco Spaziani

CNIT/University of Rome Tor Vergata
VB

Valerio Bruschi

CNIT/University of Rome Tor Vergata
DS

Davide Sanvito

Politecnico di Milano
GS

Giuseppe Siracusano

NEC Laboratories Europe
AC

Antonio Capone

Politecnico di Milano
MH

Michio Honda

NEC Laboratories Europe
FH

Felipe Huici

NEC Laboratories Europe
GB

Giuseppe Bianchi

CNIT/University of Rome Tor Vergata


Wednesday February 27, 2019 6:00pm - 6:25pm EST
Constitution Ballroom

6:30pm EST

Poster Session and Reception
Check out the cool new ideas and the latest preliminary research on display at the Poster Session and Reception. Enjoy dinner, drinks, and the chance to connect with other attendees, speakers, and conference organizers. View the complete list of accepted posters.

Wednesday February 27, 2019 6:30pm - 8:00pm EST
Back Bay Ballroom
 
Thursday, February 28
 

7:30am EST

Continental Breakfast
Thursday February 28, 2019 7:30am - 8:30am EST
Grand Ballroom Foyer

8:00am EST

Continental Breakfast
Thursday February 28, 2019 8:00am - 9:00am EST
Grand Ballroom Foyer

8:30am EST

SIMON: A Simple and Scalable Method for Sensing, Inference and Measurement in Data Center Networks
Network measurement and monitoring have been key to understanding the inner workings of computer networks and debugging the performance problems of distributed applications. Despite many products and much research on these topics, in the context of data centers, performing accurate measurement at scale in near real-time has remained elusive. On one hand, switch-based telemetry can give accurate per-packet views, but these must be assembled across the network and across packets to get network- and application-level insight: this is not scalable. On the other hand, purely end-host-based measurement is naturally scalable but so far has only provided partial views of in-network operation.

In this paper, we set out to push the boundary of edge-based measurement by scalably and accurately reconstructing the full queueing dynamics in the network with data gathered entirely at the transmit and receive network interface cards (NICs). We begin with a signal processing framework for quantifying a key trade-off: reconstruction accuracy versus the amount of data gathered. Based on this, we propose SIMON, an accurate and scalable measurement system for data centers that reconstructs key network state variables like packet queuing times at switches, link utilizations, and queue and link compositions at the flow level. We then demonstrate that the function approximation capability of multi-layered neural networks can speed up SIMON by a factor of 5,000–10,000, enabling it to run in near real-time. We deployed SIMON in three testbeds with different link speeds, layers of switching, and numbers of servers; evaluations with NetFPGAs and a cross-validation technique show that SIMON reconstructs queue lengths to within 3–5 KB and link utilizations to within 1% of actual values. The accuracy and speed of SIMON enable sensitive A/B tests, which greatly aids the real-time development of algorithms, protocols, network software, and applications.

Speakers
YG

Yilong Geng

Stanford University
SL

Shiyu Liu

Stanford University
ZY

Zi Yin

Stanford University
AN

Ashish Naik

Google Inc.
BP

Balaji Prabhakar

Stanford University
MR

Mendel Rosenblum

Stanford University
AV

Amin Vahdat

Google Inc.


Thursday February 28, 2019 8:30am - 8:55am EST
Constitution Ballroom

8:55am EST

Is advance knowledge of flow sizes a plausible assumption?
Recent research has proposed several packet, flow, and coflow scheduling methods that could substantially improve performance for data center workloads. Most of this work assumes advance knowledge of flow sizes, but the lack of a clear path to obtaining such knowledge has also prompted some work on non-clairvoyant scheduling, albeit with more limited performance benefits and narrower applicability.

We thus investigate whether flow sizes can be known in advance in practice, using both simple heuristics and learning methods. Our systematic and substantial efforts across these approaches for estimating flow sizes indicate, unfortunately, that such knowledge is likely hard to obtain with high confidence across many settings of practical interest. However, our prognosis is ultimately more positive: even simple heuristics can help estimate flow sizes for many flows, and this partial knowledge has utility even in schedulers designed for fully clairvoyant operation. These results indicate that a presumed lack of advance knowledge of flow sizes is not necessarily prohibitive for highly efficient scheduling, and suggest further exploration in two directions: (a) scheduling under partial knowledge; and (b) evaluating the practical payoff and expense of obtaining more knowledge.

Speakers
SA

Sangeetha Abdu Jyothi

University of Illinois at Urbana–Champaign
BK

Bojan Karlaš

ETH Zurich
MO

Muhsen Owaida

ETH Zurich
CZ

Ce Zhang

ETH Zurich
AS

Ankit Singla

ETH Zurich


Thursday February 28, 2019 8:55am - 9:20am EST
Constitution Ballroom

9:00am EST

Design Tradeoffs for SSD Reliability
Flash memory-based SSDs are popular across a wide range of data storage markets, while the underlying storage medium—flash memory—is becoming increasingly unreliable. As a result, modern SSDs employ a number of in-device reliability enhancement techniques, but none of them offers a one-size-fits-all solution when considering the multi-dimensional requirements for SSDs: performance, reliability, and lifetime. In this paper, we examine the design tradeoffs of existing reliability enhancement techniques such as data re-read, intra-SSD redundancy, and data scrubbing. We observe that an uncoordinated use of these techniques adversely affects the performance of the SSD, and careful management of the techniques is necessary for a graceful performance degradation while maintaining a high reliability standard. To that end, we propose a holistic reliability management scheme that selectively employs redundancy, conditionally re-reads, and judiciously selects data to scrub. We demonstrate the effectiveness of our scheme by evaluating it across a set of I/O workloads and SSD wear states.

Speakers
BS

Bryan S. Kim

Seoul National University
JC

Jongmoo Choi

Dankook University
SL

Sang Lyul Min

Seoul National University


Thursday February 28, 2019 9:00am - 9:30am EST
Grand Ballroom

9:20am EST

Stable and Practical AS Relationship Inference with ProbLink
Knowledge of the business relationships between Autonomous Systems (ASes) is essential to understanding the behavior of the Internet routing system. Despite significant progress in the development of sophisticated relationship inference algorithms, the resulting datasets are impractical for many critical real-world applications, cannot offer adequate predictability in the configuration of routing policies, and suffer from inference oscillations. To achieve more practical and stable relationship inferences, we first illuminate the root causes of the contradictions between these shortcomings and the near-perfect validation results of AS-Rank, the state-of-the-art relationship inference algorithm. Using a "naive" inference approach as a benchmark, we find that the available validation datasets over-represent AS links with easier inference requirements. We identify which types of links are harder to infer, and we develop appropriate validation subsets to enable more representative evaluation.

We then develop a probabilistic algorithm, ProbLink, to overcome the inference barriers for hard links, such as non-valley-free routing, limited visibility, and non-conventional peering practices. To this end, we identify key interconnection features that provide stochastically informative and highly predictive relationship inference signals. Compared to AS-Rank, our approach reduces the error rate for all links by 1.6×, and importantly, by up to 6.1× for different types of hard links. We demonstrate the practical significance of our improvements by evaluating their impact on three applications. Compared to the current state-of-the-art, ProbLink increases the precision and recall of route leak detection by 4.1× and 3.4× respectively, reveals 27% more complex relationships, and increases the precision of predicting the impact of selective advertisements by 34%.

Speakers
YJ

Yuchen Jin

University of Washington
CS

Colin Scott

UC Berkeley
VG

Vasileios Giotsas

Lancaster University
AK

Arvind Krishnamurthy

University of Washington
SS

Scott Shenker

UC Berkeley, ICSI


Thursday February 28, 2019 9:20am - 9:45am EST
Constitution Ballroom

9:30am EST

Fully Automatic Stream Management for Multi-Streamed SSDs Using Program Contexts
Multi-streamed SSDs can significantly improve both the performance and lifetime of flash-based SSDs when their streams are properly managed. However, existing stream management solutions do not adequately support multi-streamed SSDs, hindering their wide adoption. No existing stream management technique works in a fully automatic fashion for general I/O workloads. Furthermore, the limited number of available streams makes it difficult to effectively manage streams when a large number of streams are required. In this paper, we propose a fully automatic stream management technique, PCStream, which can work efficiently for general I/O workloads with heterogeneous write characteristics. PCStream is based on the key insight that stream allocation decisions should be made on dominant I/O activities. By identifying dominant I/O activities using program contexts, PCStream fully automates the whole process of stream allocation within the kernel with no manual work. In order to overcome the limited number of supported streams, we propose a new type of streams, internal streams, which can be implemented at low cost. PCStream can effectively double the number of available streams using internal streams. Our evaluations on real multi-streamed SSDs show that PCStream achieves the same efficiency as highly-optimized manual allocations by experienced programmers. PCStream improves IOPS by up to 56% over the existing automatic technique by reducing the garbage collection overhead by up to 69%.

Speakers
TK

Taejin Kim

Seoul National University
DH

Duwon Hong

Seoul National University
SS

Sangwook Shane Hahn

Western Digital
MC

Myoungjun Chun

Seoul National University
JH

Jooyoung Hwang

Samsung Electronics
JL

Jongyoul Lee

Samsung Electronics
JK

Jihong Kim

Seoul National University


Thursday February 28, 2019 9:30am - 10:00am EST
Grand Ballroom

9:45am EST

NetBouncer: Active Device and Link Failure Localization in Data Center Networks
The availability of data center services is jeopardized by various network incidents. One of the biggest challenges for network incident handling is to accurately localize the failures, among millions of servers and tens of thousands of network devices. In this paper, we propose NetBouncer, a failure localization system that leverages the IP-in-IP technique to actively probe paths in a data center network. NetBouncer provides a complete failure localization framework which is capable of detecting both device and link failures. It further introduces an algorithm for high-accuracy link failure inference that is resilient to real-world data inconsistency by integrating both our troubleshooting domain knowledge and machine learning techniques. NetBouncer has been deployed in Microsoft Azure’s data centers for three years; in practice, it has produced no false positives and only a few false negatives so far.

Speakers
ZJ

Ze Jin

Cornell University
KD

Karl Deng

Microsoft
DB

Dongming Bi

Microsoft
DX

Dong Xiang

Microsoft


Thursday February 28, 2019 9:45am - 10:10am EST
Constitution Ballroom

10:00am EST

Large-Scale Graph Processing on Emerging Storage Devices
Graph processing is becoming commonplace in many applications to analyze huge datasets. Much of the prior work in this area has assumed I/O devices with considerable latencies, especially for random accesses, using large amounts of DRAM to trade off additional computation for I/O accesses. However, emerging storage devices, including currently popular SSDs, provide fairly comparable sequential and random access performance, making these prior solutions inefficient. In this paper, we point out this inefficiency, and propose a new graph partitioning and processing framework to leverage these new device capabilities. We show experimentally on an actual platform that our proposal can give 2× better performance than a state-of-the-art solution.

Speakers
NE

Nima Elyasi

The Pennsylvania State University
CC

Changho Choi

Samsung Semiconductor Inc.
AS

Anand Sivasubramaniam

The Pennsylvania State University


Thursday February 28, 2019 10:00am - 10:30am EST
Grand Ballroom

10:10am EST

Break with Refreshments
Thursday February 28, 2019 10:10am - 10:40am EST
Grand Ballroom Foyer

10:30am EST

Break with Refreshments
Thursday February 28, 2019 10:30am - 11:00am EST
Grand Ballroom Foyer

10:40am EST

Riverbed: Enforcing User-defined Privacy Constraints in Distributed Web Services
Riverbed is a new framework for building privacy-respecting web services. Using a simple policy language, users define restrictions on how a remote service can process and store sensitive data. A transparent Riverbed proxy sits between a user's front-end client (e.g., a web browser) and the back-end server code. The back-end code remotely attests to the proxy, demonstrating that the code respects user policies; in particular, the server code attests that it executes within a Riverbed-compatible managed runtime that uses information flow control (IFC) to enforce user policies. If attestation succeeds, the proxy releases the user's data, tagging it with the user-defined policies. On the server side, the Riverbed runtime places all data with compatible policies into the same universe (i.e., the same isolated instance of the full web service). The universe mechanism allows Riverbed to work with unmodified, legacy software; unlike prior IFC systems, Riverbed does not require developers to reason about security lattices, or manually annotate code with labels. Riverbed imposes only modest performance overheads, with worst-case slowdowns of 10% for several real applications.

Speakers
FW

Frank Wang

MIT CSAIL
RK

Ronny Ko

Harvard University
JM

James Mickens

Harvard University


Thursday February 28, 2019 10:40am - 11:05am EST
Constitution Ballroom

11:00am EST

Fast Erasure Coding for Data Storage: A Comprehensive Study of the Acceleration Techniques
Various techniques have been proposed in the literature to improve erasure code computation efficiency, including optimizing bitmatrix design, optimizing computation schedule, common XOR operation reduction, caching management techniques, and vectorization techniques. These techniques were largely proposed in isolation; in this work, we seek to use them jointly. In order to accomplish this task, these techniques need to be thoroughly evaluated individually, and their relation better understood. Building on extensive test results, we develop methods to systematically optimize the computation chain together with the underlying bitmatrix. This led to a simple design approach of optimizing the bitmatrix by minimizing a weighted cost function, and also a straightforward erasure coding procedure: use the given bitmatrix to produce the computation schedule, which utilizes both the XOR reduction and caching management techniques, and apply XOR-level vectorization. This procedure can provide better performance than most existing techniques, and even compete against well-known codes such as EVENODD, RDP, and STAR codes. Moreover, the result suggests that vectorizing the XOR operation is a better choice than directly vectorizing finite field operations, not only because of the better encoding throughput, but also because of its minimal migration effort onto newer CPUs.
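As a minimal illustration of the XOR arithmetic these codes build on (single-parity only; the paper's bitmatrix designs, schedules, and vectorization are far more general): a parity block is the XOR of all data blocks, so any one lost block can be rebuilt by XORing the survivors with the parity.

```python
def xor_blocks(blocks):
    """XOR a list of equal-length byte strings together."""
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

def recover(surviving_blocks, parity):
    """Rebuild the single missing data block: XOR is its own inverse,
    so survivors XOR parity equals the lost block."""
    return xor_blocks(surviving_blocks + [parity])
```

Codes like EVENODD, RDP, and STAR arrange many such XOR equations over rows and diagonals of the data, which is why reducing and scheduling common XOR sub-expressions pays off.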

Speakers
CT

Chao Tian

Texas A&M University
TZ

Tianli Zhou

Texas A&M University


Thursday February 28, 2019 11:00am - 11:30am EST
Grand Ballroom

11:05am EST

Hyperscan: A Fast Multi-pattern Regex Matcher for Modern CPUs
Regular expression matching serves as a key functionality of modern network security applications. Unfortunately, it often becomes the performance bottleneck as it involves a compute-intensive scan of every byte of packet payload. With trends towards increasing network bandwidth and larger rulesets of complex patterns, the performance requirement gets ever more demanding.

In this paper, we present Hyperscan, a high performance regular expression matcher for commodity server machines. Hyperscan employs two core techniques for efficient pattern matching. First, it exploits graph decomposition that translates regular expression matching into a series of string and finite automata matching. Unlike existing solutions, string matching becomes a part of regular expression matching, eliminating duplicate operations. Decomposed regular expression components also increase the chance of fast DFA matching as they tend to be smaller than the original pattern. Second, Hyperscan accelerates both string and finite automata matching using SIMD operations, which brings substantial throughput improvement. Our evaluation shows that Hyperscan improves the performance of Snort by a factor of 8.7 for a real traffic trace.

Thursday February 28, 2019 11:05am - 11:30am EST
Constitution Ballroom

11:30am EST

Deniable Upload and Download via Passive Participation
Downloading or uploading controversial information can put users at risk, making them hesitant to access or share such information. While anonymous communication networks (ACNs) are designed to hide communication meta-data, merely connecting to an ACN can raise suspicion. In order to enable plausible deniability while providing or accessing controversial information, we design CoverUp: a system that enables users to asynchronously upload and download data. The key idea is to involve visitors of a collaborating website. This website serves a JavaScript snippet which, after the user's consent, produces cover traffic for the controversial site or content. This cover traffic is indistinguishable from the traffic of participants interested in the controversial content; hence, they can deny that they actually up- or downloaded any data.

CoverUp provides a feed-receiver that achieves a downlink rate of 10 to 50 Kbit/s. The indistinguishability guarantee of the feed-receiver holds against strong global network-level attackers who control everything except for the user's machine. We extend CoverUp to a full upload and download system with a rate of 10 to 50 Kbit/s. In this case, we additionally need the integrity of the JavaScript snippet, for which we introduce a trusted party. The analysis of our prototype shows a very small timing leakage, even after half a year of continual observation. Finally, as passive participation raises ethical and legal concerns for collaborating websites and their visitors, we discuss these concerns and describe how they can be addressed.

Speakers
DS

David Sommer

ETH Zurich
AD

Aritra Dhar

ETH Zurich
LM

Luka Malisa

ETH Zurich
DR

Daniel Ronzani

Ronzani Schlauri Attorneys
SC

Srdjan Capkun

ETH Zurich


Thursday February 28, 2019 11:30am - 11:55am EST
Constitution Ballroom

11:30am EST

OpenEC: Toward Unified and Configurable Erasure Coding Management in Distributed Storage Systems
Erasure coding has become a practical redundancy technique for distributed storage systems to achieve fault tolerance with low storage overhead. Given its popularity, research studies have proposed theoretically proven erasure codes or efficient repair algorithms to make erasure coding more viable. However, integrating new erasure coding solutions into existing distributed storage systems is a challenging task and requires non-trivial re-engineering of the underlying storage workflows. We present OpenEC, a unified and configurable framework for readily deploying a variety of erasure coding solutions into existing distributed storage systems. OpenEC decouples erasure coding management from the storage workflows of distributed storage systems, and provides erasure coding designers with configurable controls of erasure coding operations through a directed-acyclic-graph-based programming abstraction. We prototype OpenEC on two versions of HDFS with limited code modifications. Experiments on a local cluster and Amazon EC2 show that OpenEC preserves both the operational performance and the properties of erasure coding solutions; OpenEC can also automatically optimize erasure coding operations to improve repair performance.

Speakers
XL

Xiaolu Li

The Chinese University of Hong Kong
RL

Runhui Li

The Chinese University of Hong Kong
PP

Patrick P. C. Lee

The Chinese University of Hong Kong
YH

Yuchong Hu

Huazhong University of Science and Technology


Thursday February 28, 2019 11:30am - 12:00pm EST
Grand Ballroom

11:55am EST

CAUDIT: Continuous Auditing of SSH-Servers To Mitigate Brute-Force Attacks
This paper describes CAUDIT, an operational system deployed at the National Center for Supercomputing Applications (NCSA) at the University of Illinois. CAUDIT is a fully automated system to enable the identification and exclusion of hosts that are vulnerable to SSH brute-force attacks. Its key features include: 1) a honeypot for attracting SSH-based attacks over a /16 IP address range and extracting key metadata (e.g., source IP, password, SSH-client version, or key) from these attacks; 2) executing audits on the live production network by replaying attack attempts recorded by the honeypot; 3) using the IP addresses recorded by the honeypot to block SSH attack attempts at the network border using a Black Hole Router (BHR) while significantly reducing the load on NCSA's security monitoring system; and 4) informing peer sites of attack attempts in real-time to ensure containment of coordinated attacks. The system is composed of existing techniques with custom-built components, and its novelty is to execute at a scale that has not been validated earlier (thousands of nodes and tens of millions of attack attempts per day). Experience over 463 days shows that CAUDIT successfully blocks an average of 57 million attack attempts on a daily basis using the proposed BHR. This represents a 66× reduction in the number of SSH attempts compared to the daily average and has reduced traffic to the NCSA internal network-security-monitoring infrastructure by 78%.


Thursday February 28, 2019 11:55am - 12:20pm EST
Constitution Ballroom

12:00pm EST

Cluster storage systems gotta have HeART: improving storage efficiency by exploiting disk-reliability heterogeneity
Large-scale cluster storage systems typically consist of a heterogeneous mix of storage devices with significantly varying failure rates. Despite such differences among devices, redundancy settings are generally configured in a one-scheme-for-all fashion. In this paper, we make a case for exploiting reliability heterogeneity to tailor redundancy settings to different device groups. We present HeART, an online tuning tool that guides selection of, and transitions between, redundancy settings for long-term data reliability, based on observed reliability properties of each disk group. By processing disk failure data over time, HeART identifies the boundaries and steady-state failure rate for each deployed disk group (e.g., by make/model). Using this information, HeART suggests the most space-efficient redundancy option allowed that will achieve the specified target data reliability. Analysis of longitudinal failure data for a large production storage cluster shows the robustness of HeART's failure-rate determination algorithms. The same analysis shows that a storage system guided by HeART could provide target data reliability levels with fewer disks than one-scheme-for-all approaches: 11–16% fewer compared to erasure codes like 10-of-14 or 6-of-9 and 33% fewer compared to 3-way replication.
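The selection step described in this abstract can be caricatured with a simple sketch (this is not HeART's actual reliability model; the binomial loss model, candidate schemes, and targets below are illustrative): given an observed per-disk failure probability over a repair window, a k-of-n scheme loses data when more than n-k of its n disks fail, so we pick the most space-efficient candidate whose loss probability meets the target.

```python
import math

def loss_probability(n, k, p):
    """Probability that more than n-k of n disks fail (independent
    failures with per-disk probability p over a repair window)."""
    return sum(math.comb(n, f) * p**f * (1 - p)**(n - f)
               for f in range(n - k + 1, n + 1))

def pick_scheme(candidates, p, target):
    """Among (k, n) candidates meeting the reliability target,
    return the one with the lowest storage overhead n/k."""
    ok = [(n / k, (k, n)) for (k, n) in candidates
          if loss_probability(n, k, p) <= target]
    return min(ok)[1] if ok else None
```

For example, with a per-window failure probability of 0.001, a 10-of-14 code (overhead 1.4×) can meet a stringent loss target that 3-way replication (overhead 3×) also meets, which is the kind of space saving the abstract quantifies.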

Speakers
SK

Saurabh Kadekodi

Carnegie Mellon University
KV

K. V. Rashmi

Carnegie Mellon University
GR

Gregory R. Ganger

Carnegie Mellon University


Thursday February 28, 2019 12:00pm - 12:30pm EST
Grand Ballroom

12:20pm EST

Lunch (on your own)
Thursday February 28, 2019 12:20pm - 1:50pm EST
N/A

12:30pm EST

ScaleCheck: A Single-Machine Approach for Discovering Scalability Bugs in Large Distributed Systems
We present ScaleCheck, an approach for discovering scalability bugs (a new class of bug in large storage systems) and for democratizing large-scale testing. ScaleCheck employs a program analysis technique for finding potential causes of scalability bugs, and a series of colocation techniques for testing implementation code at real scales while doing so on just a commodity PC. ScaleCheck has been integrated into several large-scale storage systems (Cassandra, HDFS, Riak, and Voldemort) and successfully exposed known and unknown scalability bugs, up to 512-node scale on a 16-core PC.

Speakers
CA

Cesar A. Stuardo

University of Chicago
TL

Tanakorn Leesatapornwongsa

Samsung Research America
RO

Riza O. Suminto

University of Chicago
HK

Huan Ke

University of Chicago
JF

Jeffrey F. Lukman

University of Chicago
SL

Shan Lu

University of Chicago
HS

Haryadi S. Gunawi

University of Chicago


Thursday February 28, 2019 12:30pm - 1:00pm EST
Grand Ballroom

1:50pm EST

Dataplane equivalence and its applications
Network verification promises to find rare bugs in networks, but using it requires that administrators (completely) characterize the expected behavior of the network in formal languages such as Datalog or CTL. The difficulty of achieving this task hampers wide deployment of verification. We propose to use equivalence between different network dataplanes as an implicit, simpler way to specify the required correctness properties. While equivalence is a well-known undecidable problem for general-purpose programs, we show that for network dataplanes without infinite loops it is decidable and can be checked efficiently. We present netdiff, an algorithm that checks the equivalence of two network dataplanes implemented in the SEFL language by using symbolic execution [11]. We implement netdiff and use it to catch a variety of bugs in OpenStack Neutron, P4 programs, and network dataplane updates. Our evaluation highlights that equivalence is an easy way to find bugs, scales well to relatively large programs, and discovers subtle issues otherwise difficult to find.

Speakers
DD

Dragos Dumitrescu

University Politehnica of Bucharest
RS

Radu Stoenescu

University Politehnica of Bucharest
MP

Matei Popovici

University Politehnica of Bucharest
LN

Lorina Negreanu

University Politehnica of Bucharest
CR

Costin Raiciu

University Politehnica of Bucharest


Thursday February 28, 2019 1:50pm - 2:15pm EST
Constitution Ballroom

2:15pm EST

Alembic: Automated Model Inference for Stateful Network Functions
Network operators today deploy a wide range of complex stateful network functions (NFs). They typically only have access to the NFs’ binary executables, configuration interfaces, and manuals from vendors. To ensure correct behavior of NFs, operators use network testing and verification tools, which typically rely on models of the deployed NFs. The effectiveness of these tools depends upon the fidelity of such models. Today, models are handwritten, which is error-prone and tedious and does not account for implementation-specific artifacts. To address this gap, our goal is to automatically infer behavioral models of stateful NFs for a given configuration. The problem is challenging because NF configurations can contain diverse rule types and the space of dynamic and stateful NF behaviors is large. In this work, we present Alembic, which synthesizes NF models viewed as an ensemble of finite-state machines (FSMs). Alembic consists of an offline stage that learns symbolic FSM representations for each NF rule type and a fast online stage that generates a concrete behavioral model for a given configuration using these symbolic FSMs. We demonstrate that Alembic is accurate and scalable and that it sheds light on subtle differences across NF implementations.
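A hand-written sketch of the kind of model Alembic infers automatically (state and event names are our own, not Alembic's representation): a stateful firewall rule expressed as a per-connection finite-state machine.

```python
class FirewallFSM:
    """Toy FSM: allow outside->inside traffic only after an inside host initiates."""

    def __init__(self):
        self.state = "CLOSED"

    def step(self, event):
        if self.state == "CLOSED" and event == "inside_syn":
            self.state = "ESTABLISHED"
            return "forward"
        if self.state == "ESTABLISHED":
            return "forward"
        return "drop"  # unsolicited outside traffic while no state exists

fsm = FirewallFSM()
print(fsm.step("outside_pkt"))  # drop: no inside-initiated connection yet
print(fsm.step("inside_syn"))   # forward, and connection state is created
print(fsm.step("outside_pkt"))  # forward: reply traffic is now allowed
```

A configuration with many such rules yields an ensemble of FSMs like this one, which is the form of model the paper's online stage instantiates.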

Speakers
SM

Soo-Jin Moon

Carnegie Mellon University
JH

Jeffrey Helt

Princeton University
YY

Yifei Yuan

Intentionet
YB

Yves Bieri

ETH Zurich
SB

Sujata Banerjee

VMware Research
VS

Vyas Sekar

Carnegie Mellon University
WW

Wenfei Wu

Tsinghua University
MY

Mihalis Yannakakis

Columbia University
YZ

Ying Zhang

Facebook, Inc.


Thursday February 28, 2019 2:15pm - 2:40pm EST
Constitution Ballroom

2:40pm EST

Model-Agnostic and Efficient Exploration of Numerical State Space of Real-World TCP Congestion Control Implementations
The significant impact of TCP congestion control on the Internet highlights the importance of testing the correctness and performance of congestion control algorithm implementations (CCAIs) in various network environments. Many CCAI testing questions can be answered by exploring the numerical state space of CCAIs, which is defined by a group of numerical (and nonnumerical) state variables of the CCAIs. However, the current practices for automated numerical state space exploration are either limited by the approximate abstract CCAI models or inefficient due to the large space of network environment parameters and the complicated relation between the CCAI states and network environment parameters. In this paper, we propose an automated numerical state space exploration method, called ACT, which leverages the model-agnostic feature of random testing and greatly improves its efficiency by guiding random testing with the feedback iteratively obtained in a test. Our experiments on five representative Linux TCP CCAIs show that ACT can explore a large numerical state space more efficiently than manual testing, undirected random testing, and symbolic-execution-based testing, without requiring any abstract CCAI models. ACT successfully detects multiple implementation bugs and design issues in these Linux TCP CCAIs, including new bugs and issues not previously reported.
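A heavily simplified sketch of feedback-guided random testing in the spirit of ACT (our own toy, not ACT's algorithm): random environment samples are biased toward the region that most recently uncovered a new state of the system under test. The stand-in "CCAI" and all parameter ranges are illustrative.

```python
import random

def toy_cca_state(loss, rtt_ms):
    # Stand-in for a congestion-control implementation: map a network
    # environment to a coarse numerical state (a bucketed cwnd-like value).
    cwnd = max(1, int(100 * (1 - loss) / max(rtt_ms, 1)))
    return min(cwnd, 10)

def guided_explore(trials=500, seed=0):
    rng = random.Random(seed)
    seen, center = set(), (0.5, 50.0)
    for _ in range(trials):
        # Sample near the last productive environment.
        loss = min(1.0, max(0.0, rng.gauss(center[0], 0.2)))
        rtt = min(200.0, max(1.0, rng.gauss(center[1], 20.0)))
        state = toy_cca_state(loss, rtt)
        if state not in seen:        # feedback: a new state steers the search
            seen.add(state)
            center = (loss, rtt)
    return seen

print(sorted(guided_explore()))
```

Undirected random testing would sample the environment space uniformly; the feedback loop above concentrates trials where new states keep appearing, which is the efficiency argument the abstract makes.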

Speakers
WS

Wei Sun

University of Nebraska-Lincoln
LX

Lisong Xu

University of Nebraska-Lincoln
SE

Sebastian Elbaum

University of Virginia
DZ

Di Zhao

University of Nebraska-Lincoln


Thursday February 28, 2019 2:40pm - 3:05pm EST
Constitution Ballroom

3:05pm EST

Break with Refreshments
Thursday February 28, 2019 3:05pm - 3:35pm EST
Grand Ballroom Foyer

3:35pm EST

Scaling Community Cellular Networks with CommunityCellularManager
Hundreds of millions of people still live beyond the coverage of basic mobile connectivity, primarily in rural areas with low population density. Mobile-network operators (MNOs) traditionally struggle to justify expansion into these rural areas due to the high infrastructure costs necessary to provide service. Community cellular networks, networks built "by and for" the people they serve, represent an alternative model that, to an extent, bypasses these business case limitations and enables sustainable rural coverage. Yet despite aligned economic incentives, real deployments of community cellular networks still face significant regulatory, commercial and technical challenges.

In this paper, we present CommunityCellularManager (CCM), a system for operating community cellular networks at scale. CCM enables multiple community networks to operate under the control of a single, multi-tenant controller and in partnership with a traditional MNO. CCM preserves flexibility for each community network to operate independently, while allowing the mobile network operator to safely make critical resources such as spectrum and phone numbers available to these networks. We evaluate CCM through a multi-year, large-scale community cellular network deployment in the Philippines in partnership with a traditional MNO, providing basic communication services to over 2,000 people in 15 communities without requiring changes to the existing regulatory framework, and using existing handsets. We demonstrate that CCM can support independent community networks with unique service offerings and operating models while providing a basic level of MNO-defined service. To our knowledge, this represents the largest deployment of community cellular networks to date.

Speakers
SH

Shaddi Hasan

UC Berkeley
MC

Mary Claire Barela

University of the Philippines, Diliman
MJ

Matthew Johnson

University of Washington
EB

Eric Brewer

UC Berkeley
KH

Kurtis Heimerl

University of Washington


Thursday February 28, 2019 3:35pm - 4:00pm EST
Constitution Ballroom

4:00pm EST

TrackIO: Tracking First Responders Inside-Out
First responders, a critical lifeline of any society, often find themselves in precarious situations. The ability to track them in real time in unknown indoor environments would significantly contribute to the success of their mission as well as their safety. In this work, we present the design, implementation, and evaluation of TrackIO, a system capable of accurately localizing and tracking mobile responders in real time in large indoor environments. TrackIO leverages the mobile virtual infrastructure offered by unmanned aerial vehicles (UAVs), coupled with the balanced penetration-accuracy tradeoff offered by ultra-wideband (UWB), to accomplish this objective directly from outside, without relying on access to any indoor infrastructure. Towards a practical system, TrackIO incorporates four novel mechanisms in its design that address key challenges to enable tracking responders (i) who are mobile with potentially non-uniform velocities (e.g., during turns), (ii) deep indoors with challenged reachability, (iii) in real time even for a large network, and (iv) with high accuracy even when impacted by the UAV’s position error. TrackIO’s real-world performance reveals that it can track static nodes with a median accuracy of about 1–1.5m and mobile (even running) nodes with a median accuracy of 2–2.5m in large buildings in real time.
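The geometric core of range-based localization like TrackIO's can be illustrated with textbook 2D trilateration (this is a generic sketch, not TrackIO's estimator, which must also handle mobility and anchor-position error): given distances to three known anchor positions, subtracting one circle equation from the others yields a linear system for the target position.

```python
import math

def trilaterate(anchors, ranges):
    (x1, y1), (x2, y2), (x3, y3) = anchors
    r1, r2, r3 = ranges
    # Subtract the first circle equation from the other two: a 2x2 linear system.
    a1, b1 = 2 * (x2 - x1), 2 * (y2 - y1)
    c1 = r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2
    a2, b2 = 2 * (x3 - x1), 2 * (y3 - y1)
    c2 = r1**2 - r3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a1 * b2 - a2 * b1
    return ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)

anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]   # e.g., UWB units on UAVs
target = (3.0, 4.0)
ranges = [math.dist(target, a) for a in anchors]
print(trilaterate(anchors, ranges))  # recovers (3.0, 4.0) from clean ranges
```

With noisy UWB ranges and a moving target, a single solve like this is not enough, which is exactly the gap the paper's four mechanisms address.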

Speakers
AD

Ashutosh Dhekne

University of Illinois at Urbana-Champaign
AC

Ayon Chakraborty

NEC Labs America, Inc.
KS

Karthikeyan Sundaresan

NEC Labs America, Inc.
SR

Sampath Rangarajan

NEC Labs America, Inc.


Thursday February 28, 2019 4:00pm - 4:25pm EST
Constitution Ballroom

4:25pm EST

3D Backscatter Localization for Fine-Grained Robotics
This paper presents the design and implementation of TurboTrack, a 3D localization system for fine-grained robotic tasks. TurboTrack's unique capability is that it can localize backscatter nodes with sub-centimeter accuracy without any constraints on their locations or mobility. TurboTrack makes two key technical contributions. First, it presents a pipelined architecture that can extract a sensing bandwidth from every single backscatter packet that is three orders of magnitude larger than the backscatter communication bandwidth. Second, it introduces a Bayesian space-time super-resolution algorithm that combines time series of the sensed bandwidth across multiple antennas to enable accurate positioning. Our experiments show that TurboTrack simultaneously achieves sub-centimeter median accuracy in each of the x/y/z dimensions and a 99th-percentile latency of less than 7.5 milliseconds in 3D localization. This enables TurboTrack's real-time prototype to achieve fine-grained positioning for agile robotic tasks, as we demonstrate in multiple collaborative applications with robotic arms and nanodrones, including indoor tracking, packaging, assembly, and handover.

Speakers
ZL

Zhihong Luo

MIT Media Lab
QZ

Qiping Zhang

MIT Media Lab
YM

Yunfei Ma

MIT Media Lab
MS

Manish Singh

MIT Media Lab
FA

Fadel Adib

MIT Media Lab


Thursday February 28, 2019 4:25pm - 4:50pm EST
Constitution Ballroom

4:50pm EST

Many-to-Many Beam Alignment in Millimeter Wave Networks
Millimeter wave (mmWave) networks can deliver multi-Gbps wireless links that use extremely narrow directional beams. This provides us with a new opportunity to exploit spatial reuse in order to scale network throughput. Exploiting such spatial reuse, however, requires aligning the beams of all nodes in a network. Aligning the beams is difficult: indoor multipath can create interference between seemingly separated beams, and carrier sense is inefficient at detecting interference on directional links. This paper presents BounceNet, the first many-to-many millimeter wave beam alignment protocol that can exploit dense spatial reuse to allow many links to operate in parallel in a confined space and scale the wireless throughput with the number of clients. Results from three millimeter wave testbeds show that BounceNet can scale the throughput with the number of clients to deliver a total network data rate of more than 39 Gbps for 10 clients, which is up to 6.6x higher than current 802.11 mmWave standards.
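Once interference between beams is known, scheduling for spatial reuse reduces to coloring a conflict graph. The sketch below is a plain greedy coloring, not BounceNet's protocol, and the link names and conflict relation are made up: conflicting links get different slots, while non-conflicting links share a slot and transmit in parallel.

```python
def schedule(links, conflicts):
    # Greedy conflict-graph coloring: assign each link the lowest slot not
    # already taken by one of its conflicting neighbors.
    slots = {}
    for link in links:
        taken = {slots[o] for o in conflicts.get(link, ()) if o in slots}
        slots[link] = next(s for s in range(len(links)) if s not in taken)
    return slots

links = ["AP1-c1", "AP1-c2", "AP2-c3", "AP3-c4"]
conflicts = {                        # symmetric interference relation
    "AP1-c1": ["AP1-c2"],            # one AP cannot serve two beams at once
    "AP1-c2": ["AP1-c1", "AP2-c3"],  # a multipath bounce couples these links
    "AP2-c3": ["AP1-c2"],
}
print(schedule(links, conflicts))
```

Here "AP1-c1", "AP2-c3", and "AP3-c4" can all share one slot, so three links are active simultaneously; this kind of parallelism is what lets throughput scale with the number of clients.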


Thursday February 28, 2019 4:50pm - 5:15pm EST
Constitution Ballroom
 