EAGER: Characterization, Prediction, and Management of Latency Variation in Virtualized HPC and Warehouse Scale Systems.

[No authors listed]

UIID-NSF: 1513

Abstract

Recent advancements in the Linux operating system have enabled widespread use of lightweight host-based virtualization, or containers. Containers can be used to deploy workloads in High Performance Computing (HPC) and warehouse computing environments. However, research has shown that containers can adversely affect the latency of an application's network traffic, a metric to which many HPC applications are sensitive. This project addresses a limitation of prior research, which focuses primarily on mean latency in HPC applications and does not fully analyze variation in network latency as a factor in application performance. The project will study, predict, and manage variation in network latency and its impact on the performance of HPC and warehouse computing applications. Results will bear on the sustained performance of HPC and warehouse scale applications, an area of national interest, and will give insight and direction to new research in the characterization, prediction, and management of latency variation in highly parallel systems.

Research tasks include studying the tradeoffs of host-managed network traffic versus application-managed network interfaces in the context of containers; identifying the granularity of virtualization appropriate for high speed communication between workers in different computing platforms, and how these choices affect message latency variation; and studying the effects of zero-copy messaging between Docker containers in combination with other network communication technologies. The project will develop a thorough characterization and analysis of the mechanisms for managing latency that have become available with recent technologies. It studies technologies that are not yet widely used in HPC applications, including Docker containers in combination with SR-IOV, DPDK, Ethernet, and InfiniBand.

Experimental design and testing of environment configurations and application resource allocation strategies will lead to the development and validation of descriptive and predictive models of the variation in network latency for the selected technologies, and will guide the development of best practice recommendations for configuring container networking. With its focus on variation in network latency, the project lays the groundwork for a larger project to develop an experimental platform and framework for evaluating the use of emerging technologies in HPC and warehouse computing systems. Funds will support two Ph.D. students at Clemson University in South Carolina, an EPSCoR state.
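
The abstract's distinction between mean latency and variation in latency can be illustrated with a minimal sketch. The two sample sets below are invented for illustration only and are not data from this project; the function name `summarize`, the microsecond units, and the choice of a nearest-rank 99th percentile are likewise assumptions. The sketch shows that two workloads may share the same mean latency while differing sharply in spread and tail latency, which is the kind of difference a mean-only analysis would miss.

```python
import statistics


def summarize(latencies_us):
    """Return the mean, population standard deviation, and 99th percentile.

    latencies_us is a non-empty list of message latencies in microseconds
    (hypothetical units chosen for this illustration).
    """
    ordered = sorted(latencies_us)
    # Nearest-rank percentile: the smallest sample such that at least 99%
    # of all samples are less than or equal to it. Integer arithmetic
    # computes ceil(0.99 * n) without floating-point rounding.
    rank = (99 * len(ordered) + 99) // 100
    return {
        "mean": statistics.fmean(ordered),
        "stdev": statistics.pstdev(ordered),
        "p99": ordered[rank - 1],
    }


# Invented samples: every message takes the same time.
steady = [100.0] * 100

# Invented samples: most messages are slightly faster, a few are much slower.
bursty = [90.0] * 98 + [590.0] * 2

if __name__ == "__main__":
    for name, samples in (("steady", steady), ("bursty", bursty)):
        print(name, summarize(samples))
```

Both sample sets have the same mean, yet the second has a nonzero standard deviation and a much larger 99th percentile. In a tightly synchronized parallel application, where progress often waits on the slowest message, that tail can matter more than the mean.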

Other Details

  • Award Instrument: Standard Grant
  • Organization: University of Arkansas
  • Other Investigators: David Jacobs, Jill Gemmill
  • Primary Investigator: Amy Apon
  • Program(s): SOFTWARE and HARDWARE FOUNDATION
  • Start Date: 08/01/2016