Publications

Multi- and Manycore Systems

Hardware acceleration for lock-free data structures and software transactional memory

Stephan Diestelhorst, Michael Hohmuth. In the Proceedings of the Workshop on Exploiting Parallelism with Transactional Memory and other Hardware Assisted Methods (EPHAM), April 2008. Boston, MA

In this paper, we report on a new CPU architecture extension proposal, named Advanced Synchronization Facility (ASF), which is geared toward accelerating and easing lock-free programming and software transactional memory (STM). We present an initial performance simulation and usability study of ASF applied to a lock-free data structure (a singly linked list) and to the acceleration of a state-of-the-art STM system, TinySTM. Our results indicate that ASF can significantly increase the throughput and scaling behavior of both workloads: the lock-free implementation doubles single-threaded performance and maintains a 66% improvement on eight CPUs, while the application-transparent enhancement of the STM increases single-thread performance by up to 15% and the scaling factor to eight CPUs by up to 20%.
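For readers unfamiliar with lock-free lists, the following is a minimal Python sketch of the conventional single-word compare-and-swap (CAS) retry loop used to insert at the head of a lock-free singly linked list, which is the style of code ASF is proposed to accelerate. It does not model ASF itself, whose instruction interface is not described in this abstract. The names AtomicRef, Node and LockFreeList are hypothetical, and because Python has no user-level CAS instruction, AtomicRef emulates one with an internal lock; only the caller's retry loop follows the lock-free pattern.

```python
import threading


class AtomicRef:
    """Hypothetical stand-in for a memory word supporting hardware CAS.

    Python exposes no CAS instruction, so the atomicity is emulated with an
    internal lock. Only the retry loop in LockFreeList mirrors lock-free code.
    """

    def __init__(self, value=None):
        self._value = value
        self._guard = threading.Lock()

    def load(self):
        return self._value

    def compare_and_swap(self, expected, new):
        # Atomically: if the current value is `expected`, store `new`.
        with self._guard:
            if self._value is expected:
                self._value = new
                return True
            return False


class Node:
    def __init__(self, value, next_node):
        self.value = value
        self.next = next_node


class LockFreeList:
    """Singly linked list whose head insertion uses a CAS retry loop."""

    def __init__(self):
        self.head = AtomicRef(None)

    def push_front(self, value):
        while True:
            old_head = self.head.load()       # snapshot the current head
            node = Node(value, old_head)      # new node points at the snapshot
            if self.head.compare_and_swap(old_head, node):
                return                        # no other thread changed the head
            # Another thread won the race; retry with a fresh snapshot.

    def to_list(self):
        out, node = [], self.head.load()
        while node is not None:
            out.append(node.value)
            node = node.next
        return out
```

Pushing 1, 2 and 3 from a single thread yields [3, 2, 1]; when several threads push concurrently, a failed CAS simply makes the losing thread retry, so no insertion is lost.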

Paper: PDF
Talk: PDF

Hardware acceleration for software transactional memory

Stephan Diestelhorst. Diploma thesis, Technische Universität Dresden, January 2008. Dresden, Germany

Stephan's diploma thesis originated during his internship at AMD's OSRC in 2007.

Thesis: PDF

Virtualization

How to Deal with Lock-Holder Preemption

Thomas Friebel. Presentation at the Xen Summit North America, July 2008. Boston, MA

Lock-holder preemption is the preemption of a virtual CPU (VCPU) while it holds a spinlock. Other VCPUs of the same guest that try to acquire that lock have to wait until the lock holder is scheduled again and releases the lock. On a multi-core machine, lock-holder preemption can cause Xen guests to waste about 7% of their time waiting for spinlocks. In this presentation we will demonstrate the effects of lock-holder preemption, present two ways to counteract it, and analyze one approach in detail. We will give a short overview of our modifications to the Xen scheduler and show how we regained the lost performance.
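To make the waiting concrete, here is a minimal Python sketch of a test-and-set spinlock. It is illustrative only and is not Xen's (C) spinlock code; the class name SpinLock is hypothetical. threading.Lock.acquire(blocking=False) stands in for the atomic test-and-set instruction: it either takes the lock immediately or returns False. Each failed attempt is busy-waiting, which is exactly the time a VCPU burns when the lock holder has been preempted.

```python
import threading


class SpinLock:
    """Minimal test-and-set spinlock sketch (hypothetical, not Xen's code).

    threading.Lock.acquire(blocking=False) plays the role of the atomic
    test-and-set: it grabs the lock at once or returns False.
    """

    def __init__(self):
        self._flag = threading.Lock()

    def acquire(self):
        """Spin until the lock is taken; return the number of failed attempts."""
        failed = 0
        while not self._flag.acquire(blocking=False):
            # If the holder is not running (e.g. its VCPU was preempted),
            # every iteration here is wasted work until it runs and releases.
            failed += 1
        return failed

    def release(self):
        self._flag.release()
```

Under CPython, a spinning thread keeps the interpreter until its switch interval expires, a rough analogue of a waiting VCPU spending its time slice while the lock holder is descheduled.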

Extended abstract: PDF
Talk: PDF, PDF with comments

Nested paging hardware and software

Benjamin Serebrin, Joerg Roedel. Presentation at the KVM Forum, June 2008. Napa, CA

This presentation covers the ASPLOS paper 'Accelerating two-dimensional page walks for virtualized systems' as well as the implementation details and performance of nested paging support in KVM.

Talk: PDF

Accelerating two-dimensional page walks for virtualized systems

Ravi Bhargava, Benjamin Serebrin, Francesco Spadini, Srilatha Manne. In the Proceedings of the 13th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), March 2008. Seattle, WA

Nested paging is a hardware solution for alleviating the software memory management overhead imposed by system virtualization. Nested paging complements existing page walk hardware to form a two-dimensional (2D) page walk, which reduces the need for hypervisor intervention in guest page table management. However, the extra dimension also increases the maximum number of architecturally-required page table references.

This paper presents an in-depth examination of the 2D page table walk overhead and options for decreasing it. These options include using the AMD Opteron processor's page walk cache to exploit the strong reuse of page entry references. For a mix of server and SPEC benchmarks, the presented results show a 15%-38% improvement in guest performance by extending the existing page walk cache to also store the nested dimension of the 2D page walk. Caching nested page table translations and skipping multiple page entry references produce an additional 3%-7% improvement.

Much of the remaining 2D page walk overhead is due to low-locality nested page entry references, which result in additional memory hierarchy misses. By using large pages, the hypervisor can eliminate many of these long-latency accesses and further improve the guest performance by 3%-22%.
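The abstract notes that the extra dimension raises the maximum number of page table references. As a back-of-the-envelope check (not taken from the paper), the worst case with no TLB or page walk cache hits can be counted as follows: every guest page table entry sits at a guest-physical address, so reading it first requires a full nested walk plus the entry read itself, and the final guest-physical data address needs one more nested walk. The sketch assumes x86-64 four-level tables in both dimensions and that a 2 MB large page in the nested tables ends the nested walk one level early; the function name is hypothetical.

```python
def max_2d_walk_refs(guest_levels: int, nested_levels: int) -> int:
    """Worst-case memory references for one 2D (nested) page walk.

    Assumes no TLB or page walk cache hits. nested_levels == 0 models
    native (non-virtualized) paging.
    """
    # Per guest level: translate the entry's guest-physical address through
    # the nested tables (nested_levels refs), then read the entry (1 ref).
    per_guest_level = nested_levels + 1
    # The final guest-physical address of the data needs one more nested walk.
    return guest_levels * per_guest_level + nested_levels


print(max_2d_walk_refs(4, 0))  # 4: native four-level walk
print(max_2d_walk_refs(4, 4))  # 24: four-level guest and nested tables
print(max_2d_walk_refs(4, 3))  # 19: nested walk ends early with 2 MB pages
```

The closed form is (n + 1)(m + 1) - 1 for n guest and m nested levels, which shows why shortening the nested dimension with large pages removes several references from every walk.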

Paper: PDF

Partitioning the physical TLB with SVM ASIDs

Sebastian Biemueller. Presentation at Xen Summit, April 2007. Yorktown Heights, NY

Slide deck used at the 2007 Xen Summit.

Talk: PDF

Nested paging support in Xen

Wei Huang. Presentation at Xen Summit, April 2007. Yorktown Heights, NY

Slide deck used at the 2007 Xen Summit, including an introduction to AMD's Barcelona technology by Elsie Wahlig.

Talk: PDF

Miscellaneous

Myths and facts about 64-bit Linux

Andreas Herrmann, Andre Przywara. Presentation at Chemnitzer Linux-Tage, March 2008. Chemnitz, Germany

Since the early days of 64-bit Linux on PCs, a number of myths about 64-bit computing have been circulating. These slides provide technical details to separate the facts from the myths. An overview of the hardware changes in the x86-64 architecture is followed by a short report on the changes required in Linux and the GCC toolchain. The focus is on compatibility with 32-bit code, covering both the hardware side and the Linux implementation. Some real-life experiences and pitfalls are shown, along with hints for porting old 32-bit programs to 64-bit. The presentation concludes with a range of benchmark results that show the actual performance of 64-bit applications.

Talk: PDF