Multi- and Manycore Systems Research


Introduction

Future CPU generations will no longer be able to increase their single-thread performance exponentially. Instead, CPUs will scale the number of processing cores. Consequently, software will no longer run faster automatically with each hardware upgrade. It will have to be adapted to the higher degree of parallelism the CPU exposes. Existing parallelization techniques become increasingly complex as the number of execution threads grows, which is why the software industry is looking for new, simpler parallel programming paradigms.

Transactional memory is a promising programming model that provides transactions (known from database technology), which take the burden of synchronizing concurrent data access off programmers’ backs. However, today’s software implementations of transactional memory, known as Software Transactional Memory (STM), still incur too much overhead for synchronization and bookkeeping. This makes STMs impractical at the core counts expected in the near future. One way to reduce this overhead is to accelerate STMs with new hardware mechanisms.

Another promising programming paradigm is lock-free data structures. Many authors have shown that lock-free algorithms perform and scale well and are immune to deadlocks. To date, however, these algorithms have been limited by incomplete hardware support. Lock-free programming relies on atomically modifying shared memory locations using instructions such as test-and-set and compare-and-swap. These instructions typically operate on only one or two words of memory and have high latency, which makes lock-free programming impractical for more complex data structures or when low latency is required.

AMD's Operating System Research Center (OSRC) helps define and evaluate CPU-architecture extensions that make parallel programs faster as well as easier to write. We work with the STM community to develop CPU extensions for speeding up STM systems, and we evaluate an experimental AMD64 feature known as the Advanced Synchronization Facility (ASF).

 

VELOX

VELOX is an EU-sponsored research project that aims to improve STM technology for the multicore CPUs found in today's computers and those expected in the near future. VELOX takes a whole-system approach that covers every layer, from hardware through operating systems and runtimes all the way to applications.

AMD participates in the Architecture work package of this project, helping to develop simulator technology and to evaluate architecture-extension proposals.

 

Advanced Synchronization Facility (ASF)

ASF is an experimental AMD64 extension that allows user- and system-level code to modify a set of memory objects atomically without requiring expensive synchronization mechanisms.

The ASF extension provides an inexpensive primitive from which higher-level synchronization mechanisms can be synthesized. Examples include multi-word compare-and-exchange, load-locked/store-conditional, lock-free data structures, and primitives for software transactional memory.

ASF is both more flexible and faster than existing lock-free atomic memory-modification approaches. Instead of offering new instructions with hardwired semantics (such as compare-and-exchange for two independent memory locations), ASF only exposes a mechanism for atomically updating multiple independent memory locations and allows software to implement the intended synchronization semantics.

We have evaluated ASF in the contexts of both lock-free programming and software transactional memory. Please find more information in the papers posted in the Publications section of this page.

We have released the simulator we used in our evaluation: a version of the open-source AMD64 simulator PTLsim that we extended with an implementation of ASF. The simulator can be downloaded from the PTLsim-ASF release section of this page.

 

Publications

Hardware acceleration for lock-free data structures and software transactional memory

Stephan Diestelhorst, Michael Hohmuth. In the proceedings of the Workshop on Exploiting Parallelism with Transactional Memory and other Hardware Assisted Methods (EPHAM), April 2008. Boston, MA

In this paper, we report on a new CPU-architecture extension proposal, named Advanced Synchronization Facility (ASF), which is geared toward accelerating and easing lock-free programming and software transactional memory (STM). We present an initial performance simulation and usability study of ASF’s application to a lock-free data structure (a singly linked list) and to accelerating a state-of-the-art STM system, TinySTM. Our results indicate that ASF can significantly increase the throughput and scaling behavior of both workloads: The lock-free implementation has doubled single-threaded performance and maintains a 66 % increase for eight CPUs, while application-transparent enhancement of the STM increases single-thread performance by up to 15 %, and the factor of scaling to eight CPUs by up to 20 %.

Paper: PDF
Talk: PDF

Hardware acceleration for software transactional memory

Stephan Diestelhorst. Diploma thesis, Technische Universität Dresden, January 2008. Dresden, Germany

Stephan's diploma thesis originated during his internship at AMD's OSRC in 2007.

Thesis: PDF

 

PTLsim-ASF release

PTLsim-ASF is a variant of the open-source AMD64 simulator PTLsim that we modified to simulate ASF. Please refer to the papers posted in the Publications section of this page for information on how we simulated and used ASF. For general information regarding PTLsim, including a detailed user's manual, please refer to the PTLsim home page.

Release notes

Please read these release notes prior to downloading the software. They contain important information regarding known issues and recommended use of PTLsim-ASF.

Download RELEASE_NOTES (8 KB)

License

PTLsim-ASF is licensed under the GPL v2.

Download LICENSE (18 KB)

Full release

Complete tarball of the latest PTLsim-ASF release:

Download full release (4.9 MB)

Patch relative to mainline PTLsim:

Download patches relative to PTLsim (svn 219) (720 KB)

Xen hypervisor with PTLsim patches

This release of Xen is required for full-system simulation using PTLsim-ASF. It is unchanged from the Xen release accompanying the original PTLsim release (svn 219).

Download Xen hypervisor with PTLsim patches (5.6 MB)

Download Dom0 and DomU Linux kernels for Xen (source) (44 MB)

Download Dom0 and DomU Linux kernels for Xen (binaries) (13 MB)