GENERALIZING THE SPUFS CONCEPT – A CASE STUDY TOWARDS A COMMON ACCELERATOR INTERFACE

Andreas Heinig¹, Jochen Strunk¹, Wolfgang Rehm¹, Heiko Schick²
¹ Chemnitz University of Technology, Germany
² IBM Deutschland Entwicklung GmbH, Germany
{heandr,sjoc,rehm}@informatik.tu-chemnitz.de, [email protected]
Corresponding author: Andreas Heinig, phone (+49)179 9101345

This research is supported by the Center for Advanced Studies (CAS) of the IBM Boeblingen Laboratory, Germany as part of the NICOLL Project.
Specialized application accelerators are being developed today. However, they share no common attach point and have no common architecture or programming model. A framework that economically and efficiently enables specialized acceleration is highly desirable.
In this work we propose a generic interface concept called "ACCFS" for integrating application accelerators into Linux-based platforms. The idea is to extend the programming model chosen by the Linux for Cell/B.E. team. On the Cell/B.E., multiple independent vector processors called Synergistic Processing Units (SPUs) are built around a 64-bit PowerPC core (PPE). The programming model creates a virtual file system (VFS), called "SPUFS", that exports the functionality of the SPUs to user space via a file interface. In contrast to other solutions, such as using character devices or introducing a new process space, the VFS interface uses common file system calls and provides economic and efficient access to the accelerator units, the SPUs.
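The file-based programming model described above can be illustrated with a minimal Python sketch. It assumes that a context is visible to the application as a directory of files and uses a plain temporary directory as a stand-in, so nothing here touches real hardware or a real SPUFS mount. The file names `mem` and `wbox` mirror names used by SPUFS; the helper functions (`make_fake_context`, `load_program`, `read_memory`, `post_mailbox`) are hypothetical and exist only for this illustration.

```python
import os
import struct
import tempfile


def make_fake_context(root, name, mem_size=256):
    """Create a stand-in context directory made of regular files.

    Assumption: on a real system the kernel driver would provide this
    directory; here plain files only mimic the layout.
    """
    ctx = os.path.join(root, name)
    os.mkdir(ctx)
    with open(os.path.join(ctx, "mem"), "wb") as f:
        f.write(bytes(mem_size))  # zero-filled stand-in for accelerator memory
    open(os.path.join(ctx, "wbox"), "wb").close()  # empty stand-in mailbox
    return ctx


def load_program(ctx, image, offset=0):
    """Copy a program image into the context memory with ordinary file calls."""
    with open(os.path.join(ctx, "mem"), "r+b") as f:
        f.seek(offset)
        f.write(image)


def read_memory(ctx, offset, length):
    """Read back a region of the context memory."""
    with open(os.path.join(ctx, "mem"), "rb") as f:
        f.seek(offset)
        return f.read(length)


def post_mailbox(ctx, value):
    """Append one 32-bit big-endian message to the stand-in mailbox file."""
    with open(os.path.join(ctx, "wbox"), "ab") as f:
        f.write(struct.pack(">I", value))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        ctx = make_fake_context(root, "ctx0")
        load_program(ctx, b"\x01\x02\x03\x04", offset=16)
        print(read_memory(ctx, 16, 4))
        post_mailbox(ctx, 0x11223344)
```

The point of the sketch is that the application needs nothing beyond open, seek, read and write; no accelerator-specific system calls or device nodes appear in the user code.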
[Figure 1: Extending the SPUFS* concept to ACCFS**. Panels: starting point: SPUFS concept (Cell/B.E.: PPE and SPUs); intermediate generalization step: RSPUFS concept (CPU: case study Opteron, with SPUs); target: ACCFS concept (CPU with ACCelerators, e.g. FPGA). Each panel shows a user level (application; libspe* or libacc**), a kernel level (system call interface; libfs; proc, ext2, spufs* or accfs**; device handlers; character / network / block device drivers), and a hardware level.]
We set out to check whether this approach can be adopted for a more generic coupling of a CPU and accelerators, and chose a step-by-step port. In a first step we replaced the PPE with a commodity mainstream CPU, in this case an AMD Opteron. A further step will be to substitute the SPUs with other specialized accelerators such as FPGAs. Figure 1 illustrates the stepwise generalization, starting with the SPUFS concept, followed by an intermediate step ("RSPUFS") that finally leads to the ACCFS concept.

The SPUFS concept virtualizes the SPUs. To this end, SPUFS manages a context for each virtual SPU, holding all the data necessary to suspend and resume SPU program execution. A virtual file system interface is provided for manipulating the context and the physical SPU it may be bound to. A scheduler inside SPUFS binds the contexts to physical SPUs. All this work is done by the "spufs" kernel driver on the PPE.

The first step towards a common accelerator file system was to prove that the PPE can be replaced by another processor; we chose an AMD Opteron. As no hardware solution was available that couples an Opteron processor directly with SPUs, we use an Ethernet-based coupling as an intermediate step. In conjunction with our Remote SPUFS (RSPUFS) implementation, a direct coupling can be emulated. RSPUFS consists of two parts. The first is a Linux kernel driver named "rspufs". It runs on the Opteron side and provides the same VFS interface as SPUFS. The second is a proxy-like daemon named "rspufsd", executed as a user-space program on the Cell. This daemon translates the network requests from rspufs into the appropriate SPUFS virtual file system calls. We could show that problems such as byte ordering (endianness) and direct memory access (even if emulated) can be handled without modifying the SPUFS concept.
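The driver/daemon split and the byte-ordering issue can be sketched in a few lines of Python. This is not the RSPUFS protocol: the wire header (`HEADER`), the opcodes, and the names `encode_request`, `words_to_wire` and `ProxyDaemon` are assumptions made for illustration, and a `bytearray` stands in for the SPUFS files the real daemon would access. The sketch only shows the general technique of serializing requests in a fixed (big-endian) byte order so that a little-endian host and a big-endian Cell agree on their meaning.

```python
import struct

# Hypothetical wire header: opcode (1 byte), offset and length (4 bytes each).
# "!" selects network byte order (big-endian) without padding.
HEADER = struct.Struct("!BII")
OP_READ, OP_WRITE = 1, 2


def encode_request(op, offset, length, payload=b""):
    """Driver side: serialize a request independently of host byte order."""
    return HEADER.pack(op, offset, length) + payload


def words_to_wire(words):
    """Convert host integers to 32-bit big-endian words, the Cell's byte order."""
    return struct.pack(">%dI" % len(words), *words)


class ProxyDaemon:
    """Daemon side: decodes requests and applies them to a stand-in memory."""

    def __init__(self, mem_size):
        # Assumption: a bytearray replaces the file the real daemon would use.
        self.mem = bytearray(mem_size)

    def handle(self, message):
        op, offset, length = HEADER.unpack_from(message)
        if offset + length > len(self.mem):
            raise ValueError("request outside of memory")
        if op == OP_WRITE:
            payload = message[HEADER.size:]
            if len(payload) != length:
                raise ValueError("payload length mismatch")
            self.mem[offset:offset + length] = payload
            return b""
        if op == OP_READ:
            return bytes(self.mem[offset:offset + length])
        raise ValueError("unknown opcode %d" % op)


if __name__ == "__main__":
    daemon = ProxyDaemon(64)
    data = words_to_wire([1, 0x01020304])
    daemon.handle(encode_request(OP_WRITE, 8, len(data), data))
    print(daemon.handle(encode_request(OP_READ, 8, len(data))).hex())
```

Because all multi-byte fields are converted at the sender, the daemon never needs to know the byte order of the host that issued the request.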
Furthermore, the RSPUFS implementation shows a way to integrate accelerators other than SPUs by splitting the functionality into two parts: one abstracting the user interface (rspufs) and one integrating the acceleration hardware (rspufsd). We propose exactly this structure for the ACCFS concept. The upper half of ACCFS provides the virtual file system interface, and the bottom half manages the vendor-specific parts of the accelerator components. The interface between them has to support a wide variety of accelerators such as FPGAs, GPUs or DSPs. It also has to provide a basic function set comprising at least functions for manipulating memory, the register file and DMA units, and for signaling and mailboxing. We are currently working on an enhanced specification of this interface in the context of various accelerator integrations, using tight coupling via HyperTransport Technology (Torrenza) as well as direct system bus connections. In this paper we describe SPUFS and the current RSPUFS implementation. Finally, we give an outlook on ACCFS.
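The proposed two-half structure can be sketched as an abstract interface plus a frontend that dispatches file-like accesses to it. The following Python sketch is an assumption-laden illustration, not the ACCFS specification: the class names (`AcceleratorBackend`, `DummyBackend`, `AccFrontend`), the method set, and the file names `mem`, `mbox` and `signal` are hypothetical, chosen only to cover the function groups named above (memory, register file, DMA, signaling, mailboxing).

```python
from abc import ABC, abstractmethod
from collections import deque


class AcceleratorBackend(ABC):
    """Lower half: vendor-specific operations (hypothetical method set)."""

    @abstractmethod
    def read_mem(self, offset, length): ...

    @abstractmethod
    def write_mem(self, offset, data): ...

    @abstractmethod
    def read_reg(self, index): ...

    @abstractmethod
    def write_reg(self, index, value): ...

    @abstractmethod
    def dma_to_device(self, host_buffer, offset): ...

    @abstractmethod
    def signal(self, value): ...

    @abstractmethod
    def mailbox_put(self, value): ...

    @abstractmethod
    def mailbox_get(self): ...


class DummyBackend(AcceleratorBackend):
    """In-memory stand-in for a real accelerator, for illustration only."""

    def __init__(self, mem_size=64, num_regs=4):
        self.mem = bytearray(mem_size)
        self.regs = [0] * num_regs
        self.signals = []
        self.mailbox = deque()

    def read_mem(self, offset, length):
        return bytes(self.mem[offset:offset + length])

    def write_mem(self, offset, data):
        self.mem[offset:offset + len(data)] = data

    def read_reg(self, index):
        return self.regs[index]

    def write_reg(self, index, value):
        self.regs[index] = value

    def dma_to_device(self, host_buffer, offset):
        self.write_mem(offset, bytes(host_buffer))  # emulated DMA: plain copy

    def signal(self, value):
        self.signals.append(value)

    def mailbox_put(self, value):
        self.mailbox.append(value)

    def mailbox_get(self):
        return self.mailbox.popleft() if self.mailbox else None


class AccFrontend:
    """Upper half: maps file-like names onto backend calls."""

    def __init__(self, backend):
        self.backend = backend

    def write(self, name, data, offset=0):
        if name == "mem":
            self.backend.write_mem(offset, data)
        elif name == "mbox":
            self.backend.mailbox_put(int.from_bytes(data, "big"))
        elif name == "signal":
            self.backend.signal(int.from_bytes(data, "big"))
        else:
            raise FileNotFoundError(name)

    def read(self, name, length, offset=0):
        if name == "mem":
            return self.backend.read_mem(offset, length)
        raise FileNotFoundError(name)
```

In this arrangement, supporting a new accelerator type would mean writing another `AcceleratorBackend` subclass, while the frontend and the applications using it stay unchanged.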