ACCFS – Accelerator File System
A case study towards a generalized accelerator interface

Andreas Heinig (1), Wolfgang Rehm (1), Heiko Schick (2)
(1) Chemnitz University of Technology, Germany
(2) IBM Deutschland Entwicklung GmbH, Germany
{heandr,rehm}@cs.tu-chemnitz.de, [email protected]
Introduction

Current Situation
• Different accelerators are available on the market
• They are all integrated into the Linux environment in different ways, e.g.:

  Accelerator    Cell/B.E.    FPGA             (GP)GPU (Tesla)
  API            SPUFS        OpenFPGA, AAL    CUDA
  Integration    VFS          char. device     char. device

⇒ No common attach point inside the kernel is available
• This is disadvantageous for application and library programmers
⇒ Every interface has a different syntax and semantics
Basic Idea
[Figure: the three steps of the generalization – (i) starting point: the SPUFS concept on the Cell/B.E. (PPE with its SPEs); (ii) intermediate generalization step: the RSPUFS concept, a CPU (case study: Opteron) coupled to a Cell/B.E. over a dedicated link; (iii) target: the ACCFS concept, a CPU with arbitrary accelerators (ACC, e.g. FPGA)]
• The idea is to extend the programming model chosen for integrating the Cell processor into the Linux environment
• On the Cell/B.E., multiple independent vector processors called Synergistic Processing Units (SPUs) are built around a 64-bit PowerPC core (PPE)
• The programming model is to create a virtual file system (VFS) that exports the functionality of the SPUs to user space via a file system interface
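A minimal sketch of this file-system view is shown below; it assumes an SPE context has already been created at /spu/myctx (e.g. by libspe) and uses the "mem" entry, which exposes the local store (see the SPUFS concepts further down):

/* Sketch: accessing an SPU context purely through the file system.
 * Assumption: the context directory /spu/myctx already exists;
 * "mem" exposes the 256 KiB local store as an ordinary file. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    unsigned char buf[256];
    int fd = open("/spu/myctx/mem", O_RDONLY);   /* local-store memory */
    if (fd < 0) { perror("open"); return EXIT_FAILURE; }

    /* Read the first 256 bytes of the local store like any other file. */
    ssize_t n = read(fd, buf, sizeof(buf));
    if (n < 0) { perror("read"); close(fd); return EXIT_FAILURE; }

    printf("read %zd bytes from the SPU local store\n", n);
    close(fd);
    return EXIT_SUCCESS;
}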
Cell Broadband Engine Architecture
[Figure: Cell/B.E. block diagram – the PPE (PPU with PXU, L1 and L2 cache) and eight SPEs (each an SPU with SXU and local store (LS) plus an MFC) attached to the on-chip coherent bus (EIB), together with the memory controller (dual Rambus XDR) and the bus interface controller (Rambus FlexIO)]
PPE - Power Processor Element
• Power Architecture core with AltiVec unit
• 32 KiB Level-1 cache and 512 KiB Level-2 cache
⇒ Executes the operating system
SPE - Synergistic Processing Element
• RISC processor with 128-bit SIMD organization
• 256 KiB instruction and data memory (local store)
• The execution unit can only operate on the local store
• DMA has to be used to access other addresses
⇒ Accelerator units
SPUFS - Concepts
• SPUFS = Synergistic Processing Unit File System
• Virtual File System (VFS), mounted on "/spu" by convention
• Integrates the SPUs into the Linux environment

[Figure: SPUFS software stack – application → libspe → system call interface → libfs (spufs, ext2, proc) and the character/network/block device drivers → Cell/B.E. hardware]

1. Virtualization of the SPE
• Accelerators are mostly only usable exclusively
⇒ The system can deadlock if several applications need a large number of SPEs
⇒ The "physical SPE" is therefore abstracted into an "SPE context"

2. VFS context access
⇒ The VFS uses the well-known system calls (open, close, read, ...)
• Only two new system calls:
  – sys_spu_create → creates an SPE context
  – sys_spu_run → starts the execution of the SPU code

VFS context entries:
  File    Description
  mbox    SPU to CPU mailbox
  ibox    SPU to CPU interrupt mailbox
  wbox    CPU to SPU mailbox
  mem     local-store memory
  regs    register file
  ...
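For illustration, the sketch below drives an SPE through the libspe2 API, which wraps sys_spu_create and sys_spu_run; the program name "spu_program.elf" is a placeholder for a compiled SPU executable:

/* Sketch: running SPU code through libspe2 (which wraps
 * sys_spu_create and sys_spu_run underneath). */
#include <libspe2.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    spe_context_ptr_t ctx = spe_context_create(0, NULL);       /* sys_spu_create */
    if (!ctx) { perror("spe_context_create"); return EXIT_FAILURE; }

    spe_program_handle_t *prog = spe_image_open("spu_program.elf");
    if (!prog || spe_program_load(ctx, prog)) {
        fprintf(stderr, "loading the SPU program failed\n");
        return EXIT_FAILURE;
    }

    unsigned int entry = SPE_DEFAULT_ENTRY;
    spe_stop_info_t stop;
    /* Blocks until the SPU program stops (sys_spu_run underneath). */
    if (spe_context_run(ctx, &entry, 0, NULL, NULL, &stop) < 0)
        perror("spe_context_run");

    spe_image_close(prog);
    spe_context_destroy(ctx);
    return EXIT_SUCCESS;
}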
RSPUFS concept
• RSPUFS = Remote Synergistic Processing Unit File System
• Proof-of-concept integration of the SPEs into the Opteron
• Cell and Opteron are connected through Ethernet (TCP/IP)
[Figure: RSPUFS setup – on the Opteron: application → libspe → system call interface → rspufs → network stack; connected over a dedicated link (Ethernet, TCP/IP) to the Cell/B.E., where rspufsd forwards the requests to spufs and the SPEs]
Challenges
1. Different byte order
⇒ The Opteron kernel swaps the bytes before sending and after receiving
⇒ The application has to swap the data by itself
2. No RDMA-capable interconnection
• Accessing the memory of the Cell is not possible in hardware
• This functionality is necessary to support assisted callbacks, the direct mapping of the local store and the XDR access
⇒ RSPUFS has to simulate the DMA
⇒ Extension of the VFS context with a new "xdr" interface
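A small sketch of the byte-order consequence: user code must convert multi-byte values itself when reading through the "xdr" entry. The context path and the exact layout of the "xdr" file are assumptions of this example:

/* Sketch: reading a 32-bit word of the Cell's XDR memory via the "xdr"
 * context entry and converting it from the Cell's big-endian byte order
 * to the Opteron's little-endian order. The path below is assumed. */
#include <endian.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/spu/myctx/xdr", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    uint32_t word_be;
    if (pread(fd, &word_be, sizeof(word_be), 0) != sizeof(word_be)) {
        perror("pread");
        close(fd);
        return 1;
    }

    uint32_t word = be32toh(word_be);   /* the application swaps the data itself */
    printf("value at XDR offset 0: 0x%08x\n", word);
    close(fd);
    return 0;
}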
ACCelerator File System (ACCFS)
• ACCFS = ACCelerator File System
• Virtual File System (VFS), mounted on "/acc" by convention
• Proposal for integrating different kinds of accelerators into the Linux environment

[Figure: extending the SPUFS concept to ACCFS – the SPUFS software stack (application → libspe → system call interface → libfs with spufs, ext2, proc → Cell/B.E.) generalized to the ACCFS stack (application → libacc → system call interface → libfs with accfs, ext2, proc → device handlers → accelerators)]

ACCFS Concepts
1. Virtualization of the accelerator
2. Virtual File System (VFS) context access
3. Separation of the functionalities:
   • Top half: "accfs"
     – VFS implementation
     – Provides the user/VFS interface (system calls)
     – Provides the vendor interface
   • Bottom half: "device handlers"
     – Vendor-specific part
     – Integrates the accelerator
ACCFS Concepts (continued)

Top Half – User Interface
Tasks:
• Handle the VFS
• Provide the interfaces

User Interface:
• Two new system calls:
  – sys_acc_create → creates a new ACCFS context, reflected as a new folder
  – sys_acc_run → starts the accelerator
• VFS context entries:
  File         Description
  regs         register file
  message      message interface
  memory/      exported memories
  semaphore/   semaphores
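As a sketch of how an application might use such a context through its VFS entries; the path /acc/ctx0, the assumption that the context was already created via sys_acc_create, and the exact file semantics are illustrative only:

/* Sketch: interacting with an (assumed, already created) ACCFS context
 * through its VFS entries. Paths and semantics are illustrative. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Send a command word to the accelerator through the message interface. */
    int msg = open("/acc/ctx0/message", O_WRONLY);
    if (msg >= 0) {
        uint32_t cmd = 0x1;                 /* hypothetical command word */
        write(msg, &cmd, sizeof(cmd));
        close(msg);
    }

    /* Inspect the accelerator's register file. */
    int regs = open("/acc/ctx0/regs", O_RDONLY);
    if (regs >= 0) {
        uint32_t r0;
        if (read(regs, &r0, sizeof(r0)) == sizeof(r0))
            printf("first register word: 0x%08x\n", r0);
        close(regs);
    }
    return 0;
}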
Bottom Half – Vendor Interface
Tasks:
• Managing the accelerator
  – Establish the interconnection
  – Initialize the hardware
  – Configure memory mappings
  – ...
• Provide the virtualization

Interface (between top half and bottom half):
• accfs_register (const struct accfs_vendor *)
  ⇒ Loading (registration) of a device handler
• accfs_unregister (const struct accfs_vendor *)
  ⇒ Unloading of the device handler
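To illustrate the split, a device handler might register itself roughly as sketched below; the header name, the fields of struct accfs_vendor and the callback signature are not specified here and are purely assumptions:

/* Sketch of a bottom-half device handler registering with accfs.
 * Only accfs_register()/accfs_unregister() and the name
 * "struct accfs_vendor" come from the proposal; the header,
 * the fields and the callback are assumed for illustration. */
#include <linux/init.h>
#include <linux/module.h>
#include <linux/accfs.h>            /* assumed header exporting the vendor interface */

static int fpga_setup(void)         /* hypothetical callback: initialize the hardware */
{
    /* establish the interconnection, configure memory mappings, ... */
    return 0;
}

static const struct accfs_vendor fpga_vendor = {
    .name  = "example-fpga",        /* hypothetical field */
    .setup = fpga_setup,            /* hypothetical field */
};

static int __init fpga_handler_init(void)
{
    return accfs_register(&fpga_vendor);    /* loading (registration) of the handler */
}

static void __exit fpga_handler_exit(void)
{
    accfs_unregister(&fpga_vendor);         /* unloading of the handler */
}

module_init(fpga_handler_init);
module_exit(fpga_handler_exit);
MODULE_LICENSE("GPL");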
ACCFS Benefits
1. Device handlers only have to concentrate on the hardware integration
   • No management of operating-system structures
   • No need to provide a whole user interface
2. Eases the development of libraries
   • Well-known interface ⇒ no non-standard ioctls differing from one accelerator to another
   • Better usability of the accelerator ⇒ always the same usage "protocol":
     1. Create the context
     2. Upload the code
     3. Execute the context
     4. Destroy the context
3. Accelerators become easier to exchange
ACCFS: Further Work
• Finish the interface implementation of ACCFS
• Port SPUFS to an ACCFS device handler for the SPEs
• Implement device handlers for the first accelerators other than Cell
This research is supported by the Center for Advanced Studies (CAS) of the IBM Böblingen Laboratory as part of the NICOLL Project.