Parallel Programming with Cilk and Array Notation using the Intel Compiler and Autotuning
Chi-Keung Luk (Intel Software Pathfinding Team), Peng Tu (Intel Compiler Team), William Leiserson (Intel Cilk Team)
The latest release of the Intel Compiler (ICC v12) supports two new extensions
to C/C++ for parallel programming: Cilk and Array Notation, collectively known as Intel® Cilk™ Plus.
Cilk provides a straightforward way to convert a sequential program into a multithreaded program,
thereby exploiting the thread-level parallelism available on multicore machines. On the other hand,
Array Notation is an expressive method to vectorize a computational kernel in order to exploit the
SIMD parallelism within each core. This tutorial serves as an introduction to Cilk and Array Notation.
It also advocates a programming methodology based on cache-oblivious techniques that uses Cilk and
Array Notation together to achieve high performance on multicores.
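The methodology can be illustrated with a minimal sketch. It assumes a cache-oblivious out-of-place matrix transpose as the kernel, which is a standard divide-and-conquer example and not one taken from this tutorial. The code is plain C++, which is what a Cilk program reduces to when its keywords are removed (its serial elision). The comments mark where `cilk_spawn` and `cilk_sync` would go. The cutoff `kLeaf` and all function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical base-case size. Below it, recursion stops and a plain loop
// runs. Correctness does not depend on the cache size; this cutoff only
// limits recursion overhead.
constexpr std::size_t kLeaf = 16;

// Transposes the block rows [r0, r1) x columns [c0, c1) of the row-major
// n_rows x n_cols matrix A into the row-major n_cols x n_rows matrix B.
void transpose_rec(const double* A, double* B,
                   std::size_t n_rows, std::size_t n_cols,
                   std::size_t r0, std::size_t r1,
                   std::size_t c0, std::size_t c1) {
    assert(r0 <= r1 && r1 <= n_rows && c0 <= c1 && c1 <= n_cols);
    const std::size_t dr = r1 - r0, dc = c1 - c0;
    if (dr <= kLeaf && dc <= kLeaf) {
        // Base case: the block is small enough to stay in cache.
        for (std::size_t i = r0; i < r1; ++i)
            for (std::size_t j = c0; j < c1; ++j)
                B[j * n_rows + i] = A[i * n_cols + j];
        return;
    }
    // Split the longer dimension in half. The two halves write disjoint
    // parts of B, so with Cilk the first call could be spawned and the
    // second run by the parent, followed by a sync.
    if (dr >= dc) {
        const std::size_t rm = r0 + dr / 2;
        /* cilk_spawn */ transpose_rec(A, B, n_rows, n_cols, r0, rm, c0, c1);
        transpose_rec(A, B, n_rows, n_cols, rm, r1, c0, c1);
        /* cilk_sync; */
    } else {
        const std::size_t cm = c0 + dc / 2;
        /* cilk_spawn */ transpose_rec(A, B, n_rows, n_cols, r0, r1, c0, cm);
        transpose_rec(A, B, n_rows, n_cols, r0, r1, cm, c1);
        /* cilk_sync; */
    }
}

// Convenience wrapper: returns the transpose of a row-major matrix.
std::vector<double> transpose(const std::vector<double>& A,
                              std::size_t n_rows, std::size_t n_cols) {
    assert(A.size() == n_rows * n_cols);
    std::vector<double> B(n_rows * n_cols);
    transpose_rec(A.data(), B.data(), n_rows, n_cols, 0, n_rows, 0, n_cols);
    return B;
}
```

Each recursive call halves the larger dimension. At some level of the recursion, the blocks therefore fit in any cache, and the code never needs to know the cache size. The same recursion tree also supplies the parallelism that Cilk's work-stealing scheduler can exploit.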
In addition, we will introduce a newly developed autotuning tool that further improves performance.
Length: Half day
Topics to be covered:
- Overview of Parallel Hardware
  - Multicores, accelerators such as GPUs and Larrabee
- Overview of Existing Software Parallelization Techniques
  - Threading: Pthreads, OpenMP, TBB, Cilk
  - Simdization: manual, automatic, pragma-based, Array Notation
- Introduction to Cilk
  - Apply the Cilk keywords to expose parallelism in a sequential application
  - Use the Parallel Performance Analyzer to optimize the application
  - Use the Cilkscreen Race Detector to identify race bugs
  - Apply Cilk hyperobjects to eliminate data races on global variables
- Introduction to our Autotuning Tool
- Introduction to Array Notation
  - Array section syntax extension to C/C++
  - Parallel operator maps on array expressions and vectorization
  - Function maps with array arguments and parallelization
  - Elemental function specification and vectorization
- Cache-oblivious Techniques
- Case Studies: Stencil Computations and Tree Searching
- Experimental Evidence
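The following sketch makes the Array Notation topics above concrete. It spells out, as ordinary C++ loops, the serial meaning of a few Cilk Plus array expressions, which are shown in the comments. It assumes that the sections on the two sides of an assignment do not partially overlap. The helper and kernel names are hypothetical. The compiler is free to vectorize the real array expressions rather than execute these loops in order.

```cpp
#include <cassert>
#include <cstddef>

// Array Notation:  c[0:n] = a[0:n] + b[0:n];
// Serial meaning: an element-wise operator map over the section.
void add_sections(const double* a, const double* b, double* c, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) c[i] = a[i] + b[i];
}

// Array Notation:  dst[0:len] = src[start:len:stride];
// A section is written base[start:length:stride], not [lower:upper].
void gather_strided(const double* src, std::size_t start, std::size_t len,
                    std::size_t stride, double* dst) {
    assert(stride >= 1);  // this sketch only handles positive strides
    for (std::size_t k = 0; k < len; ++k) dst[k] = src[start + k * stride];
}

// Hypothetical scalar kernel. Called with array-section arguments, it is
// applied once per element (a function map); declared as an elemental
// function, the compiler can generate a SIMD version of it.
double axpy_elem(double alpha, double x, double y) { return alpha * x + y; }

// Array Notation:  y[0:n] = axpy_elem(alpha, x[0:n], y[0:n]);
// The scalar alpha is broadcast to every element.
void map_axpy(double alpha, const double* x, double* y, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) y[i] = axpy_elem(alpha, x[i], y[i]);
}

// Array Notation reduction:  s = __sec_reduce_add(a[0:n]);
double reduce_add(const double* a, std::size_t n) {
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i) s += a[i];
    return s;
}
```

A built-in reduction need not follow the left-to-right order of this loop. As a result, floating-point sums may differ from the serial version in the last bits.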
Target audience and prerequisite knowledge:
C/C++ developers who are interested in writing elegant yet high-performance parallel programs. Prior knowledge of parallel programming is helpful but not required.
Bios of Speakers:
Chi-Keung (CK) Luk is a Senior Staff Engineer in the Software Pathfinding Group at Intel, where he conducts research and advanced development in parallel programming, compilers, and program-analysis tools. Previously, he worked on the Pin dynamic instrumentation system. CK obtained his Ph.D. from the University of Toronto and was a visiting scholar at Carnegie Mellon University. He has over 20 publications and two issued patents, with a few others pending. He received an Intel Achievement Award and a nomination for the ACM Doctoral Dissertation Award.
Peng Tu is a Principal Engineer at Intel Corporation and manages the Intel compiler IA32/Intel64 global optimizer and code generation team. He received his Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign, and his M.S. and B.S. in Computer Science from Shanghai Jiao Tong University. Peng's primary interest is in languages and compilers for parallel processing, an area in which he holds four patents. Previously, he worked on various compiler products at SGI and Tensilica Inc.
William Leiserson has been working on the development of Cilk and Cilk-related technologies for 3 years, joining Intel along with his colleagues from Cilk Arts in 2009. He has coauthored a paper with Charles Leiserson and Yuxiong He on the scalability model used by Cilkview, and at present is a member of the Cilk Runtime and Tools team. He has a B.S. in Computer Science from RPI ('03) and an M.S. in Computer Science from RIT ('07).