Parallel Programming with Cilk and Array Notation using the Intel Compiler and Autotuning

Slides Available

Presenters:

Chi-Keung Luk (Intel Software Pathfinding Team), Peng Tu (Intel Compiler Team), William Leiserson (Intel Cilk Team)

Objectives:

The latest release of the Intel Compiler (ICC v12) supports two new extensions to C/C++ for parallel programming: Cilk and Array Notation, collectively known as Intel® Cilk™ Plus. Cilk provides a straightforward way to convert a sequential program into a multithreaded program, thereby exploiting the thread-level parallelism available on multicore machines. Array Notation, in turn, is an expressive way to vectorize a computational kernel and so exploit the SIMD parallelism within each core. This tutorial serves as an introduction to Cilk and Array Notation. It also advocates a programming methodology based on cache-oblivious techniques that uses Cilk and Array Notation together to achieve high performance on multicore processors. In addition, we will introduce a newly developed autotuning tool that provides further performance improvement.
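To give a flavor of the two extensions, here is a minimal sketch; it is not taken from the tutorial material. The function names `fib` and `vector_add` are hypothetical. The sketch assumes that a Cilk-enabled compiler defines the `__cilk` macro and provides `<cilk/cilk.h>`; on any other compiler the two keywords are defined away, so the code falls back to the ordinary sequential program. The Array Notation form appears only in a comment, because it cannot be emulated with macros.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__cilk)
#include <cilk/cilk.h>   // assumption: provides cilk_spawn / cilk_sync
#else
// Fallback for compilers without Cilk support: erasing the keywords
// leaves a valid sequential program with the same result.
#define cilk_spawn
#define cilk_sync
#endif

// Recursive Fibonacci. The spawned call may run in parallel with the
// second call; cilk_sync waits for it before x is read.
std::uint64_t fib(unsigned n) {
    assert(n <= 93);  // fib(94) no longer fits in 64 bits
    if (n < 2) return n;
    std::uint64_t x = cilk_spawn fib(n - 1);
    std::uint64_t y = fib(n - 2);
    cilk_sync;
    return x + y;
}

// Element-wise addition written as a plain loop. With Array Notation
// the loop body would be the single statement
//     c[0:n] = a[0:n] + b[0:n];
// which tells the compiler that the n additions are independent.
void vector_add(double* c, const double* a, const double* b, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        c[i] = a[i] + b[i];
    }
}
```

Calling `fib(20)` returns 6765 whether or not the keywords are active, which illustrates the point made above: the Cilk version is the sequential program plus two keywords.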

Length:

Half day

Topics to be covered:

  1. Introduction
    1. Overview of Parallel Hardware
    2. Overview of Existing Software Parallelization Techniques
  2. Cilk
    1. Apply the Cilk keywords to expose parallelism in a sequential application
    2. Use the Parallel Performance Analyzer to optimize the application
    3. Use the Cilkscreen Race Detector to identify race bugs
    4. Apply Cilk hyperobjects to eliminate data races on global variables
  3. Array Notation
    1. Array section syntax extension to C/C++
    2. Parallel operator maps on array expressions and vectorization
    3. Function maps with array arguments and parallelization
    4. Elemental function specification and vectorization
  4. Introduction to our Autotuning Tool
  5. Programming Methodology
    1. Cache-oblivious Techniques
    2. Case Studies: Stencil Computations and Tree Searching
    3. Experimental Evidence

Target audiences & prerequisite knowledge:

C/C++ developers who are interested in writing elegant yet high-performance parallel programs. Prior knowledge of parallel programming is helpful but not required.

Bios of Speakers:

Chi-Keung (CK) Luk is a Senior Staff Engineer in a Software Pathfinding Group at Intel, where he conducts research and advanced development in parallel programming, compilers, and program-analysis tools. Previously, he worked on the Pin dynamic instrumentation system. CK obtained his Ph.D. from the University of Toronto and was a visiting scholar at Carnegie Mellon University. He has over 20 publications and two issued patents, with a few others pending. He received an Intel Achievement Award and a nomination for the ACM Doctoral Dissertation Award.

Peng Tu is a Principal Engineer at Intel Corporation and manages the Intel compiler IA32/Intel64 global optimizer and code generation team. He received his Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign, and his M.S. and B.S. in Computer Science from Shanghai Jiao Tong University. Peng's primary interest is in languages and compilers for parallel processing, and he holds four patents in this area. Previously, he worked on various compiler products at SGI and Tensilica Inc.

William Leiserson has been working on the development of Cilk and Cilk-related technologies for 3 years, joining Intel along with his colleagues from Cilk Arts in 2009. He has coauthored a paper with Charles Leiserson and Yuxiong He on the scalability model used by Cilkview, and at present is a member of the Cilk Runtime and Tools team. He has a B.S. in Computer Science from RPI ('03) and an M.S. in Computer Science from RIT ('07).