The usual approach to implementing portable high-performance kernels is to use vendor-optimized libraries that implement standard interfaces such as MPI, BLAS, LAPACK, or FFTW, and to use standardized annotations such as OpenMP or OpenACC to convey parallelization opportunities. This leaves performance optimization to experts who know the platform and allows application programmers to focus on functionality. Application programmers are free to mix and match libraries, and to use any language for which the libraries have bindings to implement functionality the libraries do not support. We present a new take on this tried-and-true concept:
We interpret a subset of BLAS 1/2/3, FFTW, and OpenMP, together with a subset of C, as a domain-specific language for implementing HPC kernels. We then use a domain-specific compiler to extract a high-level specification in SPIRAL's operator language from a program written in this language, perform high-level optimizations across library calls, and convert loops over library calls into aggregate/batch calls. Finally, we translate the resulting representation into native code for an Intel Haswell CPU, an Intel Xeon Phi coprocessor, and a near-memory accelerator that is part of a 3D logic/DRAM stack that we developed in the context of the DARPA PERFECT program. For the STAP (space-time adaptive processing) kernel of the PERFECT benchmark suite, the library-based source code then runs unmodified across the three test platforms and fully leverages their performance capabilities.
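To make the idea of converting a loop over library calls into one aggregate call concrete, here is a minimal sketch in plain C. It is not SPIRAL output and does not use a real BLAS: `ref_dot` and `ref_gemv` are hypothetical stand-ins for a BLAS-1 dot product and a BLAS-2 matrix-vector product, and their signatures are simplified assumptions. The point is only that a loop of per-row dot products computes the same result as a single matrix-vector call, which is the kind of equivalence such a rewrite could exploit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a BLAS-1 dot product (simplified signature). */
double ref_dot(size_t n, const double *x, const double *y) {
    double acc = 0.0;
    for (size_t i = 0; i < n; ++i) {
        acc += x[i] * y[i];
    }
    return acc;
}

/* Hypothetical stand-in for a BLAS-2 gemv: y = A * x, A stored row-major. */
void ref_gemv(size_t rows, size_t cols, const double *a,
              const double *x, double *y) {
    assert(a != NULL && x != NULL && y != NULL);
    for (size_t r = 0; r < rows; ++r) {
        double acc = 0.0;
        for (size_t c = 0; c < cols; ++c) {
            acc += a[r * cols + c] * x[c];
        }
        y[r] = acc;
    }
}

/* Kernel as an application programmer might write it:
   one BLAS-1 call per matrix row. */
void kernel_as_written(size_t rows, size_t cols, const double *a,
                       const double *x, double *y) {
    for (size_t r = 0; r < rows; ++r) {
        y[r] = ref_dot(cols, &a[r * cols], x);
    }
}

/* The same kernel after the loop of BLAS-1 calls has been recognized
   and replaced by a single aggregate BLAS-2 call. */
void kernel_aggregated(size_t rows, size_t cols, const double *a,
                       const double *x, double *y) {
    ref_gemv(rows, cols, a, x, y);
}

/* Returns 1 if both variants agree on a small 2x3 example, else 0. */
int kernels_agree(void) {
    const double a[6] = {1.0, 2.0, 3.0,
                         4.0, 5.0, 6.0};
    const double x[3] = {1.0, 0.0, -1.0};
    double y_loop[2] = {0.0, 0.0};
    double y_batch[2] = {0.0, 0.0};

    kernel_as_written(2, 3, a, x, y_loop);
    kernel_aggregated(2, 3, a, x, y_batch);

    return y_loop[0] == y_batch[0] && y_loop[1] == y_batch[1];
}
```

In this sketch the two variants accumulate in the same order, so `kernels_agree()` can compare the results exactly. With a real vendor library the aggregate call may reorder floating-point operations, and a comparison would then need a tolerance.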
Franz Franchetti is an Associate Professor with the Department of Electrical and Computer Engineering at Carnegie Mellon University. He received the Dipl.-Ing. (M.Sc.) degree in Technical Mathematics and the Dr. techn. (Ph.D.) degree in Computational Mathematics from the Vienna University of Technology in 2000 and 2003, respectively. In 2006 he was a member of the team that won the Gordon Bell Prize (Peak Performance Award), and in 2010 he was a member of the team that won the HPC Challenge Class II Award (most productive system). Dr. Franchetti's research focuses on automatic performance tuning and program generation for emerging parallel platforms and on algorithm/hardware co-synthesis. He targets multicore CPUs, clusters and high-performance computing (HPC) systems, graphics processors (GPUs), field-programmable gate arrays (FPGAs), FPGA acceleration for CPUs, and logic-in-memory and 3DIC chip design. Within the Spiral effort, his research goal is to enable automatic generation of highly optimized software libraries for important kernel functionality. In other collaborative research threads, Dr. Franchetti is investigating the applicability of domain-specific transformations within standard compilers. He leads two DARPA projects in the HACMS and PERFECT programs and is PI/Co-PI on a number of federal and industry grants.
This talk is organized by the Compilers and Languages Group at the Institute of Computer Languages.
Tea at the library of E185/1, Argentinierstr. 8, 4th floor (central) at 15:30.