Machine Learning for Compilers and Architecture

Fall 2021

Compilers are the workhorses that bridge the gap between human-readable code and the machine-executable instructions that run on a given hardware platform. They gradually lower code written in high-level programming languages to hardware assembly instructions while performing a wide array of optimizations, including loop transformations, parallelization, and vectorization. In doing so, compilers routinely face mutually exclusive optimization choices with differing profitability characteristics. Traditionally, compilers have used heuristic algorithms based on simple profitability metrics to make such decisions. However, computing platforms and workloads have recently become more complex and diverse, making optimization decision making inside compilers increasingly challenging. At the same time, designing next-generation computing architectures that target newer workloads such as deep neural networks has also become a challenging task.
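To make this kind of heuristic decision concrete, here is a minimal, purely illustrative sketch of how a compiler might pick a loop unroll factor from a simple profitability metric. The function name, metric, and thresholds are invented for illustration and are not taken from any real compiler.

```python
# Hypothetical illustration of a heuristic compiler decision: choose a loop
# unroll factor using a simple profitability metric. Real compilers weigh many
# interacting metrics; the budget and candidate factors here are invented.

def pick_unroll_factor(trip_count: int, body_insts: int,
                       icache_budget: int = 64) -> int:
    """Pick the largest unroll factor that evenly divides the trip count
    and whose expanded loop body still fits a fixed instruction budget."""
    best = 1
    for factor in (2, 4, 8):
        fits = body_insts * factor <= icache_budget   # code-size constraint
        divides = trip_count % factor == 0            # no remainder loop needed
        if fits and divides:
            best = factor
    return best

# A 16-iteration loop with a 10-instruction body: unrolling by 8 would blow
# the budget (80 > 64), so this heuristic settles on a factor of 4.
print(pick_unroll_factor(16, 10))
```

Even this toy heuristic shows the tension the paragraph above describes: more unrolling exposes more instruction-level parallelism but grows code size, and a fixed threshold cannot be right for every workload and platform.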

In this course, we will explore how machine learning is helping both compiler engineers and computer architects develop newer and more scalable design techniques that better adapt to emerging workloads and computing needs. On the compiler side, we will discuss new trends in compiler auto-tuning, domain-specific optimizations, and the construction of data-driven cost models. We will also discuss how computer architects are using machine learning to design state-of-the-art hardware platforms, covering topics such as domain-specific architecture design, scalable design space exploration techniques, and data-driven simulation. After completing the course, you should be able to appreciate the new trend of using data-driven techniques in both compiler and architecture design and should be prepared to do your own research in these areas.

New! We have started using Zoom to accommodate remote students.

 1302 Siebel Center

 Charith Mendis
 Assistant Professor
 Computer Science, UIUC
 4118 Siebel Center

 Office Hours:
 Tuesday 1-2pm
 4118 Siebel Center



  • 09/16: Project details are out.
  • 09/06: You can find useful resources including tutorials in the resources section.
  • 09/06: Piazza online discussion forum is up! Please use this forum to have discussions about papers we discuss in the class. We have also started using Zoom to accommodate remote students.
  • 09/06: Paper review website is up!
  • 08/27: Reading list is up! Please provide at least 5 paper selections by September 7th using this form.
  • 08/24: First day of classes. Please fill out the Class Statistics Survey.
  • 08/16: Class Website is up!



The class is conducted in a seminar format. The first set of classes will mainly be lectures aimed at introducing core concepts and techniques that will be useful in understanding state-of-the-art research in this area. After that, the class will transition into reading papers, paper presentations, online discussions, and a final class project. For Fall 2021, we are planning on meeting in person. However, if circumstances change, we may use Zoom and go fully virtual. Submission links and other infrastructure will be announced in subsequent lectures and posted here in advance.

The following is the tentative grading rubric we will use.

Activity Grade Details
Paper Reviews and Discussion 25%
  • A short summary and review of the paper (250-750 words). Use the hotCRP website to submit your reviews.
  • Submission deadlines are midnight on Sundays (for Tuesday papers) and midnight on Tuesdays (for Thursday papers).
  • Participate in paper discussions in the class.
  • (Optional) Discuss the paper in the piazza online forum.
  • Only your top 15 paper reviews will count toward your grade; the 4 lowest-scoring reviews will be dropped. This essentially means that you can skip 4 paper reviews without any penalty.
Presentation and Discussion Lead 25%
  • Select at least five papers you would like to present by September 7th using this form. You will be assigned a paper after all selections are finalized by September 8th.
  • Schedule a meeting with the instructor a week before your presentation slot.
  • Submit the final version of the slides using this form. Slides are due at the same time as the paper reviews for that paper. The presentation should be 15-20 minutes long.
  • You are required to lead the in-class discussion of the paper for the first 10 minutes after the presentation, mainly by answering questions from the class.
  • Prepare a 250-750 word summary of the discussions and submit it by editing your entry using the same form.
Project 50%
  • Groups of 2 students will propose and complete a project by the end of the semester. Please refer to project details to find project types.
  • Proposal (1 page): submit by October 7th. Each group should meet with the instructor after the proposal is submitted (October 9th-16th) to discuss the plan.
  • Presentation (7 minutes + 3min Q&A): on December 7th
  • Report (5 pages): submit by December 7th. Please use ISCA style format.
  • Form for submitting the proposal, presentation and report can be found here. If your project type is replicating experiments you are also required to submit a link to your artifact.


Tentative Schedule  

We will first discuss the core concepts behind compiler construction and computer organization, going into detail about the optimization and design decisions that compiler engineers and computer architects face. Next, we will discuss the core machine learning concepts needed to understand research done in this area, covering black-box optimization, neural networks, and the basics of sequential decision making. We will then use this knowledge to read the latest research papers published in this area, covering compiler auto-tuning, cost models, domain-specific optimizations, design space exploration for domain-specific architectures, data-driven simulation, and more.
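As a taste of the black-box flavor of these techniques, here is a minimal random-search autotuner sketch. The tuning knobs and the synthetic cost function are invented stand-ins for a real compile-and-measure loop; nothing here corresponds to a specific paper or tool covered in the course.

```python
import random

def autotune(measure, space, trials=200, seed=0):
    """Black-box autotuning by random search: sample configurations from the
    space, measure each one, and keep the best. No model of the objective is
    needed -- only the ability to evaluate it."""
    rng = random.Random(seed)
    best_cfg, best_cost = None, float("inf")
    for _ in range(trials):
        cfg = {knob: rng.choice(values) for knob, values in space.items()}
        cost = measure(cfg)
        if cost < best_cost:
            best_cfg, best_cost = cfg, cost
    return best_cfg, best_cost

# Hypothetical tuning knobs; in a real autotuner, `measure` would compile the
# program with these settings and time it. Here a synthetic cost stands in.
space = {"unroll": [1, 2, 4, 8], "vectorize": [False, True], "tile": [16, 32, 64]}

def synthetic_cost(cfg):
    return (abs(cfg["unroll"] - 4)
            + (0 if cfg["vectorize"] else 2)
            + abs(cfg["tile"] - 32) / 16)

best, cost = autotune(synthetic_cost, space)
```

Much of the research we will read replaces this naive sampler with smarter search (genetic algorithms, Bayesian optimization, reinforcement learning) or replaces the expensive `measure` step with a learned cost model.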

Date Topic Presenter Notes

Introduction to compilers, architecture and logistics

Todo: Class Statistics Survey

Quick overview of Compiler Construction + Optimizations

Compiler Optimizations

Anatomy of a Compiler Optimization Pass, DSLs, Domain Specific Optimizations

DSLs + ML in Architecture

Continuation of discussion on DSLs and examples of ML in architecture

Background Reading: A Survey of Machine Learning for Computer Architecture and Systems
Machine Learning Techniques

Quick overview of ML techniques: Neural Networks

Todo: Paper selections
Machine Learning Techniques (Contd.) and Auto-tuning

Quick Overview of ML techniques: Genetic Algorithms, Simulated Annealing, Sequential Decision Making; Introduction to Auto-tuning


Background Reading: 
A Survey on Compiler Autotuning using Machine Learning (ACM CSUR 2018)
A Taxonomy of ML for Systems Problems (IEEE Micro Sept/Oct 2020)

Autotuning: Empirical Autotuning

Main Reading: A Fast Fourier Transform Compiler (PLDI 1999)


Related Reading: 
Automatically Tuned Linear Algebra Software (SC 1998)
Fast Automatic Generation of DSP Algorithms (ICCS 2001)
The Design and Implementation of FFTW3 (IEEE 2005)

Autotuning: Languages for exposing choices

Main Reading: PetaBricks: A Language and Compiler for Algorithmic Choice (PLDI 2009)


Related Reading: 
A framework for adaptive algorithm selection in STAPL (PPoPP 2005)
Halide: A language and compiler for Optimizing Parallelism, Locality, and Recomputation in Image Processing Pipelines (PLDI 2013)

Autotuning: Techniques

Main Reading: Bliss: Auto-tuning Complex Applications using a Pool of Diverse Lightweight Learning Models (PLDI 2021)


Related Reading: 
Learning to Generate Fast Signal Processing Implementations (ICML 2001)
Towards Better Understanding of Black-box Auto-tuning: A Comparative Analysis for Storage Systems (ATC 2018) - A systems paper with a good overview of techniques

Autotuning: Frameworks

Main Reading: OpenTuner: An Extensible Framework for Program Autotuning (PACT 2014)

Presenter: Rafae

Related Reading:
AutoTVM: Learning to Optimize Tensor Programs (NeurIPS 2018)
Autotuning: Scaling Up

Main Reading: GPTune: Multitask Learning for Autotuning Exascale Applications (PPoPP 2021)

Presenter: Archit

Related Reading:
Portable Performance on Heterogeneous Architectures (ASPLOS 2013)
Autotuning: Diverging Workloads

Main Reading: A Pattern Based Algorithmic Autotuner for Graph Processing on GPUs (PPoPP 2019)

Presenter: Garvita

Related Reading:
Autotuning Algorithmic Choice for Input Sensitivity (PLDI 2015)
Guest Lecture: Autotuning in the industry (Google)
Data-driven Cost Models: Part 1

Main Reading: Ithemal: Accurate, Portable and Fast Basic Block Throughput Estimation using Deep Neural Networks (ICML 2019)

Presenter: Ashitabh

Related Reading:
Learning execution through neural code fusion (ICLR 2020)
Data-driven Cost Models: Part 2

Main Reading: A Learned Performance Model for Tensor Processing Units (MLSys 2021)

Presenter: Kartik

Related Reading:
A Deep Learning based cost model for automatic code optimization (MLSys 2021)
Program Embeddings: Part 1

Main Reading: Learning and Evaluating Contextual Embedding of Source Code (ICML 2020)

Presenter: Damitha

Related Reading:
Blended, precise semantic program embeddings (PLDI 2020)
Program Embeddings: Part 2

Main Reading: ProGraML: A Graph-based Program Representation for Data Flow Analysis and Compiler Optimizations (ICML 2021)

Presenter: Jiawei

Related Reading:
IR2Vec: LLVM IR Based Scalable Program Embeddings (TACO 2020)
Learned Optimizations: Traditional Compiler Optimizations 1

Main Reading: Compiler Auto-Vectorization with Imitation Learning (NeurIPS 2019)


Related Reading: 
NeuroVectorizer: End-to-end Vectorization with Deep Reinforcement Learning (CGO 2020)
Meta Optimization: improving compiler heuristics (PLDI 2003)

Learned Optimizations: Traditional Compiler Optimizations 2

Main Reading: End-to-end Deep Learning of Optimization Heuristics (PACT 2017)

Presenter: Stefanos

Related Reading:
AutoPhase: Juggling HLS Phase Orderings in Random Forests with Deep Reinforcement Learning (MLSys 2020)
Learned Optimizations: DSLs Part 1

Main Reading: Learning to Optimize Halide with Tree Search and Random Programs (SIGGRAPH 2019)

Presenter: Hyoungwook

Related Reading:
Ansor: Generating High-Performance Tensor Programs for Deep Learning (OSDI 2020)
Learned Optimizations: DSLs Part 2

Main Reading: Value Learning for Throughput Optimization of Deep Neural Networks (MLSys 2021)

Presenter: Vishnu

Related Reading:
The case for learned index structures (SIGMOD 2018) - databases
Learned Optimizations: Tensor Programs

Main Reading: Device Placement Optimization with Reinforcement Learning (ICML 2017)

Presenter: Yong Zhi

Related Reading:
Transferable Graph Optimizers for ML Compilers (NeurIPS 2020)
Architecture Design Space Exploration: Part 1

Main Reading: Timeloop: A Systematic Approach to DNN Accelerator Evaluation (ISPASS 2019)

Note: Project Clinic
Related Reading: 
Mind Mappings: Enabling Efficient Algorithm-Accelerator Mapping Space Search (ASPLOS 2021)
Architecture Design Space Exploration: Part 2

Main Reading: A graph placement methodology for fast chip design (Nature 2021)

Presenter: Ajay

Related Reading:
A Full-stack Accelerator Search Technique for Vision Applications
Guest Lecture: Chris Cummins from Facebook
Recap + Project Clinic
Learned Architecture Simulation

Main Reading: DiffTune: Optimizing CPU Simulator Parameters with Learned Differentiable Surrogates (MICRO 2020)

Presenter: Jovan

Related Reading:
SimNet: Computer Architecture Simulation using Machine Learning
Learned Systems: Caches

Main Reading: Learning Cache Replacement with CACHEUS (FAST 2021)


Related Reading: 
Learning Memory Access Patterns (ICML 2018)
Applying Deep Learning to the Cache Replacement Problem (MICRO 2019)

Student Presentations


Project Details

You are required to propose and complete a class project by the end of the semester (in groups of 2). The goal is to pursue an idea in a novel research direction. At the end of the semester, you should present your findings (negative results are fine!) and submit a write-up describing the problem specification, methodology, results, and key takeaways. Although you are encouraged to do a project exploring an idea in a novel direction, the following are all the project types you can pursue.

  • Explore a novel research idea (preferred)
  • Survey of an existing research area. The area should be relevant to the course. In general, you should summarize, categorize, and draw high-level conclusions about 20-25 papers that you choose from that area. Note that the survey cannot be just a summary; you need to compare and contrast techniques across all selected papers. Please see this survey on auto-tuning for an example.
  • Replication of results. You can choose around 3-4 experiment heavy papers and replicate their results. You should compare them and make relevant conclusions. You should also submit a link to a publicly available repository with your experiment code.

Please feel free (though it is not required) to set up a time before you submit your proposal to discuss interesting ideas that could become your class project.

  • Proposal: October 7th
  • Meet the instructor: between October 9th-16th to discuss the viability of the proposal; we can increase or reduce scope.
  • Presentation: 7min presentation on December 7th with 3min Q&A
  • Report: 5 page double-column write-up on December 7th; Please use ISCA style format. Assume that you are submitting a workshop paper.

Please use this form to submit your proposal, presentation, report and artifact link (if applicable).


Academic Conferences and Workshops
Textbooks
  • Alfred Aho, Monica Lam, Ravi Sethi and Jeffrey Ullman, "Compilers: Principles, Techniques, and Tools (2nd edition)"
  • Ian Goodfellow, Yoshua Bengio and Aaron Courville, "Deep Learning", MIT Press (website)