Reyhaneh Jabbarvand

 
 
   
             
 

Course Title 

CS598: Machine Learning for Software Engineering

Course Information

Instructor: Reyhaneh Jabbarvand

Email: reyhaneh@illinois.edu

Class Time: Tuesday/Thursday 12:30 pm -- 1:45 pm

Office hours: Friday 8:30 am -- 9:30 am (Office hours will be held on Zoom)

Zoom link (for office hours): https://illinois.zoom.us/j/6923892953?pwd=bE8wMmtTT3QxeGdoWk5MT1BrYVRQdz09

Forum: https://campuswire.com/c/G48C5B61F/ (All the notifications and announcements will be posted on Campuswire)

Course Description and Objectives

This course aims to help students explore and understand how machine learning (ML) can be applied to real-world software engineering (SE) and analysis problems. Students will obtain knowledge about (1) fundamentals and advanced topics in SE and (2) how ML techniques can enhance software analysis to ensure the quality and reliability of software. After finishing this course, students are expected to know how to:

  • Analyze related work in the area of ML for SE
  • Investigate and find appropriate methods to represent the source code, test suite, or relevant program properties for consumption by ML algorithms for a specific problem
  • Generate an appropriate dataset to train the ML models
  • Evaluate the proposed methods on real-world programs and analyze the result
  • Write an academic paper in the area of ML for SE (for PhD students)
There will be a semester-long research project. The goal of this semester's project is to assess the trustworthiness (adversarial robustness and interpretability) of neural models for software analysis.

We will explore recent advancements in the following topics enabled by ML (tentative topics):

  • Code representation and embeddings
  • Source code analysis
  • Code summarization
  • Test input generation
  • Fuzz testing
  • Oracle inference
  • Fault localization
  • Program (bug) repair
  • Regression testing
  • Security testing and vulnerability detection
  • Code completion
  • Clone detection

Class Organization

This is a research-oriented seminar course with three major components:

  1. Lectures: The first two weeks will be a crash course with lectures on advanced topics in SE, with a focus on program analysis and software testing. The purpose of these lectures is to provide a background on the topics and identify the open problems that can be potentially solved using ML. Some lectures are accompanied by live coding, where we will work with some tools that you will use for your project.  

  2. Paper review and presentation: Students should read and review the research papers related to the course topics (applications of ML techniques in software engineering and analysis). Summary submissions are due before each class and should be submitted to the corresponding thread on Campuswire. In addition, each student is expected to lead "one" in-class discussion related to the papers. Leading a discussion requires reading the paper in depth and preparing a presentation.

  3. Research project: There will be a semester-long research project. The goal of this semester's project is to assess the trustworthiness of neural models for program analysis. The project is designed to be delivered in multiple milestones during the semester. PhD students are expected to produce a paper from their projects. Master's and undergraduate students are encouraged but not required to write a paper. Instead, they should write a report that explains their progress on the different project milestones.

Requirements

This is not an introductory course. Students are expected to have basic knowledge of (1) machine learning, deep learning, and reinforcement learning, and (2) software analysis, software testing, and software debugging. You will get more in-depth knowledge about these topics as you work on your research projects. Please note that the course will be very demanding. All students who are willing to work hard and dive into cutting-edge ML for SE research are welcome to take the course.

Grading Policy

We will compute the final grade using the following tentative criteria. Additional instructions will be given throughout the semester during lectures or in posts on Campuswire.

| Activity | Grade | Details |
|---|---|---|
| Class participation | 10% | This is a discussion-based course, and students are expected to attend the class meetings and participate in the discussion. |
| Paper review and discussion | 15% | You should submit a review of 500-1000 words for each paper. Your review should include the problem definition, the solution, and the supporting results. There is no late submission penalty, but reviewing the paper helps you engage in the discussions and get full credit. |
| Paper presentation and discussion lead | 15% | You should select three topics whose related papers you are interested in presenting. Please refer to the class schedule table and look at the discussion classes to find the topics. You are welcome to suggest new topics. Select your preferences by the end of the first week (1/23). Your presentation should (1) describe the problem, (2) explain the intuition behind the solution, (3) describe the solution and proposed technique, and (4) discuss the results. |
| Research project | 60% | There are three milestones for the research project throughout the semester. In other words, your course project is divided into three sections, and as you finish each one, you make progress toward the project. Undergraduate and Master's students are required to submit a report for their project. PhD students are expected to write a submission-quality paper for their projects. Ideally, Undergraduate and Master's students team up to produce a high-quality research paper. Depending on the size of the class, you can work on projects individually or in a group. Milestone 1 - Select subject projects and extract the code representations (15%). Milestone 2 - Task selection and model training (15%). Milestone 3 - Assessing trustworthiness of the model (30%). |


Feel free to reach out to the instructor at reyhaneh@illinois.edu if you have any questions or feedback.

Class Schedule (Tentative)

To download the slides and access files, please use your Google Apps @Illinois credentials.

         

| Date | Topic | Details | Deliverables |
|---|---|---|---|
| 1/18 | Introduction (Lecture): overview of the class organization, logistics, and expectations | Presenter | Presentation topic selection. Due: 1/23 Midnight |
| 1/20 | Program analysis (Lecture and live coding): static code representation; static analysis; working with the Soot program analysis tool | Presenter | |
| 1/25 | Testing (Lecture): fuzzing; search-based testing; symbolic and concolic execution | Presenter | |
| 1/27 | Testing and Debugging (Lecture): test oracle; mutation testing; bug localization; automated program repair | Presenter | Project Milestone 1: code representation extraction. Milestone description will be posted on Campuswire. Due: 2/20 Midnight |
| 2/1 | Vocabulary models for static source code representation | Presenter | |
| 2/3 | Distributed representation of the code | Presenter | |
| 2/8 | Static code representation and embedding | Presenter | |
| 2/10 | Dynamic embedding | Presenter | |
| 2/15 | No discussions - Work on research project | | |
| 2/17 | No discussions - Work on research project | | Project Milestone 2: task selection and model training. Milestone description will be posted on Campuswire. Due: 3/6 Midnight |
| 2/22 | Relational Models of Source Code | Presenter | |
| 2/24 | Static program execution and reasoning | Presenter | |
| 3/1 | No discussions - Work on research project | | |
| 3/3 | No discussions - Work on research project | | Project Milestone 3: assessing trustworthiness. Milestone description will be posted on Campuswire. Due: 4/24 Midnight |
| 3/8 | Grammar-based input generation | Presenter | |
| 3/10 | Fuzzing | Presenter | |
| 3/15 | Spring Break | | |
| 3/17 | Spring Break | | |
| 3/22 | ML-enabled testing of ML-enabled systems | Presenter | |
| 3/24 | Assertion inference for unit tests | Presenter | |
| 3/29 | Non-functional test oracle construction | Presenter | |
| 3/31 | Bug seeding | Presenter | |
| 4/5 | Defect prediction based on code changes | Presenter | |
| 4/7 | Defect prediction | Presenter | |
| 4/12 | Spectrum-based bug localization | Presenter | |
| 4/14 | Program repair | Presenter | |
| 4/19 | Program repair (compilers) | Presenter | |
| 4/21 | Code search | Presenter | |
| 4/26 | Test case prioritization | Presenter | Project Milestone: paper draft submission. Milestone description will be posted on Campuswire. Due: 5/8 Midnight |

 
 
             
 

Design by: Reyhaneh Jabbarvand