Reyhaneh Jabbarvand

 
 
   
             
 

Course Title 

CS598: Machine Learning for Software Engineering

Course Information

Instructor: Reyhaneh Jabbarvand

Email: reyhaneh@illinois.edu

Class Time: Tuesday/Thursday 09:30 am -- 10:45 am (Instruction starts promptly at 9:30, so please join the meeting at least 5 minutes before the start time)

Office hours: Tuesday/Thursday 10:45 am -- 11:45 am

Zoom link*: https://illinois.zoom.us/j/87583722772

Forum: http://piazza.com/illinois/spring2021/cs598jbr (All the notifications and announcements will be posted on Piazza)

* All lectures and class sessions will be recorded, and links to the recordings will be posted on Piazza after each class. Class participation is mandatory. If you have any concern or situation that prevents you from attending the lectures, please reach out to me.

Course Description and Objectives

This course aims to help students explore and understand the applications of machine learning (ML) to solve real-world software engineering (SE) problems. Students will obtain knowledge about (1) fundamentals and advanced topics in SE and (2) how ML and data mining techniques can be used at different stages of software development to ensure the quality and reliability of software. After finishing this course, students are expected to know how to:

  • Analyze related work in the area of ML for SE
  • Investigate and find appropriate methods to represent the source code, test suite, or relevant program properties for consumption by ML algorithms for a specific problem
  • Generate an appropriate dataset to train the ML models
  • Evaluate the proposed methods on real-world programs and analyze the result
  • Write an academic paper in the area of ML for SE

We will explore recent advancements in the following topics enabled by ML (tentative topics):

  • Code representation and embeddings
  • Source code analysis
  • Code summarization
  • Test input generation
  • Fuzz testing
  • Oracle inference
  • Metamorphic testing
  • Fault localization
  • Program (bug) repair
  • Regression testing
  • Security testing and vulnerability detection
  • Code completion
  • Clone detection
  • Code obfuscation 

Class Organization

This is a research-oriented seminar course with three major components:

  1. Lectures: The first two weeks will be a crash course with lectures on advanced topics in SE (specifically, program analysis and software testing). The purpose of these lectures is to provide a background on the topics and identify the open problems that can be potentially solved using ML. 
  2. Paper review and presentation: The students should read and review the research papers related to the course topics (applications of ML techniques in SE). Summary submissions are due before each class and should be submitted on a corresponding thread on Piazza. In addition, each student is expected to lead at least one in-class discussion (depending on the size of the class) related to the papers. Leading a discussion requires a presentation of the paper. 
  3. Research project: There will be a semester-long research project. Students (individually or in a group of two) can either choose from the topics provided by the instructor or propose their own topic, as long as it is related to the course topics. The goal of this research project is for students to (1) propose a technique that solves an SE problem using some form of ML, (2) implement the proposed technique, and (3) submit a report including preliminary results that demonstrate the effectiveness of the proposed technique. Each team is expected to give a quick informal update on its progress each week.

Requirements

This is not an introductory course, and students are expected to have basic knowledge of (1) Machine Learning, Deep Learning, and Reinforcement Learning, as well as (2) software analysis, software testing, and software debugging. You will gain more in-depth knowledge of these topics as you work on your research projects. Please note that the course will be very demanding. All students who are willing to work hard and dive into cutting-edge ML for SE research are welcome to take the course.

Grading Policy

We will compute the final grade using the following tentative criteria. Additional instructions will be given throughout the semester during lectures or in posts on Piazza.

Each activity below is listed with its share of the final grade and its details.

 

Class participation (10%)

This is a discussion-based course, and students are expected to attend the synchronous online class meetings and participate in the discussion.

 

Paper review and discussion (15%)

You should submit a review of 500-1000 words for each paper. Your review should cover the problem definition, the solution, and the supporting results. For each class session, there are two papers to read: an SE paper that discusses the problem, and an ML4SE paper that tries to solve the problem using ML. You are required to read the introductory sections of the SE paper, but a review is required only for the ML4SE paper.

Of the 15% for this activity, 10% is for review submission and 5% is for engaging in the discussion. There is no late submission penalty, but reviewing the paper helps you engage in the discussions and get the full credit.

 

Paper presentation and discussion lead (20%)

You should select at least five topics whose related papers you would be interested in presenting. Please refer to the topic table below for the tentative topics. You are welcome to suggest new topics.

Email your preferences to the instructor by the end of the first week. The instructor will assign you at least one topic and a paper to present.

You should prepare a presentation of the paper. Your presentation should (1) describe the problem, (2) explain the intuition behind the solution, (3) describe the solution and proposed technique, and (4) discuss the results.

* Depending on the size of the class, we may have either co-presentations or two presentations in one session.

 

Research project (55%)

 

There are three formal presentations and three report submissions of the research project throughout the semester.

Step 1: Identify the problem, possible solutions, and plans for the evaluation (10%)

  • Choose a topic, confirm it with
  • Deliverables: project proposal + 10-15 minute presentation
  • Deadline: February 25th

 

Step 2: Find a proper solution (15%)

  • Explain what the solution is and why it is the best one for the given problem
  • Deliverables: draft of the paper (3-4 pages) + 10-15 minute presentation
  • Deadline: March 26th

Step 3: Implement and evaluate the approach (30%)

  • Results should demonstrate the effectiveness of the proposed technique
  • Deliverables: draft of the paper (6-8 pages) + 20-minute presentation + implementation source code
  • Deadline: May 4th

Each student (or team of two) should give a quick, informal 3-5 minute progress update each week on Tuesday, covering what they have done and what the next step is. Please reach out during office hours or email the instructor if you have any issues or questions regarding the project.

Note: The code of your project should be hosted in a private GitHub repository.


Feel free to reach out to the instructor at reyhaneh@illinois.edu if you have any questions or feedback.

Class Schedule

Recordings of the lectures are available on Piazza (thread “Asynchronous class recording videos”). To download the slides, please use your Google Apps @Illinois credentials.

         

Date | Topic | Details
1/26 | Introduction: overview of the class organization, logistics, and expectations | Presenter; Assignments (due February 5th): introduce yourself and participate in the poll on Piazza
1/28 | Program analysis: code representation; static analysis | Presenter
2/2 | Testing: coverage criteria; fuzz testing; search-based testing; symbolic and concolic execution | Presenter
2/4 | Testing and Debugging: mutation testing; test oracle; bug localization; automated program repair | Presenter
2/9 | Non-functional test oracle construction | Presenter
2/11 | | Deadline to submit your presentation topic preferences
2/11 | Assertion inference for unit tests | Presenter
2/16 | Vocabulary models for source code representation | Presenter
2/18 | Relational Models of Source Code | Presenter
2/23 | Code search enhanced by deep learning | Presenter
2/25 | | Deadline to submit your proposal (step 1)
2/25 | Compiler bug localization | Presenter
3/2 | Proposal presentation and discussion |
3/4 | Text input generation | Presenter
3/9 | Static program execution and reasoning | Presenter
3/11 | AI-enabled input and oracle generation | Presenter
3/16 | Fuzzing | Presenter
3/18 | Program repair | Presenter
3/23 | Binary analysis | Presenter
3/25 | Inferring input grammars | Presenter
3/26 | | Deadline to submit midterm report (step 2)
3/30 | Research progress presentation and discussion |
4/1 | Program repair (compilers) | Presenter
4/6 | Test suite minimization | Presenter
4/8 | Test case prioritization | Presenter
4/15 | Code clone detection | Presenter
4/20 | Code obfuscation | Presenter
4/22 | Code change | Presenter
4/27 | Code embedding | Presenter
4/29 | Code embedding | Presenter
5/4 | Research progress presentation and discussion |
5/6 | Research progress presentation and discussion |
5/9 | | Deadline to submit final report (step 3)

 

 
             
 

Design by: Reyhaneh Jabbarvand