Reyhaneh Jabbarvand

 
 

Course Title 

CS598: Machine Learning for Software Engineering

Course Information

Instructor: Reyhaneh Jabbarvand

Email: reyhaneh@illinois.edu

Class Time: Tuesday/Thursday 09:30 am -- 10:45 am (Instruction starts promptly at 9:30, so please join the meeting at least 5 minutes before the start time)

Office hours: Tuesday/Thursday 10:45 am -- 11:45 am

Zoom link*: https://illinois.zoom.us/j/87583722772

Forum: TBD (All the notifications and announcements will be posted on Piazza)

* All lectures and class sessions will be recorded and made available on Illinois Media Space (links will be posted on Piazza after each class). Class participation is mandatory. If you have a concern or situation that prevents you from attending the lectures, please reach out to me.

Course Description and Objectives

This course aims to help students explore and understand the applications of machine learning (ML) to solve real-world software engineering (SE) problems. Students will obtain knowledge about (1) fundamentals and advanced topics in SE and (2) how ML and data mining techniques can be used at different stages of software development to ensure the quality and reliability of software. After finishing this course, students are expected to know how to:

  • Analyze related work in the area of ML for SE
  • Investigate and find appropriate methods to represent the source code, test suite, or relevant program properties for consumption by ML algorithms for a specific problem
  • Generate an appropriate dataset to train the ML models
  • Evaluate the proposed methods on real-world programs and analyze the result
  • Write an academic paper in the area of ML for SE

We will explore recent advancements in the following topics enabled by ML (tentative topics):

  • Code representation and embeddings
  • Source code analysis
  • Code summarization
  • Test input generation
  • Fuzz testing
  • Oracle inference
  • Metamorphic testing
  • Fault localization
  • Program (bug) repair
  • Regression testing
  • Security testing and vulnerability detection
  • Code completion
  • Clone detection
  • Code obfuscation 

Class Organization

This is a research-oriented seminar course with three major components:

  1. Lectures: The first two weeks will be a crash course with lectures on advanced topics in SE (specifically, program analysis and software testing). The purpose of these lectures is to provide background on the topics and identify open problems that could potentially be solved using ML.
  2. Paper review and presentation: Students should read and summarize research papers related to the course topics (applications of ML techniques in SE). Summaries are due before each class and should be submitted through a Google Form. In addition, each student is expected to lead at least one in-class discussion (depending on the size of the class) related to the papers. Leading a discussion requires a 20-30 minute presentation of the paper.
  3. Research project: There will be a semester-long research project. Students (individually or in a group of two) can either choose from the topics provided by the instructor or propose their own topic, as long as it is related to the course topics. The goal of the research project is for students to (1) propose a technique that solves an SE problem using some form of ML, (2) implement the proposed technique, and (3) submit a report including preliminary results that demonstrate the effectiveness of the proposed technique. Each team is expected to give a quick informal update on its progress each week.

Requirements

This is not an introductory course. Students are expected to have basic knowledge of (1) Machine Learning, Deep Learning, and Reinforcement Learning, as well as (2) software analysis, software testing, and software debugging. You will gain more in-depth knowledge of these topics as you work on your research project. Please note that the course will be very demanding. All students who are willing to work hard and dive into cutting-edge ML for SE research are welcome to take the course.

Grading Policy

We will compute the final grade using the following tentative criteria. Additional instructions will be given throughout the semester during lectures or in posts on Piazza.

Activity: Class participation
Grade: 10%
Details: This is a discussion-based course, and students are expected to attend the synchronous online class meetings and participate in the discussion.

Activity: Paper review and discussion
Grade: 20%
Details:

1- You should submit a summary of 500-1000 words for each paper. Your review should include the problem definition, the solution, and the supporting results.

2- Of the 20% for this activity, 10% is for review submission and 10% is for engaging in the discussion. There is no late submission penalty, but reviewing the paper helps you engage in the discussions and earn full credit.

Activity: Paper presentation and discussion lead
Grade: 20%
Details:

1- You should select at least five topics whose related papers you would be interested in presenting. Please refer to the topic table below for the tentative topics. You are welcome to suggest new topics.

2- Email your preferences to the instructor by the end of the first week. The instructor will assign you at least one topic and a paper to present.

3- You should prepare a 20-30 minute presentation of the paper. Your presentation should (1) describe the problem, (2) explain the intuition behind the solution, (3) describe the solution and proposed technique, and (4) discuss the results.

* Depending on the size of the class, we may have either co-presentations or two presentations in one session.

Activity: Research project
Grade: 50%
Details:

1- You should identify your research problem and submit a 1-2 page proposal by the end of week 5 (end of February). Feel free to reach out to the instructor to discuss your choices and select the topic. (10%)

2- Each student (or team of two) should give a quick, informal 3-5 minute update on their progress each Tuesday (what they have done and what the next step is). Please reach out during office hours or email the instructor if you have any issues or questions regarding the project.

3- There are three formal presentations and three report submissions for the research project throughout the semester: one in the last week of March, one in the last week of April, and a final one at the end of the semester.

  • The first presentation and report should identify the problem and explain its importance (why it is worth solving). (10%)
  • The second presentation and report should cover potential solutions to the problem (what the intuition behind the solution is and why the proposed solution is the best fit for the identified problem). (10%)
  • The final presentation and report should include preliminary results demonstrating the effectiveness of the proposed technique for the problem. (20%)


Note: The code for your project should be hosted in a private GitHub repository.
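As a rough illustration of how the tentative weights above combine, the following sketch computes a weighted final score. The function name, the dictionary keys, and the assumption that each activity is scored on a 0-100 scale are hypothetical and not part of the official policy; the actual computation may differ.

```python
# Tentative weights from the grading policy (percent of the final grade).
# The key names are illustrative, not official.
WEIGHTS = {
    "class_participation": 10,
    "paper_review_and_discussion": 20,
    "paper_presentation_and_discussion_lead": 20,
    "research_project": 50,
}

# Breakdown of the research project component, as listed in its details.
PROJECT_PARTS = {
    "proposal": 10,
    "first_presentation_and_report": 10,
    "second_presentation_and_report": 10,
    "final_presentation_and_report": 20,
}


def final_grade(scores):
    """Combine per-activity scores (assumed to be 0-100) into a weighted total."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    # Each activity contributes its score scaled by its percentage weight.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Sanity checks: the weights cover the whole grade, and the project
# sub-items add up to the project's share.
assert sum(WEIGHTS.values()) == 100
assert sum(PROJECT_PARTS.values()) == WEIGHTS["research_project"]

example = {
    "class_participation": 100,
    "paper_review_and_discussion": 90,
    "paper_presentation_and_discussion_lead": 80,
    "research_project": 85,
}
print(final_grade(example))  # 86.5
```

Because the research project carries half of the total weight, a given change in the project score moves the final result five times as much as the same change in class participation.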


Feel free to reach out to the instructor at reyhaneh@illinois.edu if you have any questions or feedback.

Tentative Schedule

TBD

  

 
             
 

Design by: Reyhaneh Jabbarvand