Project

EG4338/EG6338 Data Science and Large Language Models

Author

Yike Zhang

Published

June 1, 2026

Overview

Each student will develop and present an individual coding-based project that applies the concepts learned in this class. Your project should demonstrate end-to-end work: framing a problem, collecting data, applying appropriate models or methods, and communicating your results. Below are several suggested project ideas. You are welcome to propose your own topic, but it must be approved by the instructor in advance. The following project ideas outline the minimal expectations for each project, and you are encouraged to go beyond these requirements to make your project more interesting and impactful.

Deliverables
  • A Jupyter notebook (or code written in any IDE) showing your analysis.
  • An in-class presentation with slides, followed by Q&A.
  • Submit the code, slides, and other materials to Canvas by the end of Week 9.

Project Development

For the presentations, the instructor encourages students to use AI tools ethically to assist with brainstorming, code writing, debugging, and presentation preparation. Here is a general guide on how to ethically use AI tools for your project presentations in this class:

  • Use AI tools as a learning assistant to help you understand concepts in Python and troubleshoot issues, rather than as a shortcut to complete the project without putting in any effort.
  • If you use AI-generated code, be transparent about it in your presentation. You can mention that you used AI tools to assist with certain parts of the code, but make sure to explain the logic and design choices in your own words.
  • Be prepared to answer questions about the code you present, whether or not you wrote it with AI assistance, and make sure you can explain how it works and why you chose to use it in your project. You should take ownership of and responsibility for that code, regardless of whether you wrote it yourself or with the help of AI tools.

Project Topics

Project 1 - Data Analysis and Storytelling Dashboard

Project Goal: Pick a real-world public dataset and produce a clear, well-visualized data story that uncovers non-obvious patterns.

Minimal Expectations:

  • Choose a dataset from a reputable source (e.g., Kaggle, data.gov, the World Bank, or a domain-specific repository).
  • Clean and wrangle the data using pandas and regular expressions (Module 2).
  • Produce at least five high-quality figures using matplotlib and seaborn.
  • Frame 2–3 specific questions and answer them with evidence from your visualizations.
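The regular-expression cleaning step mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the `raw_prices` values, the `parse_price` helper, and the idea of a "price" column are all hypothetical and not part of any required dataset.

```python
import re

# Hypothetical raw values, as they might appear in a messy "price" column.
raw_prices = ["$1,234.50", " 99 USD", "N/A", "2,000", ""]

# Optional sign, digits with optional thousands separators, optional decimals.
NUMBER_PATTERN = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def parse_price(text):
    """Return the first number found in text as a float, or None if there is none."""
    match = NUMBER_PATTERN.search(text)
    if match is None:
        return None
    # Drop thousands separators before converting to float.
    return float(match.group().replace(",", ""))


cleaned_prices = [parse_price(value) for value in raw_prices]
print(cleaned_prices)
```

With pandas, a helper like this could be applied to a column of strings, for example `df["price"].map(parse_price)`, where `df` and `price` are again placeholder names. Values that cannot be parsed come back as `None`, so they can be counted and reported rather than silently dropped.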

Project 2 - Sentiment Analysis with LLM Approaches

Project Goal: Apply a pretrained large language model to a sentiment classification task and analyze its performance.

Minimal Expectations:

  • Collect a labeled text dataset (e.g., IMDB reviews, Amazon product reviews, tweets about a chosen topic).
  • Apply a pretrained LLM-based classifier (e.g., a model from Hugging Face).
  • Evaluate the model’s accuracy and provide some insights into the results.
  • Do a qualitative error analysis: pick 10 examples that the model has labeled and show its outputs.
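One possible way to organize the evaluation and error-analysis steps above is sketched here. It assumes the model's predictions have already been collected. The `examples` list and the `evaluate` helper are hypothetical, and the hard-coded predictions stand in for whatever a Hugging Face classifier would actually return.

```python
from collections import Counter

# Hypothetical examples: (text, true label, predicted label).
examples = [
    ("Loved every minute of it.", "positive", "positive"),
    ("Not bad at all, honestly.", "positive", "negative"),
    ("A complete waste of time.", "negative", "negative"),
    ("I expected more from this director.", "negative", "positive"),
    ("Great cast, great story.", "positive", "positive"),
]


def evaluate(rows):
    """Return accuracy, counts of (true, predicted) pairs, and misclassified rows."""
    confusion = Counter((true, pred) for _, true, pred in rows)
    correct = sum(count for (true, pred), count in confusion.items() if true == pred)
    errors = [row for row in rows if row[1] != row[2]]
    return correct / len(rows), confusion, errors


accuracy, confusion, errors = evaluate(examples)
print(f"accuracy = {accuracy:.2f}")
for text, true, pred in errors:
    print(f"true={true} predicted={pred}: {text}")
```

The misclassified rows are a natural shortlist when choosing examples to discuss, and the pair counts show whether the errors lean toward one class.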

Project 3 - Text Summarization and Information Extraction Tool

Project Goal: Build a tool that ingests long-form text (news articles, research abstracts, meeting transcripts, etc.) and produces a structured summary.

Minimal Expectations:

  • Collect a corpus of ~10–30 long-form documents in a domain you care about.
  • Use a pretrained summarization model (e.g., bart-large-cnn) to generate summaries.
  • Extract structured fields (e.g., key entities, dates, action items).
  • Compare model-generated summaries to human-written reference summaries visually.
  • Visualize trends across your corpus (e.g., topic frequency over time).
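To make the comparison with reference summaries concrete, a simple overlap score can be computed per document and then plotted. The sketch below uses a simplified unigram-overlap F1, in the spirit of ROUGE-1 but not the official ROUGE implementation. The `tokenize` and `unigram_f1` helpers and the two example sentences are made up for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def unigram_f1(candidate, reference):
    """Unigram-overlap F1 between a candidate summary and a reference summary."""
    cand_counts = Counter(tokenize(candidate))
    ref_counts = Counter(tokenize(reference))
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical reference and model-generated summaries.
reference = "The council approved the new budget on Monday."
candidate = "The council approved a budget."
score = unigram_f1(candidate, reference)
print(f"unigram F1 = {score:.3f}")
```

A score like this only measures word overlap, so it is best read alongside a side-by-side look at the summaries themselves.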

Other Projects - Proposing Your Own Topic

If none of the project ideas above fit your interests, you may propose your own. A good proposal should clearly answer:

  1. What is the question or problem you are addressing?
  2. What data will you use, and how will you obtain it?
  3. What methods from this course will you apply?
  4. What does success look like (i.e., how will you evaluate your results)?

Email your proposal to the instructor by Week 6 for approval.