Overview
EG4338/EG6338 Data Science and Large Language Models

Course Summary
Data science combines data, computing, and critical thinking to help people and organizations solve complex problems and better understand the world around them. This beginner-friendly course serves as a bridge between introductory data science and more advanced computer science topics, such as large language models. In this class, we will explore the key steps of the data science process, including how to form meaningful questions, collect and clean data, visualize information, perform statistical analysis, leverage large language models, and make informed decisions based on data. The course emphasizes hands-on data analysis and practical problem solving. Students will learn how to use Python programming libraries to work with data, create clear and informative visualizations, understand basic statistical concepts, and use efficient methods for analyzing various datasets.
Prerequisites: None.
Cross-Listed Course
EG6338
EG6338 shares the course summary given above for EG4338; the two courses are similar. Graduate students must register for EG6338.
Prerequisites: None.
Course Objectives
After this course, you should be able to …
- Formulate meaningful questions that can be analyzed using data.
- Collect, organize, and clean datasets.
- Use programming tools to explore and analyze data.
- Create clear and informative data visualizations.
- Apply basic statistical concepts to interpret data and evaluate results.
- Communicate data insights clearly using visualizations and written explanations.
Please refer to the class Schedule for weekly updates and learning objectives. This is the central page for the course, where you will also find the Syllabus, Instructor information, and other study materials. Note that the schedule page is subject to change, and the most up-to-date version will always be posted on the course website. Be sure to check it regularly.
What You’ll Learn
The course content includes, but is not limited to, …
- Introduction to data science
- Data collection, cleaning, and preprocessing techniques
- Data visualization techniques
- Basic statistical understanding and analysis methods
- Communicating findings through data reports and visualizations
- Introduction to large language models
- Ethical considerations in machine learning models