greeting = "Hello, EG4338"
print(greeting)
Hello, EG4338
Introduction and Background
Yike Zhang
June 1, 2026
This supplementary material page gets your hands on the keyboard so the rest of the course has somewhere to stand. Read it top to bottom, and when you reach a code block, work out what it does and predict the output before you read the result printed underneath.
Almost everything we do this summer happens inside a notebook: a document that mixes prose, code, and the output of that code in one scrollable page. The page you are reading right now is itself a notebook. Each grey block is a code cell, and the result printed underneath it is what Python produced when that cell ran.
The mental model worth keeping is that a notebook runs from top to bottom, and the cells share memory. A variable you define near the top is still available much later on the page. That is convenient, but it also means order matters: if you run cells out of order while experimenting, you can confuse yourself. When in doubt, restart and run everything from the top.
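A minimal sketch of the shared-memory idea, written as two cells in one block. The comments marking the cells and the variable names are illustrative assumptions; the course code is taken from the greeting on this page.

```python
# Cell 1: define a name near the top of the notebook.
course = "EG4338"

# Cell 2: a later cell can still see that name, because cells share memory.
message = "Welcome to " + course
print(message)
```

If you ran the second cell before the first, Python would raise a NameError, which is the out-of-order confusion described above.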
Notice we did not have to declare a type or end the line with a semicolon. Python keeps the syntax light so you can spend your attention on the data, which is the whole point of the course.
A variable is just a name pointing at a value. The four building-block types you will reach for constantly are integers, floating-point numbers, strings, and booleans.
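A minimal sketch of a cell that would produce the output below. The variable names and values are illustrative assumptions.

```python
count = 5         # integer
average = 83.8    # floating-point number
name = "Ada"      # string
passed = True     # boolean

# type(...) reports the type Python inferred from each value.
print(type(count), type(average), type(name), type(passed))
```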
<class 'int'> <class 'float'> <class 'str'> <class 'bool'>
Python figures out the type from the value, which is called dynamic typing. You can always ask what type something is with type(...), and that habit will save you a surprising amount of debugging later when a number is secretly a string.
Arithmetic looks the way you would expect, with one wrinkle worth memorizing: a single slash always gives a float, and a double slash does floor division, rounding the result down to a whole number.
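A short sketch of the two kinds of division, with numbers chosen only for illustration:

```python
half = 7 / 2         # true division: 3.5
even = 8 / 2         # still a float, even though it divides evenly: 4.0
floored = 7 // 2     # floor division: 3
negative = -7 // 2   # floors toward negative infinity: -4
remainder = 7 % 2    # what is left over after floor division: 1

print(half, even, floored, negative, remainder)
```

The negative case is the one that surprises people: floor division rounds down, not toward zero.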
Text shows up everywhere in data work, from column names to the messy free-form text we will feed language models in Module 3. The most useful tool for building strings is the f-string: put an f before the opening quote and you can drop variables straight into the text inside curly braces.
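A minimal sketch of a cell that would produce the three lines of output below. The variable names and the exact expressions are assumptions; only the printed text comes from this page.

```python
name = "Ada"
score = 91.27

print(f"{name} scored {score} points")        # value inserted as-is
print(f"{name} scored {score:.1f} points")    # :.1f rounds to one decimal place
print(f"{name.upper()} in Data Science")      # any expression can go in the braces
```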
Ada scored 91.27 points
Ada scored 91.3 points
ADA in Data Science
Strings carry a deep toolbox of methods. A few you will use this week and forever after:
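One possible selection, sketched on a sample string chosen for illustration. Every method here is part of Python's built-in str type.

```python
raw = "  Data Science  "

clean = raw.strip()                 # remove leading and trailing whitespace
lower = clean.lower()               # "data science"
upper = clean.upper()               # "DATA SCIENCE"
snake = clean.replace(" ", "_")     # "Data_Science"
words = clean.split()               # ['Data', 'Science']
starts = clean.startswith("Data")   # True

print(clean, lower, upper, snake, words, starts, sep="\n")
```

None of these methods change the original string. Each one returns a new value, which is why the results are assigned to new names.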
A list is an ordered, changeable collection. This is your default container when you have several things and the order matters.
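A minimal sketch of a cell that would produce the output below, assuming the list holds the five scores that appear later on this page.

```python
scores = [88, 92, 79, 95, 84]

print(scores[0])     # first element: indexing starts at 0
print(scores[-1])    # negative indexes count from the end
print(len(scores))   # number of elements
```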
88
84
5
Slicing pulls out a run of elements with start:stop. The rule that trips up everyone at first is that the stop index is not included. We will meet this exact rule again in Week 3 when we compare pandas iloc to loc, so it is worth burning in now.
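A minimal sketch of slices that would produce the output below, assuming the same five scores. The particular start and stop values are assumptions chosen to match that output.

```python
scores = [88, 92, 79, 95, 84]

print(scores[1:3])   # positions 1 and 2; position 3 is not included
print(scores[:2])    # leaving out start means "from the beginning"
print(scores[2:])    # leaving out stop means "through the end"
```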
[92, 79]
[88, 92]
[79, 95, 84]
Lists are happy to grow and change in place.
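A short sketch of in-place changes, assuming the same five scores. The new values are made up for illustration.

```python
scores = [88, 92, 79, 95, 84]

scores.append(90)    # add one element to the end
scores[2] = 81       # overwrite the element at position 2
scores.remove(95)    # delete the first element equal to 95
scores.sort()        # sort the list in place, smallest first

print(scores)
```

Each of these calls modifies the existing list rather than returning a new one, so there is no need to reassign the result.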
A dictionary maps keys to values. Reach for it when each piece of data has a name rather than a position. A row of a spreadsheet is naturally a dictionary: column name to cell value.
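A small sketch of one spreadsheet row held as a dictionary. The values mirror the first row of the table near the end of this page; the "passed" key and the "email" lookup are hypothetical additions for illustration.

```python
student = {"student": "Ada", "score": 72, "hours_studied": 10}

print(student["score"])                 # look up a value by its key
student["passed"] = True                # add a new key-value pair
email = student.get("email", "unknown") # .get returns a default if the key is missing
print(email)

# Walk through every key and its value.
for key, value in student.items():
    print(f"{key}: {value}")
```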
A for loop walks through a collection one element at a time. An if statement runs code only when a condition holds. Together they let you make decisions across a dataset.
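A minimal sketch of a loop that would produce the output below. The cutoffs of 90 and 80 are assumptions inferred from that output, and the labels list is added only so the results can be inspected afterwards.

```python
scores = [88, 92, 79, 95, 84]
labels = []

for score in scores:
    if score >= 90:
        label = "A"
    elif score >= 80:
        label = "B"
    else:
        label = "needs review"
    labels.append(label)
    print(f"{score}: {label}")
```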
88: B
92: A
79: needs review
95: A
84: B
When the goal is to build a new list out of an old one, Python offers a tighter form called a list comprehension. It reads almost like the English sentence “the square of x for each x in scores”.
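A short sketch of that sentence as code, assuming the same five scores. The second comprehension, with a filter, is an extra illustration.

```python
scores = [88, 92, 79, 95, 84]

# "the square of x for each x in scores"
squares = [x ** 2 for x in scores]
print(squares)   # [7744, 8464, 6241, 9025, 7056]

# An optional if at the end keeps only the elements that pass a test.
high = [x for x in scores if x >= 90]
print(high)      # [92, 95]
```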
A function packages a piece of logic behind a name so you can reuse it and read your own code months later. Define it once with def, then call it as often as you like.
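A minimal sketch of a function that would produce the output below. The name letter_grade and the cutoffs are assumptions inferred from that output.

```python
def letter_grade(score):
    """Return a letter grade for a numeric score."""
    if score >= 90:
        return "A"
    elif score >= 80:
        return "B"
    else:
        return "C"


scores = [88, 92, 79, 95, 84]

print(letter_grade(95))
print([letter_grade(s) for s in scores])
```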
A
['B', 'A', 'C', 'A', 'B']
The triple-quoted line just under def is a docstring. It is optional, but writing one short sentence about what the function does is a habit that pays for itself.
Python on its own is deliberately small. The real power comes from libraries that other people have written, which you pull in with import. The whole second half of this course leans on a handful of them, and you will see this same import block at the top of nearly every page.
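A minimal sketch of such an import block, assuming both libraries are installed. Printing the versions is an assumption based on the output below; the numbers you see depend on your own installation.

```python
import numpy as np
import pandas as pd

# __version__ is a string attribute that both libraries expose.
print("numpy", np.__version__)
print("pandas", pd.__version__)
```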
numpy 2.4.6
pandas 3.0.3
The as np part gives the library a short nickname so you do not have to type the full name every time. The nicknames np for numpy and pd for pandas are universal conventions. Use them, and any data scientist will instantly recognize your code.
Let us close with a 30-second preview of where Module 2 is headed. NumPy gives us fast arrays of numbers, and pandas gives us the DataFrame: a table with named columns, which is the single most important object in the course.
import numpy as np
import pandas as pd
# A reproducible random-number generator. The seed makes the "random"
# numbers come out the same every time the page is rendered, which is
# exactly what we want for teaching.
rng = np.random.default_rng(seed=42)
df = pd.DataFrame({
    "student": ["Ada", "Babbage", "Curie", "Dirac", "Euler"],
    "score": rng.integers(low=70, high=100, size=5),
    "hours_studied": rng.integers(low=2, high=12, size=5),
})
df
| | student | score | hours_studied |
|---|---|---|---|
| 0 | Ada | 72 | 10 |
| 1 | Babbage | 93 | 2 |
| 2 | Curie | 89 | 8 |
| 3 | Dirac | 83 | 4 |
| 4 | Euler | 82 | 2 |
That object is a DataFrame. It already knows how to summarize itself:
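A minimal sketch of the call that produces a summary like the one below, assuming the standard describe method is what was used. To keep the block self-contained, the table is rebuilt from the values shown above rather than from the random generator.

```python
import pandas as pd

# Values copied from the table above.
df = pd.DataFrame({
    "student": ["Ada", "Babbage", "Curie", "Dirac", "Euler"],
    "score": [72, 93, 89, 83, 82],
    "hours_studied": [10, 2, 8, 4, 2],
})

# describe() summarizes every numeric column; the text column is skipped.
summary = df.describe()
print(summary)
```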
| | score | hours_studied |
|---|---|---|
| count | 5.000000 | 5.00000 |
| mean | 83.800000 | 5.20000 |
| std | 7.981228 | 3.63318 |
| min | 72.000000 | 2.00000 |
| 25% | 82.000000 | 2.00000 |
| 50% | 83.000000 | 4.00000 |
| 75% | 89.000000 | 8.00000 |
| max | 93.000000 | 10.00000 |
And it can answer a question with a single readable line. “Show me only the students who scored at least 85” looks like this:
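A minimal sketch of that filter, assuming the usual pandas boolean-mask idiom. The table is rebuilt from the values shown above so the block runs on its own, and the name top is an illustrative assumption.

```python
import pandas as pd

# Values copied from the table above.
df = pd.DataFrame({
    "student": ["Ada", "Babbage", "Curie", "Dirac", "Euler"],
    "score": [72, 93, 89, 83, 82],
    "hours_studied": [10, 2, 8, 4, 2],
})

# The inner comparison yields True or False per row;
# the outer brackets keep only the rows marked True.
top = df[df["score"] >= 85]
print(top)
```

With these values, the rows for Babbage and Curie are the ones that remain.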
If that last line feels a little magical, good. Pulling exactly the rows you want out of a table is the heart of Week 3, and by the end of Module 2 you will write lines like that without thinking. For now, the takeaway from Week 1 is simply that the tools are installed, code cells run, and a table of data is something Python can hold and question.
Re-open this page and change things. Swap the seed in default_rng, raise the cutoff in the filter from 85 to 95, add a classmate to the DataFrame. Nothing here can break in a way that a restart will not fix, so experiment freely. The fastest way to get comfortable is to make a small change, predict the output, then run it.