Reading about DataFrames feels comfortable. Cleaning real data still feels messy.
That’s the exact gap these 100 Pandas practice problems are designed to close—hands-on, messy, and incredibly rewarding.
What’s inside:
- 100 carefully crafted challenges, from basic Series and DataFrame operations to groupby, merges, pivots, and time series.
- Clear, step-by-step solutions that never just show the answer—they explain the why behind every transformation.
- Problems rooted in everyday data tasks: filtering, handling missing values, cleaning strings, and reshaping data for analysis.
How to practice for real growth:
- Read the problem written in plain, jargon-free language.
- Try solving it yourself — struggle a little, that’s where learning sticks.
- Check the solution and truly understand the logic.
- Repeat until data manipulation feels less like a chore and more like a superpower.
Why does deliberate practice matter so much? Because data analysis isn’t about memorizing functions — it’s about building intuition. Every problem you solve sharpens your ability to spot patterns, fix messy datasets, and write clean, efficient Pandas code. That’s what turns a Python user into a confident data analyst — and what stands out in interviews and real projects.
Whether you’re falling in love with data or preparing for your next big opportunity, these exercises will make you feel capable, prepared, and genuinely excited about what you can do. Pick a problem, open a notebook, and let your “I can really do this!” moment arrive.
Also try it: 100 NumPy practice problems with solutions
1. Create a Pandas Series from a list [1, 2, 3, 4, 5] and print it.
python
import pandas as pd
s = pd.Series([1, 2, 3, 4, 5])
print(s)
2. Create a Pandas DataFrame from a dictionary: {'Name': ['Alice', 'Bob'], 'Age': [25, 30]}.
python
df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
print(df)
3. Read a CSV file named ‘data.csv’ and display the first 5 rows.
python
df = pd.read_csv('data.csv')
print(df.head())
4. Write the DataFrame df to a CSV file named ‘output.csv’ without the index.
python
df.to_csv('output.csv', index=False)
5. Get the number of rows and columns of a DataFrame df.
python
rows, cols = df.shape
print(f"Rows: {rows}, Columns: {cols}")
6. Display the column names of a DataFrame df.
python
print(df.columns)
7. Check the data types of each column in a DataFrame df.
python
print(df.dtypes)
8. Get a quick statistical summary of a DataFrame df (numeric columns only).
python
print(df.describe())
9. Select a single column ‘Age’ from a DataFrame df as a Series.
python
age_series = df['Age']
print(age_series)
10. Select multiple columns ['Name', 'Age'] from a DataFrame df.
python
subset = df[['Name', 'Age']]
print(subset)
11. Filter rows where the ‘Age’ column is greater than 25.
python
filtered = df[df['Age'] > 25]
print(filtered)
12. Add a new column ‘City’ with constant value ‘New York’ to a DataFrame df.
python
df['City'] = 'New York'
13. Update the ‘Age’ column by adding 1 to every value.
python
df['Age'] = df['Age'] + 1
14. Drop a column named ‘City’ from a DataFrame df.
python
df = df.drop('City', axis=1)
15. Drop rows with any missing values (NaN) in a DataFrame df.
python
df = df.dropna()
16. Fill missing values in column ‘Age’ with the mean of that column.
python
df['Age'] = df['Age'].fillna(df['Age'].mean())  # assign back; inplace=True on a single column is unreliable and deprecated
17. Rename the column ‘Name’ to ‘FullName’ in a DataFrame df.
python
df.rename(columns={'Name': 'FullName'}, inplace=True)
18. Sort a DataFrame df by the ‘Age’ column in descending order.
python
df_sorted = df.sort_values('Age', ascending=False)
19. Group a DataFrame df by column ‘City’ and compute the mean of ‘Age’.
python
grouped = df.groupby('City')['Age'].mean()
print(grouped)
20. Create a new DataFrame from two Series: names = pd.Series(['A','B']) and scores = pd.Series([90,85]).
python
df = pd.DataFrame({'Name': names, 'Score': scores})
print(df)
21. Apply a lambda function that squares each element in a column ‘Value’.
python
df['Squared'] = df['Value'].apply(lambda x: x**2)
22. Use map to change category codes in a column: replace {'M': 'Male', 'F': 'Female'}.
python
df['Gender'] = df['Gender'].map({'M': 'Male', 'F': 'Female'})
23. Create a DataFrame with dates as index from ‘2024-01-01’ to ‘2024-01-05’ and random values.
python
import numpy as np
dates = pd.date_range('2024-01-01', periods=5)
df = pd.DataFrame(np.random.randn(5, 2), index=dates, columns=['A', 'B'])
print(df)
24. Select rows between two dates ‘2024-01-02’ and ‘2024-01-04’ from a DataFrame with datetime index.
python
subset = df.loc['2024-01-02':'2024-01-04']  # .loc is the recommended way to slice by date labels
print(subset)
25. Resample a time series DataFrame to weekly frequency and compute the mean.
python
df_weekly = df.resample('W').mean()
26. Compute the rolling (moving) average of a column with window size 3.
python
df['rolling_avg'] = df['Value'].rolling(window=3).mean()
27. Create a pivot table from DataFrame df with index='Category', columns='Year', values='Sales', aggfunc='sum'.
python
pivot = pd.pivot_table(df, values='Sales', index='Category', columns='Year', aggfunc='sum')
print(pivot)
28. Merge two DataFrames df1 and df2 on a common column ‘ID’ with inner join.
python
merged = pd.merge(df1, df2, on='ID', how='inner')
29. Concatenate two DataFrames df1 and df2 vertically (stack rows).
python
concat = pd.concat([df1, df2], axis=0, ignore_index=True)
30. Concatenate two DataFrames df1 and df2 horizontally (side by side).
python
concat = pd.concat([df1, df2], axis=1)
31. Find the correlation matrix of a DataFrame df with numeric columns.
python
corr = df.corr()
print(corr)
32. Get the number of unique values in a column ‘Product’.
python
n_unique = df['Product'].nunique()
print(n_unique)
33. Count the frequency of each unique value in column ‘Category’.
python
freq = df['Category'].value_counts()
print(freq)
34. Use isin to filter rows where column 'Country' is in a list ['USA', 'Canada'].
python
filtered = df[df['Country'].isin(['USA', 'Canada'])]
35. Use between to filter rows where ‘Price’ is between 10 and 20.
python
filtered = df[df['Price'].between(10, 20)]
36. Create a DataFrame from a dictionary of lists and set a specific column as index.
python
df = pd.DataFrame({'ID': [1,2,3], 'Name': ['A','B','C']})
df.set_index('ID', inplace=True)
37. Reset the index of a DataFrame to default integer index.
python
df.reset_index(drop=True, inplace=True)
38. Use loc to select rows with index labels 2 to 4 and columns ‘A’ and ‘B’.
python
subset = df.loc[2:4, ['A', 'B']]
39. Use iloc to select the first 3 rows and the first 2 columns.
python
subset = df.iloc[:3, :2]
40. Create a copy of a DataFrame to avoid chained assignment issues.
python
df_copy = df.copy()
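To see why the copy matters, here is a small sketch with a hypothetical toy DataFrame: writing through a plain slice can trigger a SettingWithCopyWarning and may not behave as expected, while an explicit .copy() is independent of the original.

```python
import pandas as pd

df = pd.DataFrame({'Age': [22, 35, 41], 'City': ['NY', 'LA', 'SF']})

# A plain slice may be a view of df; assigning into it can warn and
# silently fail to propagate. An explicit copy is safe to modify.
adults = df[df['Age'] > 30].copy()
adults['Age'] = adults['Age'] + 1  # modifies only the copy

print(df['Age'].tolist())      # original unchanged: [22, 35, 41]
print(adults['Age'].tolist())  # [36, 42]
```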
41. Replace all occurrences of 0 in a column ‘Quantity’ with NaN.
python
df['Quantity'] = df['Quantity'].replace(0, np.nan)
42. Use where to keep values >0 in a column and set others to NaN.
python
df['Positive'] = df['Value'].where(df['Value'] > 0, np.nan)
43. Create a new column ‘AgeGroup’ based on ages: ‘Child’ (<18), ‘Adult’ (18-64), ‘Senior’ (>=65).
python
df['AgeGroup'] = pd.cut(df['Age'], bins=[0,17,64,100], labels=['Child','Adult','Senior'])
44. Use qcut to divide a column into 4 equal‑sized bins (quartiles).
python
df['Quartile'] = pd.qcut(df['Score'], q=4, labels=['Q1','Q2','Q3','Q4'])
45. Convert a string column ‘Date’ to datetime type.
python
df['Date'] = pd.to_datetime(df['Date'])
46. Extract the month name from a datetime column ‘Date’.
python
df['MonthName'] = df['Date'].dt.month_name()
47. Extract the day of week from a datetime column ‘Date’.
python
df['Weekday'] = df['Date'].dt.day_name()
48. Compute the difference in days between two datetime columns ‘Start’ and ‘End’.
python
df['Duration'] = (df['End'] - df['Start']).dt.days
49. Shift a column ‘Value’ by 2 rows forward (lag).
python
df['Shifted'] = df['Value'].shift(2)
50. Compute the cumulative sum of a column ‘Sales’.
python
df['CumSales'] = df['Sales'].cumsum()
51. Compute the rank of values in a column ‘Score’ (dense ranking).
python
df['Rank'] = df['Score'].rank(method='dense')
52. Use sample to randomly select 5 rows from a DataFrame.
python
sample_df = df.sample(n=5, random_state=42)
53. Use nlargest to get the top 3 rows based on ‘Value’.
python
top3 = df.nlargest(3, 'Value')
54. Use nsmallest to get the bottom 2 rows based on ‘Value’.
python
bottom2 = df.nsmallest(2, 'Value')
55. Drop duplicate rows based on all columns.
python
df = df.drop_duplicates()
56. Drop duplicate rows based on a subset of columns ['Name', 'City'], keeping the first occurrence.
python
df = df.drop_duplicates(subset=['Name', 'City'], keep='first')
57. Get the unique values of a column as a NumPy array.
python
unique_vals = df['Product'].unique()
58. Convert a numeric column to integer type (e.g., float to int).
python
df['Quantity'] = df['Quantity'].astype(int)
59. Convert a column to categorical type to save memory.
python
df['Status'] = df['Status'].astype('category')
60. Use query to filter rows where ‘Age’ > 25 and ‘City’ == ‘London’.
python
filtered = df.query("Age > 25 and City == 'London'")
61. Use eval to create a new column ‘Total’ as ‘Price * Quantity’.
python
df.eval('Total = Price * Quantity', inplace=True)
62. Compute the percentage change between consecutive rows in a column ‘Value’.
python
df['PctChange'] = df['Value'].pct_change() * 100
63. Compute the cumulative product of a column ‘Factor’.
python
df['CumProd'] = df['Factor'].cumprod()
64. Create a new DataFrame by transposing an existing DataFrame (rows become columns).
python
df_transposed = df.T
65. Convert a DataFrame to a dictionary (records format).
python
records = df.to_dict('records')
66. Convert a DataFrame to a NumPy array (excluding index).
python
array = df.to_numpy()
67. Apply a function to multiple columns using apply with axis=1.
python
df['Sum'] = df.apply(lambda row: row['A'] + row['B'], axis=1)
68. Use groupby and agg to compute both mean and max of ‘Value’ per ‘Category’.
python
result = df.groupby('Category')['Value'].agg(['mean', 'max'])
print(result)
69. Use transform to add group mean as a new column without collapsing the DataFrame.
python
df['GroupMean'] = df.groupby('Category')['Value'].transform('mean')
70. Use filter to keep groups in a groupby where group sum of ‘Value’ > 100.
python
filtered = df.groupby('Category').filter(lambda g: g['Value'].sum() > 100)
71. Create a DataFrame with MultiIndex from two columns ‘Year’ and ‘Month’.
python
df.set_index(['Year', 'Month'], inplace=True)
72. Swap the levels of a MultiIndex DataFrame.
python
df = df.swaplevel(0, 1, axis=0)  # swaplevel has no inplace parameter, so assign the result back
73. Use stack to convert columns to rows (pivot long format).
python
stacked = df.stack()
74. Use unstack to convert rows to columns (pivot wide format).
python
unstacked = df.unstack()
75. Read a JSON file ‘data.json’ into a DataFrame.
python
df = pd.read_json('data.json')
76. Export a DataFrame df to a JSON file ‘output.json’.
python
df.to_json('output.json', orient='records')
77. Read an Excel file ‘data.xlsx’ (sheet named ‘Sheet1’) into a DataFrame.
python
df = pd.read_excel('data.xlsx', sheet_name='Sheet1')
78. Write a DataFrame to an Excel file ‘output.xlsx’ without the index.
python
df.to_excel('output.xlsx', index=False)
79. Read a SQL query from a SQLite database into a DataFrame (example with sqlite3).
python
import sqlite3
conn = sqlite3.connect('database.db')
df = pd.read_sql_query("SELECT * FROM my_table", conn)  # 'table' is a SQL keyword; use your actual table name
conn.close()
80. Write a DataFrame to a SQLite table (replace if exists).
python
import sqlite3
conn = sqlite3.connect('database.db')
df.to_sql('my_table', conn, if_exists='replace', index=False)
conn.close()
81. Detect duplicate rows (boolean mask) in a DataFrame.
python
dup_mask = df.duplicated()
print(dup_mask)
82. Get the row index of the maximum value in column ‘Score’.
python
idx_max = df['Score'].idxmax()
print(idx_max)
83. Get the row index of the minimum value in column ‘Score’.
python
idx_min = df['Score'].idxmin()
print(idx_min)
Also try it: 100 SQL practice problems with solutions
84. Use where with a condition to replace values in a DataFrame.
python
df2 = df.where(df > 0, other=0)  # keep values > 0; set the rest to 0
85. Use mask to replace values where condition is True.
python
df2 = df.mask(df < 0, 0) # same as above
86. Convert a DataFrame with a date column to a time series where date is the index.
python
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)
87. Use at_time to select rows at a specific time (e.g., 09:00:00) from a datetime index.
python
subset = df.at_time('09:00:00')
88. Use between_time to select rows between two times (e.g., 09:00 and 17:00).
python
subset = df.between_time('09:00', '17:00')
89. Use first to get the first 3 days of a time series (requires sorted index).
python
first_3d = df.first('3D')  # note: DataFrame.first is deprecated since pandas 2.1
90. Use last to get the last 2 days of a time series.
python
last_2d = df.last('2D')  # note: DataFrame.last is deprecated since pandas 2.1
91. Create a date range with business days only (excluding weekends).
python
bday_range = pd.date_range('2024-01-01', periods=10, freq='B')
print(bday_range)
92. Use to_period to convert datetime index to monthly period.
python
df_period = df.to_period('M')
93. Use asfreq to change frequency of time series to monthly and fill forward.
python
df_monthly = df.asfreq('M', method='ffill')  # in pandas >= 2.2, use 'ME' for month-end frequency
94. Compute the expanding (cumulative) max of a column.
python
df['ExpandingMax'] = df['Value'].expanding().max()
95. Compute the exponential weighted moving average of a column (span=5).
python
df['EWMA'] = df['Value'].ewm(span=5, adjust=False).mean()
96. Use cut to bin a numeric column into custom intervals and count.
python
bins = [0, 50, 100, 150]
df['Bin'] = pd.cut(df['Score'], bins)
counts = df['Bin'].value_counts()
print(counts)
97. Use pd.testing.assert_frame_equal to check if two DataFrames are identical.
python
pd.testing.assert_frame_equal(df1, df2, check_dtype=True)
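A quick sketch of how this behaves in practice (with hypothetical toy frames): the function returns None when the frames match and raises AssertionError when they differ, even if only the dtypes disagree.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2]})
df2 = pd.DataFrame({'A': [1, 2]})
df3 = pd.DataFrame({'A': [1.0, 2.0]})  # same values, different dtype

pd.testing.assert_frame_equal(df1, df2)  # matching frames: passes silently

try:
    pd.testing.assert_frame_equal(df1, df3, check_dtype=True)
except AssertionError:
    print("DataFrames differ (dtype mismatch)")
```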
98. Use pd.option_context to temporarily change display options (e.g., max rows).
python
with pd.option_context('display.max_rows', 10):
print(df)
99. Use memory_usage to get memory usage of each column in bytes.
python
mem = df.memory_usage(deep=True)
print(mem)
100. Create a DataFrame from a list of dictionaries and then melt it into a long format.
python
df = pd.DataFrame([{'Name':'A','Math':90,'Science':85}, {'Name':'B','Math':80,'Science':88}])
melted = pd.melt(df, id_vars=['Name'], var_name='Subject', value_name='Score')
print(melted)
Final Thought
Pandas is a tool that rewards practice. Every .groupby(), every melt and pivot, every null you hunt down adds another layer of confidence. The messy datasets that once felt intimidating will keep revealing their stories to you.
Keep this page near. Come back whenever you hit a wall in your own data project. Challenge yourself to clean a real dataset from Kaggle, or help a friend who’s just starting out. Teaching is where mastery deepens.
You’re no longer just someone who “uses Python for data.” You’re a person who can wrangle, analyze, and truly see what the numbers mean. That’s powerful. Onward — your next insight is just a df.head() away. Happy coding!