100 Pandas practice problems with solutions

Reading about DataFrames feels comfortable. Cleaning real data still feels messy.
That’s the exact gap these 100 Pandas practice problems are designed to close—hands-on, messy, and incredibly rewarding.

What’s inside:

  • 100 carefully crafted challenges, from basic Series and DataFrame operations to groupby, merges, pivots, and time series.
  • Clear, step-by-step solutions that never just show the answer—they explain the why behind every transformation.
  • Problems rooted in everyday data tasks: filtering, handling missing values, cleaning strings, and reshaping data for analysis.

How to practice for real growth:

  1. Read the problem written in plain, jargon-free language.
  2. Try solving it yourself — struggle a little, that’s where learning sticks.
  3. Check the solution and truly understand the logic.
  4. Repeat until data manipulation feels less like a chore and more like a superpower.

Why does deliberate practice matter so much? Because data analysis isn’t about memorizing functions — it’s about building intuition. Every problem you solve sharpens your ability to spot patterns, fix messy datasets, and write clean, efficient Pandas code. That’s what turns a Python user into a confident data analyst — and what stands out in interviews and real projects.

Whether you’re falling in love with data or preparing for your next big opportunity, these exercises will make you feel capable, prepared, and genuinely excited about what you can do. Pick a problem, open a notebook, and let your “I can really do this!” moment arrive.

Also try it: 100 NumPy practice problems with solutions

1. Create a Pandas Series from a list [1, 2, 3, 4, 5] and print it.

python

import pandas as pd
s = pd.Series([1, 2, 3, 4, 5])
print(s)

2. Create a Pandas DataFrame from a dictionary: {'Name': ['Alice', 'Bob'], 'Age': [25, 30]}.

python

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
print(df)

3. Read a CSV file named ‘data.csv’ and display the first 5 rows.

python

df = pd.read_csv('data.csv')
print(df.head())

4. Write the DataFrame df to a CSV file named ‘output.csv’ without the index.

python

df.to_csv('output.csv', index=False)

5. Get the number of rows and columns of a DataFrame df.

python

rows, cols = df.shape
print(f"Rows: {rows}, Columns: {cols}")

6. Display the column names of a DataFrame df.

python

print(df.columns)

7. Check the data types of each column in a DataFrame df.

python

print(df.dtypes)

8. Get a quick statistical summary of a DataFrame df (numeric columns only).

python

print(df.describe())

9. Select a single column ‘Age’ from a DataFrame df as a Series.

python

age_series = df['Age']
print(age_series)

10. Select multiple columns ['Name', 'Age'] from a DataFrame df.

python

subset = df[['Name', 'Age']]
print(subset)

11. Filter rows where the ‘Age’ column is greater than 25.

python

filtered = df[df['Age'] > 25]
print(filtered)

12. Add a new column ‘City’ with constant value ‘New York’ to a DataFrame df.

python

df['City'] = 'New York'

13. Update the ‘Age’ column by adding 1 to every value.

python

df['Age'] = df['Age'] + 1

14. Drop a column named ‘City’ from a DataFrame df.

python

df = df.drop('City', axis=1)

15. Drop rows with any missing values (NaN) in a DataFrame df.

python

df = df.dropna()

16. Fill missing values in column ‘Age’ with the mean of that column.

python

df['Age'] = df['Age'].fillna(df['Age'].mean())  # assign back; inplace fillna on a single column is deprecated in pandas 2.x

17. Rename the column ‘Name’ to ‘FullName’ in a DataFrame df.

python

df.rename(columns={'Name': 'FullName'}, inplace=True)

18. Sort a DataFrame df by the ‘Age’ column in descending order.

python

df_sorted = df.sort_values('Age', ascending=False)

19. Group a DataFrame df by column ‘City’ and compute the mean of ‘Age’.

python

grouped = df.groupby('City')['Age'].mean()
print(grouped)

20. Create a new DataFrame from two Series: names = pd.Series(['A','B']) and scores = pd.Series([90,85]).

python

df = pd.DataFrame({'Name': names, 'Score': scores})
print(df)

21. Apply a lambda function that squares each element in a column ‘Value’.

python

df['Squared'] = df['Value'].apply(lambda x: x**2)

22. Use map to change category codes in a column: replace {'M': 'Male', 'F': 'Female'}.

python

df['Gender'] = df['Gender'].map({'M': 'Male', 'F': 'Female'})
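Worth noting: map() returns NaN for any code that is missing from the dictionary. If unmapped values should survive, fall back to the originals with fillna. A small sketch with toy data (the 'X' code is a hypothetical unmapped value):

```python
import pandas as pd

# map() returns NaN for codes missing from the dict
s = pd.Series(['M', 'F', 'X'])
mapped = s.map({'M': 'Male', 'F': 'Female'})
print(mapped.tolist())  # ['Male', 'Female', nan]

# keep unmapped codes as-is by falling back to the original values
kept = mapped.fillna(s)
print(kept.tolist())  # ['Male', 'Female', 'X']
```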

23. Create a DataFrame with dates as index from ‘2024-01-01’ to ‘2024-01-05’ and random values.

python

import numpy as np
dates = pd.date_range('2024-01-01', periods=5)
df = pd.DataFrame(np.random.randn(5, 2), index=dates, columns=['A', 'B'])
print(df)

24. Select rows between two dates ‘2024-01-02’ and ‘2024-01-04’ from a DataFrame with datetime index.

python

subset = df.loc['2024-01-02':'2024-01-04']
print(subset)
print(subset)

25. Resample a time series DataFrame to weekly frequency and compute the mean.

python

df_weekly = df.resample('W').mean()

26. Compute the rolling (moving) average of a column with window size 3.

python

df['rolling_avg'] = df['Value'].rolling(window=3).mean()

27. Create a pivot table from DataFrame df with index='Category', columns='Year', values='Sales', aggfunc='sum'.

python

pivot = pd.pivot_table(df, values='Sales', index='Category', columns='Year', aggfunc='sum')
print(pivot)

28. Merge two DataFrames df1 and df2 on a common column ‘ID’ with inner join.

python

merged = pd.merge(df1, df2, on='ID', how='inner')

29. Concatenate two DataFrames df1 and df2 vertically (stack rows).

python

concat = pd.concat([df1, df2], axis=0, ignore_index=True)

30. Concatenate two DataFrames df1 and df2 horizontally (side by side).

python

concat = pd.concat([df1, df2], axis=1)

31. Find the correlation matrix of a DataFrame df with numeric columns.

python

corr = df.corr()
print(corr)

32. Get the number of unique values in a column ‘Product’.

python

n_unique = df['Product'].nunique()
print(n_unique)

33. Count the frequency of each unique value in column ‘Category’.

python

freq = df['Category'].value_counts()
print(freq)

34. Use isin to filter rows where column 'Country' is in a list ['USA', 'Canada'].

python

filtered = df[df['Country'].isin(['USA', 'Canada'])]

35. Use between to filter rows where ‘Price’ is between 10 and 20.

python

filtered = df[df['Price'].between(10, 20)]

36. Create a DataFrame from a dictionary of lists and set a specific column as index.

python

df = pd.DataFrame({'ID': [1,2,3], 'Name': ['A','B','C']})
df.set_index('ID', inplace=True)

37. Reset the index of a DataFrame to default integer index.

python

df.reset_index(drop=True, inplace=True)

38. Use loc to select rows with index labels 2 to 4 and columns ‘A’ and ‘B’.

python

subset = df.loc[2:4, ['A', 'B']]

39. Use iloc to select the first 3 rows and the first 2 columns.

python

subset = df.iloc[:3, :2]

40. Create a copy of a DataFrame to avoid chained assignment issues.

python

df_copy = df.copy()

41. Replace all occurrences of 0 in a column ‘Quantity’ with NaN.

python

df['Quantity'] = df['Quantity'].replace(0, np.nan)

42. Use where to keep values >0 in a column and set others to NaN.

python

df['Positive'] = df['Value'].where(df['Value'] > 0, np.nan)

43. Create a new column ‘AgeGroup’ based on ages: ‘Child’ (<18), ‘Adult’ (18-64), ‘Senior’ (>=65).

python

df['AgeGroup'] = pd.cut(df['Age'], bins=[0,17,64,100], labels=['Child','Adult','Senior'])
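A subtlety with pd.cut worth knowing: with the default right=True, the bins above form the intervals (0, 17], (17, 64], (64, 100], so an age of exactly 0 falls outside every bin and becomes NaN. Passing include_lowest=True closes the first interval. A sketch with toy ages:

```python
import pandas as pd

ages = pd.Series([0, 17, 18, 64, 65])
bins = [0, 17, 64, 100]
labels = ['Child', 'Adult', 'Senior']

# Default right=True: intervals are (0, 17], (17, 64], (64, 100],
# so age 0 is not captured by any bin
default = pd.cut(ages, bins=bins, labels=labels)
print(default.tolist())  # [nan, 'Child', 'Adult', 'Adult', 'Senior']

# include_lowest=True closes the first interval to [0, 17]
closed = pd.cut(ages, bins=bins, labels=labels, include_lowest=True)
print(closed.tolist())   # ['Child', 'Child', 'Adult', 'Adult', 'Senior']
```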

44. Use qcut to divide a column into 4 equal‑sized bins (quartiles).

python

df['Quartile'] = pd.qcut(df['Score'], q=4, labels=['Q1','Q2','Q3','Q4'])

45. Convert a string column ‘Date’ to datetime type.

python

df['Date'] = pd.to_datetime(df['Date'])

46. Extract the month name from a datetime column ‘Date’.

python

df['MonthName'] = df['Date'].dt.month_name()

47. Extract the day of week from a datetime column ‘Date’.

python

df['Weekday'] = df['Date'].dt.day_name()

48. Compute the difference in days between two datetime columns ‘Start’ and ‘End’.

python

df['Duration'] = (df['End'] - df['Start']).dt.days

49. Shift a column ‘Value’ by 2 rows forward (lag).

python

df['Shifted'] = df['Value'].shift(2)

50. Compute the cumulative sum of a column ‘Sales’.

python

df['CumSales'] = df['Sales'].cumsum()

51. Compute the rank of values in a column ‘Score’ (dense ranking).

python

df['Rank'] = df['Score'].rank(method='dense')
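The method argument matters when scores tie. A small sketch contrasting 'dense' with 'min' ranking on toy data (ranked from highest to lowest for illustration):

```python
import pandas as pd

# Toy scores with a tie at 90
s = pd.Series([100, 90, 90, 80])

# 'dense': tied values share a rank, and the next rank is not skipped
print(s.rank(method='dense', ascending=False).tolist())  # [1.0, 2.0, 2.0, 3.0]

# 'min': tied values share a rank, but the following rank is skipped
print(s.rank(method='min', ascending=False).tolist())    # [1.0, 2.0, 2.0, 4.0]
```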

52. Use sample to randomly select 5 rows from a DataFrame.

python

sample_df = df.sample(n=5, random_state=42)

53. Use nlargest to get the top 3 rows based on ‘Value’.

python

top3 = df.nlargest(3, 'Value')

54. Use nsmallest to get the bottom 2 rows based on ‘Value’.

python

bottom2 = df.nsmallest(2, 'Value')

55. Drop duplicate rows based on all columns.

python

df = df.drop_duplicates()

56. Drop duplicate rows based on a subset of columns ['Name', 'City'], keeping the first occurrence.

python

df = df.drop_duplicates(subset=['Name', 'City'], keep='first')

57. Get the unique values of a column as a NumPy array.

python

unique_vals = df['Product'].unique()

58. Convert a numeric column to integer type (e.g., float to int).

python

df['Quantity'] = df['Quantity'].astype(int)

59. Convert a column to categorical type to save memory.

python

df['Status'] = df['Status'].astype('category')

60. Use query to filter rows where ‘Age’ > 25 and ‘City’ == ‘London’.

python

filtered = df.query("Age > 25 and City == 'London'")

61. Use eval to create a new column ‘Total’ as ‘Price * Quantity’.

python

df.eval('Total = Price * Quantity', inplace=True)

62. Compute the percentage change between consecutive rows in a column ‘Value’.

python

df['PctChange'] = df['Value'].pct_change() * 100

63. Compute the cumulative product of a column ‘Factor’.

python

df['CumProd'] = df['Factor'].cumprod()

64. Create a new DataFrame by transposing an existing DataFrame (rows become columns).

python

df_transposed = df.T

65. Convert a DataFrame to a dictionary (records format).

python

records = df.to_dict('records')

66. Convert a DataFrame to a NumPy array (excluding index).

python

array = df.to_numpy()

67. Apply a function to multiple columns using apply with axis=1.

python

df['Sum'] = df.apply(lambda row: row['A'] + row['B'], axis=1)

68. Use groupby and agg to compute both mean and max of ‘Value’ per ‘Category’.

python

result = df.groupby('Category')['Value'].agg(['mean', 'max'])
print(result)

69. Use transform to add group mean as a new column without collapsing the DataFrame.

python

df['GroupMean'] = df.groupby('Category')['Value'].transform('mean')
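Because transform keeps the original row count, the group statistic can be combined directly with row-level values, e.g. to compute each row's deviation from its group mean. A sketch with toy data:

```python
import pandas as pd

# transform broadcasts the group mean back to every row, so it can be
# combined with the original column (toy data for illustration)
toy = pd.DataFrame({'Category': ['A', 'A', 'B', 'B'],
                    'Value': [10, 20, 30, 50]})
toy['GroupMean'] = toy.groupby('Category')['Value'].transform('mean')
toy['Deviation'] = toy['Value'] - toy['GroupMean']
print(toy)
```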

70. Use filter to keep groups in a groupby where group sum of ‘Value’ > 100.

python

filtered = df.groupby('Category').filter(lambda g: g['Value'].sum() > 100)

71. Create a DataFrame with MultiIndex from two columns ‘Year’ and ‘Month’.

python

df.set_index(['Year', 'Month'], inplace=True)

72. Swap the levels of a MultiIndex DataFrame.

python

df = df.swaplevel(0, 1, axis=0)  # swaplevel has no inplace parameter; assign the result back

73. Use stack to convert columns to rows (pivot long format).

python

stacked = df.stack()

74. Use unstack to convert rows to columns (pivot wide format).

python

unstacked = df.unstack()

75. Read a JSON file ‘data.json’ into a DataFrame.

python

df = pd.read_json('data.json')

76. Export a DataFrame df to a JSON file ‘output.json’.

python

df.to_json('output.json', orient='records')

77. Read an Excel file ‘data.xlsx’ (sheet named ‘Sheet1’) into a DataFrame.

python

df = pd.read_excel('data.xlsx', sheet_name='Sheet1')

78. Write a DataFrame to an Excel file ‘output.xlsx’ without the index.

python

df.to_excel('output.xlsx', index=False)

79. Read a SQL query from a SQLite database into a DataFrame (example with sqlite3).

python

import sqlite3
conn = sqlite3.connect('database.db')
df = pd.read_sql_query("SELECT * FROM my_table", conn)  # 'my_table' is a placeholder ("table" itself is a reserved SQL word)
conn.close()

80. Write a DataFrame to a SQLite table (replace if exists).

python

import sqlite3
conn = sqlite3.connect('database.db')
df.to_sql('my_table', conn, if_exists='replace', index=False)
conn.close()
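Problems 79 and 80 can be verified end to end with an in-memory database (':memory:'), which needs no file on disk. A round-trip sketch, with 'my_table' as an arbitrary example name:

```python
import sqlite3
import pandas as pd

# Write a DataFrame to an in-memory SQLite database and read it back
conn = sqlite3.connect(':memory:')
df = pd.DataFrame({'ID': [1, 2], 'Name': ['A', 'B']})
df.to_sql('my_table', conn, if_exists='replace', index=False)
df_back = pd.read_sql_query('SELECT * FROM my_table', conn)
conn.close()
print(df_back)
```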

81. Detect duplicate rows (boolean mask) in a DataFrame.

python

dup_mask = df.duplicated()
print(dup_mask)

82. Get the row index of the maximum value in column ‘Score’.

python

idx_max = df['Score'].idxmax()
print(idx_max)

83. Get the row index of the minimum value in column ‘Score’.

python

idx_min = df['Score'].idxmin()
print(idx_min)

Also try it: 100 SQL practice problems with solutions

84. Use where with a condition to replace values in a DataFrame.

python

df2 = df.where(df > 0, other=0)  # keep values > 0, set everything else (zeros and negatives) to 0

85. Use mask to replace values where condition is True.

python

df2 = df.mask(df < 0, 0)  # replace strictly negative values with 0 (unlike above, zeros are kept)

86. Convert a DataFrame with a date column to a time series where date is the index.

python

df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)

87. Use at_time to select rows at a specific time (e.g., 09:00:00) from a datetime index.

python

subset = df.at_time('09:00:00')

88. Use between_time to select rows between two times (e.g., 09:00 and 17:00).

python

subset = df.between_time('09:00', '17:00')

89. Use first to get the first 3 days of a time series (requires sorted index).

python

first_3d = df.first('3D')  # note: DataFrame.first is deprecated since pandas 2.1

90. Use last to get the last 2 calendar days of a time series.

python

last_2d = df.last('2D')  # note: DataFrame.last is deprecated since pandas 2.1

91. Create a date range with business days only (excluding weekends).

python

bday_range = pd.date_range('2024-01-01', periods=10, freq='B')
print(bday_range)

92. Use to_period to convert datetime index to monthly period.

python

df_period = df.to_period('M')

93. Use asfreq to change frequency of time series to monthly and fill forward.

python

df_monthly = df.asfreq('ME', method='ffill')  # 'ME' = month end; use 'M' on pandas < 2.2

94. Compute the expanding (cumulative) max of a column.

python

df['ExpandingMax'] = df['Value'].expanding().max()

95. Compute the exponential weighted moving average of a column (span=5).

python

df['EWMA'] = df['Value'].ewm(span=5, adjust=False).mean()
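With adjust=False, the EWMA follows the recursion y_t = (1 - alpha) * y_{t-1} + alpha * x_t, where alpha = 2 / (span + 1). A sketch that checks the first few values by hand on toy data:

```python
import pandas as pd

# Verify the adjust=False recursion on a tiny series
s = pd.Series([1.0, 2.0, 3.0])
ewma = s.ewm(span=5, adjust=False).mean()

alpha = 2 / (5 + 1)          # span=5 gives alpha = 1/3
manual = [s.iloc[0]]          # y_0 = x_0
for x in s.iloc[1:]:
    manual.append((1 - alpha) * manual[-1] + alpha * x)

print(ewma.tolist())
print(manual)                 # should match ewma element for element
```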

96. Use cut to bin a numeric column into custom intervals and count.

python

bins = [0, 50, 100, 150]
df['Bin'] = pd.cut(df['Score'], bins)
counts = df['Bin'].value_counts()
print(counts)

97. Use pd.testing.assert_frame_equal to check if two DataFrames are identical.

python

pd.testing.assert_frame_equal(df1, df2, check_dtype=True)
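Unlike df1.equals(df2), assert_frame_equal raises AssertionError with a diagnostic message instead of returning a boolean, which makes it the natural choice inside test suites. A sketch with toy frames:

```python
import pandas as pd

df_a = pd.DataFrame({'A': [1, 2]})
df_b = pd.DataFrame({'A': [1, 2]})
df_c = pd.DataFrame({'A': [1, 3]})

# Identical frames: passes silently (returns None)
pd.testing.assert_frame_equal(df_a, df_b)

# Differing frames: raises AssertionError
try:
    pd.testing.assert_frame_equal(df_a, df_c)
    frames_differ = False
except AssertionError:
    frames_differ = True
print(frames_differ)  # True
```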

98. Use pd.option_context to temporarily change display options (e.g., max rows).

python

with pd.option_context('display.max_rows', 10):
    print(df)

99. Use memory_usage to get memory usage of each column in bytes.

python

mem = df.memory_usage(deep=True)
print(mem)

100. Create a DataFrame from a list of dictionaries and then melt it into a long format.

python

df = pd.DataFrame([{'Name':'A','Math':90,'Science':85}, {'Name':'B','Math':80,'Science':88}])
melted = pd.melt(df, id_vars=['Name'], var_name='Subject', value_name='Score')
print(melted)

Final Thought

Pandas is a tool that rewards practice. Every .groupby(), every melt and pivot, every null you hunt down adds another layer of confidence. The messy datasets that once felt intimidating will keep revealing their stories to you.

Keep this page near. Come back whenever you hit a wall in your own data project. Challenge yourself to clean a real dataset from Kaggle, or help a friend who’s just starting out. Teaching is where mastery deepens.

You’re no longer just someone who “uses Python for data.” You’re a person who can wrangle, analyze, and truly see what the numbers mean. That’s powerful. Onward — your next insight is just a df.head() away. Happy coding!
