🐍 Python Tutorial: Data Science Foundations

Data science with Python starts with loading, manipulating, analyzing, and visualizing data. This tutorial covers three essential toolsets: NumPy for numerical computing, Pandas for data wrangling, and Matplotlib with Seaborn for visual storytelling with data.


1. NumPy

NumPy (Numerical Python) allows you to work efficiently with large arrays and matrices of numeric data. It supports broadcasting, vectorization, and a host of statistical and algebraic operations.

import numpy as np

# Create a 1D array
arr = np.array([1, 2, 3, 4])
print(arr * 2)  # [2 4 6 8]

# 2D array and basic stats
matrix = np.array([[1, 2], [3, 4]])
print(matrix.mean())  # 2.5
print(matrix.shape)   # (2, 2)

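NumPy arrays store elements of a single data type in one, typically contiguous, block of memory, so they are more compact than Python lists. Operations on whole arrays run in compiled code rather than in a Python loop, which makes them much faster for large-scale numerical computations.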
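
The broadcasting, vectorization, and algebraic operations mentioned above can be sketched as follows. The sample values and variable names are invented for illustration.

```python
import numpy as np

# Hypothetical data: 3 samples (rows) by 2 features (columns).
samples = np.array([[1.0, 10.0],
                    [2.0, 20.0],
                    [3.0, 30.0]])

# Vectorization: one expression operates on every element, no Python loop.
squared = samples ** 2

# Broadcasting: the (2,) vector of column means is stretched across all 3 rows.
col_means = samples.mean(axis=0)   # shape (2,)
centered = samples - col_means     # shape (3, 2)

# An algebraic operation: matrix product of the transposed data with itself.
gram = samples.T @ samples         # shape (2, 2)

print(col_means)  # [ 2. 20.]
print(centered)
print(gram)
```

Subtracting `col_means` works without reshaping because NumPy aligns shapes from the trailing dimension, so `(3, 2)` and `(2,)` are compatible.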


2. Pandas

Pandas makes data cleaning and manipulation easy through its two core data structures: Series (1D) and DataFrame (2D, like an Excel table). It supports powerful filtering, aggregation, merging, and group operations.

import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)

# Accessing columns and filtering
print(df['Name'])
print(df[df['Age'] > 28])
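
Aggregation, grouping, and merging can be sketched in the same style. The `Team` and `City` columns and all values here are hypothetical, added only to give the operations something to work on.

```python
import pandas as pd

# Hypothetical data, invented for illustration.
people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Dana"],
    "Age": [25, 30, 35, 40],
    "Team": ["A", "B", "A", "B"],
})
teams = pd.DataFrame({
    "Team": ["A", "B"],
    "City": ["Oslo", "Lima"],
})

# Group operation plus aggregation: mean age per team.
mean_age = people.groupby("Team")["Age"].mean()

# Merging: attach each person's city through the shared "Team" column.
merged = people.merge(teams, on="Team", how="left")

print(mean_age)
print(merged)
```

A left merge keeps every row of `people`, which is usually the safer default when the lookup table might be missing a key.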

Pandas reads and writes CSV files, SQL tables, JSON, and Excel spreadsheets through functions such as read_csv, read_sql, read_json, and read_excel. It is built on NumPy, so DataFrame columns convert readily to arrays for numerical tasks.
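
As a minimal sketch of file input and NumPy interoperability, the example below reads CSV text from memory rather than from disk, so it runs without any external file. The CSV content is invented for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content held in memory so no file on disk is needed.
csv_text = "Name,Age\nAlice,25\nBob,30\nCharlie,35\n"

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# A column converts to a NumPy array for numerical work.
ages = df["Age"].to_numpy()
print(ages.mean())  # 30.0

# Round-trip through JSON, one record per row.
json_text = df.to_json(orient="records")
restored = pd.read_json(io.StringIO(json_text))
print(restored)
```

With a real file, the `io.StringIO(...)` argument would simply be replaced by the file's path.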


3. Data Visualization

Visualization helps communicate insights clearly. Matplotlib gives low-level control over every element of a chart, while Seaborn simplifies complex visualizations with fewer lines of code.

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Line plot with Matplotlib
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
plt.plot(x, y)
plt.title("Line Chart")
plt.xlabel("X Axis")
plt.ylabel("Y Axis")
plt.grid(True)
plt.show()

# Seaborn histogram with a kernel density estimate (KDE) overlay
ages = pd.Series([25, 30, 22, 35, 29, 41])
sns.histplot(ages, kde=True)
plt.title("Age Distribution")
plt.show()

These visual tools are essential in Exploratory Data Analysis (EDA), helping identify patterns, outliers, and relationships between variables.
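
To illustrate how plots surface outliers and relationships during EDA, here is a small sketch using Matplotlib only. The data, the `Income` column, the output filename, and the choice of a non-interactive backend are all assumptions made for the example.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data, invented for illustration; the age 95 is a deliberate outlier.
df = pd.DataFrame({
    "Age":    [25, 30, 22, 35, 29, 41, 95],
    "Income": [30, 42, 28, 50, 40, 61, 45],
})

fig, (ax_box, ax_scatter) = plt.subplots(1, 2, figsize=(9, 4))

# Box plot: values beyond the whiskers are drawn as individual outlier points.
bp = ax_box.boxplot(df["Age"].to_numpy())
ax_box.set_title("Age (box plot)")
ax_box.set_ylabel("Age")

# Scatter plot: shows how two variables vary together.
ax_scatter.scatter(df["Age"], df["Income"])
ax_scatter.set_title("Age vs. Income")
ax_scatter.set_xlabel("Age")
ax_scatter.set_ylabel("Income")

fig.tight_layout()
fig.savefig("eda_sketch.png")  # hypothetical output filename
```

In an interactive session, dropping the `matplotlib.use("Agg")` line and calling `plt.show()` instead of `fig.savefig(...)` displays the figure directly.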

