Skip to content

Latest commit

 

History

History
196 lines (128 loc) · 5.39 KB

File metadata and controls

196 lines (128 loc) · 5.39 KB

Pandas Series: One-Dimensional Structure

In Pandas, a Series is a fundamental data structure that represents a one-dimensional labeled array. Think of it as a single column in a spreadsheet or a collection of elements along a single axis. While seemingly simple, Series offer a robust foundation for various data analysis tasks.

Key Characteristics:

  • Labeled Data: Each element in a Series has a corresponding label (index) for efficient data access and manipulation.
  • Data Types: Series can hold data of various types, including integers, floats, strings, Python objects, and more.
  • Creation: You can create a Series from various sources like lists, NumPy arrays, dictionaries, or even another Series.

Creating a Series:

Here are several ways to create a Series:

1. From a List:

import pandas as pd

data = [10, 20, 30, 40]
my_series = pd.Series(data)
print(my_series)

2. From a NumPy Array:

import numpy as np

data = np.array([10, 20, 30, 40])
my_series = pd.Series(data)
print(my_series)

3. From a Dictionary (Index Becomes Label):

data = {"name": "Alice", "age": 30, "city": "New York"}
my_series = pd.Series(data)
print(my_series)

When to Use Series?

  • Representing one-dimensional data (e.g., temperature readings, list of names).
  • Creating a single column for a DataFrame.
  • Performing data operations on a single dimension.

Series vs. DataFrames:

  • Series: One-dimensional, single column-like structure.
  • DataFrame: Two-dimensional, tabular data structure with rows and columns.

Series as Building Blocks:

Series are often used as the building blocks for DataFrames. You can combine multiple Series with different labels into a single DataFrame for creating comprehensive datasets.

Series Operations:

Pandas Series offers a rich set of operations you can perform on one-dimensional data. Here's a breakdown of some essential operations with examples:

  1. Accessing Elements:

  • Use the label (index) to retrieve a specific value:
import pandas as pd

data = {"name": "Alice", "age": 30, "city": "New York"}
my_series = pd.Series(data)

name = my_series["name"]
print(name)  # Output: Alice
  1. Slicing:

  • Extract a sub-Series using familiar slicing syntax:
sub_series = my_series[1:2]  # Extracts elements at indices 1 (inclusive) and 2 (exclusive)
print(sub_series)
  1. Selection:

  • Use boolean indexing to filter elements based on conditions:
import pandas as pd

# Creating a sample Series
data = {"name": "Alice", "age": 30, "city": "New York"}
my_series = pd.Series(data)

print("Original Series:")
print(my_series)

# Convert elements to numeric, coercing errors to NaN
numeric_series = pd.to_numeric(my_series, errors='coerce')

# Filter out NaN values (non-numeric elements)
numeric_only_series = numeric_series.dropna()

# Apply the condition to filter elements greater than 25
filtered_series = numeric_only_series[numeric_only_series > 25]

print("\nFiltered Series (elements greater than 25):")
print(filtered_series)
  1. Arithmetic Operations:

  • Perform basic arithmetic operations on the entire Series or between Series:
# Convert elements to numeric, coercing errors to NaN
my_series = pd.to_numeric(my_series, errors='coerce')

# Add a constant value
added_series = my_series + 5
print(added_series)

# Multiply two Series with compatible indices
data2 = {"name": "Bob", "age": 20, "city": "London"}
series2 = pd.Series(data2)
multiplied_series = my_series * series2
print(multiplied_series)  # Note: Only elements with matching indices are multiplied
  1. Statistical Methods:

  • Calculate descriptive statistics like mean, median, standard deviation:
print(my_series.mean())
print(my_series.median())
print(my_series.std())
  1. Sorting:

  • Re-arrange the Series based on its values:
sorted_series = my_series.sort_values(ascending=False)  # Sorts by age in descending order
print(sorted_series)
  1. String Methods (for String Data):

  • Leverage built-in string methods for text manipulation (assuming string data type):
# Convert all city names to uppercase
my_series["city"] = my_series["city"].str.upper()
print(my_series)
  1. Missing Value Handling:

  • Replace missing values (NaN) with specific values or methods:
import pandas as pd

# Creating a sample Series with a missing value
data = {"name": "Alice", "age": None, "city": "New York"}  # Notice 'age' is None
my_series = pd.Series(data)

print("Original Series:")
print(my_series)

# Fill missing values in the Series
my_series = my_series.fillna({"age": 25})

print("\nSeries after filling missing values:")
print(my_series)

Beyond the Basics:

  • Explore functions like unique, value_counts, and get for more advanced data manipulation.
  • Remember that some operations might have limitations based on the data type of your Series.

These examples provide a glimpse into the power of Pandas Series operations. As you delve deeper, you'll discover even more functionalities to transform and analyze your one-dimensional data effectively!


Reading Materials: