Step into a world where data transforms into organized knowledge. This is the Grand Library of DataFrames, a place where we'll learn to manage, explore, and understand our data using the powerful magic of Pandas in Python.
Think of a DataFrame as a grand catalog within this library, filled with structured information. Let's begin our magical journey by opening the gates to this library.
Every library needs a catalog to keep track of its treasures. In our Grand Library, we create catalogs (DataFrames) from various sources, like enchanted scrolls (dictionaries), lists of artifacts, or even ancient texts (CSV files).
Let's create a simple catalog of magical artifacts:
Now that we have our magical catalog, let's learn how to browse through its entries.
Peeking at the first or last entries:
You can quickly peek at the first few artifacts using the .head() spell or the last few with the .tail() spell.
Understanding Your Catalog's Secrets:
To truly understand the nature of your catalog, you can use special incantations.
The .info() spell reveals a concise summary of the catalog, including the type of magic (data type) in each column and how many entries are not empty. The .describe() spell conjures up descriptive statistics of the numerical aspects of your artifacts, like the average power level.
Focusing on Specific Columns:
Sometimes you only need to focus on specific types of information in your catalog, like just the 'Artifact Name' or 'Power Level'. You can select a single column by calling its name in square brackets [] or multiple columns by listing their names.
Pandas is a powerful open-source library for data analysis and manipulation in Python. Its core data structure is the DataFrame, which is a two-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or a SQL table.
Let's start by importing the pandas library.
import pandas as pd
print("The gates to the Grand Library of DataFrames are now open!")
The gates to the Grand Library of DataFrames are now open!
You can create a DataFrame from various data sources, such as dictionaries, lists, or CSV files. Here's an example of creating a DataFrame from a dictionary:
data = {'Artifact Name': ['Phoenix Feather', 'Dragon Scale', 'Unicorn Horn', 'Griffin Claw'],
'Power Level': [10, 9, 8, 7],
'Location': ['Forbidden Forest', 'Dragon Mountains', 'Mystical Meadow', 'Sky Peaks']}
magic_artifacts_df = pd.DataFrame(data)
print("Behold! Your first magical catalog (DataFrame):")
display(magic_artifacts_df)
Behold! Your first magical catalog (DataFrame):
| | Artifact Name | Power Level | Location |
|---|---|---|---|
| 0 | Phoenix Feather | 10 | Forbidden Forest |
| 1 | Dragon Scale | 9 | Dragon Mountains |
| 2 | Unicorn Horn | 8 | Mystical Meadow |
| 3 | Griffin Claw | 7 | Sky Peaks |
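Dictionaries of columns are only one way in. As a minimal sketch of the "lists of artifacts" route, the same kind of catalog can be built row by row. The names `rows`, `catalog_from_rows`, and `catalog_from_lists` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Each dictionary is one row (one artifact); its keys become the column names.
rows = [
    {'Artifact Name': 'Phoenix Feather', 'Power Level': 10, 'Location': 'Forbidden Forest'},
    {'Artifact Name': 'Dragon Scale', 'Power Level': 9, 'Location': 'Dragon Mountains'},
]
catalog_from_rows = pd.DataFrame(rows)

# A list of lists works too, as long as the column names are supplied explicitly.
catalog_from_lists = pd.DataFrame(
    [['Unicorn Horn', 8, 'Mystical Meadow'], ['Griffin Claw', 7, 'Sky Peaks']],
    columns=['Artifact Name', 'Power Level', 'Location'],
)

print(catalog_from_rows)
print(catalog_from_lists)
```

Both forms produce an ordinary DataFrame, so every spell in this guide applies to them unchanged.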
Once you have a DataFrame, you can perform various operations on it.
Viewing data:
You can view the first few rows using .head() and the last few rows using .tail().
print("Peeking at the first 2 artifacts:")
display(magic_artifacts_df.head(2))
print("\nLooking at the last artifact:")
display(magic_artifacts_df.tail(1))
Peeking at the first 2 artifacts:
| | Artifact Name | Power Level | Location |
|---|---|---|---|
| 0 | Phoenix Feather | 10 | Forbidden Forest |
| 1 | Dragon Scale | 9 | Dragon Mountains |
Looking at the last artifact:
| | Artifact Name | Power Level | Location |
|---|---|---|---|
| 3 | Griffin Claw | 7 | Sky Peaks |
Getting information about the DataFrame:
.info() provides a concise summary of the DataFrame, including the data types of each column and the number of non-null values. .describe() generates descriptive statistics of the numerical columns.
print("Unveiling the catalog's information:")
magic_artifacts_df.info()  # .info() prints its summary itself and returns None, so it is not wrapped in display()
print("\nDescribing the magical properties (numerical columns):")
display(magic_artifacts_df.describe())
Unveiling the catalog's information:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 3 columns):
 #   Column         Non-Null Count  Dtype
---  ------         --------------  -----
 0   Artifact Name  4 non-null      object
 1   Power Level    4 non-null      int64
 2   Location       4 non-null      object
dtypes: int64(1), object(2)
memory usage: 228.0+ bytes
Describing the magical properties (numerical columns):
| | Power Level |
|---|---|
| count | 4.000000 |
| mean | 8.500000 |
| std | 1.290994 |
| min | 7.000000 |
| 25% | 7.750000 |
| 50% | 8.500000 |
| 75% | 9.250000 |
| max | 10.000000 |
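To see where the numbers in that summary come from, the sketch below recomputes a few of them by hand for the four power levels used in this guide. The variable names are illustrative. Note that the `std` row is the sample standard deviation, which divides by n - 1 rather than n.

```python
import pandas as pd

power = pd.Series([10, 9, 8, 7], name='Power Level')

mean_power = power.mean()                # the "mean" row
std_power = power.std()                  # the "std" row: sample standard deviation (ddof=1)
lower_quartile = power.quantile(0.25)    # the "25%" row
median_power = power.quantile(0.5)       # the "50%" row

print(mean_power, round(std_power, 6), lower_quartile, median_power)
```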
Selecting Rows:
Sometimes you need to retrieve specific artifacts from your catalog. You can select rows by their position in the catalog using .iloc[] (integer-based) or by their magical label (index) using .loc[] (label-based).
Selecting columns:
You can select a single column using square brackets [] or multiple columns using a list of column names.
print("\nFocusing on just the names of the artifacts:")
display(magic_artifacts_df['Artifact Name'])
print("\nExamining the power levels and locations:")
display(magic_artifacts_df[['Power Level', 'Location']])
Focusing on just the names of the artifacts:
| | Artifact Name |
|---|---|
| 0 | Phoenix Feather |
| 1 | Dragon Scale |
| 2 | Unicorn Horn |
| 3 | Griffin Claw |
Examining the power levels and locations:
| | Power Level | Location |
|---|---|---|
| 0 | 10 | Forbidden Forest |
| 1 | 9 | Dragon Mountains |
| 2 | 8 | Mystical Meadow |
| 3 | 7 | Sky Peaks |
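The two bracket styles differ in what they return, which matters once you chain further operations. A small sketch, using an illustrative two-row `catalog` rather than the full one:

```python
import pandas as pd

catalog = pd.DataFrame({
    'Artifact Name': ['Phoenix Feather', 'Dragon Scale'],
    'Power Level': [10, 9],
})

names_series = catalog['Artifact Name']    # single brackets with one name: a Series
names_frame = catalog[['Artifact Name']]   # a list of names, even just one: a DataFrame

print(type(names_series).__name__)
print(type(names_frame).__name__)
```

If a later step expects a table (for example, something that needs column labels), the double-bracket form is usually the safer choice.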
Selecting rows:
You can select rows by their label using .loc[] (label-based) or by their integer position using .iloc[] (integer-based).
print("\nRetrieving the first artifact in the catalog using iloc (by position):")
display(magic_artifacts_df.iloc[0])
Retrieving the first artifact in the catalog using iloc (by position):
| | 0 |
|---|---|
| Artifact Name | Phoenix Feather |
| Power Level | 10 |
| Location | Forbidden Forest |
print("\nGetting the second and third artifacts by their position using iloc:")
display(magic_artifacts_df.iloc[1:3])
Getting the second and third artifacts by their position using iloc:
| | Artifact Name | Power Level | Location |
|---|---|---|---|
| 1 | Dragon Scale | 9 | Dragon Mountains |
| 2 | Unicorn Horn | 8 | Mystical Meadow |
# Example using .loc with an indexed DataFrame.
# .loc selects by label, so 'Artifact Name' must first become the index.
magic_artifacts_indexed_df = magic_artifacts_df.set_index('Artifact Name')
print("\nRetrieving the 'Dragon Scale' artifact using loc (by magical name):")
display(magic_artifacts_indexed_df.loc['Dragon Scale'])
print("\nRetrieving multiple artifacts using loc:")
display(magic_artifacts_indexed_df.loc[['Phoenix Feather', 'Unicorn Horn']])
print("\nRetrieving 'Power Level' and 'Location' for 'Dragon Scale' using loc:")
display(magic_artifacts_indexed_df.loc['Dragon Scale', ['Power Level', 'Location']])
print("\nRetrieving 'Power Level' for multiple artifacts using loc:")
display(magic_artifacts_indexed_df.loc[['Phoenix Feather', 'Unicorn Horn'], 'Power Level'])
print("\nRetrieving all columns for artifacts from 'Dragon Scale' to 'Griffin Claw' using loc:")
display(magic_artifacts_indexed_df.loc['Dragon Scale':'Griffin Claw', :])
# Selecting rows and columns together, with loc and iloc side by side.
# After set_index, the remaining columns are 'Power Level' (position 0) and 'Location' (position 1).
print("\nRetrieving 'Power Level' for 'Dragon Scale' using loc (row and column label):")
display(magic_artifacts_indexed_df.loc['Dragon Scale', 'Power Level'])
print("\nRetrieving 'Power Level' for 'Dragon Scale' using iloc (row and column index):")
display(magic_artifacts_indexed_df.iloc[1, 0])  # Dragon Scale is at row 1, Power Level is at column 0
print("\nRetrieving 'Power Level' for 'Dragon Scale' and 'Unicorn Horn' using loc (row labels and column label):")
display(magic_artifacts_indexed_df.loc[['Dragon Scale', 'Unicorn Horn'], 'Power Level'])
print("\nRetrieving 'Power Level' for 'Dragon Scale' and 'Unicorn Horn' using iloc (row indices and column index):")
display(magic_artifacts_indexed_df.iloc[[1, 2], 0])  # Dragon Scale at row 1, Unicorn Horn at row 2, Power Level at column 0
print("\nRetrieving 'Power Level' and 'Location' for 'Dragon Scale' and 'Unicorn Horn' using loc (row labels and column labels):")
display(magic_artifacts_indexed_df.loc[['Dragon Scale', 'Unicorn Horn'], ['Power Level', 'Location']])
print("\nRetrieving 'Power Level' and 'Location' for 'Dragon Scale' and 'Unicorn Horn' using iloc (row indices and column indices):")
display(magic_artifacts_indexed_df.iloc[[1, 2], [0, 1]])  # rows 1 and 2, columns 0 (Power Level) and 1 (Location)
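One difference between the two accessors is easy to miss: a label slice with .loc[] includes its end label, while a positional slice with .iloc[] stops before its end position. A minimal sketch, assuming a small illustrative `catalog` indexed by artifact name:

```python
import pandas as pd

catalog = pd.DataFrame(
    {'Power Level': [10, 9, 8, 7]},
    index=['Phoenix Feather', 'Dragon Scale', 'Unicorn Horn', 'Griffin Claw'],
)

by_label = catalog.loc['Dragon Scale':'Unicorn Horn']   # both end labels are included
by_position = catalog.iloc[1:2]                         # position 2 is excluded

print(list(by_label.index))
print(list(by_position.index))
```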
By default, your catalog uses a simple numerical order as its identifier. However, you can set one of the columns as a unique magical identifier (index) for easier retrieval of artifacts. We can use the set_index() spell for this.
print("\nSetting 'Artifact Name' as the magical identifier:")
magic_artifacts_indexed_df = magic_artifacts_df.set_index('Artifact Name')
display(magic_artifacts_indexed_df)
Setting 'Artifact Name' as the magical identifier:
| Artifact Name | Power Level | Location |
|---|---|---|
| Phoenix Feather | 10 | Forbidden Forest |
| Dragon Scale | 9 | Dragon Mountains |
| Unicorn Horn | 8 | Mystical Meadow |
| Griffin Claw | 7 | Sky Peaks |
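Two details of set_index() are worth checking for yourself: it returns a new DataFrame instead of changing the original, and reset_index() undoes it. The sketch below uses an illustrative two-row `catalog`; the names `indexed` and `restored` are not from the original notebook.

```python
import pandas as pd

catalog = pd.DataFrame({
    'Artifact Name': ['Phoenix Feather', 'Dragon Scale'],
    'Power Level': [10, 9],
})

indexed = catalog.set_index('Artifact Name')   # returns a new DataFrame; catalog is untouched
restored = indexed.reset_index()               # moves the index back into an ordinary column

print(list(catalog.columns))
print(list(indexed.columns))
print(list(restored.columns))
```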
To add a new artifact to your catalog, you can create a new DataFrame for the artifact and then use the pd.concat() spell to merge it with your existing magic_artifacts_df.
new_artifact_data = {'Artifact Name': ["Goblin's Gold Coin"],
'Power Level': [6],
'Location': ["Goblin's Lair"]}
new_artifact_df = pd.DataFrame(new_artifact_data)
print("\nOur new artifact:")
display(new_artifact_df)
# Concatenate the new artifact to the existing DataFrame
magic_artifacts_df = pd.concat([magic_artifacts_df, new_artifact_df], ignore_index=True)
print("\nCatalog with the new artifact added:")
display(magic_artifacts_df)
Our new artifact:
| | Artifact Name | Power Level | Location |
|---|---|---|---|
| 0 | Goblin's Gold Coin | 6 | Goblin's Lair |
Catalog with the new artifact added:
| | Artifact Name | Power Level | Location |
|---|---|---|---|
| 0 | Phoenix Feather | 10 | Forbidden Forest |
| 1 | Dragon Scale | 9 | Dragon Mountains |
| 2 | Unicorn Horn | 8 | Mystical Meadow |
| 3 | Griffin Claw | 7 | Sky Peaks |
| 4 | Goblin's Gold Coin | 6 | Goblin's Lair |
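The ignore_index=True argument is what gives the new artifact the label 4 instead of a repeated 0. A short sketch of the difference, using illustrative frames `catalog` and `new_row`:

```python
import pandas as pd

catalog = pd.DataFrame({'Artifact Name': ['Phoenix Feather', 'Dragon Scale'], 'Power Level': [10, 9]})
new_row = pd.DataFrame({'Artifact Name': ["Goblin's Gold Coin"], 'Power Level': [6]})

kept_labels = pd.concat([catalog, new_row])                       # original labels are kept: 0, 1, 0
fresh_labels = pd.concat([catalog, new_row], ignore_index=True)   # labels are renumbered: 0, 1, 2

print(list(kept_labels.index))
print(list(fresh_labels.index))
```

Duplicate labels are legal in pandas, but they make .loc[] lookups return several rows, so renumbering is usually what you want when appending.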
Just as you can compare numbers or text, you can also use logical comparisons to filter your DataFrame and find artifacts that meet specific criteria. This is like casting a spell to reveal only the artifacts you are interested in!
We can use operators like >, <, ==, >=, <=, and != to create conditions based on the values in our columns.
print("\nFinding artifacts with a Power Level greater than 8:")
powerful_artifacts = magic_artifacts_df[magic_artifacts_df['Power Level'] > 8]
display(powerful_artifacts)
print("\nFinding artifacts located in the 'Forbidden Forest':")
forbidden_forest_artifacts = magic_artifacts_df[magic_artifacts_df['Location'] == 'Forbidden Forest']
display(forbidden_forest_artifacts)
print("\nFinding artifacts with a Power Level less than or equal to 7:")
lesser_artifacts = magic_artifacts_df[magic_artifacts_df['Power Level'] <= 7]
display(lesser_artifacts)
Finding artifacts with a Power Level greater than 8:
| | Artifact Name | Power Level | Location |
|---|---|---|---|
| 0 | Phoenix Feather | 10 | Forbidden Forest |
| 1 | Dragon Scale | 9 | Dragon Mountains |
Finding artifacts located in the 'Forbidden Forest':
| | Artifact Name | Power Level | Location |
|---|---|---|---|
| 0 | Phoenix Feather | 10 | Forbidden Forest |
Finding artifacts with a Power Level less than or equal to 7:
| | Artifact Name | Power Level | Location |
|---|---|---|---|
| 3 | Griffin Claw | 7 | Sky Peaks |
| 4 | Goblin's Gold Coin | 6 | Goblin's Lair |
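Conditions can also be combined. As a sketch under the assumption that you want more than one criterion at once, pandas uses `&` for "and" and `|` for "or", and each condition needs its own parentheses. The variable names below are illustrative.

```python
import pandas as pd

catalog = pd.DataFrame({
    'Artifact Name': ['Phoenix Feather', 'Dragon Scale', 'Unicorn Horn', 'Griffin Claw'],
    'Power Level': [10, 9, 8, 7],
    'Location': ['Forbidden Forest', 'Dragon Mountains', 'Mystical Meadow', 'Sky Peaks'],
})

# "and": power of at least 8 AND not located in the Forbidden Forest
strong_outside_forest = catalog[(catalog['Power Level'] >= 8) & (catalog['Location'] != 'Forbidden Forest')]

# "or": the strongest OR the weakest power level in this small catalog
extremes = catalog[(catalog['Power Level'] == 10) | (catalog['Power Level'] == 7)]

print(strong_outside_forest)
print(extremes)
```

Python's plain `and` / `or` keywords do not work here, because they try to reduce a whole column of True/False values to a single truth value.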
Now you can retrieve artifacts directly using their magical name:
print("\nRetrieving the 'Dragon Scale' artifact using its magical name:")
display(magic_artifacts_indexed_df.loc['Dragon Scale'])
Instead of using an existing column, you can also assign a new list of magical identifiers to your catalog.
magic_artifacts_df_new_index = magic_artifacts_df.copy()  # Work on a copy so the original DataFrame is unchanged
# The new index needs exactly one identifier per row, so build the list from the current row count
new_magical_ids = [f'Artifact_{i}' for i in range(1, len(magic_artifacts_df_new_index) + 1)]
magic_artifacts_df_new_index.index = new_magical_ids
print("\nCatalog with new magical identifiers:")
display(magic_artifacts_df_new_index)
Now you can use these new magical identifiers to retrieve artifacts:
print("\nRetrieving 'Artifact_3' using its new magical identifier:")
display(magic_artifacts_df_new_index.loc['Artifact_3'])
To bring order to your magical catalog, you can sort the artifacts based on the values in one or more columns. The .sort_values() spell allows you to arrange your artifacts. Let's sort them by 'Power Level' to see which are the most powerful!
print("\nSorting artifacts by Power Level (ascending):")
sorted_artifacts_ascending = magic_artifacts_df.sort_values(by='Power Level')
display(sorted_artifacts_ascending)
print("\nSorting artifacts by Power Level (descending):")
sorted_artifacts_descending = magic_artifacts_df.sort_values(by='Power Level', ascending=False)
display(sorted_artifacts_descending)
Sometimes your magical artifacts are stored in ancient scrolls (CSV files). You can import this data directly into your catalog using the pd.read_csv() spell. You can even specify a column to be the magical identifier (index) when you import it using the index_col parameter.
# Let's use a sample CSV file available in this environment
csv_file_path = '/content/sample_data/california_housing_train.csv'
print(f"\nImporting artifacts from the ancient scroll: {csv_file_path}")
housing_df = pd.read_csv(csv_file_path, index_col='longitude')
print("\nBehold! Your new catalog conjured from the ancient scroll:")
display(housing_df.head())
Just like selecting items from a list, you can select a range of artifacts from your catalog using numerical intervals within square brackets []. This is often called "slicing". Remember that the end of the interval is exclusive, meaning the artifact at the end index is not included.
print("\nRetrieving the first two artifacts using slicing:")
display(magic_artifacts_df[0:2])
print("\nRetrieving artifacts from the third to the fifth (index 2 to 4) using slicing:")
display(magic_artifacts_df[2:5])
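The pd.read_csv() import shown earlier relies on a sample file that may only exist in a hosted notebook environment. As a self-contained sketch of the same spell, the example below reads CSV text from memory instead of from disk. The CSV contents and the names `csv_text` and `scroll_df` are made up for illustration.

```python
import io

import pandas as pd

# A tiny "ancient scroll" held in memory; io.StringIO makes it look like a file.
csv_text = """Artifact Name,Power Level,Location
Phoenix Feather,10,Forbidden Forest
Dragon Scale,9,Dragon Mountains
"""

# index_col turns the named column into the index while the data is being read.
scroll_df = pd.read_csv(io.StringIO(csv_text), index_col='Artifact Name')

print(scroll_df)
print(scroll_df.loc['Dragon Scale', 'Power Level'])
```

Swapping `io.StringIO(csv_text)` for a file path gives the on-disk version of the same call.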
This is just a basic introduction. Pandas DataFrames offer many more functionalities for data manipulation, cleaning, and analysis. Feel free to ask if you have any specific questions or want to explore more advanced topics!
This is just the beginning of our adventure in the Grand Library of DataFrames! There are many more spells (operations) to learn for manipulating, cleaning, and analyzing your data.
Let me know what magical data skill you'd like to unlock next!