{
"cells": [
{
"cell_type": "code",
"execution_count": 2,
"id": "eea2a235-0f27-48f1-a2cd-7f1e3cdec9f9",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np"
]
},
{
"cell_type": "markdown",
"id": "b18f60cb-4f9c-42c5-91e1-c7cb5c68fa78",
"metadata": {},
"source": [
"# Selecting and Modifying Rows/Columns"
]
},
{
"cell_type": "markdown",
"id": "3f6b7794-bea3-4d98-9f28-dfec37ef9223",
"metadata": {},
"source": [
"This subchapter will go into greater detail on how you can select/modify rows and columns in `pandas`. \n",
"\n",
"First, let us load in the same dataset as the first subchapter in the `Pandas` chapter. As a reminder, this dataset has beer sales across 50 continental states in the US. It is sourced from [_Salience and Taxation: Theory and Evidence_](https://www.aeaweb.org/articles?id=10.1257/aer.99.4.1145) by Chetty, Looney, and Kroft (AER 2009), and it includes 7 columns:\n",
"- `st_name`: the state abbreviation\n",
"- `year`: the year the data was recorded\n",
"- `c_beer`: the quantity of beer consumed, in thousands of gallons\n",
"- `beer_tax`: the ad valorem tax, as a percentage\n",
"- `btax_dollars`: the excise tax, represented in dollars per case (24 cans) of beer \n",
"- `population`: the population of the state, in thousands\n",
"- `salestax`: the sales tax percentage\n"
]
},
{
"cell_type": "code",
"execution_count": 73,
"id": "28c73197-835e-4786-90e0-7a29a79f2d90",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" st_name | \n",
" year | \n",
" c_beer | \n",
" beer_tax | \n",
" btax_dollars | \n",
" population | \n",
" salestax | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" AL | \n",
" 1970 | \n",
" 33098 | \n",
" 72.341130 | \n",
" 2.370 | \n",
" 3450 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 1 | \n",
" AL | \n",
" 1971 | \n",
" 37598 | \n",
" 69.304600 | \n",
" 2.370 | \n",
" 3497 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 2 | \n",
" AL | \n",
" 1972 | \n",
" 42719 | \n",
" 67.149190 | \n",
" 2.370 | \n",
" 3539 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 3 | \n",
" AL | \n",
" 1973 | \n",
" 46203 | \n",
" 63.217026 | \n",
" 2.370 | \n",
" 3580 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 4 | \n",
" AL | \n",
" 1974 | \n",
" 49769 | \n",
" 56.933796 | \n",
" 2.370 | \n",
" 3627 | \n",
" 4.0 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 1703 | \n",
" WY | \n",
" 1999 | \n",
" 12423 | \n",
" 0.319894 | \n",
" 0.045 | \n",
" 492 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 1704 | \n",
" WY | \n",
" 2000 | \n",
" 12595 | \n",
" 0.309491 | \n",
" 0.045 | \n",
" 494 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 1705 | \n",
" WY | \n",
" 2001 | \n",
" 12808 | \n",
" 0.300928 | \n",
" 0.045 | \n",
" 494 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 1706 | \n",
" WY | \n",
" 2002 | \n",
" 13191 | \n",
" 0.296244 | \n",
" 0.045 | \n",
" 499 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 1707 | \n",
" WY | \n",
" 2003 | \n",
" 15535 | \n",
" 0.289643 | \n",
" 0.045 | \n",
" 501 | \n",
" 4.0 | \n",
"
\n",
" \n",
"
\n",
"
1708 rows × 7 columns
\n",
"
"
],
"text/plain": [
" st_name year c_beer beer_tax btax_dollars population salestax\n",
"0 AL 1970 33098 72.341130 2.370 3450 4.0\n",
"1 AL 1971 37598 69.304600 2.370 3497 4.0\n",
"2 AL 1972 42719 67.149190 2.370 3539 4.0\n",
"3 AL 1973 46203 63.217026 2.370 3580 4.0\n",
"4 AL 1974 49769 56.933796 2.370 3627 4.0\n",
"... ... ... ... ... ... ... ...\n",
"1703 WY 1999 12423 0.319894 0.045 492 4.0\n",
"1704 WY 2000 12595 0.309491 0.045 494 4.0\n",
"1705 WY 2001 12808 0.300928 0.045 494 4.0\n",
"1706 WY 2002 13191 0.296244 0.045 499 4.0\n",
"1707 WY 2003 15535 0.289643 0.045 501 4.0\n",
"\n",
"[1708 rows x 7 columns]"
]
},
"execution_count": 73,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.read_csv('data/beer_tax.csv')\n",
"df"
]
},
{
"cell_type": "markdown",
"id": "f3babea0-c486-4707-8fc2-654a8ef49ad8",
"metadata": {},
"source": [
"## Extracting Rows/Columns that Satisfy a Given Condition"
]
},
{
"cell_type": "markdown",
"id": "e7a3d028-84b1-4344-9c26-0c621c2ff071",
"metadata": {},
"source": [
"An important task we want to accomplish when looking at data is to filter out all rows that satisfy a certain condition. In other words, we want to extract all the rows **where** a certain thing is accomplished. Conditional selection helps us achieve exactly this, extracting all rows that satisfy a certain condition.\n",
"\n",
"In the [previous subchapter](loading-looking.ipynb), we briefly gave an example of how we can use `.loc` to extract all rows that match certain values. To be more precise, when we pass in certain values for `.loc` and `iloc` to use, they generate an array of True/False values and use that to determine which values to include. This is very similar to our prior discussion on [NumPy slicing](../02-prereqs/numpy.ipynb).\n",
"\n",
"For example, let's say we want all the data for the state of Alabama. Then, we would want all rows where `st_name` is `CA`. We can generate a Series of True/False values corresponding to whether each row has a `st_name` of `AL` using the code below."
]
},
{
"cell_type": "code",
"execution_count": 74,
"id": "0076d360-cf23-4db8-81e5-13175babd330",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 True\n",
"1 True\n",
"2 True\n",
"3 True\n",
"4 True\n",
" ... \n",
"1703 False\n",
"1704 False\n",
"1705 False\n",
"1706 False\n",
"1707 False\n",
"Name: st_name, Length: 1708, dtype: bool"
]
},
"execution_count": 74,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['st_name'] == 'AL'"
]
},
{
"cell_type": "markdown",
"id": "b6e91ccb-d608-4fa3-acc8-2e8599c078bb",
"metadata": {},
"source": [
"Now, to extract all rows where the state is Alabama, we can simply pass in this boolean series into the `.loc` operator."
]
},
{
"cell_type": "code",
"execution_count": 75,
"id": "6cc48f37-123a-4dd2-b07e-059531963054",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" st_name | \n",
" year | \n",
" c_beer | \n",
" beer_tax | \n",
" btax_dollars | \n",
" population | \n",
" salestax | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" AL | \n",
" 1970 | \n",
" 33098 | \n",
" 72.341130 | \n",
" 2.37 | \n",
" 3450 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 1 | \n",
" AL | \n",
" 1971 | \n",
" 37598 | \n",
" 69.304600 | \n",
" 2.37 | \n",
" 3497 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 2 | \n",
" AL | \n",
" 1972 | \n",
" 42719 | \n",
" 67.149190 | \n",
" 2.37 | \n",
" 3539 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 3 | \n",
" AL | \n",
" 1973 | \n",
" 46203 | \n",
" 63.217026 | \n",
" 2.37 | \n",
" 3580 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 4 | \n",
" AL | \n",
" 1974 | \n",
" 49769 | \n",
" 56.933796 | \n",
" 2.37 | \n",
" 3627 | \n",
" 4.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" st_name year c_beer beer_tax btax_dollars population salestax\n",
"0 AL 1970 33098 72.341130 2.37 3450 4.0\n",
"1 AL 1971 37598 69.304600 2.37 3497 4.0\n",
"2 AL 1972 42719 67.149190 2.37 3539 4.0\n",
"3 AL 1973 46203 63.217026 2.37 3580 4.0\n",
"4 AL 1974 49769 56.933796 2.37 3627 4.0"
]
},
"execution_count": 75,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.loc[df['st_name'] == 'AL',:].head(5) # only showing the first 5 rows"
]
},
{
"cell_type": "markdown",
"id": "22d43d1a-2694-4044-a59e-35f8ebc3af87",
"metadata": {},
"source": [
"Similarly, we can also use the context dependent `[]` operator."
]
},
{
"cell_type": "code",
"execution_count": 76,
"id": "ded1e819-8528-40ec-95ca-e8e0333e3575",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" st_name | \n",
" year | \n",
" c_beer | \n",
" beer_tax | \n",
" btax_dollars | \n",
" population | \n",
" salestax | \n",
"
\n",
" \n",
" \n",
" \n",
" 29 | \n",
" AL | \n",
" 1999 | \n",
" 93828 | \n",
" 16.847755 | \n",
" 2.37 | \n",
" 4430 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 30 | \n",
" AL | \n",
" 2000 | \n",
" 94530 | \n",
" 16.299862 | \n",
" 2.37 | \n",
" 4452 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 31 | \n",
" AL | \n",
" 2001 | \n",
" 92893 | \n",
" 15.848877 | \n",
" 2.37 | \n",
" 4466 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 32 | \n",
" AL | \n",
" 2002 | \n",
" 94732 | \n",
" 15.602203 | \n",
" 2.37 | \n",
" 4479 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 33 | \n",
" AL | \n",
" 2003 | \n",
" 96188 | \n",
" 15.254544 | \n",
" 2.37 | \n",
" 4501 | \n",
" 4.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" st_name year c_beer beer_tax btax_dollars population salestax\n",
"29 AL 1999 93828 16.847755 2.37 4430 4.0\n",
"30 AL 2000 94530 16.299862 2.37 4452 4.0\n",
"31 AL 2001 92893 15.848877 2.37 4466 4.0\n",
"32 AL 2002 94732 15.602203 2.37 4479 4.0\n",
"33 AL 2003 96188 15.254544 2.37 4501 4.0"
]
},
"execution_count": 76,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[df['st_name'] == 'AL'].tail(5) # only showing the last 5 rows"
]
},
{
"cell_type": "markdown",
"id": "38a90d4b-da5d-48d9-88c1-27805d82a454",
"metadata": {},
"source": [
"We could also use the same technique to extract all columns that are integers (but not floats, so they do not contain decimals).\n"
]
},
{
"cell_type": "code",
"execution_count": 82,
"id": "596a6119-5595-4258-ab04-325fdf73d5a0",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" year | \n",
" c_beer | \n",
" population | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1970 | \n",
" 33098 | \n",
" 3450 | \n",
"
\n",
" \n",
" 1 | \n",
" 1971 | \n",
" 37598 | \n",
" 3497 | \n",
"
\n",
" \n",
" 2 | \n",
" 1972 | \n",
" 42719 | \n",
" 3539 | \n",
"
\n",
" \n",
" 3 | \n",
" 1973 | \n",
" 46203 | \n",
" 3580 | \n",
"
\n",
" \n",
" 4 | \n",
" 1974 | \n",
" 49769 | \n",
" 3627 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 1703 | \n",
" 1999 | \n",
" 12423 | \n",
" 492 | \n",
"
\n",
" \n",
" 1704 | \n",
" 2000 | \n",
" 12595 | \n",
" 494 | \n",
"
\n",
" \n",
" 1705 | \n",
" 2001 | \n",
" 12808 | \n",
" 494 | \n",
"
\n",
" \n",
" 1706 | \n",
" 2002 | \n",
" 13191 | \n",
" 499 | \n",
"
\n",
" \n",
" 1707 | \n",
" 2003 | \n",
" 15535 | \n",
" 501 | \n",
"
\n",
" \n",
"
\n",
"
1708 rows × 3 columns
\n",
"
"
],
"text/plain": [
" year c_beer population\n",
"0 1970 33098 3450\n",
"1 1971 37598 3497\n",
"2 1972 42719 3539\n",
"3 1973 46203 3580\n",
"4 1974 49769 3627\n",
"... ... ... ...\n",
"1703 1999 12423 492\n",
"1704 2000 12595 494\n",
"1705 2001 12808 494\n",
"1706 2002 13191 499\n",
"1707 2003 15535 501\n",
"\n",
"[1708 rows x 3 columns]"
]
},
"execution_count": 82,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.loc[:,df.dtypes == int]"
]
},
{
"cell_type": "markdown",
"id": "c69cb89b-3b4b-487a-855e-9bca6cdd386f",
"metadata": {},
"source": [
"Alternatively, if we wanted to extract all the rows where the index is less than 30, we could similarly use the `.iloc` operator."
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "881ee775-0213-462f-972c-b06653555be2",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ True, True, True, ..., False, False, False])"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.index < 30"
]
},
{
"cell_type": "code",
"execution_count": 83,
"id": "fb2c183f-4bd2-443e-81fb-4443ba279579",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" st_name | \n",
" year | \n",
" c_beer | \n",
" beer_tax | \n",
" btax_dollars | \n",
" population | \n",
" salestax | \n",
"
\n",
" \n",
" \n",
" \n",
" 25 | \n",
" AL | \n",
" 1995 | \n",
" 86991 | \n",
" 18.417560 | \n",
" 2.37 | \n",
" 4297 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 26 | \n",
" AL | \n",
" 1996 | \n",
" 89534 | \n",
" 17.889332 | \n",
" 2.37 | \n",
" 4331 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 27 | \n",
" AL | \n",
" 1997 | \n",
" 88807 | \n",
" 17.488075 | \n",
" 2.37 | \n",
" 4368 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 28 | \n",
" AL | \n",
" 1998 | \n",
" 92082 | \n",
" 17.219852 | \n",
" 2.37 | \n",
" 4405 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 29 | \n",
" AL | \n",
" 1999 | \n",
" 93828 | \n",
" 16.847755 | \n",
" 2.37 | \n",
" 4430 | \n",
" 4.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" st_name year c_beer beer_tax btax_dollars population salestax\n",
"25 AL 1995 86991 18.417560 2.37 4297 4.0\n",
"26 AL 1996 89534 17.889332 2.37 4331 4.0\n",
"27 AL 1997 88807 17.488075 2.37 4368 4.0\n",
"28 AL 1998 92082 17.219852 2.37 4405 4.0\n",
"29 AL 1999 93828 16.847755 2.37 4430 4.0"
]
},
"execution_count": 83,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.iloc[df.index < 30,:].tail(5) # only showing the last 5 rows"
]
},
{
"cell_type": "markdown",
"id": "becd185c-66f0-4645-8ce0-e2c1562da4ce",
"metadata": {},
"source": [
"As you can imagine, quickly and easily generating these boolean arrays/Series is very important for accessing our elements!\n"
]
},
{
"cell_type": "markdown",
"id": "531350fd-f08d-4b10-9c45-22bea0dfdbb0",
"metadata": {},
"source": [
"### Logical Operators\n"
]
},
{
"cell_type": "markdown",
"id": "68874d06-aa17-45b6-bb22-0ab06848d943",
"metadata": {},
"source": [
"\n",
"What if we have multiple conditions we want to filter on? Let's say we want to look at all the rows where `beer_tax` is less than 3 and `btax_dollars` is less than 0.5. We can generate boolean filters for both of those conditions."
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "1ad6af1f-cb90-41d6-8f5b-6d31004a7169",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 False\n",
"1 False\n",
"2 False\n",
"3 False\n",
"4 False\n",
" ... \n",
"1703 True\n",
"1704 True\n",
"1705 True\n",
"1706 True\n",
"1707 True\n",
"Name: beer_tax, Length: 1708, dtype: bool"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['beer_tax'] < 3"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "0d0b98e9-e814-4123-a6b2-48b7cd20d357",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 False\n",
"1 False\n",
"2 False\n",
"3 False\n",
"4 False\n",
" ... \n",
"1703 True\n",
"1704 True\n",
"1705 True\n",
"1706 True\n",
"1707 True\n",
"Name: btax_dollars, Length: 1708, dtype: bool"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['btax_dollars'] < 0.5"
]
},
{
"cell_type": "markdown",
"id": "e941078f-b6b2-465a-9751-8b6f08e3864c",
"metadata": {},
"source": [
"How can we combine both of these boolean arrays? Well, since we're interested in cases where both of these are true, we can use the `&` operator. The `&` operator is the logical way of expressing you only want the cases where elements from both the arrays are true.\n",
"\n",
"The code is shown below. Note the parenthesis around both the boolean arrays - this is an absolutely crucial piece of syntax!"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "2c1222be-2bcd-453d-97db-916dcc6a88aa",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" st_name | \n",
" year | \n",
" c_beer | \n",
" beer_tax | \n",
" btax_dollars | \n",
" population | \n",
" salestax | \n",
"
\n",
" \n",
" \n",
" \n",
" 77 | \n",
" AZ | \n",
" 1979 | \n",
" 77684 | \n",
" 2.936328 | \n",
" 0.180 | \n",
" 2636 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 78 | \n",
" AZ | \n",
" 1980 | \n",
" 79660 | \n",
" 2.587105 | \n",
" 0.180 | \n",
" 2736 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 79 | \n",
" AZ | \n",
" 1981 | \n",
" 86283 | \n",
" 2.345187 | \n",
" 0.180 | \n",
" 2810 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 80 | \n",
" AZ | \n",
" 1982 | \n",
" 86249 | \n",
" 2.209093 | \n",
" 0.180 | \n",
" 2890 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 81 | \n",
" AZ | \n",
" 1983 | \n",
" 83067 | \n",
" 2.140336 | \n",
" 0.180 | \n",
" 2969 | \n",
" 4.0 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 1703 | \n",
" WY | \n",
" 1999 | \n",
" 12423 | \n",
" 0.319894 | \n",
" 0.045 | \n",
" 492 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 1704 | \n",
" WY | \n",
" 2000 | \n",
" 12595 | \n",
" 0.309491 | \n",
" 0.045 | \n",
" 494 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 1705 | \n",
" WY | \n",
" 2001 | \n",
" 12808 | \n",
" 0.300928 | \n",
" 0.045 | \n",
" 494 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 1706 | \n",
" WY | \n",
" 2002 | \n",
" 13191 | \n",
" 0.296244 | \n",
" 0.045 | \n",
" 499 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 1707 | \n",
" WY | \n",
" 2003 | \n",
" 15535 | \n",
" 0.289643 | \n",
" 0.045 | \n",
" 501 | \n",
" 4.0 | \n",
"
\n",
" \n",
"
\n",
"
628 rows × 7 columns
\n",
"
"
],
"text/plain": [
" st_name year c_beer beer_tax btax_dollars population salestax\n",
"77 AZ 1979 77684 2.936328 0.180 2636 4.0\n",
"78 AZ 1980 79660 2.587105 0.180 2736 4.0\n",
"79 AZ 1981 86283 2.345187 0.180 2810 4.0\n",
"80 AZ 1982 86249 2.209093 0.180 2890 4.0\n",
"81 AZ 1983 83067 2.140336 0.180 2969 4.0\n",
"... ... ... ... ... ... ... ...\n",
"1703 WY 1999 12423 0.319894 0.045 492 4.0\n",
"1704 WY 2000 12595 0.309491 0.045 494 4.0\n",
"1705 WY 2001 12808 0.300928 0.045 494 4.0\n",
"1706 WY 2002 13191 0.296244 0.045 499 4.0\n",
"1707 WY 2003 15535 0.289643 0.045 501 4.0\n",
"\n",
"[628 rows x 7 columns]"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[(df['beer_tax'] < 3) & (df['btax_dollars'] < 0.5)]"
]
},
{
"cell_type": "markdown",
"id": "1015594b-fac5-486a-b710-3974368e5614",
"metadata": {},
"source": [
"Alternatively, if we were looking for the states where either `beer_tax` is less than 3 or `btax_dollars` is less than 0.5, we could use the logical `|` operator. The `|` operator is the logical way of expressing you only want the cases where an element from either array is true."
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "e3c69325-fcd5-4135-a86a-ab9b7419ec6a",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" st_name | \n",
" year | \n",
" c_beer | \n",
" beer_tax | \n",
" btax_dollars | \n",
" population | \n",
" salestax | \n",
"
\n",
" \n",
" \n",
" \n",
" 68 | \n",
" AZ | \n",
" 1970 | \n",
" 38604 | \n",
" 5.494264 | \n",
" 0.180 | \n",
" 1795 | \n",
" 3.0 | \n",
"
\n",
" \n",
" 69 | \n",
" AZ | \n",
" 1971 | \n",
" 41837 | \n",
" 5.263641 | \n",
" 0.180 | \n",
" 1896 | \n",
" 3.0 | \n",
"
\n",
" \n",
" 70 | \n",
" AZ | \n",
" 1972 | \n",
" 47949 | \n",
" 5.099939 | \n",
" 0.180 | \n",
" 2008 | \n",
" 3.0 | \n",
"
\n",
" \n",
" 71 | \n",
" AZ | \n",
" 1973 | \n",
" 53380 | \n",
" 4.801294 | \n",
" 0.180 | \n",
" 2124 | \n",
" 3.0 | \n",
"
\n",
" \n",
" 72 | \n",
" AZ | \n",
" 1974 | \n",
" 58188 | \n",
" 4.324086 | \n",
" 0.180 | \n",
" 2223 | \n",
" 3.0 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 1703 | \n",
" WY | \n",
" 1999 | \n",
" 12423 | \n",
" 0.319894 | \n",
" 0.045 | \n",
" 492 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 1704 | \n",
" WY | \n",
" 2000 | \n",
" 12595 | \n",
" 0.309491 | \n",
" 0.045 | \n",
" 494 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 1705 | \n",
" WY | \n",
" 2001 | \n",
" 12808 | \n",
" 0.300928 | \n",
" 0.045 | \n",
" 494 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 1706 | \n",
" WY | \n",
" 2002 | \n",
" 13191 | \n",
" 0.296244 | \n",
" 0.045 | \n",
" 499 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 1707 | \n",
" WY | \n",
" 2003 | \n",
" 15535 | \n",
" 0.289643 | \n",
" 0.045 | \n",
" 501 | \n",
" 4.0 | \n",
"
\n",
" \n",
"
\n",
"
1144 rows × 7 columns
\n",
"
"
],
"text/plain": [
" st_name year c_beer beer_tax btax_dollars population salestax\n",
"68 AZ 1970 38604 5.494264 0.180 1795 3.0\n",
"69 AZ 1971 41837 5.263641 0.180 1896 3.0\n",
"70 AZ 1972 47949 5.099939 0.180 2008 3.0\n",
"71 AZ 1973 53380 4.801294 0.180 2124 3.0\n",
"72 AZ 1974 58188 4.324086 0.180 2223 3.0\n",
"... ... ... ... ... ... ... ...\n",
"1703 WY 1999 12423 0.319894 0.045 492 4.0\n",
"1704 WY 2000 12595 0.309491 0.045 494 4.0\n",
"1705 WY 2001 12808 0.300928 0.045 494 4.0\n",
"1706 WY 2002 13191 0.296244 0.045 499 4.0\n",
"1707 WY 2003 15535 0.289643 0.045 501 4.0\n",
"\n",
"[1144 rows x 7 columns]"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[(df['beer_tax'] < 3) | (df['btax_dollars'] < 0.5)]"
]
},
{
"cell_type": "markdown",
"id": "b160e0f1-64c5-4501-8ba1-eeaf2b781728",
"metadata": {},
"source": [
"What about cases where either `beer_tax` is less than 3 or `btax_dollars` is less than 0.5, but not both? Well, here we could use the `^` operator, which is the logical way of expressing you only want the cases where an element from either of the arrays is true, but both the elements are not true."
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "6f28bfb3-006c-483f-96c4-b1277bc9b29a",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" st_name | \n",
" year | \n",
" c_beer | \n",
" beer_tax | \n",
" btax_dollars | \n",
" population | \n",
" salestax | \n",
"
\n",
" \n",
" \n",
" \n",
" 68 | \n",
" AZ | \n",
" 1970 | \n",
" 38604 | \n",
" 5.494264 | \n",
" 0.1800 | \n",
" 1795 | \n",
" 3.0 | \n",
"
\n",
" \n",
" 69 | \n",
" AZ | \n",
" 1971 | \n",
" 41837 | \n",
" 5.263641 | \n",
" 0.1800 | \n",
" 1896 | \n",
" 3.0 | \n",
"
\n",
" \n",
" 70 | \n",
" AZ | \n",
" 1972 | \n",
" 47949 | \n",
" 5.099939 | \n",
" 0.1800 | \n",
" 2008 | \n",
" 3.0 | \n",
"
\n",
" \n",
" 71 | \n",
" AZ | \n",
" 1973 | \n",
" 53380 | \n",
" 4.801294 | \n",
" 0.1800 | \n",
" 2124 | \n",
" 3.0 | \n",
"
\n",
" \n",
" 72 | \n",
" AZ | \n",
" 1974 | \n",
" 58188 | \n",
" 4.324086 | \n",
" 0.1800 | \n",
" 2223 | \n",
" 3.0 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 1644 | \n",
" WI | \n",
" 1972 | \n",
" 128725 | \n",
" 4.113951 | \n",
" 0.1452 | \n",
" 4502 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 1645 | \n",
" WI | \n",
" 1973 | \n",
" 133792 | \n",
" 3.873043 | \n",
" 0.1452 | \n",
" 4524 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 1646 | \n",
" WI | \n",
" 1974 | \n",
" 136807 | \n",
" 3.488096 | \n",
" 0.1452 | \n",
" 4546 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 1647 | \n",
" WI | \n",
" 1975 | \n",
" 146350 | \n",
" 3.196341 | \n",
" 0.1452 | \n",
" 4579 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 1648 | \n",
" WI | \n",
" 1976 | \n",
" 142811 | \n",
" 3.022199 | \n",
" 0.1452 | \n",
" 4596 | \n",
" 4.0 | \n",
"
\n",
" \n",
"
\n",
"
516 rows × 7 columns
\n",
"
"
],
"text/plain": [
" st_name year c_beer beer_tax btax_dollars population salestax\n",
"68 AZ 1970 38604 5.494264 0.1800 1795 3.0\n",
"69 AZ 1971 41837 5.263641 0.1800 1896 3.0\n",
"70 AZ 1972 47949 5.099939 0.1800 2008 3.0\n",
"71 AZ 1973 53380 4.801294 0.1800 2124 3.0\n",
"72 AZ 1974 58188 4.324086 0.1800 2223 3.0\n",
"... ... ... ... ... ... ... ...\n",
"1644 WI 1972 128725 4.113951 0.1452 4502 4.0\n",
"1645 WI 1973 133792 3.873043 0.1452 4524 4.0\n",
"1646 WI 1974 136807 3.488096 0.1452 4546 4.0\n",
"1647 WI 1975 146350 3.196341 0.1452 4579 4.0\n",
"1648 WI 1976 142811 3.022199 0.1452 4596 4.0\n",
"\n",
"[516 rows x 7 columns]"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[(df['beer_tax'] < 3) ^ (df['btax_dollars'] < 0.5)]"
]
},
{
"cell_type": "markdown",
"id": "7b5f596b-1892-47e1-b19e-8259dc43b39d",
"metadata": {},
"source": [
"Notice how the number of columns with the `&` operator plus the number of columns with the `^` operator sum to the number of columns with the `|` operator (628 + 516 = 1144)."
]
},
{
"cell_type": "markdown",
"id": "ea047b82-1ba3-41a1-9d78-24683b9d98e7",
"metadata": {},
"source": [
"Finally, what if you're interested in the cases where something isn't true? For example, what if you're interested in all the cases where the `st_name` isn't `AL`? Well, then you can use the `~` operator, which logically represents the inverse of a boolean array (so it converts all true values into false and all false values intro true). "
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "09575326-d347-46f2-9f37-25e4f2c73dd0",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 True\n",
"1 True\n",
"2 True\n",
"3 True\n",
"4 True\n",
" ... \n",
"1703 False\n",
"1704 False\n",
"1705 False\n",
"1706 False\n",
"1707 False\n",
"Name: st_name, Length: 1708, dtype: bool"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['st_name'] == 'AL'"
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "c3f6fbd6-45aa-4483-a77c-2760ad4fb58e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 False\n",
"1 False\n",
"2 False\n",
"3 False\n",
"4 False\n",
" ... \n",
"1703 True\n",
"1704 True\n",
"1705 True\n",
"1706 True\n",
"1707 True\n",
"Name: st_name, Length: 1708, dtype: bool"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"~(df['st_name'] == 'AL')"
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "6912de3e-1ecb-4448-a5e5-eba751659318",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" st_name | \n",
" year | \n",
" c_beer | \n",
" beer_tax | \n",
" btax_dollars | \n",
" population | \n",
" salestax | \n",
"
\n",
" \n",
" \n",
" \n",
" 34 | \n",
" AK | \n",
" 1970 | \n",
" 5372 | \n",
" 13.735660 | \n",
" 0.5625 | \n",
" 304 | \n",
" 0.0 | \n",
"
\n",
" \n",
" 35 | \n",
" AK | \n",
" 1971 | \n",
" 6336 | \n",
" 13.159102 | \n",
" 0.5625 | \n",
" 316 | \n",
" 0.0 | \n",
"
\n",
" \n",
" 36 | \n",
" AK | \n",
" 1972 | \n",
" 6038 | \n",
" 12.749847 | \n",
" 0.5625 | \n",
" 324 | \n",
" 0.0 | \n",
"
\n",
" \n",
" 37 | \n",
" AK | \n",
" 1973 | \n",
" 6453 | \n",
" 12.003234 | \n",
" 0.5625 | \n",
" 331 | \n",
" 0.0 | \n",
"
\n",
" \n",
" 38 | \n",
" AK | \n",
" 1974 | \n",
" 7598 | \n",
" 10.810215 | \n",
" 0.5625 | \n",
" 341 | \n",
" 0.0 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 1703 | \n",
" WY | \n",
" 1999 | \n",
" 12423 | \n",
" 0.319894 | \n",
" 0.0450 | \n",
" 492 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 1704 | \n",
" WY | \n",
" 2000 | \n",
" 12595 | \n",
" 0.309491 | \n",
" 0.0450 | \n",
" 494 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 1705 | \n",
" WY | \n",
" 2001 | \n",
" 12808 | \n",
" 0.300928 | \n",
" 0.0450 | \n",
" 494 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 1706 | \n",
" WY | \n",
" 2002 | \n",
" 13191 | \n",
" 0.296244 | \n",
" 0.0450 | \n",
" 499 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 1707 | \n",
" WY | \n",
" 2003 | \n",
" 15535 | \n",
" 0.289643 | \n",
" 0.0450 | \n",
" 501 | \n",
" 4.0 | \n",
"
\n",
" \n",
"
\n",
"
1674 rows × 7 columns
\n",
"
"
],
"text/plain": [
" st_name year c_beer beer_tax btax_dollars population salestax\n",
"34 AK 1970 5372 13.735660 0.5625 304 0.0\n",
"35 AK 1971 6336 13.159102 0.5625 316 0.0\n",
"36 AK 1972 6038 12.749847 0.5625 324 0.0\n",
"37 AK 1973 6453 12.003234 0.5625 331 0.0\n",
"38 AK 1974 7598 10.810215 0.5625 341 0.0\n",
"... ... ... ... ... ... ... ...\n",
"1703 WY 1999 12423 0.319894 0.0450 492 4.0\n",
"1704 WY 2000 12595 0.309491 0.0450 494 4.0\n",
"1705 WY 2001 12808 0.300928 0.0450 494 4.0\n",
"1706 WY 2002 13191 0.296244 0.0450 499 4.0\n",
"1707 WY 2003 15535 0.289643 0.0450 501 4.0\n",
"\n",
"[1674 rows x 7 columns]"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[~(df['st_name'] == 'AL')]"
]
},
{
"cell_type": "markdown",
"id": "53fb7f82-73ed-4099-89ad-d28d099ad2d0",
"metadata": {},
"source": [
"Here's a table summarizing all the different logical operators to combine boolean arrays.\n",
"\n",
"Symbol | Usage | Meaning \n",
"------ | ---------- | -------------------------------------\n",
"~ | ~p | Returns the negation of p\n",
"| | p | q | p OR q\n",
"& | p & q | p AND q\n",
"^ | p ^ q | p XOR q (exclusive or)"
]
},
{
"cell_type": "markdown",
"id": "a7fa0843-25f4-43b0-b9c0-8e74afdc7c68",
"metadata": {},
"source": [
"### Multiple Labels"
]
},
{
"cell_type": "markdown",
"id": "0af6af74-3152-4067-8fdc-94a3a309e67c",
"metadata": {},
"source": [
"If you want to keep all rows where a certain column is in a list of given values, you can use the `.isin()` method. This is logically equivalent to using the `|` operator multiple times, but it's easier. For example, below we get all rows corresponding to the states of California, Florida, Texas and Nebraska."
]
},
{
"cell_type": "code",
"execution_count": 94,
"id": "1e2ff2ae-9cb2-4dd6-a81e-98f2dfe2c90d",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1707 False\n",
"1706 False\n",
"1705 False\n",
"1704 False\n",
"1703 False\n",
" ... \n",
"4 False\n",
"3 False\n",
"2 False\n",
"1 False\n",
"0 False\n",
"Name: st_name, Length: 1708, dtype: bool"
]
},
"execution_count": 94,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['st_name'].isin(['CA','FL','TX','NE'])"
]
},
{
"cell_type": "code",
"execution_count": 93,
"id": "7c97c42a-00e3-4b90-8e5f-3ba2c419096f",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" st_name | \n",
" year | \n",
" c_beer | \n",
" beer_tax | \n",
" btax_dollars | \n",
" population | \n",
" salestax | \n",
"
\n",
" \n",
" \n",
" \n",
" 1571 | \n",
" CA | \n",
" 1970 | \n",
" 363645 | \n",
" 2.747132 | \n",
" 0.0900 | \n",
" 20023 | \n",
" 5.000 | \n",
"
\n",
" \n",
" 1570 | \n",
" CA | \n",
" 1971 | \n",
" 380397 | \n",
" 2.631820 | \n",
" 0.0900 | \n",
" 20346 | \n",
" 5.000 | \n",
"
\n",
" \n",
" 1569 | \n",
" CA | \n",
" 1972 | \n",
" 401928 | \n",
" 2.549970 | \n",
" 0.0900 | \n",
" 20585 | \n",
" 5.000 | \n",
"
\n",
" \n",
" 1568 | \n",
" CA | \n",
" 1973 | \n",
" 417463 | \n",
" 2.400647 | \n",
" 0.0900 | \n",
" 20869 | \n",
" 5.167 | \n",
"
\n",
" \n",
" 1567 | \n",
" CA | \n",
" 1974 | \n",
" 464237 | \n",
" 2.162043 | \n",
" 0.0900 | \n",
" 21174 | \n",
" 5.250 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 240 | \n",
" TX | \n",
" 1999 | \n",
" 546353 | \n",
" 3.166952 | \n",
" 0.4455 | \n",
" 20558 | \n",
" 6.250 | \n",
"
\n",
" \n",
" 239 | \n",
" TX | \n",
" 2000 | \n",
" 556051 | \n",
" 3.063962 | \n",
" 0.4455 | \n",
" 20949 | \n",
" 6.250 | \n",
"
\n",
" \n",
" 238 | \n",
" TX | \n",
" 2001 | \n",
" 557425 | \n",
" 2.979188 | \n",
" 0.4455 | \n",
" 21341 | \n",
" 6.250 | \n",
"
\n",
" \n",
" 237 | \n",
" TX | \n",
" 2002 | \n",
" 563150 | \n",
" 2.932819 | \n",
" 0.4455 | \n",
" 21737 | \n",
" 6.250 | \n",
"
\n",
" \n",
" 236 | \n",
" TX | \n",
" 2003 | \n",
" 558837 | \n",
" 2.867468 | \n",
" 0.4455 | \n",
" 22119 | \n",
" 6.250 | \n",
"
\n",
" \n",
"
\n",
"
136 rows × 7 columns
\n",
"
"
],
"text/plain": [
" st_name year c_beer beer_tax btax_dollars population salestax\n",
"1571 CA 1970 363645 2.747132 0.0900 20023 5.000\n",
"1570 CA 1971 380397 2.631820 0.0900 20346 5.000\n",
"1569 CA 1972 401928 2.549970 0.0900 20585 5.000\n",
"1568 CA 1973 417463 2.400647 0.0900 20869 5.167\n",
"1567 CA 1974 464237 2.162043 0.0900 21174 5.250\n",
"... ... ... ... ... ... ... ...\n",
"240 TX 1999 546353 3.166952 0.4455 20558 6.250\n",
"239 TX 2000 556051 3.063962 0.4455 20949 6.250\n",
"238 TX 2001 557425 2.979188 0.4455 21341 6.250\n",
"237 TX 2002 563150 2.932819 0.4455 21737 6.250\n",
"236 TX 2003 558837 2.867468 0.4455 22119 6.250\n",
"\n",
"[136 rows x 7 columns]"
]
},
"execution_count": 93,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[df['st_name'].isin(['CA','FL','TX','NE'])]"
]
},
{
"cell_type": "markdown",
"id": "c6487216-d34a-4665-857a-04b7bb4d514e",
"metadata": {},
"source": [
"### Editing Rows"
]
},
{
"cell_type": "markdown",
"id": "45ab7829-6fb4-4ae2-a64f-7267a1882e5c",
"metadata": {},
"source": [
"We can also use conditional selection to edit certain rows. For example, let's say we found out we made an error with measuring `beer_tax` for the state of Alabama, and the true beer tax is actually 0.1 higher than what we've measured. We can easily use boolean filtering to select the columns where we've made this mistake and fix them!"
]
},
{
"cell_type": "code",
"execution_count": 28,
"id": "bbc80134-aa0d-4a2e-96d9-c5d674d2f8b4",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 72.341130\n",
"1 69.304600\n",
"2 67.149190\n",
"3 63.217026\n",
"4 56.933796\n",
"5 52.171677\n",
"6 49.329280\n",
"7 46.317430\n",
"8 43.049637\n",
"9 38.661655\n",
"10 34.063545\n",
"11 30.878284\n",
"12 29.086386\n",
"13 28.181087\n",
"14 27.014784\n",
"15 26.085838\n",
"16 25.609820\n",
"17 24.708065\n",
"18 23.726425\n",
"19 22.635775\n",
"20 21.475412\n",
"21 20.608194\n",
"22 20.005960\n",
"23 19.424470\n",
"24 18.939516\n",
"25 18.417560\n",
"26 17.889332\n",
"27 17.488075\n",
"28 17.219852\n",
"29 16.847755\n",
"30 16.299862\n",
"31 15.848877\n",
"32 15.602203\n",
"33 15.254544\n",
"Name: beer_tax, dtype: float64"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.loc[df['st_name']=='AL', 'beer_tax']"
]
},
{
"cell_type": "code",
"execution_count": 29,
"id": "30c7bbca-0058-41c9-9475-d967099a0376",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 72.441130\n",
"1 69.404600\n",
"2 67.249190\n",
"3 63.317026\n",
"4 57.033796\n",
"5 52.271677\n",
"6 49.429280\n",
"7 46.417430\n",
"8 43.149637\n",
"9 38.761655\n",
"10 34.163545\n",
"11 30.978284\n",
"12 29.186386\n",
"13 28.281087\n",
"14 27.114784\n",
"15 26.185838\n",
"16 25.709820\n",
"17 24.808065\n",
"18 23.826425\n",
"19 22.735775\n",
"20 21.575412\n",
"21 20.708194\n",
"22 20.105960\n",
"23 19.524470\n",
"24 19.039516\n",
"25 18.517560\n",
"26 17.989332\n",
"27 17.588075\n",
"28 17.319852\n",
"29 16.947755\n",
"30 16.399862\n",
"31 15.948877\n",
"32 15.702203\n",
"33 15.354544\n",
"Name: beer_tax, dtype: float64"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.loc[df['st_name']=='AL', 'beer_tax'] + 0.1"
]
},
{
"cell_type": "code",
"execution_count": 30,
"id": "cd414b35-37b4-4ad1-912a-73f3215b2c34",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" st_name | \n",
" year | \n",
" c_beer | \n",
" beer_tax | \n",
" btax_dollars | \n",
" population | \n",
" salestax | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" AL | \n",
" 1970 | \n",
" 33098 | \n",
" 72.441130 | \n",
" 2.370 | \n",
" 3450 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 1 | \n",
" AL | \n",
" 1971 | \n",
" 37598 | \n",
" 69.404600 | \n",
" 2.370 | \n",
" 3497 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 2 | \n",
" AL | \n",
" 1972 | \n",
" 42719 | \n",
" 67.249190 | \n",
" 2.370 | \n",
" 3539 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 3 | \n",
" AL | \n",
" 1973 | \n",
" 46203 | \n",
" 63.317026 | \n",
" 2.370 | \n",
" 3580 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 4 | \n",
" AL | \n",
" 1974 | \n",
" 49769 | \n",
" 57.033796 | \n",
" 2.370 | \n",
" 3627 | \n",
" 4.0 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 1703 | \n",
" WY | \n",
" 1999 | \n",
" 12423 | \n",
" 0.319894 | \n",
" 0.045 | \n",
" 492 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 1704 | \n",
" WY | \n",
" 2000 | \n",
" 12595 | \n",
" 0.309491 | \n",
" 0.045 | \n",
" 494 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 1705 | \n",
" WY | \n",
" 2001 | \n",
" 12808 | \n",
" 0.300928 | \n",
" 0.045 | \n",
" 494 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 1706 | \n",
" WY | \n",
" 2002 | \n",
" 13191 | \n",
" 0.296244 | \n",
" 0.045 | \n",
" 499 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 1707 | \n",
" WY | \n",
" 2003 | \n",
" 15535 | \n",
" 0.289643 | \n",
" 0.045 | \n",
" 501 | \n",
" 4.0 | \n",
"
\n",
" \n",
"
\n",
"
1708 rows × 7 columns
\n",
"
"
],
"text/plain": [
" st_name year c_beer beer_tax btax_dollars population salestax\n",
"0 AL 1970 33098 72.441130 2.370 3450 4.0\n",
"1 AL 1971 37598 69.404600 2.370 3497 4.0\n",
"2 AL 1972 42719 67.249190 2.370 3539 4.0\n",
"3 AL 1973 46203 63.317026 2.370 3580 4.0\n",
"4 AL 1974 49769 57.033796 2.370 3627 4.0\n",
"... ... ... ... ... ... ... ...\n",
"1703 WY 1999 12423 0.319894 0.045 492 4.0\n",
"1704 WY 2000 12595 0.309491 0.045 494 4.0\n",
"1705 WY 2001 12808 0.300928 0.045 494 4.0\n",
"1706 WY 2002 13191 0.296244 0.045 499 4.0\n",
"1707 WY 2003 15535 0.289643 0.045 501 4.0\n",
"\n",
"[1708 rows x 7 columns]"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.loc[df['st_name']=='AL', 'beer_tax'] = df.loc[df['st_name']=='AL', 'beer_tax'] + 0.1\n",
"df"
]
},
{
"cell_type": "markdown",
"id": "b037833f-09e5-4479-b6e2-f2e7054692b8",
"metadata": {},
"source": [
"It's that simple! However, since our data was (presumably) correctly measured, let us revert this change."
]
},
{
"cell_type": "code",
"execution_count": 31,
"id": "504d089f-bfe0-4160-b86a-59965390a643",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" st_name | \n",
" year | \n",
" c_beer | \n",
" beer_tax | \n",
" btax_dollars | \n",
" population | \n",
" salestax | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" AL | \n",
" 1970 | \n",
" 33098 | \n",
" 72.341130 | \n",
" 2.370 | \n",
" 3450 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 1 | \n",
" AL | \n",
" 1971 | \n",
" 37598 | \n",
" 69.304600 | \n",
" 2.370 | \n",
" 3497 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 2 | \n",
" AL | \n",
" 1972 | \n",
" 42719 | \n",
" 67.149190 | \n",
" 2.370 | \n",
" 3539 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 3 | \n",
" AL | \n",
" 1973 | \n",
" 46203 | \n",
" 63.217026 | \n",
" 2.370 | \n",
" 3580 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 4 | \n",
" AL | \n",
" 1974 | \n",
" 49769 | \n",
" 56.933796 | \n",
" 2.370 | \n",
" 3627 | \n",
" 4.0 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 1703 | \n",
" WY | \n",
" 1999 | \n",
" 12423 | \n",
" 0.319894 | \n",
" 0.045 | \n",
" 492 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 1704 | \n",
" WY | \n",
" 2000 | \n",
" 12595 | \n",
" 0.309491 | \n",
" 0.045 | \n",
" 494 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 1705 | \n",
" WY | \n",
" 2001 | \n",
" 12808 | \n",
" 0.300928 | \n",
" 0.045 | \n",
" 494 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 1706 | \n",
" WY | \n",
" 2002 | \n",
" 13191 | \n",
" 0.296244 | \n",
" 0.045 | \n",
" 499 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 1707 | \n",
" WY | \n",
" 2003 | \n",
" 15535 | \n",
" 0.289643 | \n",
" 0.045 | \n",
" 501 | \n",
" 4.0 | \n",
"
\n",
" \n",
"
\n",
"
1708 rows × 7 columns
\n",
"
"
],
"text/plain": [
" st_name year c_beer beer_tax btax_dollars population salestax\n",
"0 AL 1970 33098 72.341130 2.370 3450 4.0\n",
"1 AL 1971 37598 69.304600 2.370 3497 4.0\n",
"2 AL 1972 42719 67.149190 2.370 3539 4.0\n",
"3 AL 1973 46203 63.217026 2.370 3580 4.0\n",
"4 AL 1974 49769 56.933796 2.370 3627 4.0\n",
"... ... ... ... ... ... ... ...\n",
"1703 WY 1999 12423 0.319894 0.045 492 4.0\n",
"1704 WY 2000 12595 0.309491 0.045 494 4.0\n",
"1705 WY 2001 12808 0.300928 0.045 494 4.0\n",
"1706 WY 2002 13191 0.296244 0.045 499 4.0\n",
"1707 WY 2003 15535 0.289643 0.045 501 4.0\n",
"\n",
"[1708 rows x 7 columns]"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.loc[df['st_name']=='AL', 'beer_tax'] = df.loc[df['st_name']=='AL', 'beer_tax'] - 0.1\n",
"df"
]
},
{
"cell_type": "markdown",
"id": "df2d4b89-61aa-4e52-a737-21ff0a62a7df",
"metadata": {},
"source": [
"## Modifying Columns and Indices"
]
},
{
"cell_type": "markdown",
"id": "c9cfa207-9411-4396-b2c6-7dba79f518ed",
"metadata": {},
"source": [
"Now, we move on to modifying the columns in our DataFrames. The first task we will consider is how to add a new column.\n"
]
},
{
"cell_type": "markdown",
"id": "21ac5fd4-8658-42e1-8d41-aa4881fcf067",
"metadata": {},
"source": [
"### Adding Columns\n",
"\n",
"Let's say we want to add a new column where we calculate the total percentage taxed by combining the `beer_tax` and `salestax`. Both of these % taxes compound, so we will need to be careful with the math! Below, we calculate a series with the sum of these two taxes, as a percent."
]
},
{
"cell_type": "code",
"execution_count": 36,
"id": "2abe63f4-e00b-4b3e-9736-c26d59821dec",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 79.234775\n",
"1 76.076784\n",
"2 73.835158\n",
"3 69.745707\n",
"4 63.211148\n",
" ... \n",
"1703 4.332690\n",
"1704 4.321871\n",
"1705 4.312965\n",
"1706 4.308094\n",
"1707 4.301229\n",
"Length: 1708, dtype: float64"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"((df['beer_tax'] / 100 + 1) * (df['salestax'] / 100 + 1) - 1) * 100"
]
},
{
"cell_type": "markdown",
"id": "44d9d3c7-9a23-42c3-9f0a-45049a972a5f",
"metadata": {},
"source": [
"Now, we want to set this series equal to a new column named `total_percent_tax` in our dataframe. The syntax for doing this follows `df[New_Column_Name] = Series/Array_Name`, and is shown below."
]
},
{
"cell_type": "code",
"execution_count": 62,
"id": "a28e0a02-9f31-45cf-b6d9-d55596bf5aa6",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" st_name | \n",
" year | \n",
" c_beer | \n",
" beer_tax | \n",
" btax_dollars | \n",
" population | \n",
" salestax | \n",
" total_percent_tax | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" AL | \n",
" 1970 | \n",
" 33098 | \n",
" 72.341130 | \n",
" 2.37 | \n",
" 3450 | \n",
" 4.0 | \n",
" 79.234775 | \n",
"
\n",
" \n",
" 1 | \n",
" AL | \n",
" 1971 | \n",
" 37598 | \n",
" 69.304600 | \n",
" 2.37 | \n",
" 3497 | \n",
" 4.0 | \n",
" 76.076784 | \n",
"
\n",
" \n",
" 2 | \n",
" AL | \n",
" 1972 | \n",
" 42719 | \n",
" 67.149190 | \n",
" 2.37 | \n",
" 3539 | \n",
" 4.0 | \n",
" 73.835158 | \n",
"
\n",
" \n",
" 3 | \n",
" AL | \n",
" 1973 | \n",
" 46203 | \n",
" 63.217026 | \n",
" 2.37 | \n",
" 3580 | \n",
" 4.0 | \n",
" 69.745707 | \n",
"
\n",
" \n",
" 4 | \n",
" AL | \n",
" 1974 | \n",
" 49769 | \n",
" 56.933796 | \n",
" 2.37 | \n",
" 3627 | \n",
" 4.0 | \n",
" 63.211148 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" st_name year c_beer beer_tax btax_dollars population salestax \\\n",
"0 AL 1970 33098 72.341130 2.37 3450 4.0 \n",
"1 AL 1971 37598 69.304600 2.37 3497 4.0 \n",
"2 AL 1972 42719 67.149190 2.37 3539 4.0 \n",
"3 AL 1973 46203 63.217026 2.37 3580 4.0 \n",
"4 AL 1974 49769 56.933796 2.37 3627 4.0 \n",
"\n",
" total_percent_tax \n",
"0 79.234775 \n",
"1 76.076784 \n",
"2 73.835158 \n",
"3 69.745707 \n",
"4 63.211148 "
]
},
"execution_count": 62,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['total_percent_tax'] = ((df['beer_tax'] / 100 + 1) * (df['salestax'] / 100 + 1) - 1) * 100\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"id": "5021012b-2c15-4818-965d-77b38f24c2a3",
"metadata": {},
"source": [
"Voila, our new column has been added!\n"
]
},
{
"cell_type": "markdown",
"id": "07017740-ca0d-469e-bae1-e156bcfa3fbb",
"metadata": {},
"source": [
"### Renaming Columns\n",
"\n",
"Looking at the table above, `c_beer` feels like a confusing name. Instead, let us change to `beer_consumption`, a more informative name! To do so, we can use the DataFrame [`rename` method](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.rename.html), as shown below."
]
},
{
"cell_type": "code",
"execution_count": 39,
"id": "6e5f00e7-a1e7-409e-ae03-efc10627a8b7",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" st_name | \n",
" year | \n",
" beer_consumption | \n",
" beer_tax | \n",
" btax_dollars | \n",
" population | \n",
" salestax | \n",
" total_percent_tax | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" AL | \n",
" 1970 | \n",
" 33098 | \n",
" 72.341130 | \n",
" 2.370 | \n",
" 3450 | \n",
" 4.0 | \n",
" 79.234775 | \n",
"
\n",
" \n",
" 1 | \n",
" AL | \n",
" 1971 | \n",
" 37598 | \n",
" 69.304600 | \n",
" 2.370 | \n",
" 3497 | \n",
" 4.0 | \n",
" 76.076784 | \n",
"
\n",
" \n",
" 2 | \n",
" AL | \n",
" 1972 | \n",
" 42719 | \n",
" 67.149190 | \n",
" 2.370 | \n",
" 3539 | \n",
" 4.0 | \n",
" 73.835158 | \n",
"
\n",
" \n",
" 3 | \n",
" AL | \n",
" 1973 | \n",
" 46203 | \n",
" 63.217026 | \n",
" 2.370 | \n",
" 3580 | \n",
" 4.0 | \n",
" 69.745707 | \n",
"
\n",
" \n",
" 4 | \n",
" AL | \n",
" 1974 | \n",
" 49769 | \n",
" 56.933796 | \n",
" 2.370 | \n",
" 3627 | \n",
" 4.0 | \n",
" 63.211148 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 1703 | \n",
" WY | \n",
" 1999 | \n",
" 12423 | \n",
" 0.319894 | \n",
" 0.045 | \n",
" 492 | \n",
" 4.0 | \n",
" 4.332690 | \n",
"
\n",
" \n",
" 1704 | \n",
" WY | \n",
" 2000 | \n",
" 12595 | \n",
" 0.309491 | \n",
" 0.045 | \n",
" 494 | \n",
" 4.0 | \n",
" 4.321871 | \n",
"
\n",
" \n",
" 1705 | \n",
" WY | \n",
" 2001 | \n",
" 12808 | \n",
" 0.300928 | \n",
" 0.045 | \n",
" 494 | \n",
" 4.0 | \n",
" 4.312965 | \n",
"
\n",
" \n",
" 1706 | \n",
" WY | \n",
" 2002 | \n",
" 13191 | \n",
" 0.296244 | \n",
" 0.045 | \n",
" 499 | \n",
" 4.0 | \n",
" 4.308094 | \n",
"
\n",
" \n",
" 1707 | \n",
" WY | \n",
" 2003 | \n",
" 15535 | \n",
" 0.289643 | \n",
" 0.045 | \n",
" 501 | \n",
" 4.0 | \n",
" 4.301229 | \n",
"
\n",
" \n",
"
\n",
"
1708 rows × 8 columns
\n",
"
"
],
"text/plain": [
" st_name year beer_consumption beer_tax btax_dollars population \\\n",
"0 AL 1970 33098 72.341130 2.370 3450 \n",
"1 AL 1971 37598 69.304600 2.370 3497 \n",
"2 AL 1972 42719 67.149190 2.370 3539 \n",
"3 AL 1973 46203 63.217026 2.370 3580 \n",
"4 AL 1974 49769 56.933796 2.370 3627 \n",
"... ... ... ... ... ... ... \n",
"1703 WY 1999 12423 0.319894 0.045 492 \n",
"1704 WY 2000 12595 0.309491 0.045 494 \n",
"1705 WY 2001 12808 0.300928 0.045 494 \n",
"1706 WY 2002 13191 0.296244 0.045 499 \n",
"1707 WY 2003 15535 0.289643 0.045 501 \n",
"\n",
" salestax total_percent_tax \n",
"0 4.0 79.234775 \n",
"1 4.0 76.076784 \n",
"2 4.0 73.835158 \n",
"3 4.0 69.745707 \n",
"4 4.0 63.211148 \n",
"... ... ... \n",
"1703 4.0 4.332690 \n",
"1704 4.0 4.321871 \n",
"1705 4.0 4.312965 \n",
"1706 4.0 4.308094 \n",
"1707 4.0 4.301229 \n",
"\n",
"[1708 rows x 8 columns]"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.rename(columns={\"c_beer\":\"beer_consumption\"})"
]
},
{
"cell_type": "markdown",
"id": "b2a4b1dc-a155-4376-9bda-c6657d9eed52",
"metadata": {},
"source": [
"As we can see, the `columns` parameter of the `rename` method takes in a dictionary where all the keys are the old column names and all the values are the new column names.\n",
"\n",
"Note, however, that the original DataFrame `df` has not changed yet. Instead, `pandas` methods just return a copy of the modified DataFrame for you to see. To change the original dataframe, you can either manually assign the new DataFrame to `df` or set the `inplace` parameter of the `rename` method to `True`."
]
},
{
"cell_type": "code",
"execution_count": 48,
"id": "c14c73e9-f624-4bd9-86ef-bbbc88670882",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" st_name | \n",
" year | \n",
" c_beer | \n",
" beer_tax | \n",
" btax_dollars | \n",
" population | \n",
" salestax | \n",
" total_percent_tax | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" AL | \n",
" 1970 | \n",
" 33098 | \n",
" 72.341130 | \n",
" 2.370 | \n",
" 3450 | \n",
" 4.0 | \n",
" 79.234775 | \n",
"
\n",
" \n",
" 1 | \n",
" AL | \n",
" 1971 | \n",
" 37598 | \n",
" 69.304600 | \n",
" 2.370 | \n",
" 3497 | \n",
" 4.0 | \n",
" 76.076784 | \n",
"
\n",
" \n",
" 2 | \n",
" AL | \n",
" 1972 | \n",
" 42719 | \n",
" 67.149190 | \n",
" 2.370 | \n",
" 3539 | \n",
" 4.0 | \n",
" 73.835158 | \n",
"
\n",
" \n",
" 3 | \n",
" AL | \n",
" 1973 | \n",
" 46203 | \n",
" 63.217026 | \n",
" 2.370 | \n",
" 3580 | \n",
" 4.0 | \n",
" 69.745707 | \n",
"
\n",
" \n",
" 4 | \n",
" AL | \n",
" 1974 | \n",
" 49769 | \n",
" 56.933796 | \n",
" 2.370 | \n",
" 3627 | \n",
" 4.0 | \n",
" 63.211148 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 1703 | \n",
" WY | \n",
" 1999 | \n",
" 12423 | \n",
" 0.319894 | \n",
" 0.045 | \n",
" 492 | \n",
" 4.0 | \n",
" 4.332690 | \n",
"
\n",
" \n",
" 1704 | \n",
" WY | \n",
" 2000 | \n",
" 12595 | \n",
" 0.309491 | \n",
" 0.045 | \n",
" 494 | \n",
" 4.0 | \n",
" 4.321871 | \n",
"
\n",
" \n",
" 1705 | \n",
" WY | \n",
" 2001 | \n",
" 12808 | \n",
" 0.300928 | \n",
" 0.045 | \n",
" 494 | \n",
" 4.0 | \n",
" 4.312965 | \n",
"
\n",
" \n",
" 1706 | \n",
" WY | \n",
" 2002 | \n",
" 13191 | \n",
" 0.296244 | \n",
" 0.045 | \n",
" 499 | \n",
" 4.0 | \n",
" 4.308094 | \n",
"
\n",
" \n",
" 1707 | \n",
" WY | \n",
" 2003 | \n",
" 15535 | \n",
" 0.289643 | \n",
" 0.045 | \n",
" 501 | \n",
" 4.0 | \n",
" 4.301229 | \n",
"
\n",
" \n",
"
\n",
"
1708 rows × 8 columns
\n",
"
"
],
"text/plain": [
" st_name year c_beer beer_tax btax_dollars population salestax \\\n",
"0 AL 1970 33098 72.341130 2.370 3450 4.0 \n",
"1 AL 1971 37598 69.304600 2.370 3497 4.0 \n",
"2 AL 1972 42719 67.149190 2.370 3539 4.0 \n",
"3 AL 1973 46203 63.217026 2.370 3580 4.0 \n",
"4 AL 1974 49769 56.933796 2.370 3627 4.0 \n",
"... ... ... ... ... ... ... ... \n",
"1703 WY 1999 12423 0.319894 0.045 492 4.0 \n",
"1704 WY 2000 12595 0.309491 0.045 494 4.0 \n",
"1705 WY 2001 12808 0.300928 0.045 494 4.0 \n",
"1706 WY 2002 13191 0.296244 0.045 499 4.0 \n",
"1707 WY 2003 15535 0.289643 0.045 501 4.0 \n",
"\n",
" total_percent_tax \n",
"0 79.234775 \n",
"1 76.076784 \n",
"2 73.835158 \n",
"3 69.745707 \n",
"4 63.211148 \n",
"... ... \n",
"1703 4.332690 \n",
"1704 4.321871 \n",
"1705 4.312965 \n",
"1706 4.308094 \n",
"1707 4.301229 \n",
"\n",
"[1708 rows x 8 columns]"
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df # original dataframe has not changed"
]
},
{
"cell_type": "code",
"execution_count": 42,
"id": "44d5e9b3-ba34-4d28-a7ad-02c663e77722",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" st_name | \n",
" year | \n",
" beer_consumption | \n",
" beer_tax | \n",
" btax_dollars | \n",
" population | \n",
" salestax | \n",
" total_percent_tax | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" AL | \n",
" 1970 | \n",
" 33098 | \n",
" 72.341130 | \n",
" 2.370 | \n",
" 3450 | \n",
" 4.0 | \n",
" 79.234775 | \n",
"
\n",
" \n",
" 1 | \n",
" AL | \n",
" 1971 | \n",
" 37598 | \n",
" 69.304600 | \n",
" 2.370 | \n",
" 3497 | \n",
" 4.0 | \n",
" 76.076784 | \n",
"
\n",
" \n",
" 2 | \n",
" AL | \n",
" 1972 | \n",
" 42719 | \n",
" 67.149190 | \n",
" 2.370 | \n",
" 3539 | \n",
" 4.0 | \n",
" 73.835158 | \n",
"
\n",
" \n",
" 3 | \n",
" AL | \n",
" 1973 | \n",
" 46203 | \n",
" 63.217026 | \n",
" 2.370 | \n",
" 3580 | \n",
" 4.0 | \n",
" 69.745707 | \n",
"
\n",
" \n",
" 4 | \n",
" AL | \n",
" 1974 | \n",
" 49769 | \n",
" 56.933796 | \n",
" 2.370 | \n",
" 3627 | \n",
" 4.0 | \n",
" 63.211148 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 1703 | \n",
" WY | \n",
" 1999 | \n",
" 12423 | \n",
" 0.319894 | \n",
" 0.045 | \n",
" 492 | \n",
" 4.0 | \n",
" 4.332690 | \n",
"
\n",
" \n",
" 1704 | \n",
" WY | \n",
" 2000 | \n",
" 12595 | \n",
" 0.309491 | \n",
" 0.045 | \n",
" 494 | \n",
" 4.0 | \n",
" 4.321871 | \n",
"
\n",
" \n",
" 1705 | \n",
" WY | \n",
" 2001 | \n",
" 12808 | \n",
" 0.300928 | \n",
" 0.045 | \n",
" 494 | \n",
" 4.0 | \n",
" 4.312965 | \n",
"
\n",
" \n",
" 1706 | \n",
" WY | \n",
" 2002 | \n",
" 13191 | \n",
" 0.296244 | \n",
" 0.045 | \n",
" 499 | \n",
" 4.0 | \n",
" 4.308094 | \n",
"
\n",
" \n",
" 1707 | \n",
" WY | \n",
" 2003 | \n",
" 15535 | \n",
" 0.289643 | \n",
" 0.045 | \n",
" 501 | \n",
" 4.0 | \n",
" 4.301229 | \n",
"
\n",
" \n",
"
\n",
"
1708 rows × 8 columns
\n",
"
"
],
"text/plain": [
" st_name year beer_consumption beer_tax btax_dollars population \\\n",
"0 AL 1970 33098 72.341130 2.370 3450 \n",
"1 AL 1971 37598 69.304600 2.370 3497 \n",
"2 AL 1972 42719 67.149190 2.370 3539 \n",
"3 AL 1973 46203 63.217026 2.370 3580 \n",
"4 AL 1974 49769 56.933796 2.370 3627 \n",
"... ... ... ... ... ... ... \n",
"1703 WY 1999 12423 0.319894 0.045 492 \n",
"1704 WY 2000 12595 0.309491 0.045 494 \n",
"1705 WY 2001 12808 0.300928 0.045 494 \n",
"1706 WY 2002 13191 0.296244 0.045 499 \n",
"1707 WY 2003 15535 0.289643 0.045 501 \n",
"\n",
" salestax total_percent_tax \n",
"0 4.0 79.234775 \n",
"1 4.0 76.076784 \n",
"2 4.0 73.835158 \n",
"3 4.0 69.745707 \n",
"4 4.0 63.211148 \n",
"... ... ... \n",
"1703 4.0 4.332690 \n",
"1704 4.0 4.321871 \n",
"1705 4.0 4.312965 \n",
"1706 4.0 4.308094 \n",
"1707 4.0 4.301229 \n",
"\n",
"[1708 rows x 8 columns]"
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = df.rename(columns={\"c_beer\":\"beer_consumption\"}) # you can manually reassign the `df` variable\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": 49,
"id": "b673ec53-76d9-48e3-b50c-123635c167b5",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" st_name | \n",
" year | \n",
" beer_consumption | \n",
" beer_tax | \n",
" btax_dollars | \n",
" population | \n",
" salestax | \n",
" total_percent_tax | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" AL | \n",
" 1970 | \n",
" 33098 | \n",
" 72.341130 | \n",
" 2.370 | \n",
" 3450 | \n",
" 4.0 | \n",
" 79.234775 | \n",
"
\n",
" \n",
" 1 | \n",
" AL | \n",
" 1971 | \n",
" 37598 | \n",
" 69.304600 | \n",
" 2.370 | \n",
" 3497 | \n",
" 4.0 | \n",
" 76.076784 | \n",
"
\n",
" \n",
" 2 | \n",
" AL | \n",
" 1972 | \n",
" 42719 | \n",
" 67.149190 | \n",
" 2.370 | \n",
" 3539 | \n",
" 4.0 | \n",
" 73.835158 | \n",
"
\n",
" \n",
" 3 | \n",
" AL | \n",
" 1973 | \n",
" 46203 | \n",
" 63.217026 | \n",
" 2.370 | \n",
" 3580 | \n",
" 4.0 | \n",
" 69.745707 | \n",
"
\n",
" \n",
" 4 | \n",
" AL | \n",
" 1974 | \n",
" 49769 | \n",
" 56.933796 | \n",
" 2.370 | \n",
" 3627 | \n",
" 4.0 | \n",
" 63.211148 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 1703 | \n",
" WY | \n",
" 1999 | \n",
" 12423 | \n",
" 0.319894 | \n",
" 0.045 | \n",
" 492 | \n",
" 4.0 | \n",
" 4.332690 | \n",
"
\n",
" \n",
" 1704 | \n",
" WY | \n",
" 2000 | \n",
" 12595 | \n",
" 0.309491 | \n",
" 0.045 | \n",
" 494 | \n",
" 4.0 | \n",
" 4.321871 | \n",
"
\n",
" \n",
" 1705 | \n",
" WY | \n",
" 2001 | \n",
" 12808 | \n",
" 0.300928 | \n",
" 0.045 | \n",
" 494 | \n",
" 4.0 | \n",
" 4.312965 | \n",
"
\n",
" \n",
" 1706 | \n",
" WY | \n",
" 2002 | \n",
" 13191 | \n",
" 0.296244 | \n",
" 0.045 | \n",
" 499 | \n",
" 4.0 | \n",
" 4.308094 | \n",
"
\n",
" \n",
" 1707 | \n",
" WY | \n",
" 2003 | \n",
" 15535 | \n",
" 0.289643 | \n",
" 0.045 | \n",
" 501 | \n",
" 4.0 | \n",
" 4.301229 | \n",
"
\n",
" \n",
"
\n",
"
1708 rows × 8 columns
\n",
"
"
],
"text/plain": [
" st_name year beer_consumption beer_tax btax_dollars population \\\n",
"0 AL 1970 33098 72.341130 2.370 3450 \n",
"1 AL 1971 37598 69.304600 2.370 3497 \n",
"2 AL 1972 42719 67.149190 2.370 3539 \n",
"3 AL 1973 46203 63.217026 2.370 3580 \n",
"4 AL 1974 49769 56.933796 2.370 3627 \n",
"... ... ... ... ... ... ... \n",
"1703 WY 1999 12423 0.319894 0.045 492 \n",
"1704 WY 2000 12595 0.309491 0.045 494 \n",
"1705 WY 2001 12808 0.300928 0.045 494 \n",
"1706 WY 2002 13191 0.296244 0.045 499 \n",
"1707 WY 2003 15535 0.289643 0.045 501 \n",
"\n",
" salestax total_percent_tax \n",
"0 4.0 79.234775 \n",
"1 4.0 76.076784 \n",
"2 4.0 73.835158 \n",
"3 4.0 69.745707 \n",
"4 4.0 63.211148 \n",
"... ... ... \n",
"1703 4.0 4.332690 \n",
"1704 4.0 4.321871 \n",
"1705 4.0 4.312965 \n",
"1706 4.0 4.308094 \n",
"1707 4.0 4.301229 \n",
"\n",
"[1708 rows x 8 columns]"
]
},
"execution_count": 49,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.rename(columns={\"c_beer\":\"beer_consumption\"}, inplace = True) # or you can use the inplace method\n",
"df"
]
},
{
"cell_type": "markdown",
"id": "1d3be649-6171-4a13-a539-4641b53bafb5",
"metadata": {},
"source": [
"### Dropping Columns\n",
"\n",
"Finally, let's say we want to drop a column. For this, we can use the DataFrame [`drop` method](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.drop.html). For example, we drop the `salestax` column below."
]
},
{
"cell_type": "code",
"execution_count": 55,
"id": "84e04299-fffa-442e-aeba-9e40d3a3f11c",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" st_name | \n",
" year | \n",
" beer_consumption | \n",
" beer_tax | \n",
" btax_dollars | \n",
" population | \n",
" total_percent_tax | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" AL | \n",
" 1970 | \n",
" 33098 | \n",
" 72.341130 | \n",
" 2.370 | \n",
" 3450 | \n",
" 79.234775 | \n",
"
\n",
" \n",
" 1 | \n",
" AL | \n",
" 1971 | \n",
" 37598 | \n",
" 69.304600 | \n",
" 2.370 | \n",
" 3497 | \n",
" 76.076784 | \n",
"
\n",
" \n",
" 2 | \n",
" AL | \n",
" 1972 | \n",
" 42719 | \n",
" 67.149190 | \n",
" 2.370 | \n",
" 3539 | \n",
" 73.835158 | \n",
"
\n",
" \n",
" 3 | \n",
" AL | \n",
" 1973 | \n",
" 46203 | \n",
" 63.217026 | \n",
" 2.370 | \n",
" 3580 | \n",
" 69.745707 | \n",
"
\n",
" \n",
" 4 | \n",
" AL | \n",
" 1974 | \n",
" 49769 | \n",
" 56.933796 | \n",
" 2.370 | \n",
" 3627 | \n",
" 63.211148 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 1703 | \n",
" WY | \n",
" 1999 | \n",
" 12423 | \n",
" 0.319894 | \n",
" 0.045 | \n",
" 492 | \n",
" 4.332690 | \n",
"
\n",
" \n",
" 1704 | \n",
" WY | \n",
" 2000 | \n",
" 12595 | \n",
" 0.309491 | \n",
" 0.045 | \n",
" 494 | \n",
" 4.321871 | \n",
"
\n",
" \n",
" 1705 | \n",
" WY | \n",
" 2001 | \n",
" 12808 | \n",
" 0.300928 | \n",
" 0.045 | \n",
" 494 | \n",
" 4.312965 | \n",
"
\n",
" \n",
" 1706 | \n",
" WY | \n",
" 2002 | \n",
" 13191 | \n",
" 0.296244 | \n",
" 0.045 | \n",
" 499 | \n",
" 4.308094 | \n",
"
\n",
" \n",
" 1707 | \n",
" WY | \n",
" 2003 | \n",
" 15535 | \n",
" 0.289643 | \n",
" 0.045 | \n",
" 501 | \n",
" 4.301229 | \n",
"
\n",
" \n",
"
\n",
"
1708 rows × 7 columns
\n",
"
"
],
"text/plain": [
" st_name year beer_consumption beer_tax btax_dollars population \\\n",
"0 AL 1970 33098 72.341130 2.370 3450 \n",
"1 AL 1971 37598 69.304600 2.370 3497 \n",
"2 AL 1972 42719 67.149190 2.370 3539 \n",
"3 AL 1973 46203 63.217026 2.370 3580 \n",
"4 AL 1974 49769 56.933796 2.370 3627 \n",
"... ... ... ... ... ... ... \n",
"1703 WY 1999 12423 0.319894 0.045 492 \n",
"1704 WY 2000 12595 0.309491 0.045 494 \n",
"1705 WY 2001 12808 0.300928 0.045 494 \n",
"1706 WY 2002 13191 0.296244 0.045 499 \n",
"1707 WY 2003 15535 0.289643 0.045 501 \n",
"\n",
" total_percent_tax \n",
"0 79.234775 \n",
"1 76.076784 \n",
"2 73.835158 \n",
"3 69.745707 \n",
"4 63.211148 \n",
"... ... \n",
"1703 4.332690 \n",
"1704 4.321871 \n",
"1705 4.312965 \n",
"1706 4.308094 \n",
"1707 4.301229 \n",
"\n",
"[1708 rows x 7 columns]"
]
},
"execution_count": 55,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.drop(columns=[\"salestax\"])"
]
},
{
"cell_type": "markdown",
"id": "3b38fa5a-a517-4dc5-9ca5-9a5639410264",
"metadata": {},
"source": [
"Once again, the `drop` method has a `columns` parameter that takes in a list of the columns to drop (no need for a dictionary here as there are no key-value pairs). \n",
"\n",
"However, you can also simply pass in a list of the columns to drop and then use the `axis` parameter to specify they represent column names, as done below. We encourage you to read the documentation if you're interested in further detail!"
]
},
{
"cell_type": "code",
"execution_count": 54,
"id": "f5eabc61-2a97-44e8-af65-5a4dbf095b59",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" st_name | \n",
" year | \n",
" beer_consumption | \n",
" btax_dollars | \n",
" population | \n",
" total_percent_tax | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" AL | \n",
" 1970 | \n",
" 33098 | \n",
" 2.370 | \n",
" 3450 | \n",
" 79.234775 | \n",
"
\n",
" \n",
" 1 | \n",
" AL | \n",
" 1971 | \n",
" 37598 | \n",
" 2.370 | \n",
" 3497 | \n",
" 76.076784 | \n",
"
\n",
" \n",
" 2 | \n",
" AL | \n",
" 1972 | \n",
" 42719 | \n",
" 2.370 | \n",
" 3539 | \n",
" 73.835158 | \n",
"
\n",
" \n",
" 3 | \n",
" AL | \n",
" 1973 | \n",
" 46203 | \n",
" 2.370 | \n",
" 3580 | \n",
" 69.745707 | \n",
"
\n",
" \n",
" 4 | \n",
" AL | \n",
" 1974 | \n",
" 49769 | \n",
" 2.370 | \n",
" 3627 | \n",
" 63.211148 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 1703 | \n",
" WY | \n",
" 1999 | \n",
" 12423 | \n",
" 0.045 | \n",
" 492 | \n",
" 4.332690 | \n",
"
\n",
" \n",
" 1704 | \n",
" WY | \n",
" 2000 | \n",
" 12595 | \n",
" 0.045 | \n",
" 494 | \n",
" 4.321871 | \n",
"
\n",
" \n",
" 1705 | \n",
" WY | \n",
" 2001 | \n",
" 12808 | \n",
" 0.045 | \n",
" 494 | \n",
" 4.312965 | \n",
"
\n",
" \n",
" 1706 | \n",
" WY | \n",
" 2002 | \n",
" 13191 | \n",
" 0.045 | \n",
" 499 | \n",
" 4.308094 | \n",
"
\n",
" \n",
" 1707 | \n",
" WY | \n",
" 2003 | \n",
" 15535 | \n",
" 0.045 | \n",
" 501 | \n",
" 4.301229 | \n",
"
\n",
" \n",
"
\n",
"
1708 rows × 6 columns
\n",
"
"
],
"text/plain": [
" st_name year beer_consumption btax_dollars population \\\n",
"0 AL 1970 33098 2.370 3450 \n",
"1 AL 1971 37598 2.370 3497 \n",
"2 AL 1972 42719 2.370 3539 \n",
"3 AL 1973 46203 2.370 3580 \n",
"4 AL 1974 49769 2.370 3627 \n",
"... ... ... ... ... ... \n",
"1703 WY 1999 12423 0.045 492 \n",
"1704 WY 2000 12595 0.045 494 \n",
"1705 WY 2001 12808 0.045 494 \n",
"1706 WY 2002 13191 0.045 499 \n",
"1707 WY 2003 15535 0.045 501 \n",
"\n",
" total_percent_tax \n",
"0 79.234775 \n",
"1 76.076784 \n",
"2 73.835158 \n",
"3 69.745707 \n",
"4 63.211148 \n",
"... ... \n",
"1703 4.332690 \n",
"1704 4.321871 \n",
"1705 4.312965 \n",
"1706 4.308094 \n",
"1707 4.301229 \n",
"\n",
"[1708 rows x 6 columns]"
]
},
"execution_count": 54,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.drop([\"salestax\",\"beer_tax\"], axis = 'columns')"
]
},
{
"cell_type": "markdown",
"id": "b0c6e70c-e783-4e39-9b0c-45ec7dde814f",
"metadata": {},
"source": [
"Finally, once again the `drop` parameter does not modify the original table. To modify the original table, you can once again reassign it to the original table or use the `inplace` method."
]
},
{
"cell_type": "code",
"execution_count": 60,
"id": "62f72213-d7c8-469b-ac3d-c2ec4ecff473",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" st_name | \n",
" year | \n",
" c_beer | \n",
" beer_tax | \n",
" btax_dollars | \n",
" population | \n",
" total_percent_tax | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" AL | \n",
" 1970 | \n",
" 33098 | \n",
" 72.341130 | \n",
" 2.370 | \n",
" 3450 | \n",
" 79.234775 | \n",
"
\n",
" \n",
" 1 | \n",
" AL | \n",
" 1971 | \n",
" 37598 | \n",
" 69.304600 | \n",
" 2.370 | \n",
" 3497 | \n",
" 76.076784 | \n",
"
\n",
" \n",
" 2 | \n",
" AL | \n",
" 1972 | \n",
" 42719 | \n",
" 67.149190 | \n",
" 2.370 | \n",
" 3539 | \n",
" 73.835158 | \n",
"
\n",
" \n",
" 3 | \n",
" AL | \n",
" 1973 | \n",
" 46203 | \n",
" 63.217026 | \n",
" 2.370 | \n",
" 3580 | \n",
" 69.745707 | \n",
"
\n",
" \n",
" 4 | \n",
" AL | \n",
" 1974 | \n",
" 49769 | \n",
" 56.933796 | \n",
" 2.370 | \n",
" 3627 | \n",
" 63.211148 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 1703 | \n",
" WY | \n",
" 1999 | \n",
" 12423 | \n",
" 0.319894 | \n",
" 0.045 | \n",
" 492 | \n",
" 4.332690 | \n",
"
\n",
" \n",
" 1704 | \n",
" WY | \n",
" 2000 | \n",
" 12595 | \n",
" 0.309491 | \n",
" 0.045 | \n",
" 494 | \n",
" 4.321871 | \n",
"
\n",
" \n",
" 1705 | \n",
" WY | \n",
" 2001 | \n",
" 12808 | \n",
" 0.300928 | \n",
" 0.045 | \n",
" 494 | \n",
" 4.312965 | \n",
"
\n",
" \n",
" 1706 | \n",
" WY | \n",
" 2002 | \n",
" 13191 | \n",
" 0.296244 | \n",
" 0.045 | \n",
" 499 | \n",
" 4.308094 | \n",
"
\n",
" \n",
" 1707 | \n",
" WY | \n",
" 2003 | \n",
" 15535 | \n",
" 0.289643 | \n",
" 0.045 | \n",
" 501 | \n",
" 4.301229 | \n",
"
\n",
" \n",
"
\n",
"
1708 rows × 7 columns
\n",
"
"
],
"text/plain": [
" st_name year c_beer beer_tax btax_dollars population \\\n",
"0 AL 1970 33098 72.341130 2.370 3450 \n",
"1 AL 1971 37598 69.304600 2.370 3497 \n",
"2 AL 1972 42719 67.149190 2.370 3539 \n",
"3 AL 1973 46203 63.217026 2.370 3580 \n",
"4 AL 1974 49769 56.933796 2.370 3627 \n",
"... ... ... ... ... ... ... \n",
"1703 WY 1999 12423 0.319894 0.045 492 \n",
"1704 WY 2000 12595 0.309491 0.045 494 \n",
"1705 WY 2001 12808 0.300928 0.045 494 \n",
"1706 WY 2002 13191 0.296244 0.045 499 \n",
"1707 WY 2003 15535 0.289643 0.045 501 \n",
"\n",
" total_percent_tax \n",
"0 79.234775 \n",
"1 76.076784 \n",
"2 73.835158 \n",
"3 69.745707 \n",
"4 63.211148 \n",
"... ... \n",
"1703 4.332690 \n",
"1704 4.321871 \n",
"1705 4.312965 \n",
"1706 4.308094 \n",
"1707 4.301229 \n",
"\n",
"[1708 rows x 7 columns]"
]
},
"execution_count": 60,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = df.drop(columns=[\"salestax\"]) # reassigning to df\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": 63,
"id": "41cd79f2-08cf-48e4-bd6f-42dff29502b4",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" st_name | \n",
" year | \n",
" c_beer | \n",
" btax_dollars | \n",
" population | \n",
" total_percent_tax | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" AL | \n",
" 1970 | \n",
" 33098 | \n",
" 2.370 | \n",
" 3450 | \n",
" 79.234775 | \n",
"
\n",
" \n",
" 1 | \n",
" AL | \n",
" 1971 | \n",
" 37598 | \n",
" 2.370 | \n",
" 3497 | \n",
" 76.076784 | \n",
"
\n",
" \n",
" 2 | \n",
" AL | \n",
" 1972 | \n",
" 42719 | \n",
" 2.370 | \n",
" 3539 | \n",
" 73.835158 | \n",
"
\n",
" \n",
" 3 | \n",
" AL | \n",
" 1973 | \n",
" 46203 | \n",
" 2.370 | \n",
" 3580 | \n",
" 69.745707 | \n",
"
\n",
" \n",
" 4 | \n",
" AL | \n",
" 1974 | \n",
" 49769 | \n",
" 2.370 | \n",
" 3627 | \n",
" 63.211148 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 1703 | \n",
" WY | \n",
" 1999 | \n",
" 12423 | \n",
" 0.045 | \n",
" 492 | \n",
" 4.332690 | \n",
"
\n",
" \n",
" 1704 | \n",
" WY | \n",
" 2000 | \n",
" 12595 | \n",
" 0.045 | \n",
" 494 | \n",
" 4.321871 | \n",
"
\n",
" \n",
" 1705 | \n",
" WY | \n",
" 2001 | \n",
" 12808 | \n",
" 0.045 | \n",
" 494 | \n",
" 4.312965 | \n",
"
\n",
" \n",
" 1706 | \n",
" WY | \n",
" 2002 | \n",
" 13191 | \n",
" 0.045 | \n",
" 499 | \n",
" 4.308094 | \n",
"
\n",
" \n",
" 1707 | \n",
" WY | \n",
" 2003 | \n",
" 15535 | \n",
" 0.045 | \n",
" 501 | \n",
" 4.301229 | \n",
"
\n",
" \n",
"
\n",
"
1708 rows × 6 columns
\n",
"
"
],
"text/plain": [
" st_name year c_beer btax_dollars population total_percent_tax\n",
"0 AL 1970 33098 2.370 3450 79.234775\n",
"1 AL 1971 37598 2.370 3497 76.076784\n",
"2 AL 1972 42719 2.370 3539 73.835158\n",
"3 AL 1973 46203 2.370 3580 69.745707\n",
"4 AL 1974 49769 2.370 3627 63.211148\n",
"... ... ... ... ... ... ...\n",
"1703 WY 1999 12423 0.045 492 4.332690\n",
"1704 WY 2000 12595 0.045 494 4.321871\n",
"1705 WY 2001 12808 0.045 494 4.312965\n",
"1706 WY 2002 13191 0.045 499 4.308094\n",
"1707 WY 2003 15535 0.045 501 4.301229\n",
"\n",
"[1708 rows x 6 columns]"
]
},
"execution_count": 63,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.drop([\"salestax\",\"beer_tax\"], axis = 'columns', inplace = True) # using inplace parameter\n",
"df"
]
},
{
"cell_type": "markdown",
"id": "48c5d307-1b4d-4689-88b5-f7db5c6ee9e3",
"metadata": {},
"source": [
"### Modifying the Index"
]
},
{
"cell_type": "markdown",
"id": "7106188c-e900-46a1-a237-beac6720961a",
"metadata": {},
"source": [
"To modify the index of a DataFrame, there are three main methods, as discussed below.\n",
"\n",
"`.set_index()` Sets the index of the DataFrame to a particular column."
]
},
{
"cell_type": "code",
"execution_count": 84,
"id": "e565b2c9-3bca-4187-b66a-82bf455e32af",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" year | \n",
" c_beer | \n",
" beer_tax | \n",
" btax_dollars | \n",
" population | \n",
" salestax | \n",
"
\n",
" \n",
" st_name | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" AL | \n",
" 1970 | \n",
" 33098 | \n",
" 72.341130 | \n",
" 2.370 | \n",
" 3450 | \n",
" 4.0 | \n",
"
\n",
" \n",
" AL | \n",
" 1971 | \n",
" 37598 | \n",
" 69.304600 | \n",
" 2.370 | \n",
" 3497 | \n",
" 4.0 | \n",
"
\n",
" \n",
" AL | \n",
" 1972 | \n",
" 42719 | \n",
" 67.149190 | \n",
" 2.370 | \n",
" 3539 | \n",
" 4.0 | \n",
"
\n",
" \n",
" AL | \n",
" 1973 | \n",
" 46203 | \n",
" 63.217026 | \n",
" 2.370 | \n",
" 3580 | \n",
" 4.0 | \n",
"
\n",
" \n",
" AL | \n",
" 1974 | \n",
" 49769 | \n",
" 56.933796 | \n",
" 2.370 | \n",
" 3627 | \n",
" 4.0 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" WY | \n",
" 1999 | \n",
" 12423 | \n",
" 0.319894 | \n",
" 0.045 | \n",
" 492 | \n",
" 4.0 | \n",
"
\n",
" \n",
" WY | \n",
" 2000 | \n",
" 12595 | \n",
" 0.309491 | \n",
" 0.045 | \n",
" 494 | \n",
" 4.0 | \n",
"
\n",
" \n",
" WY | \n",
" 2001 | \n",
" 12808 | \n",
" 0.300928 | \n",
" 0.045 | \n",
" 494 | \n",
" 4.0 | \n",
"
\n",
" \n",
" WY | \n",
" 2002 | \n",
" 13191 | \n",
" 0.296244 | \n",
" 0.045 | \n",
" 499 | \n",
" 4.0 | \n",
"
\n",
" \n",
" WY | \n",
" 2003 | \n",
" 15535 | \n",
" 0.289643 | \n",
" 0.045 | \n",
" 501 | \n",
" 4.0 | \n",
"
\n",
" \n",
"
\n",
"
1708 rows × 6 columns
\n",
"
"
],
"text/plain": [
" year c_beer beer_tax btax_dollars population salestax\n",
"st_name \n",
"AL 1970 33098 72.341130 2.370 3450 4.0\n",
"AL 1971 37598 69.304600 2.370 3497 4.0\n",
"AL 1972 42719 67.149190 2.370 3539 4.0\n",
"AL 1973 46203 63.217026 2.370 3580 4.0\n",
"AL 1974 49769 56.933796 2.370 3627 4.0\n",
"... ... ... ... ... ... ...\n",
"WY 1999 12423 0.319894 0.045 492 4.0\n",
"WY 2000 12595 0.309491 0.045 494 4.0\n",
"WY 2001 12808 0.300928 0.045 494 4.0\n",
"WY 2002 13191 0.296244 0.045 499 4.0\n",
"WY 2003 15535 0.289643 0.045 501 4.0\n",
"\n",
"[1708 rows x 6 columns]"
]
},
"execution_count": 84,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.set_index('st_name', inplace = True)"
]
},
{
"cell_type": "markdown",
"id": "af0ffc7e-7553-4d60-8e7a-cac7611aed49",
"metadata": {},
"source": [
"`.reset_index()` converts the current index into a column and makes the new index as a default numbering from 0 to the very last row in the DataFrame."
]
},
{
"cell_type": "code",
"execution_count": 85,
"id": "51648c41-a4a3-40bf-80fc-1644ec826798",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" index | \n",
" st_name | \n",
" year | \n",
" c_beer | \n",
" beer_tax | \n",
" btax_dollars | \n",
" population | \n",
" salestax | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 0 | \n",
" AL | \n",
" 1970 | \n",
" 33098 | \n",
" 72.341130 | \n",
" 2.370 | \n",
" 3450 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 1 | \n",
" 1 | \n",
" AL | \n",
" 1971 | \n",
" 37598 | \n",
" 69.304600 | \n",
" 2.370 | \n",
" 3497 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 2 | \n",
" 2 | \n",
" AL | \n",
" 1972 | \n",
" 42719 | \n",
" 67.149190 | \n",
" 2.370 | \n",
" 3539 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 3 | \n",
" 3 | \n",
" AL | \n",
" 1973 | \n",
" 46203 | \n",
" 63.217026 | \n",
" 2.370 | \n",
" 3580 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 4 | \n",
" 4 | \n",
" AL | \n",
" 1974 | \n",
" 49769 | \n",
" 56.933796 | \n",
" 2.370 | \n",
" 3627 | \n",
" 4.0 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 1703 | \n",
" 1703 | \n",
" WY | \n",
" 1999 | \n",
" 12423 | \n",
" 0.319894 | \n",
" 0.045 | \n",
" 492 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 1704 | \n",
" 1704 | \n",
" WY | \n",
" 2000 | \n",
" 12595 | \n",
" 0.309491 | \n",
" 0.045 | \n",
" 494 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 1705 | \n",
" 1705 | \n",
" WY | \n",
" 2001 | \n",
" 12808 | \n",
" 0.300928 | \n",
" 0.045 | \n",
" 494 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 1706 | \n",
" 1706 | \n",
" WY | \n",
" 2002 | \n",
" 13191 | \n",
" 0.296244 | \n",
" 0.045 | \n",
" 499 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 1707 | \n",
" 1707 | \n",
" WY | \n",
" 2003 | \n",
" 15535 | \n",
" 0.289643 | \n",
" 0.045 | \n",
" 501 | \n",
" 4.0 | \n",
"
\n",
" \n",
"
\n",
"
1708 rows × 8 columns
\n",
"
"
],
"text/plain": [
" index st_name year c_beer beer_tax btax_dollars population \\\n",
"0 0 AL 1970 33098 72.341130 2.370 3450 \n",
"1 1 AL 1971 37598 69.304600 2.370 3497 \n",
"2 2 AL 1972 42719 67.149190 2.370 3539 \n",
"3 3 AL 1973 46203 63.217026 2.370 3580 \n",
"4 4 AL 1974 49769 56.933796 2.370 3627 \n",
"... ... ... ... ... ... ... ... \n",
"1703 1703 WY 1999 12423 0.319894 0.045 492 \n",
"1704 1704 WY 2000 12595 0.309491 0.045 494 \n",
"1705 1705 WY 2001 12808 0.300928 0.045 494 \n",
"1706 1706 WY 2002 13191 0.296244 0.045 499 \n",
"1707 1707 WY 2003 15535 0.289643 0.045 501 \n",
"\n",
" salestax \n",
"0 4.0 \n",
"1 4.0 \n",
"2 4.0 \n",
"3 4.0 \n",
"4 4.0 \n",
"... ... \n",
"1703 4.0 \n",
"1704 4.0 \n",
"1705 4.0 \n",
"1706 4.0 \n",
"1707 4.0 \n",
"\n",
"[1708 rows x 8 columns]"
]
},
"execution_count": 85,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.reset_index()"
]
},
{
"cell_type": "markdown",
"id": "6cedb35c-56e5-4eaf-a8dd-52171fbf158b",
"metadata": {},
"source": [
"Finally, you can manually set the index equal to something else by simply reassigning the DataFrame/Series attribute."
]
},
{
"cell_type": "code",
"execution_count": 87,
"id": "ce4a9283-3a05-4e20-a9d2-7144912f12ab",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" st_name | \n",
" year | \n",
" c_beer | \n",
" beer_tax | \n",
" btax_dollars | \n",
" population | \n",
" salestax | \n",
"
\n",
" \n",
" \n",
" \n",
" 1707 | \n",
" AL | \n",
" 1970 | \n",
" 33098 | \n",
" 72.341130 | \n",
" 2.370 | \n",
" 3450 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 1706 | \n",
" AL | \n",
" 1971 | \n",
" 37598 | \n",
" 69.304600 | \n",
" 2.370 | \n",
" 3497 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 1705 | \n",
" AL | \n",
" 1972 | \n",
" 42719 | \n",
" 67.149190 | \n",
" 2.370 | \n",
" 3539 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 1704 | \n",
" AL | \n",
" 1973 | \n",
" 46203 | \n",
" 63.217026 | \n",
" 2.370 | \n",
" 3580 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 1703 | \n",
" AL | \n",
" 1974 | \n",
" 49769 | \n",
" 56.933796 | \n",
" 2.370 | \n",
" 3627 | \n",
" 4.0 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 4 | \n",
" WY | \n",
" 1999 | \n",
" 12423 | \n",
" 0.319894 | \n",
" 0.045 | \n",
" 492 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 3 | \n",
" WY | \n",
" 2000 | \n",
" 12595 | \n",
" 0.309491 | \n",
" 0.045 | \n",
" 494 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 2 | \n",
" WY | \n",
" 2001 | \n",
" 12808 | \n",
" 0.300928 | \n",
" 0.045 | \n",
" 494 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 1 | \n",
" WY | \n",
" 2002 | \n",
" 13191 | \n",
" 0.296244 | \n",
" 0.045 | \n",
" 499 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 0 | \n",
" WY | \n",
" 2003 | \n",
" 15535 | \n",
" 0.289643 | \n",
" 0.045 | \n",
" 501 | \n",
" 4.0 | \n",
"
\n",
" \n",
"
\n",
"
1708 rows × 7 columns
\n",
"
"
],
"text/plain": [
" st_name year c_beer beer_tax btax_dollars population salestax\n",
"1707 AL 1970 33098 72.341130 2.370 3450 4.0\n",
"1706 AL 1971 37598 69.304600 2.370 3497 4.0\n",
"1705 AL 1972 42719 67.149190 2.370 3539 4.0\n",
"1704 AL 1973 46203 63.217026 2.370 3580 4.0\n",
"1703 AL 1974 49769 56.933796 2.370 3627 4.0\n",
"... ... ... ... ... ... ... ...\n",
"4 WY 1999 12423 0.319894 0.045 492 4.0\n",
"3 WY 2000 12595 0.309491 0.045 494 4.0\n",
"2 WY 2001 12808 0.300928 0.045 494 4.0\n",
"1 WY 2002 13191 0.296244 0.045 499 4.0\n",
"0 WY 2003 15535 0.289643 0.045 501 4.0\n",
"\n",
"[1708 rows x 7 columns]"
]
},
"execution_count": 87,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.index = np.arange(len(df)-1, -1, -1)\n",
"df"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [conda env:sklearn-env]",
"language": "python",
"name": "conda-env-sklearn-env-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.0"
}
},
"nbformat": 4,
"nbformat_minor": 5
}