{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "source": [
    "import pandas as pd"
   ],
   "outputs": [],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "# Loading in and Looking at the Data"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "Throughout this chapter, we will use `pandas` to explore a dataset on beer sales across 50 continental states in the US. \n",
    "Above, we have already imported in `pandas` as `pd`, a common nickname for `pandas` (similar to how `numpy` is often abbreviated to `np`).\n",
    "\n",
    "The dataset we're using is sourced from [_Salience and Taxation: Theory and Evidence_](https://www.aeaweb.org/articles?id=10.1257/aer.99.4.1145) by Chetty, Looney, and Kroft (AER 2009). The original data can be found at the paper's [replication package](https://www.openicpsr.org/openicpsr/project/113312/version/V1/view), and we have modified it for educational use. The dataset includes 7 columns:\n",
    "- `st_name`: the state abbreviation\n",
    "- `year`: the year the data was recorded\n",
    "- `c_beer`: the quantity of beer consumed, in thousands of gallons\n",
    "- `beer_tax`: the ad valorem tax, as a percentage\n",
    "- `btax_dollars`: the excise tax, represented in dollars per case (24 cans) of beer \n",
    "- `population`: the population of the state, in thousands\n",
    "- `salestax`: the sales tax percentage\n",
    "\n",
    "In this subchapter, we load in the dataset and learn about different ways of viewing the data. Let us begin by loading in the dataset. "
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "## Loading in the dataset\n",
    "\n",
    "Right now, we have the dataset stored as a [comma-separated values (csv) file](https://www.lifewire.com/csv-file-2622708), in the location `data/beer_tax.csv`. We can load it in using `panda`'s [`read_csv` method](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html). Below, we load it in and set it equal to the variable `df` (short for DataFrame)."
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "source": [
    "df = pd.read_csv('data/beer_tax.csv')\n",
    "df"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>st_name</th>\n",
       "      <th>year</th>\n",
       "      <th>c_beer</th>\n",
       "      <th>beer_tax</th>\n",
       "      <th>btax_dollars</th>\n",
       "      <th>population</th>\n",
       "      <th>salestax</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>AL</td>\n",
       "      <td>1970</td>\n",
       "      <td>33098</td>\n",
       "      <td>72.341130</td>\n",
       "      <td>2.370</td>\n",
       "      <td>3450</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>AL</td>\n",
       "      <td>1971</td>\n",
       "      <td>37598</td>\n",
       "      <td>69.304600</td>\n",
       "      <td>2.370</td>\n",
       "      <td>3497</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>AL</td>\n",
       "      <td>1972</td>\n",
       "      <td>42719</td>\n",
       "      <td>67.149190</td>\n",
       "      <td>2.370</td>\n",
       "      <td>3539</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>AL</td>\n",
       "      <td>1973</td>\n",
       "      <td>46203</td>\n",
       "      <td>63.217026</td>\n",
       "      <td>2.370</td>\n",
       "      <td>3580</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>AL</td>\n",
       "      <td>1974</td>\n",
       "      <td>49769</td>\n",
       "      <td>56.933796</td>\n",
       "      <td>2.370</td>\n",
       "      <td>3627</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1703</th>\n",
       "      <td>WY</td>\n",
       "      <td>1999</td>\n",
       "      <td>12423</td>\n",
       "      <td>0.319894</td>\n",
       "      <td>0.045</td>\n",
       "      <td>492</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1704</th>\n",
       "      <td>WY</td>\n",
       "      <td>2000</td>\n",
       "      <td>12595</td>\n",
       "      <td>0.309491</td>\n",
       "      <td>0.045</td>\n",
       "      <td>494</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1705</th>\n",
       "      <td>WY</td>\n",
       "      <td>2001</td>\n",
       "      <td>12808</td>\n",
       "      <td>0.300928</td>\n",
       "      <td>0.045</td>\n",
       "      <td>494</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1706</th>\n",
       "      <td>WY</td>\n",
       "      <td>2002</td>\n",
       "      <td>13191</td>\n",
       "      <td>0.296244</td>\n",
       "      <td>0.045</td>\n",
       "      <td>499</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1707</th>\n",
       "      <td>WY</td>\n",
       "      <td>2003</td>\n",
       "      <td>15535</td>\n",
       "      <td>0.289643</td>\n",
       "      <td>0.045</td>\n",
       "      <td>501</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>1708 rows × 7 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "     st_name  year  c_beer   beer_tax  btax_dollars  population  salestax\n",
       "0         AL  1970   33098  72.341130         2.370        3450       4.0\n",
       "1         AL  1971   37598  69.304600         2.370        3497       4.0\n",
       "2         AL  1972   42719  67.149190         2.370        3539       4.0\n",
       "3         AL  1973   46203  63.217026         2.370        3580       4.0\n",
       "4         AL  1974   49769  56.933796         2.370        3627       4.0\n",
       "...      ...   ...     ...        ...           ...         ...       ...\n",
       "1703      WY  1999   12423   0.319894         0.045         492       4.0\n",
       "1704      WY  2000   12595   0.309491         0.045         494       4.0\n",
       "1705      WY  2001   12808   0.300928         0.045         494       4.0\n",
       "1706      WY  2002   13191   0.296244         0.045         499       4.0\n",
       "1707      WY  2003   15535   0.289643         0.045         501       4.0\n",
       "\n",
       "[1708 rows x 7 columns]"
      ]
     },
     "metadata": {},
     "execution_count": 4
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "Whenever we use a dataset, we ought to think about where it came from, how it was generated and what it represents. Doing so is absolutely crucial for good data science. Doing so will help us keep our eyes open for potential sources of bias in the data, avoiding unfortunate mistakes. Additionally, if we do not think about where we source our data from, we cannot truly consider the ethical implications of our analysis.\n",
    "\n",
    "Three important metrics you ought to consider with datasets are granularity, scope and temporality;\n",
    "- Granularity: the granularity of a dataset can best be summarized as what each individual row in the dataset represents. A dataset with finer granularity than another dataset has more detailed records. For example, a dataset where each row represents a residential neighborhood has coarser (the opposite of finer) granularity than a dataset where each row represents a house in a residential neighborhood.\n",
    "- Scope: the scope of the dataset is the subset of the population of interest that is covered by the data. For example, if you're interested in alcohol information for all US states from 1970 until the present, the beer dataset we're using has a scope of just beer data from 1970-2003.\n",
    "- Temporality: the temporality of the dataset describes the time element of the dataset. We encourage you to consider which time period the data is measuring, when the data was originally measured and when it was most recently updated. For example, our beer dataset spans from 1970-2003 and was last updated presumably sometime shortly before 2009 (when the paper was published), but we would need to read the paper and it's appendix to determine when the data was first measured.\n",
    "\n",
    "Finally, real world datasets are often messy. They have inaccuracies and mistakes. Whenever you obtain a dataset, the one of the first steps you ought to do is to see how 'faithful' the data is to the values it claims to describe. To do so, you should check for clearly incorrect values (e.g. a negative age value), possible typos from data entry, signs of data falsification (e.g. incorrect email addresses), unspecified or incorrect units, etc."
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "## DataFrames, Series and Indices"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "In `pandas`, all tables are stored as data structures called as `DataFrames`. Each DataFrame has an index (the bolded column on the left) and columns (called as `Series`). \n",
    "\n",
    "While the index in our DataFrame ranges from 0 to 1707, this does not have to be the case. In an index, indices don't have to be numbers and they don't have to be unique. For example, let us create a DataFrame `df2` with an index that contains neither numbers nor unique values. Instead, the index is simply `a` for the first 854 rows and `b` for the last 854 rows.\n"
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "source": [
    "df2 = df.copy() # Making a copy of the DataFrame df\n",
    "df2.index = ['a' for _ in range(int(len(df)/2))] + ['b' for _ in range(int(len(df)/2))]\n",
    "df2 # The index is neither numerical nor unique"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>st_name</th>\n",
       "      <th>year</th>\n",
       "      <th>c_beer</th>\n",
       "      <th>beer_tax</th>\n",
       "      <th>btax_dollars</th>\n",
       "      <th>population</th>\n",
       "      <th>salestax</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>AL</td>\n",
       "      <td>1970</td>\n",
       "      <td>33098</td>\n",
       "      <td>72.341130</td>\n",
       "      <td>2.370</td>\n",
       "      <td>3450</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>AL</td>\n",
       "      <td>1971</td>\n",
       "      <td>37598</td>\n",
       "      <td>69.304600</td>\n",
       "      <td>2.370</td>\n",
       "      <td>3497</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>AL</td>\n",
       "      <td>1972</td>\n",
       "      <td>42719</td>\n",
       "      <td>67.149190</td>\n",
       "      <td>2.370</td>\n",
       "      <td>3539</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>AL</td>\n",
       "      <td>1973</td>\n",
       "      <td>46203</td>\n",
       "      <td>63.217026</td>\n",
       "      <td>2.370</td>\n",
       "      <td>3580</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>AL</td>\n",
       "      <td>1974</td>\n",
       "      <td>49769</td>\n",
       "      <td>56.933796</td>\n",
       "      <td>2.370</td>\n",
       "      <td>3627</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>WY</td>\n",
       "      <td>1999</td>\n",
       "      <td>12423</td>\n",
       "      <td>0.319894</td>\n",
       "      <td>0.045</td>\n",
       "      <td>492</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>WY</td>\n",
       "      <td>2000</td>\n",
       "      <td>12595</td>\n",
       "      <td>0.309491</td>\n",
       "      <td>0.045</td>\n",
       "      <td>494</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>WY</td>\n",
       "      <td>2001</td>\n",
       "      <td>12808</td>\n",
       "      <td>0.300928</td>\n",
       "      <td>0.045</td>\n",
       "      <td>494</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>WY</td>\n",
       "      <td>2002</td>\n",
       "      <td>13191</td>\n",
       "      <td>0.296244</td>\n",
       "      <td>0.045</td>\n",
       "      <td>499</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>WY</td>\n",
       "      <td>2003</td>\n",
       "      <td>15535</td>\n",
       "      <td>0.289643</td>\n",
       "      <td>0.045</td>\n",
       "      <td>501</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>1708 rows × 7 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "   st_name  year  c_beer   beer_tax  btax_dollars  population  salestax\n",
       "a       AL  1970   33098  72.341130         2.370        3450       4.0\n",
       "a       AL  1971   37598  69.304600         2.370        3497       4.0\n",
       "a       AL  1972   42719  67.149190         2.370        3539       4.0\n",
       "a       AL  1973   46203  63.217026         2.370        3580       4.0\n",
       "a       AL  1974   49769  56.933796         2.370        3627       4.0\n",
       "..     ...   ...     ...        ...           ...         ...       ...\n",
       "b       WY  1999   12423   0.319894         0.045         492       4.0\n",
       "b       WY  2000   12595   0.309491         0.045         494       4.0\n",
       "b       WY  2001   12808   0.300928         0.045         494       4.0\n",
       "b       WY  2002   13191   0.296244         0.045         499       4.0\n",
       "b       WY  2003   15535   0.289643         0.045         501       4.0\n",
       "\n",
       "[1708 rows x 7 columns]"
      ]
     },
     "metadata": {},
     "execution_count": 34
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "\n",
    "Each Series is simply a 1D ordering of data, similar to a NumPy array. For example, we pull out the Series of `btax_dollars` in `df` in the cell below."
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "source": [
    "df['btax_dollars']"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "0       2.370\n",
       "1       2.370\n",
       "2       2.370\n",
       "3       2.370\n",
       "4       2.370\n",
       "        ...  \n",
       "1703    0.045\n",
       "1704    0.045\n",
       "1705    0.045\n",
       "1706    0.045\n",
       "1707    0.045\n",
       "Name: btax_dollars, Length: 1708, dtype: float64"
      ]
     },
     "metadata": {},
     "execution_count": 5
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "As you can see above, to pull out a singular Series from a DataFrame, we simply need to follow the notation `DataFrame_Name[Series_Name]`.\n",
    "\n",
    "However, if we're instead interested in pulling out multiple Series from our DataFrame, we're essentially interested in pulling out another, smaller DataFrame from the first DataFrame (since multiple Series = DataFrame). So, instead of passing in a single Series name into the notation `df[...]`, we instead pass in a list containing the names of the Series we're interested in. An example is below."
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "source": [
    "df[['st_name','year','btax_dollars']]"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>st_name</th>\n",
       "      <th>year</th>\n",
       "      <th>btax_dollars</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>AL</td>\n",
       "      <td>1970</td>\n",
       "      <td>2.370</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>AL</td>\n",
       "      <td>1971</td>\n",
       "      <td>2.370</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>AL</td>\n",
       "      <td>1972</td>\n",
       "      <td>2.370</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>AL</td>\n",
       "      <td>1973</td>\n",
       "      <td>2.370</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>AL</td>\n",
       "      <td>1974</td>\n",
       "      <td>2.370</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1703</th>\n",
       "      <td>WY</td>\n",
       "      <td>1999</td>\n",
       "      <td>0.045</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1704</th>\n",
       "      <td>WY</td>\n",
       "      <td>2000</td>\n",
       "      <td>0.045</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1705</th>\n",
       "      <td>WY</td>\n",
       "      <td>2001</td>\n",
       "      <td>0.045</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1706</th>\n",
       "      <td>WY</td>\n",
       "      <td>2002</td>\n",
       "      <td>0.045</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1707</th>\n",
       "      <td>WY</td>\n",
       "      <td>2003</td>\n",
       "      <td>0.045</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>1708 rows × 3 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "     st_name  year  btax_dollars\n",
       "0         AL  1970         2.370\n",
       "1         AL  1971         2.370\n",
       "2         AL  1972         2.370\n",
       "3         AL  1973         2.370\n",
       "4         AL  1974         2.370\n",
       "...      ...   ...           ...\n",
       "1703      WY  1999         0.045\n",
       "1704      WY  2000         0.045\n",
       "1705      WY  2001         0.045\n",
       "1706      WY  2002         0.045\n",
       "1707      WY  2003         0.045\n",
       "\n",
       "[1708 rows x 3 columns]"
      ]
     },
     "metadata": {},
     "execution_count": 7
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "### Series Attributes"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "When we look at a Series, we get the index on the left and the corresponding values on the right. You can view the index via the attribute `index` and the values via the attribute `values`. Examples are shown below."
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "source": [
    "df['btax_dollars'].index \n",
    "# How can you interpret the index here? Hint: look at the prerequiste chapter on NumPy."
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "RangeIndex(start=0, stop=1708, step=1)"
      ]
     },
     "metadata": {},
     "execution_count": 8
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "source": [
    "df['btax_dollars'].values"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "array([2.37 , 2.37 , 2.37 , ..., 0.045, 0.045, 0.045])"
      ]
     },
     "metadata": {},
     "execution_count": 9
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "### DataFrame Attributes"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "Three important DataFrame attributes are `index`, `columns`, and `shape`. `index` simply returns the index of the dataset, as shown below."
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "source": [
    "df.index"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "RangeIndex(start=0, stop=1708, step=1)"
      ]
     },
     "metadata": {},
     "execution_count": 10
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "`columns` returns the name of the columns/Series that make up the dataset. Note that each column/Series name must be unique."
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "source": [
    "df.columns"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "Index(['st_name', 'year', 'c_beer', 'beer_tax', 'btax_dollars', 'population',\n",
       "       'salestax'],\n",
       "      dtype='object')"
      ]
     },
     "metadata": {},
     "execution_count": 11
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "`shape` returns the dimensions, or the number of rows and the number of columns (in that order), of the dataset."
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "source": [
    "df.shape"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "(1708, 7)"
      ]
     },
     "metadata": {},
     "execution_count": 12
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "Moreover, `size` is another attributes that returns the number of rows times the number of columns, or the total number of datapoints, in the dataset."
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "source": [
    "df.size"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "11956"
      ]
     },
     "metadata": {},
     "execution_count": 49
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "Finally, `dtypes` returns the data type of each column in the DataFrame."
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "source": [
    "df.dtypes"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "st_name          object\n",
       "year              int64\n",
       "c_beer            int64\n",
       "beer_tax        float64\n",
       "btax_dollars    float64\n",
       "population        int64\n",
       "salestax        float64\n",
       "dtype: object"
      ]
     },
     "metadata": {},
     "execution_count": 50
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "Note: the data type for the `st_name` column is `object` even though the column contains strings. Click [here](https://stackoverflow.com/a/21020411) to learn why this happens."
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "## Extracting Data using Labels and Indices"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "There are four primary methods of looking at extracting specific rows/columns from DataFrames. They are :-\n",
    "- `.head` and `.tail` \n",
    "- `.loc` \n",
    "- `.iloc`\n",
    "- `[]`"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "### 1. `.head` and `.tail`"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "If we just want to select the first or last n rows from a dataset, we can use `DataFrame_Name.head(n)` (for the first n rows) or `DataFrame_Name.tail(n)` (for the last n rows)."
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "source": [
    "df.head(5) # Extract the first 5 rows"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>st_name</th>\n",
       "      <th>year</th>\n",
       "      <th>c_beer</th>\n",
       "      <th>beer_tax</th>\n",
       "      <th>btax_dollars</th>\n",
       "      <th>population</th>\n",
       "      <th>salestax</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>AL</td>\n",
       "      <td>1970</td>\n",
       "      <td>33098</td>\n",
       "      <td>72.341130</td>\n",
       "      <td>2.37</td>\n",
       "      <td>3450</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>AL</td>\n",
       "      <td>1971</td>\n",
       "      <td>37598</td>\n",
       "      <td>69.304600</td>\n",
       "      <td>2.37</td>\n",
       "      <td>3497</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>AL</td>\n",
       "      <td>1972</td>\n",
       "      <td>42719</td>\n",
       "      <td>67.149190</td>\n",
       "      <td>2.37</td>\n",
       "      <td>3539</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>AL</td>\n",
       "      <td>1973</td>\n",
       "      <td>46203</td>\n",
       "      <td>63.217026</td>\n",
       "      <td>2.37</td>\n",
       "      <td>3580</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>AL</td>\n",
       "      <td>1974</td>\n",
       "      <td>49769</td>\n",
       "      <td>56.933796</td>\n",
       "      <td>2.37</td>\n",
       "      <td>3627</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  st_name  year  c_beer   beer_tax  btax_dollars  population  salestax\n",
       "0      AL  1970   33098  72.341130          2.37        3450       4.0\n",
       "1      AL  1971   37598  69.304600          2.37        3497       4.0\n",
       "2      AL  1972   42719  67.149190          2.37        3539       4.0\n",
       "3      AL  1973   46203  63.217026          2.37        3580       4.0\n",
       "4      AL  1974   49769  56.933796          2.37        3627       4.0"
      ]
     },
     "metadata": {},
     "execution_count": 13
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "source": [
    "df.tail(5) # Extract the last 5 rows"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>st_name</th>\n",
       "      <th>year</th>\n",
       "      <th>c_beer</th>\n",
       "      <th>beer_tax</th>\n",
       "      <th>btax_dollars</th>\n",
       "      <th>population</th>\n",
       "      <th>salestax</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1703</th>\n",
       "      <td>WY</td>\n",
       "      <td>1999</td>\n",
       "      <td>12423</td>\n",
       "      <td>0.319894</td>\n",
       "      <td>0.045</td>\n",
       "      <td>492</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1704</th>\n",
       "      <td>WY</td>\n",
       "      <td>2000</td>\n",
       "      <td>12595</td>\n",
       "      <td>0.309491</td>\n",
       "      <td>0.045</td>\n",
       "      <td>494</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1705</th>\n",
       "      <td>WY</td>\n",
       "      <td>2001</td>\n",
       "      <td>12808</td>\n",
       "      <td>0.300928</td>\n",
       "      <td>0.045</td>\n",
       "      <td>494</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1706</th>\n",
       "      <td>WY</td>\n",
       "      <td>2002</td>\n",
       "      <td>13191</td>\n",
       "      <td>0.296244</td>\n",
       "      <td>0.045</td>\n",
       "      <td>499</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1707</th>\n",
       "      <td>WY</td>\n",
       "      <td>2003</td>\n",
       "      <td>15535</td>\n",
       "      <td>0.289643</td>\n",
       "      <td>0.045</td>\n",
       "      <td>501</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     st_name  year  c_beer  beer_tax  btax_dollars  population  salestax\n",
       "1703      WY  1999   12423  0.319894         0.045         492       4.0\n",
       "1704      WY  2000   12595  0.309491         0.045         494       4.0\n",
       "1705      WY  2001   12808  0.300928         0.045         494       4.0\n",
       "1706      WY  2002   13191  0.296244         0.045         499       4.0\n",
       "1707      WY  2003   15535  0.289643         0.045         501       4.0"
      ]
     },
     "metadata": {},
     "execution_count": 14
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "### 2. `.loc`"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "To better illustrate the difference between `.loc` and `iloc`, we will use the DataFrame `df2`, as created earlier. We've reposted it here for ease of access."
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "source": [
    "df2"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>st_name</th>\n",
       "      <th>year</th>\n",
       "      <th>c_beer</th>\n",
       "      <th>beer_tax</th>\n",
       "      <th>btax_dollars</th>\n",
       "      <th>population</th>\n",
       "      <th>salestax</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>AL</td>\n",
       "      <td>1970</td>\n",
       "      <td>33098</td>\n",
       "      <td>72.341130</td>\n",
       "      <td>2.370</td>\n",
       "      <td>3450</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>AL</td>\n",
       "      <td>1971</td>\n",
       "      <td>37598</td>\n",
       "      <td>69.304600</td>\n",
       "      <td>2.370</td>\n",
       "      <td>3497</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>AL</td>\n",
       "      <td>1972</td>\n",
       "      <td>42719</td>\n",
       "      <td>67.149190</td>\n",
       "      <td>2.370</td>\n",
       "      <td>3539</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>AL</td>\n",
       "      <td>1973</td>\n",
       "      <td>46203</td>\n",
       "      <td>63.217026</td>\n",
       "      <td>2.370</td>\n",
       "      <td>3580</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>AL</td>\n",
       "      <td>1974</td>\n",
       "      <td>49769</td>\n",
       "      <td>56.933796</td>\n",
       "      <td>2.370</td>\n",
       "      <td>3627</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>WY</td>\n",
       "      <td>1999</td>\n",
       "      <td>12423</td>\n",
       "      <td>0.319894</td>\n",
       "      <td>0.045</td>\n",
       "      <td>492</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>WY</td>\n",
       "      <td>2000</td>\n",
       "      <td>12595</td>\n",
       "      <td>0.309491</td>\n",
       "      <td>0.045</td>\n",
       "      <td>494</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>WY</td>\n",
       "      <td>2001</td>\n",
       "      <td>12808</td>\n",
       "      <td>0.300928</td>\n",
       "      <td>0.045</td>\n",
       "      <td>494</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>WY</td>\n",
       "      <td>2002</td>\n",
       "      <td>13191</td>\n",
       "      <td>0.296244</td>\n",
       "      <td>0.045</td>\n",
       "      <td>499</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>WY</td>\n",
       "      <td>2003</td>\n",
       "      <td>15535</td>\n",
       "      <td>0.289643</td>\n",
       "      <td>0.045</td>\n",
       "      <td>501</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>1708 rows × 7 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "   st_name  year  c_beer   beer_tax  btax_dollars  population  salestax\n",
       "a       AL  1970   33098  72.341130         2.370        3450       4.0\n",
       "a       AL  1971   37598  69.304600         2.370        3497       4.0\n",
       "a       AL  1972   42719  67.149190         2.370        3539       4.0\n",
       "a       AL  1973   46203  63.217026         2.370        3580       4.0\n",
       "a       AL  1974   49769  56.933796         2.370        3627       4.0\n",
       "..     ...   ...     ...        ...           ...         ...       ...\n",
       "b       WY  1999   12423   0.319894         0.045         492       4.0\n",
       "b       WY  2000   12595   0.309491         0.045         494       4.0\n",
       "b       WY  2001   12808   0.300928         0.045         494       4.0\n",
       "b       WY  2002   13191   0.296244         0.045         499       4.0\n",
       "b       WY  2003   15535   0.289643         0.045         501       4.0\n",
       "\n",
       "[1708 rows x 7 columns]"
      ]
     },
     "metadata": {},
     "execution_count": 35
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "`.loc` helps us get values based on the index/column labels associated with them. For example, let's say we're interested in all the `btax_dollars` values associated with an index label of `a`. We can use `.loc` to get all such values, as shown below."
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "source": [
    "df2.loc['a','btax_dollars'] "
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "a    2.370\n",
       "a    2.370\n",
       "a    2.370\n",
       "a    2.370\n",
       "a    2.370\n",
       "     ...  \n",
       "a    0.135\n",
       "a    0.135\n",
       "a    0.135\n",
       "a    0.135\n",
       "a    0.135\n",
       "Name: btax_dollars, Length: 854, dtype: float64"
      ]
     },
     "metadata": {},
     "execution_count": 36
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "Above, we first passed in the row labels (or index labels) we're interested in, followed by the column labels we're interested in. If we're interested in multiple row/column labels, we must pass in a list of the labels we're interested in. An example is shown below."
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "source": [
    "df2.loc['a',['st_name','year','btax_dollars']]"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>st_name</th>\n",
       "      <th>year</th>\n",
       "      <th>btax_dollars</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>AL</td>\n",
       "      <td>1970</td>\n",
       "      <td>2.370</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>AL</td>\n",
       "      <td>1971</td>\n",
       "      <td>2.370</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>AL</td>\n",
       "      <td>1972</td>\n",
       "      <td>2.370</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>AL</td>\n",
       "      <td>1973</td>\n",
       "      <td>2.370</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>AL</td>\n",
       "      <td>1974</td>\n",
       "      <td>2.370</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>MO</td>\n",
       "      <td>1993</td>\n",
       "      <td>0.135</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>MO</td>\n",
       "      <td>1994</td>\n",
       "      <td>0.135</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>MO</td>\n",
       "      <td>1995</td>\n",
       "      <td>0.135</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>MO</td>\n",
       "      <td>1996</td>\n",
       "      <td>0.135</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>MO</td>\n",
       "      <td>1997</td>\n",
       "      <td>0.135</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>854 rows × 3 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "   st_name  year  btax_dollars\n",
       "a       AL  1970         2.370\n",
       "a       AL  1971         2.370\n",
       "a       AL  1972         2.370\n",
       "a       AL  1973         2.370\n",
       "a       AL  1974         2.370\n",
       "..     ...   ...           ...\n",
       "a       MO  1993         0.135\n",
       "a       MO  1994         0.135\n",
       "a       MO  1995         0.135\n",
       "a       MO  1996         0.135\n",
       "a       MO  1997         0.135\n",
       "\n",
       "[854 rows x 3 columns]"
      ]
     },
     "metadata": {},
     "execution_count": 38
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "We can also use a slice (`:`) to specify that we're interested in all the labels."
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "source": [
    "df2.loc[:,['st_name','year','btax_dollars']]"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>st_name</th>\n",
       "      <th>year</th>\n",
       "      <th>btax_dollars</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>AL</td>\n",
       "      <td>1970</td>\n",
       "      <td>2.370</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>AL</td>\n",
       "      <td>1971</td>\n",
       "      <td>2.370</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>AL</td>\n",
       "      <td>1972</td>\n",
       "      <td>2.370</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>AL</td>\n",
       "      <td>1973</td>\n",
       "      <td>2.370</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>AL</td>\n",
       "      <td>1974</td>\n",
       "      <td>2.370</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>WY</td>\n",
       "      <td>1999</td>\n",
       "      <td>0.045</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>WY</td>\n",
       "      <td>2000</td>\n",
       "      <td>0.045</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>WY</td>\n",
       "      <td>2001</td>\n",
       "      <td>0.045</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>WY</td>\n",
       "      <td>2002</td>\n",
       "      <td>0.045</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>WY</td>\n",
       "      <td>2003</td>\n",
       "      <td>0.045</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>1708 rows × 3 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "   st_name  year  btax_dollars\n",
       "a       AL  1970         2.370\n",
       "a       AL  1971         2.370\n",
       "a       AL  1972         2.370\n",
       "a       AL  1973         2.370\n",
       "a       AL  1974         2.370\n",
       "..     ...   ...           ...\n",
       "b       WY  1999         0.045\n",
       "b       WY  2000         0.045\n",
       "b       WY  2001         0.045\n",
       "b       WY  2002         0.045\n",
       "b       WY  2003         0.045\n",
       "\n",
       "[1708 rows x 3 columns]"
      ]
     },
     "metadata": {},
     "execution_count": 41
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "### 3. `.iloc`"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "`.iloc` works similarly to `.loc`, but it looks at the integers associated with each label rather than the label itself. For example, let's extract all the `btax_dollars` values associated with an index label of `a` using `.iloc`."
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "source": [
    "df2.iloc[0:854,4]"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "a    2.370\n",
       "a    2.370\n",
       "a    2.370\n",
       "a    2.370\n",
       "a    2.370\n",
       "     ...  \n",
       "a    0.135\n",
       "a    0.135\n",
       "a    0.135\n",
       "a    0.135\n",
       "a    0.135\n",
       "Name: btax_dollars, Length: 854, dtype: float64"
      ]
     },
     "metadata": {},
     "execution_count": 42
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "Here, we extracted the same Series as with `.loc`, but needed to specify the index positions of the rows/columns we were interested in rather than their labels. We passed in `0:854` to indicate that we were interested in the first 854 rows, and `4` to indicate that we were interested in the fifth column.\n",
    "\n",
    "Similar to `.loc`, we can pass in single values, lists or slices to the arguments of `.iloc`. However, everything we pass in must be an integer (as we're looking at the index positions). Another example is below."
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "source": [
    "df2.iloc[0:854,[0, 1, 4]]"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>st_name</th>\n",
       "      <th>year</th>\n",
       "      <th>btax_dollars</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>AL</td>\n",
       "      <td>1970</td>\n",
       "      <td>2.370</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>AL</td>\n",
       "      <td>1971</td>\n",
       "      <td>2.370</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>AL</td>\n",
       "      <td>1972</td>\n",
       "      <td>2.370</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>AL</td>\n",
       "      <td>1973</td>\n",
       "      <td>2.370</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>AL</td>\n",
       "      <td>1974</td>\n",
       "      <td>2.370</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>MO</td>\n",
       "      <td>1993</td>\n",
       "      <td>0.135</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>MO</td>\n",
       "      <td>1994</td>\n",
       "      <td>0.135</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>MO</td>\n",
       "      <td>1995</td>\n",
       "      <td>0.135</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>MO</td>\n",
       "      <td>1996</td>\n",
       "      <td>0.135</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>MO</td>\n",
       "      <td>1997</td>\n",
       "      <td>0.135</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>854 rows × 3 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "   st_name  year  btax_dollars\n",
       "a       AL  1970         2.370\n",
       "a       AL  1971         2.370\n",
       "a       AL  1972         2.370\n",
       "a       AL  1973         2.370\n",
       "a       AL  1974         2.370\n",
       "..     ...   ...           ...\n",
       "a       MO  1993         0.135\n",
       "a       MO  1994         0.135\n",
       "a       MO  1995         0.135\n",
       "a       MO  1996         0.135\n",
       "a       MO  1997         0.135\n",
       "\n",
       "[854 rows x 3 columns]"
      ]
     },
     "metadata": {},
     "execution_count": 43
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "### 4. `[]`"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "The `[]` is the hardest to describe, but the easiest to write and for many, the easiest to use. The `[]` simply follows the notation `DataFrame_Name[...]`, where `...` can be a single column label (similar to `.loc`), a list of column labels (similar to `.loc`) or a slice of row index positions (similar to `.iloc`). We've included examples for all three below. Do not worry if this seems confusing at first, feel free to use one of the first three methods instead."
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "#### Single Column Label"
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "source": [
    "df2['btax_dollars']"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "a    2.370\n",
       "a    2.370\n",
       "a    2.370\n",
       "a    2.370\n",
       "a    2.370\n",
       "     ...  \n",
       "b    0.045\n",
       "b    0.045\n",
       "b    0.045\n",
       "b    0.045\n",
       "b    0.045\n",
       "Name: btax_dollars, Length: 1708, dtype: float64"
      ]
     },
     "metadata": {},
     "execution_count": 46
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "#### List of Column Labels"
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "source": [
    "df2[['st_name','year','btax_dollars']]"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>st_name</th>\n",
       "      <th>year</th>\n",
       "      <th>btax_dollars</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>AL</td>\n",
       "      <td>1970</td>\n",
       "      <td>2.370</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>AL</td>\n",
       "      <td>1971</td>\n",
       "      <td>2.370</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>AL</td>\n",
       "      <td>1972</td>\n",
       "      <td>2.370</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>AL</td>\n",
       "      <td>1973</td>\n",
       "      <td>2.370</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>AL</td>\n",
       "      <td>1974</td>\n",
       "      <td>2.370</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>WY</td>\n",
       "      <td>1999</td>\n",
       "      <td>0.045</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>WY</td>\n",
       "      <td>2000</td>\n",
       "      <td>0.045</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>WY</td>\n",
       "      <td>2001</td>\n",
       "      <td>0.045</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>WY</td>\n",
       "      <td>2002</td>\n",
       "      <td>0.045</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>WY</td>\n",
       "      <td>2003</td>\n",
       "      <td>0.045</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>1708 rows × 3 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "   st_name  year  btax_dollars\n",
       "a       AL  1970         2.370\n",
       "a       AL  1971         2.370\n",
       "a       AL  1972         2.370\n",
       "a       AL  1973         2.370\n",
       "a       AL  1974         2.370\n",
       "..     ...   ...           ...\n",
       "b       WY  1999         0.045\n",
       "b       WY  2000         0.045\n",
       "b       WY  2001         0.045\n",
       "b       WY  2002         0.045\n",
       "b       WY  2003         0.045\n",
       "\n",
       "[1708 rows x 3 columns]"
      ]
     },
     "metadata": {},
     "execution_count": 47
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "#### Slice of Row Index Positions"
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "source": [
    "df2[0:5]"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>st_name</th>\n",
       "      <th>year</th>\n",
       "      <th>c_beer</th>\n",
       "      <th>beer_tax</th>\n",
       "      <th>btax_dollars</th>\n",
       "      <th>population</th>\n",
       "      <th>salestax</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>AL</td>\n",
       "      <td>1970</td>\n",
       "      <td>33098</td>\n",
       "      <td>72.341130</td>\n",
       "      <td>2.37</td>\n",
       "      <td>3450</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>AL</td>\n",
       "      <td>1971</td>\n",
       "      <td>37598</td>\n",
       "      <td>69.304600</td>\n",
       "      <td>2.37</td>\n",
       "      <td>3497</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>AL</td>\n",
       "      <td>1972</td>\n",
       "      <td>42719</td>\n",
       "      <td>67.149190</td>\n",
       "      <td>2.37</td>\n",
       "      <td>3539</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>AL</td>\n",
       "      <td>1973</td>\n",
       "      <td>46203</td>\n",
       "      <td>63.217026</td>\n",
       "      <td>2.37</td>\n",
       "      <td>3580</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>AL</td>\n",
       "      <td>1974</td>\n",
       "      <td>49769</td>\n",
       "      <td>56.933796</td>\n",
       "      <td>2.37</td>\n",
       "      <td>3627</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  st_name  year  c_beer   beer_tax  btax_dollars  population  salestax\n",
       "a      AL  1970   33098  72.341130          2.37        3450       4.0\n",
       "a      AL  1971   37598  69.304600          2.37        3497       4.0\n",
       "a      AL  1972   42719  67.149190          2.37        3539       4.0\n",
       "a      AL  1973   46203  63.217026          2.37        3580       4.0\n",
       "a      AL  1974   49769  56.933796          2.37        3627       4.0"
      ]
     },
     "metadata": {},
     "execution_count": 48
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "In the next subchapter, we will talk about conditional selection, which allows us to view all rows that satisfy a certain condition."
   ],
   "metadata": {}
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [conda env:sklearn-env]",
   "language": "python",
   "name": "conda-env-sklearn-env-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}