import pandas as pd
import requests
from urllib.parse import urlencode
APIs#
Application Programming Interfaces, commonly known as APIs, are tools that allow different pieces of software to communicate with each other. Organizations often maintain APIs to give researchers easy access to their data. As economists, we may be interested in using these APIs to retrieve data for our own research. A famous example of such an API is the FRED API, which is maintained by the Federal Reserve Bank of St. Louis and allows us to easily retrieve Federal Reserve data.
Ultimately, there are thousands of APIs, and each has its own way of being accessed. It is impossible to learn how to interact with every API. Instead, the key skill you should focus on is learning how to read an API's documentation. Once you can successfully read and understand an API's documentation, you will have no trouble working with that API. This chapter will use the FRED API as an example of how to read an API's documentation. Let us get started. We will be referencing the relevant documentation a lot during this chapter, so we recommend you keep it open for easy reference.
Requests#
As you can see at the top of the documentation, in order to access the FRED API, we must 'request' URLs. Python's requests library is incredibly popular for making such requests. A very simple example of a request is shown below.
requests.get('https://www.google.com')
<Response [200]>
As you can see, this request returned a response with a status code of 200. You can read more about the various status codes here. In general, a status code in the 200s represents success, while a status code in the 400s or 500s represents a failure.
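For example, one simple way to guard against failed requests is to check the status code before using the response. Below is a minimal sketch using the same example URL as above.
res = requests.get('https://www.google.com')
if res.status_code == 200:
    print("Success!")  # safe to work with the response body
else:
    print(f"Request failed with status code {res.status_code}")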
In order to make a successful request to the FRED API, you also need an API key. We talk more about API keys in the next section.
Parameters#
This section walks through and describes all the parameters listed in the documentation.
API Key#
In order to make a request, you need an API key. An API key is a unique identifier used to authenticate your API calls. It shows that you are using the service legitimately and are not violating the platform's terms and conditions. For the FRED API, you can easily request an API key here.
If someone else obtains your API key, they can effectively pretend to be you. They can misuse your API key to violate the platform's terms and conditions and get you banned, or worse. Therefore, it is good practice to keep your API keys secret. In keeping with this practice, we do not share the API key we used for this notebook, and you must request your own API key if you wish to replicate the results.
api_key = "..." # We have deliberately omitted our API key
File Type#
This parameter indicates which type of file to return as the output. For the FRED API, it is often easiest to work with JSON files as the output. JSON files are an example of semi-structured data, an alternative to the structured CSVs we have been using so far. We will not delve into JSON too much for now, other than noting that pandas has an easy-to-use read_json() function. The documentation notes that if we choose to return a txt or xls file, an application/zip file will be returned to compress the data. So, it is easiest to return a json file and read it in directly.
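As a quick illustration of how pandas reads JSON, here is a minimal sketch; the inline JSON string is made up purely for demonstration and is not real FRED output.
from io import StringIO

sample_json = '[{"date": "1947-01-01", "value": "1239.5"}, {"date": "1947-04-01", "value": "1245.0"}]'
sample_df = pd.read_json(StringIO(sample_json))  # read_json expects a path, URL, or file-like object
sample_df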
Series ID#
The series ID tells the FRED API which data you are interested in. The FRED API supports over 800,000 different series; you can view the list here.
Real-Time#
Calculating metrics like a country's real GDP is a very difficult job. For example, read this article on the difficulties people encounter when calculating real GDP. Economists often review old data and revise it based on new findings or updated ways of calculating a metric. When this happens, FRED keeps a record of the new data as well as the old data, and of how long people believed in those numbers.
For example, the API call below extracts every real-time estimate of US real GDP for the first quarter of 1947 (series ID: GDPC1).
res = requests.get(f'https://api.stlouisfed.org/fred/series/observations?series_id=GDPC1&realtime_start=1776-07-04&realtime_end=9999-12-31&observation_start=1947-01-01&observation_end=1947-01-01&api_key={api_key}&file_type=json')
res.json()['observations']
[{'realtime_start': '1992-12-22',
'realtime_end': '1996-01-18',
'date': '1947-01-01',
'value': '1239.5'},
{'realtime_start': '1996-01-19',
'realtime_end': '1997-05-06',
'date': '1947-01-01',
'value': '.'},
{'realtime_start': '1997-05-07',
'realtime_end': '1999-10-27',
'date': '1947-01-01',
'value': '1402.5'},
{'realtime_start': '1999-10-28',
'realtime_end': '2000-04-26',
'date': '1947-01-01',
'value': '.'},
{'realtime_start': '2000-04-27',
'realtime_end': '2003-12-09',
'date': '1947-01-01',
'value': '1481.7'},
{'realtime_start': '2003-12-10',
'realtime_end': '2009-07-30',
'date': '1947-01-01',
'value': '1570.5'},
{'realtime_start': '2009-07-31',
'realtime_end': '2011-07-28',
'date': '1947-01-01',
'value': '1772.2'},
{'realtime_start': '2011-07-29',
'realtime_end': '2013-07-30',
'date': '1947-01-01',
'value': '1770.7'},
{'realtime_start': '2013-07-31',
'realtime_end': '2014-07-29',
'date': '1947-01-01',
'value': '1932.6'},
{'realtime_start': '2014-07-30',
'realtime_end': '2017-10-26',
'date': '1947-01-01',
'value': '1934.5'},
{'realtime_start': '2017-10-27',
'realtime_end': '2018-07-26',
'date': '1947-01-01',
'value': '1934.471'},
{'realtime_start': '2018-07-27',
'realtime_end': '2021-07-28',
'date': '1947-01-01',
'value': '2033.061'},
{'realtime_start': '2021-07-29',
'realtime_end': '2023-09-27',
'date': '1947-01-01',
'value': '2034.45'},
{'realtime_start': '2023-09-28',
'realtime_end': '9999-12-31',
'date': '1947-01-01',
'value': '2182.681'}]
Looking at the first entry, you can see that between 22nd December 1992 and 18th January 1996, economists believed real GDP for the first quarter of 1947 to be around 1239.5 billion dollars. However, between 19th January 1996 and 6th May 1997, the old estimate had been withdrawn but no new estimate was available (the value is '.'). When a new estimate was published on 7th May 1997, it was 1402.5 billion dollars. Updates continued in this fashion, with the most recent revision, made in September 2023, bringing the value up to 2182.681 billion dollars!
The default value of both realtime_start and realtime_end is the current date, so by default you will only get the most up-to-date figures and estimates. However, if you wish, you can also access older estimates by setting realtime_start and realtime_end to span the time period whose revisions you are interested in.
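For instance, here is a sketch of how you might ask for only the estimate that was considered current on a specific date; the date 2005-01-01 is an arbitrary choice for illustration.
res = requests.get(f'https://api.stlouisfed.org/fred/series/observations?series_id=GDPC1&realtime_start=2005-01-01&realtime_end=2005-01-01&observation_start=1947-01-01&observation_end=1947-01-01&api_key={api_key}&file_type=json')
res.json()['observations']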
Limit#
The limit is the maximum number of results a single API call will return. As you can see, the FRED API does not allow you to return more than 100,000 results at once, so that a single request cannot overload the API. Abusing an API's rate limit often gets you banned. This is another reason to keep your API key secret: you don't want someone pretending to be you and then abusing the rate limit.
Offset#
Skips the first \(n\) data points, where \(n\) is a whole number. The default value is 0, or skipping no data points.
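Together, limit and offset let you page through a series with more observations than a single call can return. Below is a sketch of that pattern; the helper function name and the page size of 1,000 are our own choices, and we let requests encode the parameter dictionary for us (similar in spirit to the urlencode() approach shown later in this chapter).
def fetch_all_observations(series_id, api_key, page_size=1000):
    # Page through a series by repeatedly shifting the offset.
    base_url = 'https://api.stlouisfed.org/fred/series/observations'
    observations = []
    offset = 0
    while True:
        params = {'series_id': series_id, 'api_key': api_key, 'file_type': 'json',
                  'limit': page_size, 'offset': offset}
        page = requests.get(base_url, params=params).json()['observations']
        observations.extend(page)
        if len(page) < page_size:  # the final page comes back short
            break
        offset += page_size
    return observations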
Sort Order#
Sorts by date in ascending or descending fashion.
Observation Period#
The period whose data you are interested in. While the real-time period captures when an estimate was made, the observation period captures which year or quarter the estimate is for. For example, if you want the most recent, up-to-date data on US GDP over time, your observation period will be 1776 to the present, while your real-time period will be today. We have made this sample API call below and returned the first 5 rows; how does it compare to the sample API call in the real-time section?
res = requests.get(f'https://api.stlouisfed.org/fred/series/observations?series_id=GDPC1&observation_start=1776-07-04&observation_end=9999-12-31&api_key={api_key}&file_type=json')
res.json()['observations'][:5]
[{'realtime_start': '2024-01-08',
'realtime_end': '2024-01-08',
'date': '1947-01-01',
'value': '2182.681'},
{'realtime_start': '2024-01-08',
'realtime_end': '2024-01-08',
'date': '1947-04-01',
'value': '2176.892'},
{'realtime_start': '2024-01-08',
'realtime_end': '2024-01-08',
'date': '1947-07-01',
'value': '2172.432'},
{'realtime_start': '2024-01-08',
'realtime_end': '2024-01-08',
'date': '1947-10-01',
'value': '2206.452'},
{'realtime_start': '2024-01-08',
'realtime_end': '2024-01-08',
'date': '1948-01-01',
'value': '2239.682'}]
Units#
The units to return your data in; examples include the raw data, the change from the previous year, the percent change from the previous year, and so on. Read the documentation to see all the available options.
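As an illustration, the sketch below requests GDPC1 as a percent change from a year ago rather than as a raw level; the unit code pc1 is our reading of the documentation's options, so double-check it there before relying on it.
res = requests.get(f'https://api.stlouisfed.org/fred/series/observations?series_id=GDPC1&units=pc1&api_key={api_key}&file_type=json')
res.json()['observations'][:5]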
Frequency#
How frequent should your returned data be? Do you want estimates for every year or every quarter? Read the documentation to see all the possible frequencies (but note that not every series supports all of them).
Important Note: If a series is calculated every year, you cannot get quarterly estimates. However, if a series is calculated every quarter, you can get yearly estimates. To see how to specify the way those yearly estimates are calculated, look at the aggregation method below. If you request a frequency that is not possible for the given series, your API call will return an error.
Aggregation Method#
If you are converting a series from higher frequency data (more frequent data points) to lower frequency data (more spaced out data points), how do you want to aggregate the results? For example, if inflation is calculated every month but you want annual inflation, do you want to take the average of all the months or just look at the last month? In general, there are 3 options: average the higher frequency data points within each period, sum them, or just take the last higher frequency data point from the relevant period.
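For instance, the sketch below asks for annual estimates of the quarterly GDPC1 series, averaging the four quarters in each year; the codes a (annual) and avg (average) are our reading of the documentation's frequency and aggregation method options.
res = requests.get(f'https://api.stlouisfed.org/fred/series/observations?series_id=GDPC1&frequency=a&aggregation_method=avg&api_key={api_key}&file_type=json')
res.json()['observations'][:5]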
Vintage Dates (and Output Type)#
A form of real-time data; read the documentation for more information.
Putting it all Together#
To put it all together, look at the sample requests in the documentation. You want to make a request to a URL which begins with https://api.stlouisfed.org/fred/series/observations and then includes all the specified parameters. For example, let us extract all possible data from the series Estimated Percent of People of All Ages in Poverty for United States (series ID: PPAAUS00000A156NCEN).
urlencode()#
To put all the parameters together easily, Python's urllib package has a handy parse.urlencode() function. This function takes in a dictionary which has the names of the parameters as the keys and their settings as the values, and outputs a string that encodes all the parameters. Let us look at an example below.
params = {"series_id": "PPAAUS00000A156NCEN", "api_key": api_key, "file_type": "json"}
params_string = urlencode(params)
# We do not show params_string here to prevent revealing our api_key
For our example, we only need to define the series ID, the API key and file_type = json as our parameters. We make a dictionary with those key-value pairs, pass it into urlencode() and voila, we have our string with all the parameters!
Now, we just need to combine this string with https://api.stlouisfed.org/fred/series/observations to get the URL we must request!
url_request = 'https://api.stlouisfed.org/fred/series/observations' + '?' + params_string
# The ? symbol marks where the query parameters begin
Making the Request and Converting to DataFrame#
So, let’s make the request!
req = requests.get(url_request)
req
<Response [200]>
We got a response with a status code of 200, so it worked! Since we requested JSON output, we can use req.json() to access it.
req.json()
{'realtime_start': '2024-01-08',
'realtime_end': '2024-01-08',
'observation_start': '1600-01-01',
'observation_end': '9999-12-31',
'units': 'lin',
'output_type': 1,
'file_type': 'json',
'order_by': 'observation_date',
'sort_order': 'asc',
'count': 34,
'offset': 0,
'limit': 100000,
'observations': [{'realtime_start': '2024-01-08',
'realtime_end': '2024-01-08',
'date': '1989-01-01',
'value': '12.8'},
{'realtime_start': '2024-01-08',
'realtime_end': '2024-01-08',
'date': '1990-01-01',
'value': '.'},
{'realtime_start': '2024-01-08',
'realtime_end': '2024-01-08',
'date': '1991-01-01',
'value': '.'},
{'realtime_start': '2024-01-08',
'realtime_end': '2024-01-08',
'date': '1992-01-01',
'value': '.'},
{'realtime_start': '2024-01-08',
'realtime_end': '2024-01-08',
'date': '1993-01-01',
'value': '15.1'},
{'realtime_start': '2024-01-08',
'realtime_end': '2024-01-08',
'date': '1994-01-01',
'value': '.'},
{'realtime_start': '2024-01-08',
'realtime_end': '2024-01-08',
'date': '1995-01-01',
'value': '13.8'},
{'realtime_start': '2024-01-08',
'realtime_end': '2024-01-08',
'date': '1996-01-01',
'value': '13.7'},
{'realtime_start': '2024-01-08',
'realtime_end': '2024-01-08',
'date': '1997-01-01',
'value': '13.3'},
{'realtime_start': '2024-01-08',
'realtime_end': '2024-01-08',
'date': '1998-01-01',
'value': '12.7'},
{'realtime_start': '2024-01-08',
'realtime_end': '2024-01-08',
'date': '1999-01-01',
'value': '11.9'},
{'realtime_start': '2024-01-08',
'realtime_end': '2024-01-08',
'date': '2000-01-01',
'value': '11.3'},
{'realtime_start': '2024-01-08',
'realtime_end': '2024-01-08',
'date': '2001-01-01',
'value': '11.7'},
{'realtime_start': '2024-01-08',
'realtime_end': '2024-01-08',
'date': '2002-01-01',
'value': '12.1'},
{'realtime_start': '2024-01-08',
'realtime_end': '2024-01-08',
'date': '2003-01-01',
'value': '12.5'},
{'realtime_start': '2024-01-08',
'realtime_end': '2024-01-08',
'date': '2004-01-01',
'value': '12.7'},
{'realtime_start': '2024-01-08',
'realtime_end': '2024-01-08',
'date': '2005-01-01',
'value': '13.3'},
{'realtime_start': '2024-01-08',
'realtime_end': '2024-01-08',
'date': '2006-01-01',
'value': '13.3'},
{'realtime_start': '2024-01-08',
'realtime_end': '2024-01-08',
'date': '2007-01-01',
'value': '13.0'},
{'realtime_start': '2024-01-08',
'realtime_end': '2024-01-08',
'date': '2008-01-01',
'value': '13.2'},
{'realtime_start': '2024-01-08',
'realtime_end': '2024-01-08',
'date': '2009-01-01',
'value': '14.3'},
{'realtime_start': '2024-01-08',
'realtime_end': '2024-01-08',
'date': '2010-01-01',
'value': '15.3'},
{'realtime_start': '2024-01-08',
'realtime_end': '2024-01-08',
'date': '2011-01-01',
'value': '15.9'},
{'realtime_start': '2024-01-08',
'realtime_end': '2024-01-08',
'date': '2012-01-01',
'value': '15.9'},
{'realtime_start': '2024-01-08',
'realtime_end': '2024-01-08',
'date': '2013-01-01',
'value': '15.8'},
{'realtime_start': '2024-01-08',
'realtime_end': '2024-01-08',
'date': '2014-01-01',
'value': '15.5'},
{'realtime_start': '2024-01-08',
'realtime_end': '2024-01-08',
'date': '2015-01-01',
'value': '14.7'},
{'realtime_start': '2024-01-08',
'realtime_end': '2024-01-08',
'date': '2016-01-01',
'value': '14'},
{'realtime_start': '2024-01-08',
'realtime_end': '2024-01-08',
'date': '2017-01-01',
'value': '13.4'},
{'realtime_start': '2024-01-08',
'realtime_end': '2024-01-08',
'date': '2018-01-01',
'value': '13.1'},
{'realtime_start': '2024-01-08',
'realtime_end': '2024-01-08',
'date': '2019-01-01',
'value': '12.3'},
{'realtime_start': '2024-01-08',
'realtime_end': '2024-01-08',
'date': '2020-01-01',
'value': '11.9'},
{'realtime_start': '2024-01-08',
'realtime_end': '2024-01-08',
'date': '2021-01-01',
'value': '12.8'},
{'realtime_start': '2024-01-08',
'realtime_end': '2024-01-08',
'date': '2022-01-01',
'value': '12.6'}]}
Looks like all the data we're interested in is in the observations key! Let's read that key into a pandas DataFrame.
pov_prcnt_df = pd.DataFrame(req.json()['observations'])
pov_prcnt_df.head()
 | realtime_start | realtime_end | date | value |
---|---|---|---|---|
0 | 2024-01-08 | 2024-01-08 | 1989-01-01 | 12.8 |
1 | 2024-01-08 | 2024-01-08 | 1990-01-01 | . |
2 | 2024-01-08 | 2024-01-08 | 1991-01-01 | . |
3 | 2024-01-08 | 2024-01-08 | 1992-01-01 | . |
4 | 2024-01-08 | 2024-01-08 | 1993-01-01 | 15.1 |
Seems like we're just interested in the date and value columns.
pov_prcnt_df[['date','value']]
 | date | value |
---|---|---|
0 | 1989-01-01 | 12.8 |
1 | 1990-01-01 | . |
2 | 1991-01-01 | . |
3 | 1992-01-01 | . |
4 | 1993-01-01 | 15.1 |
5 | 1994-01-01 | . |
6 | 1995-01-01 | 13.8 |
7 | 1996-01-01 | 13.7 |
8 | 1997-01-01 | 13.3 |
9 | 1998-01-01 | 12.7 |
10 | 1999-01-01 | 11.9 |
11 | 2000-01-01 | 11.3 |
12 | 2001-01-01 | 11.7 |
13 | 2002-01-01 | 12.1 |
14 | 2003-01-01 | 12.5 |
15 | 2004-01-01 | 12.7 |
16 | 2005-01-01 | 13.3 |
17 | 2006-01-01 | 13.3 |
18 | 2007-01-01 | 13.0 |
19 | 2008-01-01 | 13.2 |
20 | 2009-01-01 | 14.3 |
21 | 2010-01-01 | 15.3 |
22 | 2011-01-01 | 15.9 |
23 | 2012-01-01 | 15.9 |
24 | 2013-01-01 | 15.8 |
25 | 2014-01-01 | 15.5 |
26 | 2015-01-01 | 14.7 |
27 | 2016-01-01 | 14 |
28 | 2017-01-01 | 13.4 |
29 | 2018-01-01 | 13.1 |
30 | 2019-01-01 | 12.3 |
31 | 2020-01-01 | 11.9 |
32 | 2021-01-01 | 12.8 |
33 | 2022-01-01 | 12.6 |
We’re finished! We have gotten all of the data from the relevant series.
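One optional extra step, not part of the walkthrough above: FRED reports missing observations as '.', so before doing any analysis you may want to convert the columns to proper datetime and numeric types. A minimal sketch:
pov_prcnt_df['date'] = pd.to_datetime(pov_prcnt_df['date'])
pov_prcnt_df['value'] = pd.to_numeric(pov_prcnt_df['value'], errors='coerce')  # '.' becomes NaN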
Once again, this entire chapter was an exercise in reading the documentation of an API. You may come across many APIs in your career; reading their documentation (and perhaps StackOverflow) is the best way to learn how to work with them.