Using GovTech WebAPIs
Carpark Availability Analysis
Problem Statement
Every Friday in Singapore, close to 260,000 male Muslims 1 converge on the 75 mosques to perform their weekly obligatory Friday prayers, much like how Christians have their weekend services.
The prayers are usually performed between 1 pm and 2 pm, and in some areas this can result in congestion as motorists vie for limited parking spots. This notebook looks at carpark availability data for a particular location - Stirling Road - which is situated near the Mujahidin Mosque 2 - to gain a preliminary understanding of the congestion timings.
Outline
In this example, we will look at the following:
- Creating functions in Python
- Using the Requests library to call web APIs
- Reading/writing data from/to CSVs using Pandas
- Plotting timeseries data in Pandas
- Visualising timeseries data using Seaborn and Matplotlib
Import Libraries
First, we import libraries needed for this analysis. Below is a table that explains why the libraries are required for our mini project.
Library | Function |
---|---|
requests | Query the Data.gov.sg realtime carpark availability API |
datetime | Create datetime objects to filter by date |
json | Convert the response body into a JSON object so we can apply data transformations |
pandas | Create DataFrames to store and transform data |
numpy | Required dependency for Pandas |
matplotlib | Data visualisation |
seaborn | Statistical data visualisation built on Matplotlib |
import requests
import datetime
import json
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
Get Parking Data
To get the parking data, we query carpark availability from the Data.gov.sg API as shown below. The API takes a payload in which we pass a datetime string specifying the point in time we want data for.
ROOT = 'https://api.data.gov.sg/v1'
URI = '/transport/carpark-availability'
now = datetime.datetime.now()
current_dt = now.isoformat()
print(current_dt)
payload = {'date_time': current_dt}
r = requests.get(ROOT + URI, params=payload)
2019-08-06T00:00:11.741964
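Before transforming the response, it helps to know its shape. The sketch below parses a hand-written stand-in body rather than a live response; the nesting (items → carpark_data → carpark_info) mirrors what the parsing code later in this notebook reads, but the values are made up:

```python
import json

# Stand-in response body (not live data) mirroring the nested structure
# that the functions below assume: items -> carpark_data -> carpark_info
sample_body = '''
{
  "items": [{
    "timestamp": "2019-08-06T00:00:00+08:00",
    "carpark_data": [{
      "carpark_number": "Q16",
      "update_datetime": "2019-08-05T23:58:00",
      "carpark_info": [
        {"total_lots": "40", "lot_type": "C", "lots_available": "15"}
      ]
    }]
  }]
}
'''

# Same drill-down used throughout this notebook
data = json.loads(sample_body)['items'][0]['carpark_data']
first = data[0]
print(first['carpark_number'])                     # Q16
print(first['carpark_info'][0]['lots_available'])  # 15
```

Note that the lot counts arrive as strings, which is why they are cast later before plotting.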
In the examples below, we will largely be looking at four carparks along Stirling Road. You may want to obtain carpark codes from here to get the carparks you want.
Q14 : BLK 49 STIRLING RD
Q15 : BLK 41/48 STIRLING RD
Q16 : BLK 45 AND 50 STIRLING RD
Q17 : BLK 52 STIRLING RD
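If you do not already know the codes for your area, one approach is to filter a carpark listing by address. The sketch below uses a tiny hand-made stand-in for such a listing (the column names `car_park_no` and `address` and the non-Stirling row are assumptions for illustration, not downloaded data):

```python
import pandas as pd

# Stand-in for an HDB carpark listing; the Stirling rows are copied from the
# codes above, the last row is a made-up non-matching entry
carparks = pd.DataFrame({
    'car_park_no': ['Q14', 'Q15', 'XX1'],
    'address': ['BLK 49 STIRLING RD', 'BLK 41/48 STIRLING RD', 'SOMEWHERE ELSE'],
})

# Keep every carpark whose address mentions Stirling
stirling = carparks[carparks['address'].str.contains('STIRLING', case=False)]
print(stirling['car_park_no'].tolist())  # ['Q14', 'Q15']
```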
Create function to get data
template = pd.DataFrame(columns=['total_lots','lot_type','lots_available','carpark_number','update_time','request_time'])
# Bootstrap the CSV so the first read inside get_cp_data() succeeds
template.to_csv('data.csv', index=False)

def get_cp_data():
    # Load previously collected data
    loads = pd.read_csv('data.csv')
    ROOT = 'https://api.data.gov.sg/v1'
    URI = '/transport/carpark-availability'
    now = datetime.datetime.now()
    current_dt = now.isoformat()
    payload = {'date_time': current_dt}
    r = requests.get(ROOT + URI, params=payload)
    filter_cp = ['Q14', 'Q15', 'Q16', 'Q17']
    data = json.loads(r.text)['items'][0]['carpark_data']
    filtered_data = [carpark for carpark in data if carpark['carpark_number'] in filter_cp]
    rows = []
    for item in filtered_data:
        for info in item['carpark_info']:
            rows.append({
                'total_lots': info['total_lots'],
                'lot_type': info['lot_type'],
                'lots_available': info['lots_available'],
                'carpark_number': item['carpark_number'],
                'update_time': item['update_datetime'],
                'request_time': current_dt
            })
    # DataFrame.append was removed in pandas 2.0; use pd.concat instead
    loads = pd.concat([loads, pd.DataFrame(rows)], ignore_index=True)
    loads.drop_duplicates(['total_lots','lot_type','lots_available','carpark_number','update_time'], inplace=True)
    loads.to_csv('data.csv', index=False)
    return loads

a = get_cp_data()
a = get_cp_data()
Get historical CP Data
def get_historical_cp_data(time):
    # Load previously collected data (parking-data.csv must exist; it can be
    # bootstrapped with the same empty template used above)
    loads = pd.read_csv('parking-data.csv')
    ROOT = 'https://api.data.gov.sg/v1'
    URI = '/transport/carpark-availability'
    payload = {'date_time': time}
    r = requests.get(ROOT + URI, params=payload)
    filter_cp = ['Q14', 'Q15', 'Q16', 'Q17']
    data = json.loads(r.text)['items'][0]['carpark_data']
    filtered_data = [carpark for carpark in data if carpark['carpark_number'] in filter_cp]
    rows = []
    for item in filtered_data:
        for info in item['carpark_info']:
            rows.append({
                'total_lots': info['total_lots'],
                'lot_type': info['lot_type'],
                'lots_available': info['lots_available'],
                'carpark_number': item['carpark_number'],
                'update_time': item['update_datetime'],
                'request_time': time
            })
    # DataFrame.append was removed in pandas 2.0; use pd.concat instead
    loads = pd.concat([loads, pd.DataFrame(rows)], ignore_index=True)
    loads.drop_duplicates(['total_lots','lot_type','lots_available','carpark_number','update_time'], inplace=True)
    loads.to_csv('parking-data.csv', index=False)
    return loads
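The drop_duplicates step is what makes repeated polling safe: re-requesting an overlapping interval only keeps rows whose availability snapshot is new. A tiny illustration of the same call on two made-up polls of one carpark:

```python
import pandas as pd

# Two polls returned the same availability snapshot (same update_time),
# differing only in when we asked (request_time); the values are made up
rows = pd.DataFrame({
    'total_lots': [40, 40],
    'lot_type': ['C', 'C'],
    'lots_available': [25, 25],
    'carpark_number': ['Q17', 'Q17'],
    'update_time': ['2019-07-26T08:34:35', '2019-07-26T08:34:35'],
    'request_time': ['2019-07-26T11:20:00', '2019-07-26T11:30:00'],
})

# Deduplicating on every column except request_time keeps a single row,
# since the underlying snapshot did not change between the two polls
deduped = rows.drop_duplicates(['total_lots', 'lot_type', 'lots_available',
                                'carpark_number', 'update_time'])
print(len(deduped))  # 1
```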
The get_historical_cp_data() function takes in an ISO-format datetime string and saves the data into the CSV file parking-data.csv. We next use a for loop with time steps of 10-minute intervals, using Python's built-in range function to generate equally spaced offsets from 0 to 600 minutes, each of which is fed into datetime.timedelta via its minutes argument.
now = datetime.datetime.now()
idx = 0
for time_step in range(0, 600, 10):
    req_time_f = datetime.datetime.now() - datetime.timedelta(minutes=time_step)
    req_time = req_time_f.isoformat()
    if idx % 20:
        print(f"{idx} : Getting data for {req_time}")
    get_historical_cp_data(req_time)
    idx += 1
1 : Getting data for 2019-07-26T19:41:37.599284
2 : Getting data for 2019-07-26T19:31:38.851572
3 : Getting data for 2019-07-26T19:21:40.103910
4 : Getting data for 2019-07-26T19:11:41.326232
5 : Getting data for 2019-07-26T19:01:42.631916
6 : Getting data for 2019-07-26T18:51:43.925971
7 : Getting data for 2019-07-26T18:41:45.144224
8 : Getting data for 2019-07-26T18:31:46.334172
9 : Getting data for 2019-07-26T18:21:47.747885
Reading in File
hist_data = pd.read_csv('parking-data.csv')
Visualise Data
First, we set request_time as the index of our DataFrame. Then, we apply a filter so that we keep only the data between 11:15 am and 3:30 pm.
hist_data.set_index('request_time',inplace=True)
hist_data.index = hist_data.index.astype('datetime64[ns]')
filter_time = hist_data.between_time('11:15','15:30').sort_values(by='request_time')
filter_time.head(2)
request_time | total_lots | lot_type | lots_available | carpark_number | update_time |
---|---|---|---|---|---|
2019-07-26 11:22:39.907290 | 0 | H | 0 | Q16 | 2019-07-26T08:34:30 |
2019-07-26 11:22:39.907290 | 40 | C | 25 | Q17 | 2019-07-26T08:34:35 |
We plot two charts on two axes because we want to see how much the update time and request time differ. To recap:

- request_time is the timestamp that we pass to the API in the request payload.
- update_time is the timestamp at which the data was recorded in Data.gov.sg's database.

We convert our update_time column, which is in str format, to the datetime format:
filter_time['update_time'] = filter_time['update_time'].astype('datetime64[ns]')
filter_time['request_time'] = filter_time.index
Display the data:
filter_time.head()
request_time | total_lots | lot_type | lots_available | carpark_number | update_time | request_time |
---|---|---|---|---|---|---|
2019-07-26 11:22:39.907290 | 0 | H | 0 | Q16 | 2019-07-26 08:34:30 | 2019-07-26 11:22:39.907290 |
2019-07-26 11:22:39.907290 | 40 | C | 25 | Q17 | 2019-07-26 08:34:35 | 2019-07-26 11:22:39.907290 |
2019-07-26 11:22:39.907290 | 42 | Y | 0 | Q17 | 2019-07-26 08:34:35 | 2019-07-26 11:22:39.907290 |
2019-07-26 11:22:39.907290 | 155 | Y | 0 | Q16 | 2019-07-26 08:34:30 | 2019-07-26 11:22:39.907290 |
2019-07-26 11:22:39.907290 | 40 | C | 15 | Q16 | 2019-07-26 08:34:30 | 2019-07-26 11:22:39.907290 |
# Create a subplot with 2 rows and 1 column
fig, ax = plt.subplots(2, 1, figsize=(15, 10))

# Select the day of interest, then mask to carpark Q16 and lot type 'C' (car);
# .loc is used because partial-string row indexing with [] is deprecated
day = filter_time.loc['2019-07-26']
mask = (day['carpark_number'] == 'Q16') & (day['lot_type'] == 'C')

# Plot the first chart (x-axis: update_time)
ax[0].axvline(x=pd.to_datetime('2019-07-26 12:00'), color='r', linestyle='--', clip_on=False)
sns.lineplot(x='update_time', y='lots_available', data=day[mask], ax=ax[0], label='lots_available')
sns.lineplot(x='update_time', y='total_lots', data=day[mask], ax=ax[0], label='total_lots')

# Plot the second chart (x-axis: request_time)
ax[1].axvline(x=pd.to_datetime('2019-07-26 12:00'), color='r', linestyle='--', clip_on=False)
sns.lineplot(x='request_time', y='lots_available', data=day[mask], ax=ax[1], label='lots_available')
sns.lineplot(x='request_time', y='total_lots', data=day[mask], ax=ax[1], label='total_lots')
From the visualisations, if we look at how the two charts overlay on each other, there is a time lag between update_time and request_time. This means we have to be mindful that the request time and update time do not refer to the same moment. For example, when I request carpark availability for 12 noon, I actually get data recorded at approximately 9:30 am (first chart).
Also, if we look at update_time on the first chart, carpark availability drops at approximately 12:30 pm and only picks up again after 2 pm, which is typically when the service ends and people head back to work.
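The lag can also be quantified directly rather than eyeballed from the charts. A minimal sketch with two made-up timestamp pairs (on the real data you would use the request_time and update_time columns of filter_time instead):

```python
import pandas as pd

# Made-up request/update timestamp pairs standing in for the real columns
df = pd.DataFrame({
    'request_time': pd.to_datetime(['2019-07-26 12:00:00', '2019-07-26 12:10:00']),
    'update_time': pd.to_datetime(['2019-07-26 09:30:00', '2019-07-26 09:41:00']),
})

# The lag is the element-wise difference of the two datetime columns
df['lag'] = df['request_time'] - df['update_time']
print(df['lag'].mean())  # 0 days 02:29:30
```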
Summary
In this walkthrough, we went through a brief yet practical demonstration of why and how to extract data from Data.gov.sg APIs. I hope this will be useful for you! Till next time.