Using GovTech WebAPIs
Carpark Availability Analysis
Problem Statement
Every Friday in Singapore, close to 260,000 male Muslims 1 converge on the 75 mosques to perform their weekly obligatory Friday prayers, much like how Christians have their weekend services.
The prayers are usually performed between 1 pm and 2 pm, and in some areas this can result in congestion as motorists vie for limited parking spots. This notebook looks at carpark availability data for a particular location - Stirling Road - which is situated near the Mujahidin Mosque 2 - to gain a preliminary understanding of the congestion timings.
Outline
In this example, we will look at the following:
- Creating functions in Python
- Using the Requests library to call web APIs
- Reading/writing data from/to CSVs using Pandas
- Plotting timeseries data in Pandas
- Visualising timeseries data using Seaborn and Matplotlib
Import Libraries
First, we import libraries needed for this analysis. Below is a table that explains why the libraries are required for our mini project.
Library | Function |
---|---|
requests | Query the Data.gov.sg realtime carpark availability API |
datetime | Create datetime objects to filter by date |
json | Convert the response body into a JSON object so we can apply data transformations |
pandas | Create DataFrames to store and transform data |
numpy | Required dependency for Pandas |
matplotlib | Data visualisation |
seaborn | Statistical data visualisation built on Matplotlib |
import requests
import datetime
import json
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
Get Parking Data
To get the parking data, we query carpark availability from the Data.gov.sg API as shown below. The API takes a payload in which we pass a datetime string specifying the point in time we want data for.
ROOT = 'https://api.data.gov.sg/v1'
URI = '/transport/carpark-availability'
now = datetime.datetime.now()
current_dt = now.isoformat()
print(current_dt)
payload = {'date_time': current_dt}
r = requests.get(ROOT + URI, params=payload)
2019-08-06T00:00:11.741964
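Before transforming the response, it helps to know its shape. The sketch below parses a hand-written stand-in body rather than a live response; the nesting (items → carpark_data → carpark_info) mirrors what the parsing code later in this notebook reads, but the values are made up:

```python
import json

# Stand-in response body (not live data) mirroring the nested structure
# that the functions below assume: items -> carpark_data -> carpark_info
sample_body = '''
{
  "items": [{
    "timestamp": "2019-08-06T00:00:00+08:00",
    "carpark_data": [{
      "carpark_number": "Q16",
      "update_datetime": "2019-08-05T23:58:00",
      "carpark_info": [
        {"total_lots": "40", "lot_type": "C", "lots_available": "15"}
      ]
    }]
  }]
}
'''

# Same drill-down used throughout this notebook
data = json.loads(sample_body)['items'][0]['carpark_data']
first = data[0]
print(first['carpark_number'])                     # Q16
print(first['carpark_info'][0]['lots_available'])  # 15
```

Note that the lot counts arrive as strings, which is why they are cast later before plotting.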
In the examples below, we will largely be looking at four carparks along Stirling Road. You may want to obtain carpark codes from here to get the carparks you want.
Q14 : BLK 49 STIRLING RD
Q15 : BLK 41/48 STIRLING RD
Q16 : BLK 45 AND 50 STIRLING RD
Q17 : BLK 52 STIRLING RD
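If you do not already know the codes for your area, one approach is to filter a carpark listing by address. The sketch below uses a tiny hand-made stand-in for such a listing (the column names `car_park_no` and `address` and the non-Stirling row are assumptions for illustration, not downloaded data):

```python
import pandas as pd

# Stand-in for an HDB carpark listing; the Stirling rows are copied from the
# codes above, the last row is a made-up non-matching entry
carparks = pd.DataFrame({
    'car_park_no': ['Q14', 'Q15', 'XX1'],
    'address': ['BLK 49 STIRLING RD', 'BLK 41/48 STIRLING RD', 'SOMEWHERE ELSE'],
})

# Keep every carpark whose address mentions Stirling
stirling = carparks[carparks['address'].str.contains('STIRLING', case=False)]
print(stirling['car_park_no'].tolist())  # ['Q14', 'Q15']
```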
Create function to get data
template = pd.DataFrame(columns=['total_lots','lot_type','lots_available','carpark_number','update_time','request_time'])
# Bootstrap the CSV so the first read inside get_cp_data() succeeds
template.to_csv('data.csv', index=False)

def get_cp_data():
    # Load previously collected data
    loads = pd.read_csv('data.csv')
    ROOT = 'https://api.data.gov.sg/v1'
    URI = '/transport/carpark-availability'
    now = datetime.datetime.now()
    current_dt = now.isoformat()
    payload = {'date_time': current_dt}
    r = requests.get(ROOT + URI, params=payload)
    filter_cp = ['Q14', 'Q15', 'Q16', 'Q17']
    data = json.loads(r.text)['items'][0]['carpark_data']
    filtered_data = [carpark for carpark in data if carpark['carpark_number'] in filter_cp]
    rows = []
    for item in filtered_data:
        for info in item['carpark_info']:
            rows.append({
                'total_lots': info['total_lots'],
                'lot_type': info['lot_type'],
                'lots_available': info['lots_available'],
                'carpark_number': item['carpark_number'],
                'update_time': item['update_datetime'],
                'request_time': current_dt
            })
    # DataFrame.append was removed in pandas 2.0; use pd.concat instead
    loads = pd.concat([loads, pd.DataFrame(rows)], ignore_index=True)
    loads.drop_duplicates(['total_lots','lot_type','lots_available','carpark_number','update_time'], inplace=True)
    loads.to_csv('data.csv', index=False)
    return loads

a = get_cp_data()
a = get_cp_data()
Get historical CP Data
def get_historical_cp_data(time):
    # Load previously collected data (parking-data.csv must exist; it can be
    # bootstrapped with the same empty template used above)
    loads = pd.read_csv('parking-data.csv')
    ROOT = 'https://api.data.gov.sg/v1'
    URI = '/transport/carpark-availability'
    payload = {'date_time': time}
    r = requests.get(ROOT + URI, params=payload)
    filter_cp = ['Q14', 'Q15', 'Q16', 'Q17']
    data = json.loads(r.text)['items'][0]['carpark_data']
    filtered_data = [carpark for carpark in data if carpark['carpark_number'] in filter_cp]
    rows = []
    for item in filtered_data:
        for info in item['carpark_info']:
            rows.append({
                'total_lots': info['total_lots'],
                'lot_type': info['lot_type'],
                'lots_available': info['lots_available'],
                'carpark_number': item['carpark_number'],
                'update_time': item['update_datetime'],
                'request_time': time
            })
    # DataFrame.append was removed in pandas 2.0; use pd.concat instead
    loads = pd.concat([loads, pd.DataFrame(rows)], ignore_index=True)
    loads.drop_duplicates(['total_lots','lot_type','lots_available','carpark_number','update_time'], inplace=True)
    loads.to_csv('parking-data.csv', index=False)
    return loads
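The drop_duplicates step is what makes repeated polling safe: re-requesting an overlapping interval only keeps rows whose availability snapshot is new. A tiny illustration of the same call on two made-up polls of one carpark:

```python
import pandas as pd

# Two polls returned the same availability snapshot (same update_time),
# differing only in when we asked (request_time); the values are made up
rows = pd.DataFrame({
    'total_lots': [40, 40],
    'lot_type': ['C', 'C'],
    'lots_available': [25, 25],
    'carpark_number': ['Q17', 'Q17'],
    'update_time': ['2019-07-26T08:34:35', '2019-07-26T08:34:35'],
    'request_time': ['2019-07-26T11:20:00', '2019-07-26T11:30:00'],
})

# Deduplicating on every column except request_time keeps a single row,
# since the underlying snapshot did not change between the two polls
deduped = rows.drop_duplicates(['total_lots', 'lot_type', 'lots_available',
                                'carpark_number', 'update_time'])
print(len(deduped))  # 1
```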
The get_historical_cp_data() function takes in an ISO-format datetime string and saves the data into the CSV file parking-data.csv. We next use a for loop with time steps of 10-minute intervals, using Python's built-in range function to generate equally spaced offsets from 0 to 600 minutes, each of which is fed into datetime.timedelta via its minutes argument.
now = datetime.datetime.now()
idx = 0
for time_step in range(0, 600, 10):
    req_time_f = datetime.datetime.now() - datetime.timedelta(minutes=time_step)
    req_time = req_time_f.isoformat()
    if idx % 20:
        print(f"{idx} : Getting data for {req_time}")
    get_historical_cp_data(req_time)
    idx += 1
1 : Getting data for 2019-07-26T19:41:37.599284
2 : Getting data for 2019-07-26T19:31:38.851572
3 : Getting data for 2019-07-26T19:21:40.103910
4 : Getting data for 2019-07-26T19:11:41.326232
5 : Getting data for 2019-07-26T19:01:42.631916
6 : Getting data for 2019-07-26T18:51:43.925971
7 : Getting data for 2019-07-26T18:41:45.144224
8 : Getting data for 2019-07-26T18:31:46.334172
9 : Getting data for 2019-07-26T18:21:47.747885
Reading in File
hist_data = pd.read_csv('parking-data.csv')
Visualise Data
First, we set request_time as the index of our DataFrame. Then, we apply a filter so that we keep only the data between 11:15 am and 3:30 pm.
hist_data.set_index('request_time',inplace=True)
hist_data.index = hist_data.index.astype('datetime64[ns]')
filter_time = hist_data.between_time('11:15','15:30').sort_values(by='request_time')
filter_time.head(2)
request_time | total_lots | lot_type | lots_available | carpark_number | update_time |
---|---|---|---|---|---|
2019-07-26 11:22:39.907290 | 0 | H | 0 | Q16 | 2019-07-26T08:34:30 |
2019-07-26 11:22:39.907290 | 40 | C | 25 | Q17 | 2019-07-26T08:34:35 |
We plot two charts on two axes because we want to see how much the update time and request time differ. To recap:

- request_time is the timestamp that we pass to the API in the request payload.
- update_time is the timestamp at which the data was recorded in Data.gov.sg's database.

We convert our update_time column, which is in str format, to the datetime format:
filter_time['update_time'] = filter_time['update_time'].astype('datetime64[ns]')
filter_time['request_time'] = filter_time.index
Display the data:
filter_time.head()
request_time | total_lots | lot_type | lots_available | carpark_number | update_time | request_time |
---|---|---|---|---|---|---|
2019-07-26 11:22:39.907290 | 0 | H | 0 | Q16 | 2019-07-26 08:34:30 | 2019-07-26 11:22:39.907290 |
2019-07-26 11:22:39.907290 | 40 | C | 25 | Q17 | 2019-07-26 08:34:35 | 2019-07-26 11:22:39.907290 |
2019-07-26 11:22:39.907290 | 42 | Y | 0 | Q17 | 2019-07-26 08:34:35 | 2019-07-26 11:22:39.907290 |
2019-07-26 11:22:39.907290 | 155 | Y | 0 | Q16 | 2019-07-26 08:34:30 | 2019-07-26 11:22:39.907290 |
2019-07-26 11:22:39.907290 | 40 | C | 15 | Q16 | 2019-07-26 08:34:30 | 2019-07-26 11:22:39.907290 |
# Create a subplot with 2 rows and 1 column
fig, ax = plt.subplots(2, 1, figsize=(15, 10))

# Select the day of interest, then mask to carpark Q16 and lot type 'C' (car);
# .loc is used because partial-string row indexing with [] is deprecated
day = filter_time.loc['2019-07-26']
mask = (day['carpark_number'] == 'Q16') & (day['lot_type'] == 'C')

# Plot the first chart (x-axis: update_time)
ax[0].axvline(x=pd.to_datetime('2019-07-26 12:00'), color='r', linestyle='--', clip_on=False)
sns.lineplot(x='update_time', y='lots_available', data=day[mask], ax=ax[0], label='lots_available')
sns.lineplot(x='update_time', y='total_lots', data=day[mask], ax=ax[0], label='total_lots')

# Plot the second chart (x-axis: request_time)
ax[1].axvline(x=pd.to_datetime('2019-07-26 12:00'), color='r', linestyle='--', clip_on=False)
sns.lineplot(x='request_time', y='lots_available', data=day[mask], ax=ax[1], label='lots_available')
sns.lineplot(x='request_time', y='total_lots', data=day[mask], ax=ax[1], label='total_lots')
From the visualisations, if we look at how the two charts overlay on each other, there is a time lag between update_time and request_time. This means we have to be mindful that the request time and update time do not refer to the same moment. For example, when I request carpark availability for 12 noon, I actually get data recorded at approximately 9:30 am (first chart).
Also, if we look at update_time on the first chart, carpark availability drops at approximately 12:30 pm and only picks up again after 2 pm, which is typically when the service ends and people head back to work.
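The lag can also be quantified directly rather than eyeballed from the charts. A minimal sketch with two made-up timestamp pairs (on the real data you would use the request_time and update_time columns of filter_time instead):

```python
import pandas as pd

# Made-up request/update timestamp pairs standing in for the real columns
df = pd.DataFrame({
    'request_time': pd.to_datetime(['2019-07-26 12:00:00', '2019-07-26 12:10:00']),
    'update_time': pd.to_datetime(['2019-07-26 09:30:00', '2019-07-26 09:41:00']),
})

# The lag is the element-wise difference of the two datetime columns
df['lag'] = df['request_time'] - df['update_time']
print(df['lag'].mean())  # 0 days 02:29:30
```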
Summary
In this walkthrough, we went through a brief yet practical demonstration of why and how to extract data from Data.gov.sg APIs. I hope this will be useful for you! Till next time.