Using GovTech WebAPIs

Carpark Availability Analysis

Problem Statement

Every Friday in Singapore, close to 260,000 male Muslims [1] converge on the country's 75 mosques to perform their weekly obligatory Friday prayers, much like how Christians have their weekend services.

The prayers are usually performed between 1pm and 2pm, and in some areas this can result in congestion as motorists vie for limited parking spots. This notebook looks at carpark availability data for one such location, Stirling Road, which is situated near the Mujahidin Mosque [2], to get a preliminary understanding of when the congestion occurs.

Outline

In this example, we will look at the following:

  1. Creating functions in Python
  2. Using the Requests Library to call web APIs
  3. Reading/Writing data from/to CSVs using Pandas
  4. Plotting timeseries data in Pandas
  5. Visualising timeseries data using Seaborn and Matplotlib

Import Libraries

First, we import libraries needed for this analysis. Below is a table that explains why the libraries are required for our mini project.

Library      Function
requests     Query the Data.gov.sg realtime carpark availability API
datetime     Create datetime objects used to build the request timestamp
json         Convert the API response text into a JSON object so we can apply data transformations
pandas       Create DataFrames to store and transform data
numpy        Required dependency for pandas
matplotlib   Data visualisations
seaborn      Statistical line plots built on top of matplotlib
import requests
import datetime
import json

import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns

Get Parking Data

To get the parking data, we query the carpark availability endpoint of the data.gov.sg API as shown below. The API takes a payload in which we pass a datetime string specifying the point in time we want data for.

ROOT = 'https://api.data.gov.sg/v1'
URI = '/transport/carpark-availability'

now = datetime.datetime.now()
current_dt = now.isoformat() 
print(current_dt)
payload = {'date_time': current_dt}
r = requests.get(ROOT + URI, params=payload)
2019-08-06T00:00:11.741964

In the examples below, we will largely be looking at the following carparks around Stirling Road. You may want to obtain the full list of carpark numbers and their addresses to find the carparks you want; one way to peek at the carpark numbers in an API response is sketched after the list below.

Q14 : BLK 49 STIRLING RD
Q15 : BLK 41/48 STIRLING RD
Q16 : BLK 45 AND 50 STIRLING RD
Q17 : BLK 52 STIRLING RD
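
As a rough sketch (assuming the response object r from the request above), we can peek at the structure of the response and list the carpark numbers it contains:

# Peek at one carpark record to see its structure
items = r.json()['items'][0]['carpark_data']
print(len(items), 'carparks in this snapshot')
print(items[0])

# List every carpark number returned, so you can pick the ones near your location
carpark_numbers = sorted(item['carpark_number'] for item in items)
print(carpark_numbers[:10])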

Create function to get data

# Template DataFrame with the columns we want to keep
template = pd.DataFrame(columns=['total_lots','lot_type','lots_available','carpark_number','update_time','request_time'])
# Write an empty data.csv so that get_cp_data() has a file to read and append to
# (skip this step if data.csv already exists)
template.to_csv('data.csv', index=False)
def get_cp_data():
    
    ## load file
    loads = pd.read_csv('data.csv')
    
    ROOT = 'https://api.data.gov.sg/v1'
    URI = '/transport/carpark-availability'

    now = datetime.datetime.now()
    current_dt = now.isoformat()
    
    payload = {'date_time': current_dt}
    r = requests.get(ROOT + URI, params=payload)
    
    ## carpark numbers around Stirling Road
    filter_cp = ['Q14', 'Q15', 'Q16', 'Q17']
    
    ## extract the carpark records and keep only the carparks we are interested in
    data = json.loads(r.text)['items'][0]['carpark_data']
    
    filtered_data = [carpark for carpark in data if carpark['carpark_number'] in filter_cp]
    
    template_dict = {
        'total_lots' : 9999,
        'lot_type' : 'NA',
        'lots_available' : 9999,
        'carpark_number' : 'NA',
        'update_time' : 'NA',
        'request_time' : 'NA'
    }
    
    ## flatten the nested carpark_info into one row per carpark and lot type
    for item in filtered_data:
        for info in item['carpark_info']:
            template_dict['total_lots'] = info['total_lots']
            template_dict['lot_type'] = info['lot_type']
            template_dict['lots_available'] = info['lots_available']
            template_dict['carpark_number'] = item['carpark_number']
            template_dict['update_time'] = item['update_datetime']
            template_dict['request_time'] = current_dt
            
            loads = loads.append(template_dict, ignore_index=True)
    
    ## drop readings we have already stored, then persist back to disk
    loads.drop_duplicates(['total_lots','lot_type','lots_available','carpark_number','update_time'],inplace=True)
    loads.to_csv('data.csv',index=False)
    
    return loads
a = get_cp_data()
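
As a quick sanity check (a sketch; the exact rows depend on when you run it), we can look at what was just appended:

# Inspect the most recent rows and how many readings we have per carpark
print(a.tail())
print(a['carpark_number'].value_counts())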

Get historical CP Data

def get_historical_cp_data(time):
    
    ## load file (parking-data.csv must already exist; see the note after this function)
    loads = pd.read_csv('parking-data.csv')
    
    ROOT = 'https://api.data.gov.sg/v1'
    URI = '/transport/carpark-availability'
    
    payload = {'date_time': time}
    r = requests.get(ROOT + URI, params=payload)
    
    filter_cp  = ['Q14', 'Q15', 'Q16', 'Q17']
    
    data = json.loads(r.text)['items'][0]['carpark_data']
    
    filtered_data = [carpark for carpark in data if carpark['carpark_number'] in filter_cp]
    
    template_dict = {
        'total_lots' : 9999,
        'lot_type' : 'NA',
        'lots_available' : 9999,
        'carpark_number' : 'NA',
        'update_time' : 'NA',
        'request_time' : 'NA'
    }
    
    for item in filtered_data:
        for info in item['carpark_info']:
            template_dict['total_lots'] = info['total_lots']
            template_dict['lot_type'] = info['lot_type']
            template_dict['lots_available'] = info['lots_available']
            template_dict['carpark_number'] = item['carpark_number']
            template_dict['update_time'] = item['update_datetime']
            template_dict['request_time'] = time
            
            loads = loads.append(template_dict, ignore_index=True)
    
    loads.drop_duplicates(['total_lots','lot_type','lots_available','carpark_number','update_time'],inplace=True)
    loads.to_csv('parking-data.csv',index=False)
    
    return loads
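
Note that, like data.csv earlier, parking-data.csv must exist before the first call. A minimal way to create it, reusing the column template defined above, is:

# Create an empty parking-data.csv with the expected columns
# (skip this if the file already exists)
template.to_csv('parking-data.csv', index=False)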

The get_historical_cp_data() function takes in a datetime string and appends the data to the CSV file parking-data.csv. We next use a for loop to step back in time in 10-minute intervals: Python's built-in range function generates equally spaced offsets from 0 to 600 minutes, and each offset is fed into datetime.timedelta via the minutes argument.

now = datetime.datetime.now()
idx = 0 
for time_step in range(0,600,10):
    req_time_f = datetime.datetime.now() - datetime.timedelta(minutes=time_step)
    req_time = req_time_f.isoformat() 
    if idx % 20:
        print(f"{idx} : Getting data for {req_time}")
    get_historical_cp_data(req_time)
    idx += 1
1 : Getting data for 2019-07-26T19:41:37.599284
2 : Getting data for 2019-07-26T19:31:38.851572
3 : Getting data for 2019-07-26T19:21:40.103910
4 : Getting data for 2019-07-26T19:11:41.326232
5 : Getting data for 2019-07-26T19:01:42.631916
6 : Getting data for 2019-07-26T18:51:43.925971
7 : Getting data for 2019-07-26T18:41:45.144224
8 : Getting data for 2019-07-26T18:31:46.334172
9 : Getting data for 2019-07-26T18:21:47.747885

Reading in File

hist_data = pd.read_csv('parking-data.csv')

Visualise Data

First, we set request_time as the index of our DataFrame. Then, we apply a filter so that we keep only the data between 11:15am and 3:30pm.

hist_data.set_index('request_time',inplace=True)
hist_data.index = hist_data.index.astype('datetime64[ns]')
filter_time = hist_data.between_time('11:15','15:30').sort_values(by='request_time')
filter_time.head(2)

                            total_lots lot_type lots_available carpark_number          update_time
request_time
2019-07-26 11:22:39.907290           0        H              0            Q16  2019-07-26T08:34:30
2019-07-26 11:22:39.907290          40        C             25            Q17  2019-07-26T08:34:35

We plot two charts, one against each time axis, because we want to see how much update_time and request_time differ. To recap:

  • request_time is the timestamp we pass to the API in the request.
  • update_time is the time at which the carpark data was recorded in data.gov.sg's database.

We convert the update_time column, which is in str format, to datetime format, and copy the index back into a request_time column:

filter_time['update_time'] = filter_time['update_time'].astype('datetime64[ns]')
filter_time['request_time'] = filter_time.index
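
As a rough sanity check of the lag (a sketch; both columns are now datetimes, so the subtraction yields timedeltas):

# How far behind the requested time is the recorded update time?
lag = filter_time['request_time'] - filter_time['update_time']
print(lag.describe())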

Display the data:

filter_time.head()

                            total_lots lot_type lots_available carpark_number          update_time                request_time
request_time
2019-07-26 11:22:39.907290           0        H              0            Q16  2019-07-26 08:34:30  2019-07-26 11:22:39.907290
2019-07-26 11:22:39.907290          40        C             25            Q17  2019-07-26 08:34:35  2019-07-26 11:22:39.907290
2019-07-26 11:22:39.907290          42        Y              0            Q17  2019-07-26 08:34:35  2019-07-26 11:22:39.907290
2019-07-26 11:22:39.907290         155        Y              0            Q16  2019-07-26 08:34:30  2019-07-26 11:22:39.907290
2019-07-26 11:22:39.907290          40        C             15            Q16  2019-07-26 08:34:30  2019-07-26 11:22:39.907290

# Create a subplot with 2 rows and 1 column
fig, ax = plt.subplots(2, 1, figsize=(15, 10))

# Create a mask to select only carpark Q16, lot type 'C' (car), on 2019-07-26
mask = (filter_time.loc['2019-07-26']['carpark_number'] == 'Q16') & (filter_time.loc['2019-07-26']['lot_type'] == 'C')

# Plot the first chart (x-axis: update_time)
ax[0].axvline(x=pd.to_datetime('2019-07-26 12:00'), color='r', linestyle='--', clip_on=False)
sns.lineplot(x='update_time', y='lots_available', data=filter_time[mask], ax=ax[0], label='lots_available')
sns.lineplot(x='update_time', y='total_lots', data=filter_time[mask], ax=ax[0], label='total_lots')

# Plot the second chart (x-axis: request_time)
ax[1].axvline(x=pd.to_datetime('2019-07-26 12:00'), color='r', linestyle='--', clip_on=False)
sns.lineplot(x='request_time', y='lots_available', data=filter_time[mask], ax=ax[1], label='lots_available')
sns.lineplot(x='request_time', y='total_lots', data=filter_time[mask], ax=ax[1], label='total_lots')

From the visualisations, if we compare how the two charts line up, there is a clear time lag between update_time and request_time. This means we have to be mindful that requesting data for a given time does not return readings recorded at that time. For example, when I request carpark availability for 12 noon, I actually get data recorded at approximately 9:30am (first chart).

Also, looking at update_time on the first chart, carpark availability starts dropping at approximately 12:30pm and only picks up again after 2pm, which is typically when the service ends and people head back to work.
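
As a rough way to quantify that dip (a sketch reusing the mask defined above; the 30-minute bin size is an arbitrary choice):

# Average availability for carpark Q16 (car lots) in 30-minute bins, keyed on update_time
q16_cars = filter_time[mask].set_index('update_time')
print(q16_cars['lots_available'].resample('30min').mean())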

Summary

In this walkthrough, we went through a brief yet practical demonstration of why and how to extract data from Data.gov.sg APIs. I hope this will be useful for you! Till next time.

References

  1. Singapore Malay Muslim Community in Figures Booklet 2015, Mendaki Singapore
  2. MUIS Mosque Directory - Mujahidin Mosque