
Do Taxi Drivers Take the Fastest Route?

Kia Eisinga
Jun 18, 2019 • Last edit on Sep 20, 2022 • 15 min read
City traffic viewed from above the tall buildings.

Going the Distance with TomTom Maps APIs

Last December, I was in Porto, Portugal visiting a friend for her birthday. I took the cab from the airport to her house and on the way, the driver was struggling to find the correct route. We stopped and turned multiple times and he was smiling uncomfortably at me in the mirror. Meanwhile, the meter kept running. By the time I got to her apartment, it had hit the 40-euro mark.

It made me think about the all-too-familiar tale of taxi drivers taking unnecessarily long routes to increase the fare. I used to think it was just a myth: most taxi drivers use navigation and, I presume, make more money from pick-up fees and tips (in other words, from short and frequent rides). So I set out to answer this burning question: do taxi drivers take unnecessarily long routes?

Going the distance

Luckily, it was not hard to find a public dataset on taxi trips in the city of Porto and, working at TomTom, I was already aware of the public APIs the company offers. By combining the two, and with the help of my colleague Sander, I was able to fit the pieces of the puzzle together.

Let me show you how we got our results! Here are the steps we will go through:

  1. Getting open source data from Kaggle
  2. Setting up your TomTom API key
  3. Displaying an interactive TomTom map with Folium
  4. How to use the TomTom Routing API
  5. Taxi trip analysis
  6. Data visualization with Folium

We will be using a Python Jupyter Notebook for this exercise.

Step 1: Get open source data

The dataset we used was the Taxi Trajectory dataset from Kaggle. It describes a complete year (from 1 July 2013 to 30 June 2014) of trajectories from 442 different taxis.

```python
# First we import a bunch of libraries
import pandas as pd
import numpy as np
import folium  # map visualisation package
import requests  # this we use for API calls
import json
import matplotlib.pyplot as plt
import branca.colormap as cm
from dateutil import tz
import datetime
import time
from tqdm import tqdm

# Then we load the data
df = pd.read_csv("/set/path/to/train.csv")
df.head()
```
```
   TRIP_ID              CALL_TYPE  ORIGIN_CALL  ORIGIN_STAND  TAXI_ID   TIMESTAMP   DAY_TYPE  MISSING_DATA  POLYLINE
0  1372636858620000589  C          NaN          NaN           20000589  1372636858  A         False         [[-8.618643,41.141412],[-8.618499,41.141376],[...
1  1372637303620000596  B          NaN          7.0           20000596  1372637303  A         False         [[-8.639847,41.159826],[-8.640351,41.159871],[...
2  1372636951620000320  C          NaN          NaN           20000320  1372636951  A         False         [[-8.612964,41.140359],[-8.613378,41.14035],[-...
3  1372636854620000520  C          NaN          NaN           20000520  1372636854  A         False         [[-8.574678,41.151951],[-8.574705,41.151942],[...
4  1372637091620000337  C          NaN          NaN           20000337  1372637091  A         False         [[-8.645994,41.18049],[-8.645949,41.180517],[-...
```
```python
df.shape
```
```
(1710670, 9)
```
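To double-check the date range mentioned above, the TIMESTAMP column (UNIX seconds) can be converted with pandas. A minimal sketch; `trip_date_range` is a hypothetical helper name and not part of the original notebook:

```python
import pandas as pd

def trip_date_range(trips: pd.DataFrame):
    """Return the earliest and latest departure time in the data as UTC timestamps."""
    # TIMESTAMP holds UNIX seconds, so unit="s" yields the correct datetimes
    departures = pd.to_datetime(trips["TIMESTAMP"], unit="s", utc=True)
    return departures.min(), departures.max()

# Two timestamps taken from the df.head() output above
sample = pd.DataFrame({"TIMESTAMP": [1372636858, 1372637303]})
print(trip_date_range(sample))
```

Applied to the full `df`, the result should fall within July 2013 to June 2014 if the data matches its description.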

As you can see, the dataset records more than 1.7 million taxi trips. This lets us say something about the overall behavior of taxi drivers in Porto.

Making an assessment: each of these trips has a corresponding polyline (trajectory), stored as a list of [longitude, latitude] points. We can plot the trajectory on the map and, using the TomTom Maps APIs, check whether it corresponds to the fastest route.
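Not every row is usable for this check: some polylines are empty ('[]'), and according to the Kaggle dataset description the MISSING_DATA column is True when one or more GPS locations are missing from the trace. A minimal filtering sketch; the helper name `usable_trips` and its `min_points` threshold are illustrative assumptions:

```python
import json
import pandas as pd

def usable_trips(trips: pd.DataFrame, min_points: int = 2) -> pd.DataFrame:
    """Keep trips whose GPS trace is complete and has at least `min_points` points."""
    complete = ~trips["MISSING_DATA"].astype(bool)
    # POLYLINE is a JSON string of [longitude, latitude] pairs
    n_points = trips["POLYLINE"].map(lambda polyline: len(json.loads(polyline)))
    return trips[complete & (n_points >= min_points)]

toy = pd.DataFrame({
    "MISSING_DATA": [False, True, False],
    "POLYLINE": ['[[-8.6186,41.1414],[-8.6185,41.1413]]', '[[-8.61,41.14],[-8.62,41.15]]', '[]'],
})
print(usable_trips(toy))  # only the first row survives
```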

Step 2: Get TomTom API key

You can request your own API key on the TomTom Developer Portal, which gives you 2,500 free API calls a day. This is more than enough to get a good idea of whether taxi drivers are frequently taking detours or not.

To request an API key, you need to:

```python
api_key = "insert your own API key here"
```
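Hard-coding the key works, but it is easy to leak it when sharing a notebook. One alternative is to read it from an environment variable at runtime. This is a sketch that assumes you have exported the key yourself; the variable name `TOMTOM_API_KEY` is our own choice, not something TomTom prescribes:

```python
import os

def load_api_key(var_name: str = "TOMTOM_API_KEY") -> str:
    """Read the API key from an environment variable instead of the notebook itself."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Please set the {var_name} environment variable to your TomTom API key")
    return key

# api_key = load_api_key()
```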

Step 3: Display interactive map with Folium

We use an open source library called folium to display the TomTom map within our Jupyter notebook.

```python
# Initiate the map with the TomTom Maps API
def initialise_map(api_key=api_key, location=[41.161178, -8.648490], zoom=14, style="main"):
    """
    The initialise_map function initialises a clean TomTom map
    """
    # HTTPS so the API key is not sent in plain text
    maps_url = "https://{s}.api.tomtom.com/map/1/tile/basic/" + style + "/{z}/{x}/{y}.png?tileSize=512&key="
    TomTom_map = folium.Map(
        location=location,  # on what coordinates [lat, lon] we want to initialise our map
        zoom_start=zoom,  # with what zoom level we want to initialise our map, from 0 to 22
        tiles=maps_url + api_key,
        attr='TomTom')
    return TomTom_map

# Save map as TomTom_map
TomTom_map = initialise_map()
TomTom_map
```

TomTom Map initialized
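Folium fills in the `{z}/{x}/{y}` placeholders of the tile URL itself, requesting one image per tile in view. For intuition, here is a sketch of the standard Web Mercator ("slippy map") formula that maps a coordinate to tile indices. We assume TomTom's tile grid follows this common XYZ scheme, which is what Leaflet-based maps such as folium expect; `lat_lon_to_tile` is a hypothetical helper:

```python
import math

def lat_lon_to_tile(lat: float, lon: float, zoom: int) -> tuple:
    """Return the (x, y) indices of the XYZ tile that contains (lat, lon) at a zoom level."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y

# The default centre of initialise_map at its default zoom level of 14
print(lat_lon_to_tile(41.161178, -8.648490, 14))
```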

To add a polyline to the map, use the following code:

```python
def polyline_to_list(polyline):
    """
    The polyline_to_list function transforms the raw polyline string into a list of lists
    and swaps each [longitude, latitude] pair into the [latitude, longitude] order folium expects
    input: '[[-8.639847,41.159826],[-8.640351,41.159871]]'
    output: [[41.159826, -8.639847], [41.159871, -8.640351]]
    """
    trip = json.loads(polyline)  # json.loads converts the string to a list of lists
    coordinates_list = [list(reversed(coordinates)) for coordinates in trip]  # reverse each pair
    return coordinates_list

# Plot the polyline of the trip with index 1 on the map
polyline = polyline_to_list(df['POLYLINE'][1])
folium.PolyLine(polyline).add_to(TomTom_map)
TomTom_map
```

TomTom Map route displayed
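Once a trip is a list of [latitude, longitude] pairs, it is also easy to estimate how far the taxi drove. A sketch using the haversine great-circle formula; it assumes a spherical Earth with a mean radius of 6,371 km, which is accurate enough at city scale, and it measures the raw GPS trace, so noise adds some extra length. `polyline_length_m` is a hypothetical helper:

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical approximation)

def polyline_length_m(points) -> float:
    """Sum of haversine distances between consecutive [lat, lon] points, in metres."""
    total = 0.0
    for (lat1, lon1), (lat2, lon2) in zip(points, points[1:]):
        phi1, phi2 = math.radians(lat1), math.radians(lat2)
        d_phi = phi2 - phi1
        d_lambda = math.radians(lon2 - lon1)
        a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
        total += 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))
    return total

# With the notebook's variables: polyline_length_m(polyline_to_list(df['POLYLINE'][1]))
```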

Step 4: How to use the TomTom Routing API

The taxi trajectory with index 1 in the dataset (the second row) is now plotted on the map. Did this taxi driver take the fastest route? We can use the TomTom Routing API to find out.

Of course, traffic conditions also influence the routes taxi drivers take. Fortunately, the TomTom Routing API can account for that if we pass it a departure time.

Let’s take a closer look at how it works. TomTom uses historic traffic to predict what the fastest route will be in the future. We do this by using temporal speed graphs called Speed Profiles.

Speed profiles

For each road segment, we have a graph that shows the distribution of average speed (km/h) throughout the day. Provided we pass the correct day and time (e.g. Monday 16:05) to the Routing API, it takes the corresponding historic traffic into account when calculating a route.
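To make the idea concrete, the sketch below uses a made-up, heavily simplified speed profile for a single road segment: a step function from hour of day to average speed. The numbers are invented for illustration and are not TomTom data; real Speed Profiles are far more detailed. The travel time over the segment then depends on when you enter it:

```python
from bisect import bisect_right

# Hypothetical profile for ONE road segment: (start hour, average speed in km/h).
# Invented values for illustration only, not real TomTom Speed Profiles.
SPEED_PROFILE = [(0, 50.0), (7, 25.0), (10, 40.0), (16, 22.0), (19, 45.0)]

def segment_travel_time_s(length_m: float, hour: float, profile=SPEED_PROFILE) -> float:
    """Travel time in seconds over a segment of length_m metres, entered at `hour` (0-24)."""
    starts = [start for start, _ in profile]
    speed_kmh = profile[bisect_right(starts, hour) - 1][1]  # last interval starting at or before `hour`
    return length_m / (speed_kmh / 3.6)  # km/h -> m/s

# Monday 16:05 is hour 16 + 5/60 in this toy profile
print(round(segment_travel_time_s(1000, 16 + 5 / 60)))  # 164
```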

Convert UNIX time to ISO date

To pass it to the API, we have to convert our UNIX timestamp to the same weekday and time of day at a date in the future, as the Routing API only accepts current or future departure times. We use the following function:

```python
def convertUnixTimeToDate(timestamp):
    """
    The convertUnixTimeToDate function transforms a UNIX timestamp into an ISO 8601 dateTime
    on the same weekday and local time of day, but at a date in the future
    input: 1372636858 (Monday 1 July 2013, 01:00:58 Lisbon time)
    output: a string like 'YYYY-MM-DDT00:00:58Z' for the next Monday
            (the UTC hour shifts with daylight saving time)
    """

    # Mainland Portugal uses UTC+0 in winter and UTC+1 in summer (daylight saving time),
    # so we read the timestamp in Lisbon local time rather than plain UTC:
    lisbon = tz.gettz('Europe/Lisbon')
    timeTrip = datetime.datetime.fromtimestamp(timestamp, tz=lisbon)

    # Find the next date with the same weekday, at least one day after today
    # (hardcoded dates go stale and must also be zero-padded to be valid ISO 8601):
    today = datetime.date.today()
    days_ahead = (timeTrip.weekday() - today.weekday()) % 7 or 7
    future_date = today + datetime.timedelta(days=days_ahead)

    # Keep the local time of day, then express it in UTC with a 'Z' suffix
    # (an offset such as '+01:00' would need escaping in a URL):
    routingTime = datetime.datetime.combine(future_date, timeTrip.time(), tzinfo=lisbon)
    return routingTime.astimezone(datetime.timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```

Dealing with noisy GPS traces

The polylines in the Kaggle dataset are a bit noisy. Fortunately, the TomTom Routing API has a route reconstruction option to deal with noisy GPS traces. By supplying supporting points as input to the Routing API, it can reconstruct a route that is matched to the TomTom map.

You can see an example in the image below:

TomTom Map GPS traces

In the next section, we will supply the Routing API with supporting points to ensure it calculates the duration for the correct trace.
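Taxi traces can contain hundreds of GPS points, and every point ends up in the request body. If you want to send fewer supporting points (for instance to keep requests small; check the Routing API documentation for any limit on their number, which we do not assume here), one simple option is to keep evenly spaced points while always retaining the first and last. `thin_points` and its `max_points` default are illustrative choices:

```python
def thin_points(points, max_points: int = 150):
    """Return at most max_points evenly spaced items of points, keeping the first and last."""
    if max_points < 2:
        raise ValueError("max_points must be at least 2 to keep both endpoints")
    if len(points) <= max_points:
        return list(points)
    step = (len(points) - 1) / (max_points - 1)  # > 1, so rounded indices never repeat
    return [points[round(i * step)] for i in range(max_points)]
```

Thinning trades some fidelity for smaller requests: with too few points, the reconstructed route may cut corners the taxi did not.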

Now that we are ready, let's call the TomTom Routing API

We define a function that lets us call the Routing API. As input, you provide it with a polyline from the Kaggle dataset, the corresponding UNIX departure time, your personal TomTom API key and whether you want to compute the taxi route or the fastest route.

```python
def call_routing_api(polyline, departure_time, api_key=api_key, taxi_route=True):
    """
    Input is a polyline of a taxi route, a UNIX departure time, and whether to get the results for the taxi
    route or fastest route
    Output is the traffic delay in seconds, travel time of the route, route points from the Routing API and
    the full response from the API (or four None values if the call failed)
    """

    coordinates_list = polyline_to_list(polyline)  # transform polyline to a list of [lat, lon] pairs

    lat1, lon1 = coordinates_list[0]  # origin coordinates of the trip
    lat2, lon2 = coordinates_list[-1]  # destination coordinates of the trip

    # Set the URL and query parameters for the Routing API
    # (passing params lets requests URL-encode the values)
    url = f"https://api.tomtom.com/routing/1/calculateRoute/{lat1},{lon1}:{lat2},{lon2}/json"
    params = {
        "maxAlternatives": 0,
        "departAt": convertUnixTimeToDate(departure_time),
        "traffic": "true",
        "key": api_key,
    }

    # Add supporting points for the route reconstruction:
    if taxi_route:
        support_points = coordinates_list  # use the whole polyline
    else:
        support_points = [coordinates_list[-1]]  # use only the final coordinate
    body = {"supportingPoints": [{"latitude": lat, "longitude": lon} for lat, lon in support_points]}

    # Send the API call to TomTom, trying at most 4 times:
    response = None
    for attempt in range(4):
        try:
            response = requests.post(url, params=params, json=body, timeout=30)
        except requests.RequestException as error:
            print("request failed:", error)
            time.sleep(1)
            continue
        if response.status_code == 200:  # call was successful
            break
        elif response.status_code in (403, 429):  # call broke the QPS limit: sleep for one second
            time.sleep(1)
        else:  # other errors (e.g. a bad request) will not be fixed by retrying
            print("error", response.status_code)
            break

    # Return None values if the call was not successful
    if response is None or response.status_code != 200:
        return None, None, None, None

    response = response.json()
    summary = response['routes'][0]['summary']
    delay = summary['trafficDelayInSeconds']
    travel_time = summary['travelTimeInSeconds']
    points = response['routes'][0]['legs'][0]['points']
    route_points = [[point['latitude'], point['longitude']] for point in points]

    return delay, travel_time, route_points, response
```
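The 1,200-trip analysis in Step 5 calls this function twice per trip, which uses most of the daily free quota, so rerunning the notebook can exhaust it quickly. A sketch of one way to avoid repeat calls: a small decorator (our own hypothetical helper, not part of TomTom's APIs or the requests library) that stores successful results in a JSON file and skips failed calls, which it recognises by the None first value that call_routing_api returns on failure:

```python
import functools
import json
import os

def json_file_cache(path):
    """Hypothetical helper: keep successful results of a function in a JSON file."""
    def decorator(func):
        cache = {}
        if os.path.exists(path):
            with open(path) as f:
                cache = json.load(f)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # repr() also copes with NumPy integers such as df['TIMESTAMP'][i]
            key = repr((args, sorted(kwargs.items())))
            if key in cache:
                return tuple(cache[key])
            result = func(*args, **kwargs)
            if result[0] is not None:  # call_routing_api signals failure with all-None values
                cache[key] = list(result)
                with open(path, "w") as f:
                    json.dump(cache, f)
            return result
        return wrapper
    return decorator

# Usage in the notebook (the file name is arbitrary):
# call_routing_api_cached = json_file_cache("routing_cache.json")(call_routing_api)
```

Because the file is rewritten after every new result, an interrupted run keeps everything fetched so far.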

Step 5: Taxi trip analysis

The routing function is now ready to be used in our analysis. Let's start with checking the route we plotted earlier:

```python
# First we calculate the travel time and route for the original taxi trip:
delay_taxi, travel_time_taxi, route_points_taxi, response_taxi = call_routing_api(df['POLYLINE'][1], df['TIMESTAMP'][1], taxi_route=True)

print("The taxi route will take you:", travel_time_taxi, 'seconds')

# Next we calculate the travel time and route for the fastest route:
delay_fastest, travel_time_fastest, route_points_fastest, response_fastest = call_routing_api(df['POLYLINE'][1], df['TIMESTAMP'][1], taxi_route=False)

print("The fastest route will take you:", travel_time_fastest, 'seconds')
```
```
The taxi route will take you: 657 seconds
The fastest route will take you: 657 seconds
```

The travel times are identical. Let's also check the routes by plotting them on the map.

```python
# Initialise TomTom map
TomTom_map = initialise_map(location=[41.164962, -8.656301], zoom=15)

# Plot the points of the original GPS trace on the map (blue)
polyline = polyline_to_list(df['POLYLINE'][1])
folium.PolyLine(polyline, color="blue", weight=2, opacity=1).add_to(TomTom_map)

# Plot the reconstructed taxi route on the map (black)
folium.PolyLine(route_points_taxi, color="black", weight=2, opacity=1).add_to(TomTom_map)

# Plot the fastest route on the map (red)
folium.PolyLine(route_points_fastest, color="red", weight=2, opacity=1).add_to(TomTom_map)

TomTom_map
```

TomTom Map route fastest

It seems like this taxi driver was honest and took the fastest route, hooray!

Time to scale up

The previous example shows how to compare the travel time of the route the taxi driver took with that of the fastest route. Repeated over many trips, the average difference tells us how far drivers typically deviate: the lower the average, the more honest our taxi drivers are.

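To answer the question we posed at the beginning of the article, we use a random sample of 1,200 taxi trips. Each trip takes two Routing API calls (one for the taxi route, one for the fastest route), so the sample needs at most 2,400 calls and fits within the 2,500 free daily calls. This gives us a good idea of whether taxi drivers in Porto take the fastest route or not.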

```python
# retrieve 1200 random samples
random_sample = df.sample(1200, random_state=123)
random_sample = random_sample.reset_index(drop=True)  # reset index so we can iterate

# initialise dictionary in which we will store our results
results = {"Fastest_traveltime": [], "Taxi_traveltime": [], "Polyline": []}

# For each polyline in random_sample, call the call_routing_api function twice: once to retrieve the travel time
# for the fastest route and once for the travel time of the taxi route
for i in tqdm(range(len(random_sample))):
    if random_sample['POLYLINE'][i] != '[]':  # check if polyline is not empty

        # travel time fastest route
        results['Fastest_traveltime'].append(
            call_routing_api(random_sample['POLYLINE'][i], random_sample['TIMESTAMP'][i], taxi_route=False)[1])

        # travel time taxi route
        results['Taxi_traveltime'].append(
            call_routing_api(random_sample['POLYLINE'][i], random_sample['TIMESTAMP'][i], taxi_route=True)[1])

        # add the trip's [lat, lon] coordinates to results:
        polyline = polyline_to_list(random_sample['POLYLINE'][i])
        results['Polyline'].append(polyline)
```
```
100%|██████████| 1200/1200 [13:34<00:00,  1.58it/s]
```
```python
# save results as pandas dataframe
results = pd.DataFrame(results)

# calculate the difference in minutes between the two routes
results['Difference_min'] = (results['Taxi_traveltime'] - results['Fastest_traveltime']) / 60

# calculate the relative difference between the two routes
results['Relative_diff'] = (results['Taxi_traveltime'] - results['Fastest_traveltime']) / results['Fastest_traveltime']

# keep only the trips that are long enough to make a proper comparison
results = results[results['Fastest_traveltime'] > 60]  # trips should take at least 1 minute

# display dataframe
results.head()
```
```
   Fastest_traveltime  Taxi_traveltime  Polyline                                             Difference_min  Relative_diff
0  1040                1351.0           [[41.161815, -8.602632], [41.161914, -8.602533...   5.183333        0.299038
1  470                 470.0            [[41.161086, -8.604126], [41.161509, -8.603937...   0.000000        0.000000
2  560                 679.0            [[41.14602, -8.612442], [41.146452, -8.612208]...    1.983333        0.212500
3  520                 555.0            [[41.150637, -8.647785], [41.150727, -8.648802...   0.583333        0.067308
4  872                 1033.0           [[41.162436, -8.644959], [41.162481, -8.644986...   2.683333        0.184633
```
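```python
# Series.max() skips NaN values (failed API calls), unlike Python's built-in max()
print("Maximum relative difference is", round(results['Relative_diff'].max(), 2))
```
```
Maximum relative difference is 11.74
```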

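A value like this is of course not realistic: it would mean the taxi route took almost 13 times as long as the fastest route. Apparently, some GPS traces are still too noisy, causing these outliers. Let's filter out every trip where the taxi route took three or more times as long as the fastest route: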

```python
# filter out outliers
results = results[results['Relative_diff'] < 2]
```
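Any fixed cut-off discards data, so it is worth knowing how much. A small sketch that counts the trips kept and dropped for a given threshold; `split_by_threshold` is a hypothetical helper, and NaN values (from failed API calls) count as dropped, just as they are removed by the comparison above:

```python
import pandas as pd

def split_by_threshold(values: pd.Series, threshold: float = 2.0):
    """Return (number kept, number dropped) when keeping values strictly below threshold."""
    kept = int((values < threshold).sum())  # NaN < threshold is False, so NaN is dropped
    dropped = len(values) - kept
    return kept, dropped

# Run it on results['Relative_diff'] before applying the filter above, e.g.:
# split_by_threshold(results['Relative_diff'], 2)
```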

Final results

Now that we have our results, what is the mean difference in minutes between the fastest route and the taxi route?

```python
print("Mean:", np.mean(results['Difference_min']))
print("Standard deviation:", np.std(results['Difference_min']))
```
```
Mean: 2.5056110102843316
Standard deviation: 3.538457376554713
```
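The standard deviation is larger than the mean, so it is worth asking how precise the average of about 2.5 minutes is. A sketch of a normal-approximation 95% confidence interval for the mean (mean ± 1.96 standard errors). It assumes the sampled trips are independent. Note that `np.std` above uses the population formula (ddof=0), while this sketch uses the sample formula (ddof=1). `mean_ci95` is a hypothetical helper:

```python
import numpy as np

def mean_ci95(values):
    """Normal-approximation 95% confidence interval for the mean of `values`."""
    x = np.asarray(values, dtype=float)
    x = x[~np.isnan(x)]  # ignore missing values
    mean = x.mean()
    sem = x.std(ddof=1) / np.sqrt(len(x))  # standard error of the mean
    return mean - 1.96 * sem, mean + 1.96 * sem

# e.g. mean_ci95(results['Difference_min'])
```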

We can also plot the distribution of the difference in minutes and the relative difference, respectively.

```python
# Plot histogram
difference_min = sorted(np.array(results['Difference_min']))
fig = plt.figure(figsize=(15, 8))
plt.hist(difference_min, bins=15)
plt.title("Difference in minutes with the fastest route")
plt.xlabel("Minutes")
plt.ylabel('Counts')
```

TomTom Map graph 1

```python
# Plot histogram
relative_diff = sorted(np.array(results['Relative_diff']))
fig = plt.figure(figsize=(15, 8))
plt.hist(relative_diff, bins=15)
plt.title("Relative difference with fastest route")
plt.xlabel("Relative difference")
plt.ylabel('Counts')
```

TomTom Map graph 2

Another way to represent the data is by using percentiles.

```python
# Get some percentiles of the relative difference
results['Relative_diff'].quantile([0.1, 0.23, 0.24, 0.5, 0.6, 0.75, 0.835, 0.9, 0.95, 0.98, 1])
```
```
0.100    0.000000
0.230    0.001009
0.240    0.004345
0.500    0.138331
0.600    0.199668
0.750    0.351191
0.835    0.491842
0.900    0.703618
0.950    0.941956
0.980    1.344909
1.000    1.970732
Name: Relative_diff, dtype: float64
```

From this we can conclude that about 23% of taxi trips follow the fastest route or a nearly identical one (at the 23rd percentile, the taxi route is only 0.1% slower). However, there are still quite a lot of trips (about 16.5%) where the taxi route takes roughly 50% longer, or more, than the calculated fastest route.
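Reading shares off quantiles works, but you can also compute the share of trips above a threshold directly. A small sketch; `share_above` is our own hypothetical helper:

```python
import pandas as pd

def share_above(values: pd.Series, threshold: float) -> float:
    """Fraction of non-missing values strictly greater than threshold."""
    values = values.dropna()
    return float((values > threshold).mean())

# e.g. share_above(results['Relative_diff'], 0.5) for trips taking more than 50% longer
```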

Step 6: Using Folium to Visualize the Data

First, we make a linear color scale, where green is no delay and red is more than 50% delay:

```python
# Create a linear color scale
linear_color = cm.LinearColormap(['green', 'yellow', 'red'], vmin=0, vmax=0.5)
linear_color
```

green yellow red colors

We can plot taxi trip delays on the map.

The following plot will show the points where taxi trips started, with the corresponding delay they encountered along the way:

```python
# Plot the starting point on the map
TomTom_map_bubble = initialise_map(api_key=api_key, location=[41.161178, -8.648490], zoom=13, style="night")

for index, row in results[:1000].iterrows():  # limit number of data points plotted to 1000

    popup_string = "Relative delay = " + str(round(100 * row["Relative_diff"], 1)) + "%"

    folium.Circle(
        location=row["Polyline"][0],  # starting point of the trip
        popup=popup_string,
        radius=30,
        color=linear_color(row["Relative_diff"]),
        fill=True,
    ).add_to(TomTom_map_bubble)

TomTom_map_bubble
```

TomTom Map bubble

We can also make a (similar) plot that shows the taxi traces with their corresponding delay:

```python
# On this map we visualize the polylines of the first 500 taxi trips using the same color scheme
TomTom_map_lines = initialise_map(api_key=api_key, location=[41.161178, -8.648490], zoom=13, style="night")
for index, row in results[:500].iterrows():  # limit number of polylines plotted to 500

    folium.PolyLine(row["Polyline"],
                    color=linear_color(row["Relative_diff"]),
                    weight=1.0,
                    opacity=1.0  # opacity must lie between 0 and 1
                    ).add_to(TomTom_map_lines)

TomTom_map_lines
```

TomTom Map lines

Route planning saves time and money

Overall, we can conclude that:

  1. Taxi trips would be 21% shorter if all taxis in Porto used TomTom navigation.
  2. In 23% of the cases, your taxi driver will take the fastest or a similar route.
  3. However, 40% of taxi trips will take more than 20% longer than the TomTom route.

Are taxi drivers taking a detour 40% of the time?

No. Taxi drivers are generally very knowledgeable about the cities they drive in and their experience allows them to outsmart traffic. While the analysis we’ve just performed allows us to draw conclusions, it is important to keep these factors in mind:

  • The Kaggle taxi traces are quite noisy, which means that the GPS points recorded may deviate from reality and will not always be matched to the correct street. This means there is a margin of error in the results, and not all detours seemingly taken may be real.
  • The taxi traces are dated: from 2013 and 2014. The analysis was done based on a map of 2019. As a result, there may be faster routes available today (e.g. due to new road infrastructure, new or improved traffic lights, new roundabouts, etc.) that did not exist at the time that the taxi rides took place.
  • The speed profiles used in this analysis are a simplified version of reality and do not include traffic incidents. At the time that the taxi rides took place, there may have been road blockages or severe traffic jams – none of which have been taken into account in the analysis, but which may have forced the taxi driver to take a detour nevertheless.

You can find the TomTom Maps APIs documentation on the TomTom Developer Portal to see what else is available. Our APIs cover everything from geocoding to points of interest (restaurants, hospitals, etc.) to meet the needs of developers and their customers alike.

Sander Pluimers co-authored this article.
