Where to Wait for a Taxi?
Xudong Zheng
State Key Laboratory of
Software Development
Environment
Beihang University
Beijing, China
zhengxudong@nlsde.
buaa.edu.cn
Xiao Liang
State Key Laboratory of
Software Development
Environment
Beihang University
Beijing, China
liangxiao@nlsde.
buaa.edu.cn
Ke Xu
�
State Key Laboratory of
Software Development
Environment
Beihang University
Beijing, China
kexu@nlsde.buaa.edu.cn
ABSTRACT
People often have the demand to decide where to wait for a
taxi in order to save their time. In this paper, to address this
problem, we employ the non-homogeneous Poisson process
(NHPP) to model the behavior of vacant taxis. According to
the statistics of the parking time of vacant taxis on the roads
and the number of the vacant taxis leaving the roads in his-
tory, we can estimate the waiting time at di�erent times on
road segments. We also propose an approach to make recom-
mendations for potential passengers on where to wait for a
taxi based on our estimated waiting time. Then we evaluate
our approach through the experiments on simulated passen-
gers and actual trajectories of 12,000 taxis in Beijing. The
results show that our estimation is relatively accurate and
could be regarded as a reliable upper bound of the waiting
time in probability. And our recommendation is a trade-
o� between the waiting time and walking distance, which
would bring practical assistance to potential passengers. In
addition, we develop a mobile application TaxiWaiter on
Android OS to help the users wait for taxis based on our
approach and historical data.
Categories and Subject Descriptors
H.2.8 [Database Management]: data mining, spatial data-
bases and GIS
General Terms
Modeling, Statistics, Experimentation
Keywords
Vacant taxi, waiting time, poisson distribution
1. INTRODUCTION
Taxis play an important role in the transportation of cities.
For example, Beijing has more than 60,000 taxis to provide
�Corresponding author.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, or
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
UrbComp ’12, August 12, 2012. Beijing, China.
Copyright 2012 ACM 978-1-4503-1542-5/08/2012 ...$15.00.
services for about 2,000,000 passengers every day. Howev-
er, there are still many people often annoyed with waiting
for taxis. It is not only because of the imbalance between
supply and demand, but also due to the lack of the vacan-
t taxis information provided to passengers. For example,
if a person does not know that there is another road near-
by with more vacant taxis passing by, he/she would spend
much more time on waiting for a taxi in current location.
Experienced passengers could choose a better road to wait
for taxis based on historical experiences. But more people
have little knowledge about like how long they would take
to wait for taxis here or where is better to wait for taxis,
especially in some strange places for them, which may a�ect
the their travels and schedules very much.
In this paper, we propose a method to estimate the waiting
time for a vacant taxi at a given time and place, and then
provide an approach to make recommendations for potential
passengers on where to wait for a taxi. To make this estima-
tion, we establish a model to describe the behavior of vacant
taxis. Our model is based on the following observations:
� The higher proportion of time with vacant taxis parking
beside the road, the more chances you can take a taxi
immediately here. This situation usually occurs during
the idle time around some popular places.
� The more vacant taxis leaving a road, the more chances
you can take a taxi quickly. The waiting time is a�ected
not only by the number of vacant taxis entering a road,
but also by how many people want to take taxis here at
that time. Because we do not have the data to directly
show the demand for taxis, we think the number of vacant
taxis leaving a road approximately reveals the remaining
chance for a passenger to take a taxi here after the demand
on the road is all met.
Motivated by the two observations above, we adopt the non-
homo-geneous Poisson process (NHPP) [11] to model the
events of vacant taxis' leaving and derive the probability
distribution of the waiting time. Then we could perform es-
timations and recommendations based on the distribution.
We also do some experiments to demonstrate that our ap-
proach is practicable and then develop a mobile application
to help people wait for taxis.
Our study is built upon the GPS trajectories of taxis in Bei-
jing, China. This data is collected from more than 12,000
taxis, which account for about one-�fth of total ones in the
city. We select the data between Oct. 2010 and Jan. 2011
to study. Each GPS record contains the identi�er of a taxi,
current position, timestamp, service status, and some oth-
er information. The data sampling interval of each taxi is
about 60 seconds.
The major contributions of our work include:
� We employ the NHPP to model the behavior of vacant
taxis, which could approximate the real situation well and
have a simple form to derive the probability distribution
of waiting time.
� We estimate the waiting time for vacant taxis at di�erent
time on road segments, analyze the con�dence of our esti-
mation, and design a recommender system for the people
who want to take taxis.
� We conduct a lot of experiments to evaluate our approach
on simulated passengers and actual trajectories of taxis.
The results show that our estimation is relatively accurate
and our recommendation would be helpful to potential
passengers.
� We put our approach into practice by developing a mobile
application which could help the user �nd an appropriate
place to wait for a taxi.
The rest of this paper is organized as follows. In Section
2, we give an overview of the related work. Section 3 in-
troduces our model used to estimate the waiting time for a
vacant taxi. Section 4 describes the data processing of our
work. In Section 5, we analyze the results of our estimat-
ed waiting time. Then, we discuss the recommendation for
passengers in Section 6. Section 7 shows the experiments
and evaluations on our approach. Section 8 introduces an
application we developed to help people wait for taxis. Fi-
nally, we make a conclusion and propose some future work
in Section 9.
2. RELATEDWORK
2.1 Recommendations about Taxicabs
Recent years have witnessed the explosive research interest
on taxi trajectories [1, 6, 7, 14, 18]. Moreover, many works
have also been done to investigate the recommendations for
taxi passengers or drivers [2, 5, 9, 13, 17].
Phithakkitnukoon et al. [9] study the prediction of vacan-
t taxis number to provide the information for tourists or
taxi service providers. They employ the method based on
the na�ve Bayesian classi�er and obtain the prior probability
distribution from the historical data. However their method
divides the region of the city into one-kilometer square grid-
s which are too rough to provide practical information for
passengers. In addition, their data is only from the traces
of 150 taxis in Lisbon, which might not be enough to re-
veal laws of vacant taxis for the reason of weak statistical
signi�cance.
Ge et al. [2] develop a recommender system for taxi drivers
which has the ability in recommending a sequence of pick-up
points or parking positions so as to maximize a taxi driver's
pro�t. They estimate the probability of pick-up events for
each candidate point, and then propose an algorithm to dis-
cover a route with minimal potential travel distance before
having customer. Li et al. [5] study the strategies for taxi
drivers as well. They use L1-Norm SVM to discover the most
discriminative features to distinguish the performance of the
taxis, and then extract some driving patterns to improve the
performance of the taxis. However, all these studies do not
concern about the recommendation for passengers.
Yuan et al. [17] propose an approach to make recommenda-
tions for both taxi drivers and passengers. They establish
a probabilistic model to describe the probability to pick-up
passengers, the duration before the next trip, and the dis-
tance of the next trip for a vacant taxi. Then they provide
some di�erent strategies for taxi drivers, each of which is
based on the optimization of one aspect (probability of pick-
up, cruising time, or pro�t). They further extend their work
in [16]. Although their methods could also provide recom-
mendations for passengers, their research mainly focuses on
drivers. Comparing with them, our research stands on the
view of passengers and pays more attentions to estimating
the waiting time for vacant taxis.
Yang et al. [13] study the equilibrium of taxi market from
the standpoint of economics. They use a bilateral searching
and meeting function to characterize the search frictions be-
tween vacant taxis and unserved customers. They build a
model to describe the relationship between the supply and
demand for taxis, and analyze some in
uencing factors on
customer waiting time. But in our study, because of lack-
ing of explicit data for the demand of taxis, we actually
estimate the gap between supply and demand through the
number of vacant taxis. Moreover, the object of their study
is an aggregate taxi market, but our target is to estimate the
waiting time for a vacant taxi at a given time and position
in a microscopic view.
2.2 Map Matching
Map matching is a main step of data preprocessing in our
work. It refers to the process of mapping the GPS points to
the road segments to recover a complete path of a trajecto-
ry. Quddus makes a survey of map matching algorithms in
[10], including geometric, topological, probabilistic, and oth-
er advanced algorithms. He also discusses the performances
and limitations of them. Lou et al. [8] propose a new algo-
rithm for low-sampling-rate GPS data, which considers the
temporal and spatial constraints on the trajectories, then
constructs a weighted candidate graph to choose the most
appropriate path. Yuan et al. further improve Lou's method
in [15] later.
2.3 Non-homogeneous Poisson Process
Poisson process is a stochastic process that is often used to
study the occurrence of events. It assumes the arriving rate
of events � is always stable, and has the Poisson distribu-
tion of counting and exponential distribution of inter-event
time. Non-homogeneous Poisson process [11] is a Poisson
process with a time-dependent arriving rate function �(t).
This model is more
exible and appropriate to depict the
human-related activities because these activities often vary
over time and have periodicity. [3] studies the NHPP having
cyclic behavior, and [4] introduces a method to estimate the
�(t) in NHPP using a piecewise linear function.
3. MODEL
Here we propose a model to describe the waiting time for
a vacant taxi, and derive the probability distribution of it.
Then we estimate the waiting time using the expectation of
the distribution. Finally, we analyze the con�dence level of
our estimation.
3.1 Motivation
The time to wait for a taxi re
ects the availability of taxis on
a road. Waiting for a taxi on a road could be divided into the
two situations: 1) there are some vacant taxis just stopping
beside the road, then you could take the taxi immediately;
2) there are no vacant taxis at hand, you should wait for the
coming of next vacant taxi.
We could denote the probability of the �rst situation as
pimm, and the waiting time in the second situation as tnext.
Then the random variable of actual waiting time twait could
be represented as:
twait = (1� pimm) � tnext
Then, we will discuss how to estimate pimm and tnext.
3.2 Estimation of Waiting Time
Let's consider pimm at �rst. We could approximate it by the
proportion of time when there are some vacant taxis parking
beside the road. We de�ne the parking time of vacant taxis,
i.e. tpark, as the duration with at least one vacant taxi
parking on the road to wait for passengers. Therefore, p^imm
for a road r during a timeslot T could be represented as:
p^r;Timm =
tr;Tpark
�T
Here �T denotes the span of timeslot T . By identifying of
some appropriate stops of taxis, we can calculate p^imm for
each road during each timeslot.
For tnext, intuitively, the number of vacant taxis leaving a
road during a timeslot in
uences how long you probably
spend on waiting for a taxi here. Because vacant taxis de-
parting a road often means that they do not �nd any pas-
sengers on the road and then you have a great chance to
take it if you are there. We denote the number of vacant
taxis which have left a road as Nvacant, and de�ne the leav-
ing frequency of vacant taxis, i.e. �, for a road segment r
during a timeslot T as:
�r;T =
Nr;Tvacant
�T
Human-related activities vary over time, so do taxis. There-
fore, we employ the NHPP to model the events of vacant
taxis leaving roads. The rate parameter of NHPP is a time-
dependent function �(t), and we further assume the rate
function has a cycle of 24 hours. For simplicity, we adopt
the piecewise linear function as the rate function of NHPP
and regard each hour as a timeslot. This model assumes �
for a road is stable during a timeslot and the same timeslot
in di�erent days.
To validate our assumptions, we have done the KS-Tests
for Poisson distribution on the data of each same timeslot
in di�erent days. To avoid the e�ect of the sparseness, we
select the roads with enough data. We conduct the tests
on the top 10,000 and top 30,000 roads for comparison1.
Figure 1 shows the proportion of successful KS-Tests at 95%
con�dence level. We could see that the proportion for top
10,000 roads is larger than that for top 30,000 roads, and
the proportion in the wee hours is rather small. These are
because the data in the wee hours and unpopular roads are
sparse and more
uctuant. Considering that our approach is
mainly related to most of the passengers on popular roads in
active time, so our hypotheses basically hold for most cases.
0 5 10 15 20 25
20%
30%
40%
50%
60%
70%
80%
90%
100%
Hour of Day
Pr
op
or
tio
n
of
S
uc
ce
ss
fu
l T
es
ts
data from top 10,000 roads
data from top 30,000 roads
Figure 1: Proportion of successful KS-tests for the
hypothesis of Poisson distribution
Under the Poisson hypothesis within a timeslot, we could
derive the probability distribution of the waiting time for
the next vacant taxi during the timeslot2. According to the
Poisson process, the probability of the next event occurring
within t is [12]:
Pftnext � tg = 1� Pftnext > tg
= 1� PfN(t) = 0g
= 1� e���t
Here N(t) represents the count of the events occurring with-
in t, and PfN(t) = kg = e���t � (��t)k
k!
. Then the probability
density function of tnext is:
p(t) = � � e���t
Thus, we can deduce the expectation of tnext:
E[tnext] =
Z 1
0
t � � � e���t � dt
=
1
�
Notice � in our model denotes the leaving frequency of va-
cant taxis. Therefore with this conclusion, you could realize
why the more vacant taxis leaving means the shorter waiting
time for taxis. And we could regard the expectation as the
estimation of tnext.
1We select the top roads by the number of pick-up events
on it.
2For simplicity, we omit the superscript of parameters in the
following derivation. It must be noted that the value of the
parameter is di�erent for various roads and timeslots.
Then we also need to estimate the �. Here we employ the
maximum likelihood estimation (MLE). If we observe the
number of the vacant taxis leaving from a road at the same
timeslot T for k days, we denote the count of the ith day is
Ni, then the likelihood function is:
L(�) =
kY
i=1
(� ��T )Ni
Ni!
e����T
Setting d ln(L(�))
d�
= 0 and solving �, we obtain the MLE:
�^ =
Pk
i=1Ni
k ��T =
�N
�T
This conclusion means that we could estimate � just by
counting the leaving of vacant taxis in history.
As a consequence, we could estimate the actual waiting time
for vacant taxis as:
t^wait = (1� p^imm) � t^next
= (1� p^imm) � 1
�^
3.3 Confidence of Estimation
Now we will analyze the con�dence level of our estimation.
Let's consider the lower-sided con�dence interval of tnext.
We denote 1 � � quantile of the distribution of tnext as
tnext1�� , then we could get:Z tnext1��
0
t � �e��t � dt = 1� �
The quantile could be solved as:
tnext1�� =
ln(��1)
�
This result shows that, we have 1 � � con�dence level of
which the waiting time would be no longer than ln(��1)
times of the t^next we estimated. If we set the upper bound
of con�dence interval equals to t^next, namely:
ln(��1)
�
=
1
�
We can get � = 1
e
, which means the probability of the wait-
ing time less than our estimation should be 1� 1
e
, which is
about 63.21%. These conclusions imply that our estimation
could be regarded as a reliable upper bound the possible
waiting time in probability.
4. DATA PROCESSING
Our data processing starts with map matching. We have to
map the trajectories of taxis to the roads and calculate the
entering and leaving time of taxis to the roads. We employ
the map matching algorithm proposed by Lou et al. [8].
In addition, we �lter some trajectories which seem unusual,
such as keeping vacant status too long (5 hours), or staying
on the same road too long (2 hours).
Then, according to the model we have established, the pro-
cessing is divided into two parts. The �rst part is the cal-
culation of parking time of vacant taxis. The key step is
to identify the stopping taxis that are waiting for passen-
gers. We should eliminate the situations of waiting tra�c
lights or other purpose stops. We regard the taxi staying
on a road with moderate duration (between 5 minutes and
2 hours) and rather low speed (less than 3.6km/h) as valid.
Because too short time of stopping may be caused by tra�c
lights and too long time of stopping means no desire to take
passengers or some unexpected situations.
The second part is to calculate the estimation of leaving
frequency of vacant taxis � for each road during each hour.
Because the MLE of � is
�N
�T
, our task is just to count the
number of vacant taxis leaving each road in each timeslot
of one hour. And we also �lter some outliers before making
the average.
We process the trajectories happened during about three
month, and calculate the averages �tr;Tpark and
�Nr;Tvacant, then
the estimated waiting time could be represented as:
t^r;Twait = (1�
�tr;Tpark
�T
) � �T
�Nr;Tvacant
=
�T � �tr;Tpark
�Nr;Tvacant
Here �T is the span of a timeslot, i.e. one hour.
However, our estimation of waiting time could not be applied
to all roads, because there are some roads forbidding taxis
to pick-up passengers. For these roads, there may be many
taxis leaving from but few passengers getting on. Due to
lack of the data indicating which road forbids the pick-up
of passengers, we develop a method to detect these roads
through analyzing the trajectories. We de�ne the pick-up
rate, denoted as �r for each road segment r:
�r =
number of pick-up on the road segment r
number of vacant taxis entering the road segment r
If there is a road with enough samples (more than 100 vacant
taxis entering) and very low pick-up rate (less than 0.03),
we will regard it as invalid to wait for taxis. For these roads,
we do not make estimations of waiting time.
It is also worthy to be noted that our data is from the taxis
which account for 1/5 of the total ones in Beijing. If we
assume these 1/5 taxis are randomly distributed in the city,
the waiting time would approximately be shortened to 1/5
of our estimation. We also could measure the actual scale
factor by in-the-�eld study. But regardless of what the ac-
curate factor is, the relative order of the