[UrbComp 2012] Where to Wait for a Taxi下载_在线阅读_8

is_481922

暂无简介

[UrbComp 2012] Where to Wait for a Taxi Where to Wait for a Taxi? Xudong Zheng State Key Laboratory of Software Development Environment Beihang University Beijing, China zhengxudong@nlsde. buaa.edu.cn Xiao Liang State Key Laboratory of Software Development Environment Beihang University Beijing, Chi...

Where to Wait for a Taxi? Xudong Zheng State Key Laboratory of Software Development Environment Beihang University Beijing, China zhengxudong@nlsde. buaa.edu.cn Xiao Liang State Key Laboratory of Software Development Environment Beihang University Beijing, China liangxiao@nlsde. buaa.edu.cn Ke Xu � State Key Laboratory of Software Development Environment Beihang University Beijing, China kexu@nlsde.buaa.edu.cn ABSTRACT People often have the demand to decide where to wait for a taxi in order to save their time. In this paper, to address this problem, we employ the non-homogeneous Poisson process (NHPP) to model the behavior of vacant taxis. According to the statistics of the parking time of vacant taxis on the roads and the number of the vacant taxis leaving the roads in his- tory, we can estimate the waiting time at di�erent times on road segments. We also propose an approach to make recom- mendations for potential passengers on where to wait for a taxi based on our estimated waiting time. Then we evaluate our approach through the experiments on simulated passen- gers and actual trajectories of 12,000 taxis in Beijing. The results show that our estimation is relatively accurate and could be regarded as a reliable upper bound of the waiting time in probability. And our recommendation is a trade- o� between the waiting time and walking distance, which would bring practical assistance to potential passengers. In addition, we develop a mobile application TaxiWaiter on Android OS to help the users wait for taxis based on our approach and historical data. Categories and Subject Descriptors H.2.8 [Database Management]: data mining, spatial data- bases and GIS General Terms Modeling, Statistics, Experimentation Keywords Vacant taxi, waiting time, poisson distribution 1. INTRODUCTION Taxis play an important role in the transportation of cities. For example, Beijing has more than 60,000 taxis to provide �Corresponding author. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. UrbComp ’12, August 12, 2012. Beijing, China. Copyright 2012 ACM 978-1-4503-1542-5/08/2012 ...$15.00. services for about 2,000,000 passengers every day. Howev- er, there are still many people often annoyed with waiting for taxis. It is not only because of the imbalance between supply and demand, but also due to the lack of the vacan- t taxis information provided to passengers. For example, if a person does not know that there is another road near- by with more vacant taxis passing by, he/she would spend much more time on waiting for a taxi in current location. Experienced passengers could choose a better road to wait for taxis based on historical experiences. But more people have little knowledge about like how long they would take to wait for taxis here or where is better to wait for taxis, especially in some strange places for them, which may a�ect the their travels and schedules very much. In this paper, we propose a method to estimate the waiting time for a vacant taxi at a given time and place, and then provide an approach to make recommendations for potential passengers on where to wait for a taxi. To make this estima- tion, we establish a model to describe the behavior of vacant taxis. Our model is based on the following observations: � The higher proportion of time with vacant taxis parking beside the road, the more chances you can take a taxi immediately here. This situation usually occurs during the idle time around some popular places. � The more vacant taxis leaving a road, the more chances you can take a taxi quickly. The waiting time is a�ected not only by the number of vacant taxis entering a road, but also by how many people want to take taxis here at that time. Because we do not have the data to directly show the demand for taxis, we think the number of vacant taxis leaving a road approximately reveals the remaining chance for a passenger to take a taxi here after the demand on the road is all met. Motivated by the two observations above, we adopt the non- homo-geneous Poisson process (NHPP) [11] to model the events of vacant taxis' leaving and derive the probability distribution of the waiting time. Then we could perform es- timations and recommendations based on the distribution. We also do some experiments to demonstrate that our ap- proach is practicable and then develop a mobile application to help people wait for taxis. Our study is built upon the GPS trajectories of taxis in Bei- jing, China. This data is collected from more than 12,000 taxis, which account for about one-�fth of total ones in the city. We select the data between Oct. 2010 and Jan. 2011 to study. Each GPS record contains the identi�er of a taxi, current position, timestamp, service status, and some oth- er information. The data sampling interval of each taxi is about 60 seconds. The major contributions of our work include: � We employ the NHPP to model the behavior of vacant taxis, which could approximate the real situation well and have a simple form to derive the probability distribution of waiting time. � We estimate the waiting time for vacant taxis at di�erent time on road segments, analyze the con�dence of our esti- mation, and design a recommender system for the people who want to take taxis. � We conduct a lot of experiments to evaluate our approach on simulated passengers and actual trajectories of taxis. The results show that our estimation is relatively accurate and our recommendation would be helpful to potential passengers. � We put our approach into practice by developing a mobile application which could help the user �nd an appropriate place to wait for a taxi. The rest of this paper is organized as follows. In Section 2, we give an overview of the related work. Section 3 in- troduces our model used to estimate the waiting time for a vacant taxi. Section 4 describes the data processing of our work. In Section 5, we analyze the results of our estimat- ed waiting time. Then, we discuss the recommendation for passengers in Section 6. Section 7 shows the experiments and evaluations on our approach. Section 8 introduces an application we developed to help people wait for taxis. Fi- nally, we make a conclusion and propose some future work in Section 9. 2. RELATEDWORK 2.1 Recommendations about Taxicabs Recent years have witnessed the explosive research interest on taxi trajectories [1, 6, 7, 14, 18]. Moreover, many works have also been done to investigate the recommendations for taxi passengers or drivers [2, 5, 9, 13, 17]. Phithakkitnukoon et al. [9] study the prediction of vacan- t taxis number to provide the information for tourists or taxi service providers. They employ the method based on the na�ve Bayesian classi�er and obtain the prior probability distribution from the historical data. However their method divides the region of the city into one-kilometer square grid- s which are too rough to provide practical information for passengers. In addition, their data is only from the traces of 150 taxis in Lisbon, which might not be enough to re- veal laws of vacant taxis for the reason of weak statistical signi�cance. Ge et al. [2] develop a recommender system for taxi drivers which has the ability in recommending a sequence of pick-up points or parking positions so as to maximize a taxi driver's pro�t. They estimate the probability of pick-up events for each candidate point, and then propose an algorithm to dis- cover a route with minimal potential travel distance before having customer. Li et al. [5] study the strategies for taxi drivers as well. They use L1-Norm SVM to discover the most discriminative features to distinguish the performance of the taxis, and then extract some driving patterns to improve the performance of the taxis. However, all these studies do not concern about the recommendation for passengers. Yuan et al. [17] propose an approach to make recommenda- tions for both taxi drivers and passengers. They establish a probabilistic model to describe the probability to pick-up passengers, the duration before the next trip, and the dis- tance of the next trip for a vacant taxi. Then they provide some di�erent strategies for taxi drivers, each of which is based on the optimization of one aspect (probability of pick- up, cruising time, or pro�t). They further extend their work in [16]. Although their methods could also provide recom- mendations for passengers, their research mainly focuses on drivers. Comparing with them, our research stands on the view of passengers and pays more attentions to estimating the waiting time for vacant taxis. Yang et al. [13] study the equilibrium of taxi market from the standpoint of economics. They use a bilateral searching and meeting function to characterize the search frictions be- tween vacant taxis and unserved customers. They build a model to describe the relationship between the supply and demand for taxis, and analyze some in uencing factors on customer waiting time. But in our study, because of lack- ing of explicit data for the demand of taxis, we actually estimate the gap between supply and demand through the number of vacant taxis. Moreover, the object of their study is an aggregate taxi market, but our target is to estimate the waiting time for a vacant taxi at a given time and position in a microscopic view. 2.2 Map Matching Map matching is a main step of data preprocessing in our work. It refers to the process of mapping the GPS points to the road segments to recover a complete path of a trajecto- ry. Quddus makes a survey of map matching algorithms in [10], including geometric, topological, probabilistic, and oth- er advanced algorithms. He also discusses the performances and limitations of them. Lou et al. [8] propose a new algo- rithm for low-sampling-rate GPS data, which considers the temporal and spatial constraints on the trajectories, then constructs a weighted candidate graph to choose the most appropriate path. Yuan et al. further improve Lou's method in [15] later. 2.3 Non-homogeneous Poisson Process Poisson process is a stochastic process that is often used to study the occurrence of events. It assumes the arriving rate of events � is always stable, and has the Poisson distribu- tion of counting and exponential distribution of inter-event time. Non-homogeneous Poisson process [11] is a Poisson process with a time-dependent arriving rate function �(t). This model is more exible and appropriate to depict the human-related activities because these activities often vary over time and have periodicity. [3] studies the NHPP having cyclic behavior, and [4] introduces a method to estimate the �(t) in NHPP using a piecewise linear function. 3. MODEL Here we propose a model to describe the waiting time for a vacant taxi, and derive the probability distribution of it. Then we estimate the waiting time using the expectation of the distribution. Finally, we analyze the con�dence level of our estimation. 3.1 Motivation The time to wait for a taxi re ects the availability of taxis on a road. Waiting for a taxi on a road could be divided into the two situations: 1) there are some vacant taxis just stopping beside the road, then you could take the taxi immediately; 2) there are no vacant taxis at hand, you should wait for the coming of next vacant taxi. We could denote the probability of the �rst situation as pimm, and the waiting time in the second situation as tnext. Then the random variable of actual waiting time twait could be represented as: twait = (1� pimm) � tnext Then, we will discuss how to estimate pimm and tnext. 3.2 Estimation of Waiting Time Let's consider pimm at �rst. We could approximate it by the proportion of time when there are some vacant taxis parking beside the road. We de�ne the parking time of vacant taxis, i.e. tpark, as the duration with at least one vacant taxi parking on the road to wait for passengers. Therefore, p^imm for a road r during a timeslot T could be represented as: p^r;Timm = tr;Tpark �T Here �T denotes the span of timeslot T . By identifying of some appropriate stops of taxis, we can calculate p^imm for each road during each timeslot. For tnext, intuitively, the number of vacant taxis leaving a road during a timeslot in uences how long you probably spend on waiting for a taxi here. Because vacant taxis de- parting a road often means that they do not �nd any pas- sengers on the road and then you have a great chance to take it if you are there. We denote the number of vacant taxis which have left a road as Nvacant, and de�ne the leav- ing frequency of vacant taxis, i.e. �, for a road segment r during a timeslot T as: �r;T = Nr;Tvacant �T Human-related activities vary over time, so do taxis. There- fore, we employ the NHPP to model the events of vacant taxis leaving roads. The rate parameter of NHPP is a time- dependent function �(t), and we further assume the rate function has a cycle of 24 hours. For simplicity, we adopt the piecewise linear function as the rate function of NHPP and regard each hour as a timeslot. This model assumes � for a road is stable during a timeslot and the same timeslot in di�erent days. To validate our assumptions, we have done the KS-Tests for Poisson distribution on the data of each same timeslot in di�erent days. To avoid the e�ect of the sparseness, we select the roads with enough data. We conduct the tests on the top 10,000 and top 30,000 roads for comparison1. Figure 1 shows the proportion of successful KS-Tests at 95% con�dence level. We could see that the proportion for top 10,000 roads is larger than that for top 30,000 roads, and the proportion in the wee hours is rather small. These are because the data in the wee hours and unpopular roads are sparse and more uctuant. Considering that our approach is mainly related to most of the passengers on popular roads in active time, so our hypotheses basically hold for most cases. 0 5 10 15 20 25 20% 30% 40% 50% 60% 70% 80% 90% 100% Hour of Day Pr op or tio n of S uc ce ss fu l T es ts data from top 10,000 roads data from top 30,000 roads Figure 1: Proportion of successful KS-tests for the hypothesis of Poisson distribution Under the Poisson hypothesis within a timeslot, we could derive the probability distribution of the waiting time for the next vacant taxi during the timeslot2. According to the Poisson process, the probability of the next event occurring within t is [12]: Pftnext � tg = 1� Pftnext > tg = 1� PfN(t) = 0g = 1� e��t Here N(t) represents the count of the events occurring with- in t, and PfN(t) = kg = e��t � (��t)k k! . Then the probability density function of tnext is: p(t) = � � e��t Thus, we can deduce the expectation of tnext: E[tnext] = Z 1 0 t � � � e��t � dt = 1 � Notice � in our model denotes the leaving frequency of va- cant taxis. Therefore with this conclusion, you could realize why the more vacant taxis leaving means the shorter waiting time for taxis. And we could regard the expectation as the estimation of tnext. 1We select the top roads by the number of pick-up events on it. 2For simplicity, we omit the superscript of parameters in the following derivation. It must be noted that the value of the parameter is di�erent for various roads and timeslots. Then we also need to estimate the �. Here we employ the maximum likelihood estimation (MLE). If we observe the number of the vacant taxis leaving from a road at the same timeslot T for k days, we denote the count of the ith day is Ni, then the likelihood function is: L(�) = kY i=1 (� ��T )Ni Ni! e��T Setting d ln(L(�)) d� = 0 and solving �, we obtain the MLE: �^ = Pk i=1Ni k ��T = �N �T This conclusion means that we could estimate � just by counting the leaving of vacant taxis in history. As a consequence, we could estimate the actual waiting time for vacant taxis as: t^wait = (1� p^imm) � t^next = (1� p^imm) � 1 �^ 3.3 Confidence of Estimation Now we will analyze the con�dence level of our estimation. Let's consider the lower-sided con�dence interval of tnext. We denote 1 � � quantile of the distribution of tnext as tnext1�� , then we could get:Z tnext1�� 0 t � �e��t � dt = 1� � The quantile could be solved as: tnext1�� = ln(��1) � This result shows that, we have 1 � � con�dence level of which the waiting time would be no longer than ln(��1) times of the t^next we estimated. If we set the upper bound of con�dence interval equals to t^next, namely: ln(��1) � = 1 � We can get � = 1 e , which means the probability of the wait- ing time less than our estimation should be 1� 1 e , which is about 63.21%. These conclusions imply that our estimation could be regarded as a reliable upper bound the possible waiting time in probability. 4. DATA PROCESSING Our data processing starts with map matching. We have to map the trajectories of taxis to the roads and calculate the entering and leaving time of taxis to the roads. We employ the map matching algorithm proposed by Lou et al. [8]. In addition, we �lter some trajectories which seem unusual, such as keeping vacant status too long (5 hours), or staying on the same road too long (2 hours). Then, according to the model we have established, the pro- cessing is divided into two parts. The �rst part is the cal- culation of parking time of vacant taxis. The key step is to identify the stopping taxis that are waiting for passen- gers. We should eliminate the situations of waiting tra�c lights or other purpose stops. We regard the taxi staying on a road with moderate duration (between 5 minutes and 2 hours) and rather low speed (less than 3.6km/h) as valid. Because too short time of stopping may be caused by tra�c lights and too long time of stopping means no desire to take passengers or some unexpected situations. The second part is to calculate the estimation of leaving frequency of vacant taxis � for each road during each hour. Because the MLE of � is �N �T , our task is just to count the number of vacant taxis leaving each road in each timeslot of one hour. And we also �lter some outliers before making the average. We process the trajectories happened during about three month, and calculate the averages �tr;Tpark and �Nr;Tvacant, then the estimated waiting time could be represented as: t^r;Twait = (1� �tr;Tpark �T ) � �T �Nr;Tvacant = �T � �tr;Tpark �Nr;Tvacant Here �T is the span of a timeslot, i.e. one hour. However, our estimation of waiting time could not be applied to all roads, because there are some roads forbidding taxis to pick-up passengers. For these roads, there may be many taxis leaving from but few passengers getting on. Due to lack of the data indicating which road forbids the pick-up of passengers, we develop a method to detect these roads through analyzing the trajectories. We de�ne the pick-up rate, denoted as �r for each road segment r: �r = number of pick-up on the road segment r number of vacant taxis entering the road segment r If there is a road with enough samples (more than 100 vacant taxis entering) and very low pick-up rate (less than 0.03), we will regard it as invalid to wait for taxis. For these roads, we do not make estimations of waiting time. It is also worthy to be noted that our data is from the taxis which account for 1/5 of the total ones in Beijing. If we assume these 1/5 taxis are randomly distributed in the city, the waiting time would approximately be shortened to 1/5 of our estimation. We also could measure the actual scale factor by in-the-�eld study. But regardless of what the ac- curate factor is, the relative order of the

本文档为【[UrbComp 2012] Where to Wait for a Taxi】，请使用软件OFFICE或WPS软件打开。作品中的文字与图均可以修改和编辑，图片更改请在作品中右键图片并更换，文字修改请直接点击文字进行修改，也可以新增和删除文档中的内容。

[UrbComp 2012] Where to Wait for a Taxi

热门搜索

历史搜索