Team control number
7070
Problem chosen
B
_______________________________________________________________
2010 Mathematical Contest in Modeling (MCM) Summary Sheet
Summary
Serial crime has been studied ever since the unsolved case of Jack the Ripper,
and for decades it has been a fruitful area of mathematical research. Here we
build a model that generates a geographical profile and predicts the location
of the next crime from information about past crimes, to aid the local investigation.
The model consists of two schemes and a prediction based on the results of
both. In Scheme One, following the theory developed by Newton and
Swoope, we take the geographic mean of the past crime sites as the probable
residence of the offender. We then estimate a “best-fit” function relating
crime frequency to the distance from the residence to each crime site; trying
a variety of fitting methods keeps the error term relatively small. From this
function we generate a geographic profile. Next, the relationship between the
time interval between crimes and the crime location is estimated by linear
regression and used to adjust the geographic profile. In Scheme Two, each unit
of the divided map is assigned a score calculated by weighting several
factors; areas with higher scores have a higher crime frequency. Finally, we
combine the results of the two schemes by a weighted sum. An estimator called
the crime index is introduced to rank the level of crime possibility based on
the combined results. From the crime index we generate the final geographic
profile, illustrated in different colors. The areas with the highest crime
index are the predicted locations of the next crime, and the local police can
read them directly from the geographic profile.
Furthermore, some specific examples are provided to explain and test the
model. An executive summary is also attached to outline when and how to use
our model to predict the location of the next crime.
Team # 7070 1 / 35
Contents
Introduction
Assumptions
Model
    Scheme One
    Scheme Two
    Combination and Prediction
Example
    Scheme One
    Scheme Two
    Combination and Prediction
Discussion
    Weakness
    Strengths
Further consideration
Executive summary
References
Introduction
Since 1888, serial crime has been part of our cultural lexicon (Rumbelow, 1988).
The famous unsolved case of Jack the Ripper was certainly neither the first nor
the last of its type. This kind of crime can have a large negative effect on
society and cause shock, fear, anger, and panic in the community.
Growing concern about it has therefore led to much research.
This paper aims to generate a geographical profile and predict the location of
the next crime from information about past crimes. Two different schemes are
used: one based on geographic profiling theory (D. Kim Rossmo, 2000) and one
based on a weighting method. In addition, some examples are given to explain
and test the model.
Assumptions
1. Criminal offenders are rational:
1) Offenders make the decisions that benefit themselves with the least effort.
2) The crime is planned before it happens.
2. Serial crime is defined as crime involving at least five separate events with
an emotional cooling-off period between homicides. The offender plans his
crime during the cooling-off period and, when the time is right, he selects the
location and proceeds with his plan.
3. The location of each crime has some relation to the offender’s
residence.
4. All the crimes happen in the local area.
5. The location of the next crime is related to the times and locations of all
the past crimes.
6. The locations of crime selected by the offender are affected by some
common factors, for example traffic convenience, population density and
crime rate in the local area.
Model
To aid local police agencies in their investigations of serial criminals, we
have developed a model with three main parts: Scheme One and Scheme Two, which
each generate a geographic profile, and a prediction method for the next crime
based on those profiles. In addition, an executive summary is attached as an
overview of the model and a guide to using it.
Scheme One
This scheme constructs a geographic profile, which presents the different
levels of crime frequency, based on the locations and times of past serial
crimes. According to the theory of D. Kim Rossmo (2000), “Crimes often
occur in relatively close proximity to the home of the offender”. We therefore
assume that all of the crime locations are distributed around one “center of
mass”, which can be regarded as the residence of the offender.
Variables
$C_i(x_i, y_i)$    the $i$th crime location, with latitude $x_i$ and longitude $y_i$
$R(x, y)$    the residence of the offender, with latitude $x$ and longitude $y$
$d_i$    the distance between the $i$th crime location and the offender’s residence
$f$    the frequency of crime in a certain area
Step one---find the most likely location of the offender’s residence
Based on geographic profiling theory, the residence can be regarded as the
geographic mean of the crime locations. It is easily calculated from the
n data points available from past crimes as:
$$R(x, y) = \left( \frac{\sum_{i=1}^{n} x_i}{n},\ \frac{\sum_{i=1}^{n} y_i}{n} \right)$$
$n$    the total number of crimes in the series so far
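The geographic mean in step one can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `estimate_residence` and the sample coordinates are hypothetical.

```python
def estimate_residence(sites):
    """Return the arithmetic mean (latitude, longitude) of the crime sites."""
    if not sites:
        raise ValueError("at least one crime site is required")
    n = len(sites)
    mean_lat = sum(lat for lat, _ in sites) / n
    mean_lon = sum(lon for _, lon in sites) / n
    return mean_lat, mean_lon


# Hypothetical coordinates, for illustration only.
sites = [(30.0, 95.0), (32.0, 97.0), (34.0, 99.0)]
print(estimate_residence(sites))
```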
Step two---find the distance between each crime location and the
residence
After estimating the residence of the offender, the distance $d_i$ between each
crime site and the residence can be calculated as
$$d_i = \sqrt{\left[a\,(x - x_i)\right]^2 + \left[b\,(y - y_i)\right]^2}$$
$a$    the distance per degree of latitude, which is about 111 km
$b$    the distance per degree of longitude, which is also taken as approximately 111 km
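A minimal Python sketch of this distance calculation follows. It assumes the scale factors are applied to each coordinate difference before squaring, which is the reading that yields distances in kilometres; the function and constant names are hypothetical.

```python
import math

KM_PER_DEG_LAT = 111.0  # a in the text
KM_PER_DEG_LON = 111.0  # b in the text (the paper uses the same value)


def distance_km(site, residence, a=KM_PER_DEG_LAT, b=KM_PER_DEG_LON):
    """Planar distance in km between two (latitude, longitude) points."""
    north_south = a * (site[0] - residence[0])
    east_west = b * (site[1] - residence[1])
    return math.hypot(north_south, east_west)


# One degree of latitude apart, same longitude.
print(distance_km((33.0, 97.0), (32.0, 97.0)))
```

Passing a smaller `b` is the natural place to account for the shrinking of a degree of longitude away from the equator, should that refinement be wanted.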
Step three---find the frequency of crime in a certain area
First, we find $l = \max(d_1, d_2, \ldots, d_n)$ and divide $l$ into $k$
intervals, where the number of intervals $k$ is chosen according to the
specific features of the serial crime and the local area. For example, if
$l = 6$ km and we take $k = 6$, we get the intervals (0,1), (1,2), (2,3),
(3,4), (4,5), (5,6).
Second, $k$ concentric circles are drawn around the residence with radii
$r = \frac{l}{k}, \frac{2l}{k}, \ldots, l$. If $k = 6$ and $l = 6$ km, we have
6 concentric circles with radii 1 km, 2 km, …, 6 km.
Third, the concentric circles bound $k$ ring-shaped areas, denoted
$A_1, A_2, \ldots, A_k$. The number of crimes in each area can now be counted,
denoted $a_1, a_2, \ldots, a_k$.
Finally, from the data obtained above, the frequency of crime in each area
can be calculated as
$$f_i = \frac{a_i}{n}.$$
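The ring counting in step three can be sketched as below. The function name `ring_frequencies` and the sample distances are hypothetical, and assigning a distance that lies exactly on a circle to the outer ring is an assumption, since the text leaves the interval boundaries open.

```python
def ring_frequencies(distances, l, k):
    """Share of crimes falling in each of k equal-width rings out to radius l."""
    if l <= 0 or k <= 0:
        raise ValueError("l and k must be positive")
    if not distances:
        raise ValueError("at least one distance is required")
    width = l / k
    counts = [0] * k
    for d in distances:
        if d < 0 or d > l:
            raise ValueError(f"distance {d} lies outside [0, {l}]")
        # A distance equal to l is placed in the outermost ring.
        index = min(int(d // width), k - 1)
        counts[index] += 1
    n = len(distances)
    return [count / n for count in counts]


# Hypothetical distances in km, with l = 6 km and k = 6 as in the example above.
print(ring_frequencies([0.5, 1.5, 1.7, 5.2], l=6, k=6))
```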
Step four---study the relationship between the frequency of crime in
each area $f_i$ and the distance between crime location and the
residence $d_i$
Based on the shape of the data, previous experience and distance decay
theory (Joshua David Kent, 1994), different fitting methods are used to
describe the relationship between the two variables, including logarithmic,
negative exponential, truncated negative exponential and polynomial fits. This
is an example of a Euclidean distance decay model:
The first three regression lines are shown as follows.
The function with the largest coefficient of determination $R^2$ is then
chosen to describe the relationship between the two variables. This gives the
“best-fit” line on the graph.
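The selection by largest $R^2$ can be illustrated with a small pure-Python sketch. It covers only two of the candidate forms (logarithmic and negative exponential), fits each by least squares after a linearizing transform, and compares $R^2$ on the original frequency scale. All names and the sample data are hypothetical, and using ring midpoints as the distance values is an assumption.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(ys, predictions):
    """Coefficient of determination of predictions against observed ys."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def fit_logarithmic(ds, fs):
    """f = alpha + beta * ln(d); requires every d > 0."""
    alpha, beta = linear_fit([math.log(d) for d in ds], fs)
    return lambda d: alpha + beta * math.log(d)


def fit_negative_exponential(ds, fs):
    """f = exp(alpha + beta * d), fitted on ln(f); requires every f > 0."""
    alpha, beta = linear_fit(ds, [math.log(f) for f in fs])
    return lambda d: math.exp(alpha + beta * d)


def best_fit(ds, fs):
    """Return (name, curve, scores) for the candidate with the largest R^2."""
    candidates = {
        "logarithmic": fit_logarithmic(ds, fs),
        "negative exponential": fit_negative_exponential(ds, fs),
    }
    scores = {name: r_squared(fs, [curve(d) for d in ds])
              for name, curve in candidates.items()}
    best = max(scores, key=scores.get)
    return best, candidates[best], scores


# Hypothetical ring midpoints (km) and frequencies that halve with each ring.
ds = [0.5, 1.5, 2.5, 3.5, 4.5]
fs = [0.48, 0.24, 0.12, 0.06, 0.03]
name, curve, scores = best_fit(ds, fs)
print(name, scores)
```

Rings with zero frequency cannot be log-transformed, so a real data set would need them dropped or a direct nonlinear fit for the exponential form.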
Step Five---generate the geographic profile by the different levels of
crime frequency
After getting the “best-fit” line, two horizontal lines are plotted on the
graph to divide the frequency into three levels: low frequency, medium
frequency and high frequency. Then, based on the frequency level and the
distance between crime site and residence, a geographic profile can be
obtained. For example, if the “best-fit” line is the negative exponential
line, the graph is
The geographic profile will look like:
[Figure: geographic profile around the residence R(x, y), with regions labeled high, medium and low]
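The three-level division of step five can be sketched as follows. The fitted curve, the two cut-off values and all names are hypothetical; in practice the curve comes from step four and the cut-offs from the two horizontal lines.

```python
import math


def frequency_level(frequency, low_cut, high_cut):
    """Map a fitted frequency to one of the three levels."""
    if frequency >= high_cut:
        return "high"
    if frequency >= low_cut:
        return "medium"
    return "low"


def profile(fitted_curve, distances, low_cut, high_cut):
    """Level of crime frequency at each distance from the residence."""
    return [(d, frequency_level(fitted_curve(d), low_cut, high_cut))
            for d in distances]


def example_curve(d):
    """Hypothetical negative exponential 'best-fit' curve."""
    return 0.5 * math.exp(-0.5 * d)


for distance, level in profile(example_curve, [0, 1, 2, 3, 4, 5],
                               low_cut=0.1, high_cut=0.3):
    print(distance, level)
```

With a monotonically decreasing curve the levels form concentric zones; a non-monotonic fit, such as a polynomial, can produce alternating rings.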
Step six---take the time factor into consideration
First, the time interval between successive past crimes is denoted $t_i$
($i = 1, 2, \ldots, n-1$).
Second, explore the relationship between the time interval $t_i$ and the
distance between crime location and residence $d_i$ ($i = 2, 3, \ldots, n$).
Using the data on $t_i$ and $d_i$, perform linear regression using different
methods. If no obvious relationship emerges, ignore the effect of the time
factor on crime location. If an obvious relationship is found, adjust the
geographical profile from step five to obtain a new profile of the crime
location. For example, if the time interval and the distance are positively
correlated, then after a certain period of time the high-frequency area should
be moved a little farther from the residence.
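A minimal sketch of the time check in step six is given below. It pairs each interval $t_i$ with the distance of the crime that follows it, $d_{i+1}$, which is one reading of the index ranges in the text. The correlation cut-off `MIN_ABS_R`, the function name and the sample data are all hypothetical.

```python
import math


def time_distance_trend(intervals, distances):
    """Regress d_(i+1) on t_i and return (slope, intercept, r)."""
    later = distances[1:]  # d_2 ... d_n, paired with t_1 ... t_(n-1)
    if len(intervals) != len(later) or len(intervals) < 3:
        raise ValueError("need n - 1 intervals for n distances, with n >= 4")
    n = len(intervals)
    mean_t = sum(intervals) / n
    mean_d = sum(later) / n
    stt = sum((t - mean_t) ** 2 for t in intervals)
    sdd = sum((d - mean_d) ** 2 for d in later)
    std = sum((t - mean_t) * (d - mean_d) for t, d in zip(intervals, later))
    slope = std / stt
    intercept = mean_d - slope * mean_t
    r = std / math.sqrt(stt * sdd)
    return slope, intercept, r


MIN_ABS_R = 0.7  # hypothetical threshold for an "obvious" relationship

intervals_days = [10, 20, 30, 40]            # hypothetical t_1 ... t_4
distances_km = [5.0, 12.0, 21.0, 33.0, 39.0]  # hypothetical d_1 ... d_5
slope, intercept, r = time_distance_trend(intervals_days, distances_km)
if abs(r) >= MIN_ABS_R:
    print(f"adjust the profile: about {slope:.2f} km per extra day (r = {r:.2f})")
else:
    print("no clear time effect; keep the step five profile")
```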
Example
A real serial crime case in Texas, USA is introduced to explain and test the
above model.
Between 1959 and 1983, Henry Lee Lucas committed a series of murders, mainly in
Texas, USA. Although he confessed to over 1000 murders, he was convicted of
11. Of those 11 murders, one was in Michigan and one in West Virginia; the
remaining nine were located in Texas and are listed below.
Date              Place
August 1970       Kaufman County, Texas
November 1977     Harrison County, Texas
October 1979      Williamson County, Texas
September 1981    Brownfield, Texas
August 1982       Denton, Texas
September 1982    Ringgold, Texas
December 1982     Hale County, Texas
March 1983        Montgomery County, Texas
April 1984        Montgomery County, Texas
With the help of Google Earth, the crime locations can be found on the map
below.
The latitude and longitude of each crime location are listed in the table below.
N (°)     W (°)
32.633    96.317
32.550    94.300
30.750    97.683
33.167    102.267
33.200    97.933
33.817    97.933
34.100    101.867
30.317    95.467
30.317    95.467
To test Scheme One, the data from the first through the eighth murder are used
to build the scheme. The location of the ninth murder is then estimated by
the method.
Step one---find the residence of the offender
For
$$R(x, y) = \left( \frac{\sum x_i}{n},\ \frac{\sum y_i}{n} \right)$$
In the example:
$n = 8$; $\frac{\sum x_i}{n} = 32.319$; $\frac{\sum y_i}{n} = 97.587$;
$R(x, y) = (32.319, 97.587)$
The residence can then be marked on the map.
Step two---calculate the distance between crime location and the
offender’s residence
According to the formula
$$d_i = \sqrt{\left[a\,(x - x_i)\right]^2 + \left[b\,(y - y_i)\right]^2}$$
($a$ the distance per degree of latitude, which is about 111 km;
$b$ the distance per degree of longitude, which is also taken as approximately 111 km)
Then we get
Crime number    d (km)
1               145.28
2               365.76
3               174.43
4               527.90
5               110.90
6               170.68
7               514.55
8               323.68
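As a worked check of one row, the second crime site (32.550, 94.300) and the estimated residence (32.319, 97.587) give the following, assuming $a = b = 111$ km and the scale factors applied to each coordinate difference before squaring:

```latex
\begin{aligned}
d_2 &= 111 \sqrt{(32.319 - 32.550)^2 + (97.587 - 94.300)^2} \\
    &= 111 \sqrt{0.053361 + 10.804369} \\
    &\approx 111 \times 3.2951 \approx 365.76 \text{ km}
\end{aligned}
```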
Step three---find the frequency of crime in a certain area
First, find $l = \max(d_1, d_2, \ldots, d_n) = 527.90$ km. To simplify the
calculation, we round this up to $l = 600$ km.
Then we divide $l$ into 6 intervals: (0,100), (100,200), (200,300), (300,400),
(400,500), (500,600).
Second, six concentric circles are drawn with radii $r = 100, 200, \ldots, 600$ km.
Third, the concentric circles bound six ring-shaped areas, denoted
$A_1, A_2, \ldots, A_6$. The numbers of crimes in these areas are counted as
0, 4, 0, 2, 0, 2.
Finally, from the data obtained above, the frequency of crime in each area
is calculated as
$$f_i = \frac{a_i}{n},$$
where $a_i$ is the number of crimes in area $A_i$.
Then we get $f_i$:
Interval (km)    Frequency
(0,100)          0
(100,200)        50%
(200,300)        0
(300,400)        25%
(400,500)        0
(500,600)        25%
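The frequencies in this table can be reproduced from the eight tabulated distances with a short Python check. The variable names are hypothetical; the ring width of 100 km and the six rings come from the text.

```python
from collections import Counter

# Distances d_1 ... d_8 in km, copied from the table in step two.
distances_km = [145.28, 365.76, 174.43, 527.90, 110.90, 170.68, 514.55, 323.68]
ring_width_km = 100
ring_count = 6

# Index 0 is the ring (0,100), index 1 is (100,200), and so on.
counts = Counter(int(d // ring_width_km) for d in distances_km)
frequencies = [counts[i] / len(distances_km) for i in range(ring_count)]
print(frequencies)  # [0.0, 0.5, 0.0, 0.25, 0.0, 0.25]
```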
Step four---explore the relationship between the frequency of crime
in each area $f_i$ and the distance between crime location and the
residence $d_i$.
In this case, we assume that a polynomial fit of degree five gives the
“best-fit” line.
Step Five---generate the geographic profile by the different levels of
crime frequency.
After getting the “best-fit” line through the polynomial fitting method, two
horizontal lines with f=1.25 and f=0.3 are plotted on the graph to divide the
frequency into three levels, denoted by low frequency, medium frequency and
high frequency.
Then we can get
Boundary point of each interval    Distance (km)
r1                                 133
r2                                 166
r3                                 250
r4                                 310
r5                                 500
r6                                 625
Then, based on the different frequencies and the distance between crime
location and residence, a geographic profile is obtained as
Red region 0.3