Bayesian Network Modeling of Offender Behavior for Criminal Profiling
Masters Defense
Kelli Crews Baumgartner
Laboratory for Intelligent Systems & Control, Mechanical Engineering, Duke University
10/18/2007

Outline
• Background to Criminal Profiling
• Motivation
• Bayesian Networks (BNs)
• Methodology
• Results
• Future Work
• Acknowledgments

The Case of the Mad Bomber

Profile:
• Male
• Eastern European
• Late 40's to early 50's
• Lived in Connecticut with a maiden aunt or sister
• Paranoia
• Wearing a double-breasted suit, buttoned

George Metesky:
• Slavic descent
• 54 years old
• Lived in Connecticut with two maiden sisters
• Acute paranoia
• When apprehended, was wearing a double-breasted suit, buttoned

Background
• Goals of criminal profiling:
  – Concentrate criminal investigations
  – Interrogation strategies
• Collaboration with law enforcement

Previous Research
• Early 1980's: FBI developed the organized/disorganized dichotomy
• 1985-1994: Dr. David Canter expanded the FBI model: interpersonal coherence, time and place, criminal characteristics, criminal behavior, and forensic awareness
• 1999-Present: G. Salfati and D. Canter explore the expressive/instrumental dichotomy

Expressive vs. Instrumental
• Uses multidimensional scaling (MDS)
  – Non-metric multidimensional scaling procedure
  – Plots association coefficients for each crime scene behavior
  – Similarly themed actions co-occur in the same region of the plot, and vice versa
• Data: single offender-single victim homicides recorded by the British police (82 cases and 247 cases)

Results of MDS Research
• 62% of the cases exhibited a majority of the crime scene characteristics in a single theme
• 74% of all offenders could also be classified as either expressive or instrumental
• 55% of the cases exhibited the same theme in both their crime scene actions and in the offender background characteristics
• High-frequency crime scene behaviors can be eliminated

Motivation
• Develop a network model linking the profile of the offender to his/her decisions and behaviors at the crime scene
• Determine correlations between input and output variables based on data from real cases
  – Input variables: Crime Scene Analysis (CSA), victimology assessment, and medical examination
  – Output variables (offender profile): sex of offender, prior convictions, relationship with victim
  – Training data: solved cases (inputs/outputs known)
• Apply the software to produce the offender profile for unsolved cases, given the input variables

Criminal Profiling Software Development
• Use expert knowledge to initialize BN variables
• Train the network model (NM) with solved cases
• Test the NM with validation cases by inputting the crime scene variables and comparing the predicted offender variables to the observed offender variables

[Flow diagram: Data → Initial Model → Trained Model; Inputs → Trained Model → Outputs]

Belief Networks
• Bayesian Networks, B = (S, Θ)
  – Probabilistic network
• Notation: x_{i,j} = variable X_i in state j

Example CPTs for a three-node network with edges X1 → X2 and X1 → X3:

  P(X1):            P(x_{1,1}) = 0.4    P(x_{1,2}) = 0.6

  P(X2 | X1):       P(x_{2,1} | X1)     P(x_{2,2} | X1)
    X1 = x_{1,1}:        0.8                 0.2
    X1 = x_{1,2}:        0.3                 0.7

  P(X3 | X1):       P(x_{3,1} | X1)     P(x_{3,2} | X1)
    X1 = x_{1,1}:        0.1                 0.9
    X1 = x_{1,2}:        0.5                 0.5
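The factorization encoded by these CPTs can be checked numerically. A minimal Python sketch (the thesis work used Matlab; the variable names here are illustrative):

```python
from itertools import product

# CPTs from the slide's example network (edges X1 -> X2, X1 -> X3).
# State index 0 corresponds to x_{i,1}, index 1 to x_{i,2}.
p_x1 = [0.4, 0.6]
p_x2_given_x1 = [[0.8, 0.2], [0.3, 0.7]]   # rows indexed by the state of X1
p_x3_given_x1 = [[0.1, 0.9], [0.5, 0.5]]

# The joint factorizes over the DAG: P(x1, x2, x3) = P(x1) P(x2|x1) P(x3|x1).
def joint(x1, x2, x3):
    return p_x1[x1] * p_x2_given_x1[x1][x2] * p_x3_given_x1[x1][x3]

total = sum(joint(a, b, c) for a, b, c in product(range(2), repeat=3))
print(total)            # sums to 1, as any joint distribution must
print(joint(0, 0, 0))   # 0.4 * 0.8 * 0.1 = 0.032
```

Storing only the CPTs rather than the full joint table is exactly the economy the BN buys: here 2 + 4 + 4 entries instead of 8, and the gap widens rapidly with more variables.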

Example of Inference
• X2 and X3 are observed
• X1 is unknown, to be inferred
• Possible states:

  X1 = {x_{1,1}, ..., x_{1,r1}}
  X2 = {x_{2,1}, ..., x_{2,r2}}
  X3 = {x_{3,1}, ..., x_{3,r3}}

[Network: X1 (CPT P(X1)) with children X2 (P(X2 | X1)) and X3 (P(X3 | X1))]

Bayes Theorem
• Bayes' rule to infer X1 when X2 = x_{2,r2} and X3 = x_{3,r3}:

  P(X1 | x_{2,r2}, x_{3,r3}) = P(x_{2,r2}, x_{3,r3} | X1) P(X1) / P(x_{2,r2}, x_{3,r3})

• Marginalization of the observed variables:

  P(x_{2,r2}, x_{3,r3}) = Σ_{k=1}^{r1} P(X1 = x_{1,k}) Π_{j=2}^{3} P(x_{j,rj} | x_{1,k})

• The posterior probability distribution sums to one:

  Σ_{h=1}^{r1} P(X1 = x_{1,h} | x_{2,r2}, x_{3,r3}) = 1

F. Jensen, Bayesian Networks and Decision Graphs, 2001
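The three equations above can be exercised on the earlier example CPTs. A small Python sketch (illustrative names; the original implementation was in Matlab):

```python
# Infer the unobserved root X1 from observed children X2, X3 by Bayes' rule,
# using the example CPTs from the Belief Networks slide (binary states,
# index 0 = x_{i,1}).
p_x1 = [0.4, 0.6]
p_x2_given_x1 = [[0.8, 0.2], [0.3, 0.7]]
p_x3_given_x1 = [[0.1, 0.9], [0.5, 0.5]]

def posterior_x1(x2, x3):
    # Numerator of Bayes' rule for each state of X1 ...
    unnorm = [p_x1[k] * p_x2_given_x1[k][x2] * p_x3_given_x1[k][x3]
              for k in range(2)]
    # ... normalized by the marginal P(x2, x3), the sum over X1 states.
    evidence = sum(unnorm)
    return [u / evidence for u in unnorm]

post = posterior_x1(x2=0, x3=1)   # observe X2 = x_{2,1}, X3 = x_{3,2}
print(post)                       # posterior over {x_{1,1}, x_{1,2}}
print(sum(post))                  # 1.0 -- a proper distribution
```

For this evidence the unnormalized terms are 0.4·0.8·0.9 = 0.288 and 0.6·0.3·0.5 = 0.09, so the posterior favors x_{1,1} at about 0.76.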

BN Training
• Set of training data, T, to learn the BN, B = (S, Θ)
• Use a structural learning algorithm to find S_optimal:
  – Search method needed to decrease the search space
  – Scoring metric
• Maximum Likelihood Estimation (MLE) algorithm to learn the CPTs, P(Θ_optimal | S_optimal, T)

Joint Probability Scoring Metric
• Relative measure of the level of compatibility of S^h with the training data T: P(S^h | T)
• P(S^h | T) ∝ P(S^h, T):

  P(S^h, T) = P(S^h | T) P(T)  ⇒

  P(S_i^h | T) / P(S_j^h | T) = [P(S_i^h, T) / P(T)] / [P(S_j^h, T) / P(T)] = P(S_i^h, T) / P(S_j^h, T)

  ∴ P(S_i^h | T) < P(S_j^h | T)  ⇔  P(S_i^h, T) < P(S_j^h, T)

Scoring Metric Assumptions
1. All variables are discrete
2. All structures are equally likely
3. All variables are known, with no missing values
4. All cases in T occur independently given a BN model
5. No prior knowledge of the numerical parameters to assign to B^h with structure S^h before observing T

  B^h = (S^h, Θ^h) ∈ B

  P(S^h, T) = ∫_Θ f(T | S^h, Θ^h) f(Θ^h | S^h) P(S^h) dΘ^h  ⇒

  P(S^h, T) = P(S^h) · Π_{i=1}^{n} Π_{j=1}^{q_i} [ (r_i − 1)! / (N_ij + r_i − 1)! ] Π_{k=1}^{r_i} N_ijk!

G.F. Cooper et al., Machine Learning, 1992

Variable Definition for Scoring Metric

  P(S^h, T) = P(S^h) · Π_{i=1}^{n} Π_{j=1}^{q_i} [ (r_i − 1)! / (N_ij + r_i − 1)! ] Π_{k=1}^{r_i} N_ijk!

n: number of model variables
q_i: number of unique instantiations of π_i, where π_i = pa(X_i)
r_i: number of possible states for X_i
N_ijk: number of cases in T in which X_i = x_{i,k} and π_i is instantiated as w_ij, where k = 1, ..., r_i
N_ij: N_ij = Σ_{k=1}^{r_i} N_ijk

G.F. Cooper et al., Machine Learning, 1992
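With these definitions, the per-node factor of the Cooper-Herskovits score can be computed directly. A sketch in Python, worked in log space because the factorials overflow quickly for realistic counts (an implementation choice, not from the slides; lgamma(n+1) = log n!):

```python
from math import lgamma, exp

def log_g(counts):
    """Log of the Cooper-Herskovits term for one node X_i.

    `counts` has one row per parent instantiation w_ij; row j holds
    [N_ij1, ..., N_ijri], the case counts for each state of X_i under
    that instantiation.
    """
    total = 0.0
    for row in counts:
        r_i = len(row)
        n_ij = sum(row)
        # log[(r_i - 1)! / (N_ij + r_i - 1)!]  +  sum_k log(N_ijk!)
        total += lgamma(r_i) - lgamma(n_ij + r_i)
        total += sum(lgamma(n + 1) for n in row)
    return total

# Tiny example: binary node, one parent instantiation, counts N_1 = 2, N_2 = 1:
# g = 1! * 2! * 1! / 4! = 2/24 = 1/12
print(exp(log_g([[2, 1]])))
```

Because the full score is a product of such independent per-node terms, a search algorithm can rescore a single node's candidate parent set without touching the rest of the network, which is what makes the K2 search on the next slide cheap.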

K2 Algorithm
• Maximizes the scoring metric
• Node ordering: X1 → X2 → X3
• Limit on the number of parents

  P(S_optimal, T) = max_{S^h} [ P(S^h, T) ]  ⇒

  max_{B^h} { P(S^h) · Π_{i=1}^{n} Π_{j=1}^{q_i} [ (r_i − 1)! / (N_ij + r_i − 1)! ] Π_{k=1}^{r_i} N_ijk! }

  which K2 maximizes node by node through the local score

  g = Π_{j=1}^{q_i} [ (r_i − 1)! / (N_ij + r_i − 1)! ] Π_{k=1}^{r_i} N_ijk!

G.F. Cooper et al., Machine Learning, 1992
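The K2 search itself is a greedy loop over the node ordering. A minimal sketch under the same assumptions (uniform structure prior, Cooper-Herskovits score in log form; the helper names are hypothetical, not from the thesis code):

```python
from math import lgamma

def family_log_score(data, child, parents, arity):
    """Log Cooper-Herskovits score of one node given a candidate parent set."""
    counts = {}
    for case in data:
        key = tuple(case[p] for p in parents)
        row = counts.setdefault(key, [0] * arity[child])
        row[case[child]] += 1
    score = 0.0
    for row in counts.values():
        # log[(r_i - 1)! / (N_ij + r_i - 1)!] + sum_k log(N_ijk!)
        score += lgamma(arity[child]) - lgamma(sum(row) + arity[child])
        score += sum(lgamma(n + 1) for n in row)
    return score

def k2(data, order, arity, max_parents=2):
    """Greedy K2: visit nodes in the given ordering; for each, repeatedly add
    the single predecessor that most improves the local score, stopping when
    no candidate helps or the parent limit is reached."""
    parents = {x: [] for x in order}
    for i, child in enumerate(order):
        predecessors = order[:i]
        best = family_log_score(data, child, parents[child], arity)
        improved = True
        while improved and len(parents[child]) < max_parents:
            improved, best_parent = False, None
            for cand in (c for c in predecessors if c not in parents[child]):
                s = family_log_score(data, child, parents[child] + [cand], arity)
                if s > best:
                    best, best_parent, improved = s, cand, True
            if improved:
                parents[child].append(best_parent)
    return parents

# Toy data: variable 1 copies variable 0, so K2 should link 0 -> 1.
cases = [(0, 0)] * 4 + [(1, 1)] * 4
structure = k2(cases, order=[0, 1], arity={0: 2, 1: 2})
print(structure)   # {0: [], 1: [0]}
```

K2' as described two slides on would differ only in which candidates are admissible: connections between input nodes are inhibited, which in this sketch would amount to filtering `predecessors` by node type.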

K2’ Algorithm

[Diagram: a four-node network with CPTs P(X4), P(X1 | X4), P(X2 | X1, X4), P(X3 | X1)]

[Diagram: the same nodes X4, X1, X2, X3]

K2’ Algorithm
• Inhibit nodal connections between input nodes
  – d-separation: X3 ⊥ X2, X4 iff X1 is known
  – X2 and X3 are not affected by the X4 → X1 relationship if the parents are known
• Everything else is the same as K2

The Learned Structure
• Initial structure is an empty graph
• Final structure is learned from T

[Diagram: output nodes O1, O2, ..., Om and input nodes I1, I2, ..., Ik; connections among input nodes occur only for K2]

Parameter Learning
• Maximum Likelihood Estimation (MLE) for parameter learning, since there are no missing values (the EM algorithm otherwise)
• MLE determines the parameters that maximize the probability (likelihood) of T:

  f(T | Θ^h, S^h) = Π_{i=1}^{t} f(C_i | Θ^h, S^h) = L(Θ^h | T, S^h)

  Λ = ln L = Σ_{i=1}^{t} ln[ L(Θ^h | T, S^h) ]  ⇒  ∂Λ / ∂Θ^h = 0
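For complete data, the solution of ∂Λ/∂Θ = 0 has a closed form: each CPT entry is a normalized count, P(X_i = x_{i,k} | π_i = w_ij) = N_ijk / N_ij. A sketch (the dict-based case layout and node names are illustrative):

```python
from collections import Counter

def mle_cpt(data, child, parents, n_states):
    """MLE of one node's CPT from complete data: normalized counts per
    parent instantiation."""
    counts = Counter()
    for case in data:
        key = tuple(case[p] for p in parents)
        counts[(key, case[child])] += 1
    cpt = {}
    for (key, state), n in counts.items():
        row = cpt.setdefault(key, [0.0] * n_states)
        row[state] = n
    for key, row in cpt.items():
        total = sum(row)
        cpt[key] = [n / total for n in row]
    return cpt

# Toy cases relating place of aggression (PA) to offender gender (G).
cases = [{"PA": 0, "G": 0}, {"PA": 0, "G": 0},
         {"PA": 0, "G": 1}, {"PA": 1, "G": 0}]
cpt = mle_cpt(cases, child="G", parents=["PA"], n_states=2)
print(cpt)
```

Note that a state never observed under some parent instantiation gets probability exactly zero (here G = 1 under PA = 1), which is the mechanism behind the zero-marginal-probability (ZMP) variables discussed in the results.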

Modeling Example
• Train a network model for a simple problem with two inputs and two outputs:
  – Inputs, CSA:
    (1) Place of aggression characteristics
    (2) Amount of disorder provoked by a fight/struggle
  – Outputs, criminal profile:
    (1) Gender of the offender
    (2) Presence of sexual relations between victim and offender
• Train the model with simulated cases using Matlab
• Use evidence of the inputs to infer the outputs in new cases

Example: Variable Definition
Inputs from the Crime Scene Analysis (CSA) checklist (evidence):
• Characteristics of the place of aggression, node PA:
  (1) Not crowded external place (remote)
  (2) Semi-crowded external place (semi-remote)
  (3) Crowded external place
  (4) Inner place (room, building, office)
• Disorder provoked by fight/struggle, node DS:
  (1) In the room/area where the corpse is found
  (2) In all of the area/room/study/office/store
  (3) In the vicinity of the area/room/study/office/store
  (4) No disorder provoked
Outputs from the network model, criminal profile:
• Gender of offender, node G:
  (1) Male
  (2) Female
• Presence of sexual relations between victim and offender, node SR:
  (1) Yes
  (2) No

Network Model
Example solved case:
  PA = 2 (semi-remote)
  DS = 4 (no disorder)
  G = 1 (male)
  SR = 2 (no)

Network variables: PA (place of aggression), DS (disorder provoked by fight), G (gender of offender), SR (sexual relations between victim and offender).

Results: Percent Binary Error for Validation Set
• Error metric: if x = x*, then error = 0; if x ≠ x*, then error = 1

[Figure: percentage of each output node inferred incorrectly vs. number of training cases]

Full CP Model
• 247 cases:
  – 200 training cases (T)
  – 47 validation cases (V)
• 57 total variables:
  – 36 CS variables (inputs)
  – 21 CP variables (outputs)

Internal Stability
• Internal stability refers to the consistency of predictions

Overall Performance Summary
• Overall Predictive Accuracy (OPA):

  OPA(%) = K_{C,CL(≥50%)} / K_t × 100

  K_t: total predictions
  K_{C,CL(≥50%)}: total correct predictions with CL ≥ 50%

  Algorithm                             | K2    | K2’
  Accuracy (%)                          | 64.1% | 79%
  Correct predictions (number of nodes) | 633   | 780

Confidence Level of Predictions
• Confidence Level Accuracy (CLA):

  CLA(%) = K_{C,CL} / K_CL × 100

  CL range (K2 || K2’) | K_CL       | K_{C,CL}   | CLA (%)      | Δ(K_{C,CL})
  0.5 ≤ CL < 0.7       | 225 || 262 | 140 || 162 | 62.2 || 61.8 | 22
  0.7 ≤ CL < 0.9       | 405 || 470 | 334 || 386 | 82.5 || 82.1 | 52
  0.9 ≤ CL             | 168 || 255 | 159 || 232 | 94.6 || 91   | 73
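Both metrics reduce to counting within confidence bins. A sketch, assuming each prediction is recorded as a (confidence, correct) pair (illustrative layout, not the thesis code):

```python
def opa(predictions):
    """Overall Predictive Accuracy: correct predictions with CL >= 0.5,
    as a percentage of all predictions."""
    correct = sum(1 for cl, ok in predictions if cl >= 0.5 and ok)
    return 100.0 * correct / len(predictions)

def cla(predictions, lo, hi=1.01):
    """Confidence Level Accuracy within a CL bin [lo, hi)."""
    binned = [(cl, ok) for cl, ok in predictions if lo <= cl < hi]
    correct = sum(1 for cl, ok in binned if ok)
    return 100.0 * correct / len(binned)

preds = [(0.55, True), (0.65, False), (0.8, True), (0.8, True), (0.95, True)]
print(opa(preds))             # 4 of 5 correct above 50% -> 80.0
print(cla(preds, 0.7, 0.9))   # both predictions in [0.7, 0.9) correct -> 100.0
```

The tables show the point of binning by CL: accuracy climbs steeply with the model's own reported confidence, so the confidence level is itself informative to an investigator.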

Zero Marginal Probability Variables

  Algorithm      | K2  || K2’
  K_CL, CL ≥ 50% | 798 || 987

• ZMP variables reduce the achievable predictive accuracy:

  PA(%) = [K_t − (K_w + K_ZMP)] / K_t × 100

Ways to reduce ZMP variables:
1. Add more training cases
2. Declare variable independencies (K2’)
3. Decrease the number of system variables

High Frequency Variables
3. Decrease the number of system variables:
  – High Frequency (HF) variables are present in more than 50% of cases
  – High Frequency Model (HFM): CP model with the HF variables removed

  CS Behavior                        | Frequency
  Face not hidden                    | 88.4%
  Victim found at scene where killed | 78.9%
  Victim found face up               | 61.1%
  Multiple wounds to the body        | 52.2%

HFM Overall Predictive Accuracy
• Negligible accuracy increase for K2’
• Decrease in the number of ZMP variables for K2

  Algorithm   | K2HFM | K2’HFM | K2    | K2’
  OPA (%)     | 66%   | 79.6%  | 64.1% | 79%
  K_{C,≥50%}  | 652   | 786    | 633   | 780
  K_ZMP       | 168   | 0      | 189   | 0

Frequency of Occurrence
• Frequency of occurrence is the number of times a variable state was present in the dataset
• The frequency method (F) predicts each state of V by the most frequent state in T
• The CP produced by F is therefore identical for every case in V
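The frequency baseline can be sketched in a few lines (the dict-based case layout and variable names are hypothetical):

```python
from collections import Counter

def frequency_profile(training_cases, output_vars):
    """Majority-state prediction for each output variable; the same profile
    is returned for every new case, since the evidence is never consulted."""
    profile = {}
    for var in output_vars:
        states = Counter(case[var] for case in training_cases)
        state, count = states.most_common(1)[0]
        profile[var] = (state, count / len(training_cases))
    return profile

train = [{"male": 1, "thft": 0}, {"male": 1, "thft": 0},
         {"male": 1, "thft": 1}, {"male": 0, "thft": 0}]
profile = frequency_profile(train, ["male", "thft"])
print(profile)
```

Each prediction carries the training-set frequency of the majority state as its fixed "confidence", which is why F's confidence levels cannot respond to case-specific evidence the way the BN's do.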

OPA for K2’ and F
• Overall Predictive Accuracy:

  Algorithm                             | K2’ | F
  Accuracy (%)                          | 79% | 79.3%
  Correct predictions (number of nodes) | 780 | 784

Confidence Level of Predictions
• Confidence Level Accuracy (CLA):

  CL range (K2’ || F) | K_CL       | K_{C,CL}   | CLA (%)      | Δ(K_{C,CL})
  0.5 ≤ CL < 0.7      | 262 || 329 | 162 || 216 | 65.7 || 61.8 | -54
  0.7 ≤ CL < 0.9      | 470 || 470 | 386 || 396 | 82.1 || 84.3 | -10
  0.9 ≤ CL < 0.95     | 139 || 141 | 125 || 126 | 90 || 89.3   | -1
  0.95 ≤ CL           | 116 || 47  | 107 || 46  | 92.2 || 98   | 61

Information Entropy
• Information entropy (H) quantifies the certainty/uncertainty of a model
• The amount of information is related to the confidence of the prediction: less entropy means more confidence in the long run

  H = −Σ_{i=1}^{r_i} p_i log(p_i)
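A direct implementation of H, using base-2 logarithms (an assumption; the slide leaves the base of the logarithm unspecified):

```python
from math import log2

def entropy(dist):
    """Shannon entropy of a discrete distribution, in bits.
    Terms with p = 0 are skipped, following the convention 0 log 0 = 0."""
    return -sum(p * log2(p) for p in dist if p > 0)

print(entropy([0.5, 0.5]))   # maximally uncertain binary prediction -> 1.0
print(entropy([1.0, 0.0]))   # fully confident prediction, zero entropy
print(entropy([0.9, 0.1]))   # confident prediction, about 0.47 bits
```

Averaging this quantity over each output variable's posterior gives the comparison on the next slide.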

H for K2’ and F
• The exact joint entropy of the K2’ model is an infeasible calculation
• The average of each variable's H is a suitable measure, since by the chain rule the sum of marginal entropies bounds the joint entropy from above:

  H(X_1, ..., X_k) = Σ_{i=1}^{k} H(X_i | X_{i−1}, ..., X_1)  ⇒

  H(X_1, ..., X_k) ≤ Σ_{i=1}^{k} H(X_i)

• H(K2’) = 0.43 vs. H(F) = 0.48

CL Ranges for Predictions

[Figure: distribution of predictions across confidence-level ranges]

Crime Scene Variables
• Input variables from the crime scene (evidence):

  Input Variable | Definition
  I1, pen        | Foreign object penetration
  I2, hid        | Face hidden
  I3, blnd       | Blindfolded
  I4, blnt       | Wounds caused by blunt instrument
  I5, suff       | Suffocation (other than strangulation)

Criminal Profile Variables
• Output variables comprising the criminal profile:

  Output Variable | Definition
  O1, yoff        | Young offender (17-21 years old)
  O2, thft        | Criminal record of theft
  O3, frd         | Criminal record of fraud
  O4, brg         | Criminal record of burglary
  O5, rlt         | Relationship with victim
  O6, unem        | Unemployed at time of offense
  O7, male        | Male
  O8, famr        | Familiar with area of offense occurrence

Predicted Case vs. Actual Case

  Variable         | Actual | K2’ Profile | Frequency Profile
  Young            | A      | A (0.813)   | A (0.805)
  Theft            | A      | A (0.75)    | A (0.54)
  Fraud            | P      | A (0.76)    | A (0.67)
  Burglary         | A      | A (1)       | A (0.67)
  Relationship     | A      | A (1)       | A (0.64)
  Unemployed       | P      | P (0.79)    | P (0.52)
  Male             | A      | A (1)       | P (0.9)
  Familiar w/ area | P      | P (0.91)    | P (0.86)

Predicted Case vs. Actual Case

  Variable         | Actual | K2’ Profile | Frequency Profile
  Young            | A      | A (0.83)    | A (0.81)
  Theft            | P      | P (0.52)    | A (0.54)
  Fraud            | A      | A (0.67)    | A (0.67)
  Burglary         | P      | A (0.54)    | A (0.67)
  Relationship     | A      | A (0.57)    | A (0.64)
  Unemployed       | P      | P (0.53)    | P (0.52)
  Male             | P      | P (0.82)    | P (0.9)
  Familiar w/ area | P      | P (0.86)    | P (0.86)

Predicted Case vs. Actual Case

  Variable         | Actual | K2’ Profile | Frequency Profile
  Young            | A      | A (0.83)    | A (0.81)
  Theft            | A      | P (0.97)    | A (0.54)
  Fraud            | P      | A (0.67)    | A (0.67)
  Burglary         | A      | P (0.60)    | A (0.67)
  Relationship     | P      | A (0.59)    | A (0.64)
  Unemployed       | A      | P (0.61)    | P (0.52)
  Male             | A      | P (1)       | P (0.9)
  Familiar w/ area | P      | P (0.87)    | P (0.86)

Predicted Case vs. Actual Case

  Variable         | Actual | K2’ Profile | Frequency Profile
  Young            | A      | A (0.93)    | A (0.81)
  Theft            | A      | A (0.75)    | A (0.54)
  Fraud            | P      | P (0.73)    | A (0.67)
  Burglary         | A      | A (0.82)    | A (0.67)
  Relationship     | P      | P (0.56)    | A (0.64)
  Unemployed       | A      | A (0.87)    | P (0.52)
  Male             | P      | P (0.89)    | P (0.9)
  Familiar w/ area | P      | P (0.78)    | P (0.86)

Slice of K2’ DAG

Evidence: pen (penetration), hid (face hidden), blnd (blindfolded), blnt (blunt instrument), suff (suffocation)
CP: yoff (young offender), thft (theft), frd (fraud), brg (burglary), rlt (relationship w/ victim), unmp (unemployed), famr (familiar w/ area)

[Figure: slice of the learned DAG over the CP nodes yoff, thft, frd, brg, unmp, rlt, male, famr and the evidence nodes pen, suff, blnd, hid, blnt]

Conclusions
• Due to the absence of ZMP variables, the K2’ structural learning algorithm requires fewer cases than K2
• A benefit of a BN model over the naive frequency approach to acquiring a CP is the range of confidence levels the BN model produces in response to the evidence
• Because all of the variables are binary, the frequency approach performs better than it would if the variables had many states

Further Research
• Develop a search algorithm that increases performance for BNs
• Incorporate Salfati's expressive/instrumental dichotomy to supervise training of a BN model
• Apply the method to other fields
• Combine NN and BN methods to improve model performance

Neural Network Research
• Neural networks (NN) implemented similarly to the BN

  Algorithm                 | K2’ | NN
  Nodes predicted correctly | 780 | 739
  Overall PA (%)            | 79% | 75%

Acknowledgments
• Special thanks to:
  – Dr. Silvia Ferrari, advisor
  – Dr. Gabrielle Salfati, John Jay College of Criminal Justice
  – Dr. Marco Strano, President of the International Crime Analysis Association (ICAA)
  – My Masters Committee

April Fools' Day Origin
• April 1 was originally New Year's Day in France
• In 1582, Pope Gregory XIII decreed that the Gregorian calendar was to replace the Julian calendar
• January 1 is New Year's Day according to the new calendar
• Those who refused to accept the new calendar were "April fools"

Questions?

Laboratory for Intelligent Systems & Control
Mechanical Engineering
Duke University
