Exploring and Predicting Severe Road Traffic Crashes with Machine Learning Models

Richard Wen
rwen@ryerson.ca

Introduction

Traffic Collisions Worldwide

  • 3700+ traffic collision deaths per day
  • Over a million lives lost every year

Vision Zero

  • Road safety policy for designing safer transportation
  • ~22% reduction in traffic deaths

Data as a Hint

  • Analyze and evaluate effects of engineering and policy changes
  • Gives us an idea of whether certain designs or policies work
  • Massive amounts of open data available

Experiment Objectives

  1. Explore and gain insight into severe road collisions with open data
  2. Test some basic machine learning models to predict the collisions
  3. Evaluate how changes to model inputs affect the predicted collisions

Data Exploration

Datasets

Killed or Seriously Injured (KSI) Collisions

  • Police reports of collisions attended by an officer
  • Only major (admitted to hospital) or fatal injuries
  • Each row is one person involved in a collision
X Y Index_ ACCNUM YEAR DATE TIME Hour STREET1 STREET2 OFFSET ROAD_CLASS District WardNum WardNum_X WardNum_Y Division Division_X Division_Y LATITUDE LONGITUDE LOCCOORD ACCLOC TRAFFCTL VISIBILITY LIGHT RDSFCOND ACCLASS IMPACTYPE INVTYPE INVAGE INJURY FATAL_NO INITDIR VEHTYPE MANOEUVER DRIVACT DRIVCOND PEDTYPE PEDACT PEDCOND CYCLISTYPE CYCACT CYCCOND PEDESTRIAN CYCLIST AUTOMOBILE MOTORCYCLE TRUCK TRSN_CITY_ EMERG_VEH PASSENGER SPEEDING AG_DRIV REDLIGHT ALCOHOL DISABILITY Hood_ID Neighbourh ObjectId
0 -79.412438 43.767462 80221198 4003162994 2014 2014-10-24T04:00:00.000Z 2315 23 YONGE ST HILLCREST AVE Major Arterial North York 18 0 0 32 0 0 43.767462 -79.412438 Intersection Non Intersection No Control Clear Dark, artificial Dry Non-Fatal Injury Sideswipe Passenger unknown None 0 Yes Yes Yes 51 Willowdale East (51) 12001
1 -79.516246 43.718318 80565670 6001093797 2016 2016-06-22T04:00:00.000Z 2315 23 120 BEVERLY HILLS DR 65 m South of Collector Etobicoke York 7 0 0 31 0 0 43.718318 -79.516246 Mid-Block Non Intersection No Control Clear Dark, artificial Dry Non-Fatal Injury Pedestrian Collisions Driver 25 to 29 None 0 South Automobile, Station Wagon Going Ahead Driving Properly Normal Yes Yes Yes 26 Downsview-Roding-CFB (26) 12002
2 -79.516246 43.718318 80565671 6001093797 2016 2016-06-22T04:00:00.000Z 2315 23 120 BEVERLY HILLS DR 65 m South of Collector Etobicoke York 7 0 0 31 0 0 43.718318 -79.516246 Mid-Block Non Intersection No Control Clear Dark, artificial Dry Non-Fatal Injury Pedestrian Collisions Passenger 30 to 34 None 0 Yes Yes Yes 26 Downsview-Roding-CFB (26) 12003
3 -79.516246 43.718318 80565672 6001093797 2016 2016-06-22T04:00:00.000Z 2315 23 120 BEVERLY HILLS DR 65 m South of Collector Etobicoke York 7 0 0 31 0 0 43.718318 -79.516246 Mid-Block Non Intersection No Control Clear Dark, artificial Dry Non-Fatal Injury Pedestrian Collisions Pedestrian 10 to 14 Major 0 East Vehicle hits the pedestrian walking or running... Crossing, no Traffic Control Inattentive Yes Yes Yes 26 Downsview-Roding-CFB (26) 12004
4 -79.374309 43.662909 80632379 6002153175 2016 2016-12-04T05:00:00.000Z 2315 23 CARLTON STREET HOMEWOOD AVENUE Minor Arterial Toronto and East York 13 0 0 51 0 0 43.662909 -79.374309 Intersection At Intersection No Control Rain Dark, artificial Wet Non-Fatal Injury Pedestrian Collisions Driver 75 to 79 None 0 East Automobile, Station Wagon Turning Left Failed to Yield Right of Way Inattentive Yes Yes Yes 73 Moss Park (73) 12005

KSI Data Cleaning

The following data cleaning was applied to the KSI data:

  • Convert Coordinated Universal Time (UTC) strings to date objects
  • Recode Yes/No columns: Yes to 1, anything else to 0
  • Aggregate by ACCNUM (the unique collision identifier)
  • Convert to geospatial points using LONGITUDE and LATITUDE
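A minimal pandas sketch of these cleaning steps, assuming the raw KSI column names shown above. The flag list is a hypothetical subset, and the aggregation rules (first value for collision-level fields, max for 0/1 flags, sum for the _COUNT columns) are assumptions rather than the original notebook's code. The geometry step is left out here; with geopandas it could be added using geopandas.points_from_xy on LONGITUDE and LATITUDE.

```python
import pandas as pd

# Hypothetical subset of the Yes/blank flag columns in the raw KSI data
FLAG_COLUMNS = ["PEDESTRIAN", "CYCLIST", "AUTOMOBILE", "SPEEDING", "ALCOHOL"]


def clean_ksi(df: pd.DataFrame) -> pd.DataFrame:
    """Collapse person-level KSI rows into one row per collision (ACCNUM)."""
    df = df.copy()
    # Parse UTC timestamps such as "2014-10-24T04:00:00.000Z"
    df["DATE"] = pd.to_datetime(df["DATE"], utc=True)
    # Recode "Yes" to 1 and anything else (blank or missing) to 0
    flags = [c for c in FLAG_COLUMNS if c in df.columns]
    df[flags] = (df[flags] == "Yes").astype(int)
    # Assumed rules: first value for collision-level fields, max for 0/1 flags
    rules = {"LATITUDE": "first", "LONGITUDE": "first", "DATE": "first"}
    rules.update({c: "max" for c in flags})
    collisions = df.groupby("ACCNUM").agg(rules)
    # Number of people in each collision with the flag set, e.g. SPEEDING_COUNT
    counts = df.groupby("ACCNUM")[flags].sum().add_suffix("_COUNT")
    return collisions.join(counts)
```

Keeping both a 0/1 flag and a _COUNT column mirrors the cleaned table, which carries, for example, both SPEEDING and SPEEDING_COUNT.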
ACCNUM LATITUDE LONGITUDE DATE YEAR ROAD_CLASS RDSFCOND TRAFFCTL VISIBILITY LIGHT DISTRICT LOCCOORD IMPACTYPE PEDESTRIAN CYCLIST AUTOMOBILE MOTORCYCLE TRUCK TRSN_CITY_ EMERG_VEH PASSENGER SPEEDING AG_DRIV REDLIGHT ALCOHOL DISABILITY PEDESTRIAN_COUNT CYCLIST_COUNT AUTOMOBILE_COUNT MOTORCYCLE_COUNT TRUCK_COUNT TRSN_CITY_COUNT EMERG_VEH_COUNT PASSENGER_COUNT SPEEDING_COUNT AG_DRIV_COUNT REDLIGHT_COUNT ALCOHOL_COUNT DISABILITY_COUNT geometry
ACCNUM
128407 128407 43.854145 -79.169690 2009-09-01 04:00:00+00:00 2009 Local Dry No Control Clear Dark Scarborough Mid-Block SMV Other 0 0 1 0 0 0 0 1 1 1 0 0 0 0 0 5 0 0 0 0 5 5 5 0 0 0 POINT (-79.16969 43.85415)
977199 977199 43.687510 -79.396922 2013-11-15 05:00:00+00:00 2013 Major Arterial Dry No Control Clear Daylight Toronto and East York Mid-Block Pedestrian Collisions 1 0 1 0 0 1 0 1 0 0 0 0 0 6 0 6 0 0 6 0 6 0 0 0 0 0 POINT (-79.39692 43.68751)
1012986 1012986 43.737645 -79.243690 2008-01-05 05:00:00+00:00 2008 Major Arterial Wet No Control Rain Dark Scarborough Intersection Pedestrian Collisions 1 0 1 0 0 0 0 0 0 0 0 0 0 2 0 2 0 0 0 0 0 0 0 0 0 0 POINT (-79.24369 43.73765)
1012988 1012988 43.684345 -79.564990 2008-01-05 05:00:00+00:00 2008 Local Wet No Control Clear Dark Etobicoke York Mid-Block Pedestrian Collisions 1 0 1 0 0 0 0 0 0 0 0 0 0 2 0 2 0 0 0 0 0 0 0 0 0 0 POINT (-79.56499 43.68435)
1013236 1013236 43.737545 -79.420590 2008-01-07 05:00:00+00:00 2008 Major Arterial Wet Traffic Signal Clear Dark North York Intersection Pedestrian Collisions 1 0 1 0 0 0 0 0 0 0 0 0 0 2 0 2 0 0 0 0 0 0 0 0 0 0 POINT (-79.42059 43.73755)
There are 4405 unique collisions from 2008 to 2018.

Centrelines

  • 2.5 meter positional accuracy
  • Includes streets, rivers, highways, shorelines, trails, and utility corridors
  • Not suitable for precise utility mapping
GEO_ID LFN_ID LF_NAME ADDRESS_L ADDRESS_R OE_FLAG_L OE_FLAG_R LONUML HINUML LONUMR HINUMR FNODE TNODE FCODE FCODE_DESC JURIS_CODE OBJECTID geometry
0 30079678 19155 Waterfront Trl None None N N 0 0 0 0 30079676 30079656 204001 Trail CITY OF TORONTO 189008.0 LINESTRING (-79.54478 43.58583, -79.54478 43.5...
1 30079680 19166 Marie Curtis Park Trl None None N N 0 0 0 0 30079676 30079679 204001 Trail CITY OF TORONTO 189011.0 LINESTRING (-79.54478 43.58583, -79.54483 43.5...
2 30079677 19155 Waterfront Trl None None N N 0 0 0 0 30008708 30079676 204001 Trail CITY OF TORONTO 189009.0 LINESTRING (-79.54454 43.58611, -79.54464 43.5...
3 30082310 10685 Island Rd None None N N 0 0 0 0 30008708 30082309 201600 Other PRIVATE 191750.0 LINESTRING (-79.54454 43.58611, -79.54433 43.5...
4 30008940 19155 Waterfront Trl None None N N 0 0 0 0 30008711 30008708 204001 Trail CITY OF TORONTO 56495.0 LINESTRING (-79.54386 43.58668, -79.54400 43.5...
There are 69378 centrelines.

Centrelines Data Cleaning

  • Keep FCODEs for roads, busways, laneways, ramps, walkways, and highways
  • Ignore all other FCODEs (railways, waterways, trails, hydro lines, shorelines)

Kept FCODEs
FCODE Description
201100 Highway
201101 Highway Ramp
201200 Major Arterial Road
201201 Major Arterial Road Ramp
201300 Minor Arterial Road
201301 Minor Arterial Road Ramp
201400 Collector Road
201401 Collector Road Ramp
201500 Local Road
201600 Other Road
201601 Other Ramp
201700 Laneways
201800 Pending
201801 Busway
201803 Access Road
204002 Walkway

Ignored FCODEs
FCODE Description
202001 Major Railway
202002 Minor Railway
202003 Railway under construction/proposed
203001 River
203002 Creek/Tributary
204001 Trail
205001 Hydro Line
206001 Major Shoreline
206002 Minor Shoreline (Land locked)
GEO_ID LFN_ID LF_NAME ADDRESS_L ADDRESS_R OE_FLAG_L OE_FLAG_R LONUML HINUML LONUMR HINUMR FNODE TNODE FCODE FCODE_DESC JURIS_CODE OBJECTID geometry
3 30082310 10685 Island Rd None None N N 0 0 0 0 30008708 30082309 201600 Other PRIVATE 191750.0 LINESTRING (-79.54454 43.58611, -79.54433 43.5...
7 30075947 1047 Ansell Ave 25-25 8-30 O E 25 25 8 30 13470675 30075940 201500 Local CITY OF TORONTO 184980.0 LINESTRING (-79.54310 43.59292, -79.54298 43.5...
8 9950476 1962 Lake Shore Blvd W 3795-3815 None O N 3795 3815 0 0 13470681 13470690 201200 Major Arterial CITY OF TORONTO 57.0 LINESTRING (-79.54180 43.59258, -79.54216 43.5...
9 7641209 1629 Fortieth St None 89-107 N O 0 0 89 107 13470713 13470699 201500 Local CITY OF TORONTO 78.0 LINESTRING (-79.54102 43.59092, -79.54139 43.5...
12 9950042 2007 Lloyd George Ave 30-42 29-35 E O 30 42 29 35 13470641 13470616 201500 Local CITY OF TORONTO 110.0 LINESTRING (-79.54414 43.59431, -79.54451 43.5...
There are now 51514 features (removed 17864 from original 69378 features).
The Coordinate Reference System (CRS) is WGS 84 geographic coordinates (EPSG:4326).
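The FCODE filter described above can be sketched as a set-membership test with pandas. The code set is copied from the kept-FCODE list; the function name is hypothetical. Because a geopandas GeoDataFrame subclasses a pandas DataFrame, the same boolean indexing keeps the geometry column intact.

```python
import pandas as pd

# FCODEs kept for analysis (roads, ramps, laneways, busways, walkways, highways)
KEEP_FCODES = {
    201100, 201101, 201200, 201201, 201300, 201301, 201400, 201401,
    201500, 201600, 201601, 201700, 201800, 201801, 201803, 204002,
}


def filter_road_centrelines(centrelines: pd.DataFrame) -> pd.DataFrame:
    """Keep rows whose FCODE is in KEEP_FCODES; also works on a GeoDataFrame."""
    return centrelines.loc[centrelines["FCODE"].isin(KEEP_FCODES)]
```

The number of removed features is then len(original) - len(filtered).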

Join Centrelines with KSI

  1. Re-project centrelines and KSI data to UTM zone 17N (units in meters)
  2. Buffer each centreline by 2.5 meters (its stated positional accuracy)
  3. Spatially join centrelines with KSI data (left join, intersects predicate)
  4. Count the collisions for each centreline feature
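The buffer-and-join steps can be approximated without GIS libraries: a point lies inside a round-capped buffer of distance d around a polyline when its distance to the polyline is at most d. The sketch below swaps that distance test in for geopandas' buffer and sjoin, so it is a stand-in, not the original method. It assumes coordinates are already projected to meters (e.g., UTM zone 17N) and that every line has at least two vertices; the function names are hypothetical. It compares every line with every collision, so a spatial index would be needed at the scale of 51514 centrelines.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to segment ab (projected units)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:  # degenerate segment: distance to its single point
        return math.hypot(px - ax, py - ay)
    # Position of p's projection along the segment, clamped to its endpoints
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def count_collisions(lines: Dict[int, Sequence[Point]],
                     collisions: List[Point],
                     buffer_m: float = 2.5) -> Dict[int, int]:
    """Count collisions within buffer_m of each line, keyed by GEO_ID."""
    counts = {}
    for geo_id, coords in lines.items():
        segments = list(zip(coords, coords[1:]))
        counts[geo_id] = sum(
            1 for p in collisions
            if min(point_segment_distance(p, a, b) for a, b in segments) <= buffer_m
        )
    return counts


lines = {1: [(0.0, 0.0), (10.0, 0.0)], 2: [(10.0, 0.0), (10.0, 10.0)]}
collisions = [(10.0, 1.0), (5.0, 2.0), (5.0, 3.0), (20.0, 20.0)]
print(count_collisions(lines, collisions))  # {1: 2, 2: 1}
```

In this toy input the point (10, 1) sits next to the shared endpoint of both lines, so it is counted for each of them. This is how joined totals can exceed the number of unique collisions.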
GEO_ID COLLISIONS FCODE FCODE_DESC geometry DATE YEAR ROAD_CLASS RDSFCOND TRAFFCTL VISIBILITY LIGHT DISTRICT LOCCOORD IMPACTYPE PEDESTRIAN CYCLIST AUTOMOBILE MOTORCYCLE TRUCK TRSN_CITY_ EMERG_VEH PASSENGER SPEEDING AG_DRIV REDLIGHT ALCOHOL DISABILITY PEDESTRIAN_COUNT CYCLIST_COUNT AUTOMOBILE_COUNT MOTORCYCLE_COUNT TRUCK_COUNT TRSN_CITY_COUNT EMERG_VEH_COUNT PASSENGER_COUNT SPEEDING_COUNT AG_DRIV_COUNT REDLIGHT_COUNT ALCOHOL_COUNT DISABILITY_COUNT
GEO_ID
159 159 1 201500 Local POLYGON ((636804.503 4841695.436, 636804.440 4... 2013-05-18 04:00:00+00:00 2013.0 Local Dry Stop Sign Clear Dark Toronto and East York Intersection Rear End 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0
214 214 1 201400 Collector POLYGON ((632096.486 4841579.170, 632096.424 4... 2014-04-09 04:00:00+00:00 2014.0 Major Arterial Dry Traffic Signal Clear Daylight North York Intersection Turning Movement 0.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 1.0 1.0 1.0 0.0 0.0 0.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 1.0 1.0 1.0 0.0 0.0
241 241 1 201400 Collector POLYGON ((630802.680 4841390.617, 630802.924 4... 2016-07-13 04:00:00+00:00 2016.0 Major Arterial Dry Traffic Signal Clear Daylight North York Intersection Turning Movement 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0
242 242 1 201200 Major Arterial POLYGON ((630660.818 4841616.343, 630660.796 4... 2016-07-13 04:00:00+00:00 2016.0 Major Arterial Dry Traffic Signal Clear Daylight North York Intersection Turning Movement 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0
267 267 1 201300 Minor Arterial POLYGON ((632155.890 4841394.890, 632126.740 4... 2014-04-09 04:00:00+00:00 2014.0 Major Arterial Dry Traffic Signal Clear Daylight North York Intersection Turning Movement 0.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 1.0 1.0 1.0 0.0 0.0 0.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 1.0 1.0 1.0 0.0 0.0
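A total of 6047 collisions were joined to the centrelines. This exceeds the 4405 unique collisions because a single collision can intersect the buffers of several centrelines (e.g., at an intersection).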

Calculate Centreline Geometric Measures

  • Line length
  • Number of vertices
  • Sinuosity (how curved a road is)
    • $\text{Sinuosity} = \frac{\text{actual line length}}{\text{straight-line distance between endpoints}}$
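A small sketch of the three measures for one polyline, assuming its vertices are already in projected (meter) coordinates. The function name is hypothetical, and returning NaN for closed loops (identical endpoints, where the formula divides by zero) is a choice made here, not something stated in the original analysis.

```python
import math
from typing import Sequence, Tuple


def line_measures(coords: Sequence[Tuple[float, float]]) -> Tuple[float, int, float]:
    """Return (length, vertex count, sinuosity) for a projected polyline."""
    # Actual length: sum of the lengths of consecutive segments
    length = sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))
    # Straight-line distance between the first and last vertex
    chord = math.dist(coords[0], coords[-1])
    sinuosity = length / chord if chord > 0 else math.nan
    return length, len(coords), sinuosity


print(line_measures([(0.0, 0.0), (3.0, 4.0), (6.0, 0.0)]))
# (10.0, 3, 1.6666666666666667)
```

A straight two-vertex line gives a sinuosity of exactly 1, as in the row for GEO_ID 120.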
line_length line_vertices line_sinuosity
GEO_ID
108 168.519392 22 1.032565
117 38.993662 6 1.047115
118 90.234621 5 1.004596
120 93.130354 2 1.000000
121 116.666287 3 1.000003

Modelling

Road Collision Prediction

Predict the number of road collisions per centreline feature $i$ given centreline characteristics and KSI variables $X_1 \dots X_N$ (categorical variables dummy coded):

$$ Collisions_{i} = f(X_1 \dots X_N) $$
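The dummy-coded column names below (for example FCODE_DESC_Local) match the default naming of pandas' get_dummies, "<column>_<category>". The following sketch uses a tiny hypothetical table to show how that coding works; that the original notebook used get_dummies is an assumption. A missing category (a centreline with no joined collision) becomes all zeros, as in the rows shown.

```python
import pandas as pd

# Hypothetical values for two centreline features
features = pd.DataFrame(
    {
        "line_length": [168.52, 38.99],
        "FCODE_DESC": ["Local", "Collector"],
        "TRAFFCTL": ["Traffic Signal", None],  # None: no collision joined
    },
    index=pd.Index([1, 2], name="GEO_ID"),
)

# One 0/1 column per category; numeric columns pass through unchanged
X = pd.get_dummies(features, columns=["FCODE_DESC", "TRAFFCTL"], dtype=float)
print(X.columns.tolist())
# ['line_length', 'FCODE_DESC_Collector', 'FCODE_DESC_Local', 'TRAFFCTL_Traffic Signal']
```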
line_length line_vertices line_sinuosity FCODE_DESC_Access Road FCODE_DESC_Busway FCODE_DESC_Collector FCODE_DESC_Collector Ramp FCODE_DESC_Expressway FCODE_DESC_Expressway Ramp FCODE_DESC_Laneway FCODE_DESC_Local FCODE_DESC_Major Arterial FCODE_DESC_Major Arterial Ramp FCODE_DESC_Minor Arterial FCODE_DESC_Minor Arterial Ramp FCODE_DESC_Other FCODE_DESC_Other Ramp FCODE_DESC_Pending FCODE_DESC_Walkway ROAD_CLASS_ ROAD_CLASS_Collector ROAD_CLASS_Expressway ROAD_CLASS_Laneway ROAD_CLASS_Local ROAD_CLASS_Major Arterial ROAD_CLASS_Minor Arterial ROAD_CLASS_Other ROAD_CLASS_Pending RDSFCOND_ RDSFCOND_Dry RDSFCOND_Ice RDSFCOND_Loose Sand or Gravel RDSFCOND_Loose Snow RDSFCOND_Other RDSFCOND_Packed Snow RDSFCOND_Slush RDSFCOND_Wet TRAFFCTL_ TRAFFCTL_No Control TRAFFCTL_Pedestrian Crossover TRAFFCTL_Police Control TRAFFCTL_Stop Sign TRAFFCTL_Streetcar (Stop for) TRAFFCTL_Traffic Controller TRAFFCTL_Traffic Signal TRAFFCTL_Yield Sign VISIBILITY_ VISIBILITY_Clear VISIBILITY_Drifting Snow VISIBILITY_Fog, Mist, Smoke, Dust VISIBILITY_Freezing Rain VISIBILITY_Other VISIBILITY_Rain VISIBILITY_Snow VISIBILITY_Strong wind LIGHT_Dark LIGHT_Dark, artificial LIGHT_Dawn LIGHT_Dawn, artificial LIGHT_Daylight LIGHT_Daylight, artificial LIGHT_Dusk LIGHT_Dusk, artificial LIGHT_Other DISTRICT_ DISTRICT_Etobicoke York DISTRICT_North York DISTRICT_Scarborough DISTRICT_Toronto East York DISTRICT_Toronto and East York LOCCOORD_ LOCCOORD_Intersection LOCCOORD_Mid-Block LOCCOORD_Park, Private Property, Public Lane
GEO_ID
108 168.519394 22.0 1.032565 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
117 38.993660 6.0 1.047115 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
118 90.234619 5.0 1.004596 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
120 93.130356 2.0 1.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
121 116.666290 3.0 1.000003 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Total of 74 variables.

Variable Selection

  • Random Forest Regressor used to select the most important variables
  • Selection based on mean variable importance
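One way to implement mean-importance selection is scikit-learn's SelectFromModel with threshold="mean", which keeps variables whose importance is greater than or equal to the mean importance. Whether the original notebook used this class and these settings is an assumption, and the data below is synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel

# Synthetic data: 5 candidate variables, only columns 0 and 1 drive the target
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = 2 * X[:, 0] + 2 * X[:, 1] + rng.normal(scale=0.1, size=500)

# threshold="mean" keeps variables whose importance is >= the mean importance
selector = SelectFromModel(
    RandomForestRegressor(n_estimators=100, random_state=0), threshold="mean"
)
selector.fit(X, y)
selected = np.flatnonzero(selector.get_support())
print(selected)  # [0 1]
```

On a DataFrame, X.columns[selector.get_support()] would give the names of the kept variables.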
line_length line_vertices line_sinuosity TRAFFCTL_No Control TRAFFCTL_Traffic Signal VISIBILITY_Clear LOCCOORD_Intersection LOCCOORD_Mid-Block
GEO_ID
108 168.519394 22.0 1.032565 0.0 0.0 0.0 0.0 0.0
117 38.993660 6.0 1.047115 0.0 0.0 0.0 0.0 0.0
118 90.234619 5.0 1.004596 0.0 0.0 0.0 0.0 0.0
120 93.130356 2.0 1.000000 0.0 0.0 0.0 0.0 0.0
121 116.666290 3.0 1.000003 0.0 0.0 0.0 0.0 0.0
Total of 8 selected variables.

Machine Learning Models

  • Linear Regression (LR): simple, assumes linear relationships
  • Random Forest Regressor (RFR): ensemble of decision trees
  • Multi-Layer Perceptron Regressor (MLPR): feed-forward neural network trained to minimize a loss function
Mean $R^2$ (LR): 0.737
Mean $R^2$ (RFR): 0.659
Mean $R^2$ (MLPR): 0.731
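A sketch of how mean R^2 scores for the three model types can be obtained with 5-fold cross-validation in scikit-learn. The fold count, hyperparameters, the StandardScaler placed in front of the MLP, and the synthetic stand-in for the 8 selected variables are all assumptions; the original settings are not given here.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 8 selected variables and a linear target
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 8))
coef = np.arange(1, 9, dtype=float) / 4
y = X @ coef + rng.normal(scale=0.5, size=300)

models = {
    "LR": LinearRegression(),
    "RFR": RandomForestRegressor(n_estimators=100, random_state=0),
    # Neural networks train more reliably on standardized inputs
    "MLPR": make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0),
    ),
}

# Mean R^2 across 5 cross-validation folds for each model
mean_r2 = {
    name: cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    for name, model in models.items()
}
for name, score in mean_r2.items():
    print(f"Mean R^2 ({name}): {score:.3f}")
```

Averaging over folds gives a less noisy estimate than a single train/test split, which matters when, as the summary notes, the models can be unstable.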

Predicting Number of Road Collisions

  • Predict collisions for each centreline feature
  • Sum the predictions to compare against the actual total
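A small sketch of the total comparison, assuming a table with one column per model plus an ACTUAL column, like the one below. The function name is hypothetical. Negative predictions (as seen for MLPR) are impossible as collision counts, so the sketch offers an optional clip at zero; that option is an addition here, not part of the original workflow.

```python
import pandas as pd


def total_error(preds: pd.DataFrame, actual_col: str = "ACTUAL",
                clip_negative: bool = False) -> pd.Series:
    """Relative error of each model's summed predictions vs. the actual total."""
    model_preds = preds.drop(columns=actual_col)
    if clip_negative:
        # Optionally floor impossible negative counts at 0 before summing
        model_preds = model_preds.clip(lower=0)
    actual_total = preds[actual_col].sum()  # assumed to be non-zero
    return (model_preds.sum() - actual_total) / actual_total
```

A value of 0 means a model's summed predictions match the actual total; a value of -0.1 means it underestimates the total by 10%.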
LR RFR MLPR ACTUAL
GEO_ID
108 0.002339 0.0 -0.023309 0
117 0.000202 0.0 -0.003689 0
118 0.000756 0.0 -0.016249 0
120 0.000623 0.0 -0.013447 0
121 0.000890 0.0 -0.015374 0

Experiments

What If We...

  • Changed the number of traffic signals?
  • Made roads longer?
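A minimal sketch of such a what-if experiment, assuming a fitted model with a scikit-learn style predict() and the selected-variable table as a DataFrame. The function name and the specific changes are hypothetical. It alters one input column, predicts again, and reports the change in the total predicted collisions.

```python
from typing import Any, Callable

import pandas as pd


def what_if(model: Any, X: pd.DataFrame, column: str,
            change: Callable[[pd.Series], Any]) -> float:
    """Change in total predicted collisions after altering one input column."""
    X_changed = X.copy()  # leave the original inputs untouched
    # change() may return a Series or a scalar (a scalar is broadcast)
    X_changed[column] = change(X_changed[column])
    return float(model.predict(X_changed).sum() - model.predict(X).sum())
```

For example, what_if(model, X, "line_length", lambda s: s * 1.1) makes every road 10% longer, and what_if(model, X, "TRAFFCTL_Traffic Signal", lambda s: 1.0) places a signal on every feature. Changing one dummy on its own can leave related dummies (such as TRAFFCTL_No Control) inconsistent, so a fuller experiment would update them together.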
line_length line_vertices line_sinuosity TRAFFCTL_No Control TRAFFCTL_Traffic Signal VISIBILITY_Clear LOCCOORD_Intersection LOCCOORD_Mid-Block
count 51514.000000 51514.000000 51514.000000 51514.000000 51514.000000 51514.000000 51514.000000 51514.000000
mean 125.588562 6.065555 1.049492 0.031914 0.039620 0.071301 0.069748 0.013084
std 110.451050 10.725196 0.434922 0.175772 0.195067 0.257329 0.254724 0.113635
min 0.732433 2.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
25% 51.638315 2.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000
50% 94.591522 2.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000
75% 163.392254 5.000000 1.003284 0.000000 0.000000 0.000000 0.000000 0.000000
max 1877.720459 254.000000 45.297741 1.000000 1.000000 1.000000 1.000000 1.000000

Conclusion

Summary

  • Severe road collisions are not random and can be partly predicted (mean $R^2$ between 0.66 and 0.74)
  • A temporal effect was observed but not thoroughly examined
  • Models can be unstable; more work is needed on hyperparameter tuning

Future Work

  • Incorporate Temporal Measures and Other Geographic Entities
  • Examine Social Media Data
  • Build Software for Experimenting with All Variables

Richard Wen

PhD Candidate, Geomatics Engineering
rwen@ryerson.ca

More Details at: github.com/rrwen/experiment-predict-toronto-geocrash