Decision Tree for Crime Assessment

How to Leverage the Decision Tree Approach on Crime Data?

Decision trees make predictions by recursively dividing the data into subgroups according to the values of various input attributes, then assigning each subset the dominant class within it. This procedure produces a tree-like structure, with each node denoting a feature and each branch indicating a potential value for that feature. The ability to pinpoint the most significant factors contributing to crime, and to base forecasts on those characteristics, makes decision trees especially valuable for forecasting crime in Chicago. Decision trees might be used, for instance, to forecast the chance of a crime occurring based on factors like location, time of day, weather, and crime type. Decision trees also have the benefit of being simple to interpret, which is beneficial in a high-stakes field like crime prediction.

The structure of a fitted decision tree can be used to identify which factors, and which combinations of factors, are crucial for predicting crime. Law enforcement officers can use this information to focus their efforts on the places and times where crime is most likely to occur, thereby minimizing crime rates. The ability of decision trees to handle a mix of continuous and categorical input variables is especially valuable in the domain of crime prediction, where data such as crime type (categorical) and weather (continuous) are common.
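
As a minimal sketch of how this information can be read off a fitted tree in scikit-learn (the toy data and variable names below are illustrative assumptions, not the study's code):

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Toy stand-in for the crime data: 6 numeric features, binary target.
X, y = make_classification(n_samples=1000, n_features=6, random_state=0)
feature_names = [f"feature_{i}" for i in range(6)]

clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# feature_importances_ measures how much each feature reduced impurity
# across the tree's splits, i.e. which factors the tree relied on most.
importances = pd.Series(clf.feature_importances_, index=feature_names)
print(importances.sort_values(ascending=False))
```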

In this study, decision trees are used to classify whether or not a crime in a given area resulted in an arrest. This application can be critical in determining the conditions or criteria under which an arrest is made. The patterns of variables that lead to an arrest, if determined, can be valuable for taking pre-emptive actions in real-world scenarios.

The Application 

Data Preparation 

Scikit-learn imposes a stringent requirement that the data be numeric in type. Decision trees can naturally operate on categorical data, but programmatically the categorical variables must be encoded into a numeric form before they can be processed. The pd.factorize approach is used to encode the categorical values, representing each category as a unique numeric quantity.

Features are selectively picked, referencing a research study that achieved good results on the same features. The target for classification is the Arrest column, with binary values True or False.

Features -  Month, Day, Description, Location Description, Community Area, Domestic, Primary Type, District, SocioEconomic-Status 

Target - Arrest
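
A minimal sketch of the pd.factorize encoding, using a toy frame in place of the real crime data (the rows below are made up; only the column names come from this write-up):

```python
import pandas as pd

# Toy stand-in for the crime DataFrame; the real one carries the full
# feature set listed above.
df = pd.DataFrame({
    "Primary Type": ["THEFT", "BATTERY", "THEFT", "NARCOTICS"],
    "Location Description": ["STREET", "RESIDENCE", "STREET", "SIDEWALK"],
    "Domestic": [False, True, False, False],
    "Arrest": [False, True, False, True],
})

# pd.factorize maps each distinct categorical value to a unique integer.
for col in ["Primary Type", "Location Description"]:
    df[col], uniques = pd.factorize(df[col])

X = df.drop(columns="Arrest")
y = df["Arrest"].astype(int)  # True/False -> 1/0
print(df)
```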

Class Imbalance in Original Dataset

Balanced Classes - Downsampling
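
The source does not show the balancing code; a minimal sketch of downsampling the majority class with pandas, on toy data, might look like this:

```python
import pandas as pd

# Toy imbalanced frame: far more non-arrests (False) than arrests (True).
df = pd.DataFrame({"Arrest": [False] * 8 + [True] * 2, "feat": range(10)})

majority = df[df["Arrest"] == False]
minority = df[df["Arrest"] == True]

# Downsample the majority class to the size of the minority class,
# then shuffle so the classes are interleaved.
balanced = pd.concat([
    majority.sample(n=len(minority), random_state=42),
    minority,
]).sample(frac=1, random_state=42)

print(balanced["Arrest"].value_counts())  # 2 of each class
```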

Training and Testing Distributions

80% Train set

20% Test set

Following the general convention, the entire dataset is split into two disjoint distributions in the illustrated proportions.
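
A sketch of the 80/20 split with scikit-learn's train_test_split; the toy data, random_state, and stratification are assumptions, not details from the study:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Toy stand-in for the encoded feature matrix and Arrest target.
X, y = make_classification(n_samples=1000, n_features=9, random_state=0)

# 80% train / 20% test; stratify keeps the class balance in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42
)
print(len(X_train), len(X_test))  # 800 200
```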

The training set is used to create the decision tree by recursively partitioning the data based on the values of the features in the dataset. At each node of the tree, the algorithm selects the feature that provides the best split of the data, based on some evaluation metric such as information gain or Gini impurity. The process continues until the data is fully partitioned, or until some stopping criteria are met.

Once the decision tree is built using the training set, we evaluate its performance on the test set. We do this by applying the decision tree to each sample in the test set and comparing the predicted outcome to the true outcome. The accuracy of the model is calculated as the proportion of correctly classified samples in the test set.
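
Continuing from the toy split above, a minimal sketch of this fit-then-evaluate loop (the hyperparameters here are placeholders; the study's actual trees follow below):

```python
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

# Grow the tree on the training split, then score on the held-out test split.
clf = DecisionTreeClassifier(criterion="gini", random_state=0)
clf.fit(X_train, y_train)

train_acc = accuracy_score(y_train, clf.predict(X_train))
test_acc = accuracy_score(y_test, clf.predict(X_test))
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```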

The purpose of using a separate test set is to evaluate the performance of the decision tree on new, unseen data. This helps to ensure that the model has not simply memorized the training set, but rather has learned to generalize to new data. By evaluating the performance on a separate test set, we can estimate how well the decision tree will perform on new data in the future.

Sample Illustrations of Train and Test 

Original Dataset with Feature Columns

Encoded Dataset with Feature Columns

X ~ y 

Features ~ Target

Decision Trees & Performance Analysis

Criterion ~ Gini 

Max Depth 41

Tree 1

Quantified Performance

Train Accuracy ~  98%, Test Accuracy ~ 74%

Classification Report

Tree 1 Visualised ( Pruned Visually until Depth 4 )
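
As a sketch of how Tree 1 and its depth-4 visual pruning can be reproduced with scikit-learn (continuing from the toy split above; only the criterion and max depth come from this write-up, the rest is assumed):

```python
import matplotlib.pyplot as plt
from sklearn.metrics import classification_report
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Tree 1 as described above: Gini criterion, max depth capped at 41.
tree1 = DecisionTreeClassifier(criterion="gini", max_depth=41, random_state=0)
tree1.fit(X_train, y_train)
print(classification_report(y_test, tree1.predict(X_test)))

# "Pruned visually": draw only the top 4 levels of the full tree.
plt.figure(figsize=(16, 8))
plot_tree(tree1, max_depth=4, filled=True, fontsize=8)
plt.show()
```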

Criterion ~ Entropy 

Max Depth 45

Tree 2

Quantified Performance

Train Accuracy ~  98%, Test Accuracy ~ 74%

Classification Report

Tree 2 Visualised ( Pruned Visually until Depth 4 )

Grid Search CV to find the Best Estimator

GridSearchCV is a useful technique in machine learning for finding the optimal combination of hyperparameters for a given algorithm. By exhaustively searching through a specified set of hyperparameters and evaluating the performance of the model using cross-validation, GridSearchCV can help to identify the combination of hyperparameters that yields the best performance for the given task. For decision trees, this can be particularly useful for optimizing the performance of the algorithm and improving the accuracy of the model.
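
A sketch of that search for the decision tree used here, continuing from the toy split above; the grid values and fold count are assumptions, chosen so the grid covers the hyperparameters discussed in this section:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [5, 7, 9, 11, 13],
    "min_samples_leaf": [1, 2, 4, 8],
}

# Exhaustive search over the grid, scored by 5-fold cross-validation on
# the training split only; the test split stays untouched until the end.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid, cv=5, scoring="accuracy",
)
search.fit(X_train, y_train)

print(search.best_params_)           # e.g. entropy, max_depth=9, min_samples_leaf=4
best_tree = search.best_estimator_   # refit on the whole training split
print(best_tree.score(X_test, y_test))
```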

Grid Search Cross Validation Illustration

Criterion ~ Entropy 

Max Depth 9, Min Sample Leaf 4

Tree 3 ( Best Estimator )

Quantified Performance

Train Accuracy ~  79%, Test Accuracy ~ 79%

Classification Report

Tree 3 Visualised ( Pruned Visually until Depth 4 )

Results & Conclusion

Based on the given information, we can conclude that the first decision tree estimator was overfitting the training data, as evidenced by the large gap between the train and test accuracies. Despite high accuracy on the training data, the overcomplicated structure of the tree, with a max depth of 41, was not able to generalize well to new data. The second estimator, with criterion entropy and a higher max depth, did not improve the overall accuracy, but did yield a slight increase in the other evaluation metrics. The best estimator was obtained using Grid Search CV, which identified the optimal combination of hyperparameters (max depth of 9 and min sample leaf of 4). This decision tree struck a good balance between bias and variance, resulting in train and test accuracies of 79% each. Overall, Grid Search CV was able to identify the best parameters to prevent overfitting and optimize the performance of the decision tree algorithm.

Insights and Takeaways

Even though the best estimator tree achieved a reasonable accuracy, its performance is still sub-optimal for a real-world scenario. That said, upon closer observation of the trees it developed, the following inferences can be drawn about the factors the decision tree might have considered.

→ The decision tree might have identified certain areas in Chicago that are more prone to crime than others. By analyzing the patterns in crime data, the decision tree could have identified specific neighborhoods or blocks where crime is more likely to occur. 

→ The decision tree could have found that certain types of crime are more likely to occur at particular times of the day. For instance, the tree might have found that burglaries are more common during the day when people are at work, while violent crimes are more likely to happen at night. 

→ The decision tree could have identified certain demographic factors that are associated with higher crime rates. For example, it might have found that poverty, unemployment, and population density are all factors that contribute to higher crime rates. 

Source Code