Support Vector Machines for Crime

Estimating Incarcerations through Support Vector Machines 

The utilization of Support Vector Machines (SVMs) for this particular undertaking presents numerous potential advantages. SVMs are deemed highly appropriate for scenarios where the number of features is high relative to the number of samples, as is frequently observed in datasets pertaining to criminal activities. Furthermore, with non-linear kernels, SVMs can discern intricate non-linear associations between the input features and the outcome variable, a crucial aspect in forecasting arrests in the Chicago crime dataset.

Through the anticipation of the probability of an apprehension for a particular offense, law enforcement entities can enhance their resource allocation to deter criminal activity and promote the welfare of the community. If an SVM model predicts a high probability of arrest for a specific type of crime in a particular location, law enforcement agencies may opt to augment patrols or other preventive measures in that region.

The Application

Data Preparation 

Similar to the other supervised techniques used in this project, performing SVM in Python requires numerical data. pd.factorize is used to encode each categorical value as a unique numerical representation (a sketch follows the feature list below).

In contrast to the decision tree model, for SVM the features are again selectively chosen to yield better performance. The chosen features are listed below: 

Features - dayofweek, Community Area, Location Description, Primary Type, SocioEconomic-Status, Block

Target - Arrest
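As a minimal sketch of the encoding and feature-selection step: the column names below follow the feature list above, but the miniature DataFrame and its values are invented stand-ins for the actual Chicago crime data.

```python
import pandas as pd

# Hypothetical mini-frame standing in for the Chicago crime data;
# column names follow the feature list above, values are invented.
df = pd.DataFrame({
    "dayofweek": ["Monday", "Friday", "Monday"],
    "Community Area": [25, 8, 25],
    "Location Description": ["STREET", "RESIDENCE", "STREET"],
    "Primary Type": ["THEFT", "BATTERY", "THEFT"],
    "SocioEconomic-Status": ["Low", "High", "Low"],
    "Block": ["001XX W MAIN", "045XX N OAK", "001XX W MAIN"],
    "Arrest": [True, False, True],
})

features = ["dayofweek", "Community Area", "Location Description",
            "Primary Type", "SocioEconomic-Status", "Block"]

# pd.factorize assigns each unique category a distinct integer code.
for col in features:
    df[col], _ = pd.factorize(df[col])

X = df[features]
y = df["Arrest"].astype(int)
```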

After selecting the feature data, and considering the enormous class imbalance intrinsic to the dataset, a combination of SMOTE and undersampling is used. SMOTE (Synthetic Minority Over-sampling Technique) and undersampling are two commonly used techniques in machine learning for handling imbalanced datasets, where one class is significantly underrepresented compared to the other. SMOTE oversamples the minority class by creating synthetic examples, while undersampling reduces the majority class by randomly removing examples.

"SMOTE undersampling" combines these two techniques: the imbalanced dataset is first undersampled to reduce the majority class, and SMOTE is then applied to oversample the minority class. This helps balance the dataset while ensuring that the synthetic examples generated by SMOTE are not overwhelmed by the majority class. Additionally, for faster computation, the data is scaled using a standard scaling procedure (zero mean, unit variance).
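A minimal sketch of this resampling-plus-scaling step using imbalanced-learn follows; the sampling ratios, random seed, and synthetic stand-in data are assumptions for illustration, not the report's actual settings.

```python
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler

# Stand-in for the encoded crime data: a 90/10 imbalanced binary problem.
X, y = make_classification(n_samples=10_000, n_features=6,
                           weights=[0.9, 0.1], random_state=42)

# Step 1 (as described above): trim the majority class until the
# minority-to-majority ratio reaches an assumed 0.5.
under = RandomUnderSampler(sampling_strategy=0.5, random_state=42)
X_res, y_res = under.fit_resample(X, y)

# Step 2: SMOTE synthesizes minority examples until the classes balance.
smote = SMOTE(sampling_strategy=1.0, random_state=42)
X_res, y_res = smote.fit_resample(X_res, y_res)

# Standard scaling (zero mean, unit variance) speeds up SVM training.
X_scaled = StandardScaler().fit_transform(X_res)
```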

Sample Illustrations of Data before and after transformation.

Original Data set with Feature Columns

Feature Data Scaled and Downsampled

Modeling and Evaluating Support Vector Machine

SVC(kernel='linear'), C ∈ {0.1, 1, 5, 10}

    C      Train Accuracy    Test Accuracy
    0.1    0.7486            0.7393
    1      0.7493            0.7382
    5      0.7492            0.7382
    10     0.7492            0.7382

SVC(kernel='rbf'), C ∈ {0.1, 1, 5, 10}

    C      Train Accuracy    Test Accuracy
    0.1    0.7917            0.7985
    1      0.8147            0.8145
    5      0.8322            0.8228
    10     0.8432            0.8235

SVC(kernel='poly'), C ∈ {0.1, 1, 5, 10}

    C      Train Accuracy    Test Accuracy
    0.1    0.7574            0.7535
    1      0.7726            0.7675
    5      0.7757            0.7692
    10     0.7766            0.7708

(A full classification report, with per-class precision, recall, and F1 score, accompanies each configuration in the original figures.)
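The experiments above can be reproduced in outline with a loop over kernels and C values. This is a sketch continuing from the resampling/scaling code earlier (X_scaled, y_res); the split ratio and random seed are assumptions, not the report's actual settings.

```python
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report

# X_scaled, y_res come from the resampling/scaling sketch above.
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y_res, test_size=0.2, random_state=42)

for kernel in ["linear", "rbf", "poly"]:
    for C in [0.1, 1, 5, 10]:
        clf = SVC(kernel=kernel, C=C).fit(X_train, y_train)
        train_acc = accuracy_score(y_train, clf.predict(X_train))
        test_acc = accuracy_score(y_test, clf.predict(X_test))
        print(f"{kernel:>6}  C={C:<4}  "
              f"Train Accuracy: {train_acc:.4f}  Test Accuracy: {test_acc:.4f}")
        # Per-class precision, recall, and F1, as in the report's figures.
        print(classification_report(y_test, clf.predict(X_test)))
```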

Results, Discussion and Comparison

Mirroring how the models were trained, the comparison is organized by kernel type and the inferences drawn from each.

Linear Kernel 

The train and test accuracy scores for each model are first computed. For all the models, the train accuracy is found to be around 75%, while the test accuracy is slightly lower, at roughly 74%. This shows that the model performs consistently over a range of C values and is not overfitting the training set.

The precision, recall, and F1 score for both true and false classes are then shown in the classification report. All four models have virtually the same precision, recall, and F1 score for the true and false classes, which suggests that the model's classification performance is not considerably impacted by the choice of C.

In conclusion, the results indicate that the performance of the linear-kernel SVC is not significantly impacted by the choice of C. To improve the model, it is still advisable to carry out a more thorough examination, such as hyperparameter tuning.
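For instance, such a tuning pass could be sketched with scikit-learn's GridSearchCV; the parameter grid below is illustrative, not the one used in the report, and X_train, y_train are taken from the training sketch above.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Illustrative grid; gamma is ignored by the linear kernel but harmless.
param_grid = {
    "kernel": ["linear", "rbf", "poly"],
    "C": [0.1, 1, 5, 10],
    "gamma": ["scale", "auto"],
}
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy", n_jobs=-1)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```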

Radial Basis Function as a Kernel

At a regularization strength of C=0.1, the training set exhibits a classification accuracy of 79.17%, while the test set demonstrates a slightly higher accuracy of 79.85%. The precision, recall, and F1 score for the false class sit at approximately 76%, whereas the true class reaches approximately 83%. This suggests that the model performs satisfactorily at this value of C.

At C=1, the training accuracy exhibits a marginal increase to 81.47%, while the testing accuracy stands at 81.45%. The evaluation metrics of precision, recall, and F1 score have been computed for the false and true classes, revealing that the former exhibits a performance of approximately 78%, whereas the latter demonstrates a precision, recall, and F1 score of roughly 85%. The findings indicate a marginal enhancement in the model's efficacy with the implementation of this particular C value.

At the value of C=5, the accuracy of the train set has exhibited a notable improvement, reaching 83.22%. Similarly, the accuracy of the test set has also increased, to 82.28%. The precision, recall, and F1 score for the negative class exhibit a value of approximately 79%, whereas for the positive class they reach approximately 86%. These findings indicate that the model's efficacy is enhanced further at this value of C.

At the value of C equal to 10, the train accuracy attains its apex at 84.32%, while the test accuracy registers a slightly lower value of 82.35%. The evaluation metrics of precision, recall, and F1 score for the negative class exhibit a value of approximately 79%, whereas for the positive class, the precision, recall, and F1 score manifest a value of roughly 86%. The findings indicate that the model's optimal performance is achieved when utilizing this particular value of C, as evidenced by its superior train accuracy. However, it is noteworthy that the corresponding test accuracy is marginally lower than that obtained with the preceding value of C.

In summation, one can deduce that the efficacy of the Support Vector Classification (SVC) model utilizing a Radial Basis Function (RBF) kernel is contingent upon the selection of the C parameter. The empirical results indicate that the model's performance improves as C increases from 0.1 to 5, while its tendency to overfit the data may escalate as C grows further. Hence, the C value should be tuned for the particular problem and dataset to get the best out of the model.

Polynomial Kernel

The findings reveal that the training accuracy of the model rises steadily with increasing values of C. The corresponding improvement in test accuracy, however, is merely marginal: the disparity between C=0.1 and C=10 is at most 0.017. This suggests that augmenting the value of C beyond a certain threshold may result in overfitting the model.

The classification report reveals that the precision, recall, and F1 score for the True class exhibit a consistent superiority over their False class counterparts, irrespective of the C values. This implies that the model excels in detecting the True class in comparison to the False class.

Moreover, it is evident that the precision, recall, and F1 score for the False class exhibit a marginal increase as C grows, whereas these metrics remain relatively constant for the True class. This suggests that increasing C may improve the model's capacity to accurately discern instances of the False class.

In summation, it can be inferred that the utilization of a polynomial kernel in conjunction with an SVC model yields a moderate level of efficacy when applied to this particular dataset. The highest level of accuracy achieved during testing was 0.771, with a superior ability to correctly identify instances of the True class as opposed to the False class.



A Technical Conclusion and Final Verdict about the Optimal Model Choice.

Across all three kernel classifications, the Support Vector Machine (SVM) model undergoes training utilizing four distinct values of C. The regularization parameter C determines the balance between minimizing the training error and maximizing the margin, thereby controlling model complexity. The assessment of each model's efficacy is conducted through a comprehensive evaluation of its training accuracy, testing accuracy, and classification report.
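For reference, the role of C can be seen in the standard textbook soft-margin SVM objective (this is the general formulation, not something specific to this project): a larger C penalizes training mistakes more heavily, at the cost of a narrower margin.

$$\min_{w,\,b,\,\xi}\;\; \frac{1}{2}\lVert w \rVert^2 \;+\; C \sum_{i=1}^{n} \xi_i \qquad \text{subject to} \qquad y_i\left(w^\top x_i + b\right) \ge 1 - \xi_i, \;\; \xi_i \ge 0.$$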

The model's performance remains relatively stable across all C values when utilizing the linear kernel. The training accuracy hovers around 75% and the testing accuracy around 74%, and the classification report reveals that the precision, recall, and F1 scores for both true and false classes are virtually identical across the four models.

Conversely, in the case of the radial basis function (RBF) kernel, augmenting the value of C enhances the efficacy of the model. As C increases from 0.1 to 10, the model's training accuracy exhibits a notable improvement from 79% to 84%. Similarly, the testing accuracy also experiences a discernible enhancement, rising from 79.8% to 82.3%. Furthermore, the classification report exhibits an enhancement in precision, recall, and F1 scores for both the true and false classes.

The empirical findings suggest that augmenting the value of C for the polynomial kernel yields only a marginal enhancement in the model's performance. The training accuracy ranges from roughly 75.7% to 77.7%, while the testing accuracy ranges from 75.3% to 77.1% across the C values. The classification report reveals that the false class exhibits lower precision, recall, and F1 scores, while the true class displays higher ones.

To summarize, our findings indicate that the radial basis function (RBF) kernel outperforms both the linear and polynomial kernels, and that augmenting the value of C enhances the efficacy of the RBF kernel. The polynomial kernel, by contrast, improves only marginally as C grows.


Conclusion

The application of Support Vector Machines (SVMs) can prove to be a valuable asset in the examination of crime datasets. Through the utilization of a Support Vector Machine (SVM) model trained on a meticulously labeled dataset of criminal activity, discernible patterns can be identified and subsequently utilized to make informed predictions regarding prospective criminal behavior. The model possesses the potential to categorize individuals into high or low risk groups, taking into account their demographic and behavioral attributes. The utilization of this data by law enforcement agencies can facilitate the efficient allocation of resources and proactively deter criminal activities. It is imperative to exercise caution in order to ensure that the model is devoid of any bias and that the input data is a true reflection of the population under scrutiny.  

On the slightly more technical side, even though the best of the different models yielded only 84% accuracy, that is still decent considering that the data on which it was trained is only a fraction/sample of the original dataset, which is enormous. A simple rule of thumb in machine learning is that "more data equals more performance". Given the computational resources to fit a model on such huge data, SVMs could in fact prove efficient at categorizing/predicting incarcerations given the characteristics of a crime.


Miscellaneous - Decision Boundary

Decision boundaries are a key concept in SVMs (Support Vector Machines). In simple terms, a decision boundary is a boundary that separates the data points into different classes based on their features. In SVMs, the goal is to find a decision boundary that separates the data points of different classes with the maximum possible margin while minimizing misclassification.

To plot a decision boundary, in theory, we need two feature columns representing a particular class/target. Since the Chicago Crime dataset, with its many features, is unsuitable for plotting a two-dimensional decision boundary, I created a simulated dataset to show the distinction between the kernels and their decision boundaries.

The plot on the right showcases SVMs with four different kernels, only two of which fit the data correctly, creating a clean decision boundary.

Decision boundaries for four different Kernels
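A minimal sketch of how such a four-panel plot could be produced: make_moons stands in for the simulated dataset, and the four kernels are assumed to be linear, poly, rbf, and sigmoid (the report does not name them).

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.inspection import DecisionBoundaryDisplay  # scikit-learn >= 1.1
from sklearn.svm import SVC

# Simulated two-feature dataset standing in for the one used in the report.
X, y = make_moons(n_samples=300, noise=0.2, random_state=42)

fig, axes = plt.subplots(2, 2, figsize=(10, 8))
for ax, kernel in zip(axes.ravel(), ["linear", "poly", "rbf", "sigmoid"]):
    clf = SVC(kernel=kernel, C=1).fit(X, y)
    # Shade each kernel's predicted regions, then overlay the data points.
    DecisionBoundaryDisplay.from_estimator(clf, X, ax=ax, alpha=0.4)
    ax.scatter(X[:, 0], X[:, 1], c=y, edgecolor="k", s=20)
    ax.set_title(f"kernel = {kernel!r}")
plt.tight_layout()
plt.show()
```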

Source Code