Neural Networks for Crime

The Dichotomy

In contrast to the other techniques applied in this project, a dual strategy is used here, taking advantage of how well neural networks perform on two distinct kinds of targets. A neural network is used to classify incarcerations (in a manner similar to the other methods), and a deeper neural network is used to classify the top five types of criminal activity. 

The Application

Data Preparation for Incarceration

Similar to the applications of the other supervised techniques used in this project, numerical data is mandatory in order to train a neural network in Python. pd.factorize is used to encode each categorical value into a uniquely represented numerical value. 
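
As a minimal sketch of this encoding step (the column name and values are illustrative, not the project's exact schema):

```python
import pandas as pd

# Hypothetical sample of one categorical column from the crime data
df = pd.DataFrame({
    "Location Description": ["STREET", "RESIDENCE", "STREET", "APARTMENT"],
})

# pd.factorize maps each unique category to a distinct integer code,
# in order of first appearance, and returns the code array plus the
# index of unique values for decoding later
codes, uniques = pd.factorize(df["Location Description"])
df["Location Description"] = codes

print(df["Location Description"].tolist())  # [0, 1, 0, 2]
print(list(uniques))                        # ['STREET', 'RESIDENCE', 'APARTMENT']
```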

In contrast to the decision tree model, the features for the network are again selectively chosen to yield better performance. The chosen features are listed below: 

Features - dayofweek, Community Area, Location Description, Primary Type, SocioEconomic-Status, Block

Target - Arrest

After selectively choosing the features, and considering the enormous class imbalance present intrinsically in the dataset, a combination of SMOTE and undersampling is used. SMOTE (Synthetic Minority Over-sampling Technique) and undersampling are two commonly used techniques in machine learning for handling imbalanced datasets, where one class is significantly underrepresented compared to the other. SMOTE oversamples the minority class by creating synthetic examples, while undersampling reduces the majority class by randomly removing examples.

Combining the two, the imbalanced dataset is first undersampled to reduce the majority class, and SMOTE is then applied to oversample the minority class. This helps to balance the dataset while also ensuring that the synthetic examples generated by SMOTE are not overwhelmed by the majority class. Additionally, for faster computation, the data is scaled using a standard scaling procedure.
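
The three steps can be sketched in miniature with NumPy (this illustrates the core idea only; the class sizes and data here are made up, and it is not the imbalanced-learn implementation used in practice):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical imbalanced data: 100 majority (class 0), 10 minority (class 1)
X_maj = rng.normal(0.0, 1.0, size=(100, 2))
X_min = rng.normal(3.0, 1.0, size=(10, 2))

# Step 1: random undersampling -- keep only 30 majority examples
keep = rng.choice(len(X_maj), size=30, replace=False)
X_maj = X_maj[keep]

# Step 2: SMOTE-style oversampling -- each synthetic point lies on the
# segment between a minority point and one of its k nearest minority neighbours
def smote(X, n_new, k=3):
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X))
        d = np.linalg.norm(X - X[i], axis=1)
        nn = np.argsort(d)[1:k + 1]        # k nearest neighbours (excluding self)
        j = rng.choice(nn)
        gap = rng.random()                 # random position along the segment
        synthetic.append(X[i] + gap * (X[j] - X[i]))
    return np.vstack([X, synthetic])

X_min = smote(X_min, n_new=20)             # 10 real + 20 synthetic = 30

X = np.vstack([X_maj, X_min])
y = np.array([0] * len(X_maj) + [1] * len(X_min))

# Step 3: standard scaling (zero mean, unit variance per feature)
X = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.bincount(y))  # [30 30] -- classes are now balanced
```

In practice, the imbalanced-learn package provides ready-made SMOTE and RandomUnderSampler classes that implement these steps more carefully (e.g. proper neighbour handling and stratification).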

Data Preparation for Crime Type 


Similar to the applications of the other supervised techniques used in this project, numerical data is mandatory in order to train a neural network in Python. pd.factorize is used to encode each categorical value into a uniquely represented numerical value. 

In contrast to the decision tree model, the features for the network are again selectively chosen to yield better performance. The chosen features are listed below: 

Features - Arrest, Domestic, dayofweek, Community Area, Location Description, SocioEconomic-Status.

Target - Primary Type

Classes - ['BATTERY', 'CRIMINAL DAMAGE', 'NARCOTICS', 'ROBBERY', 'THEFT']

The classes are balanced using NearMiss sampling. NearMiss is another technique commonly used in machine learning for handling imbalanced datasets. However, instead of oversampling the minority class, NearMiss undersamples the majority class by selecting the majority examples that are "closest" to the minority class. The intuition behind this approach is that the selected examples are the most informative members of the majority class and can help to balance the dataset without introducing synthetic examples.

NearMiss undersampling is commonly used for imbalanced datasets where the majority class is significantly overrepresented compared to the minority class. The basic idea is to keep a subset of majority-class examples that are "closest" to the minority class and remove the remaining majority examples. This helps to balance the dataset and reduce the bias towards the majority class, which can improve the performance of the model. There are several variants of NearMiss, each with a different rule for selecting examples from the majority class. However, it is important to note that NearMiss may not work as well when the minority class overlaps with the majority class, or when there are multiple minority classes with different characteristics.
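
A rough sketch of the NearMiss-1 variant (keep the majority points with the smallest average distance to their k nearest minority points); the data here is made up, and this is an illustration of the selection rule, not the imbalanced-learn implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 50 majority (class 0), 10 minority (class 1)
X_maj = rng.normal(0.0, 1.0, size=(50, 2))
X_min = rng.normal(2.0, 1.0, size=(10, 2))

def nearmiss1(X_maj, X_min, n_keep, k=3):
    """Keep the n_keep majority samples whose mean distance to their
    k nearest minority samples is smallest."""
    # pairwise distances, shape (n_majority, n_minority)
    d = np.linalg.norm(X_maj[:, None, :] - X_min[None, :, :], axis=2)
    # per majority point: mean distance to its k closest minority points
    mean_k = np.sort(d, axis=1)[:, :k].mean(axis=1)
    keep = np.argsort(mean_k)[:n_keep]
    return X_maj[keep]

# Reduce the majority class to match the minority class size
X_maj_kept = nearmiss1(X_maj, X_min, n_keep=10)
print(X_maj_kept.shape)  # (10, 2)
```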

Constructing Neural Networks to Classify Incarceration

Network - 1  

Architecture of Network 1 (generated by dotnets.py)

Network-1 Properties

Structure: 

7 Feature Units, 

→ 3 Units in the Hidden Layer, and 

→ 1 Output Unit

Activations: 

Rectified Linear Unit (ReLU) activation was used for the hidden units, and Sigmoid for the output node for binary classification.  

Hyperparameters: 

Set to train for 600 epochs, with an EarlyStopping callback. Training was halted by the early-stopping mechanism at epoch 42. 

Optimizers & Loss Function

Adam Optimizer was used to minimize the Binary Crossentropy Loss Function
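
For reference, binary crossentropy is -mean(y·log(p) + (1-y)·log(1-p)), which the optimizer drives down as predictions become confident and correct. A quick numeric sketch (the labels and predictions are made up):

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    p = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y_true = np.array([1, 0, 1, 1])
confident = np.array([0.9, 0.1, 0.8, 0.95])   # confident, correct predictions
uncertain = np.array([0.6, 0.4, 0.5, 0.55])   # hesitant predictions

# Loss is lower when predictions are confident and correct
print(binary_crossentropy(y_true, confident) < binary_crossentropy(y_true, uncertain))  # True
```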

Parameters

 Total params: 25, Trainable params: 25

Non-trainable params: 0
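
The parameter total can be checked by hand: a fully connected (Dense) layer with n_in inputs and n_out units contributes n_in × n_out weights plus n_out biases. A small sketch (not project code): the reported total of 25 matches a 6-3-1 layout, consistent with the six features listed earlier, whereas a 7-3-1 layout would give 28:

```python
def dense_params(layer_sizes):
    """Total parameters of a fully connected network:
    each layer contributes in_units * out_units weights + out_units biases."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

print(dense_params([6, 3, 1]))  # 25 -- matches the reported total
print(dense_params([7, 3, 1]))  # 28
```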


Performance 

Final Few Epochs: 

Epochs Vs Loss

Classification Report

Network - 2 

Architecture of Network 2

Network-2 Properties

Structure: 

7 Feature Units, 

15 Units in the 1st Hidden Layer, 20 Units in the 2nd Hidden Layer

1 Output Unit

Activations: 

Rectified Linear Unit (ReLU) activation was used for the hidden units, and Sigmoid for the output node for binary classification.  

Hyperparameters: 

Set to train for 600 epochs, with an EarlyStopping callback. Training was halted by the early-stopping mechanism at epoch 280.

Optimizers & Loss Function

Adam Optimizer was used to minimize the Binary Crossentropy Loss Function

Parameters

 Total params: 517, Trainable params: 517

Non-trainable params: 0


Performance 

Final Few Epochs: 

Epochs Vs Loss

Classification Report

Understanding the Output for Networks, Providing Inference 

Network-1 

The classification report indicates that the neural network achieved an overall accuracy of 35% on the test set, which is not very good. The report shows the precision, recall, and F1-score for each class, as well as the macro-averaged and weighted-averaged scores.

The precision score for the "False" class is 0.36, which means that out of all the instances that the model predicted as "False", only 36% of them are actually "False". The recall score for the "False" class is 0.52, which means that out of all the instances that are actually "False", the model correctly identified 52% of them. The F1-score for the "False" class is 0.42, which is the harmonic mean of precision and recall.

The precision score for the "True" class is 0.32, which means that out of all the instances that the model predicted as "True", only 32% of them are actually "True". The recall score for the "True" class is 0.20, which means that out of all the instances that are actually "True", the model correctly identified only 20% of them. The F1-score for the "True" class is 0.25.
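
These F1 values follow directly from the precision and recall figures, since F1 is their harmonic mean (a quick check; small discrepancies arise because the reported precision and recall are themselves rounded):

```python
def f1(precision, recall):
    # F1 is the harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)

print(f1(0.36, 0.52))            # ~0.425, reported as 0.42 for the "False" class
print(round(f1(0.32, 0.20), 2))  # 0.25, matching the "True" class
```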

The macro-averaged score is the average of precision, recall, and F1-score across both classes, while the weighted-averaged score takes into account the number of instances in each class. In this case, both macro-averaged and weighted-averaged scores are low, indicating poor performance overall.

With 10,000 samples, the model has a reasonable amount of data to learn from, which could in principle lead to better results. The accuracy of the model may also be affected by other variables, such as the complexity of the problem and the hyperparameters used.

Training was terminated after just 42 of 600 possible epochs, indicating the model made little headway in lowering the loss. This might mean that the model's architecture needs to be fine-tuned to better suit the data, or that the learning rate was too high or too low.

The model is still not good at differentiating between the two classes, as seen by the poor accuracy and recall scores. This might be because of a number of factors, including the quality of the features or the intricacy of the decision boundary.

It's important to remember that the model was assessed using a limited sample size since the test set only comprises 30% of the data. Better estimates of the model's performance might be obtained by using a bigger test set or by doing cross-validation.

Although the larger data set may have helped the model perform better, it's possible that it could benefit from finer hyperparameter tuning and even some structural changes.

Network-2 

For binary classification, the neural network model with 7 input nodes, 15 hidden units in the first hidden layer, 20 hidden units in the second hidden layer, and 1 output node with a sigmoid activation function achieved an accuracy of 0.81 on the test set. Precision was 0.76 and recall 0.86 for the negative class (False), and precision was 0.86 and recall 0.77 for the positive class (True).

The model's architecture and the selected hyperparameters may explain why it only needed to be trained for 280 of a possible 600 epochs before it had learned the necessary patterns in the data.

Due to the increased complexity of the model, with more hidden units across its two hidden layers, the model seems able to discriminate efficiently between the two classes, as seen in the high precision and recall scores for both classes.

A score of 0.81 on this binary classification problem is considered satisfactory, since it indicates that the model accurately predicts the class of the vast majority of samples in the test set.

It's important to remember that the model was assessed using a limited sample size since the test set only comprises 30% of the data. Better estimates of the model's performance might be obtained by using a bigger test set or by doing cross-validation.

With high precision and recall scores, this model appears to be able to accurately distinguish between the two classes, and its architecture, which features more hidden units in the two hidden layers, may have contributed to its success.


Constructing Neural Networks to Categorize Crime Type

Architecture of  Neural Network

Network Properties

Structure: 

7 Feature Units, 

16 Units in the 1st Hidden Layer, 25 Units in the 2nd Hidden Layer

5 Output Units

Activations: 

Rectified Linear Unit (ReLU) activation was used for the hidden units, and Softmax for the output nodes for multiclass classification.  

Hyperparameters: 

Set to train for 600 epochs, with an EarlyStopping callback. Training was halted early by the early-stopping mechanism.

Optimizers & Loss Function

The Adam Optimizer was used to minimize the Categorical Crossentropy Loss Function (the standard pairing with a softmax output for multiclass classification)

Parameters

 Total params: 517, Trainable params: 517

Non-trainable params: 0


Performance 

Final Few Epochs: 

Epochs Vs Loss

Epochs Vs Accuracy

Classification Report

Understanding the Output for the Network, Providing Inference 

The classification report shows the evaluation metrics for a neural network model trained for multiclass classification with 5 output classes (BATTERY, CRIMINAL DAMAGE, NARCOTICS, ROBBERY, THEFT). The model architecture consists of 7 input nodes, 16 hidden units in the first hidden layer, 25 hidden units in the second hidden layer, and 5 output nodes with softmax activation.

The precision, recall, and F1-score for each class show how well the model is performing for that specific class. The weighted average F1-score is a measure of the overall effectiveness of the model. The accuracy shows the proportion of correctly classified samples out of the total number of samples.

The high precision, recall, and F1-scores indicate that the model is performing well for all the classes. The high accuracy of 0.97 indicates that the model is able to classify most of the samples correctly. The early stopping mechanism stopped the training at the 10th epoch, indicating that the model had already learned the patterns in the data.

The large sample size of 149772 ensures that the model has enough data to learn from and can generalize well to new data. The use of softmax activation in the output layer ensures that the predicted class probabilities sum up to one, making it easy to interpret the output as probabilities. Overall, the model seems to have learned the patterns in the data well and is performing effectively for the task of multiclass classification. 
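
The sum-to-one property of softmax is easy to verify numerically; a small sketch with a made-up logit vector for the five classes:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

# Hypothetical raw network outputs (logits) for the five crime classes
logits = np.array([2.0, 0.5, 1.0, -1.0, 3.0])
probs = softmax(logits)

classes = ['BATTERY', 'CRIMINAL DAMAGE', 'NARCOTICS', 'ROBBERY', 'THEFT']
print(dict(zip(classes, probs.round(3))))
print(probs.sum())  # 1.0 -- the outputs are interpretable as class probabilities
```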

Neural Network Sandbox - Be a Cop! 

The model developed in this section is used in the backend to categorize crimes. This offers a small-scale glimpse of the kind of seamless model integration that characterizes the industry's most sophisticated applications. 


Conclusion

From the data and analysis of the three models, we can infer that a neural network's performance is very sensitive to its design and hyperparameters. The first two models' underperformance on the test data was likely due to their simplistic designs and small number of hidden units. The third model, on the other hand, was able to learn more intricate patterns from the data and perform better on the test set because of its more complicated architecture with more hidden units and layers.

However, the success of a neural network depends on more than just its structure and hyperparameter values. The quality and quantity of training data, the preprocessing steps used, and the optimization technique employed during training are all important factors in a network's ultimate performance. Therefore, when designing and training neural networks for specific tasks, it is crucial to take all of these factors into account.

Source Code