Understanding Crime in Chicago

A Spatial, Temporal and Socio-economic study of aggregated Historic Crime Records, aimed at better understanding the patterns involved.

Navigate to any of the following pages to learn more

Data Collection

Data Exploration


Model Sandbox


Conclusion & Results

Introduction

David Von Diemar

It is common knowledge that Chicago has one of the highest crime rates in the country, particularly for serious crimes such as murder and assault. For many years, the high incidence of crime in Chicago has been a major issue for both the city's residents and its authorities; in fact, Chicago has consistently ranked among the U.S. cities with the highest crime rates. To understand the factors that lead to crime in Chicago, it is essential to consider a wide range of criteria, including demographic, socioeconomic, and environmental information.

Analyzing demographic data such as age, race, and gender can yield significant insights into the patterns and trends of criminal activity in Chicago. In addition, socioeconomic issues such as poverty and unemployment can have a substantial effect on the rate of criminal activity in certain regions. Environmental variables, such as the built environment, may also affect Chicago's crime rates, since they influence access to resources, opportunities for criminal behavior, and the cohesiveness of the community. By analyzing these and other relevant data points, researchers and policymakers can gain a better understanding of the factors that contribute to crime in Chicago and build more effective methods to deal with this significant problem.

In the current age of data and digitalization, records are stored in real time in online repositories. The City of Chicago maintains a data portal of crime records from the year 2001 to the present day. Naturally, with this colossal amount of data available, curiosity-driven questions on the topic emerge.

With this colossal amount of data at our disposal, machine learning techniques can be leveraged to gain insights into patterns and trends of criminal activity, and to develop effective strategies for crime prevention and law enforcement. Machine learning algorithms can be trained on historical crime data to identify patterns and correlations that may not be immediately apparent to human analysts. For example, these models can be used to identify areas of the city that are particularly prone to certain types of crime, or to identify demographic or socioeconomic factors that are associated with higher rates of criminal activity.

Machine learning algorithms can also be used to develop predictive models that can forecast the likelihood of criminal activity in a particular area at a particular time. These models can take into account a wide range of variables, including weather, local events, and past criminal activity. By using these models, law enforcement agencies can allocate resources more effectively and prevent crime before it occurs. Furthermore, machine learning can also be used to monitor social media and other online platforms for signs of potential criminal activity, allowing law enforcement to intervene before a crime is committed. However, it is important to note that the use of machine learning in crime prevention and law enforcement should be done ethically and with transparency, taking into account privacy concerns and potential biases in the data.
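As a concrete, heavily simplified illustration of such a predictive model, the sketch below estimates a relative risk score per (district, hour) cell from historical incidents using plain frequency counts with additive smoothing. The record format and the district names are hypothetical stand-ins, not the portal's actual schema:

```python
from collections import Counter

def risk_scores(records, smoothing=1.0):
    """Estimate a relative risk score per (district, hour) cell from
    historical incident records. `records` is a list of (district, hour)
    tuples -- a hypothetical, minimal stand-in for real incident data."""
    counts = Counter(records)
    total = sum(counts.values())
    cells = set(counts)
    # Additive (Laplace) smoothing keeps rarely-seen cells from scoring zero
    return {cell: (counts[cell] + smoothing) / (total + smoothing * len(cells))
            for cell in cells}

# Invented incidents: Austin at 10 PM dominates this toy sample
records = [("Austin", 22)] * 5 + [("Loop", 14)] * 2 + [("Austin", 14)]
scores = risk_scores(records)
riskiest = max(scores, key=scores.get)
print(riskiest)  # the cell with the most incidents: ("Austin", 22)
```

A real deployment would add the extra variables mentioned above (weather, local events) as features in a proper statistical model rather than raw counts.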

Motivation

The motivation to work on crime prediction using socio-economic, temperature, and other datasets is to provide information to law enforcement agencies and communities, enabling them to take proactive measures to prevent crime. Crime prediction models use data and statistical analysis to identify areas and times where crime is most likely to occur based on factors such as socio-economic status, temperature, population density, and prior crime patterns. This information can help law enforcement allocate resources more effectively and target crime reduction efforts where they are needed most. Additionally, crime prediction can help community organizations and individuals make informed decisions about safety, allowing them to take steps to reduce their risk of becoming a victim of crime.

Also, from a data scientist's perspective, the sheer volume of crime records is exciting for anyone willing to look through the obscurity of the data. With a dataset this large, the curiosity to search for patterns hidden within that obscurity becomes the first priority.

Questions Explored in the Course of this Study - Answered


According to the Exploratory Data Analysis, it is evident that the category "Motor Vehicle Theft" saw an abrupt increase in reporting over the years 2018-23, and particularly in the year 2022.

In the year 2022, the community areas of Near North Side, Austin, Loop, and West Town were the most affected. The most frequently reported cases in these areas were THEFT and BATTERY.

Interestingly, reports of AUTO THEFT increased enormously over the years, as observed by aggregating the data at the year and month levels. Another peculiar observation is that, in the most recent years, reports rise sharply as the end of the year approaches.
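The year/month aggregation described above can be sketched in a few lines. The rows below are an invented, minimal stand-in for the portal's much richer schema:

```python
from collections import Counter
from datetime import datetime

# Hypothetical rows: (date_string, primary_type)
rows = [
    ("2018-11-03", "MOTOR VEHICLE THEFT"),
    ("2018-12-14", "MOTOR VEHICLE THEFT"),
    ("2022-11-20", "MOTOR VEHICLE THEFT"),
    ("2022-12-01", "MOTOR VEHICLE THEFT"),
    ("2022-12-09", "MOTOR VEHICLE THEFT"),
    ("2022-06-15", "THEFT"),
]

# Count motor vehicle thefts per (year, month) cell
by_year_month = Counter()
for date_str, crime in rows:
    if crime == "MOTOR VEHICLE THEFT":
        d = datetime.strptime(date_str, "%Y-%m-%d")
        by_year_month[(d.year, d.month)] += 1

# Roll the month-level counts up to yearly totals
by_year = Counter()
for (year, month), n in by_year_month.items():
    by_year[year] += n
print(dict(by_year))
```

The yearly totals expose the increase over time, while the month-level keys are what reveal the end-of-year spike.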

This question was interesting enough to begin with, but, due to the difficulty of accessing historic temperature records, it was only partially explored with a single year's temperature data. The plot on the right shows temperature versus crimes reported for January 2018 (winter time).

Although the plot shows a substantial peak at 8 degrees Celsius, this is not enough evidence to conclude that temperature has an effect on crime.
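One minimal way to examine the temperature question is to bucket daily temperatures into fixed-width bins and total the reports per bin, so that any peak stands out as the largest bin. The joined (temperature, count) rows below are invented for illustration:

```python
from collections import Counter

# Hypothetical joined rows: (temperature_celsius, crimes_reported_that_day).
# A real analysis would join daily weather records onto daily crime counts.
daily = [(-5, 80), (-2, 85), (3, 95), (8, 140), (8, 150), (12, 110)]

# Bucket temperatures into 5-degree bins and total the reports per bin
bin_totals = Counter()
for temp_c, n_crimes in daily:
    bin_lo = (temp_c // 5) * 5          # floor to the nearest multiple of 5
    bin_totals[(bin_lo, bin_lo + 5)] += n_crimes

peak_bin = max(bin_totals, key=bin_totals.get)
print(peak_bin)  # the bin containing the invented 8-degree peak
```

With only one month of one year, such a peak is descriptive, not causal evidence.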

That said, there are already studies that have explored this question more comprehensively and concluded that temperature does in fact have an effect on crime.

This was explored through spatial analysis of the data, leveraging the coordinates and aggregating the reported cases by district. The map on the right shows an evident difference between the regions (districts) in reported cases: darker regions indicate a higher incidence of crime, and vice versa.
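A simplified sketch of the district aggregation: here, incident coordinates are assigned to the nearest hypothetical district centroid, whereas the actual analysis would use the district polygons (point-in-polygon tests). All coordinates and district names below are illustrative:

```python
from math import dist

# Hypothetical district centroids as (lat, lon) pairs
centroids = {
    "District 1 (Loop)":    (41.8827, -87.6233),
    "District 15 (Austin)": (41.8940, -87.7654),
}

def nearest_district(point):
    """Assign an incident's coordinates to the closest district centroid."""
    return min(centroids, key=lambda name: dist(point, centroids[name]))

# Invented incident coordinates
incidents = [(41.8830, -87.6240), (41.8950, -87.7600), (41.8900, -87.7700)]

# Aggregate reported cases per district -- the counts that drive the
# light/dark shading on a choropleth map
counts = {}
for p in incidents:
    name = nearest_district(p)
    counts[name] = counts.get(name, 0) + 1
print(counts)
```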

To explore this idea in practice, an attempt was made to acquire socio-economic data, but no recent census data matching the years in the dataset could be found. There is open-sourced data of this kind, but only for the year 2012, so correlating 2012 census statistics with the current data would not be rational.

Even though the data includes records indicating incarceration, there is no follow-up data on the people who were arrested, so it was not possible to explore this idea. Having such data could have been substantial for understanding the psychology of offenders, and that knowledge could have led the administration to incorporate ideas to prevent released inmates from committing new crimes.

There are data repositories for other big cities demographically similar to Chicago, such as San Francisco, New York, and Denver. However, due to the lack of time and the need to prioritize the current problem at hand, this comparative analysis was not possible.

It would be ideal to perform such an analysis, because it could reveal significant shortcomings in the administrative structures of different cities.

The spatial point of view is explored in the Association Rule Mining part of the project, where the data is segregated into two major community areas that typically show a high incidence of crime. The agenda for this application is to look for patterns in the data that differ by demographic. Interestingly, the two community areas differed in which crimes showed the strongest associations.
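A minimal sketch of the rule-mining idea: compute support and confidence for crime-type pairs over "transactions", here invented as the set of crime types reported together at the same place and time. A real run would use an Apriori implementation over the full dataset:

```python
from itertools import combinations
from collections import Counter

# Hypothetical transactions: crime types co-reported at one (block, day)
transactions = [
    {"THEFT", "BATTERY"},
    {"THEFT", "BATTERY", "CRIMINAL DAMAGE"},
    {"THEFT", "CRIMINAL DAMAGE"},
    {"BATTERY"},
]

# Count how often each item and each unordered pair occurs
item_count = Counter()
pair_count = Counter()
for t in transactions:
    for item in t:
        item_count[item] += 1
    for pair in combinations(sorted(t), 2):
        pair_count[pair] += 1

n = len(transactions)
# Rule A -> B: support = P(A and B), confidence = P(B | A)
rules = {}
for (a, b), c in pair_count.items():
    rules[(a, b)] = {"support": c / n, "confidence": c / item_count[a]}
    rules[(b, a)] = {"support": c / n, "confidence": c / item_count[b]}

print(rules[("CRIMINAL DAMAGE", "THEFT")])
```

Running this separately per community area, as described above, is what surfaces the demographic differences in which rules score highest.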

On the temporal side of things, this could not be explored, as it was placed out of scope for this project due to the time constraint.

Yes, it is possible to categorize a crime type from a handful of features. The Neural Networks page demonstrates how to build an efficient model that does this job. The model built for this project attained more than decent accuracy in categorizing the crimes.
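As a toy stand-in for that model, the sketch below trains a single-layer perceptron on two hypothetical numeric features (hour of day, district index) to separate two crime categories. The real neural network is of course larger, and the data here is invented to be linearly separable:

```python
def train(samples, labels, epochs=20, lr=0.1):
    """Classic perceptron learning rule on two numeric features."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            pred = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
            err = y - pred                  # -1, 0, or +1
            w[0] += lr * err * x[0]
            w[1] += lr * err * x[1]
            b += lr * err
    return w, b

# Label 1 ~ "THEFT" (daytime hours), label 0 ~ "BATTERY" (late night):
# an invented, linearly separable pattern, not a claim about real data
X = [(10, 1), (12, 2), (14, 1), (1, 3), (2, 4), (3, 3)]
y = [1, 1, 1, 0, 0, 0]
w, b = train(X, y)

def predict(x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

print([predict(x) for x in X])  # recovers the training labels
```

A multi-class crime categorizer would replace this with a multi-layer network and a softmax output, but the train/predict loop is the same in spirit.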

Related Work - Academic Papers

Safat, W. et al.

Methodology: To forecast crime rates in Chicago and Los Angeles, the research used a variety of machine learning techniques, including Logistic Regression, SVM, Naive Bayes, KNN, Decision Tree, MLP, Random Forest, and XGBoost. In terms of prediction accuracy, each algorithm's performance was compared. The research classified crimes across time periods using LSTM and assessed its performance using RMSE, MAE, the number of epochs, and batch size. In order to provide a visual overview of crime trends and patterns in both cities, exploratory data analysis was also done.

Findings: Of the machine learning algorithms examined, the study's findings show that XGBoost had the best prediction accuracy (94% in Chicago and 88% in Los Angeles). The study also found that the top five crimes in Chicago were theft, battery, criminal damage, drugs, and offense, while the top crimes in Los Angeles were various offenses, larceny-theft, assault, and narcotics. The ARIMA model was used to predict crime rates and high-crime-density locations for the next five years, indicating a slight rise in crime in Chicago and a significant decrease in Los Angeles.

Nitta, G. R. et al.

Methodology: The Chicago Crime dataset was used by the authors of this study to develop a multi-class crime prediction model. The authors used a variety of analytics methods, including LASSO feature selection, Naive Bayes classification, and an ARIMA model, to analyze the criminal occurrences. The e1071 package, a freely accessible multi-class classification library, was used by the authors to train the Naive Bayes classifier, and its parameters were tuned to find the ideal values. The authors also created an ARIMA-based prediction model in R to assess the criminal event density.

Findings: Accuracy, precision, recall, and the area under the curve (AUC) were used to gauge how well the prediction model performed. The results demonstrated that, in comparison to alternative approaches such as coordinate-connection or probabilistic methods, the multi-class Naive Bayes classifier was better suited for anticipating criminal occurrences. In comparison to previously published techniques, the authors found that the proposed model displayed greater capabilities. The authors concluded that their methodology may aid law enforcement organizations in enhancing their crime analytics and better safeguarding their communities.


Shamsuddin, N. H. M. et al.

Methodology: To analyze crime data, the authors of the research conducted a thorough evaluation of crime prediction techniques. They concentrated on the use of specific techniques including support vector machines, fuzzy approaches, and artificial neural networks. They also contrasted how well these techniques performed in other crime data sets.

Findings: According to the paper's findings, crime analysis is a sensitive and difficult endeavor that calls for precise prediction and categorization techniques. The writers are aware of the difficulty confronting law enforcement in precisely and quickly processing the volume of criminal data that is growing. As a consequence of their analysis of different crime prediction techniques, they come to the conclusion that crime prediction methods may be helpful in this endeavor. The authors also suggest extending this research in the future in order to improve crime prediction techniques and get around the shortcomings of existing ones in order to provide more accurate outcomes and perform better.

CRISP - DM

The Cross Industry Standard Process for Data Mining (CRISP-DM) is a widely-used data mining process model that provides a structured approach to planning and implementing a data mining project.

CRISP-DM consists of six phases:

Systematic Approach of the Project

1. Business Understanding: In this phase, the objectives and requirements of the project are defined from a business perspective, and the problem is translated into a data mining problem definition.

2. Data Understanding: In this phase, the initial data is collected and explored to become familiar with it, identify data quality issues, and discover first insights.

3. Data Preparation: In this phase, the final dataset is constructed from the raw data: records and attributes are selected, cleaned, and transformed into a form suitable for modeling.

4. Modeling: In this phase, the data is processed to discover patterns and relationships. A variety of algorithms and techniques, such as regression, clustering, and association rule mining, can be used. The best model is selected based on performance evaluation.

5. Evaluation: In this phase, the results of the modeling phase are evaluated and compared against the goals and objectives established in the business understanding phase. The results are validated and tested to ensure their accuracy and reliability.

6. Deployment: In this phase, the results of the data mining project are deployed and integrated into the business processes. The model is monitored to ensure it continues to deliver accurate results and can be updated as necessary.

CRISP-DM provides a structured and systematic approach to data mining projects and helps to ensure that the project stays focused on the business objectives and that the results are accurate and reliable.