Data Cleaning

Data cleaning, also called data cleansing or data scrubbing, is the process of detecting and then correcting or removing errors, inconsistencies, and inaccuracies in a dataset. It is a critical step in any analysis, because the results can only be as reliable as the data they are based on.

Because I was working with several datasets at once for this project, I cleaned each one only partially and saved the results as Version 1 of each dataset. This selective cleanup is intended only to support the first round of exploratory visualisation. As the project progresses, I will document the further changes made and update this website with each subsequent phase of data cleaning. This lets me track the progress of the cleaning process and helps ensure that the data used in the analysis is of the highest possible quality. The ultimate goal of data cleaning is a high-quality, usable dataset on which analysis and informed decisions can be based.

Historical Crime Data

Snippet of the Original Data Set

Demographic / Socio-economic Data

Temperature Data (January 2018)

Environmental Data

Cleaning: Insight Development

At this stage, I only explored the datasets for missing values, using visual methods such as heatmaps.

Historic Crime Records

Demographic Data

Environmental Data
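The missing-value exploration described above can be sketched in a few lines of pandas. This is a minimal illustration, not the project's own code (that is available through the Source Code link): the function names `missing_value_summary` and `plot_missing_heatmap` are hypothetical, and the heatmap is drawn with plain matplotlib `imshow` on the `isna()` mask rather than any particular heatmap library.

```python
import pandas as pd


def missing_value_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, worst first (hypothetical helper)."""
    counts = df.isna().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(2),
    })
    return summary.sort_values("missing", ascending=False)


def plot_missing_heatmap(df: pd.DataFrame, title: str = "Missing values"):
    """Heatmap of the missingness mask: one row per record, one column per field.

    Dark cells mark missing values. Hypothetical helper for illustration.
    """
    # Imported lazily so the numeric summary works without a plotting library.
    import matplotlib.pyplot as plt

    mask = df.isna().to_numpy(dtype=float)  # 1.0 = missing, 0.0 = present
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.imshow(mask, aspect="auto", interpolation="nearest", cmap="Greys")
    ax.set_xticks(range(df.shape[1]))
    ax.set_xticklabels(df.columns, rotation=90)
    ax.set_ylabel("Row index")
    ax.set_title(title)
    fig.tight_layout()
    return fig
```

For a crime DataFrame loaded with `pd.read_csv`, `missing_value_summary(crime)` lists the columns with the most gaps first, and `plot_missing_heatmap(crime, "Historic crime records").savefig("crime_missing.png")` saves the corresponding heatmap (the file name is only an example). The table gives exact counts, while the heatmap shows whether gaps cluster in particular rows or periods.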

In the second stage, to reduce the complexity of working with multiple datasets, the crime and demographic datasets were linked using the Community Area mapping. This created a new variable in the crime dataset, Community Area Name, which is then used for visualisation. Similarly, districts are mapped to their corresponding District Names. The code for the "Insight Development" and "Relational Linkage" steps above can be found at the following link.

Source Code
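The Community Area and District linkage described above can be sketched with pandas `Series.map`. This is a minimal sketch under stated assumptions, not the linked source code: it assumes the crime data holds integer codes in columns named `Community Area` and `District`, and that the reference tables use the hypothetical columns `area_number`/`area_name` and `district_number`/`district_name`. The function name `add_name_columns` is also hypothetical.

```python
import pandas as pd


def add_name_columns(crime: pd.DataFrame,
                     areas: pd.DataFrame,
                     districts: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `crime` with readable Community Area and District names.

    Column names in `areas` and `districts` are assumptions for illustration.
    """
    out = crime.copy()  # leave the original DataFrame untouched
    # Code -> name lookups; each code must appear only once in its reference table.
    area_lookup = areas.set_index("area_number")["area_name"]
    district_lookup = districts.set_index("district_number")["district_name"]
    # Series.map with a Series looks each value up in that Series' index;
    # codes with no match become NaN, so unmapped rows are easy to spot.
    out["Community Area Name"] = out["Community Area"].map(area_lookup)
    out["District Name"] = out["District"].map(district_lookup)
    return out
```

Mapping into new columns, rather than merging whole tables, keeps the crime dataset one row per incident; rows where `Community Area Name` is NaN point to codes missing from the lookup table and are worth checking before visualisation.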