Business Problem — Introduction
The purpose of this project is to help people explore the facilities around their neighborhood and make smart, efficient decisions when choosing a neighborhood from the many areas of Scarborough, Toronto. Many people migrate to the various provinces of Canada to build a better future, and they need to do a lot of research on things such as good, affordable housing. This project is aimed at such people. It makes it easier for them to find cafés, schools, supermarkets, pharmacies, grocery stores, malls, theatres, hospitals and similar amenities. The project analyzes the features that matter to people moving to Scarborough so they can compare neighborhoods and find the best one. These features include median housing price, well-rated schools and colleges, the crime rate in each area, road connectivity, weather conditions, electricity and water supply, sewage disposal and recreational facilities.
Problem to be solved
The main goal of this project is to suggest a suitable neighborhood in a new city to people who are immigrating to a completely unfamiliar place. Important factors include a social presence of like-minded people and connectivity to the airport, bus stations, markets and other daily needs nearby. The project aims to provide:
1. A list of houses sorted by price, in ascending and descending order.
2. A list of schools sorted by location, fees, rating and reviews.
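Both sorted lists can be produced with pandas `sort_values`. The sketch below is a minimal illustration, assuming hypothetical tables with `Price`, `Rating` and `Fees` columns. The names and numbers are placeholders, not project data. Schools are ranked by rating (highest first) and then by fees (lowest first); location or review count could be added as further sort keys in the same way.

```python
import pandas as pd

# Hypothetical listings: column names and values are placeholders.
houses = pd.DataFrame({
    "Neighborhood": ["Area A", "Area B", "Area C"],
    "Price": [650_000, 480_000, 720_000],
})

# Ascending and descending sorts by housing price.
cheapest_first = houses.sort_values("Price", ascending=True)
priciest_first = houses.sort_values("Price", ascending=False)

# Hypothetical school table: rating high-to-low, ties broken by fees low-to-high.
schools = pd.DataFrame({
    "School": ["School X", "School Y", "School Z"],
    "Rating": [8.5, 9.1, 8.5],
    "Fees": [0, 12_000, 5_000],
    "Reviews": [120, 45, 300],
})
ranked_schools = schools.sort_values(["Rating", "Fees"], ascending=[False, True])

print(cheapest_first["Neighborhood"].tolist())
print(ranked_schools["School"].tolist())
```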
Scarborough is a popular place for new immigrants to Canada to settle. As a result, it is one of the most diverse and multicultural areas in Toronto, home to many religious groups and places of worship. Although immigration has become a hot topic over the past few years, with more governments seeking tighter restrictions on immigrants and refugees, the general trend of immigration to Canada has been on the rise.
We will need data about the venues in the different neighborhoods of the borough. To obtain it, we will use Foursquare location data. Foursquare is a location data provider with information about venues and events within an area of interest, including venue names, locations, menus, photos, ratings and tips. Because all of the required venue information can be obtained through its API, the Foursquare location platform will be used as the sole source of venue data.
After obtaining the list of neighborhoods, we will query the Foursquare API for information about the venues in each neighborhood, using a search radius of 100 meters.
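The per-neighborhood query can be sketched as one request URL per neighborhood, with the radius fixed at 100 meters. This sketch rests on several assumptions. It targets the legacy Foursquare v2 `venues/explore` endpoint used in many tutorials, and the current Foursquare Places API uses a different URL and authentication scheme. The version string, the limit, the credentials and the coordinates are all placeholders. The code only builds the URLs. Fetching each one, for example with `requests.get(url).json()`, needs valid credentials.

```python
from urllib.parse import urlencode

# Assumption: legacy Foursquare v2 explore endpoint (not the current Places API).
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, client_id, client_secret,
                      version="20180605", radius=100, limit=100):
    """Return an explore URL for venues within `radius` meters of (lat, lng).

    `version` and `limit` are assumed defaults, not values from the project.
    """
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


# Placeholder neighborhoods, coordinates and credentials.
neighborhoods = {"Area A": (43.80, -79.19), "Area B": (43.78, -79.16)}
urls = {name: build_explore_url(lat, lng, "MY_ID", "MY_SECRET")
        for name, (lat, lng) in neighborhoods.items()}
print(urls["Area A"])
```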
The data retrieved from Foursquare contains the venues within the specified distance of each postal code's latitude and longitude. The following information was obtained for each venue:
- Neighborhood Latitude
- Neighborhood Longitude
- Name of the venue e.g. the name of a store or restaurant
- Venue Latitude
- Venue Longitude
- Venue Category
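Each API response can be flattened into one row per venue with the fields listed above. This is a minimal sketch that assumes the legacy v2 response layout (`response` → `groups[0]` → `items` → `venue`). The sample response is hand-written in that assumed shape and is not real API output. The function name `venues_to_rows` is hypothetical.

```python
import pandas as pd


def venues_to_rows(neighborhood, lat, lng, response_json):
    """Flatten one explore response into rows (hypothetical helper).

    Assumes the legacy v2 layout: response -> groups[0] -> items -> venue.
    """
    items = response_json["response"]["groups"][0]["items"]
    rows = []
    for item in items:
        venue = item["venue"]
        # Some venues may lack categories; fall back to None in that case.
        categories = venue.get("categories") or [{"name": None}]
        rows.append({
            "Neighborhood": neighborhood,
            "Neighborhood Latitude": lat,
            "Neighborhood Longitude": lng,
            "Venue": venue["name"],
            "Venue Latitude": venue["location"]["lat"],
            "Venue Longitude": venue["location"]["lng"],
            "Venue Category": categories[0]["name"],
        })
    return rows


# Hand-written sample in the assumed response shape (placeholder values).
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Sample Park",
               "location": {"lat": 43.801, "lng": -79.191},
               "categories": [{"name": "Trail"}]}},
]}]}}

venues = pd.DataFrame(venues_to_rows("Area A", 43.80, -79.19, sample))
print(venues[["Venue", "Venue Category"]])
```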
To compare the similarities of two cities, such as New York and Toronto, we decided to explore their neighborhoods, segment them and group them into clusters of similar neighborhoods. This requires clustering the data, a form of unsupervised machine learning, for which we use the k-means clustering algorithm.
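A common way to cluster neighborhoods by their venues is to one-hot encode the venue categories and average them per neighborhood. Each neighborhood then becomes a vector of category shares, and k-means is run on those vectors. The sketch below assumes this encoding and uses scikit-learn's `KMeans` on a tiny hypothetical venue table. The report used 10 clusters, but two are enough for this toy data.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical venue table in the shape built from the Foursquare data.
venues = pd.DataFrame({
    "Neighborhood": ["N1", "N1", "N2", "N3", "N3", "N4"],
    "Venue Category": ["Coffee Shop", "Coffee Shop", "Coffee Shop",
                       "Trail", "Trail", "Trail"],
})

# One-hot encode categories, then average per neighborhood so each row holds
# the share of each venue category in that neighborhood.
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot["Neighborhood"] = venues["Neighborhood"]
profiles = onehot.groupby("Neighborhood").mean()

# k is a free parameter; fixed random_state makes the result reproducible.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(profiles)
labels = pd.Series(kmeans.labels_, index=profiles.index, name="Cluster")
print(labels)
```

Averaging the one-hot columns instead of summing them keeps neighborhoods with many venues from dominating the distances.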
Libraries used:
Pandas: For creating and manipulating DataFrames.
Folium: A Python visualization library used to show the distribution of neighborhood clusters on an interactive Leaflet map.
scikit-learn: For k-means clustering.
JSON: For handling JSON files.
XML: For handling XML, a plain-text format that separates data from presentation.
Geocoder: For retrieving location data (latitude and longitude).
Beautiful Soup and Requests: For scraping web pages and handling HTTP requests.
Matplotlib: Python Plotting Module.
The Scarborough dataset, scraped from Wikipedia, will be used to identify the neighborhoods considered in the project.
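Scraping the neighborhood table with Beautiful Soup amounts to reading each table row and keeping the Scarborough entries. This sketch parses a small inline stand-in table so it runs offline. The three-column layout (postcode, borough, neighbourhood) and the `wikitable` class are assumptions about the page, and the rows are placeholders. For the live page, the HTML would come from `requests.get(url).text`.

```python
from bs4 import BeautifulSoup

# Inline stand-in for the Wikipedia table; layout and rows are placeholders.
html = """
<table class="wikitable">
  <tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>P1</td><td>Scarborough</td><td>Area A</td></tr>
  <tr><td>P2</td><td>Scarborough</td><td>Area B</td></tr>
  <tr><td>P3</td><td>Other</td><td>Area C</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", class_="wikitable")

rows = []
for tr in table.find_all("tr")[1:]:  # skip the header row
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if len(cells) == 3:  # ignore malformed rows
        rows.append(dict(zip(["Postcode", "Borough", "Neighborhood"], cells)))

# Keep only the borough this project studies.
scarborough = [r for r in rows if r["Borough"] == "Scarborough"]
print([r["Neighborhood"] for r in scarborough])
```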
Exploratory Data Analysis (EDA)
Cleaning the data, i.e. deleting rows with missing values
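Deleting rows with missing values is a single pandas `dropna` call. The table below is hypothetical and only shows the effect. `None` in a text column and `NaN` in a numeric column both count as missing.

```python
import numpy as np
import pandas as pd

# Hypothetical raw table with gaps (placeholder values).
raw = pd.DataFrame({
    "Neighborhood": ["Area A", "Area B", None],
    "Latitude": [43.80, np.nan, 43.75],
    "Longitude": [-79.19, -79.16, -79.25],
})

# Drop every row that has at least one missing value, then renumber rows.
clean = raw.dropna().reset_index(drop=True)
print(clean)
```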
Plotting the geospatial data on a Folium map gives a clearer picture of the neighborhoods and their locations. This is done with choropleth maps.
Map of Toronto
Map of Scarborough
Average Housing Price by Clusters in Scarborough
In this project, I used the k-means clustering algorithm to separate the neighborhoods into 10 clusters, based on the 103 latitude and longitude points in the dataset, so that each cluster groups neighborhoods with very similar surroundings. Based on these clusters, charts were made showing the average house prices and school ratings for each neighborhood.
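After clustering, the per-cluster averages behind such charts come from a `groupby` on the cluster label. This sketch uses a hypothetical table with `Cluster`, `Avg Price` and `School Rating` columns, and every value is a placeholder. Calling `summary.plot(kind="bar")` would chart the result with Matplotlib.

```python
import pandas as pd

# Hypothetical per-neighborhood table after clustering (placeholder values).
df = pd.DataFrame({
    "Neighborhood": ["Area A", "Area B", "Area C", "Area D"],
    "Cluster": [0, 0, 1, 1],
    "Avg Price": [500_000, 700_000, 900_000, 1_100_000],
    "School Rating": [7.0, 8.0, 6.0, 9.0],
})

# Mean price and school rating per cluster, cheapest cluster first.
summary = (
    df.groupby("Cluster")[["Avg Price", "School Rating"]]
    .mean()
    .sort_values("Avg Price")
)
print(summary)
```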
Trail is the most common venue category in Scarborough.
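Finding the most common venue category is a frequency count over the venue table. This sketch uses a hypothetical table, and its counts are not the project's results.

```python
import pandas as pd

# Hypothetical venue table (placeholder rows).
venues = pd.DataFrame({
    "Neighborhood": ["Area A", "Area A", "Area B", "Area B", "Area C"],
    "Venue Category": ["Trail", "Park", "Trail", "Pizza Place", "Trail"],
})

# Count venues per category; idxmax returns the category with the highest count.
counts = venues["Venue Category"].value_counts()
most_common = counts.idxmax()
print(most_common, counts[most_common])
```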
In Scarborough, The Beaches neighborhood attracts the most immigrants.