
Spatial clustering of wine quality to identify anomalies
What is DBSCAN?
DBSCAN stands for Density-Based Spatial Clustering of Applications with Noise. It clusters data by separating high-density regions from low-density regions. As a by-product, the algorithm identifies the noise points (outliers) in a set of data points. It sounds complicated, but it is simple and easy to apply.
DBSCAN is an example of unsupervised learning (a branch of machine learning, and hence a subset of artificial intelligence) and belongs to the family of density-based algorithms. Before proceeding further, we need to understand what unsupervised learning is.
Unsupervised (machine) learning algorithms infer patterns from a dataset without reference to known, or labelled, outcomes. The term 'density-based algorithm' means that we group data according to how densely the data points are packed in a region.
What follows is a technical dive into the approach taken.
How to implement DBSCAN
The two main parameters of the DBSCAN algorithm are ε (epsilon) and minPoint, which are defined as:
Epsilon:
ε = Radius of the neighbourhood region
Two points are considered neighbours if the distance between them does not exceed the threshold ε. The distance is usually measured as the Euclidean distance between the points. For example, given two points X = (x_1, y_1) and Y = (x_2, y_2) on a two-dimensional plane, the distance between them is
d(X, Y) = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}
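To make the neighbourhood test concrete, here is a minimal sketch in plain Python. The function names `euclidean_distance` and `are_neighbours`, and the sample coordinates, are made up for illustration and are not part of any library.

```python
import math


def euclidean_distance(p, q):
    """Straight-line distance between two points given as coordinate tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def are_neighbours(p, q, eps):
    """True when q lies within the eps-radius neighbourhood of p."""
    return euclidean_distance(p, q) <= eps


# Illustrative points, not taken from the wine data
X = (1.0, 2.0)
Y = (4.0, 6.0)

print(euclidean_distance(X, Y))        # 5.0
print(are_neighbours(X, Y, eps=3.0))   # False
print(are_neighbours(X, Y, eps=5.0))   # True
```

With ε = 3 the two points are too far apart to be neighbours; raising ε to 5 brings Y just inside the neighbourhood of X.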

minPoint:
minPoint = Minimum number of points that must be present within the neighbourhood
We can adjust minPoint to suit our needs. For example, if we need at least 10 points in the neighbourhood of a point for it to count as a core point, we set minPoint to 10.
Based on ε and minPoint, each data point is classified as one of three types: a core point, a border point or a noise point (outlier). The figure below illustrates the scenario.
Core point = A data point is a core point if at least minPoint points lie within its ε-neighbourhood. For example, if minPoint is five and a data point has only three points in its neighbourhood, we cannot classify it as a core point because it does not satisfy the requirement.
Border point = A data point is a border point if it has fewer than minPoint points within its ε-neighbourhood but at least one of them is a core point. For example, with minPoint set to five, a data point with only three points in its neighbourhood is a border point if one of them is a core point, reachable within a distance of ε.
Noise point = Noise points are the outliers, and finding them is the goal of applying DBSCAN here. A data point is a noise point if it is neither a core point nor a border point. Such points can be regarded as extreme values, unexpected occurrences, or behaviour that differs from a regular event.
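The three point types can be sketched directly from their definitions. The snippet below is a simplified illustration in plain Python, not the full DBSCAN algorithm (it does not assign cluster ids). The function name `classify_points` and the toy coordinates are assumptions made for this example, and the neighbourhood count includes the point itself, which is how scikit-learn counts `min_samples`.

```python
import math


def classify_points(points, eps, min_points):
    """Label each point 'core', 'border' or 'noise' from the definitions."""
    # Neighbourhood of a point: indices of all points within eps of it,
    # the point itself included.
    neighbours = [
        [j for j, q in enumerate(points) if math.dist(p, q) <= eps]
        for p in points
    ]
    # Core points have at least min_points points in their neighbourhood.
    core = {i for i, nb in enumerate(neighbours) if len(nb) >= min_points}

    labels = []
    for i, nb in enumerate(neighbours):
        if i in core:
            labels.append("core")
        elif any(j in core for j in nb):
            # Not dense enough itself, but within eps of a core point.
            labels.append("border")
        else:
            labels.append("noise")
    return labels


# Toy data: a small dense group, one point on its edge, one far away
points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (8, 8)]
print(classify_points(points, eps=1.0, min_points=3))
# ['core', 'core', 'core', 'core', 'border', 'noise']
```

The point (2, 1) has only two points in its neighbourhood, itself and the core point (1, 1), so it is a border point. The point (8, 8) has no other point within ε and is noise.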
Implementation of DBSCAN in Python
We can implement the DBSCAN algorithm in Python with scikit-learn, which is a simple procedure.
Step 1: Import the necessary libraries

Step 2: Read the dataset

Step 3: Basic idea of data

Step 4: Define the model

Step 5: Check the count in each cluster

Step 6: Visualize the outliers
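The six steps above could look roughly like the following sketch. It rests on several assumptions that do not come from the original analysis: it uses scikit-learn's bundled `load_wine` data as a stand-in rather than the wine quality dataset, keeps only the first two features, standardises them before clustering, and picks `eps=0.5` and `min_samples=5` purely for illustration. The helper `plot_outliers` and its default file name are also hypothetical.

```python
# Step 1: import the necessary libraries
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler

# Step 2: read the dataset
# ASSUMPTION: load_wine is a stand-in bundled with scikit-learn;
# it is NOT the wine quality data analysed in this article.
wine = load_wine()
X = wine.data[:, :2]  # first two features (alcohol, malic_acid)

# Step 3: get a basic idea of the data
print(X.shape)
print(wine.feature_names[:2])
print(X.mean(axis=0), X.std(axis=0))

# ASSUMPTION: standardise so both features share a scale, because
# DBSCAN compares raw distances against a single eps.
X_scaled = StandardScaler().fit_transform(X)

# Step 4: define the model (eps and min_samples are illustrative values)
model = DBSCAN(eps=0.5, min_samples=5)
labels = model.fit_predict(X_scaled)

# Step 5: check the count in each cluster; label -1 marks noise points
cluster_ids, counts = np.unique(labels, return_counts=True)
for cluster_id, count in zip(cluster_ids, counts):
    print(f"cluster {cluster_id}: {count} points")

outliers = X[labels == -1]


# Step 6: visualise the outliers (hypothetical helper, needs matplotlib)
def plot_outliers(X, labels, path="dbscan_outliers.png"):
    """Scatter plot with clustered points coloured by label, noise in red."""
    import matplotlib.pyplot as plt

    noise = labels == -1
    fig, ax = plt.subplots()
    ax.scatter(X[~noise, 0], X[~noise, 1], c=labels[~noise], s=15)
    ax.scatter(X[noise, 0], X[noise, 1], c="red", marker="x", label="outlier")
    ax.set_xlabel("feature 1")
    ax.set_ylabel("feature 2")
    ax.legend()
    fig.savefig(path)
```

Calling `plot_outliers(X, labels)` writes the scatter plot to a PNG file. With real data, ε and minPoint usually need tuning, since the points labelled -1 change with every combination of the two.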

Advantages of DBSCAN
It can identify outliers in data of any shape, whereas standard k-means and k-medians tend to identify clusters well only when they are roughly circular.
It can identify clusters whatever the shape of the data, and it is simple to implement.
Disadvantages of DBSCAN
It is sensitive to ε and minPoint: the set of points flagged as outliers changes with every combination of values.
If the data has varying densities, it is tricky to identify clusters and noise points, because a single ε cannot suit every region.
DBSCAN is good at separating high-density clusters from low-density clusters but struggles with clusters of similar density.
DBSCAN suffers on high-dimensional data. Hence, we need the additional step of feature selection before passing the data to DBSCAN.
Other potential applications of DBSCAN
Anomaly detection in temperature, sales and X-ray image cells.
Clustering of data.
Identification of abnormal behaviour in the stock market.
References
https://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html


