Federal Trade Commission data for 2021 showed that consumers lost more than $5.8 billion to fraud, an increase of more than 70 per cent over 2020 (Federal Trade Commission, 2022). The most commonly reported category was imposter scams, followed by online shopping scams.
FinTech and cryptocurrency continued to grow in 2022, which has also brought an increase in fraudulent activity in the financial services industry. While fraud has increased, Artificial Intelligence (AI) and related technology have advanced as well, providing machine learning methods to detect and prevent fraud.
There are many approaches to using machine learning and AI in fraud detection, depending on the datasets and use cases. For this case study, we will refer to the credit card fraud detection dataset from Kaggle (Machine Learning Group – ULB, 2018).
The open-source dataset contains 284,807 transactions, of which 492 are classified as fraudulent. Training models to detect fraud on this dataset is challenging because it is highly unbalanced: fraud accounts for roughly 0.17 per cent of all transactions.
Fig. 1. Credit card fraud detection dataset from Kaggle
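To make the imbalance concrete, the short sketch below works through the arithmetic using only the two counts quoted above. The "always legitimate" baseline is a hypothetical illustration, not a model from this case study; it shows why plain accuracy can look excellent on this dataset while catching no fraud at all.

```python
# Counts quoted in the text above.
total_transactions = 284_807
fraud_transactions = 492

# Share of all transactions that are labelled as fraud.
fraud_share = fraud_transactions / total_transactions

# Hypothetical baseline: label every transaction as legitimate.
# It is right on every non-fraud row and wrong on every fraud row.
baseline_accuracy = (total_transactions - fraud_transactions) / total_transactions
baseline_recall = 0 / fraud_transactions  # it catches no fraud at all

print(f"Fraud share: {fraud_share:.3%}")
print(f"Always-legitimate accuracy: {baseline_accuracy:.3%}")
print(f"Always-legitimate recall: {baseline_recall:.0%}")
```

A model that never flags anything would be right more than 99.8 per cent of the time here, which is why the rest of this article looks at Precision and Recall rather than accuracy.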
First, we conduct some exploratory data analysis to investigate the dataset: we look at the distribution of each variable (mean, standard deviation, etc.) and check whether there are any duplicate values to remove or missing values to replace. This analysis also confirms how highly unbalanced the dataset is. Because unbalanced data poses a problem when training a classifier, understanding the trade-off between Precision and Recall is critical to obtaining a model that benefits the business.
Regarding the two metrics, Recall is the number of correctly classified fraud cases divided by the total number of actual fraud cases. In comparison, Precision is the number of correctly classified fraud cases divided by the total number of samples predicted as fraud (refer to Fig. 2).
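The checks described above can be sketched in a few lines. This is a minimal, dependency-free illustration on a hypothetical eight-row sample, not the actual analysis run on the Kaggle data; the `rows` values and the `explore` helper are assumptions made up for this example.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical miniature sample: (amount, label), where label 1 = fraud
# and 0 = legitimate. None marks a missing amount.
rows = [
    (12.50, 0), (12.50, 0), (99.00, 0), (None, 0),
    (250.00, 1), (8.75, 0), (43.20, 0), (8.75, 0),
]

def explore(rows):
    """Return basic summary statistics for a list of (amount, label) rows."""
    amounts = [amount for amount, _ in rows if amount is not None]
    row_counts = Counter(rows)
    return {
        "rows": len(rows),
        "missing_amounts": sum(1 for amount, _ in rows if amount is None),
        # Every repeat of an already-seen row counts as one duplicate.
        "duplicate_rows": sum(n - 1 for n in row_counts.values()),
        "mean_amount": mean(amounts),
        "std_amount": stdev(amounts),
        "class_counts": Counter(label for _, label in rows),
    }

summary = explore(rows)
for name, value in summary.items():
    print(f"{name}: {value}")
```

On a full dataset the same questions are usually answered with a data-frame library; in pandas, for instance, `DataFrame.describe()`, `DataFrame.duplicated()` and `DataFrame.isna()` cover the distribution, duplicate and missing-value checks respectively.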
In this case,
Fig. 2. Total number of samples predicted as fraud
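As a small worked example of the two metrics, the sketch below computes Precision and Recall from true and predicted labels. The label lists are hypothetical and the `precision_recall` helper is written for this illustration only; it uses the standard definitions, where Recall divides by the actual fraud cases and Precision divides by the cases predicted as fraud.

```python
def precision_recall(y_true, y_pred):
    """Compute precision and recall for the positive (fraud = 1) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fraud, flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # legitimate, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fraud, missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical labels: 1 = fraud, 0 = legitimate.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

precision, recall = precision_recall(y_true, y_pred)
print(f"Precision: {precision:.2f}")  # 3 of the 5 flagged cases are fraud
print(f"Recall: {recall:.2f}")        # 3 of the 4 fraud cases are caught
```

In this sample the model flags five transactions, three of which are truly fraudulent, and misses one fraud case, so Precision is 0.60 and Recall is 0.75.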
For a highly unbalanced dataset, a model with high recall is the ideal, meaning that most fraudulent activities are correctly classified. The choice of machine learning algorithm again depends on the dataset and the purpose of the analysis. A Linear Regression model might yield high recall for this dataset, but the same may not hold for other datasets. Specifically for fraud detection, refer to the list of recommended machine learning models that could bring the best outcomes (Chandru, 2019).
In conclusion, fraud detection solutions can help a business quickly identify suspicious activity patterns using its historical data, protecting the company from both known and unknown fraud attacks. The outcomes from machine learning and AI are best visualised on a dashboard showing real-time analytics and other vital elements that support business operations. At Lucid Insights, we provide professional services, including Advanced Analytics, Data Science, Power BI dashboards and many more.
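Favouring recall usually comes at a cost in precision. The sketch below illustrates that trade-off with one common technique, moving the score threshold above which a transaction is flagged. The scores, labels and the `metrics_at` helper are hypothetical, and this is not presented as the method used in the case study.

```python
# Hypothetical fraud scores from some classifier, paired with the true
# label (1 = fraud, 0 = legitimate).
scored = [
    (0.95, 1), (0.80, 1), (0.65, 0), (0.55, 1), (0.40, 0),
    (0.35, 1), (0.20, 0), (0.10, 0), (0.05, 0), (0.02, 0),
]

def metrics_at(threshold, scored):
    """Flag every score >= threshold as fraud; return (precision, recall)."""
    tp = sum(1 for s, y in scored if s >= threshold and y == 1)
    fp = sum(1 for s, y in scored if s >= threshold and y == 0)
    fn = sum(1 for s, y in scored if s < threshold and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Lowering the threshold flags more transactions.
for threshold in (0.9, 0.5, 0.3):
    p, r = metrics_at(threshold, scored)
    print(f"threshold={threshold:.1f}  precision={p:.2f}  recall={r:.2f}")
```

With these made-up scores, recall rises from 0.25 to 1.00 as the threshold drops from 0.9 to 0.3, while precision falls from 1.00 to about 0.67. Where to set the threshold depends on how the business weighs a missed fraud against a false alarm.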
References
Federal Trade Commission. (2022). New Data Shows FTC Received 2.8 Million Fraud Reports from Consumers in 2021. [online] Available at: https://www.ftc.gov/news-events/news/press-releases/2022/02/new-data-shows-ftc-received-28-million-fraud-reports-consumers-2021-0.
Machine Learning Group – ULB. (2018). Credit Card Fraud Detection. [online] Available at: https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud.
Chandru, S. (2019). Which Machine Learning Algorithm to Use for Fraud Detection? [online] Available at: https://www.saksoft.com/blog/which-machine-learning-algorithm-to-use-for-fraud-detection/.