Mohammed Arebi

Experienced data scientist. I blog about machine learning and my journey. Actively building. Always learning.

Applying Over-Sampling Methods to Highly Imbalanced Data

I covered several undersampling approaches for dealing with highly imbalanced data in an earlier post, “Applying Under-Sampling Methods to Highly Imbalanced Data”. In this article, I present oversampling strategies for the same problem. Oversampling raises the weight of the minority class by replicating minority-class examples. Although it adds no new information, it introduces the risk of over-fitting, which makes the model overly specific: accuracy on the training set may be high while performance on unseen datasets is poor.
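As a minimal sketch of the simplest oversampling strategy, random oversampling, here is a plain NumPy version; the dataset, the seed, and the helper name `random_oversample` are illustrative, not from the original post:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical imbalanced dataset: 95 majority (class 0), 5 minority (class 1).
X = rng.normal(size=(100, 2))
y = np.array([0] * 95 + [1] * 5)

def random_oversample(X, y, rng):
    """Duplicate minority-class rows (sampling with replacement)
    until every class matches the majority-class count."""
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    idx = []
    for c in classes:
        c_idx = np.flatnonzero(y == c)
        # Sampling with replacement is what reproduces minority examples.
        idx.append(rng.choice(c_idx, size=n_max, replace=True))
    idx = np.concatenate(idx)
    return X[idx], y[idx]

X_res, y_res = random_oversample(X, y, rng)
print(np.bincount(y_res))  # balanced: [95 95]
```

Because the minority rows are exact duplicates, a flexible model can memorize them, which is precisely the over-fitting risk described above.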

Applying Under-Sampling Methods to Highly Imbalanced Data

Class imbalance can bias a classifier heavily toward the dominant class, lowering classification performance and increasing the frequency of false negatives. How can we solve the problem? The most popular strategies involve data resampling: undersampling the majority class, oversampling the minority class, or a combination of the two. As a result, classification performance improves. In this article, I describe what imbalanced data is, why the Receiver Operating Characteristic (ROC) curve fails to measure performance accurately in this setting, and how to address the problem.
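The undersampling side can be sketched in the same spirit: randomly discard majority-class rows until the classes match. This is a plain NumPy illustration with made-up data, not the post's own implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical imbalanced dataset: 90 majority (class 0), 10 minority (class 1).
X = rng.normal(size=(100, 2))
y = np.array([0] * 90 + [1] * 10)

def random_undersample(X, y, rng):
    """Keep all minority rows and a random subset of each other class,
    shrinking every class to the minority-class count."""
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    idx = np.concatenate([
        # Sampling without replacement drops the surplus majority rows.
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])
    return X[idx], y[idx]

X_res, y_res = random_undersample(X, y, rng)
print(np.bincount(y_res))  # balanced: [10 10]
```

The trade-off mirrors oversampling: nothing is duplicated, but potentially useful majority-class information is thrown away.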

Logistic Regression: With Application and Analysis on the 'Rain in Australia' Dataset

Introduction

The logistic model (or logit model) is a statistical model that represents the probability of an event occurring by making the log-odds of the event a linear combination of one or more independent variables. Logistic regression is another technique that machine learning borrowed from statistics. It is the go-to method for binary classification problems (problems with two classes), even though it is technically a regression algorithm (it predicts probabilities; more on that later).
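The log-odds relationship can be made concrete with a few lines of NumPy; the weights, bias, and feature values below are illustrative numbers, not fitted coefficients from the 'Rain in Australia' dataset:

```python
import numpy as np

def sigmoid(z):
    """Map a log-odds score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Log-odds as a linear combination of features (illustrative values).
w = np.array([0.8, -1.2])   # coefficients
b = 0.3                     # intercept
x = np.array([2.0, 1.0])    # one observation

log_odds = w @ x + b        # 0.8*2.0 - 1.2*1.0 + 0.3 = 0.7
p = sigmoid(log_odds)       # ≈ 0.668

# Inverting the sigmoid recovers the linear score: log(p / (1-p)) = w·x + b.
assert np.isclose(np.log(p / (1 - p)), log_odds)
print(round(p, 3))
```

This is why logistic regression counts as a regression algorithm: the model is linear in the log-odds, and the sigmoid merely maps that continuous score to a probability that can then be thresholded for classification.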