Logistic Regression

Logistic Regression

Logistic Regression (LR) is a supervised machine learning classification algorithm. It predicts the probability associated with each dependent variable. It is the simplest machine learning algorithm that can be used for classification problems. It converts the output using the probability function to return a probability value.

Sigmoid Curve is also known as S-Curve or Logistic Curve.

Probability Function:

The probability function is given as p = 1/1+e^-y ->Equation (1)

Where, e = 2.7183 (Real number constant)

This probability function is also known as a sigmoid function or logistic function. This function converts any real value into a value between 0 and 1.

In order to predict the values between 0 and 1 we use probability function with linear regression.

As we know the equation of linear regression is Y= B₀ + B₁X + e -> Equation (2)

Where, B₀ = Intercept

B₁ = Coefficient

e = error

Solve equation (1) and (2) for LHS = Y we get,

Log (p/1-p) = B₀ + B₁X + e

Threshold Value:

The default threshold value in logistic regression is 0.5.

If P(y) > 0.5 then we consider it as Y = 1

P(Y) <= 0.5 then we consider it as Y = 0

But the threshold value can be changed with respect to different domains.

Assumptions in Logistic Regression:

Only one outcome per event – (Pass or Fail, Yes or No)
The outcomes are statistically independent
All relevant predictors are in the model
One category at a time

Types of Logistic Regression:

Logistic Regression mainly used to classify binary output variables but there can be more than 2 categories of output variables. Based on different categories of output variables logistic regression is divided into 3 types.

1. Binary Logistic Regression:

In binary logistic regression, the target variable has two possible outcomes. For example, 0 or 1, fail or pass, ham or spam.

2. Multinomial Logistic Regression:

In multinomial logistic regression, the target variable has 3 or more categories without ordering. For example, (category1, category2, category3), (BMW, Mercedes, Audi).

3. Ordinal Logistic Regression:

In ordinal logistic regression, the target variable has 3 or more categories with ordering. For example, Ratings from 1 to 5, (High, medium, low).

Note: In Logistic Regression, sometimes lot of feature engineering is required, and it is sensitive to missing values and outliers.

Implementation in Python:

We are going to use Scikit Learn library for implementing logistic regression

Output:

To download code - Click Here

Author Description