Logistic Regression
Logistic Regression (LR) is a supervised machine learning algorithm for classification. It predicts the probability that an observation belongs to each class of the dependent variable. It is one of the simplest machine learning algorithms that can be used for classification problems. It passes the output of a linear model through a probability function, so the result is a value between 0 and 1.
The sigmoid curve is also known as the S-curve or logistic curve.
Probability Function:
The probability function is given as
p = 1 / (1 + e^(-y)) -> Equation (1)
Where, e ≈ 2.7183 (Euler's number, a mathematical constant)
This probability function is also known as the sigmoid function or logistic function. It converts any real value into a value between 0 and 1.
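To make the mapping concrete, here is a minimal sketch of the function in equation (1). The function name `sigmoid` and the sample inputs are chosen for illustration only. The branch on the sign of y is an implementation detail (not part of the formula) that avoids overflow for large negative inputs.

```python
import math


def sigmoid(y: float) -> float:
    """Logistic function p = 1 / (1 + e^(-y)) from equation (1)."""
    if y >= 0:
        return 1.0 / (1.0 + math.exp(-y))
    # For negative y, use the algebraically equal form e^y / (1 + e^y)
    # so that exp() is never called with a large positive argument.
    z = math.exp(y)
    return z / (1.0 + z)


# Large negative inputs map close to 0, zero maps to 0.5,
# and large positive inputs map close to 1.
for y in (-5.0, 0.0, 5.0):
    print(y, round(sigmoid(y), 4))
# -5.0 0.0067
# 0.0 0.5
# 5.0 0.9933
```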
To predict values between 0 and 1, we apply the probability function to the output of linear regression.
As we know, the equation of linear regression is
Y = B0 + B1X + e -> Equation (2)
Where, B0 = Intercept
B1 = Coefficient
e = Error term (not the constant e from equation (1))
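Substitute Y from equation (2) for y in equation (1) and rearrange so that Y is on the left-hand side. Equation (1) gives p / (1 - p) = e^Y, so taking the natural logarithm of both sides we get,
log(p / (1 - p)) = B0 + B1X + e
The left-hand side is the log of the odds, also called the logit.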
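The relationship between the two equations can be checked numerically. The sketch below assumes hypothetical coefficients (B0 = -1.5, B1 = 0.8) and sets the error term to zero. The function names are illustrative. It computes p from the linear predictor and then confirms that the log-odds of p recovers that same linear value.

```python
import math


def predict_probability(x: float, b0: float, b1: float) -> float:
    """Apply equation (1) to the linear predictor B0 + B1*x (error term set to 0)."""
    linear = b0 + b1 * x
    return 1.0 / (1.0 + math.exp(-linear))


def log_odds(p: float) -> float:
    """Natural log of the odds p / (1 - p), also called the logit."""
    return math.log(p / (1.0 - p))


# Hypothetical coefficients, chosen only for illustration.
b0, b1 = -1.5, 0.8

for x in (0.0, 1.0, 2.5):
    p = predict_probability(x, b0, b1)
    # The last two columns should agree up to floating-point rounding.
    print(x, round(p, 4), round(log_odds(p), 4), round(b0 + b1 * x, 4))
```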
Threshold Value:
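The default threshold value in logistic regression is 0.5.
If P(Y = 1) > 0.5, then we consider it as Y = 1.
If P(Y = 1) <= 0.5, then we consider it as Y = 0.
The threshold value can be changed to suit the domain. For example, a screening test may use a lower threshold when missing a positive case costs more than a false alarm.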
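The thresholding rule can be sketched in a few lines. The helper name `classify` and the sample probabilities are made up for illustration. The second call shows how lowering the threshold turns a borderline case into a positive prediction.

```python
def classify(probability: float, threshold: float = 0.5) -> int:
    """Return 1 when the predicted probability exceeds the threshold, else 0."""
    return 1 if probability > threshold else 0


# Made-up predicted probabilities.
probabilities = [0.12, 0.50, 0.51, 0.93]

print([classify(p) for p in probabilities])                 # [0, 0, 1, 1]
print([classify(p, threshold=0.3) for p in probabilities])  # [0, 1, 1, 1]
```

Note that a probability of exactly 0.5 is assigned to class 0 under the default rule, matching the "<=" condition above.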
Assumptions in Logistic Regression:
- Only one outcome per event – (Pass or Fail, Yes or No)
- The outcomes are statistically independent
- All relevant predictors are in the model
- The categories are mutually exclusive: an observation belongs to one category at a time
Types of Logistic Regression:
Logistic regression is mainly used to classify binary output variables, but the output variable can have more than 2 categories. Based on the categories of the output variable, logistic regression is divided into 3 types.
1. Binary Logistic Regression:
In binary logistic regression, the target variable has two possible outcomes. For example, 0 or 1, fail or pass, ham or spam.
2. Multinomial Logistic Regression:
In multinomial logistic regression, the target variable has 3 or more categories without ordering. For example, (category1, category2, category3), (BMW, Mercedes, Audi).
3. Ordinal Logistic Regression:
In ordinal logistic regression, the target variable has 3 or more categories with ordering. For example, ratings from 1 to 5, (high, medium, low).
Note: Logistic regression sometimes requires a lot of feature engineering, and it is sensitive to missing values and outliers.
Implementation in Python:
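What follows is a from-scratch sketch of binary logistic regression with one predictor, using only the standard library. It is not taken from this post. The toy data (hours studied vs. pass/fail), the function names, the learning rate, and the number of epochs are all assumptions made for illustration. The coefficients B0 and B1 are estimated by batch gradient descent on the average log loss, which is one common way to fit the model. In practice a library such as scikit-learn's `LogisticRegression` is the usual choice.

```python
import math


def sigmoid(y):
    """Logistic function, written to avoid overflow for large negative y."""
    if y >= 0:
        return 1.0 / (1.0 + math.exp(-y))
    z = math.exp(y)
    return z / (1.0 + z)


def fit_logistic_regression(xs, ys, learning_rate=0.1, epochs=5000):
    """Estimate B0 and B1 by batch gradient descent on the average log loss."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_b0 = 0.0
        grad_b1 = 0.0
        for x, y in zip(xs, ys):
            # Gradient of the log loss for one sample: (predicted p - actual y).
            error = sigmoid(b0 + b1 * x) - y
            grad_b0 += error
            grad_b1 += error * x
        b0 -= learning_rate * grad_b0 / n
        b1 -= learning_rate * grad_b1 / n
    return b0, b1


def predict(x, b0, b1, threshold=0.5):
    """Classify x as 1 if the predicted probability exceeds the threshold."""
    return 1 if sigmoid(b0 + b1 * x) > threshold else 0


# Made-up toy data: hours studied and whether the student passed (1) or failed (0).
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
passed = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

b0, b1 = fit_logistic_regression(hours, passed)
print("B0 =", round(b0, 3), "B1 =", round(b1, 3))
print("P(pass | 1 hour)  =", round(sigmoid(b0 + b1 * 1.0), 3))
print("P(pass | 4 hours) =", round(sigmoid(b0 + b1 * 4.0), 3))
print("Predicted classes:", [predict(x, b0, b1) for x in hours])
```

The toy labels overlap slightly (a pass at 2.5 hours, a fail at 3.0 hours), so the classes are not perfectly separable and the fitted coefficients stay finite.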