
Logistic Regression


Logistic Regression (LR) is a supervised machine learning algorithm for classification. It predicts the probability that an observation belongs to each class of the dependent variable. It is one of the simplest machine learning algorithms for classification problems. It passes the output of a linear model through a probability function to return a value between 0 and 1.


The sigmoid curve is also known as the S-curve or logistic curve.

 

Probability Function:


The probability function is given as p = 1 / (1 + e^(-y))  -> Equation (1)

Where, e ≈ 2.7183 (Euler's number, the base of the natural logarithm) and y is any real number

This probability function is also known as a sigmoid function or logistic function. This function converts any real value into a value between 0 and 1.
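
The behaviour described above can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function name sigmoid and the sample inputs are chosen here for illustration.

```python
import math

def sigmoid(y: float) -> float:
    """Map a real number y to a value between 0 and 1."""
    if y >= 0:
        return 1.0 / (1.0 + math.exp(-y))
    # For negative y, use an equivalent form so exp() never overflows.
    z = math.exp(y)
    return z / (1.0 + z)

# Large negative inputs approach 0, large positive inputs approach 1,
# and an input of 0 gives exactly 0.5.
for y in (-5.0, 0.0, 5.0):
    print(y, round(sigmoid(y), 4))
```

Splitting the computation on the sign of y is a common way to keep the exponential from overflowing for inputs that are far from zero.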

To predict values between 0 and 1, we apply the probability function to the output of a linear regression equation.

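As we know, the equation of linear regression is Y = B0 + B1X + ε  -> Equation (2)

Where, B0 = Intercept

             B1 = Coefficient

             ε   = error

Taking the linear part of Equation (2) as the input of Equation (1), that is y = B0 + B1X, and solving Equation (1) for y, we get,

log(p / (1 - p)) = B0 + B1X

The left-hand side is the log-odds (logit) of p. Unlike linear regression, there is no additive error term here: the randomness lies in the outcome itself, which equals 1 with probability p.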
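
The relationship between the probability p and the linear expression B0 + B1X can be verified numerically. The sketch below assumes made-up coefficient values (b0 and b1 are hypothetical, not fitted to any data) and shows that taking the log-odds of p recovers the linear expression.

```python
import math

def sigmoid(y: float) -> float:
    """Probability function from Equation (1)."""
    return 1.0 / (1.0 + math.exp(-y))

def logit(p: float) -> float:
    """Log-odds of a probability p, valid for 0 < p < 1."""
    return math.log(p / (1.0 - p))

# Hypothetical intercept and coefficient, chosen only for illustration.
b0, b1 = -1.5, 0.8

for x in (0.0, 1.0, 2.5):
    y = b0 + b1 * x          # linear part
    p = sigmoid(y)           # probability between 0 and 1
    # logit(p) matches y up to floating-point rounding
    print(x, round(p, 4), round(logit(p), 4), round(y, 4))
```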

 

Threshold Value:  

The default threshold value in logistic regression is 0.5.



If p > 0.5, we consider it as Y = 1.

If p <= 0.5, we consider it as Y = 0.

The threshold can be changed to suit the domain. For example, a lower threshold may be preferred when missing a positive case is more costly than raising a false alarm.
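
The thresholding rule can be sketched as a small function. The name classify and the probability values are made up for illustration; with scikit-learn, the probabilities of class 1 would typically come from classifier.predict_proba(X)[:, 1].

```python
def classify(probabilities, threshold=0.5):
    """Return 1 where the probability exceeds the threshold, otherwise 0."""
    return [1 if p > threshold else 0 for p in probabilities]

# Made-up predicted probabilities of class 1.
probs = [0.10, 0.35, 0.50, 0.65, 0.90]

print(classify(probs))        # [0, 0, 0, 1, 1] with the default threshold 0.5
print(classify(probs, 0.3))   # [0, 1, 1, 1, 1] with a lower threshold
```

Lowering the threshold labels more observations as 1, which trades more false positives for fewer false negatives.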

 

Assumptions in Logistic Regression:

  • Only one outcome per event – (Pass or Fail, Yes or No)
  • The outcomes are statistically independent
  • All relevant predictors are in the model
  • Each observation belongs to exactly one category (the categories are mutually exclusive)

 

Types of Logistic Regression:

Logistic Regression is mainly used to classify binary output variables, but the output variable can also have more than 2 categories. Based on the kind of output variable, logistic regression is divided into 3 types.


1. Binary Logistic Regression:

In binary logistic regression, the target variable has two possible outcomes. For example, 0 or 1, fail or pass, ham or spam.

 

2. Multinomial Logistic Regression:

In multinomial logistic regression, the target variable has 3 or more categories without ordering. For example, (category1, category2, category3), (BMW, Mercedes, Audi).
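
With more than two unordered categories, the sigmoid is commonly generalised to the softmax function, which turns one linear score per category into probabilities that sum to 1. The post does not say how the multinomial case is computed, so the sketch below is an assumption based on that common formulation; the scores are made up.

```python
import math

def softmax(scores):
    """Convert a list of real-valued scores into probabilities that sum to 1."""
    m = max(scores)                        # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

labels = ["BMW", "Mercedes", "Audi"]
scores = [2.0, 1.0, 0.1]                   # hypothetical linear scores, one per category

probs = softmax(scores)
predicted = labels[probs.index(max(probs))]  # category with the highest probability

for label, p in zip(labels, probs):
    print(label, round(p, 3))
print("Predicted:", predicted)
```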

 

3. Ordinal Logistic Regression:

In ordinal logistic regression, the target variable has 3 or more categories with ordering. For example, ratings from 1 to 5, or (low, medium, high).
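
One common way to respect the ordering is the cumulative-logit (proportional odds) formulation, in which the probability of falling at or below each level is a sigmoid of a cut-point minus a single linear score. The post does not describe the method, so this sketch is an assumption; the score and cut-points are made up.

```python
import math

def sigmoid(y: float) -> float:
    return 1.0 / (1.0 + math.exp(-y))

def ordinal_probabilities(score, cutpoints):
    """Return one probability per ordered level.

    cutpoints must be increasing; there is one more level than cut-points.
    P(level <= k) = sigmoid(cutpoints[k] - score).
    """
    cumulative = [sigmoid(c - score) for c in cutpoints] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)   # probability of exactly this level
        previous = c
    return probs

levels = ["low", "medium", "high"]
# Hypothetical linear score and cut-points, chosen only for illustration.
probs = ordinal_probabilities(score=0.4, cutpoints=[-1.0, 1.0])
predicted = levels[probs.index(max(probs))]

for level, p in zip(levels, probs):
    print(level, round(p, 3))
print("Predicted:", predicted)
```

A larger score shifts probability towards the higher levels, while the cut-points stay fixed.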

 

Note: Logistic Regression sometimes requires a lot of feature engineering, and it is sensitive to missing values and outliers.

 

Implementation in Python:

We are going to use the scikit-learn library to implement logistic regression.


# Importing the libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Importing the dataset
dataset = pd.read_csv('data.csv')
X = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
# Model Building
# Fitting Logistic Regression to the Training set
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(random_state = 0)
classifier.fit(X_train, y_train)
# Predicting the Test set results
y_pred = classifier.predict(X_test)
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
print(cm)
# Accuracy on the test data (fraction of correct predictions)
accuracy_test = np.mean(y_pred == y_test)
print(accuracy_test)
# Visualising the Test set results
from matplotlib.colors import ListedColormap
X_set, y_set = X_test, y_test
# Build a fine grid that covers the range of both (scaled) features
X1, X2 = np.meshgrid(
    np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
    np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
# Colour each grid point by its predicted class to show the decision regions
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'purple')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
# Plot the test points, one colour per class
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                color = ListedColormap(('red', 'purple'))(i), label = j)
plt.title('Logistic Regression (Test set)')
plt.xlabel('Age (standardized)')
plt.ylabel('Estimated Salary (standardized)')
plt.legend()
plt.show()
Output:

[Figure: Logistic Regression (Test set), Age vs. Estimated Salary]
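
To see how the accuracy relates to the confusion matrix, the sketch below rebuilds both in plain Python for a binary problem. The labels are made up and the helper name confusion_counts is hypothetical; the layout (rows are true classes, columns are predicted classes) follows what scikit-learn's confusion_matrix returns for labels 0 and 1.

```python
def confusion_counts(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for binary labels 0 and 1."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1 and p == 0:
            fn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:
            tn += 1
    return [[tn, fp], [fn, tp]]

# Made-up true and predicted labels, for illustration only.
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 0, 1, 0, 1, 0]

cm = confusion_counts(y_true, y_pred)
(tn, fp), (fn, tp) = cm

accuracy = (tp + tn) / (tp + tn + fp + fn)   # correct predictions over all predictions
precision = tp / (tp + fp)                   # how many predicted 1s are really 1
recall = tp / (tp + fn)                      # how many real 1s were found

print(cm)          # [[3, 1], [1, 3]]
print(accuracy)    # 0.75
print(precision, recall)
```

Accuracy is the sum of the diagonal of the confusion matrix divided by the total number of observations, which is the same quantity the listing above computes with np.mean(y_pred == y_test).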