Simple linear regression


Simple linear regression has exactly one continuous independent variable x. The relation between x and the dependent variable y is assumed to be

y = a + bx,

where a is the intercept and b is the slope (coefficient) of the line.

Simple linear regression is an approach for predicting a response using a single feature. It assumes that the two variables are linearly related. Hence, we try to find the linear function of the feature, or independent variable (x), that predicts the response value (y) as accurately as possible.
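To make "as accurately as possible" concrete: the usual criterion is ordinary least squares, which picks a and b so that the sum of squared differences between the observed y and a + bx is smallest. The sketch below computes that closed-form solution with NumPy. The function name fit_simple_linear_regression and the toy data are hypothetical, chosen only for illustration; they are not part of the salary example.

```python
import numpy as np

def fit_simple_linear_regression(x, y):
    """Return (a, b) minimizing sum((y - (a + b*x))**2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: co-variation of x and y divided by the variation of x.
    b = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the fitted line passes through the point of means.
    a = y_mean - b * x_mean
    return a, b

# Hypothetical toy data lying exactly on the line y = 1 + 2x.
x_demo = [0, 1, 2, 3, 4]
y_demo = [1, 3, 5, 7, 9]
a, b = fit_simple_linear_regression(x_demo, y_demo)
print(f"a = {a:.2f}, b = {b:.2f}")  # a = 1.00, b = 2.00
```

Because the toy points lie exactly on a line, the fit recovers that line; with noisy data the same formulas give the best-fitting line in the least-squares sense.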

Let us experiment with simple linear regression:

Download the dataset:

Code:

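import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

## Load the dataset
dataset = pd.read_csv("C:\\Users\\SR Laptop\\Desktop\\Linear regression\\Salary_dataset.csv")

# Feature matrix: first column, kept 2-D (n_samples, 1) as scikit-learn expects.
# If the CSV starts with an index column, column 0 is the row index rather than
# years of experience; check dataset.head() and adjust the slice if needed.
X = dataset.iloc[:, 0:1].values
# Label vector: last column (salary)
y = dataset.iloc[:, -1].values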

 

## Split the dataset into training and test sets
from sklearn.model_selection import train_test_split

# test_size=0.7 holds out 70% of the rows for testing and trains on the remaining 30%
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.7, random_state=0)

print("features YOE:  ", X_train)
print("         ")

print("Label Salary:  ", y_train)
print("         ")

print("features YOE:  ", X_test)
print("         ")

print("Label Salary:  ", y_test)
print("         ")

 

## Train the simple linear regression model on the training set
from sklearn.linear_model import LinearRegression

regressor = LinearRegression()
regressor.fit(X_train, y_train)

# Print the splits again; fitting the model does not modify them
print("features YOE:  ", X_train)
print("         ")

print("Label Salary:  ", y_train)
print("         ")

print("features YOE:  ", X_test)
print("         ")

print("Label Salary:  ", y_test)
print("         ")

## Predict the test set results
y_pred = regressor.predict(X_test)

 

 

## Visualize the training set results
plt.scatter(X_train, y_train, color='r')
plt.plot(X_train, regressor.predict(X_train), color='b')
plt.title('Salary vs experience (training set)')
plt.xlabel('Years of experience')
plt.ylabel('Salary')
plt.show()

## Visualize the test set results
plt.scatter(X_test, y_test, color='r')
# The fitted line is the same for both sets, so it is drawn from the training inputs
plt.plot(X_train, regressor.predict(X_train), color='b')
plt.title('Salary vs experience (test set)')
plt.xlabel('Years of experience')
plt.ylabel('Salary')
plt.show()

## Visualize the test set with the predicted values
plt.scatter(X_test, y_test, color='r')
plt.plot(X_test, y_pred, color='b')
plt.title('Salary vs experience (predicted test values)')
plt.xlabel('Years of experience')
plt.ylabel('Salary')
plt.show()

 

## Make a single prediction (the input must be 2-D: one row, one feature)
single_prediction = regressor.predict([[12]])
print(single_prediction)

## Print the model parameters
coefficient = regressor.coef_
print("coefficient is:", coefficient)
intercept = regressor.intercept_
print("intercept is:", intercept)

 

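## Manual calculation of the salary prediction (extra part)
# y = intercept + coefficient * x, using the fitted parameters, for x = 15
manual_prediction = 33603.2285041225 + 2835.78327444 * 15
print("manual prediction by using coefficient and intercept: ", round(manual_prediction, 2))

# Automatic prediction with the fitted model; it should agree with the manual value
auto_prediction = regressor.predict([[15]])
print("Auto prediction is: ", auto_prediction)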


Output:

features YOE:   [[19]
 [ 9]
 [ 7]
 [25]
 [ 3]
 [ 0]
 [21]
 [15]
 [12]]

Label Salary:   [ 93941.  57190.  54446. 105583.  43526.  39344.  98274.  67939.  56958.]

features YOE:   [[ 2]
 [28]
 [13]
 [10]
 [26]
 [24]
 [27]
 [11]
 [17]
 [22]
 [ 5]
 [16]
 [ 8]
 [14]
 [23]
 [20]
 [ 1]
 [29]
 [ 6]
 [ 4]
 [18]]

Label Salary:   [ 37732. 122392.  57082.  63219. 116970. 109432. 112636.  55795.  83089.
 101303.  56643.  66030.  64446.  61112. 113813.  91739.  46206. 121873.
  60151.  39892.  81364.]

        

features YOE:   [[19]
 [ 9]
 [ 7]
 [25]
 [ 3]
 [ 0]
 [21]
 [15]
 [12]]

Label Salary:   [ 93941.  57190.  54446. 105583.  43526.  39344.  98274.  67939.  56958.]

features YOE:   [[ 2]
 [28]
 [13]
 [10]
 [26]
 [24]
 [27]
 [11]
 [17]
 [22]
 [ 5]
 [16]
 [ 8]
 [14]
 [23]
 [20]
 [ 1]
 [29]
 [ 6]
 [ 4]
 [18]]

Label Salary:   [ 37732. 122392.  57082.  63219. 116970. 109432. 112636.  55795.  83089.
 101303.  56643.  66030.  64446.  61112. 113813.  91739.  46206. 121873.
  60151.  39892.  81364.]

        





 

 

[67632.62779741]
coefficient is: [2835.78327444]
intercept is: 33603.2285041225

manual prediction by using coefficient and intercept:  76139.98
Auto prediction is:  [76139.97762073]
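The printed coefficient and intercept fully describe the fitted line, so any model prediction can be checked by hand with y = intercept + coefficient * x. The following plain-Python sketch does this for x = 12 and x = 15, using the two parameter values copied from the output; the helper name predict_salary is hypothetical.

```python
# Parameters copied from the printed output of the fitted model.
INTERCEPT = 33603.2285041225
COEFFICIENT = 2835.78327444

def predict_salary(x):
    """Evaluate the fitted line y = intercept + coefficient * x."""
    return INTERCEPT + COEFFICIENT * x

for x in (12, 15):
    print(x, round(predict_salary(x), 2))
# 12 67632.63
# 15 76139.98
```

Both values agree with the model's own predictions, [67632.62779741] for 12 and [76139.97762073] for 15, which confirms that the intercept is added once and only the coefficient is multiplied by x.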

 

