Multiple Linear Regression
In the previous example, we trained and evaluated a model to predict the price of a pizza. While you are eager to demonstrate the pizza-price predictor to your friends and co-workers, you are concerned by the model's imperfect R-squared score and the embarrassment its predictions could cause you.
How can we improve the model? Recalling your personal pizza-eating experience, you might have some intuitions about other attributes of a pizza that are related to its price. For instance, the price often depends on the number of toppings on the pizza. Fortunately, your pizza journal describes toppings in detail; let's add the number of toppings to our training data as a second explanatory variable. We cannot proceed with simple linear regression, but we can use a generalization of simple linear regression called multiple linear regression, which can use multiple explanatory variables. Formally, multiple linear regression is the following model:
\[y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n\]

Where simple linear regression uses a single explanatory variable with a single coefficient, multiple linear regression uses a coefficient for each of an arbitrary number of explanatory variables. Our model for linear regression can also be written in vector notation, as follows:
\[Y = X\beta\]

For simple linear regression, this is equivalent to the following:
\[\begin{bmatrix}y_{1}\\y_{2}\\\vdots\\y_{m}\end{bmatrix} = \begin{bmatrix}\alpha + \beta x_{1}\\\alpha + \beta x_{2}\\\vdots\\\alpha + \beta x_{m}\end{bmatrix} = \begin{bmatrix}1 & x_{1}\\1 & x_{2}\\\vdots & \vdots\\1 & x_{m}\end{bmatrix} \times \begin{bmatrix}\alpha \\ \beta\end{bmatrix}\]

Y is a column vector of the values of the response variable for the training examples. \(\beta\) is a column vector of the values of the model's parameters. X, called the design matrix, is an m × n dimensional matrix of the values of the explanatory variables for the training examples, where m is the number of training examples and n is the number of explanatory variables. Let's update our pizza training data to include the number of toppings, with the following values:
Training Example | Diameter (in inches) | Number of toppings | Price (in dollars) |
---|---|---|---|
1 | 6 | 2 | 7 |
2 | 8 | 1 | 9 |
3 | 10 | 0 | 13 |
4 | 14 | 2 | 17.5 |
5 | 18 | 0 | 18 |
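As a worked example of the notation above, folding the intercept term into a leading column of ones gives the following design matrix and response vector for this training data:

\[X = \begin{bmatrix}1 & 6 & 2\\1 & 8 & 1\\1 & 10 & 0\\1 & 14 & 2\\1 & 18 & 0\end{bmatrix}, \quad Y = \begin{bmatrix}7\\9\\13\\17.5\\18\end{bmatrix}\]

Each row of X contains a 1 for the intercept term followed by the diameter and the number of toppings of one training example.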
We must also update our test data to include the second explanatory variable, as follows:
Test Example | Diameter (in inches) | Number of toppings | Price (in dollars) |
---|---|---|---|
1 | 8 | 2 | 11 |
2 | 9 | 0 | 8.5 |
3 | 11 | 2 | 15 |
4 | 16 | 2 | 18 |
5 | 12 | 0 | 11 |
Our learning algorithm must estimate the values of three parameters: the coefficients for the two features and the intercept term. While one might be tempted to solve for \(\beta\) by dividing each side of the equation by X, division by a matrix is impossible. However, just as dividing a number by an integer is equivalent to multiplying by the inverse of that integer, we can multiply both sides of the equation by the inverse of X to avoid matrix division. Matrix inversion is denoted with a superscript -1. Only square matrices can be inverted, and X is unlikely to be square; the number of training instances would have to equal the number of features for it to be so. Instead, we will multiply X by its transpose to yield a square matrix that can be inverted. Denoted with a superscript T, the transpose of a matrix is formed by turning the rows of the matrix into columns and vice versa, as follows:
\[\begin{bmatrix}1&2&3\\4&5&6\end{bmatrix}^T = \begin{bmatrix}1&4\\2&5\\3&6\end{bmatrix}\]

We know the values of Y and X from our training data. We must find the values of \(\beta\) that minimize the cost function, and we can solve for \(\beta\) as follows:
\[\beta = (X^TX)^{-1}X^TY\]
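Before turning to scikit-learn, we can check this formula directly with NumPy. The following is a minimal sketch that plugs our training data into the solution above; the leading column of ones in X corresponds to the intercept term:

import numpy as np

# Design matrix with a column of ones for the intercept term
X = np.array([[1, 6, 2], [1, 8, 1], [1, 10, 0], [1, 14, 2], [1, 18, 0]])
y = np.array([[7], [9], [13], [17.5], [18]])
# beta = (X^T X)^-1 X^T Y
beta = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)
print(beta)

This should print a column vector of approximately 1.1875, 1.0104, and 0.3958 for the intercept and the two coefficients. scikit-learn's LinearRegression estimates the same parameters for us; note that it adds the intercept term itself, so we omit the column of ones from its design matrix. Let's solve the same problem using Python and scikit-learn: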
from sklearn.linear_model import LinearRegression

# Training data: [diameter, number of toppings] and prices in dollars
X = [[6, 2], [8, 1], [10, 0], [14, 2], [18, 0]]
y = [[7], [9], [13], [17.5], [18]]
model = LinearRegression()
model.fit(X, y)
# Test data with the same two explanatory variables
X_test = [[8, 2], [9, 0], [11, 2], [16, 2], [12, 0]]
y_test = [[11], [8.5], [15], [18], [11]]
predictions = model.predict(X_test)
for i, prediction in enumerate(predictions):
    print('Predicted: %s, Target: %s' % (prediction, y_test[i]))
print('R-squared: %.2f' % model.score(X_test, y_test))
The output of the preceding script is as follows:
Predicted: [ 10.0625], Target: [11]
Predicted: [ 10.28125], Target: [8.5]
Predicted: [ 13.09375], Target: [15]
Predicted: [ 18.14583333], Target: [18]
Predicted: [ 13.3125], Target: [11]
R-squared: 0.77
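To confirm that scikit-learn arrived at the same solution as the normal-equation sketch, we can inspect the estimated parameters; the fitted model's intercept_ and coef_ attributes hold the intercept and the feature coefficients (a brief check, assuming the model was fit as above):

# The estimated parameters should match the values computed with NumPy
print(model.intercept_)  # approximately [ 1.1875]
print(model.coef_)       # approximately [[ 1.0104  0.3958]]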