The solution can be found here.
In this tutorial session, we will work through three linear regression problems from start to finish.
For this problem, we will analyse data about the fuel efficiency (miles per gallon) of various cars. The data set was retrieved from this page (with modifications). You can download the .csv file here.
col.names <- c('mpg', 'cylinders', 'displacement', 'hp', 'weight', 'acceleration', 'year', 'origin')
car <- read.csv(file = 'others/car.csv', header = FALSE, sep = ',', col.names = col.names)
head(car, 5)
## mpg cylinders displacement hp weight acceleration year origin
## 1 18 8 307 130 3504 12.0 70 1
## 2 16 8 304 150 3433 12.0 70 1
## 3 17 8 302 140 3449 10.5 70 1
## 4 NA 8 350 165 4142 11.5 70 1
## 5 NA 8 351 153 4034 11.0 70 1
Explore the data set, fit an appropriate linear model, check the model assumptions, and plot the results. At the end, make predictions for unknown values.
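One possible starting point is sketched below in base R. It rests on assumptions that the task does not state: that the rows with a missing mpg (such as rows 4 and 5 above) are the unknown values to predict, that origin is a category code and should therefore enter the model as a factor, and that all remaining columns are reasonable first-pass covariates. The helper fit_car_model() and the simulated data frame sim are hypothetical, included only so the sketch runs without the .csv file; with the real data, call fit_car_model(car) instead.

```r
# Hypothetical helper (not from the original solution): fit mpg on the other
# columns using the rows where mpg is observed, then predict the missing ones.
fit_car_model <- function(dat) {
  dat$origin <- factor(dat$origin)             # assumption: origin is a category code
  known   <- na.omit(dat[!is.na(dat$mpg), ])   # complete rows with observed mpg
  unknown <- dat[is.na(dat$mpg), ]             # rows whose mpg is to be predicted
  model <- lm(mpg ~ cylinders + displacement + hp + weight +
                acceleration + year + origin, data = known)
  preds <- predict(model, newdata = unknown, interval = "prediction")
  list(model = model, predictions = preds)
}

# Simulated stand-in with the same column names (NOT the real car data)
set.seed(1)
n <- 100
sim <- data.frame(
  cylinders    = sample(c(4, 6, 8), n, replace = TRUE),
  displacement = runif(n, 70, 450),
  hp           = runif(n, 50, 230),
  weight       = runif(n, 1600, 5000),
  acceleration = runif(n, 8, 25),
  year         = sample(70:82, n, replace = TRUE),
  origin       = sample(1:3, n, replace = TRUE)
)
sim$mpg <- 40 - 0.006 * sim$weight + 0.5 * (sim$year - 70) + rnorm(n, sd = 2)
sim$mpg[1:5] <- NA                      # treat the first five responses as unknown

# Exploration: numeric summaries and pairwise correlations
summary(sim)
round(cor(sim, use = "complete.obs"), 2)

res <- fit_car_model(sim)
summary(res$model)
# Residual diagnostics: residuals vs fitted, normal Q-Q, scale-location, leverage
par(mfrow = c(2, 2))
plot(res$model)
shapiro.test(residuals(res$model))      # formal check of residual normality
res$predictions                         # point predictions with 95% prediction limits
```

The four panels produced by plot() on an lm object cover linearity, normality, constant variance and influential points. Treating cylinders as a factor as well is an equally defensible design choice.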
For this problem, we will analyse data collected in an observational study in a semiconductor manufacturing plant. The data were retrieved from the book Applied Statistics and Probability for Engineers. You can download the .csv file here. In this plant, the finished semiconductor is wire-bonded to a frame. The variables reported are the pull strength (a measure of the force required to break the bond), the wire length, and the height of the die.
col.names <- c('pull_strength', 'wire_length', 'height')
wire <- read.csv(file = 'others/wire_bond.csv', header = FALSE, sep = ',', col.names = col.names)
head(wire, 5)
## pull_strength wire_length height
## 1 9.95 2 50
## 2 24.45 8 110
## 3 31.75 11 120
## 4 35.00 10 550
## 5 25.02 8 295
Explore the data set, fit an appropriate linear model for the data, check the model assumptions, and plot the fitted plane. At the end, make predictions for unknown values.
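With two covariates, the fitted model pull_strength ≈ b0 + b1 · wire_length + b2 · height is a plane, so it can be drawn as a surface. The sketch below, in base R, evaluates the fitted plane on a grid with outer() and draws it with persp(). The helper fit_wire_plane(), the simulated sim_wire data frame (its coefficients are arbitrary, not taken from the book) and the new covariate values passed to predict() are all hypothetical; with the real data, call fit_wire_plane(wire).

```r
# Hypothetical helper: fit pull_strength on wire_length and height,
# then draw the fitted regression plane over the observed covariate range.
fit_wire_plane <- function(dat, n_grid = 20) {
  model <- lm(pull_strength ~ wire_length + height, data = dat)
  wl <- seq(min(dat$wire_length), max(dat$wire_length), length.out = n_grid)
  ht <- seq(min(dat$height), max(dat$height), length.out = n_grid)
  # outer() passes every (wire_length, height) pair to the function at once
  z <- outer(wl, ht, function(a, b)
    as.vector(predict(model, newdata = data.frame(wire_length = a, height = b))))
  persp(wl, ht, z, theta = 30, phi = 20, ticktype = "detailed",
        xlab = "wire_length", ylab = "height", zlab = "pull_strength")
  invisible(model)
}

# Simulated stand-in (NOT the book's data; coefficients are arbitrary)
set.seed(42)
sim_wire <- data.frame(wire_length = sample(1:20, 25, replace = TRUE),
                       height      = runif(25, 0, 600))
sim_wire$pull_strength <- 3 + 2.5 * sim_wire$wire_length +
  0.01 * sim_wire$height + rnorm(25, sd = 2)

m <- fit_wire_plane(sim_wire)
summary(m)
par(mfrow = c(2, 2)); plot(m)           # residual diagnostics
# Predictions with 95% prediction intervals for two hypothetical new bonds
new_bonds <- data.frame(wire_length = c(8, 15), height = c(200, 450))
new_pred <- predict(m, newdata = new_bonds, interval = "prediction")
new_pred
```

Using interval = "prediction" rather than "confidence" gives limits for a single new bond, not for the mean pull strength.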
For this problem, we will analyse a data set with 7 variables (1 response variable + 6 covariates). Although the meaning of the variables is not stated, we will see how important feature selection is when performing multiple regression analysis. You can download the .csv file here.
col.names <- c('var1', 'var2', 'var3', 'var4', 'var5', 'var6', 'response')
data <- read.csv(file = 'others/data.csv', header = FALSE, sep = ',', col.names = col.names)
head(data, 5)
## var1 var2 var3 var4 var5 var6 response
## 1 68.10730 95.83754 49.66851 0.015061421 2.090953 64.83720 218.5916
## 2 78.18420 97.69040 54.51643 0.042649961 4.320810 74.54103 245.8415
## 3 54.24527 105.20130 49.59829 0.005194938 4.948731 78.74680 264.0839
## 4 54.56271 97.41171 47.21550 0.021132252 5.127075 74.95861 251.3954
## 5 56.75478 95.57443 44.05604 0.027485738 1.801114 63.39468 214.6450
Explore the data set, fit an appropriate linear model for the data (reduced by a feature selection procedure of your choice), check the model assumptions, and plot the results. At the end, make predictions for unknown values.
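The task leaves the feature selection procedure open. One common choice, sketched below, is backward elimination by AIC with stats::step(): start from the model with all six covariates and repeatedly drop the term whose removal lowers AIC the most, stopping when no removal lowers it further. This is only one option (best-subset search or penalised regression are alternatives). The helper select_features() and the simulated sim3 data, in which only var1 and var5 affect the response by construction, are hypothetical; with the real data, call select_features(data).

```r
# Hypothetical helper: backward elimination on AIC, starting from the
# full model response ~ var1 + ... + var6.
select_features <- function(dat) {
  full <- lm(response ~ ., data = dat)
  step(full, direction = "backward", trace = 0)  # trace = 0 silences the step log
}

# Simulated stand-in (NOT the tutorial's data): only var1 and var5 matter
set.seed(7)
n <- 100
sim3 <- as.data.frame(matrix(rnorm(n * 6), ncol = 6,
                             dimnames = list(NULL, paste0("var", 1:6))))
sim3$response <- 10 + 3 * sim3$var1 - 2 * sim3$var5 + rnorm(n)

round(cor(sim3), 2)                     # exploration: pairwise correlations
reduced <- select_features(sim3)
summary(reduced)
kept <- attr(terms(reduced), "term.labels")
kept                                    # covariates retained by step()
par(mfrow = c(2, 2)); plot(reduced)     # residual diagnostics
# Prediction for a hypothetical new observation: all six covariates are
# supplied, and predict() uses only those kept in the reduced model
new_obs <- as.data.frame(matrix(0, nrow = 1, ncol = 6,
                                dimnames = list(NULL, paste0("var", 1:6))))
predict(reduced, newdata = new_obs, interval = "prediction")
```

AIC-based elimination can retain a noise covariate now and then, so it is worth comparing the reduced model against the full one (for example with anova()) before trusting the selection.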