Consider the data set from a steam-producing plant. The 10 regressor variables are defined below. The goal is to obtain a model that is excellent both for fitting and for predicting “pounds of steam used monthly” (y).
Follow the steps we covered in class.
1. Check for outliers that can be removed. Explain.
2. Check the 10 X’s for multicollinearity. Identify any regressor variables that should not be used together in the same model.
3. Run proc rsquare to obtain the adjusted R square, MSE, and Mallows’ Cp for the candidate subsets. Select 3 models that look promising to you and add the full model, giving 4 candidate models in total. Explain what each statistic measures and what it means.
4. Write 4 separate proc reg statements to obtain the PRESS statistic for each candidate model. Explain what PRESS means and why we need it.
5. Compute the “PRESS-based R square,” 1 - PRESS/SSTotal, with a calculator for each of the 4 candidate models. This is a prediction-oriented R square.
6. Summarize your findings in a table with columns for
(a) Model
(b) Adj. R square
(c) PRESS-based R square
(d) MSE
(e) Cp
(f) PRESS
7. Briefly explain why you prefer one particular model.
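Steps 3, 5 and 6 all rest on a few formulas in the error sum of squares (SSE), the total sum of squares (SST), the sample size n and the number of parameters p. As a hedged illustration outside SAS, the Python sketch below computes them for one candidate model. `subset_stats` is a hypothetical helper, p is assumed to count the intercept, and Cp is assumed to use the full model's MSE as the estimate of the error variance (the usual convention). The numbers in the usage lines are made up, not taken from the steam data.

```python
def subset_stats(sse, sst, n, p, mse_full, press=None):
    """Summary statistics for one candidate regression model (illustrative sketch).

    sse      -- error (residual) sum of squares of the candidate model
    sst      -- total sum of squares, sum((y - ybar) ** 2)
    n        -- number of observations
    p        -- number of parameters in the model, intercept included (assumption)
    mse_full -- MSE of the full model, used as the estimate of sigma^2 in Cp
    press    -- PRESS statistic of the model, if it has been computed
    """
    mse = sse / (n - p)                          # error mean square
    r2 = 1 - sse / sst                           # ordinary R square
    adj_r2 = 1 - (n - 1) / (n - p) * sse / sst   # penalizes extra parameters
    cp = sse / mse_full - (n - 2 * p)            # Mallows' Cp; near p when bias is small
    stats = {"R2": r2, "adjR2": adj_r2, "MSE": mse, "Cp": cp}
    if press is not None:
        stats["PRESS"] = press
        stats["R2_PRESS"] = 1 - press / sst      # prediction-oriented R square
    return stats


if __name__ == "__main__":
    # Hypothetical numbers for illustration only (n = 25, full model has 10 X's + intercept).
    full = subset_stats(sse=18.0, sst=100.0, n=25, p=11, mse_full=18.0 / 14)
    print(full)   # for the full model, Cp equals p
    small = subset_stats(sse=30.0, sst=100.0, n=25, p=4, mse_full=18.0 / 14, press=40.0)
    print(small)
```

For the full model Cp equals p by construction, so Cp is informative only for the smaller subsets: values near p suggest little bias, while values well above p suggest that important regressors were left out.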
multiple regression, variable selection
Obs y x1 x2 x3 x4 x5 x6 x7 x8 x9 x10
1 10.98 9.20 0.61 7.4 31 11.1104 22 35.3 54.8 4 7.7500
2 11.13 5.12 0.64 8.0 29 11.7631 25 29.7 64.0 5 5.8000
3 12.51 6.19 0.78 7.4 31 11.2710 17 30.8 54.8 4 7.7500
4 8.40 3.89 0.49 7.5 30 11.5003 22 58.8 56.3 4 7.5000
5 9.27 6.28 0.84 5.5 31 11.7997 0 61.4 30.3 5 6.2000
6 8.73 5.76 0.74 8.9 30 12.0850 0 71.3 79.2 4 7.5000
7 6.36 3.45 0.42 4.1 31 11.8331 0 74.4 16.8 2 15.5000
8 8.50 6.57 0.87 8.1 31 11.7735 0 76.7 16.8 5 6.2000
9 7.82 5.69 0.75 4.1 30 12.1353 0 70.7 16.8 4 7.5000
10 9.14 6.14 0.76 4.5 31 11.7261 0 57.5 20.3 5 6.2000
11 8.24 4.84 0.65 10.3 30 11.7687 11 46.4 106.1 4 7.5000
12 12.19 4.88 0.62 6.9 31 11.3380 12 28.9 47.6 4 7.7500
13 11.88 6.03 0.79 6.6 31 10.9529 25 28.1 43.6 5 6.2000
14 22.57 4.55 0.60 3.3 28 12.2705 18 39.1 53.3 8 3.5000
15 10.94 5.71 0.70 8.1 31 11.6143 5 46.8 65.6 4 7.7500
16 9.58 5.67 0.74 8.4 30 11.9466 7 48.5 70.6 4 7.5000
17 10.09 6.72 0.85 6.1 31 11.7090 0 59.3 37.2 6 5.1667
18 8.11 4.95 0.67 4.9 30 12.1712 0 70.0 24.0 4 7.5000
19 6.83 4.62 0.45 4.6 31 11.8093 0 70.0 21.2 3 10.3333
20 8.88 6.60 0.95 3.7 31 11.7425 0 74.5 13.7 4 7.7500
21 7.68 5.01 0.64 4.7 30 12.2038 0 72.1 22.1 4 7.5000
22 8.47 5.68 0.75 5.3 31 11.7264 1 58.1 28.1 6 5.1667
23 8.86 5.28 0.70 6.2 30 11.7446 14 44.6 38.4 4 7.5000
24 10.36 5.36 0.67 6.8 31 11.0387 22 33.4 46.2 4 7.7500
25 19.08 5.87 0.70 7.5 31 10.8634 28 28.6 56.3 5 6.2000
y = pounds of steam used monthly
x1 = pounds of fatty acid in storage per month
x2 = pounds of crude glycerin made
x3 = average wind velocity (miles/hour)
x4 = calendar days per month
x5 = index of number of warm days per month
x6 = days below 32°F
x7 = average atmospheric temperature
x8 = squared average wind velocity
x9 = number of start-ups
x10 = number of days per start-up
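A quick first screen for multicollinearity (step 2) is the pairwise correlation between regressors; variance inflation factors from the regression fit are the usual follow-up. The Python sketch below is a hedged illustration, not the SAS procedure used in class. `pearson` and `flag_pairs` are hypothetical helper names, the 0.7 cutoff is an assumed rule of thumb rather than a fixed standard, and only x3 and x8 (copied from the table above) are included to keep it short. Since x8 is defined as the square of x3, that pair is an obvious one to check.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / sqrt(saa * sbb)


def flag_pairs(columns, threshold=0.7):
    """Return (name1, name2, r) for every regressor pair with |r| >= threshold.

    The default threshold of 0.7 is an assumed rule of thumb.
    """
    flagged = []
    for u, v in combinations(sorted(columns), 2):
        r = pearson(columns[u], columns[v])
        if abs(r) >= threshold:
            flagged.append((u, v, r))
    return flagged


# x3 (average wind velocity) and x8 (squared average wind velocity),
# copied from the steam data table, observations 1-25 in order.
steam = {
    "x3": [7.4, 8.0, 7.4, 7.5, 5.5, 8.9, 4.1, 8.1, 4.1, 4.5, 10.3, 6.9, 6.6,
           3.3, 8.1, 8.4, 6.1, 4.9, 4.6, 3.7, 4.7, 5.3, 6.2, 6.8, 7.5],
    "x8": [54.8, 64.0, 54.8, 56.3, 30.3, 79.2, 16.8, 16.8, 16.8, 20.3, 106.1,
           47.6, 43.6, 53.3, 65.6, 70.6, 37.2, 24.0, 21.2, 13.7, 22.1, 28.1,
           38.4, 46.2, 56.3],
}

if __name__ == "__main__":
    for u, v, r in flag_pairs(steam):
        print(f"{u} and {v}: r = {r:.3f}")
```

Adding the remaining columns to the `steam` dictionary screens all 45 pairs at once; a high correlation alone does not settle the question, so it is worth confirming with VIFs.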
Problem 2:
Consider real data on several variables for Florida counties for 2011. Each variable has been converted to normal scores. The variables are Poverty, Cancer, Poor Water Quality, Poor Air Quality (particulate matter, PM2.5), Median Income, and Unemployment.
Water: Percentage of the population whose water is below EPA standards.
Cancer: Age-adjusted incidence rates of invasive cancers for 2007-2011
PM2.5: Average daily fine particle count (particles smaller than 2.5 micrometers)
Poverty: Percentage of the population living below the poverty line
Income: Median Income
Unemployment: Unemployment rate
We want to predict the poverty rates from the rest of the variables.
• Identify outliers: keep it brief
• Identify multicollinearity: keep it brief
• Identify two candidate models of your choice
• Compare the PRESS statistics of the candidate models
• Identify your “best prediction model”. Comment on its parameter estimates. What can you say about the prediction of poverty rates for Florida counties? Which variable(s) seem to be most strongly associated with the poverty rates?
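Both problems ask for outlier checks and PRESS. The diagnostics SAS reports for these (leverage, studentized residuals, PRESS) all come from a single least-squares fit through the hat-matrix diagonal h_ii, with PRESS = sum of (e_i / (1 - h_ii))^2. The pure-Python sketch below is illustrative only: the helper names are hypothetical, the data in the usage lines are made up, and the 2p/n leverage cutoff is an assumed rule of thumb. `loo_press` refits the model n times to show that the shortcut gives the same leave-one-out prediction error.

```python
from math import sqrt


def invert(m):
    """Inverse of a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    # Augment m with the identity matrix: [m | I]
    a = [list(map(float, row)) + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(a[r][c]))  # row with the largest pivot
        a[c], a[piv] = a[piv], a[c]
        pv = a[c][c]
        a[c] = [v / pv for v in a[c]]
        for r in range(n):
            if r != c:
                f = a[r][c]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[c])]
    return [row[n:] for row in a]  # right half now holds m^-1


def solve_ols(X, y):
    """Least-squares coefficients; each row of X must start with 1 for the intercept."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    return beta, inv


def diagnostics(X, y):
    """Residuals, leverages, internally studentized residuals and PRESS."""
    n, p = len(X), len(X[0])
    beta, inv = solve_ols(X, y)
    resid = [yi - sum(b * xi for b, xi in zip(beta, row)) for row, yi in zip(X, y)]
    # Leverage h_ii = x_i' (X'X)^-1 x_i, the i-th diagonal element of the hat matrix
    lev = [sum(row[i] * inv[i][j] * row[j] for i in range(p) for j in range(p)) for row in X]
    mse = sum(e * e for e in resid) / (n - p)
    student = [e / sqrt(mse * (1 - h)) for e, h in zip(resid, lev)]
    # PRESS: each deleted residual e_i / (1 - h_ii), squared and summed
    press = sum((e / (1 - h)) ** 2 for e, h in zip(resid, lev))
    return {"beta": beta, "resid": resid, "leverage": lev, "student": student, "press": press}


def loo_press(X, y):
    """PRESS by brute force: refit without observation i, then predict it."""
    total = 0.0
    for i in range(len(X)):
        beta, _ = solve_ols(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        pred = sum(b * xi for b, xi in zip(beta, X[i]))
        total += (y[i] - pred) ** 2
    return total


if __name__ == "__main__":
    # Hypothetical data: intercept column plus one predictor.
    X = [[1.0, float(x)] for x in range(1, 7)]
    y = [1.1, 2.3, 2.8, 4.2, 4.9, 6.3]
    d = diagnostics(X, y)
    n, p = len(X), len(X[0])
    print("PRESS:", round(d["press"], 4), "LOO check:", round(loo_press(X, y), 4))
    for i, (h, r) in enumerate(zip(d["leverage"], d["student"]), start=1):
        flag = " <- high leverage" if h > 2 * p / n else ""
        print(f"obs {i}: h = {h:.3f}, studentized residual = {r:.2f}{flag}")
```

Because PRESS always comes out at least as large as SSE, a model whose PRESS is much larger than its SSE fits the sample well but predicts new observations poorly.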
data;
input obs npoverty ncancer nwater nair nincome nunemployment;
cards;
1 1.01409 -0.07345 0.58499 -0.12776 -0.26475 -0.29549
2 0.33180 0.62905 -0.72799 0.07345 0.07547 0.50101
3 -0.32074 1.09273 0.76702 0.33307 0.23397 0.54029
4 0.97055 -0.24476 -0.72799 -0.05689 -0.32626 0.03066
5 -0.35360 0.31776 0.31662 -0.55082 0.45605 0.85304
6 -0.22859 -0.20506 0.40395 -0.53307 0.65082 0.23571
7 1.30425 -1.04213 -0.72799 0.24601 -1.24431 0.33149
8 -0.52613 -0.57539 0.18921 -0.62561 -0.06658 0.77491
9 0.12939 -0.03751 1.22190 -0.27018 -0.86135 1.09627
10 -1.03869 0.48805 -0.72799 -0.06739 1.19018 0.29590
11 -0.05123 -1.41167 0.76702 -0.61779 0.89094 0.64364
12 0.79433 -0.21495 -0.72799 0.02339 -0.66887 0.39441
13 1.79416 -1.15333 0.18921 -0.57825 -1.17718 0.39441
14 1.22020 -0.63743 1.78416 -0.11233 -1.48533 1.38265
15 0.19847 1.19924 -0.72799 0.07992 0.41262 0.74370
16 0.31480 -0.15628 -0.72799 0.83535 -0.09328 0.54029
17 -0.25766 -0.15097 0.68465 -0.25724 0.51319 1.68942
18 0.86780 -0.97507 1.64735 0.12573 -0.82227 -0.19855
19 1.22020 -0.66786 -0.72799 0.11314 -0.97832 0.67724
20 0.48081 -1.32433 -0.72799 -0.10706 -0.48987 0.39441
21 0.85902 -1.41386 -0.72799 -0.60903 -0.38121 0.50101
22 0.98223 -0.63743 0.31662 0.24601 -0.56542 0.54029
23 1.88376 -1.67948 0.76702 0.08559 -1.70307 0.94736
24 1.86968 -1.03452 -0.72799 -0.54052 -1.11949 0.64364
25 1.68948 -1.38804 -0.72799 -0.62561 -0.83078 1.86265
26 0.01169 -0.05608 -0.72799 -0.32668 -0.18164 1.62572
27 0.37169 -0.89816 0.31662 -0.57825 -1.02903 0.70957
28 0.18000 0.55693 -0.72799 -0.44356 0.47222 0.67724
29 1.33410 -3.04426 -0.72799 0.36694 -1.09420 -0.13269
30 -0.40342 -0.52843 -0.72799 -0.61340 0.02016 1.35285
31 0.63942 -2.17865 0.72863 0.26642 -0.84050 -0.19855
32 0.35232 -1.56669 -0.72799 0.01694 -0.20341 0.19978
33 1.04561 -1.12252 1.78416 -0.04356 -0.18369 -0.13269
34 -0.40342 0.38772 -0.72799 -0.36478 0.32201 0.91442
35 -0.17549 -0.88258 0.31662 -0.64883 0.36134 0.91442
36 0.92258 -1.06318 -0.72799 0.04639 0.30676 -0.09575
37 0.83079 0.97313 0.47138 -0.17836 -1.04143 0.88475
38 1.01409 -1.06318 1.24348 0.15628 -0.55881 0.03066
39 1.30425 -2.14933 -0.72799 0.03589 -1.36311 0.94736
40 -0.09368 -0.98288 -0.72799 -0.52195 0.16240 0.81084
41 0.18000 0.78222 0.40395 -0.24850 -0.67190 1.26993
42 -0.64040 -0.82453 1.67786 -0.60903 0.82001 0.81084
43 0.69123 -0.79487 -0.72799 -0.47808 -0.17304 0.97685
44 -0.42143 -1.13702 -0.72799 -0.52565 0.89937 -0.74904
45 -0.92321 0.75084 1.29394 0.18697 1.17718 0.36018
46 -0.43956 -0.35876 0.47138 0.55882 0.75298 -0.35803
47 1.33410 -0.38381 2.33202 -0.59549 -1.01139 1.09627
48 0.31480 -0.35360 -0.72799 -0.46053 0.28401 0.57800
49 -0.07992 -0.92876 -0.72799 -0.54707 -0.11274 1.00651
50 -0.10584 -0.62561 -0.72799 -0.58876 0.69020 0.74370
51 -0.11964 0.57254 0.31662 -0.38078 -0.12208 1.18678
52 -0.21536 -0.42848 -0.72799 -0.40561 0.09733 0.70957
53 0.43335 0.94125 0.58499 -0.49990 -0.20258 1.09627
54 1.53059 1.10601 -0.72799 -0.17836 -1.42157 1.15319
55 -1.44182 -0.13142 0.84227 -0.11883 1.60790 -0.01130
56 0.60273 -0.78936 -0.72799 -0.61779 -0.06254 1.47797
57 -0.94694 -0.59886 0.76702 0.67950 1.06816 0.16532
58 -0.86487 -0.76806 0.47138 -0.57396 0.55035 0.77491
59 -0.89819 -0.49078 0.80436 -0.42274 1.15177 0.39441
60 -0.29874 -0.29831 0.18921 -0.33307 0.27982 0.13107
61 1.16201 -0.43246 -0.72799 0.00040 -0.87546 0.23571
62 0.92258 -0.70202 2.21012 -0.02500 -1.00602 0.77491
63 1.22873 3.53867 1.37965 -0.03186 -0.14201 -0.09575
64 0.14608 -0.69225 -0.72799 -0.34888 -0.42583 0.77491
65 -0.27563 -0.08478 -0.72799 0.05931 0.84972 -0.01130
;
run;