Chapter 16 Stepwise model selection
16.1 Motivation
Problem: Even with moderately many explanatory variables, the number of possible models is very large (especially when interactions are also considered) and grows exponentially.
Consequences:
- Fitting all possible models is burdensome and computationally intensive.
- If these models are all compared using significance tests, the usual interpretation of type I errors is lost.
Solutions:
- Information criteria for model selection (instead of significance tests).
- Stepwise model fitting (instead of all models in advance).
Moreover: Regularized estimation or iterative model fitting (boosting, trees) are other possible strategies (not considered here).
Example: Which explanatory variables should be used to model invest in the MLA data? Which interaction effects between the variables male, age, treatment, grade, arrangement should be considered?
## invest male age treatment grade arrangement
## Min. : 0.0 no :320 Min. :11.1 long :285 6-8 :341 single:385
## 1st Qu.: 30.0 yes:250 1st Qu.:12.0 short:285 10-12:229 team :185
## Median : 50.0 Median :13.9
## Mean : 50.4 Mean :14.3
## 3rd Qu.: 70.0 3rd Qu.:15.9
## Max. :100.0 Max. :21.1
16.2 Type I error
Problem: When conducting many significance tests (e.g., \(F\)-tests for model comparisons), the probability for a type I error increases quickly.
Example:
- In a significance test, the probability for a type I error (rejecting a null hypothesis although it is correct) is controlled, often set to 5%.
- Thus, a type I error can be avoided with 95% probability.
- When conducting 10 such tests independently, the probability to avoid a type I error in all of them is only \(0.95^{10} = 0.599\).
- In this case the probability of at least one type I error has thus increased to \(1 - 0.599 = 0.401\), i.e., about 40%.
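In R, this family-wise error probability can be computed directly (using the example values from above):
alpha <- 0.05      # level of each individual test
m <- 10            # number of independent tests
1 - (1 - alpha)^m  # probability of at least one type I error, approx. 0.401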
Alternatively: Use information criteria instead of significance tests for model selection.
16.3 Information criteria
Ideas:
- Adding any regressor never increases (and typically decreases) the residual sum of squares (RSS).
- Thus, choosing the model with the lowest RSS would always choose the most complex model.
- Hence make a tradeoff between decreasing the objective function RSS and keeping the model simple (with few parameters/coefficients).
Popular: AIC (Akaike information criterion) and BIC (Bayes information criterion or Schwarz criterion). Both have a similar form but employ different penalties for the tradeoff.
\[\begin{eqnarray*} \mathsf{AIC}(\hat \beta) & = & n \cdot \log(\mathit{RSS}(\hat \beta)) ~+~ 2 \cdot k, \\ \mathsf{BIC}(\hat \beta) & = & n \cdot \log(\mathit{RSS}(\hat \beta)) ~+~ \log(n) \cdot k. \end{eqnarray*}\]
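For concreteness, both criteria can be computed by hand from a fitted linear model. A minimal sketch, assuming some fitted lm object m:
rss <- sum(residuals(m)^2)  # residual sum of squares
n <- nobs(m)                # number of observations
k <- length(coef(m))        # number of estimated coefficients
n * log(rss) + 2 * k        # AIC
n * log(rss) + log(n) * k   # BIC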
Properties:
- When comparing two models with AIC or BIC, the one with the smaller criterion is preferred.
- The penalty term of BIC is higher than for AIC (when \(n \ge 8\)). Hence BIC tends to select smaller models.
- There are various equivalent definitions of AIC and BIC, e.g., in R a constant is added (for reasons explained later).
- Both criteria have a consistency property:
- AIC asymptotically selects the model that best approximates the true model (i.e., allowing for model misspecification).
- BIC asymptotically selects the true model if this is one of the models in the selection.
16.4 Stepwise model selection
Previously: Fit all relevant models in advance and then select the best one (e.g., via significance tests or information criteria). However, this is computationally intensive if there are many explanatory variables.
Backward selection: Start with the most complex candidate model. Stepwise simplification, dropping the least relevant variable (if any). Stop if model cannot be improved further.
Forward selection: Start with the simplest candidate model. Stepwise extension of the model, adding the most relevant variable (if any). Stop if model cannot be improved further.
Model comparison: Either via \(F\)-tests or information criteria.
Combination: Forward and backward selection, starting from some plausible initial model (not necessarily the simplest or most complex). In each step add or drop one variable that most improves the model (if any).
In R: step().
16.5 Application
Goal: Based on the AIC, select all relevant explanatory variables and interactions for invest in the MLA data.
Minimal model: Trivial model (constant only).
Maximal model: male, age, treatment, grade, arrangement and all pairwise interactions (denoted by ^2 in R).
Initial model: Main effects.
In R:
m1 <- lm(invest ~ male + age + treatment + grade + arrangement, data = MLA)
m2 <- step(m1, scope = list(
lower = invest ~ 1,
upper = invest ~ (male + age + treatment + grade + arrangement)^2
))
## Start: AIC=3678
## invest ~ male + age + treatment + grade + arrangement
##
## Df Sum of Sq RSS AIC
## + grade:arrangement 1 6485 347844 3670
## + age:arrangement 1 5759 348570 3671
## + age:grade 1 3437 350892 3675
## - treatment 1 8 354337 3676
## <none> 354329 3678
## + male:grade 1 936 353393 3679
## + treatment:arrangement 1 782 353547 3679
## + male:age 1 352 353977 3680
## + treatment:grade 1 28 354301 3680
## + male:treatment 1 27 354302 3680
## + male:arrangement 1 23 354306 3680
## + age:treatment 1 20 354309 3680
## - grade 1 3684 358012 3682
## - age 1 6773 361102 3687
## - male 1 11918 366247 3695
## - arrangement 1 14007 368335 3699
## Step: AIC=3670
## invest ~ male + age + treatment + grade + arrangement + grade:arrangement
##
## Df Sum of Sq RSS AIC
## + age:grade 1 3747 344097 3666
## - treatment 1 68 347912 3668
## <none> 347844 3670
## + treatment:arrangement 1 970 346874 3670
## + age:arrangement 1 134 347710 3672
## + male:age 1 88 347757 3672
## + age:treatment 1 81 347763 3672
## + male:arrangement 1 62 347782 3672
## + male:grade 1 38 347806 3672
## + male:treatment 1 2 347842 3672
## + treatment:grade 1 0 347844 3672
## - grade:arrangement 1 6485 354329 3678
## - age 1 7311 355155 3680
## - male 1 11856 359700 3687
## Step: AIC=3666
## invest ~ male + age + treatment + grade + arrangement + grade:arrangement +
## age:grade
##
## Df Sum of Sq RSS AIC
## - treatment 1 85 344182 3664
## <none> 344097 3666
## + treatment:arrangement 1 928 343169 3666
## + age:arrangement 1 315 343782 3667
## + male:age 1 135 343962 3668
## + age:treatment 1 83 344014 3668
## + male:treatment 1 79 344018 3668
## + male:grade 1 67 344030 3668
## + treatment:grade 1 1 344096 3668
## + male:arrangement 1 1 344096 3668
## - age:grade 1 3747 347844 3670
## - grade:arrangement 1 6795 350892 3675
## - male 1 11840 355937 3683
## Step: AIC=3664
## invest ~ male + age + grade + arrangement + grade:arrangement +
## age:grade
##
## Df Sum of Sq RSS AIC
## <none> 344182 3664
## + age:arrangement 1 328 343854 3665
## + male:age 1 117 344065 3666
## + treatment 1 85 344097 3666
## + male:grade 1 52 344130 3666
## + male:arrangement 1 0 344182 3666
## - age:grade 1 3730 347912 3668
## - grade:arrangement 1 6723 350905 3673
## - male 1 12157 356339 3682
Interpretation:
- The selected model includes all main effects except treatment.
- For example, (teams with) male players invest \(10.35\) percentage points more than female players or all-female teams.
- Interaction effects between grade and arrangement as well as grade and age are included.
- The team effect is only \(6.28\) in the lower grades (reference category) but \(21.35\) in the higher grades.
- Age differences within the lower grades have only a small influence on investment with a slope of \(1.15\). The age differences in the higher grades have a somewhat more pronounced effect with a slope of \(5.73\).
##
## Call:
## lm(formula = invest ~ male + age + grade + arrangement + grade:arrangement +
## age:grade, data = MLA)
##
## Residuals:
## Min 1Q Median 3Q Max
## -77.35 -19.67 0.24 19.16 57.24
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 28.44 15.75 1.81 0.07136
## maleyes 10.35 2.32 4.46 0.0000099
## age 1.15 1.24 0.93 0.35041
## grade10-12 -84.37 28.15 -3.00 0.00284
## arrangementteam 6.28 3.02 2.08 0.03788
## grade10-12:arrangementteam 15.07 4.55 3.32 0.00097
## age:grade10-12 4.58 1.85 2.47 0.01380
##
## Residual standard error: 24.7 on 563 degrees of freedom
## Multiple R-squared: 0.158, Adjusted R-squared: 0.149
## F-statistic: 17.7 on 6 and 563 DF, p-value: <2e-16
16.7 Tutorial
16.7.1 Setup
The myopic loss aversion (MLA) data (MLA.csv, MLA.rda) already analyzed in Chapters 13, 14, and 15 (analysis of covariance) are reconsidered. Here, the dependence of the invested point percentages (invest) on all available explanatory variables and potential pairwise interactions is investigated, using AIC-based stepwise model selection. Only the variable gender is not included in the model search as it can be represented as a certain interaction of the male and arrangement factors.
R
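A minimal sketch for loading and summarizing the data, assuming MLA.rda has been downloaded to the working directory:
load("MLA.rda")  # provides the MLA data frame
summary(MLA)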
## invest male age treatment grade arrangement
## Min. : 0.0 no :320 Min. :11.1 long :285 6-8 :341 single:385
## 1st Qu.: 30.0 yes:250 1st Qu.:12.0 short:285 10-12:229 team :185
## Median : 50.0 Median :13.9
## Mean : 50.4 Mean :14.3
## 3rd Qu.: 70.0 3rd Qu.:15.9
## Max. :100.0 Max. :21.1
Python
# Make sure that the required libraries are installed.
# Import libraries and classes:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
import statsmodels.formula.api as smf
from itertools import combinations
pd.set_option("display.precision", 4)  # Set display precision to 4 digits in pandas.
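A minimal sketch for loading and inspecting the data, assuming MLA.csv has been downloaded to the working directory:
MLA = pd.read_csv("MLA.csv")
print(MLA.describe(include="all"))  # summary of numeric and categorical variables
print(MLA.head())                   # first observations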
## invest gender male age treatment grade arrangement
## count 570.0000 570 570 570.0000 570 570 570
## unique NaN 3 2 NaN 2 2 2
## top NaN female no NaN long 6-8 single
## freq NaN 320 320 NaN 285 341 385
## mean 50.3784 NaN NaN 14.3331 NaN NaN NaN
## std 26.8094 NaN NaN 2.3029 NaN NaN NaN
## min 0.0000 NaN NaN 11.1256 NaN NaN NaN
## 25% 30.0000 NaN NaN 11.9782 NaN NaN NaN
## 50% 50.0000 NaN NaN 13.8756 NaN NaN NaN
## 75% 70.0000 NaN NaN 15.9397 NaN NaN NaN
## max 100.0000 NaN NaN 21.1064 NaN NaN NaN
## invest gender male age treatment grade arrangement
## 0 70.0000 female/male yes 11.7533 long 6-8 team
## 1 28.3333 female/male yes 12.0866 long 6-8 team
## 2 50.0000 female/male yes 11.7950 long 6-8 team
## 3 50.0000 male yes 13.7566 long 6-8 team
## 4 24.3333 female/male yes 11.2950 long 6-8 team
16.7.2 Information criteria
Information criteria are an alternative to analysis of variance (ANOVA, Chapter 14) for model selection. As the \(F\)-tests underlying the ANOVA do not control their type I error if multiple tests are carried out, information criteria instead employ a trade-off between minimizing the objective function (residual sum of squares, negative log-likelihood, …) and penalizing the model complexity as captured by the degrees of freedom (df, the number of estimated parameters in a model).
Most commonly used information criteria \(\text{IC}(\hat \beta)\) for a model with estimated parameters \(\hat \beta\) have the following form:
\[ \text{IC}(\hat \beta) ~=~ -2 \cdot \ell(\hat \beta) ~+~ k \cdot \text{df} \] where \(\ell(\hat \beta)\) is the maximized log-likelihood of the model (see Chapter 18), \(\text{df}\) are the degrees of freedom (number of estimated parameters in \(\hat \beta\)), and \(k\) is the penalty parameter. Setting \(k = 2\) corresponds to the AIC (Akaike information criterion) and \(k = \log(n)\) yields the BIC (Bayes information criterion).
Some formulations of AIC/BIC use a different objective function instead of the log-likelihood. For example, based on the residual sum of squares \(\mathit{RSS}(\hat \beta)\):
\[ \text{IC}(\hat \beta) ~=~ n \cdot \log(\mathit{RSS}(\hat \beta)) ~+~ k \cdot \text{df} \] This differs only by a constant from the log-likelihood-based definition above. However, as the constant is the same for all models on the same data set, it does not change the resulting model selection.
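This can be verified by hand. A minimal sketch, assuming the main effects model m1 fitted below:
ll <- logLik(m1)  # maximized log-likelihood with a "df" attribute
aic_ll  <- -2 * as.numeric(ll) + 2 * attr(ll, "df")
aic_rss <- nobs(m1) * log(sum(residuals(m1)^2)) + 2 * attr(ll, "df")
aic_ll - aic_rss  # constant, identical for all models on the same data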
For illustration, we fit the main effects model of all explanatory variables for modeling invest in the MLA data:
R
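This is the same main effects model as in the overview above:
m1 <- lm(invest ~ male + age + treatment + grade + arrangement, data = MLA)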
Methods for lm objects:

| Method | Description |
|---|---|
| print() | Simple display with call and coefficient estimates. |
| summary() | Detailed summary including coefficient estimates, standard errors, \(t\)-tests as well as the \(F\)-test against the trivial model. |
| coef() | Extract coefficient estimates \(\hat \beta\). |
| fitted() | Extract fitted values \(\hat y_i\) (in-sample). |
| residuals() | Extract residuals \(\hat \varepsilon_i\). |
| predict() | Predictions on new data. |
| anova() | Model comparison via analysis of variance and \(F\)-test. |
| logLik() | Extract maximized log-likelihood and degrees of freedom. |
| AIC(), BIC() | Extract AIC, BIC, and other information criteria. |
| step() | Stepwise model selection based on information criteria. |
Remark: All the methods above are also available for many other model classes in R, especially generalized linear models (GLMs) via glm().
The corresponding log-likelihood, degrees of freedom, and number of observations can then be extracted by:
Python
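In Python/statsmodels these quantities are attributes of the fitted results object. A sketch, fitting the main effects model as m1:
m1 = smf.ols("invest ~ male + age + treatment + grade + arrangement", data=MLA).fit()
print(m1.llf)       # maximized log-likelihood
print(m1.df_model)  # degrees of freedom (number of slope coefficients)
print(m1.nobs)      # number of observations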
## np.float64(-2642.0131643370705)
## 5.0
## 570.0
Regarding the degrees of freedom, note that there are different conventions as to which parameters are counted and which are not. It is clear that the five slope coefficients should be counted (as in Python/statsmodels) but additionally the intercept coefficient and/or the error variance can be counted as well (as in R).
These are then the basis for the computation of the information criteria:
R
In R, the function AIC(object, k = 2) can be used to compute the AIC for a given fitted model object - here of class "lm" but other model classes are also supported, notably "glm". Internally, this computes logLik(object) to extract the maximized log-likelihood and corresponding degrees of freedom. For computing the BIC, either the k argument can be changed manually or BIC(object) can be used, which extracts the number of observations in the associated data set by nobs(object).
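Applied to the main effects model m1 from above:
AIC(m1)
BIC(m1)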
## [1] 5298
## [1] 5328
Additionally, it is worth pointing out that the BIC could equivalently be obtained by the AIC() function when the penalty k is set appropriately:
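AIC(m1, k = log(nobs(m1)))  # sketch: penalty log(n) makes AIC() identical to BIC(m1)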
By themselves these information criteria are not very meaningful. They only become interesting for comparing different models fitted to the same data set, where a lower information criterion indicates a better fit. Such a comparison can be made manually for previously fitted model objects. Moreover, the step() function introduced in the next section can automatically fit and select models in a stepwise approach.
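For instance, the trivial model can be compared against the main effects model (AIC() and BIC() accept several fitted models at once):
m0 <- lm(invest ~ 1, data = MLA)  # trivial model, constant only
AIC(m0, m1)  # lower values indicate the better fit
BIC(m0, m1)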
16.7.3 Stepwise model selection
The main function for stepwise model selection in R is the function step(). Python has no equivalent of the R function step(), therefore we provide the function stepwise_selection() with similar functionality. We follow the R tutorial for the details and introduce the usage of the function(s) in both R and Python.
The most important arguments of the step() function are:
The object is a fitted model object - here of class "lm" but other model classes are also supported, notably "glm". The scope is a list of the simplest (lower) and most complex (upper) model to be considered, see below for a concrete example. The search direction could be restricted to only "forward" or only "backward" search but the default is to do "both". The argument k corresponds to the penalty parameter in the computation of the information criterion (see above). As in the AIC() function, the default is k = 2, corresponding to the AIC, but by setting k = log(n) the BIC could be used for the model selection.
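For example, a BIC-based variant of the search carried out below could be sketched as follows (the selected model may then differ from the AIC-based result):
m2_bic <- step(m1, scope = list(
  lower = invest ~ 1,
  upper = invest ~ (male + age + treatment + grade + arrangement)^2
), k = log(nobs(m1)))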
Here, we start the stepwise model search from the main effects model m1, using all available explanatory variables. As the lower end of the scope the trivial model invest ~ 1 without any covariates is employed, while the upper end is set to an interaction model allowing all pairwise interactions. In R this could be specified as invest ~ male + age + ... + male:age + ... etc. but a more compact formula is invest ~ (male + age + treatment + grade + arrangement)^2. The ^2 indicates all interactions up to order 2 (higher orders could be used as well). Note that this special meaning of the ^ operator is the reason that the I() function has to be used to obtain its arithmetic meaning, e.g., when computing age-squared as I(age^2).
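Both meanings in one place (a sketch using the MLA data; the quadratic age term is purely illustrative):
lm(invest ~ (grade + arrangement)^2, data = MLA)  # expands to grade + arrangement + grade:arrangement
lm(invest ~ age + I(age^2), data = MLA)           # I() keeps ^2 arithmetic: linear + quadratic age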
The reason for restricting ourselves to second-order interactions is that, based on our analysis so far, higher orders seem unlikely. Also, these would become increasingly difficult to interpret. Specifically, using the most complex interaction model invest ~ male * age * treatment * grade * arrangement would allow interactions up to order five, which would not be plausible in this application.
Carrying out the model selection with arguments as specified above yields:
R
m2 <- step(m1, scope = list(
lower = invest ~ 1,
upper = invest ~ (male + age + treatment + grade + arrangement)^2
))
## Start: AIC=3678
## invest ~ male + age + treatment + grade + arrangement
##
## Df Sum of Sq RSS AIC
## + grade:arrangement 1 6485 347844 3670
## + age:arrangement 1 5759 348570 3671
## + age:grade 1 3437 350892 3675
## - treatment 1 8 354337 3676
## <none> 354329 3678
## + male:grade 1 936 353393 3679
## + treatment:arrangement 1 782 353547 3679
## + male:age 1 352 353977 3680
## + treatment:grade 1 28 354301 3680
## + male:treatment 1 27 354302 3680
## + male:arrangement 1 23 354306 3680
## + age:treatment 1 20 354309 3680
## - grade 1 3684 358012 3682
## - age 1 6773 361102 3687
## - male 1 11918 366247 3695
## - arrangement 1 14007 368335 3699
##
## Step: AIC=3670
## invest ~ male + age + treatment + grade + arrangement + grade:arrangement
##
## Df Sum of Sq RSS AIC
## + age:grade 1 3747 344097 3666
## - treatment 1 68 347912 3668
## <none> 347844 3670
## + treatment:arrangement 1 970 346874 3670
## + age:arrangement 1 134 347710 3672
## + male:age 1 88 347757 3672
## + age:treatment 1 81 347763 3672
## + male:arrangement 1 62 347782 3672
## + male:grade 1 38 347806 3672
## + male:treatment 1 2 347842 3672
## + treatment:grade 1 0 347844 3672
## - grade:arrangement 1 6485 354329 3678
## - age 1 7311 355155 3680
## - male 1 11856 359700 3687
##
## Step: AIC=3666
## invest ~ male + age + treatment + grade + arrangement + grade:arrangement +
## age:grade
##
## Df Sum of Sq RSS AIC
## - treatment 1 85 344182 3664
## <none> 344097 3666
## + treatment:arrangement 1 928 343169 3666
## + age:arrangement 1 315 343782 3667
## + male:age 1 135 343962 3668
## + age:treatment 1 83 344014 3668
## + male:treatment 1 79 344018 3668
## + male:grade 1 67 344030 3668
## + treatment:grade 1 1 344096 3668
## + male:arrangement 1 1 344096 3668
## - age:grade 1 3747 347844 3670
## - grade:arrangement 1 6795 350892 3675
## - male 1 11840 355937 3683
##
## Step: AIC=3664
## invest ~ male + age + grade + arrangement + grade:arrangement +
## age:grade
##
## Df Sum of Sq RSS AIC
## <none> 344182 3664
## + age:arrangement 1 328 343854 3665
## + male:age 1 117 344065 3666
## + treatment 1 85 344097 3666
## + male:grade 1 52 344130 3666
## + male:arrangement 1 0 344182 3666
## - age:grade 1 3730 347912 3668
## - grade:arrangement 1 6723 350905 3673
## - male 1 12157 356339 3682
Python
Setup
Download the stepwise_selection.py file into the same folder as your notebook.
Usage
from stepwise_selection import stepwise_selection  # assumed import; module name matches the downloaded file

m_1 = smf.ols("invest ~ male + age + treatment + grade + arrangement", data=MLA)  # note: do NOT call fit() here
m_2 = stepwise_selection(m_1, scope={"lower": "invest ~ 1",
                                     "upper": "invest ~ (male + age + treatment + grade + arrangement)**2"})
## Step: aic= 5296.026328674141
## (' + grade:arrangement', np.float64(5287.497588176868))
## (' + age:arrangement', np.float64(5288.686294017627))
## (' + age:grade', np.float64(5292.470970482877))
## (' - treatment', np.float64(5294.038844118856))
## ('', np.float64(5296.026328674141))
## (' + male:grade', np.float64(5296.518578654452))
## (' + treatment:arrangement', np.float64(5296.766772650885))
## (' + male:age', np.float64(5297.460103986587))
## (' + treatment:grade', np.float64(5297.981237271327))
## (' + male:treatment', np.float64(5297.983509471273))
## (' + male:arrangement', np.float64(5297.989633730205))
## (' + age:treatment', np.float64(5297.994846344091))
## (' - grade', np.float64(5299.921349788084))
## (' - age', np.float64(5304.819395918524))
## (' - male', np.float64(5312.88346257991))
## (' - arrangement', np.float64(5316.1243356592295))
## Step: aic= 5287.497588176868
## (' + age:grade', np.float64(5283.324336214653))
## (' - treatment', np.float64(5285.609741230732))
## ('', np.float64(5287.497588176868))
## (' + treatment:arrangement', np.float64(5287.905124140504))
## (' + age:arrangement', np.float64(5289.277796506088))
## (' + male:age', np.float64(5289.354154167884))
## (' + age:treatment', np.float64(5289.364155053963))
## (' + male:arrangement', np.float64(5289.396354056759))
## (' + male:grade', np.float64(5289.4347437099805))
## (' + male:treatment', np.float64(5289.494340327461))
## (' + treatment:grade', np.float64(5289.497050555255))
## (' - grade:arrangement', np.float64(5296.026328674141))
## (' - age', np.float64(5297.354467503067))
## (' - male', np.float64(5304.6020993464435))
## Step: aic= 5283.324336214653
## (' - treatment', np.float64(5281.46504144095))
## ('', np.float64(5283.324336214653))
## (' + treatment:arrangement', np.float64(5283.78421409581))
## (' + age:arrangement', np.float64(5284.801531908277))
## (' + male:age', np.float64(5285.100943260024))
## (' + age:treatment', np.float64(5285.18694832994))
## (' + male:treatment', np.float64(5285.192713579132))
## (' + male:grade', np.float64(5285.213667227472))
## (' + treatment:grade', np.float64(5285.322717091125))
## (' + male:arrangement', np.float64(5285.322738244473))
## (' - age:grade', np.float64(5287.497588176868))
## (' - grade:arrangement', np.float64(5292.470970482877))
## (' - male', np.float64(5300.607112070311))
## Step: aic= 5281.46504144095
## ('', np.float64(5281.46504144095))
## (' + age:arrangement', np.float64(5282.9219850838035))
## (' + male:age', np.float64(5283.270560845902))
## (' + age:treatment', np.float64(5283.279723549397))
## (' + treatment', np.float64(5283.324336214653))
## (' + male:grade', np.float64(5283.378344189257))
## (' + male:arrangement', np.float64(5283.464657598176))
## (' + treatment:arrangement', np.float64(5283.78421409581))
## (' + male:treatment', np.float64(5285.192713579132))
## (' + treatment:grade', np.float64(5285.322717091125))
## (' - age:grade', np.float64(5285.609741230732))
## (' - grade:arrangement', np.float64(5290.4916501439475))
## (' - male', np.float64(5299.250849003049))
## Result: aic= 5281.46504144095
## invest ~ male + age + grade + arrangement + grade:arrangement + age:grade
The output table first states the initial AIC along with the corresponding model formula. Then all possible single-step changes are assessed, either by adding any of the possible pairwise interactions or dropping any of the included main effects. For each possible model update, the search direction is shown (+ vs. -), along with the term to be added or dropped, the corresponding change in degrees of freedom and in the residual sum of squares, the overall residual sum of squares, and the resulting information criterion. The table is already ordered by the information criterion so that the best-fitting model (in terms of the criterion) is listed first. As the unmodified model (<none>) is also listed, it becomes immediately clear whether any change in the model leads to an improvement. Note also that all the terms considered are added/dropped individually (and not sequentially) to/from the initial model.
Here, the biggest improvement in AIC results from adding the grade:arrangement interaction, indicating that the size of the team effect differs between lower and upper grades. This is then shown as the new model along with its AIC before the next table of single-step modifications is presented. Note that in this next step dropping the main effects of grade or arrangement is not considered because their interaction effect is present in the model.
In the next step, the age:grade interaction is added, indicating different magnitudes of the age effects in the lower and upper grades. Again, this becomes the new current model before the next table of single-step modifications is presented. This leads to the exclusion of the treatment main effect. After that, no individual effects can be added or dropped to improve the model. Thus, the model selection concludes and returns the model m2: invest ~ male + age + grade + arrangement + grade:arrangement + age:grade.
R
##
## Call:
## lm(formula = invest ~ male + age + grade + arrangement + grade:arrangement +
## age:grade, data = MLA)
##
## Residuals:
## Min 1Q Median 3Q Max
## -77.35 -19.67 0.24 19.16 57.24
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 28.44 15.75 1.81 0.07136
## maleyes 10.35 2.32 4.46 0.0000099
## age 1.15 1.24 0.93 0.35041
## grade10-12 -84.37 28.15 -3.00 0.00284
## arrangementteam 6.28 3.02 2.08 0.03788
## grade10-12:arrangementteam 15.07 4.55 3.32 0.00097
## age:grade10-12 4.58 1.85 2.47 0.01380
##
## Residual standard error: 24.7 on 563 degrees of freedom
## Multiple R-squared: 0.158, Adjusted R-squared: 0.149
## F-statistic: 17.7 on 6 and 563 DF, p-value: <2e-16
Python
m2 = smf.ols("invest ~ male + age + grade + arrangement + grade:arrangement + age:grade", data=MLA).fit()
print(m2.summary())
## OLS Regression Results
## ==============================================================================
## Dep. Variable: invest R-squared: 0.158
## Model: OLS Adj. R-squared: 0.149
## Method: Least Squares F-statistic: 17.66
## Date: Sun, 17 Nov 2024 Prob (F-statistic): 8.60e-19
## Time: 12:31:15 Log-Likelihood: -2633.7
## No. Observations: 570 AIC: 5281.
## Df Residuals: 563 BIC: 5312.
## Df Model: 6
## Covariance Type: nonrobust
## ====================================================================================================
## coef std err t P>|t| [0.025 0.975]
## ----------------------------------------------------------------------------------------------------
## Intercept -55.9290 23.332 -2.397 0.017 -101.757 -10.101
## male[T.yes] 10.3534 2.322 4.459 0.000 5.793 14.914
## grade[T.6-8] 84.3739 28.146 2.998 0.003 29.090 139.658
## arrangement[T.team] 21.3517 3.732 5.721 0.000 14.022 28.682
## grade[T.6-8]:arrangement[T.team] -15.0744 4.546 -3.316 0.001 -24.003 -6.146
## age 5.7311 1.382 4.148 0.000 3.017 8.445
## age:grade[T.6-8] -4.5769 1.853 -2.470 0.014 -8.216 -0.938
## ==============================================================================
## Omnibus: 18.099 Durbin-Watson: 1.623
## Prob(Omnibus): 0.000 Jarque-Bera (JB): 8.962
## Skew: 0.038 Prob(JB): 0.0113
## Kurtosis: 2.390 Cond. No. 555.
## ==============================================================================
##
## Notes:
## [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
This model can essentially be interpreted like the models in Chapter 14. Noteworthy conclusions include:
- (Teams with) male players invest 10.35 percentage points more than female players or all-female teams.
- The team effect is only 6.28 in the lower grades (reference category) but 21.35 in the higher grades.
- Age differences within the lower grades have only a small influence on investment with a slope of 1.15. The age differences in the higher grades have a somewhat more pronounced effect with a slope of 5.73.