Learning Outcomes
This article explains how to evaluate and select multiple regression models for the CFA Level 2 exam, including:
- understanding the intuition and formula for adjusted R-squared and how it differs from ordinary R-squared in assessing the marginal contribution of new predictors to model performance;
- applying adjusted R-squared to compare models with different numbers of predictors and recognizing when a marginal increase is economically and statistically meaningful for investment analysis;
- interpreting AIC and BIC values as information criteria that trade off model fit against complexity, and ranking competing specifications accordingly in exam scenarios;
- distinguishing between situations where AIC is more appropriate (forecasting focus) and where BIC is preferred (parsimony and identification of core drivers) when answering item-set questions;
- diagnosing overfitting by linking high in-sample R-squared to weak out-of-sample performance and excessive variables, and relating this to exam-style vignette data sets;
- integrating statistical output from regression software—coefficients, standard errors, and information criteria—to justify a recommended model in structured calculation and conceptual questions;
- avoiding common exam pitfalls, such as automatically favoring the model with the highest adjusted R-squared or misinterpreting the direction of preference for AIC and BIC;
- recognizing how joint significance tests (F-tests) on groups of variables complement adjusted R-squared, AIC, and BIC when comparing nested models.
CFA Level 2 Syllabus
For the CFA Level 2 exam, you are required to understand model selection in multiple regression, with a focus on the following syllabus points:
- interpreting and calculating adjusted R-squared and understanding its role versus standard R-squared in model comparison;
- explaining the purpose and interpretation of AIC and BIC for evaluating model fit and forecasting performance;
- recognizing when the addition of variables constitutes overfitting and the implications for out-of-sample prediction;
- comparing alternative models using goodness-of-fit measures and information criteria to select the most appropriate model for prediction or explanation;
- understanding how F-tests can be used to compare nested models when several potential regressors are added or removed jointly.
Test Your Knowledge
Attempt these questions before reading this article. If you find some difficult or cannot remember the answers, remember to look more closely at that area during your revision.
An analyst is building regression models to explain monthly excess returns on an equity fund using various factors: market excess return, size, value, momentum, and term spread. She estimates three competing models using 120 monthly observations:
- Model A: fund excess return regressed on the market excess return only
- Model B: fund excess return regressed on the market, size, and value factors
- Model C: fund excess return regressed on all five factors
Summary statistics are:
| Model | Number of predictors ($k$) | $R^2$ | Adjusted $R^2$ | AIC | BIC |
|---|---|---|---|---|---|
| A | 1 | 0.65 | 0.64 | 420.0 | 425.0 |
| B | 3 | 0.78 | 0.76 | 390.0 | 401.0 |
| C | 5 | 0.81 | 0.77 | 388.0 | 406.0 |
1. If the analyst’s primary objective is to build a forecasting model for next year’s monthly returns, which model is most appropriate based solely on the information criteria reported?
- Model A
- Model B
- Model C
- All models are equally appropriate for forecasting
2. If the objective is to identify the core economic drivers of the fund’s returns while avoiding unnecessary complexity, which model is most appropriate based on the information given?
- Model A because it is the simplest and has the highest BIC
- Model B because it balances higher adjusted $R^2$ and relatively low BIC
- Model C because it has the highest $R^2$
- Model C because it has the lowest AIC
3. Suppose the analyst adds a sixth predictor to Model C. The new model’s $R^2$ increases slightly, but adjusted $R^2$, AIC, and BIC all worsen. Which interpretation is most consistent with these results?
- The new variable is highly significant and should definitely be kept
- The new variable is likely capturing noise and contributes to overfitting
- The new variable improves both in-sample and out-of-sample performance
- The new variable has no impact on the model because $R^2$ increased
4. When comparing nested models (e.g., Model B versus Model C), which method most directly tests whether the additional regressors in the larger model are jointly useful?
- Comparing $R^2$ values only
- Comparing standardized coefficients
- Using a joint F-test on the added coefficients
- Inspecting residual plots visually
Introduction
Choosing an appropriate regression model is critical for robust financial analysis. While adding more predictors to a model often increases the apparent fit, this can lead to overfitting—where the model explains random noise rather than meaningful patterns. CFA candidates must be able to distinguish between genuine improvement in explanatory power and artifacts of model complexity. This article reviews key quantitative tools—adjusted R-squared, AIC, and BIC—to support prudent model selection for both forecasting and inference.
Key Term: R-squared
R-squared measures the proportion of the variance of the dependent variable explained by the regression model, ranging from 0 to 1, and always weakly increasing as more regressors are added.
Key Term: adjusted R-squared
Adjusted R-squared modifies ordinary R-squared by incorporating a penalty for adding more independent variables, so that it reflects the proportion of variance explained adjusted for model complexity.
Key Term: Akaike Information Criterion (AIC)
AIC is an information criterion used to compare regression models estimated on the same dependent variable; it combines a measure based on the sum of squared errors with a penalty that increases with the number of parameters. Lower AIC values indicate a preferred model.
Key Term: Bayesian Information Criterion (BIC)
BIC is an information criterion similar to AIC but applies a stronger penalty for additional parameters that increases with sample size, making BIC more conservative and more focused on parsimony.
Key Term: overfitting
Overfitting occurs when a regression model captures random noise in the sample rather than the true relationship, often due to too many predictors; such models look strong in-sample but perform poorly out-of-sample.
In investment applications, model selection arises in many contexts: building factor models for portfolio returns, forecasting earnings growth from macro variables, or explaining credit spreads using firm-level characteristics. In each case you face a trade-off:
- better in-sample fit versus risk of overfitting;
- richer economic detail versus ease of interpretation;
- explanatory focus versus forecasting focus.
Adjusted R-squared, AIC, and BIC are designed to formalize this trade-off, allowing you to compare competing specifications in a disciplined way that will be tested on the CFA Level 2 exam.
Key Term: parsimony
Parsimony refers to the principle of favoring simpler models that achieve similar explanatory power or forecasting performance, avoiding unnecessary variables.
Model Selection in Multiple Regression
Regression model evaluation goes beyond assessing in-sample fit. Careful model selection accounts for:
- the explanatory power of additional variables;
- the risk of overfitting;
- the goal of forecasting versus explanation.
Adjusted R-squared
Standard R-squared always increases or remains constant as explanatory variables are added, even if those variables are irrelevant. This makes $R^2$ unreliable for comparing models with different numbers of predictors.
Adjusted R-squared introduces a penalty for including additional variables, so it may rise or fall when new predictors are added. If a new variable improves the model enough to offset the penalty, adjusted R-squared increases; otherwise, it decreases.
Formula and intuition
For a multiple regression estimated with ordinary least squares on $n$ observations with $k$ independent variables, adjusted R-squared is:
$$\bar{R}^2 = 1 - \left(\frac{n-1}{n-k-1}\right)\left(1 - R^2\right)$$
where:
- $n$ = number of observations,
- $k$ = number of independent variables,
- $R^2$ = ordinary R-squared,
- $\bar{R}^2$ = adjusted R-squared.
Because $(n-1)/(n-k-1)$ is greater than 1 whenever $k \geq 1$, the term multiplying $(1 - R^2)$ is larger than 1, so $\bar{R}^2$ is always less than or equal to $R^2$.
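To make the penalty concrete, here is a minimal Python sketch of the formula above; the function name and example inputs are illustrative rather than taken from any particular library or from the curriculum.

```python
# Minimal sketch of the adjusted R-squared formula; the function name and
# example inputs are illustrative, not from any specific library.
def adjusted_r_squared(r_squared: float, n: int, k: int) -> float:
    """Adjusted R-squared for n observations and k independent variables."""
    return 1 - (n - 1) / (n - k - 1) * (1 - r_squared)

# Example: R-squared of 0.63 from 60 observations and 5 predictors.
print(round(adjusted_r_squared(0.63, 60, 5), 3))  # approximately 0.596
```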
Key implications:
- adding a variable never decreases $R^2$ (and usually increases it), but may increase or decrease $\bar{R}^2$;
- if the new variable has very low explanatory power (a $t$-statistic below 1 in absolute value), $\bar{R}^2$ typically falls;
- $\bar{R}^2$ allows you to compare models with different numbers of predictors ($k$) on a more level playing field.
Key Term: sum of squared errors (SSE)
SSE is the sum over all observations of the squared residuals from the regression; it measures the unexplained variation in the dependent variable.
Adjusted R-squared is closely related to SSE: for the same dependent variable and sample, models with lower SSE and similar $k$ tend to have higher $R^2$ and higher $\bar{R}^2$.
Exam Warning
Relying on unadjusted R-squared alone can lead to selecting overly complex models with poor out-of-sample performance. Use adjusted R-squared to compare models with a different number of predictors, but remember that it is still an in-sample statistic and does not guarantee good forecasting performance.
Adjusted R-squared also does not:
- tell you whether the overall model is statistically significant (that requires an F-test);
- guarantee that individual coefficients are significant or economically meaningful;
- correct for model misspecification (omitted variables, incorrect functional form, etc.).
Worked Example 1.1
An analyst runs a regression of monthly stock returns on five independent variables over 60 months. The total variation in returns (total sum of squares) is 460, and the unexplained variation (sum of squared errors, SSE) is 170.
- Compute $R^2$ and adjusted $R^2$.
- The analyst then adds four more variables. $R^2$ rises to 0.65, but with 60 observations and 9 independent variables, the new adjusted $R^2$ falls. Which model should be preferred?
Answer:
- First compute $R^2$:
$$R^2 = 1 - \frac{\text{SSE}}{\text{SST}} = 1 - \frac{170}{460} = 0.6304$$
Now compute adjusted $R^2$ with $n = 60$ and $k = 5$:
$$\bar{R}^2 = 1 - \left(\frac{60-1}{60-5-1}\right)(1 - 0.6304) = 1 - (1.0926)(0.3696) = 0.596$$
So the original model has $R^2 \approx 0.63$ and adjusted $R^2 \approx 0.596$.
- After adding four more regressors, $R^2$ increases to 0.65. However, with $k = 9$, adjusted $R^2$ becomes:
$$\bar{R}^2 = 1 - \left(\frac{60-1}{60-9-1}\right)(1 - 0.65) = 1 - (1.18)(0.35) = 0.587$$
Adjusted $R^2$ falls from about 59.6% to 58.7%, indicating that the extra variables do not improve fit enough to justify their complexity. The analyst should generally prefer the first, more parsimonious model with five predictors.
This type of calculation is directly testable on the exam, including choosing between models based on adjusted $R^2$ when $R^2$ alone would be misleading.
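For candidates who want to check the arithmetic, the short sketch below reproduces the Worked Example 1.1 figures from the formula; the variable names are illustrative.

```python
# Illustrative check of Worked Example 1.1 (variable names are not from the source).
sst, sse, n = 460.0, 170.0, 60

r2 = 1 - sse / sst                                   # ordinary R-squared, about 0.63
adj_r2_k5 = 1 - (n - 1) / (n - 5 - 1) * (1 - r2)     # k = 5, about 0.596
adj_r2_k9 = 1 - (n - 1) / (n - 9 - 1) * (1 - 0.65)   # k = 9 with R-squared 0.65, about 0.587

print(round(r2, 3), round(adj_r2_k5, 3), round(adj_r2_k9, 3))
```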
Using F-tests with nested models
Adjusted $R^2$ provides a heuristic penalty for complexity, but you may also be asked to compare nested models—where the larger (unrestricted) model contains all the variables in the smaller (restricted) model plus some extras—using a formal F-test.
Key Term: nested models
Nested models are a pair of regression specifications where the restricted model’s regressors are a strict subset of the unrestricted model’s regressors.
Key Term: F-test
An F-test compares two nested models by testing whether a group of additional coefficients in the larger model is jointly equal to zero.
For an unrestricted model with $k$ independent variables and sum of squared errors $\text{SSE}_U$, and a restricted model that excludes $q$ of those variables with sum of squared errors $\text{SSE}_R$, the F-statistic is:
$$F = \frac{(\text{SSE}_R - \text{SSE}_U)/q}{\text{SSE}_U/(n - k - 1)}$$
with $q$ and $n - k - 1$ degrees of freedom.
- If $F$ exceeds the critical value, the additional variables are jointly significant and the unrestricted model is statistically preferred.
- If not, the evidence does not support adding the extra variables.
In exam vignettes, F-tests and adjusted $R^2$, AIC, and BIC often tell a consistent story: when adding variables meaningfully improves the model, both the F-test and the information criteria will typically favor the richer specification.
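A minimal sketch of this joint test is shown below, assuming SciPy is available for the F distribution; the SSE inputs are made up purely for illustration.

```python
# Minimal sketch of the joint F-test on nested models. SciPy is assumed to be
# installed; the SSE figures below are made up for illustration only.
from scipy.stats import f


def nested_f_test(sse_r: float, sse_u: float, q: int, n: int, k: int):
    """F-statistic and p-value for dropping q regressors from a model with k regressors."""
    f_stat = ((sse_r - sse_u) / q) / (sse_u / (n - k - 1))
    p_value = 1 - f.cdf(f_stat, q, n - k - 1)
    return f_stat, p_value


# Hypothetical example: test whether 2 extra regressors (k = 5 total, n = 120) are jointly useful.
print(nested_f_test(sse_r=185.0, sse_u=170.0, q=2, n=120, k=5))
```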
Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC)
Both AIC and BIC evaluate models using a trade-off between goodness-of-fit and the number of estimated parameters. Lower values reflect a model that better balances fit and complexity.
For OLS regressions, both criteria can be expressed in terms of SSE:
$$\text{AIC} = n \ln\!\left(\frac{\text{SSE}}{n}\right) + 2(k+1)$$
$$\text{BIC} = n \ln\!\left(\frac{\text{SSE}}{n}\right) + \ln(n)\,(k+1)$$
where:
- $n$ = number of observations,
- $k$ = number of independent variables (slope coefficients).
The first term, $n \ln(\text{SSE}/n)$, rewards models with lower SSE (better fit). The second term is the complexity penalty, which increases with the number of estimated parameters, $k + 1$.
Because $\ln(n)$ exceeds 2 for any sample larger than about 7 observations, BIC imposes a larger penalty for additional variables than AIC.
Key Term: information criteria
Information criteria such as AIC and BIC are scalar measures combining an error term based on SSE with a penalty for the number of parameters, used to rank competing model specifications; lower values are preferred.
Practical interpretation:
- for two models estimated on the same dependent variable and sample, the one with the lower AIC is expected to forecast better (lower expected out-of-sample error);
- the one with the lower BIC is often treated as having better overall goodness-of-fit, emphasizing parsimony and the likelihood of approximating the “true” model.
AIC is most appropriate when the primary goal is accurate prediction. BIC is preferred when the focus is on identifying the most likely true model or a compact set of core drivers.
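The small sketch below applies the SSE-based formulas above; the function names and inputs are illustrative, chosen only to show how the two penalty terms differ.

```python
# Sketch of the SSE-based AIC and BIC formulas above (illustrative names and inputs).
import math


def aic(sse: float, n: int, k: int) -> float:
    return n * math.log(sse / n) + 2 * (k + 1)


def bic(sse: float, n: int, k: int) -> float:
    return n * math.log(sse / n) + math.log(n) * (k + 1)


# Each extra parameter costs 2 points under AIC but ln(n) points under BIC,
# so for samples of 8 or more observations the BIC penalty is the heavier one.
print(round(aic(170.0, 60, 5), 1), round(bic(170.0, 60, 5), 1))
```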
Worked Example 1.2
An analyst estimates three alternative regressions to explain rental price per square foot for office properties using 191 observations:
| Model | Number of predictors ($k$) | SSE | Adjusted $R^2$ | AIC | BIC |
|---|---|---|---|---|---|
| Age | 1 | 32,627.3 | 8.75% | 985.9 | 992.4 |
| Age + Distance to CBD | 2 | 15,000.2 | 57.8% | 839.4 | 849.2 |
| Age + Distance + Restaurants | 3 | 13,550.6 | 61.7% | 822.0 | 835.0 |
a) Which model is most appropriate for generating forecasts?
b) Which model offers the best goodness-of-fit while controlling for overfitting?
Answer:
a) AIC is the preferred criterion when the goal is forecasting. The model with Age, Distance, and Restaurants has the lowest AIC (822.0), so it is most appropriate for generating forecasts of rental rates.
b) BIC is more conservative and emphasizes parsimony and goodness-of-fit. The three-variable model also has the lowest BIC (835.0) and the highest adjusted $R^2$. Therefore, it offers the best overall fit when penalizing unnecessary complexity. On both criteria, the three-variable model is preferred.
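As a check on the table, the sketch below recomputes AIC and BIC from the reported SSEs and $n = 191$ using the formulas given earlier; the output rounds to the published values.

```python
# Recomputing the AIC and BIC columns of Worked Example 1.2 from the reported SSEs.
import math

n = 191
models = [
    ("Age", 1, 32_627.3),
    ("Age + Distance", 2, 15_000.2),
    ("Age + Distance + Restaurants", 3, 13_550.6),
]

for name, k, sse in models:
    fit = n * math.log(sse / n)
    print(name, round(fit + 2 * (k + 1), 1), round(fit + math.log(n) * (k + 1), 1))
# Prints approximately 985.9 / 992.4, 839.4 / 849.2, and 822.0 / 835.0, matching the table;
# the three-variable model has the lowest value on both criteria.
```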
Worked Example 1.3
Suppose an analyst tests two regression models to explain quarterly stock returns:
- Model 1 uses 2 predictors, with adjusted $R^2$ = 0.55, AIC = 120, BIC = 130.
- Model 2 uses 6 predictors, with adjusted $R^2$ = 0.56, AIC = 116, BIC = 140.
Which model should be preferred for:
a) forecasting future returns; and
b) identifying core drivers of returns?
Answer:
a) For forecasting, AIC is the more relevant criterion. Model 2 has a lower AIC (116 versus 120), so it is expected to deliver better out-of-sample predictive accuracy. Despite being more complex, it appears to reduce expected forecasting error.
b) For identifying core drivers, BIC and parsimony are more important. Model 1 has a much lower BIC (130 versus 140) and almost the same adjusted $R^2$ (0.55 versus 0.56). The extra four predictors in Model 2 add complexity without a commensurate improvement in explanatory power. Model 1 is therefore preferred for explanation.
This worked example illustrates a critical exam point: you must align the choice of selection criterion with the stated objective (prediction versus explanation).
Diagnosing overfitting and out-of-sample performance
Overfitting is a central concern in multiple regression, especially when you have many potential predictors relative to the sample size.
Key signals of overfitting in an exam vignette include:
- a very high $R^2$ and adjusted $R^2$ but poor out-of-sample performance or high forecast errors;
- many variables with insignificant $t$-statistics;
- adjusted $R^2$, AIC, and BIC all worsening when new variables are added, even as $R^2$ increases slightly;
- no strong economic rationale for some of the regressors.
From a model selection standpoint:
- if an additional variable reduces AIC and BIC and increases adjusted $R^2$, it is a candidate to keep, subject to economic plausibility;
- if it increases AIC and BIC and reduces adjusted $R^2$, it is likely capturing noise and should generally be excluded;
- if the changes are small or mixed, you must interpret them in context and consider the analysis objective.
In time-series settings (covered elsewhere in the curriculum), comparing out-of-sample forecast errors such as root mean squared error (RMSE) is another standard way to choose among competing models. AIC and BIC are shortcuts that approximate this trade-off using only in-sample information.
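The hedged sketch below illustrates such an out-of-sample comparison: fit competing specifications on a training window and compare RMSE on a holdout window. The data are simulated and the helper functions are hypothetical; NumPy is assumed to be available. This is only an illustration of the train/holdout idea, not a curriculum-mandated procedure.

```python
# Illustrative out-of-sample check: fit two competing OLS models on a training
# window and compare RMSE on a holdout window. Data are simulated.
import numpy as np


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def ols_fit_predict(X_train, y_train, X_test):
    """Fit OLS (with intercept) by least squares and predict on the test set."""
    Xtr = np.column_stack([np.ones(len(X_train)), X_train])
    Xte = np.column_stack([np.ones(len(X_test)), X_test])
    beta, *_ = np.linalg.lstsq(Xtr, y_train, rcond=None)
    return Xte @ beta


# Simulated data: returns driven by the first factor only; the other two columns are noise.
rng = np.random.default_rng(0)
factors = rng.normal(size=(120, 3))
returns = 0.8 * factors[:, 0] + rng.normal(scale=0.5, size=120)

train, test = slice(0, 96), slice(96, 120)
pred_small = ols_fit_predict(factors[train, :1], returns[train], factors[test, :1])
pred_large = ols_fit_predict(factors[train], returns[train], factors[test])
print(rmse(returns[test], pred_small), rmse(returns[test], pred_large))
```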
Comparing R-squared, Adjusted R-squared, AIC, and BIC
| Statistic | Increases with more predictors? | Penalizes complexity? | Best for |
|---|---|---|---|
| R-squared | Yes | No | In-sample fit |
| Adjusted R-squared | Not always | Yes | Model comparison (same goal) |
| AIC | Not always | Yes (less severe) | Forecasting |
| BIC | Not always | Yes (more severe) | Parsimony/goodness of fit |
A few practical exam-oriented points:
- Direction of preference: Lower AIC and BIC values are better. Candidates sometimes incorrectly assume “higher is better”, carrying over the intuition from $R^2$.
- Magnitude of differences: Large differences in AIC/BIC (e.g., 10 or more) provide stronger evidence in favor of the lower-value model. Very small differences may not be decisive.
- Consistency with adjusted $R^2$ and F-tests: In well-constructed item sets, when a richer model is truly better, you will typically see higher adjusted $R^2$, lower AIC and BIC, and a significant F-test on the additional regressors.
Key Term: goodness of fit
Goodness of fit refers to how well a regression model’s predicted values match the observed data, typically summarized by statistics such as $R^2$, adjusted $R^2$, AIC, BIC, and overall F-tests.
Always read the vignette carefully to identify the main objective—forecasting or explanation—before deciding which model selection statistic to prioritize in your answer.
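As a study aid, the toy sketch below encodes the direction-of-preference rules from the comparison table: pick the lowest AIC when forecasting and the lowest BIC when explaining. The model names and criterion values are hypothetical.

```python
# Toy sketch of the selection rule: lowest AIC for forecasting, lowest BIC for explanation.
# Model names and criterion values are hypothetical.
models = {
    "Model X": {"aic": 210.0, "bic": 222.0},
    "Model Y": {"aic": 205.0, "bic": 230.0},
}


def preferred(models: dict, objective: str) -> str:
    criterion = "aic" if objective == "forecasting" else "bic"
    return min(models, key=lambda name: models[name][criterion])


print(preferred(models, "forecasting"))  # Model Y (lowest AIC)
print(preferred(models, "explanation"))  # Model X (lowest BIC)
```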
Summary
Effective regression model selection for the CFA exam means balancing explanatory power against the risk of an overfitted, complex model. Adjusted R-squared provides a quick means to penalize extra predictors, while AIC and BIC offer more refined criteria—AIC for predictive accuracy and BIC for identifying core variables under a parsimony principle. F-tests for nested models give an additional formal way to assess whether groups of added regressors are jointly useful.
Always relate your model choice to the specific purpose:
- for prediction, favor models with lower AIC and good adjusted $R^2$;
- for explanation and identifying key drivers, favor models with lower BIC and a strong economic rationale, even if adjusted $R^2$ is slightly lower.
Lower AIC or BIC values are better, but the magnitude of the difference and the purpose of the analysis are key. Avoid common pitfalls such as blindly choosing the model with the highest adjusted $R^2$ or misinterpreting the direction in which AIC/BIC should move.
Key Point Checklist
This article has covered the following key knowledge points:
- adjusted R-squared penalizes unnecessary model complexity and allows fairer comparison of models with different numbers of predictors;
- AIC and BIC both support model selection by balancing goodness-of-fit and complexity; lower values indicate preferred models;
- AIC is typically used when the objective is predictive accuracy, while BIC is favored when identifying a simpler, core-driver model is the priority;
- F-tests on nested models provide a formal way to test whether groups of additional regressors are jointly significant;
- model selection should always relate to the analysis goal—prediction versus explanation—and should be grounded in economic reasoning;
- relying only on in-sample $R^2$ may result in models that overfit and perform poorly out-of-sample, especially when many variables are included.
Key Terms and Concepts
- R-squared
- adjusted R-squared
- Akaike Information Criterion (AIC)
- Bayesian Information Criterion (BIC)
- overfitting
- parsimony
- sum of squared errors (SSE)
- nested models
- F-test
- information criteria
- goodness of fit