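Differences between Stata and SPSS: which model to choose by the AIC and BIC criteria

In regression analysis, fewer explanatory variables are generally preferable: a more parsimonious model applies more widely and is easier to interpret. On the other hand, a more complex model fits the sample better: adding an explanatory variable never lowers $R^2$. AIC and BIC are criteria for weighing fit against complexity.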
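For reference, the standard definitions of the two criteria are given here. The symbols are introduced only for this note: $\ln L$ is the maximized log likelihood of the model, $k$ the number of estimated parameters, and $N$ the number of observations. For both criteria, a smaller value indicates the preferred model.

$$
\mathrm{AIC} = -2\ln L + 2k, \qquad \mathrm{BIC} = -2\ln L + k\ln N
$$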

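Data

This tutorial uses the same data as the previous one: icecream.dta. For details, see the previous article: stata教程05-自相关验证和处理/ (Stata tutorial 05: testing for and handling autocorrelation).

use data/icecream.dta, clear

Running the regression

reg consumption temp price income

Output: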

Source | SS df MS Number of obs = 30

-------------+---------------------------------- F(3, 26) = 22.17

Model | .090250523 3 .030083508 Prob > F = 0.0000

Residual | .035272835 26 .001356647 R-squared = 0.7190

-------------+---------------------------------- Adj R-squared = 0.6866

Total | .125523358 29 .004328392 Root MSE = .03683

------------------------------------------------------------------------------

consumption | Coef. Std. Err. t P>|t| [95% Conf. Interval]

-------------+----------------------------------------------------------------

temp | .0034584 .0004455 7.76 0.000 .0025426 .0043743

price | -1.044413 .834357 -1.25 0.222 -2.759458 .6706322

income | .0033078 .0011714 2.82 0.009 .0008999 .0057156

_cons | .1973149 .2702161 0.73 0.472 -.3581223 .752752

------------------------------------------------------------------------------
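As a sanity check on the header of this output, the summary statistics can be recomputed from the sums of squares and degrees of freedom in the ANOVA block. The following is a minimal Python sketch, not part of the original notebook; the variable names are chosen here for illustration and the inputs are simply the printed values.

```python
# Values copied from the ANOVA block of the regression output above.
ss_model, df_model = 0.090250523, 3
ss_resid, df_resid = 0.035272835, 26
ss_total, df_total = 0.125523358, 29

# R-squared: share of the total sum of squares explained by the model.
r2 = 1 - ss_resid / ss_total

# Adjusted R-squared: the same ratio, using mean squares (SS / df),
# so that extra regressors are penalized through the lost degrees of freedom.
adj_r2 = 1 - (ss_resid / df_resid) / (ss_total / df_total)

# F statistic: model mean square over residual mean square.
f_stat = (ss_model / df_model) / (ss_resid / df_resid)

# Root MSE: square root of the residual mean square.
root_mse = (ss_resid / df_resid) ** 0.5

print(f"R-squared     = {r2:.4f}")
print(f"Adj R-squared = {adj_r2:.4f}")
print(f"F(3, 26)      = {f_stat:.2f}")
print(f"Root MSE      = {root_mse:.5f}")
```

Unlike $R^2$, the adjusted $R^2$ can fall when a regressor is added, which is the same idea of penalizing complexity that AIC and BIC apply to the log likelihood.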

Computing AIC and BIC

estat ic

Output:

Akaike's information criterion and Bayesian information criterion

-----------------------------------------------------------------------------

Model | Obs ll(null) ll(model) df AIC BIC

-------------+---------------------------------------------------------------

. | 30 39.57876 58.61944 4 -109.2389 -103.6341

-----------------------------------------------------------------------------

Note: N=Obs used in calculating BIC; see [R] BIC note.
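To see where these numbers come from, the table can be reproduced by hand. The sketch below is in Python rather than Stata and is not part of the original notebook. It assumes the usual Gaussian log likelihood of a linear regression, $-\frac{N}{2}\left(\ln 2\pi + \ln(\mathrm{RSS}/N) + 1\right)$, and takes the df column (4, three slopes plus the constant) as the parameter count. The function names are hypothetical helpers introduced for this illustration.

```python
import math


def gaussian_loglik(rss: float, n: int) -> float:
    """Maximized log likelihood of a linear regression with normal errors."""
    return -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)


def aic(ll: float, k: int) -> float:
    """AIC = -2 * ll + 2 * k."""
    return -2 * ll + 2 * k


def bic(ll: float, k: int, n: int) -> float:
    """BIC = -2 * ll + k * ln(n)."""
    return -2 * ll + k * math.log(n)


# Inputs copied from the regression output: residual and total sums of squares.
n, k = 30, 4
rss, tss = 0.035272835, 0.125523358

ll_model = gaussian_loglik(rss, n)  # model with temp, price, income
ll_null = gaussian_loglik(tss, n)   # constant-only model
aic_value = aic(ll_model, k)
bic_value = bic(ll_model, k, n)

print(f"ll(null)  = {ll_null:.5f}")
print(f"ll(model) = {ll_model:.5f}")
print(f"AIC       = {aic_value:.4f}")
print(f"BIC       = {bic_value:.4f}")
```

Under these assumptions the results agree with the estat ic table to the printed precision, which suggests the df column is the value used as the parameter count in both formulas.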

Adding the first-order lag of temp

reg consumption temp L.temp price income

Output:

Source | SS df MS Number of obs = 29

-------------+---------------------------------- F(4, 24) = 28.98

Model | .103387183 4 .025846796 Prob > F = 0.0000

Residual | .021406049 24 .000891919 R-squared = 0.8285

-------------+---------------------------------- Adj R-squared = 0.7999

Total | .124793232 28 .004456901 Root MSE = .02987

------------------------------------------------------------------------------

consumption | Coef. Std. Err. t P>|t| [95% Conf. Interval]

-------------+----------------------------------------------------------------

temp |

--. | .0053321 .0006704 7.95 0.000 .0039484 .0067158

L1. | [row missing in source]

price | -.8383021 .6880205 -1.22 0.235 -2.258307 .5817025

income | .0028673 .0010533 2.72 0.012 .0006934 .0050413

_cons | .1894822 .2323169 0.82 0.423 -.2899963 .6689607

------------------------------------------------------------------------------

Recomputing AIC and BIC

estat ic

Output (stream):

Akaike's information criterion and Bayesian information criterion

-----------------------------------------------------------------------------

Model | Obs ll(null) ll(model) df AIC BIC

-------------+---------------------------------------------------------------

. | 29 37.85248 63.41576 5 -116.8315 -109.995

-----------------------------------------------------------------------------

Note: N=Obs used in calculating BIC; see [R] BIC note.

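Both AIC and BIC have decreased (AIC from -109.2389 to -116.8315, BIC from -103.6341 to -109.995). For both criteria, a lower value indicates the preferred model.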

Adding the second-order lag of temp as well

reg consumption temp L.temp L2.temp price income

Output (stream):

Source | SS df MS Number of obs = 28

-------------+---------------------------------- F(5, 22) = 21.92

Model | .103722201 5 .02074444 Prob > F = 0.0000

Residual | .020822754 22 .000946489 R-squared = 0.8328

-------------+---------------------------------- Adj R-squared = 0.7948

Total | .124544954 27 .004612776 Root MSE = .03077

------------------------------------------------------------------------------

consumption | Coef. Std. Err. t P>|t| [95% Conf. Interval]

-------------+----------------------------------------------------------------

temp |

--. | .0047858 .0013502 3.54 0.002 .0019856 .007586

L1. | -.0010836 .0022905 -0.47 0.641 -.0058338 .0036666

L2. | [row missing in source]

price | -.7326035 .7214324 -1.02 0.321 -2.228763 .7635558

income | .0026704 .0011308 2.36 0.027 .0003252 .0050156

_cons | .1883478 .23949 0.79 0.440 -.3083241 .6850196

------------------------------------------------------------------------------

Recomputing AIC and BIC

estat ic

Output (stream):

Akaike's information criterion and Bayesian information criterion

-----------------------------------------------------------------------------

Model | Obs ll(null) ll(model) df AIC BIC

-------------+---------------------------------------------------------------

. | 28 36.08382 61.12451 6 -110.249 -102.2558

-----------------------------------------------------------------------------

Note: N=Obs used in calculating BIC; see [R] BIC note.

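After adding the second lag (L2.temp), AIC and BIC both rise instead (AIC from -116.8315 to -110.249, BIC from -109.995 to -102.2558). The second lag makes the model more complex without a commensurate improvement in fit, so the extra term is not worth its cost.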
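The comparison across the three models can be summarized in a few lines. This is a Python sketch rather than Stata, and the model labels are chosen here for illustration; the inputs are the Obs, ll(model) and df columns of the three estat ic tables. One caveat worth stating: each lag drops an initial observation, so the three regressions use 30, 29 and 28 observations. Information criteria are strictly comparable only when computed on the same sample, so the ranking should be read as illustrative.

```python
import math

# (label, N, ll(model), df) copied from the three estat ic tables above.
models = [
    ("temp", 30, 58.61944, 4),
    ("temp + L.temp", 29, 63.41576, 5),
    ("temp + L.temp + L2.temp", 28, 61.12451, 6),
]


def aic(ll: float, k: int) -> float:
    """AIC = -2 * ll + 2 * k."""
    return -2 * ll + 2 * k


def bic(ll: float, k: int, n: int) -> float:
    """BIC = -2 * ll + k * ln(n)."""
    return -2 * ll + k * math.log(n)


# One (label, AIC, BIC) row per model.
rows = [(label, aic(ll, k), bic(ll, k, n)) for label, n, ll, k in models]

for label, a, b in rows:
    print(f"{label:<26} AIC = {a:10.4f}   BIC = {b:10.4f}")

# The preferred model under each criterion is the one with the smallest value.
best_aic = min(rows, key=lambda r: r[1])[0]
best_bic = min(rows, key=lambda r: r[2])[0]
print("Preferred by AIC:", best_aic)
print("Preferred by BIC:", best_bic)
```

With these inputs both criteria pick the specification with one lag of temp, matching the conclusion in the text.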

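Summary

AIC and BIC are two commonly used criteria for evaluating a model by weighing its fit against its complexity, and they differ slightly. BIC is consistent: as the sample grows, it selects the true model with probability approaching one, provided that model is among the candidates. AIC is not. Real samples are never infinitely large, however, and BIC may select a model that is too small, so in practice the two criteria are usually considered together.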
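The tendency of BIC to favor smaller models follows from the penalty terms: AIC charges 2 per parameter, BIC charges ln(N). A short Python sketch, with a helper name made up for this note, compares the two at the sample sizes used in this tutorial.

```python
import math


def penalties(n: int, k: int) -> tuple[float, float]:
    """Return the (AIC, BIC) complexity penalties for k parameters and n observations."""
    return 2.0 * k, k * math.log(n)


# Smallest sample size at which BIC's per-parameter penalty ln(N) exceeds AIC's 2.
threshold = next(n for n in range(1, 100) if math.log(n) > 2)
print("ln(N) > 2 from N =", threshold)

# Penalties for the three models estimated above: (N, df).
for n, k in [(30, 4), (29, 5), (28, 6)]:
    aic_pen, bic_pen = penalties(n, k)
    print(f"N = {n}, k = {k}: AIC penalty = {aic_pen:.1f}, BIC penalty = {bic_pen:.4f}")
```

At N around 30 each extra parameter costs roughly 3.4 under BIC against 2 under AIC, so an additional regressor has to improve the log likelihood by more before BIC accepts it.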

Note

This article was converted from a Jupyter notebook.

Questions can be sent by email to 675495787[at]qq.com.

Site: mlln.cn or jupyter.cn