使用mgcv的惩罚样条,我希望在示例数据中获得10 /年的有效自由度(EDF)(整个期间为60).
library(mgcv) library(dlnm) df <- chicagoNMMAPS df1<-subset(df, as.Date(date) >= '1995-01-01') mod1 <-gam(resp ~ s(time,bs='cr',k=6*15, fx=F)+ s(temp,k=6, bs='cr') + as.factor(dow) ,family=quasipoisson,na.action=na.omit,data=df1)
在示例数据中,由edf测量的时间的基本维度为56.117,小于每年10.
summary(mod1) Approximate significance of smooth terms: edf Ref.df F p-value s(time) 56.117 67.187 5.369 <2e-16 *** s(temp) 2.564 3.204 0.998 0.393 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 R-sq.(adj) = 0.277 Deviance explained = 28.2% GCV score = 1.1297 Scale est. = 1.0959 n = 2192
手动我将通过提供如下的平滑参数来更改edf a
mod1$sp s(time) s(temp) 23.84809 17.23785
然后我将sp输出插入一个新模型并重新运行它.基本上我会继续改变sp,直到我获得大约60的edf.我将只改变平滑参数的时间.
我将从较低的值开始并检查edf:
mod1a <-gam(resp ~ s(time,bs='cr',k=6*15, fx=F)+ s(temp,k=6, bs='cr') + as.factor(dow) ,family=quasipoisson,na.action=na.omit,data=df1, sp= c(12.84809, 17.23785 )) summary(mod1a) # edf 62.997
我必须增加平滑参数的时间,以将edf降低到60左右.
mod1b <-gam(resp ~ s(time,bs='cr',k=6*15, fx=F)+ s(temp,k=6, bs='cr') + as.factor(dow) ,family=quasipoisson,na.action=na.omit,data=df1, sp= c(14.84809, 17.23785 )) summary(mod1b) edf 61.393 ## EDF still large, thus I have to increase the sp` mod1c <-gam(resp ~ s(time,bs='cr',k=6*15, fx=F)+ s(temp,k=6, bs='cr') + as.factor(dow) ,family=quasipoisson,na.action=na.omit,data=df1, sp=c(16.8190989, 17.23785)) summary(mod1c) edf= 60.005 ## This is what I want to obtain as a final model.
如何通过高效的代码实现最终结果?
我不了解您的模型的细节,但如果您希望最小化(或最大化)edf
适合不同的模型sp
,optim
将完成这项工作.首先,创建一个只返回edf
给定不同值的函数sp
.
edf.by.sp<-function(sp) { model <-gam(resp ~ s(time,bs='cr',k=6*15, fx=F)+ s(temp,k=6, bs='cr') + as.factor(dow), family=quasipoisson, na.action=na.omit, data=df1, sp= c(sp, 17.23785) # Not sure if this quite right. ) abs(summary(model)$s.table['s(time)','edf']-60) # Subtract 60 and flip sign so 60 is lowest. }
现在,您可以运行optim
以最小化edf
:
# You could pick any reasonable starting sp value. # Many optimization methods are available, but in your case # they work equally well. best<-optim(12,edf.by.sp,method='BFGS')$par best # 16.82708
并且,在插入函数时,你会得到接近0(转换前正好是60):
edf.by.sp(best) # 2.229869e-06