How to set the parameters of smooth.spline
Posted by pythonic生物人
smoothing parameter value too small
Calls: predict -> smooth.spline
Execution halted
The error above appears when the absolute values of y passed to the function below are too small (too close to 0).
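One possible workaround, which is not from the original post, is to rescale y before fitting and to scale the predictions back afterwards. The sketch below assumes a hypothetical helper named `fit_rescaled`. It only illustrates the idea and is not guaranteed to remove the error for every data set.

```r
# Hypothetical helper (not part of stats): fit on rescaled responses and
# return a prediction function that works on the original scale.
fit_rescaled <- function(x, y, ...) {
  k <- max(abs(y))                      # assumption: y is not identically zero
  fit <- smooth.spline(x, y / k, ...)   # responses are now of order 1
  function(xnew) predict(fit, xnew)$y * k
}

set.seed(1)
x <- seq(0, 1, length.out = 50)
y <- 1e-3 * sin(2 * pi * x) + rnorm(50, sd = 1e-4)

pred   <- fit_rescaled(x, y, spar = 0.5)
direct <- smooth.spline(x, y, spar = 0.5)

# For a fixed spar the fit is linear in y, so both routes should agree.
all.equal(pred(x), predict(direct, x)$y, tolerance = 1e-6)
```

With a fixed `spar` the rescaling changes nothing except the scale of the numbers involved. Whether it helps when the smoothing parameter is chosen automatically depends on the data.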
Usage
smooth.spline(x, y = NULL, w = NULL, df, spar = NULL, lambda = NULL, cv = FALSE, all.knots = FALSE, nknots = .nknots.smspl, keep.data = TRUE, df.offset = 0, penalty = 1, control.spar = list(), tol = 1e-6 * IQR(x), keep.stuff = FALSE)
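As a minimal sketch of the call signature above, the following fits a spline to the built-in `cars` data with all defaults and evaluates it on a grid. The names `grid` and `curve_xy` are illustrative only.

```r
# Fit with the default (GCV) choice of the smoothing parameter.
fit <- smooth.spline(cars$speed, cars$dist)
print(fit)                       # prints the call and the smoothing summary

# Evaluate the fitted curve on a grid of new x values.
grid <- seq(min(cars$speed), max(cars$speed), length.out = 100)
curve_xy <- predict(fit, grid)   # list with components x and y
```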
Arguments
x
a vector giving the values of the predictor variable, or a list or a two-column matrix specifying x and y.
y
responses. If y is missing or NULL, the responses are assumed to be specified by x, with x the index vector.
w
optional vector of weights of the same length as x; defaults to all 1.
df
the desired equivalent number of degrees of freedom (trace of the smoother matrix). Must be in \((1,n_x]\), \(n_x\) the number of unique x values, see below.
spar
smoothing parameter, typically (but not necessarily) in \((0,1]\). When spar is specified, the coefficient \(\lambda\) of the integral of the squared second derivative in the fit (penalized log likelihood) criterion is a monotone function of spar, see the details below. Alternatively lambda may be specified instead of the scale free spar = \(s\).
lambda
if desired, the internal (design-dependent) smoothing parameter \(\lambda\) can be specified instead of spar. This may be desirable for resampling algorithms such as cross validation or the bootstrap.
cv
ordinary leave-one-out (TRUE) or ‘generalized’ cross-validation (GCV) when FALSE; is used for smoothing parameter computation only when both spar and df are not specified; it is used however to determine cv.crit in the result. Setting it to NA for speedup skips the evaluation of leverages and any score.
all.knots
if TRUE, all distinct points in x are used as knots. If FALSE (default), a subset of x[] is used, specifically x[j] where the nknots indices are evenly spaced in 1:n, see also the next argument nknots.
Alternatively, a strictly increasing numeric vector specifying “all the knots” to be used; must be rescaled to \([0, 1]\) already such that it corresponds to the ans$fit$knot sequence returned, not repeating the boundary knots.
nknots
integer or function giving the number of knots to use when all.knots = FALSE. If a function (as by default), the number of knots is nknots(nx). By default for \(n_x > 49\) this is less than \(n_x\), the number of unique x values, see the Note.
keep.data
logical specifying if the input data should be kept in the result. If TRUE (as per default), fitted values and residuals are available from the result.
df.offset
allows the degrees of freedom to be increased by df.offset in the GCV criterion.
penalty
the coefficient of the penalty for degrees of freedom in the GCV criterion.
control.spar
optional list with named components controlling the root finding when the smoothing parameter spar is computed, i.e., missing or NULL, see below.
Note that this is partly experimental and may change with general spar computation improvements!
low:
lower bound for spar; defaults to -1.5 (used to implicitly default to 0 in R versions earlier than 1.4).
high:
upper bound for spar; defaults to +1.5.
tol:
the absolute precision (tolerance) used; defaults to 1e-4 (formerly 1e-3).
eps:
the relative precision used; defaults to 2e-8 (formerly 0.00244).
trace:
logical indicating if iterations should be traced.
maxit:
integer giving the maximal number of iterations; defaults to 500.
Note that spar is only searched for in the interval \([low, high]\).
tol
a tolerance for same-ness or uniqueness of the x values. The values are binned into bins of size tol and values which fall into the same bin are regarded as the same. Must be strictly positive (and finite).
keep.stuff
an experimental logical indicating if the result should keep extras from the internal computations. Should allow to reconstruct the \(X\) matrix and more.
Value
An object of class "smooth.spline" with components
x
the distinct x values in increasing order, see the ‘Details’ below.
y
the fitted values corresponding to x.
w
the weights used at the unique values of x.
yin
the y values used at the unique x values.
tol
the tol argument (whose default depends on x).
data
only if keep.data = TRUE: itself a list with components x, y and w of the same length. These are the original \((x_i,y_i,w_i), i = 1, \dots, n\), values where data$x may have repeated values and hence be longer than the above x component; see details.
lev
(when cv was not NA) leverages, the diagonal values of the smoother matrix.
cv.crit
cross-validation score, ‘generalized’ or true, depending on cv. The CV score is often called “PRESS” (and labeled on print()), for ‘PREdiction Sum of Squares’.
pen.crit
the penalized criterion, a non-negative number; simply the (weighted) residual sum of squares (RSS), sum(.$w * residuals(.)^2).
crit
the criterion value minimized in the underlying .Fortran routine sslvrg. When df has been specified, the criterion is \(3 + (tr(S_\lambda) - df)^2\), where the \(3 +\) is there for numerical (and historical) reasons.
df
equivalent degrees of freedom used. Note that (currently) this value may become quite imprecise when the true df is between 1 and 2.
spar
the value of spar computed or given, unless it has been given as c(lambda = *), when it is set to NA here.
ratio
(when spar above is not NA), the value \(r\), the ratio of two matrix traces.
lambda
the value of \(\lambda\) corresponding to spar, see the details below.
iparms
named integer(3) vector where ..$iparms["iter"] gives number of spar computing iterations used.
auxMat
experimental; when keep.stuff was true, a “flat” numeric vector containing parts of the internal computations.
fit
list for use by predict.smooth.spline, with components
knot:
the knot sequence (including the repeated boundary knots), scaled into \([0, 1]\) (via min and range).
nk:
number of coefficients or number of ‘proper’ knots plus 2.
coef:
coefficients for the spline basis used.
min, range:
numbers giving the corresponding quantities of x.
call
the matched call.
methods(class = "smooth.spline") shows a hatvalues() method based on the lev vector above.
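To make the returned components more concrete, here is a small sketch that fits the built-in `cars` data in three ways (via `df`, via `spar`, and with neither) and then inspects a few of the components listed above. The object names are illustrative, and the specific values `df = 5` and `spar = 0.8` are arbitrary choices.

```r
x <- cars$speed
y <- cars$dist

fit_df   <- smooth.spline(x, y, df = 5)      # smoothing set via df
fit_spar <- smooth.spline(x, y, spar = 0.8)  # smoothing set via spar
fit_gcv  <- smooth.spline(x, y)              # neither given: GCV picks lambda

# Smoothing-related components of each result, one column per fit.
sapply(list(df = fit_df, spar = fit_spar, gcv = fit_gcv),
       function(f) c(df = f$df, spar = f$spar, lambda = f$lambda))

# x holds the distinct values; data$x keeps the original observations.
length(fit_gcv$x)
length(fit_gcv$data$x)

# hatvalues() returns the leverages stored in lev.
all.equal(hatvalues(fit_gcv), fit_gcv$lev)
```

Because `cars$speed` contains repeated values, the `x` component is shorter than `data$x`.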
Details
Neither x nor y are allowed to contain missing or infinite values.
The x vector should contain at least four distinct values. ‘Distinct’ here is controlled by tol: values which are regarded as the same are replaced by the first of their values and the corresponding y and w are pooled accordingly.
Unless lambda has been specified instead of spar, the computational \(\lambda\) used (as a function of \(s=spar\)) is \(\lambda = r * 256^{3 s - 1}\) where \(r = tr(X' W X) / tr(\Sigma)\), \(\Sigma\) is the matrix given by \(\Sigma_{ij} = \int B_i''(t) B_j''(t) dt\), \(X\) is given by \(X_{ij} = B_j(x_i)\), \(W\) is the diagonal matrix of weights (scaled such that its trace is \(n\), the original number of observations) and \(B_k(.)\) is the \(k\)-th B-spline.
Note that with these definitions, \(f_i = f(x_i)\), and the B-spline basis representation \(f = X c\) (i.e., \(c\) is the vector of spline coefficients), the penalized log likelihood is \(L = (y - f)' W (y - f) + \lambda c' \Sigma c\), and hence \(c\) is the solution of the (ridge regression) \((X' W X + \lambda \Sigma) c = X' W y\).
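The relation between spar and \(\lambda\) can be checked numerically from the returned components. The sketch below assumes that the `ratio` component holds \(r\), as described under Value. The data set and the value `spar = 0.7` are arbitrary choices.

```r
fit <- smooth.spline(cars$speed, cars$dist, spar = 0.7)

# lambda = r * 256^(3 * spar - 1), with r taken from the ratio component.
lambda_from_spar <- fit$ratio * 256^(3 * fit$spar - 1)
all.equal(lambda_from_spar, fit$lambda)

# Solving the relation for spar gives a slope of 1 / (3 * log(256))
# in log(lambda), which is roughly 0.0601.
1 / (3 * log(256))
```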
If spar and lambda are missing or NULL, the value of df is used to determine the degree of smoothing. If df is missing as well, leave-one-out cross-validation (ordinary or ‘generalized’ as determined by cv) is used to determine \(\lambda\).
Note that from the above relation, spar is \(s = s0 + 0.0601 * \log\lambda\), which is intentionally different from the S-PLUS implementation of smooth.spline (where spar is proportional to \(\lambda\)). In R's (\(\log \lambda\)) scale, it makes more sense to vary spar linearly.
Note however that currently the results may become very unreliable for spar values smaller than about -1 or -2. The same may happen for values larger than 2 or so. Don't think of setting spar or the controls low and high outside such a safe range, unless you know what you are doing! Similarly, specifying lambda instead of spar is delicate, notably as the range of “safe” values for lambda is not scale-invariant and hence entirely data dependent.
The ‘generalized’ cross-validation method GCV will work correctly when there are duplicated points in x. However, it is ambiguous what leave-one-out cross-validation means with duplicated points, and the internal code uses an approximation that involves leaving out groups of duplicated points. cv = TRUE is best avoided in that case.
References
Chambers, J. M. and Hastie, T. J. (1992) Statistical Models in S, Wadsworth & Brooks/Cole.
Green, P. J. and Silverman, B. W. (1994) Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach. Chapman and Hall.
Hastie, T. J. and Tibshirani, R. J. (1990) Generalized Additive Models. Chapman and Hall.
See Also
predict.smooth.spline for evaluating the spline and its derivatives.
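A short sketch of evaluating the spline and its first derivative with `predict`, again on the built-in `cars` data. The grid, the step `h`, and the finite-difference comparison are illustrative additions used only as a rough sanity check.

```r
fit  <- smooth.spline(cars$speed, cars$dist, df = 5)
grid <- seq(5, 24, by = 0.5)

f0 <- predict(fit, grid)             # fitted curve
f1 <- predict(fit, grid, deriv = 1)  # first derivative of the fitted curve

# Rough check: a central finite difference should be close to deriv = 1.
h  <- 1e-4
fd <- (predict(fit, grid + h)$y - predict(fit, grid - h)$y) / (2 * h)
max(abs(fd - f1$y))
```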
REF
https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/smooth.spline