Title: | Simplicial Generalized Beta Regression |
---|---|
Description: | Main properties and regression procedures using a generalization of the Dirichlet distribution called Simplicial Generalized Beta distribution. It is a new distribution on the simplex (i.e. on the space of compositions or positive vectors with sum of components equal to 1). The Dirichlet distribution can be constructed from a random vector of independent Gamma variables divided by their sum. The SGB follows the same construction with generalized Gamma instead of Gamma variables. The Dirichlet exponents are supplemented by an overall shape parameter and a vector of scales. The scale vector is itself a composition and can be modeled with auxiliary variables through a log-ratio transformation. Graf, M. (2017, ISBN: 978-84-947240-0-8). See also the vignette enclosed in the package. |
Authors: | Monique Graf |
Maintainer: | Monique Graf <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.0.1.1 |
Built: | 2024-11-01 04:08:55 UTC |
Source: | https://github.com/cran/SGB |
Package SGB contains a generalization of the Dirichlet distribution, called the Simplicial Generalized Beta (SGB). It is a new distribution on the simplex (i.e. on the space of compositions or positive vectors with sum of components equal to 1). The Dirichlet distribution can be constructed from a random vector of independent Gamma variables divided by their sum. The SGB follows the same construction with generalized Gamma instead of Gamma variables. The Dirichlet exponents are supplemented by an overall shape parameter and a vector of scales. The scale vector is itself a composition and can be modeled with auxiliary variables through a log-ratio transformation.
Index of help topics:
B2i Balances to isometric log-ratio EZ.SGB Expectations of Z under the SGB distribution EqualityConstr Equality constraints for overall shape and/or regression parameters and jacobian GenGammaDistrib Generalized Gamma distribution GoodnessFit Goodness of fit tests on the marginal distributions of each part in a SGB model Imputation Imputation of missing parts in compositions from a SGB model InequalityConstr Inequality constraints and jacobian InitialParameters Initial parameters estimates and comparison MarginPlots Histograms, quantile and probability plots for the z(u)-transforms of parts SGB-package Package SGB SGBLik SGB log-likelihood and gradient SGBdistrib Density and random generator for the SGB distribution SGButil Computation of scales and z-vectors Tabulation Tabulation of overall SGB regression results with AIC and matrix view of regression coefficients arc arc dataset carseg carseg dataset covest.SGB Classical and robust asymptotic covariance matrix ocar ocar data set oilr oilr data set regSGB Regression for compositions following a SGB distribution stepSGB Stepwise backward elimination for SGB regression summaryA.SGB Aitchison expectation and mode under the SGB distribution
Further information is available in the following vignettes:
vignette |
SGB multivariate regression (source, pdf) |
Monique Graf
Maintainer: Monique Graf <[email protected]>
Graf, M. (2017). A distribution on the simplex of the Generalized Beta type. In J. A. Martin-Fernandez (Ed.), Proceedings CoDaWork 2017, University of Girona (Spain), 71-90.
Graf, M. (2019). The Simplicial Generalized Beta distribution - R-package SGB and applications. Proceedings of the 8th International Workshop on Compositional Data Analysis (CoDaWork2019): Terrassa, 3-8 June, 2019. J.J. Egozcue, J. Graffelman and M.I. Ortego (Editors). Universitat Politecnica de Catalunya-BarcelonaTECH, 2019. 202 p. ISBN 978-84-947240-2-2. .
Graf, M. (2020). Regression for compositions based on a generalization of the Dirichlet distribution. Statistical Methods & Applications, (), 1-24.
## Result of a regression object: summary(oilr)
## Result of a regression object: summary(oilr)
39 (sand,silt,clay) compositions in an Arctic lake in function of depth.
data("arc")
data("arc")
A data frame with 39 observations on the following 4 variables.
sand part
silt part
clay part
depth (m)
Aitchison, J. (1986). The Statistical Analysis of Compositional Data.. Monographs on Statistics and Applied Probability. Chapman and Hall Ltd (reprinted 2003 with additional material by the Blackburn Press, London (UK).
Coakley, J.P. and Rust, B.R. (1968). Sedimentation in an Arctic lake. J. Sed. Petrology, 38, 1290-1300.
data(arc) str(arc)
data(arc) str(arc)
Coefficients of log of parts in a balance matrix, (+1) for numerator and (-1) for denominator, are transformed into the corresponding isometric log-ratio (ilr) coefficients
B2i(bal, balnames=FALSE)
B2i(bal, balnames=FALSE)
bal |
a ( |
balnames |
logical, if TRUE, balance names are attributed to ilr transforms; if FALSE (default) ilr transforms are numbered |
Two scalars multiplying positive and negative cells respectively are defined for each row of the matrix in such a way that the resulting matrix defines the ilr transformation to apply to the log of a compositional vector. The output transformation matrix is transposed for application to a compositional dataset where the compositions are the rows.
a matrix giving the coefficients of the ilr transforms
Pawlowsky-Glahn, V., J. J. Egozcue, and R. Tolosana-Delgado (2007). Lecture Notes on Compositional Data Analysis.
bal <- matrix(c(1,-1,0,1,1,-1),nrow=2, byrow=TRUE) colnames(bal) <- paste("l.P",1:3,sep="") bal B2i(bal) rownames(bal) <- paste("B",1:2,sep="") bal B2i(bal,balnames=TRUE) B2i(bal)
bal <- matrix(c(1,-1,0,1,1,-1),nrow=2, byrow=TRUE) colnames(bal) <- paste("l.P",1:3,sep="") bal B2i(bal) rownames(bal) <- paste("B",1:2,sep="") bal B2i(bal,balnames=TRUE) B2i(bal)
Segment shares of car sales in five categories according to the size of the car chassis, with explanatory variables.
data("carseg")
data("carseg")
A data frame with 152 observations on the following 13 variables.
Segment share in category A
Segment share in category B
Segment share in category C
Segment share in category D
Segment share in category E
quarterly household expenditures
monthly confidence indicator made up of several branches
monthly households investment
binary vector indicating the incentive period
Gross domestic product
gas oil price
monthly short term interest rates
sequential month number (1 to 150)
This dataset consists of simulated monthly segment market shares (SA to SE) corresponding to the 5 segments of a certain brand during 150 consecutive months (01/2003 to 08/2015). The set of explanatory variables was selected by Morais and Thomas-Agnan (2019) as being the most meaningful to explain the segment shares. Names have been simplified.
Morais, J. and Thomas-Agnan, C. (2019), Impact of economic context on automobile market segment shares: a compositional approach, Case Studies in Business, Industry and Government Statistics, in press.
data(carseg) summary(carseg[,(6:12)])
data(carseg) summary(carseg[,(6:12)])
Computation of two covariance matrices of the estimators of parameters in a SGB regression. The first is based on the Hessian and the second is the sandwich estimator.
covest.SGB(x, d, u, V, weight=rep(1,dim(d)[1]), x0 = NULL, hessian = NULL, ind = NULL, shape1 = NULL)
covest.SGB(x, d, u, V, weight=rep(1,dim(d)[1]), x0 = NULL, hessian = NULL, ind = NULL, shape1 = NULL)
x |
vector of parameters (shape1,coefi,shape2) where
shape1 is the overall shape, coefi is the vector of regression coefficients (see |
d |
data matrix of explanatory variables (with constant vector if required in the model) |
u |
data matrix of compositions (variables to be explained) |
V |
full rank transformation of log(parts) into log-ratios, matrix |
weight |
vector of length |
x0 |
specification of the initial parameter vector of length |
hessian |
Hessian matrix (optional), see |
ind |
vector of length equal to the number of fixed parameters; specifies the indices of the fixed components in the vector of parameters |
shape1 |
fixed value of the overall shape parameter, if |
This function is internally called by regSGB. In this case the Hessian is the output of auglag
and is numerically computed.
A design based covariance matrix of the parameters can be obtained by linearization as the covariance matrix of the scores
.
a list with
summary |
Data frame with |
scores |
matrix |
vcov1 |
ordinary asymptotic covariance matrix, inverse of minus the Hessian. |
StdErr1 |
vector of ordinary asymptotic standard error of parameters. |
varest2 |
robust asymptotic covariance matrix. |
StdErr |
vector of robust asymptotic standard error of parameters. |
Huber, P. J. (1967). The behavior of maximum likelihood estimates under nonstandard conditions. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1, pp. 221-233.
regSGB
for creating oilr
.
data(arc) data(oilr) ## compositions da <- as.matrix(log(arc[["depth"]]),ncol=1) ua <- as.matrix(arc[,1:3]) ## ilr transforms c1 <- 1/sqrt(2) c2 <- 1/sqrt(6) Vilr <- matrix(c(-c1,c1,0,-c2,-c2,2*c2),nrow=3) colnames(Vilr) <- c("ilr1","ilr2") Vilr covs <- covest.SGB(oilr[["par"]], da, ua, Vilr) ## Compare the ordinary and robust correlation matrices of parameters estimates. ## (Ordinary) covariance based on inverse Hessian vcov1 <- covs[["vcov1"]] StdErr1 <- covs[["StdErr1"]] ## Estimated correlation matrix vcor1 <- diag(1/StdErr1) %*% vcov1 %*% diag(1/StdErr1) round(vcor1,2) ## Robust (Huber's sandwich estimator): StdErr <- covs[["StdErr"]] vcov <- covs[["vcov"]] ## Estimated correlation matrix round(diag(1/StdErr) %*% vcov %*% diag(1/StdErr),2)
data(arc) data(oilr) ## compositions da <- as.matrix(log(arc[["depth"]]),ncol=1) ua <- as.matrix(arc[,1:3]) ## ilr transforms c1 <- 1/sqrt(2) c2 <- 1/sqrt(6) Vilr <- matrix(c(-c1,c1,0,-c2,-c2,2*c2),nrow=3) colnames(Vilr) <- c("ilr1","ilr2") Vilr covs <- covest.SGB(oilr[["par"]], da, ua, Vilr) ## Compare the ordinary and robust correlation matrices of parameters estimates. ## (Ordinary) covariance based on inverse Hessian vcov1 <- covs[["vcov1"]] StdErr1 <- covs[["StdErr1"]] ## Estimated correlation matrix vcor1 <- diag(1/StdErr1) %*% vcov1 %*% diag(1/StdErr1) round(vcor1,2) ## Robust (Huber's sandwich estimator): StdErr <- covs[["StdErr"]] vcov <- covs[["vcov"]] ## Estimated correlation matrix round(diag(1/StdErr) %*% vcov %*% diag(1/StdErr),2)
Setting of equality constraints on parameters. heqa.SGB
sets the overall shape parameter to shape1
. heqb.SGB
sets specified regression parameters to 0. heqab.SGB
is a combination of both. heqa.SGB.jac
, heqb.SGB.jac
, heqab.SGB.jac
compute the corresponding Jacobians.
heqa.SGB(x, d, u, bound, shape1, ...) heqa.SGB.jac(x, ...) heqb.SGB(x, d, u, bound, shape1, index, ...) heqb.SGB.jac(x, d, u, bound, shape1, index, ...) heqab.SGB(x, d, u, bound, shape1, index, ...) heqab.SGB.jac(x, d, u, bound, shape1, index, ...)
heqa.SGB(x, d, u, bound, shape1, ...) heqa.SGB.jac(x, ...) heqb.SGB(x, d, u, bound, shape1, index, ...) heqb.SGB.jac(x, d, u, bound, shape1, index, ...) heqab.SGB(x, d, u, bound, shape1, index, ...) heqab.SGB.jac(x, d, u, bound, shape1, index, ...)
x |
current vector of parameters ( |
d |
data matrix of explanatory variables (without constant vector) |
u |
data matrix of compositions (independent variables) |
bound |
not used. |
shape1 |
chosen fixed value of the overall shape parameter. |
index |
vector of length equal to the number of fixed parameters; specifies the indices of the fixed components in the vector of parameters |
... |
not used. |
These functions are invoked by regSGB
through the specification of the function name, shape1
and/or index
.
heqa.SGB
, heqb.SGB
, heqab.SGB
: vector of the same length as index
specifying the current value of x[index]
or x[1]-shape1
, where x
is the current vector of parameters. It should be near zero at convergence of the regression algorithm. heqa.SGB.jac
, heqb.SGB.jac
, heqab.SGB.jac
: the corresponding jacobian matrices of dimensions .
## parameter vector for a 3 parts composition with one explanatory variable (+ intercept): x <- c(1,3.2,0.04,0.05,6,7:9) ## shape1 fixed to 1.5: heqa.SGB(x,d,u,bound,1.5) heqa.SGB.jac(x) ## Parameters 3 (first slope) and 4 (second intercept) fixed to 0: heqb.SGB(x,d,u,bound,shape1,c(3,4)) heqb.SGB.jac(x,d,u,bound,shape1,c(3,4)) ## Parameters 1, 3, 4 fixed to 1.5, 0, 0 respectively: heqab.SGB(x,d,u,bound,1.5,c(1,3,4)) heqab.SGB.jac(x,d,u,bound,1.5,c(1,3,4))
## parameter vector for a 3 parts composition with one explanatory variable (+ intercept): x <- c(1,3.2,0.04,0.05,6,7:9) ## shape1 fixed to 1.5: heqa.SGB(x,d,u,bound,1.5) heqa.SGB.jac(x) ## Parameters 3 (first slope) and 4 (second intercept) fixed to 0: heqb.SGB(x,d,u,bound,shape1,c(3,4)) heqb.SGB.jac(x,d,u,bound,shape1,c(3,4)) ## Parameters 1, 3, 4 fixed to 1.5, 0, 0 respectively: heqab.SGB(x,d,u,bound,1.5,c(1,3,4)) heqab.SGB.jac(x,d,u,bound,1.5,c(1,3,4))
Expectations under Lebesgue and Aitchison measures for the transformed composition and
, where
is the closure operation.
EZ.SGB(D, x)
EZ.SGB(D, x)
x |
vector of parameters ( |
D |
number of parts |
A matrix with 4 rows and D columns giving on each row the expectation of parts
EZ |
|
EAZ |
|
EZa |
|
EAZa |
|
set.seed(1234) x <- c(2,rnorm(4,0,1),1.8,3.1,4.0) D <- 3 EZ.SGB(D,x)
set.seed(1234) x <- c(2,rnorm(4,0,1),1.8,3.1,4.0) D <- 3 EZ.SGB(D,x)
Density and random generation of the generalized gamma distribution.
dggamma(x, shape1, scale, shape2) rggamma(n, shape1, scale, shape2)
dggamma(x, shape1, scale, shape2) rggamma(n, shape1, scale, shape2)
x |
vector of positive values |
n |
number of simulated vectors |
shape1 |
overall shape parameter |
scale |
vector of scales. Should be of the same length as |
shape2 |
vector of Dirichlet parameters. Should be of the same length as |
log density at :
dggamma: Generalized gamma density evaluated at x
rggamma: Generalized gamma random deviates
Stacy, E.W. (1962). "A Generalization of the Gamma Distribution." Annals of Mathematical Statistics 33(3): 1187-1192.
Johnson, N.L.; Kotz, S; Balakrishnan, N. (1994) Continuous Univariate Distributions, Volume 1, 2nd Edition. Wiley. ISBN 0-471-58495-9 (Section 17.8.7)
set.seed(12345) u1 <- rggamma(10,2,1,1.4) # 10 random deviates with scale 1 set.seed(12345) u <- rggamma(10,2,1:10,1.4) # 10 random deviates with scale 1:10, repectively u u/u1 dggamma(u,2,1:10,1.4)
set.seed(12345) u1 <- rggamma(10,2,1,1.4) # 10 random deviates with scale 1 set.seed(12345) u <- rggamma(10,2,1:10,1.4) # 10 random deviates with scale 1:10, repectively u u/u1 dggamma(u,2,1:10,1.4)
Kolmogorov-Smirnov goodness of fit tests
Cramer-von-Mises goodness of fit tests.
ks.SGB(u,shape1,shape2,scale,alpha=0.05) cvm.SGB(u,shape1,shape2,scale,alpha=0.05) ## S3 method for class 'testSGB' print(x,...)
ks.SGB(u,shape1,shape2,scale,alpha=0.05) cvm.SGB(u,shape1,shape2,scale,alpha=0.05) ## S3 method for class 'testSGB' print(x,...)
u |
data matrix of compositions (independent variables) |
shape1 |
positive number, overall shape parameter of the SGB distribution. See |
shape2 |
vector of length |
scale |
matrix of the same dimensions as |
alpha |
overall level of the test, default 0.05. |
x |
an object of class "testSGB". |
... |
further arguments passed to or from other methods. |
ks.SGB
calls ks.test
and cvm.SGB
calls cvm.test
.
Consider , where
is the closure operation. The scale compositions
may be modelled with auxiliary variables. Under the SGB hypothesis, the components of
should be marginally beta-distributed. The functions provide
tests, one for each part.
Theoretically, the parameters should be known and not estimated on the data. Thus the test using estimated parameters is conservative.
The cutoff value is based on the false discovery rate for multiple comparisons (Benjamini and Hochberg, 1995), which is simply alpha*i/D
for the i-th ordered p-value, i=1,...,D (number of tests). Reject the null hypothesis if at least one p-value is smaller than the cutoff. The overall level is then alpha
. The proof of the result does not use an independence assumption between the tests.
A list of class 'testSGB' with the following components:
method |
either "One-sample Kolmogorov-Smirnov test" or "Cramer-von Mises test of goodness-of-fit" |
Compositions |
name of the dataset |
tests |
data frame with |
A print method exists for the class "testSGB".
Benjamini, Y. and Y. Hochberg (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological) 57 (1), 289-300.
Birnbaum, Z. W. and Fred H. Tingey (1951), One-sided confidence contours for probability distribution functions. The Annals of Mathematical Statistics, 22/4, 592-596.
Conover, William J. (1971), Practical Nonparametric Statistics. New York: John Wiley & Sons. Pages 295-301 (one-sample Kolmogorov test), 309-314 (two-sample Smirnov test).
Csorgo, S. and Faraway, J.J. (1996) The exact and asymptotic distributions of Cramer-von Mises statistics. Journal of the Royal Statistical Society, Series B 58, 221-234.
Durbin, J. (1973), Distribution theory for tests based on the sample distribution function. SIAM.
Marsaglia, G., Wai Wan Tsang and Jingbo Wang (2003), Evaluating Kolmogorov's distribution. Journal of Statistical Software, 8/18.
SGBdistrib
for the theoretical distribution, oilr
for the regression results.
## Generate 1000 random variates according to SGB(shape1,rep(1/3,3),shape2) shape1 <- 0.6 shape2 <- c(10,20, 30) rnum <- rSGB(1000,shape1,rep(1,3)/3,shape2) ks.SGB(rnum,shape1=shape1, shape2=shape2,scale=1) ## same result as ks.SGB(rnum,shape1=shape1,scale= matrix(rep(1/3,3000),ncol=3), shape2=shape2) library(goftest) cvm.SGB(rnum,shape1=shape1,scale= matrix(rep(1/3,3000),ncol=3), shape2=shape2) ## Arctic lake data # oilr is a SGB regression object, see \code{\link{oilr}}. data(oilr) # regSGB object data(arc) ua <- arc[1:3] # compositions ## Kolmogorov-Smirnov goodness of fit test ks.SGB(ua,shape1=oilr[["par"]][1],shape2=oilr[["par"]][4:6],scale=oilr[["scale"]]) ## Rounding shape1 affects the results less than rounding shape2. ks.SGB(ua,shape1=round(oilr[["par"]][1],3),shape2=round(oilr[["par"]][4:6],1), scale=oilr[["scale"]]) ks.SGB(ua,shape1=round(oilr[["par"]][1],1),shape2=round(oilr[["par"]][4:6],3), scale=oilr[["scale"]]) ## Cramer-von-Mises goodness of fit test library(goftest) cvm.SGB(ua,shape1=oilr[["par"]][1],shape2=oilr[["par"]][4:6],scale=oilr[["scale"]])
## Generate 1000 random variates according to SGB(shape1,rep(1/3,3),shape2) shape1 <- 0.6 shape2 <- c(10,20, 30) rnum <- rSGB(1000,shape1,rep(1,3)/3,shape2) ks.SGB(rnum,shape1=shape1, shape2=shape2,scale=1) ## same result as ks.SGB(rnum,shape1=shape1,scale= matrix(rep(1/3,3000),ncol=3), shape2=shape2) library(goftest) cvm.SGB(rnum,shape1=shape1,scale= matrix(rep(1/3,3000),ncol=3), shape2=shape2) ## Arctic lake data # oilr is a SGB regression object, see \code{\link{oilr}}. data(oilr) # regSGB object data(arc) ua <- arc[1:3] # compositions ## Kolmogorov-Smirnov goodness of fit test ks.SGB(ua,shape1=oilr[["par"]][1],shape2=oilr[["par"]][4:6],scale=oilr[["scale"]]) ## Rounding shape1 affects the results less than rounding shape2. ks.SGB(ua,shape1=round(oilr[["par"]][1],3),shape2=round(oilr[["par"]][4:6],1), scale=oilr[["scale"]]) ks.SGB(ua,shape1=round(oilr[["par"]][1],1),shape2=round(oilr[["par"]][4:6],3), scale=oilr[["scale"]]) ## Cramer-von-Mises goodness of fit test library(goftest) cvm.SGB(ua,shape1=oilr[["par"]][1],shape2=oilr[["par"]][4:6],scale=oilr[["scale"]])
Applied to a completely missing composition, the function returns the Aitchison expectation.
Applied to a partially missing composition, it returns the conditional Aitchison expectation, given the observed sub-composition.
Applied to a complete case, it returns the complete case.
impute.regSGB(obj, dsup, usup)
impute.regSGB(obj, dsup, usup)
obj |
list, output of regSGB. |
dsup |
data frame with explanatory variables for the incomplete compositions. Missing values not allowed. |
usup |
compositions corresponding to |
data frame with imputed compositions instead of missing or partially missing compositions. Complete cases are also returned.
## Arctic lake data(arc) arcmis <- arc arc[11:13,] ## Introduce NA alues arcmis[11,2] <- NA # "core" observation arcmis[12,3] <- NA # outlying clay value arcmis[13,1:3] <- NA # totally missing observation umis <- arcmis[,1:3] umis <- umis/rowSums(umis,na.rm=TRUE) umis[11:13,] d <- data.frame(depth=arc[["depth"]]) ## original compositions arc[11:13,1:3] ## unconditional predicted value MeanA.SGB(oilr[["par"]][1],oilr[["scale"]],oilr[["par"]][4:6] )[11:13,] ## predicted value given the sub-composition (sand,clay) for 11, (sand,silt) for 12 impute.regSGB(oilr,arcmis,umis)[11:13, ] impute.regSGB(oilr,arcmis[11:13, ],umis[11:13, ]) # same result.
## Arctic lake data(arc) arcmis <- arc arc[11:13,] ## Introduce NA alues arcmis[11,2] <- NA # "core" observation arcmis[12,3] <- NA # outlying clay value arcmis[13,1:3] <- NA # totally missing observation umis <- arcmis[,1:3] umis <- umis/rowSums(umis,na.rm=TRUE) umis[11:13,] d <- data.frame(depth=arc[["depth"]]) ## original compositions arc[11:13,1:3] ## unconditional predicted value MeanA.SGB(oilr[["par"]][1],oilr[["scale"]],oilr[["par"]][4:6] )[11:13,] ## predicted value given the sub-composition (sand,clay) for 11, (sand,silt) for 12 impute.regSGB(oilr,arcmis,umis)[11:13, ] impute.regSGB(oilr,arcmis[11:13, ],umis[11:13, ]) # same result.
Setting of inequality constraints on shape parameters. hin.SGB
sets inequality constraints on the shape parameters in a SGB regression.hin.SGB.jac
defines the corresponding Jacobian.
hin.SGB(x, d, u, bound, ...) hin.SGB.jac(x, d, u, ...)
hin.SGB(x, d, u, bound, ...) hin.SGB.jac(x, d, u, ...)
x |
vector of parameters ( |
d |
data matrix of explanatory variables (without constant vector) |
u |
data matrix of compositions (independent variables) |
bound |
the estimates of shapes are constrained by |
... |
not used. |
These functions are invoked internally by regSGB
with bound
specified by the user. shape1
is constrained to be larger than 0.1, in order to avoid numerical problems and shape2
must be positive.
Moments of ratios of parts only exist up to bound
. Thus bound = 2.1
guarantees the existence of variances of ratios of parts.
hin.SGB
: vector of length with the current value of
c(shape1-0.1,shape1*shape2-bound)
. It should be non-negative at convergence of the regression algorithm. hin.SGB.jac
: corresponding jacobian matrix of dimensions
length(x)
.
## Parameter vector for a 3 parts composition with one explanatory variable (+ intercept): x <- c(1,3.2,0.04,0.05,6,7:9) bound <- 2.1 u <- t(c(0.1,0.5,0.4)) # only used to compute the number of parts. hin.SGB(x, d, u, bound) # = c(shape1-0.1, shape1*shape2-bound,shape2) # all must be positive.
## Parameter vector for a 3 parts composition with one explanatory variable (+ intercept): x <- c(1,3.2,0.04,0.05,6,7:9) bound <- 2.1 u <- t(c(0.1,0.5,0.4)) # only used to compute the number of parts. hin.SGB(x, d, u, bound) # = c(shape1-0.1, shape1*shape2-bound,shape2) # all must be positive.
initpar.SGB
computes an initial vector of parameters. condshape2
computes the shape2
parameters by the same method as initpar.SGB
, but from an arbitrary set of parameters (shape1
,coefi
) (e.g. the result of a SGB regression fit). These approximations are compared with the shape2
estimates. compushape2
is internally called by initpar.SGB
and condshape2
. It computes shape2
parameters in function of shape1
and given regression parameters coefi
.
initpar.SGB(d, u, V, weight = rep(1, dim(u)[1]), shape1 = 1, Mean2 = TRUE) condshape2(x,d,u,V) compushape2(shape1, coefi, d, u, V)
initpar.SGB(d, u, V, weight = rep(1, dim(u)[1]), shape1 = 1, Mean2 = TRUE) condshape2(x,d,u,V) compushape2(shape1, coefi, d, u, V)
d |
data matrix of explanatory variables (without constant vector) |
u |
data matrix of compositions (independent variables) |
V |
full rank transformation of log(parts) into log-ratios, matrix |
weight |
vector of length |
shape1 |
positive number, overall shape parameter |
Mean2 |
logical, if TRUE (default), the computed |
coefi |
vector of regression coefficients of length |
x |
fitted SGB regression parameters, see |
The main function here is initpar.SGB
. The initial value of shape1
must be specified by the user; by default, it takes the value 1.
In the initial regression model, each column of
log(u) % * % V
is regressed by OLS on the columns of d
. coefi
is the vector of regression parameters, first the terms associated with the first explanatory variable in
d
, and so on similarily for each explanatory variable. The initial scale compositions are computed by back-transforming the predicted values to the simplex and used to compute the vector , where
is the closure operation. Wicker et al. (2008), see also Ng et al. (2011) p.74-75, describe a procedure to find initial values for the shape parameters in a Dirichlet distribution. Their method is used on the (approximate) Dirichlet vector
.
initpar.SGB
:
vector of length containing initial values for (
shape1
,coefi
,shape2
). condshape2
:
list with two components: 1. title and 2. data-frame with 2 columns: fitted shape2
and Wicker's approximation.
Wicker, N., J. Muller, R. K. R. Kalathur, and O. Poch (2008). A maximum likelihood approximation method for Dirichlet's parameter estimation. Computational Statistics & Data Analysis 52 (3), 1315-1322.
Kai Wang Ng, Guo-Liang Tian, Man-Lai Tang (2011). Dirichlet and Related Distributions: Theory, Methods and Applications. Wiley Series in Probability and Statistics.
## Explanatory variable da <- data.frame(l.depth=log(arc[["depth"]])) damat <- as.matrix(da) ## Compositions ua <- arc[,1:3] ## alr transforms Va <- matrix(c(1,0,-1,0,1,-1),nrow=3) colnames(Va) <- c("alr1","alr2") Va ## Initial values initpar.SGB(damat,ua,Va) initpar.SGB(damat,ua,Va,Mean2=FALSE) ## Conditional shape2 values; same as parameters computed with initpar condshape2(initpar.SGB(damat,ua,Va,Mean2=FALSE),damat,ua,Va) ## Comparison with fitted parameters oa <- regSGB(damat, as.matrix(ua), Va) condshape2(oa[["par"]],damat,ua,Va)
## Explanatory variable da <- data.frame(l.depth=log(arc[["depth"]])) damat <- as.matrix(da) ## Compositions ua <- arc[,1:3] ## alr transforms Va <- matrix(c(1,0,-1,0,1,-1),nrow=3) colnames(Va) <- c("alr1","alr2") Va ## Initial values initpar.SGB(damat,ua,Va) initpar.SGB(damat,ua,Va,Mean2=FALSE) ## Conditional shape2 values; same as parameters computed with initpar condshape2(initpar.SGB(damat,ua,Va,Mean2=FALSE),damat,ua,Va) ## Comparison with fitted parameters oa <- regSGB(damat, as.matrix(ua), Va) condshape2(oa[["par"]],damat,ua,Va)
These functions draw a plot for each part in the dataset.
hzbeta(u, obj, weight = rep(1,dim(u)[1]) ) qzbeta(u, obj, weight = rep(1,dim(u)[1]) ) pzbeta(u, obj, weight = rep(1,dim(u)[1]) )
hzbeta(u, obj, weight = rep(1,dim(u)[1]) ) qzbeta(u, obj, weight = rep(1,dim(u)[1]) ) pzbeta(u, obj, weight = rep(1,dim(u)[1]) )
u |
data matrix of compositions (independent variables) |
obj |
list, result of regSGB. See |
weight |
vector of length |
Let follow a
distribution. Then the composition
is called the -transform of
.
follows a
distribution and each part
is Beta-distributed with parameters
(shape2[i],sum(shape2)-shape2[i])
.
Goodness of fit plots are produced for the parts of the -transforms against the Beta distribution. Each function creates
plots, where
is the number of parts.
hzbeta
: histograms and the corresponding Beta-densities,qzbeta
: marginal quantile plots, pzbeta
: marginal probability plots.
If weight
is specified, weighted histgrams, quantile and probability plots are drawn.
plots are produced comparing the marginal distribution of the parts of the
compositions with the theoretical Beta distribution.
## Arctic lake data data(arc) # Compositions ua <- arc[,1:3] # SGB regression data(oilr) # plot par(mfrow=c(3,3)) hzbeta(ua,oilr) qzbeta(ua,oilr) pzbeta(ua,oilr)
## Arctic lake data data(arc) # Compositions ua <- arc[,1:3] # SGB regression data(oilr) # plot par(mfrow=c(3,3)) hzbeta(ua,oilr) qzbeta(ua,oilr) pzbeta(ua,oilr)
Car segment shares SGB regression with formula
data("ocar")
data("ocar")
List of 25 items, see regSGB
.
ocar
is the same regression as object3
in regSGB
, Example 3.
data(ocar) ocar summary(ocar) # regSGB summary ocar[["kkt1"]] # first KKT condition ocar[["V"]] # matrix of log-ratio transformation ####################################################### ## ocar has been created by the following commands: ## Car segment shares data(carseg) ## Extract the compositions uc <- as.matrix(carseg[,(1:5)]) ## Define the log-ratio transformation matrix Vc <- matrix(c( 1, 0, 0, 0, -1, 1, 0, 0, 0,-1, 1, 0, 0, 0,-1, 1, 0, 0, 0,-1),ncol=4,byrow=TRUE) colnames(Vc) <- c("AB","BC","CD","DE") rownames(Vc) <- colnames(uc) Vc ## Formula Form <- Formula(AB | BC | CD | DE ~ log(expend) + I(PAC*log(expend)) + log(sent) + log(FBCF) + log(price) + rates) ocar <- regSGB(Form, data = list(carseg, uc, Vc),shape10=4.4) ##########################################################
data(ocar) ocar summary(ocar) # regSGB summary ocar[["kkt1"]] # first KKT condition ocar[["V"]] # matrix of log-ratio transformation ####################################################### ## ocar has been created by the following commands: ## Car segment shares data(carseg) ## Extract the compositions uc <- as.matrix(carseg[,(1:5)]) ## Define the log-ratio transformation matrix Vc <- matrix(c( 1, 0, 0, 0, -1, 1, 0, 0, 0,-1, 1, 0, 0, 0,-1, 1, 0, 0, 0,-1),ncol=4,byrow=TRUE) colnames(Vc) <- c("AB","BC","CD","DE") rownames(Vc) <- colnames(uc) Vc ## Formula Form <- Formula(AB | BC | CD | DE ~ log(expend) + I(PAC*log(expend)) + log(sent) + log(FBCF) + log(price) + rates) ocar <- regSGB(Form, data = list(carseg, uc, Vc),shape10=4.4) ##########################################################
Arctic lake SGB regression based on isometric log-ratio transforms
data("oilr")
data("oilr")
List of 25 items, see regSGB
.
data(oilr) oilr summary(oilr) # regSGB summary oilr[["kkt1"]] # first KKT condition oilr[["V"]] # matrix of log-ratio transformation ## oilr has been created by the following commands: ## Arctic lake data data(arc) # Compositions ua <- arc[,1:3] ## ilr transforms c1 <- 1/sqrt(2) c2 <- 1/sqrt(6) Vilr <- matrix(c(-c1,c1,0,-c2,-c2,2*c2),nrow=3) colnames(Vilr) <- c("ilr1","ilr2") Vilr ## Formula F1 <- Formula(ilr1 | ilr2 ~ -1 + log(depth) ) # SGB regression object oilr <- regSGB(F1, data= list(arc, ua, Vilr), shape10=0.5, bound=2.1) ##########################################################
data(oilr) oilr summary(oilr) # regSGB summary oilr[["kkt1"]] # first KKT condition oilr[["V"]] # matrix of log-ratio transformation ## oilr has been created by the following commands: ## Arctic lake data data(arc) # Compositions ua <- arc[,1:3] ## ilr transforms c1 <- 1/sqrt(2) c2 <- 1/sqrt(6) Vilr <- matrix(c(-c1,c1,0,-c2,-c2,2*c2),nrow=3) colnames(Vilr) <- c("ilr1","ilr2") Vilr ## Formula F1 <- Formula(ilr1 | ilr2 ~ -1 + log(depth) ) # SGB regression object oilr <- regSGB(F1, data= list(arc, ua, Vilr), shape10=0.5, bound=2.1) ##########################################################
Explanatory variables may influence the scale vector through a linear model applied to a log-ratio transform of the compositions. The shape parameters do not depend on explanatory variables. The overall shape parameter shape1
is common to all parts, whereas the Dirichlet shape parameters vector shape2
are specific to each part, i.e. shape2[j]
is the Dirichlet parameter for u[i,j]
, i=1,...,n
, (n
=number of compositions in the dataset u
).
regSGB(d, ...) ## Default S3 method: regSGB(d, u, V, weight=rep(1,dim(d)[1]), shape10 = 1, bound = 2.1, ind = NULL, shape1 = NULL, Mean2 = TRUE, control.optim = list(trace=0,fnscale=-1), control.outer = list(itmax=1000,ilack.max=200,trace=TRUE, kkt2.check =TRUE, method = "BFGS"),...) ## S3 method for class 'formula' regSGB(Formula, data= list(), weight=rep(1,dim(d)[1]), shape10 = 1, bound = 2.1, ind = NULL, shape1 = 1, Mean2=TRUE, control.optim = list(trace=0,fnscale=-1), control.outer = list(itmax=1000,ilack.max=200,trace=TRUE,kkt2.check =TRUE, method = "BFGS"),...) ## S3 method for class 'regSGB' print(x, ...) ## S3 method for class 'regSGB' summary(object, digits=3,...)
regSGB(d, ...) ## Default S3 method: regSGB(d, u, V, weight=rep(1,dim(d)[1]), shape10 = 1, bound = 2.1, ind = NULL, shape1 = NULL, Mean2 = TRUE, control.optim = list(trace=0,fnscale=-1), control.outer = list(itmax=1000,ilack.max=200,trace=TRUE, kkt2.check =TRUE, method = "BFGS"),...) ## S3 method for class 'formula' regSGB(Formula, data= list(), weight=rep(1,dim(d)[1]), shape10 = 1, bound = 2.1, ind = NULL, shape1 = 1, Mean2=TRUE, control.optim = list(trace=0,fnscale=-1), control.outer = list(itmax=1000,ilack.max=200,trace=TRUE,kkt2.check =TRUE, method = "BFGS"),...) ## S3 method for class 'regSGB' print(x, ...) ## S3 method for class 'regSGB' summary(object, digits=3,...)
Formula |
formula of class Formula, see |
d |
data matrix of explanatory variables (without constant vector) |
u |
data matrix of compositions (independent variables) |
V |
log-ratio transformation matrix |
data |
a list with 3 components |
weight |
vector of length |
shape10 |
positive number, initial value of the overall shape parameter, default 1. |
bound |
inequality constraints on the estimates of shapes: |
ind |
vector of length equal to the number of fixed parameters; see |
shape1 |
fixed value of the overall shape parameter if |
Mean2 |
logical, if TRUE (default), the computed |
control.optim |
list of control parameters for optim, see |
control.outer |
list of control parameters to be used by the outer loop in |
object |
an object of class "regSGB". |
digits |
number of decimal places for print, default 3. |
x |
an object of class "regSGB". |
... |
not used. |
It is advisable to use the formula to specify the model for easy comparison between models.
Without formula, the d
matrix of explanatory variables must contain exactly the variables used in the model,
whereas with formula other variables can be included as well.
Variable transformations can be utilized within the formula, see Example 4 below with the indicator I
and the log.
Constraints on parameters can be introduced, see example 5 and EqualityConstr
for more details.
Use weight
for pseudo-likelihood estimation. weight
is scaled to , the sample size.
A design based covariance matrix of the parameters can be obtained by linearization as the covariance matrix of the scores
.
A list of class 'regSGB' with the following components:
The first 13 form the output from auglag
.
par |
Vector of length |
value |
The value of the objective function at termination. |
counts |
A vector of length 2 denoting the number of times the objective and its gradient were evaluated, respectively. |
convergence |
An integer code indicating the type of convergence. 0 indicates successful convergence. Positive integer codes indicate failure to converge. |
message |
A character string giving any additional information on convergence returned by |
outer.iteration |
Number of outer iterations. |
lambda |
Values of the Lagrangian parameter. This is a vector of the same length as the total number of inequalities and equalities. It must be zero for inactive inequalities; non-negative for active inequalities; and can have any sign for equalities. |
sigma |
Value of augmented penalty parameter for the quadratic term. |
gradient |
Gradient of the augmented Lagrangian function at convergence. It should be small. |
hessian |
Hessian of the augmented Lagrangian function at convergence. It should be negative definite for maximization. |
ineq |
Values of inequality constraints at convergence. All of them must be non-negative. |
equal |
Values of equality constraints at convergence. All of them must be close to zero. |
kkt1 |
A logical variable indicating whether or not the first-order KKT conditions were satisfied (printed 1 if conditions satisfied and 0 otherwise). |
kkt2 |
A logical variable indicating whether or not the second-order KKT conditions were satisfied (printed 1 if conditions satisfied and 0 otherwise). |
scale |
|
meanA |
Aitchison expectation at estimated parameters. |
fitted.values |
|
residuals |
Observed minus estimated log-ratio transforms. |
scores |
matrix |
Rsquare |
ratio of total variation of |
vcov |
The robust covariance matrix of parameters estimates, see |
StdErr1 |
Ordinary asymptotic standard errors of parameters. |
StdErr |
Robust asymptotic standard errors of parameters. |
fixed.par |
Indices of the fixed parameters. |
summary |
The summary from |
AIC |
AIC criterion. |
V |
log-ratio transformation matrix (same as corresponding input parameter |
call |
Arguments for calling |
Formula |
Expression for formula. |
Graf, M. (2017). A distribution on the simplex of the Generalized Beta type. In J. A. Martin-Fernandez (Ed.), Proceedings CoDaWork 2017, University of Girona (Spain), 71-90.
Hijazi, R. H. and R. W. Jernigan (2009). Modelling compositional data using Dirichlet regression models. Journal of Applied Probability and Statistics, 4 (1), 77-91.
Kotz, S., N. Balakrishnan, and N. L. Johnson (2000). Continuous Multivariate Distributions, Volume 1, Models and Applications. John Wiley & Sons.
Madsen, K., H. Nielsen, and O. Tingleff (2004). Optimization With Constraints. Informatics and Mathematical Modelling, Technical University of Denmark.
Monti, G. S., G. Mateu-Figueras, and V. Pawlowsky-Glahn (2011). Notes on the scaled Dirichlet distribution. In V. Pawlowsky-Glahn and A. Buccianti (Eds.), Compositional data analysis. Theory and applications. Wiley.
Varadhan, R. (2015). alabama: Constrained Nonlinear Optimization. R package version 2015.3-1.
Wicker, N., J. Muller, R. K. R. Kalathur, and O. Poch (2008). A maximum likelihood approximation method for Dirichlet parameter estimation. Computational Statistics & Data Analysis 52 (3), 1315-1322.
Zeileis, A. and Y. Croissant (2010). Extended model formulas in R: Multiple parts and multiple responses. Journal of Statistical Software 34 (1), 1-13.
stepSGB
, for an experimental stepwise descending regression, initpar.SGB
, for the computation of initial parameters.
This function uses Formula
, auglag
.
## Regression for car segment shares ## --------------------------------- data(carseg) ## Extract the compositions uc <- as.matrix(carseg[,(1:5)]) ## Extract the explanatory variables attach(carseg) ## Example 1: without formula ## -------------------------- ## Change some variables dc <- data.frame(l.exp1=log(expend)*PAC,l.exp0=log(expend)*(1-PAC), l.sent=log(sent), l.FBCF=log(FBCF), l.price=log(price), rates) ## Define the log-ratio transformation matrix Vc <- matrix(c( 1,0,0,0, -1,1,0,0, 0,-1,1,0, 0,0,-1,1, 0,0,0,-1),ncol=4,byrow=TRUE) colnames(Vc) <- c("AB","BC","CD","DE") rownames(Vc) <- colnames(uc) Vc # 2 next rows only necessary when calling regSGB without a formula. dc1 <- cbind("(Intercept)"= 1 , dc) dc1 <- as.matrix(dc1) object10 <- regSGB(dc1,uc, Vc,shape10=4.4) summary(object10) ## Example 2: same with formula ## ---------------------------- ## Define the formula Form <- Formula(AB | BC | CD | DE ~ l.exp1 + l.exp0 + l.sent + l.FBCF + l.price + rates) ## Regression with formula object1 <- regSGB(Form, data= list(dc, uc, Vc),shape10=4.4) summary(object1) ## Example 3: Usage of I() ## ----------------------- Form2 <- Formula(AB | BC | CD | DE ~ I(l.exp1 + l.exp0) + l.exp1 +l.sent + l.FBCF + l.price + rates ) object2 <- regSGB(Form2,data= list(dc, uc, Vc),shape10=4.4) object2 ## Example 4: Usage of variable transformations on the original file ## ----------------------------------------------------------------- Form3 <- Formula(AB | BC | CD | DE ~ log(expend) + I(PAC*log(expend)) + log(sent) + log(FBCF) + log(price) + rates) object3 <- regSGB(Form3, data=list(carseg, uc, Vc),shape10=4.4) object3 object2[["par"]]-object3[["par"]] # same results ## Example 5: Fixing parameter values ## ---------------------------------- ## 1. In the following regression we condition on shape1 = 2.36. object4 <- regSGB(Form3,data=list(carseg, uc, Vc), shape10 = 4.4, bound = 2.0, ind = 1, shape1 = 2.36) summary(object4) ## 2. In the following regression we condition on shape1 = 2.36 and the coefficient of ## log(FBCF).BC = 0. Notice that it is the 19th parameter. object5 <- regSGB(Form3,data=list(carseg, uc, Vc), shape10 = 4.4, bound = 2.0, ind = c(1,19) , shape1 = 2.36) summary(object5) object3[["AIC"]] object4[["AIC"]] # largest AIC object5[["AIC"]]
## Regression for car segment shares ## --------------------------------- data(carseg) ## Extract the compositions uc <- as.matrix(carseg[,(1:5)]) ## Extract the explanatory variables attach(carseg) ## Example 1: without formula ## -------------------------- ## Change some variables dc <- data.frame(l.exp1=log(expend)*PAC,l.exp0=log(expend)*(1-PAC), l.sent=log(sent), l.FBCF=log(FBCF), l.price=log(price), rates) ## Define the log-ratio transformation matrix Vc <- matrix(c( 1,0,0,0, -1,1,0,0, 0,-1,1,0, 0,0,-1,1, 0,0,0,-1),ncol=4,byrow=TRUE) colnames(Vc) <- c("AB","BC","CD","DE") rownames(Vc) <- colnames(uc) Vc # 2 next rows only necessary when calling regSGB without a formula. dc1 <- cbind("(Intercept)"= 1 , dc) dc1 <- as.matrix(dc1) object10 <- regSGB(dc1,uc, Vc,shape10=4.4) summary(object10) ## Example 2: same with formula ## ---------------------------- ## Define the formula Form <- Formula(AB | BC | CD | DE ~ l.exp1 + l.exp0 + l.sent + l.FBCF + l.price + rates) ## Regression with formula object1 <- regSGB(Form, data= list(dc, uc, Vc),shape10=4.4) summary(object1) ## Example 3: Usage of I() ## ----------------------- Form2 <- Formula(AB | BC | CD | DE ~ I(l.exp1 + l.exp0) + l.exp1 +l.sent + l.FBCF + l.price + rates ) object2 <- regSGB(Form2,data= list(dc, uc, Vc),shape10=4.4) object2 ## Example 4: Usage of variable transformations on the original file ## ----------------------------------------------------------------- Form3 <- Formula(AB | BC | CD | DE ~ log(expend) + I(PAC*log(expend)) + log(sent) + log(FBCF) + log(price) + rates) object3 <- regSGB(Form3, data=list(carseg, uc, Vc),shape10=4.4) object3 object2[["par"]]-object3[["par"]] # same results ## Example 5: Fixing parameter values ## ---------------------------------- ## 1. In the following regression we condition on shape1 = 2.36. object4 <- regSGB(Form3,data=list(carseg, uc, Vc), shape10 = 4.4, bound = 2.0, ind = 1, shape1 = 2.36) summary(object4) ## 2. In the following regression we condition on shape1 = 2.36 and the coefficient of ## log(FBCF).BC = 0. Notice that it is the 19th parameter. object5 <- regSGB(Form3,data=list(carseg, uc, Vc), shape10 = 4.4, bound = 2.0, ind = c(1,19) , shape1 = 2.36) summary(object5) object3[["AIC"]] object4[["AIC"]] # largest AIC object5[["AIC"]]
dSGB
computes the density for a given argument u
(a composition) and given parameters. rSGB
generates compositions for given parameters.
dSGB(u, shape1, scale, shape2) rSGB(n, shape1, scale, shape2)
dSGB(u, shape1, scale, shape2) rSGB(n, shape1, scale, shape2)
u |
vector of length |
shape1 |
overall shape parameter. shape1 = 1 for a Dirichlet composition. |
scale |
vector of the same length as u containing the scales of parts. If missing, scales are set to 1. |
shape2 |
vector of length |
n |
number of observations. |
The number of columns in u
and the number of components in shape2
must match.
dSGB gives the density, rSGB generates a ( - matrix with random compositions on each row.
u1 <- c(0.2,0.3,0.5) scale1 <- c(0.25,0.33,0.32) shape1 <- 1 shape2 <- c(0.8,3,0.9) dSGB(u1,shape1,scale1,shape2) rSGB(10,shape1,scale1,shape2) ## with equal scales dSGB(u1,shape1,shape2=shape2) rSGB(10,shape1,shape2=shape2)
u1 <- c(0.2,0.3,0.5) scale1 <- c(0.25,0.33,0.32) shape1 <- 1 shape2 <- c(0.8,3,0.9) dSGB(u1,shape1,scale1,shape2) rSGB(10,shape1,scale1,shape2) ## with equal scales dSGB(u1,shape1,shape2=shape2) rSGB(10,shape1,shape2=shape2)
fn.SGB
gives the log-likelihood and gr.SGB
the gradient vector of the log-likelihood.
fn.SGB(x, d, u, V, weight, ...) gr.SGB(x, d, u, V, weight, ...)
fn.SGB(x, d, u, V, weight, ...) gr.SGB(x, d, u, V, weight, ...)
x |
vector of parameters ( |
d |
data matrix of explanatory variables (without constant vector) |
u |
data matrix of compositions (independent variables) |
V |
full rank transformation of log(parts) into log-ratios, matrix |
weight |
vector of length |
... |
others parameters that might be introduced. |
The analytical expression for fn.SGB
is found in the vignette "SGB regression", Section 3.2. More details in Graf(2017).
fn.SGB: value of the log-likelihood at parameter x
gr.SGB: gradient vector at parameter x
.
Graf, M. (2017). A distribution on the simplex of the Generalized Beta type. In J. A. Martin-Fernandez (Ed.), Proceedings CoDaWork 2017, University of Girona (Spain), 71-90.
## Explanatory variable da <- data.frame(l.depth=log(arc[["depth"]])) damat <- as.matrix(da) ## Compositions ua <- as.matrix(arc[,1:3]) ## alr transforms Va <- matrix(c(1,0,-1,0,1,-1),nrow=3) colnames(Va) <- c("alr1","alr2") Va ## Initial values x <- initpar.SGB(damat,ua,Va) fn.SGB(x, damat, ua, Va,weight=rep(1,dim(da)[1])) gr.SGB(x, damat, ua, Va,weight=rep(1,dim(da)[1]))
## Explanatory variable da <- data.frame(l.depth=log(arc[["depth"]])) damat <- as.matrix(da) ## Compositions ua <- as.matrix(arc[,1:3]) ## alr transforms Va <- matrix(c(1,0,-1,0,1,-1),nrow=3) colnames(Va) <- c("alr1","alr2") Va ## Initial values x <- initpar.SGB(damat,ua,Va) fn.SGB(x, damat, ua, Va,weight=rep(1,dim(da)[1])) gr.SGB(x, damat, ua, Va,weight=rep(1,dim(da)[1]))
bval computes the scale for each observed composition from the parameters and auxiliary variables for that observation.
zval computes the z-vector for each observed composition, i.e. the transform that is Dirichlet distributed under the SGB model for the observed composition.
bval(D, x, d, V) zval(u, x, d, V)
bval(D, x, d, V) zval(u, x, d, V)
D |
number of parts |
x |
vector of parameters ( |
d |
|
u |
|
V |
|
See Graf (2017), Equation (8), or the vignette "SGB regression", Equation (1).
transformed composition of length .
Graf, M. (2017). A distribution on the simplex of the Generalized Beta type. In J. A. Martin-Fernandez (Ed.), Proceedings CoDaWork 2017, University of Girona (Spain), 71-90.
## Example with 2 compositions u <- matrix(c(0.2,0.4,0.5,0.5,0.3,0.2),nrow=2,byrow=TRUE) u D <- NCOL(u) # number of parts ## auxiliary variable d <- matrix(c(3.2,4.6),ncol=1) ## log-ratio transformation V <- matrix(c(c(1,-1,0)/sqrt(2),c(1,1,-2)/sqrt(6)),ncol=2) ## vector of parameters: shape1 <- 2.00 coefi <- c(-0.78, 0.06, 0.96, -0.11) shape2 <- c(1.80, 3.10, 4.00) x <-c(shape1, coefi, shape2) bval(D,x,d,V) zval(u,x,d,V)
## Example with 2 compositions u <- matrix(c(0.2,0.4,0.5,0.5,0.3,0.2),nrow=2,byrow=TRUE) u D <- NCOL(u) # number of parts ## auxiliary variable d <- matrix(c(3.2,4.6),ncol=1) ## log-ratio transformation V <- matrix(c(c(1,-1,0)/sqrt(2),c(1,1,-2)/sqrt(6)),ncol=2) ## vector of parameters: shape1 <- 2.00 coefi <- c(-0.78, 0.06, 0.96, -0.11) shape2 <- c(1.80, 3.10, 4.00) x <-c(shape1, coefi, shape2) bval(D,x,d,V) zval(u,x,d,V)
Stepwise elimination of the non significant regression parameters. Possibility to assign a fixed value shape1
to the overall shape parameter.
stepSGB(obj0, d, u, weight = rep(1, dim(d)[1]), shape10 = obj0[["par"]][1], bound = 2.1, shape1 = NULL, Mean2 = TRUE, maxiter = 10, control.optim = list(fnscale = -1), control.outer = list(itmax = 1000, ilack.max = 200, trace = TRUE, kkt2.check = TRUE, method = "BFGS") )
stepSGB(obj0, d, u, weight = rep(1, dim(d)[1]), shape10 = obj0[["par"]][1], bound = 2.1, shape1 = NULL, Mean2 = TRUE, maxiter = 10, control.optim = list(fnscale = -1), control.outer = list(itmax = 1000, ilack.max = 200, trace = TRUE, kkt2.check = TRUE, method = "BFGS") )
obj0 |
object of class regSGB, see |
d |
data matrix of explanatory variables (without constant vector) |
u |
data matrix of compositions (independent variables) |
weight |
vector of length |
shape10 |
positive number, initial value of the overall shape parameter, default obj0[["par"]][1]. |
bound |
inequality constraints on the estimates of shapes: |
shape1 |
fixed value of the overall shape parameter. Default is NULL (no fixed value). |
Mean2 |
logical, if TRUE (default), the initial shape2 parameters are each replaced by their average. See |
maxiter |
maximum number of iterations, i.e. attempts to set a parameter to 0. |
control.optim |
list of control parameters for optim, see |
control.outer |
list of control parameters to be used by the outer loop in constrOptim.nl, see |
This is an experimental procedure for searching a set of non-significant parameters that will be set to zero. The shape parameters are excluded from the elimination procedure. The algorithm starts with obj0
, output of regSGB. The p-values for the regression parameters in summary(obj0)
are taken in decreasing order. The parameter with the largest p-value is set to zero and regSGB
computes the regression with this constraint. If the AIC value is smaller than the AIC in obj0
, the parameter with the next largest p-value in obj0
is set to zero and the regression with the two constraints is computed. The process iterates until either a larger AIC is found or maxiter
is attained.
The initial value of the overall shape parameter is set to the estimated value in the full model obj0
. The other initial values are computed as in regSGB
.
There is the possibility to fix the value of the overvall shape parameter, if shape1
is given a positive number (default NULL, no fixed value).
If regSGB
was called without Formula
, the data-frame with auxiliary variables for stepSGB
follows the same rules as for the initial regSGB object, see Example 1 in regSGB
.
A list of class 'stepSGB' with the following 5 components:
reg |
A list with the following components: |
Formula |
The original formula, or NULL |
iter |
Value of k, the last iteration. |
tab |
Data frame with |
call |
Arguments for calling |
vignette("SGB regression", package = "SGB")
data(carseg) ## Extract the compositions uc <- as.matrix(carseg[,(1:5)]) ## Initial regression data(ocar) step_ocar <- stepSGB(ocar, carseg, uc, bound=2.1, control.outer=list(trace=FALSE)) summary(step_ocar[["reg"]][["full"]]) summary(step_ocar[["reg"]][["iter4"]]) step_ocar[["tab"]]
data(carseg) ## Extract the compositions uc <- as.matrix(carseg[,(1:5)]) ## Initial regression data(ocar) step_ocar <- stepSGB(ocar, carseg, uc, bound=2.1, control.outer=list(trace=FALSE)) summary(step_ocar[["reg"]][["full"]]) summary(step_ocar[["reg"]][["iter4"]]) step_ocar[["tab"]]
The expectation and mode in the log-ratio space, transformed back to the simplex.
MeanA.SGB(shape1, scale, shape2) ModeA.SGB(shape1, scale, shape2) MeanAobj.SGB(obj) ModeAobj.SGB(obj)
MeanA.SGB(shape1, scale, shape2) ModeA.SGB(shape1, scale, shape2) MeanAobj.SGB(obj) ModeAobj.SGB(obj)
shape1 |
overall shape parameter. |
scale |
vector of length |
shape2 |
vector of length |
obj |
list, result of regSGB. See |
MeanA
, ModeA
compute Aitchison expectation and mode in function of the SGB distribution parameters, whereas MeanAobj
, ModeAobj
compute Aitchison expectation and mode in function of the model variables in an SGB regression object.
A matrix or vector of dimensions .
Each row gives the Aitchison expectation for compositions having the corresponding set of auxiliary variables.
Aitchison, J. (1986). The Statistical Analysis of Compositional Data. Monographs on Statistics and Applied Probability. Chapman and Hall Ltd (reprinted 2003 with additional material by the Blackburn Press, London (UK).
oilr
.
set.seed(1234) x <- c(2,rnorm(4,0,1),1.8,3.1,4.0) d <- c(3.2,4.6) V <- t(matrix(c(1/sqrt(2),-1/sqrt(2),0, 1/sqrt(6),1/sqrt(6),-2/sqrt(6)), nrow=2,byrow=TRUE)) D <- 3 shape1 <- x[1] scale <- bval(D,x,d,V) shape2 <- x[(length(x)-D+1):length(x)] # Expectation MeanA.SGB(shape1,scale,shape2) # Mode ModeA.SGB(shape1, scale, shape2) ## Arctic lake data # oilr is a SGB regression object data(oilr) MeanAobj.SGB(oilr) # is the same as oilr[["meanA"]] ModeAobj.SGB(oilr)
set.seed(1234) x <- c(2,rnorm(4,0,1),1.8,3.1,4.0) d <- c(3.2,4.6) V <- t(matrix(c(1/sqrt(2),-1/sqrt(2),0, 1/sqrt(6),1/sqrt(6),-2/sqrt(6)), nrow=2,byrow=TRUE)) D <- 3 shape1 <- x[1] scale <- bval(D,x,d,V) shape2 <- x[(length(x)-D+1):length(x)] # Expectation MeanA.SGB(shape1,scale,shape2) # Mode ModeA.SGB(shape1, scale, shape2) ## Arctic lake data # oilr is a SGB regression object data(oilr) MeanAobj.SGB(oilr) # is the same as oilr[["meanA"]] ModeAobj.SGB(oilr)
table.regSGB: Value of the log-likelihood, number of parameters, AIC criterion, optimality tests and iterations counts.
coefmat: regression coefficients in matrix form with significance level.
table.regSGB(object) coefmat(object,digits=3)
table.regSGB(object) coefmat(object,digits=3)
object |
an object of class regSGB |
digits |
number of decimal places for the coefficients |
table.regSGB
: Data frame with one column, with the overall statistics results.
value |
the maximum log-likelihood |
n.par |
the number of parameters |
n.par.fixed |
the number of fixed parameters |
AIC |
the AIC criterion |
Rsquare |
total variance of estimated over total variance of observed compositions |
convergence |
the convergence code (0: converged, others, see |
kkt1 |
the first Karush-Kuhn-Tucker conditions (1=TRUE, 0=FALSE), see |
kkt2 |
the second Karush-Kuhn-Tucker conditions (1=TRUE, 0=FALSE), see |
counts.function |
number of times the log-likelihood was evaluated. |
counts.gradient |
number of times the gradient was evaluated. |
coefmat
: character matrix with the regression coefficients arranged in columns, one for each log-ratio transform. Each ceofficient is followed by the significance level.
## Overall model statistics table.regSGB(oilr) ## print(coefmat(oilr),quote=FALSE) ## it is a subset of summary(oilr)
## Overall model statistics table.regSGB(oilr) ## print(coefmat(oilr),quote=FALSE) ## it is a subset of summary(oilr)