Penalized Variance Components

Overview of the vcpen package for Penalized Variance Components

A penalized likelihood model is used to estimate variance components with an elastic-net penalty function that applies both L1 and L2 penalties to the variance components, using the function vcpen(). Each variance component multiplies a kernel matrix, and we provide the function kernel_linear() to compute linear kernel matrices, but users are welcome to use their own functions to compute kernel matrices.

The function vcpen() allows the user to provide intitial starting values for the variance components. If no initial values are provided, the default is to use our funcion minque() to calculate initial values. For linear mixed models, MINQUE is the first iteration of restricted maximum likeihood estimation (REML), and iterative updates of MINQUE converge to REML estimation.

require(vcpen)

Preparing to run vcpen

Sample dataset

Below provides snapshots of an example dataset. The response is the outcome variable, covmat is a matrix of adjusting covariates, and dose is a matrix of the dose of a minor allele for SNPs (dose values of 0, 1, 2). The doseinfo illustrates how the SNPs (columns of dose) map into groups, for creating kernel matrices for each group. A kernel matrix for n subjects is an nxn matrix that measures similarity of the dose values for each pair of subjects.

data(vcexample)
ls()
[1] "covmat"   "dose"     "doseinfo" "response"
head(dose)
  snp1 snp2 snp3 snp4 snp5 snp6 snp7 snp8 snp9 snp10 snp11 snp12 snp13 snp14
1    0    0    0    0    1    1    1    0    0     1     1     0     0     1
2    1    0    0    0    0    1    0    0    0     0     2     0     0     1
3    0    0    0    0    1    0    0    0    0     1     0     0     0     0
4    1    0    0    0    1    0    0    1    1     0     0     0     0     0
5    0    0    0    0    0    1    0    0    0     1     0     0     1     0
6    0    0    0    0    1    0    0    0    0     0     1     0     0     0
  snp15 snp16 snp17 snp18 snp19 snp20 snp21 snp22 snp23 snp24 snp25 snp26 snp27
1     1     0     1     1     0     1     2     0     0     0     0     1     0
2     2     1     0     1     0     1     0     0     0     0     0     0     0
3     0     0     0     0     0     0     0     1     1     1     0     1     0
4     0     1     0     0     1     1     0     0     0     0     0     0     0
5     0     0     0     0     0     0     1     0     0     1     0     1     0
6     1     0     0     0     0     0     0     1     0     0     1     0     0
  snp28 snp29 snp30 snp31 snp32 snp33 snp34 snp35 snp36 snp37 snp38 snp39 snp40
1     0     0     0     0     0     1     0     0     0     0     0     0     1
2     0     0     1     0     0     0     1     0     0     0     0     1     0
3     0     1     0     0     0     0     0     0     0     0     1     1     0
4     1     1     0     0     0     0     0     0     0     0     0     1     0
5     0     0     0     0     0     1     1     0     0     0     0     0     1
6     0     0     0     0     0     0     0     1     0     0     0     0     0
  snp41 snp42 snp43 snp44 snp45 snp46 snp47 snp48 snp49 snp50 snp51 snp52 snp53
1     1     0     1     0     1     1     0     1     1     0     0     0     0
2     0     1     0     1     0     1     0     1     0     0     0     0     0
3     0     1     0     0     0     1     1     0     0     0     0     0     0
4     0     0     0     0     0     1     0     0     0     0     0     1     0
5     0     1     0     1     0     0     0     1     0     1     1     1     0
6     0     1     0     0     1     0     1     0     0     0     0     2     0
  snp54 snp55 snp56 snp57 snp58 snp59 snp60 snp61 snp62 snp63 snp64 snp65 snp66
1     0     0     1     0     0     0     0     0     0     0     0     0     0
2     0     0     0     0     0     1     0     0     1     0     0     1     0
3     0     0     1     0     0     0     0     0     1     0     0     0     0
4     0     0     0     0     1     0     1     0     1     0     0     1     0
5     0     0     0     0     0     0     0     0     1     0     0     0     0
6     1     0     1     0     1     0     0     0     0     2     0     1     0
  snp67 snp68 snp69 snp70
1     0     0     0     0
2     0     1     0     0
3     1     0     0     0
4     0     0     0     0
5     0     0     0     0
6     0     0     0     0
head(doseinfo)
  index vcname
1     1    vc1
2     1    vc1
3     1    vc1
4     1    vc1
5     1    vc1
6     1    vc1
response[1:10]
 [1]  0.54166685  0.07233516 -1.03718603 -1.16407584  1.17964672  0.14994286
 [7]  1.82625006 -2.83793412 -0.18631904  1.53431587

Make kernel matrices

The example below illustrates how to loop over groups (indicated by doseinfo) to create linear kernel matrices for each group. Note that the number of variance components is the number of groups plus 1, because the last group is for the residual variance component, which will have a kernel matrix that is the identity matrix.

nvc <- 1+length(unique(doseinfo[,2]))
id <- 1:nrow(dose)

## vcs for genetic kernel matrices
Kerns <- vector("list", length=nvc)
for(i in 1:(nvc-1)){
  ## below uses kernel_linear, but users can replace this with their choice of function to 
  ## create other types of kernel matrices.
  Kerns[[i]] <- kernel_linear(dose[,grep(i, doseinfo[,2])])
  rownames(Kerns[[i]]) <- id
  colnames(Kerns[[i]]) <- id  
}
## vc for residual variance requires identity matrix
Kerns[[nvc]] <- diag(nrow(dose))
rownames(Kerns[[nvc]]) <- id
colnames(Kerns[[nvc]]) <- id

Penalized estimation of VCs

Default settings.

Run with default settings, which uses minque() to estimate initial values for variance components and default frac1=0.8.

fit  <- vcpen(response, covmat, Kerns)
summary(fit)
vcpen object
  N-subjects = 100
  N-VC = 5

  Model fits over lambda penalty grid:

   lambda iter   logl loglpen   bic min_bic
1    0.10   49 -155.2   160.5 333.4   FALSE
2    0.09   30 -154.7   159.9 332.4   FALSE
3    0.08   24 -154.3   159.3 331.5   FALSE
4    0.07   20 -153.9   158.7 330.8   FALSE
5    0.06   18 -153.6   158.0 330.2   FALSE
6    0.05   16 -153.3   157.2 329.7   FALSE
7    0.04   15 -153.1   156.4 329.3   FALSE
8    0.03   14 -152.9   155.6 328.9   FALSE
9    0.02   13 -152.8   154.7 328.6   FALSE
10   0.01   13 -152.7   153.7 328.5    TRUE
11   0.00  110 -152.7   152.7 333.0   FALSE

  VC estimates by lambda penalties:

   lambda     vc1    vc2       vc3       vc4    vc5
1    0.10 0.06318 0.1301 0.000e+00 0.000e+00 1.0854
2    0.09 0.07712 0.1535 0.000e+00 0.000e+00 1.0523
3    0.08 0.09139 0.1779 3.171e-20 0.000e+00 1.0231
4    0.07 0.10522 0.2020 0.000e+00 0.000e+00 0.9996
5    0.06 0.11933 0.2269 0.000e+00 1.337e-36 0.9796
6    0.05 0.13405 0.2530 0.000e+00 0.000e+00 0.9622
7    0.04 0.14992 0.2813 0.000e+00 4.089e-21 0.9466
8    0.03 0.16730 0.3123 0.000e+00 0.000e+00 0.9326
9    0.02 0.18676 0.3472 0.000e+00 0.000e+00 0.9200
10   0.01 0.20925 0.3875 2.605e-22 2.571e-22 0.9082
11   0.00 0.23670 0.4372 2.420e-13 5.713e-03 0.8910

Estimates with min BIC:
beta:
   x0    x1 
0.628 1.097 
VC estimates:
      vc1    vc2       vc3       vc4    vc5
10 0.2092 0.3875 2.605e-22 2.571e-22 0.9082

Changing penalty fraction:

Perform the same run as above, but with lower penalty fraction.

fit.frac1  <- vcpen(response, covmat, Kerns, frac1 = .1)
summary(fit.frac1)
vcpen object
  N-subjects = 100
  N-VC = 5

  Model fits over lambda penalty grid:

   lambda iter   logl loglpen   bic min_bic
1    0.10   25 -154.2   157.4 331.5   FALSE
2    0.09   14 -154.1   157.1 331.1   FALSE
3    0.08   15 -153.8   156.7 330.7   FALSE
4    0.07   14 -153.6   156.3 330.3   FALSE
5    0.06   14 -153.4   155.9 329.9   FALSE
6    0.05   13 -153.3   155.5 329.5   FALSE
7    0.04   13 -153.1   155.0 329.2   FALSE
8    0.03   12 -152.9   154.5 328.9   FALSE
9    0.02   12 -152.8   154.0 328.7   FALSE
10   0.01   12 -152.7   153.4 328.5    TRUE
11   0.00  110 -152.7   152.7 333.0   FALSE

  VC estimates by lambda penalties:

   lambda     vc1    vc2       vc3       vc4    vc5
1    0.10 0.09851 0.1707 0.000e+00 0.000e+00 1.0213
2    0.09 0.10547 0.1832 0.000e+00 0.000e+00 1.0101
3    0.08 0.11416 0.1988 0.000e+00 0.000e+00 0.9959
4    0.07 0.12345 0.2158 0.000e+00 0.000e+00 0.9825
5    0.06 0.13371 0.2347 1.367e-22 0.000e+00 0.9694
6    0.05 0.14496 0.2556 0.000e+00 9.499e-23 0.9569
7    0.04 0.15772 0.2797 0.000e+00 0.000e+00 0.9445
8    0.03 0.17218 0.3073 2.643e-11 0.000e+00 0.9326
9    0.02 0.18923 0.3406 0.000e+00 0.000e+00 0.9208
10   0.01 0.20977 0.3815 0.000e+00 0.000e+00 0.9091
11   0.00 0.23670 0.4372 2.420e-13 5.713e-03 0.8910

Estimates with min BIC:
beta:
   x0    x1 
0.628 1.097 
VC estimates:
      vc1    vc2 vc3 vc4    vc5
10 0.2098 0.3815   0   0 0.9091

Demo of using minque() outside of vcpen()

This demonstrates how users can use minque() as a general approach to approximate REML variance components. Increasing n.iter will cause the resulting variance components to be closer to the fully interative REML estimates.

vcinit <- minque(response, covmat, Kerns, n.iter=2)
names(vcinit)
[1] "vc"        "beta"      "residuals"
vcinit$beta
          [,1]
[1,] 0.6279622
[2,] 1.0961475
vcinit$vc
          [,1]
[1,] 0.2279593
[2,] 0.4254118
[3,] 0.0000000
[4,] 0.0000000
[5,] 1.0161211

References

Schaid DJ, Sinnwell JP, Larson NB, Chen J (2020). Penalized Variance Components for Association of Multiple Genes with Traits. Genet Epidemiol, To Appear.