Tweak printCoefmat() - better Round()ing / Zapping Zeros

I/O

Wishlist

In rare cases, the rounding, e.g. of the ‘Std. Error’. column in printCoefmat() is undesirably rounding too much and then depicting inaccurate values. This is from the recent R-devel mailing list thread starting at https://stat.ethz.ch/pipermail/r-help/2023-July/477688.html

Author

Martin Maechler

Background

Printing of e.g. lm() results; but also glm(), anova(), etc; not just package {stats} but used by other packages as well, e.g., lme4.

Problem statement

Slightly modified from Shu Fai Cheung’s example on the R-devel mailing list:

set.seed(123)
n <- 1000
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 0.000754 + .5 * x1 + .6 * x2 + .036*rnorm(n)
dat <- data.frame(x1, x2, y)
fm <- lm(y ~ x1 + x2, dat)
##    ---
sfm <- summary(fm)
print(sfm)


Call:
lm(formula = y ~ x1 + x2, data = dat)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.102097 -0.022598 -0.001331  0.023538  0.121632 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 3.761e-07  1.115e-03     0.0        1    
x1          4.992e-01  1.128e-03   442.4   <2e-16 ***
x2          6.010e-01  1.108e-03   542.3   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.03524 on 997 degrees of freedom
Multiple R-squared:  0.9981,    Adjusted R-squared:  0.9981 
F-statistic: 2.677e+05 on 2 and 997 DF,  p-value: < 2.2e-16

The standard print is fine, but we would like the intercept to be “zapped to zero”. So we try zap.ind = 1 (which is passed to printCoefmat by print.summary.lm):

print(sfm, zap.ind = 1)


Call:
lm(formula = y ~ x1 + x2, data = dat)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.102097 -0.022598 -0.001331  0.023538  0.121632 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.000000   0.001115     0.0        1    
x1          0.499200   0.001128   442.4   <2e-16 ***
x2          0.601000   0.001108   542.3   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.03524 on 997 degrees of freedom
Multiple R-squared:  0.9981,    Adjusted R-squared:  0.9981 
F-statistic: 2.677e+05 on 2 and 997 DF,  p-value: < 2.2e-16

Here, we’d rather like to see correctly rounded coefficients

            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.000000   0.001115     0.0        1    
x1          0.499263   0.001128   442.4   <2e-16 ***
x2          0.600990   0.001108   542.3   <2e-16 ***

Proposed solution

Improve printCoefmat().

The use of zapsmall() there, can round too much in these cases.

One option is to extend the meaning of the zap.ind argument and allow negative indices to mean that the zapping should not be done with zapsmall() but with an alternative function (to be defined inside printCoefmat()):

zapOnlysmall <- function(x, dig) {
    x[abs(x) <= 10^-dig] <- 0
    x
}

I had tried the following patch

--- anova.R (revision 84998)
+++ anova.R (working copy)
@@ -105,7 +105,13 @@
 
     ok <- !(ina <- is.na(xm))
     ## zap before deciding any formats
-    for (i in zap.ind) xm[, i] <- zapsmall(xm[, i], digits)
+    zapOnlysmall <- function(x, dig) { x[abs(x) <= 10^-dig] <- 0 ; x }
+    for (i in zap.ind) { ai <- abs(i) # i < 0 [zapOnly..]  vs  i > 0 [zapsmall]:
+        xm[, ai] <- (if(i < 0) zapOnlysmall else zapsmall)(xm[, ai], digits)
+        ## problem: if one of these is a cs.ind (below) and is of larger order of magn., it is
+        ## -------  rounded too much here --> shows trailing 00 afterwards.
+   ## ==> TODO: detect and *fix* rather than needing negative zap.ind for zapOnlysmall
+    }

which “solved” the problem, but actually also changes other examples (not shown here) where a change is not necessary.

Also, see my “TODO” above, I think we could try to make this automatic, so the zap.ind = 1 example above would just do “the right thing”.

Testing

This is not yet a strict test, but code you can/should use this code to get a broader picture. Note that this produces a few pages of R console output :

cf <- coef(sfm)

for(dig in c(3:5, 7, 10)) {
    cat("\ndigits = ", dig,":\n================\n")
    for(f in 10^c(-6, -3:3, 6, 9)) {
        cat("cf[,1:2] := cf0[,1:2] * f, f=", formatC(f),":\n")
        fcf <- cf; fcf[,1:2] <- f*cf[,1:2]
        printCoefmat(fcf, signif.legend=FALSE, digits=dig) # using e-07 for intercept --> everything in sci.format
        cat(" ... and with zap.ind = 1:\n")
        printCoefmat(fcf, signif.legend=FALSE, digits=dig, zap.ind = 1)
        cat("-------------------------\n\n")
    }
}

Project requirements

Almost none. Some understanding about floating point arithmetic, notably rounding.

Project outcomes

A patch to the R source. Code changes only in the first file (anova.R) but a really useful patch also updates the documentation (*.Rd) notably adding an example there.

As I have seen (with my own first fix), that it may change output of other examples, and for some of these, the tests/Examples/<pkg>-Ex.Rout.save files also need patching ( tests/Examples/stats-Ex.Rout.save will need an update in any case if we modify/add examples on the help page). Files to change:

doc/NEWS.Rd
src/library/stats/R/anova.R
src/library/stats/man/printCoefmat.Rd
tests/Examples/{stats,graphics,datasets}-Ex.Rout.save