Promoting the use of R in the NHS

### Daniel Weiand – Consultant Medical Microbiologist – Newcastle upon Tyne NHS Foundation Trust

Hello!

This is my first blog post for the NHS R Community, which I stumbled across in the course of my work as a consultant medical microbiologist at Newcastle upon Tyne Hospitals NHS Foundation Trust.

At work, I’ve been trying to use R to create column charts featuring 95% confidence intervals. I approached the friendly people on the NHS R Community’s Slack channel for further information and guidance.

I must add here that the Community’s Slack channel has been extremely helpful to me, as a novice R user, in solving some of the issues I’ve experienced, and highlighting R packages of potential interest. This is the first time I’ve tried to create a ReprEx and now I understand why people love (?) the mtcars database as much as they do!

### Step 1: Calculate some summary statistics

I wanted to calculate some summary statistics, including the mean, and standard error or 95% confidence intervals.

Initially I came across the summary() function of Base R, which is helpful as it calculates the Min., 1st Qu., Median, Mean, 3rd Qu., and Max.

However, the summary() function of {base} R does not calculate either the standard error or the 95% confidence intervals

``````#calculate summary statistics for all numeric data using summary() and where(is.numeric())
mtcars %>%
select(where(is.numeric)) %>%
summary()

#calculate summary statistics for mpg using summary() and where(is.numberic())
mtcars %>%
select(mpg) %>%
summary()

``````

Then zx8754 very kindly pointed me towards a method for calculating the standard error on StackOverflow: https://stackoverflow.com/q/2676554/680068

``````#create stderr function
stderr <- function(x, na.rm=TRUE) {
if (na.rm) x <- na.omit(x)
sqrt(var(x)/length(x))
}
``````

Then I used this function to calculate summary statistics, incl. mean and standard error, using the summarise() and across() functions of {dplyer}

``````#calculate summary statistics using summarise() and across() and n/mean/min/median/max/sd/stderr
# stderr <- function(x, na.rm=TRUE) {
#  if (na.rm) x <- na.omit(x)
#  sqrt(var(x)/length(x))
# }
mtcars %>%
group_by(cyl) %>%
mutate(
across(mpg,
list(
n = ~ n(),
mean = ~ mean(.x, na.rm = TRUE),
min = ~ min(.x, na.rm = TRUE),
median = ~ median(.x, na.rm = TRUE),
max = ~ max(.x, na.rm = TRUE),
sd = ~ sd(.x, na.rm = TRUE),
stderr = ~ stderr(.x)),
.names = NULL)) %>%
select(starts_with("mpg")) %>%
summarise(mean = mean(mpg_mean),
min = mean(mpg_min),
median = mean(mpg_median),
max = mean(mpg_max),
sd = mean(mpg_sd),
stderr = mean(mpg_stderr)) %>%
#create column chart with error bars (using stderr)
ggplot(aes(cyl, mean))+
geom_col(na.rm = TRUE)+
geom_errorbar(aes(ymin = mean-stderr, ymax = mean+stderr), position = "dodge", width = 0.25)
#calculate summary statistics using summarise() and across() and n/mean/min/median/max/sd/stderr
# stderr <- function(x, na.rm=TRUE) {
#  if (na.rm) x <- na.omit(x)
#  sqrt(var(x)/length(x))
# }
mtcars %>%
group_by(cyl) %>%
mutate(
across(mpg,
list(
n = ~ n(),
mean = ~ mean(.x, na.rm = TRUE),
min = ~ min(.x, na.rm = TRUE),
median = ~ median(.x, na.rm = TRUE),
max = ~ max(.x, na.rm = TRUE),
sd = ~ sd(.x, na.rm = TRUE),
stderr = ~ stderr(.x)),
.names = NULL)) %>%
select(starts_with("mpg")) %>%
summarise(mean = mean(mpg_mean),
min = mean(mpg_min),
median = mean(mpg_median),
max = mean(mpg_max),
sd = mean(mpg_sd),
stderr = mean(mpg_stderr)) %>%
#create column chart with error bars (using stderr)
ggplot(aes(cyl, mean))+
geom_col(na.rm = TRUE)+
geom_errorbar(aes(ymin = mean-stderr, ymax = mean+stderr), position = "dodge", width = 0.25)
``````

### Step 2: Create column charts with error bars (using 95% confidence intervals)

Then Seb Fox pointed me towards a method for calculating 95% confidence intervals using the {PHEindicatormethods} package, available on CRAN: https://cran.r-project.org/web/packages/PHEindicatormethods/index.html

``````#create MEAN column chart with error bars (using 95% confidence intervals)
require(PHEindicatormethods)
mtcars %>%
filter(!is.na(cyl)) %>%
group_by(cyl) %>%
#use phe_mean()
phe_mean(x = mpg, #field name from data containing the values to calculate the means for
type = "full", #defines the data and metadata columns to be included in output; can be "value", "lower", "upper", "standard" (for all data) or "full" (for all data and metadata); quoted string; default = "full"
confidence = 0.95) %>% #required level of confidence expressed as a number between 0.9 and 1
#create column chart with error bars (using 95% CI calculated using phe_mean())
ggplot(aes(cyl, value))+
geom_col(na.rm = TRUE)+
geom_errorbar(aes(ymin = lowercl, ymax = uppercl), position = "dodge", width = 0.25)

#create PROPORTION column chart with error bars (using 95% confidence intervals)
require(PHEindicatormethods)
mtcars %>%
group_by(cyl) %>%
summarise(n = n(),
sum = sum(n)) %>%
mutate(sum = sum(n)) %>%
#phe_proportion()
phe_proportion(x = n, #numerator
n = sum, #denominator
type = "full", #defines the data and metadata columns to be included in output; can be "value", "lower", "upper", "standard" (for all data) or "full" (for all data and metadata); quoted string; default = "full"
confidence = 0.95, #required level of confidence expressed as a number between 0.9 and 1
multiplier = 100) %>%   #the multiplier used to express the final values (eg 100 = percentage); numeric; default 1
#create column chart with error bars (using 95% CI calculated using phe_proportion())
ggplot(aes(cyl, value))+
geom_col(na.rm = TRUE)+
geom_errorbar(aes(ymin = lowercl, ymax = uppercl), position = "dodge", width = 0.25)

``````

I hope that the code, above, helps a few colleagues of mine across the NHS, in some small way.

Thank you, again, to all members of the NHS R Community, for all your help. Particular thanks go to everyone who has helped me, to date, on the NHS R Community’s Slack channel.

### Daniel Weiand

(Consultant medical microbiologist)

Newcastle Upon Tyne Hospitals NHS Foundation Trust

### 3 Responses to “Using R to create column charts featuring 95% confidence intervals”

• Chuck Burks

I used to use a function like that, then I realized that I could get a function for standard error through the FSA package, FSA::se(). Arguments can be made either way; why reinvent the wheel vs. why load a large package to avoid writing one function.

• Qin Zeng

This works out about the same:

mtcars %>%
group_by(cyl) %>%
summarise_at(vars(mpg), funs(n(), mean, min, median, max, sd, stderr)) %>%
ggplot(aes(cyl, mean))+
geom_col(na.rm = TRUE)+
geom_errorbar(aes(ymin = mean-stderr, ymax = mean+stderr), position = “dodge”, width = 0.25)

• Stephen

Two suggestions:

mtcars %>%
group_by(cyl) %>%
summarise(n = n(),
sum = sum(n)) %>%
mutate(sum = sum(n))

is more easily said as

mtcars %>%
count(cyl) %>%
mutate(sum = sum(n))

Qin, your sequence is better as

mtcars %>%
group_by(cyl) %>%
summarise(n(), across(mpg, c(mean, min, median, max, sd, stdeQi,

in modern dplyr (summarise_at is deprecated)