I suspect there are many blogs about R and why it’s so great:

- It’s free!
- It’s open source!
- There’s a great community!
- It’s reproducible!

You can certainly read on Twitter (#rstats) about what R cando *for
you* but what about what R *does to*
you, particularly as an analyst in the NHS or social care?

**The learning bit…**

Back in April 2018 when NHS-R Community was running its first introduction to R course in Leeds, my only knowledge of it had come from a free online base R course with edX that I hadn’t really understood and never finished. Online learning for me is like being back at school. I get the right answer, feel good, move on and promptly forget it all. After the NHS-R course I dabbled with the dplyr package, tried to run lots of other people’s scripts and generally failed a lot. It was a frustrating time of slow delivery of tasks and bafflement. When things did work, I had no idea why and I often wondered what all the fuss was about particularly as I could do the same things in familiar programmes.

Hindsight is a wonderful thing and I can now see my frustrations weren’t just one lump of confusion, but could be split down into how I used these ‘familiar programmes’, namely:

- SQL for data engineering and
- Excel for visualisations.

Although I still used (and use) SQL to get my data, I was copying it to Excel and then loading it into R; once loaded I’d then realise I needed to group by and count or remove something I didn’t need and it seemed too long-winded going back to SQL, copying to Excel and then loading it.

The second frustration of visualising in R came with the
inability to replicate the look of the Excel charts in R: getting the same
colours, the same font size headers and so on. I’ve yet to resolve that
completely but it was here that I realised the worth of R wasn’t in making it
look like Excel, but rather that it could do so much more than Excel. I needed
to start thinking about *what *I should
be visualising and *how* to do it.

**Sharing
methodology**

Over my time in the NHS I have learned to be cautious –
protective even – of data. But that has led to the misguided fear of sharing
technical knowledge which was never a conscious thing, just that’s how it is. However,
R has a reputation of *sharing* which
has resulted in an *expectation* of
sharing. And that isn’t just within your own team or organisation, it’s so much
wider – it can even be the world.

As an example of why it’s harder to share Excel methodology, I’ve previously built a benchmarking Excel spreadsheet using MATCH and INDEX so that the bar charts automatically coloured the organisation I worked for and ordered the bars in graphs from the greatest to the smallest. It was one of those tasks that took a lot of effort to automate, looked simple when done but was heavily dependent on the data being in just the right cell or it would break.

Just updating it with a new year’s data would take great care so the thought of writing out the methodology to share never occurred to me. Writing it out would involve describing the positions of data, what the formulae did and how bits all linked. That’s a laborious task which is not necessary if you don’t plan to share – and as there was no need to share, I didn’t.

**It’s all about the
data**

The entire process,
from SQL to Excel, is about the data, e.g. how it joins and what it’s counting.
To get the data ‘just so’, it often requires so many clever solutions to so
many problems that, as I now realise, it consumes so much **thinking** time that there’s often little energy left for considering
why I am doing this – is it the best thing to do and how can I get more meaning
from the data?

If I’d picked up someone else’s script or Excel document on ordering benchmarking data, perhaps the time I spend would be on improving it instead of building it. In a perfect world, I would feed back on what I’d done and share any improvements or corrections.

**But what can R do that SQL and Excel can’t?**

As a very simple example, consider creating a median average.

In Excel it’s reasonably easy to use the formula MEDIAN() but to make it dynamic (such that it automatically updates if you add more data), the formula becomes much more complicated. Here is a page explaining how it’s done:

https://www.extendoffice.com/documents/excel/2611-excel-average-dynamic-range.html

There are lots of graphics are used to describe how to do it and you’ll note this is for AVERAGE which is mean rather than median.

In SQL, creating the MEDIAN can be solved various ways:

https://stackoverflow.com/questions/1342898/function-to-calculate-median-in-sql-server

There are 204 examples to go through to solve this! I didn’t go through it as that’s too much needless thinking required when R can do this…

median(table$column)

Something this
simple removes all that otherwise inevitable necessary thinking to figure out
the best way to get the median… and then having to check that it is right.
Although that may be easy enough in Excel, I know I make mistakes and will have
to repeat the exercise more times than I care to admit, and doing so in Excel
will involve so many steps that each require checking. All of this uses up that
precious resource of focused thinking. With R doing the job so quickly and
reliably I now have time to consider if median is actually the right thing to
use, particularly if it’s a case of “we’ve always done that”. Then I can ponder
on what is it telling me; is it increasing/decreasing and why is that? Is there
any other data that could help explain this change? ** **

Like any great piece of analysis or coding, time invested at the beginning pays off. With the benchmarking example of reordered bar charts, spending a couple of days getting it right made something interactive that was engaging for the users. But Excel continues to require thinking and time checking whereas R doesn’t; once it’s coded, that’s it. And that’s where the analyst should be, that space after the work is done. That’s where you can think about the very analysis itself; was it enough? What does it tell you? What else can you look at? Although I’ve had analyst in my job title for a long time, this is what analysis is all about and it’s something, I now realise, that I’ve not been doing because my “thinking time” has been used up elsewhere.

** This blog was written Zoë Turner, Senior Information Analyst at Nottinghamshire Healthcare NHS Trust.**

**@Letxuga007 and @AppliedInfoNott**