Promoting the use of R in the NHS

Blog Article

Hello. My name is Julian and I am an R addict. I got hooked about 3 years ago when I took on a new role in Public Health England developing a public health data science team. My professional background is as a doctor and Consultant in Public Health and have spent the last 15 years in the health intelligence field so I thought I knew something about data and analysis. I realised I didn’t know anything about data science so I decided to do a course and ended up doing the Coursera data science MOOC from Johns Hopkins (https://www.coursera.org/specializations/jhu-data-science) because it was health related. For the course, you need to learn R – and so my habit started. (It turned out I knew nothing about data and analysis as well).

I had done an R course 15 years ago but never used it. Any analysis I did used spreadsheets, SPSS, Mapinfo and host of other tools, and I had never written a single line of code until 3 years ago (apart some very basic SQL). That’s all changed.

Apart from a brief obsession with Tableau a few years ago (which I still love), learning R has for me, been utterly transformational. Now my basic analytical workflow is R + Google (for getting answers when you are stuck) + Git (for sharing and storing code) + Mendeley (reference management software). That’s it.

I barely open Excel except to look at data structure so I can import data into R; I don’t use GIS; I hardly even open word to write a document – I do that in R (like this blog); and recently the option to output to power point has appeared in R Markdown so I’ve started using that as well.

On top of that I have learned a whole heap of analytical and other skills through using R. I feel comfortable getting and analysing data of any size, shape and complexity including text, websites, APIs, very large datasets; and quickly. I can now rapidly produce scientific reports, scrape websites, mine text, automate analysis, build machine learning pipelines, create high quality graphics using the fab ggplot2 and its relatives, have co-authored a package to read Fingertips data (fingertipsR – very proud of this) and am getting my head around regular expressions. I have even managed a couple of Shiny documents. There is nothing I have wanted to do that I can’t do in R; and a huge range of things I didn’t know you could do or had never heard of.

So what is it about R that makes it so great? In the last 5 years it has moved from an academic stats package to a professional data science tool. One of the reasons is the development of the tidy data framework [1] and tools to make data wrangling or munging much easier. This is a much overlooked part of the analysts life – all the things you need to do with data before you can analyse it (50 – 70% of the process) has been paid serious attention and made much easier with packages like dplyr and tidyr. And a lot of attention has been made to making coding more logical and syntax more “English”. Another reason is the development of R Studio and R Markdown which give you button press outputs in a range of high quality formats. And there is a focus on reproducibility – the ability for analysis to be repeated exactly which requires combining data, analysis and results in a form others can follow. This is good science and will become much more widespread. You can do this in R and Git.

My addiction has infected my team and the analytical community in PHE. We are spreading R rapidly and writing packages to automate routine analysis and reporting. We routinely use Gitlab to share and collaborate on code, and are introducing software development ideas like code review and unit testing. In short we are trying to help analysts (if they want to) become analyst-developers.

There are downsides to R of course. There is a (big) learning curve, ICT get twitchy, there is a huge range of packages and any number of ways of doing things, and things often break. But as any addict would say, these are just obstacles to be overcome and there is a lot of support out there.

R is not the only direction of travel – we do use PowerBI (running R scripts), and we do a bit of development in Python, but one thing is certain – I can’t go back to pre R days.

So there’s my confession. I’m a data junkie and an R addict. If you want to see my descent I put stuff on an RPubs page from time to time and I have a Github page. If you want to help me – feel free to get into touch or send me a pull request.

Thanks to Seb Fox at PHE and David Whiting of Medway Council for inspiration and support.

References

1 Wickham H. 2014;59:1–23. doi:10.18637/jss.v059.i10

Leave a Reply