AphA May2023 Blog

First published for the AphA May 2023 newsletter:

Conferences update

The news update on the upcoming conferences that NHS-R Community is involved with:

HACA23 11-12 July

The inaugural hybrid Health and Care Analytics Conference HACA23 is fast approaching and has received over 150 abstracts. Tickets are free and have all gone now for the in-person but you can sign up to the waiting list and there are plenty of virtual tickets. Follow the Twitter account HACA_Conf for more information.

NHSR Community Conference 17-18 October

The hybrid NHS-R Conference is in its 6th year and the call for abstracts is out now. Talks can be in R and Python, both or with other languages and we are keen to get a range of talks from people from wherever people are in their R and Python journeys.

Documentation about the NHS-R Community

If you’ve ever wondered who is behind the email address nhs.rcommunity@nhs.net, how to get involved with NHS-R Community, where and what is Slack, what to do expect when you sign up to a course then the newly started NHS-R Way book might help. This is based on The Turing Way and currently is a work in progress so comments and suggestions are very welcome.

The NHS-R Way is published through GitHub and Netlify as not every organisation allows access to one or the other site. If you find your security blocks both sites, please just let us know at nhs.rcommunity@nhs.net.

Not just R

Although the NHS-R Community is called NHS, its membership is wider than the NHS and we are interested in more languages than R. With that in mind it might be interesting to see what has been shared on the NHS-R Slack recently about SQL.

SQL finds

In the #finds channel, which was set up by Chris Beeley:

Partly inspired by a Linux podcast that I listen to, and partly by a message in #help-with-r, I have created a channel for posting cool stuff you find. Links, code snippets, communities, whatever.

I am going to add a “Finds” section to the newscast episodes of the podcast, with raw material from the work of myself and the co-hosts, as well as (hopefully) stuff from here too

we had the link https://mystery.knightlab.com/ shared which uses “SQL queries to solve the murder mystery”. Suitable for beginners or experienced SQL sleuths!

Please come and let us know if you have any other training suggestions for SQL!

SQL help

We also had a question in the #help-with-r channel from a newbie SQL coder asking:

How do I summarise a table by concatenating values (or pivot then concatenate across cols)? I don’t know in advance what the output columns of a pivot will be (or at least I can’t be bothered to list them out).

Example input:

Table of one user with two issues A and B in two rows

Desired output:

User1IssueA, IssueB
Table of one user with two issues A and B in one row

In R I would want to do something like:

tibble::tibble(Entity = "User1", 
               Issues = c("IssueA", 
                          "IssueB")) |>
                          \(x) stringr::str_flatten(x,
                                                    collapse = ",")), 
                   .by = Entity)
# A tibble: 1 × 2
  Entity Issues       
  <chr>  <chr>        
1 User1  IssueA,IssueB

In SQL if I try to use CONCAT it says it needs at least 2 arguments. I also can’t get my head round how PIVOT works, yet.

This type of question is brilliant because quite a lot of people already have an idea of how to solve the problem in one language and want to know how to do that in another. It can often be a simple translation like R’s {dplyr} function arrange() is ORDER BY in SQL. However, sometimes we need to approach the problem slightly differently because of the way the languages work.

In SQL CONCAT needs to have inputs of distinct columns into the function so CONCAT(colm1, colm2) but as this example doesn’t yet have those separate columns, the question is how to pivot that data from the long form it’s currently in, to something wider with each column referring to a single issue.

In R’s {tidyr} package, part of the {tidyverse} along with {dplyr}, we’d use the functions pivot_wider() or pivot_longer(). The concept of pivotting exists in SQL but is different in that the code needs to stipulate each and every column and for this analysis those columns are not necessarily known in advance to hard code.

One way we thought to get around this, as R users, was to use a package called {dbplyr} which, like {tidyverse}, is maintained by Posit and has functions that convert R {dplyr} commands to SQL, running them directly on SQL databases. That’s a great way of utilising the power of SQL and also helpful in that it can give you the code it’s translated.

Specific SQL solutions

Suggestions for SQL included:

  • Stackoverflow page which suggests using XML to solve the question: How to create a SQL Server function to “join” multiple rows from a subquery into a single delimited field?
  • use a recursive CTE (Common Text Expression)
  • if you can set an upper limit on the number of items, then PIVOT or MAX(CASE WHEN IssueA THEN Issue END)
  • Use LISTAGG (Oracle) or STRING_AGG for example SELECT Entity, STRING_AGG([Issues], ',') FROM table GROUP BY Entity;

There may be other solutions to this and it would be great to hear from you!

Demos and How tos

One place the NHS-R Community shares snippets of useful code is in the GitHub repository demos-and-how-tos. This is currently a series of folders but could possibly be better in a book format. We’ve got an issue about this along with other suggestions which anyone can contribute to by adding more suggestions or even picking up and issue or two!