From script-based development to function-based development and onwards to Package Based development

At the NHS R Conference, I suggested to people that they should embrace the idea of package-based development rather than script-based work.

I’m going to talk you through that process, using the simplest of scripts – ‘Hello World’. I’m going to assume that you’re using the freely available RStudio Desktop Edition as the editor for this: other versions of RStudio are likely to be essentially identical. Non R-Studio users may need to revert to more basic principles.

First let’s write ‘Hello World’ – the simplest R script in the world. Open a new R script file and get your script underway:

message(“Hello World from the NHS-R Community!”) Save it (for posterity).

In the conference, we discussed that generally writing functions is more helpful than writing scripts as it gives you greater re-usability. This example is a very trivial one (in fact so trivial as to be nearly redundant).

So we consider our script carefully and determine – what does it DO? Clearly it’s a way of greeting a person. What if we wanted to greet the person in a different way? What if we wanted to greet a different person?

So we have defined it’s purpose, and the parameters that are likely to be useful to others.

Let’s re-write our script to be more useable.

We define a function using the function function. You can see a much more detailed tutorial of how to do this here: http://stat545.com/block011_write-your-own-function-01.html

A function is defined by assigning the result of the function() function to a variable which is the function name. The parameters of function() are our new parameter names in our function.

It is really important to name your function clearly so people know what it does. Generally use active verbs to describe intent, and a consistent naming scheme. Also choose appropriate and clear names for the parameters. So let’s call our new function greet_person, and we will call our parameters greeting and recipient.

Our new code will look like this. Stick this into a new R script for now and run it:

greet_person <- function(greeting, sender) {

  message(greeting, " from ", sender)

}

Once you’ve run your script you can now call your function from the console:

greet_person(“Hello World”, “the NHS-R Community!”) And of course if you want to use a different greeting we can now change our parameter value:

greet_person(“Welcome”, “the NHS-R Community!”) So far so good.

But – we’ve had to repeat our sender parameter. What if we know we’re usually going to use that first Hello World greeting; but we just want the option of doing something different if the situation arises?

We can get around that by supplying default values. In the function() function we can set a value to both greeting and sender using =. Let’s set default values for greet_person:

greet_person <- function(greeting = "Hello World", sender = "the NHS-R Community!") 

{
  
  message(greeting, " from ", sender)

}

Now if you want our ‘default’ message you can just call:

greet_person()

But you can customise either parameter without having to specify anything you don’t want to change:

greet_person(sender = "Drew Hill")

Notice how I’ve specified which parameter I’m supplying? If I don’t do that, R doesn’t know whether we want to specify the greeting, or the string (and assumes you mean the first parameter unless told otherwise). I’d strongly recommend specifying every parameter name to every function you call, unless you have a function which only has one parameter. That means if you (or someone else) changes the function later you’ve been explicit and it won’t get misunderstood.

OK – so far so good.

Any sort of software is best to be robust though: so what happens if we abuse our newly created function? The golden rule of creating robust code is ‘fail early, fail cleanly and fail clearly’.

Our function is clearly designed to use strings. The good news for us is that many things can be converted to strings: so let’s say you provided a number into one of those parameters, it will work:

greet_person(sender = 1)

Instead of “Drew Hill” from our previous example, you’ll see the sender is “1”.

What if you accidentally sent a vector of names? R will turn those into a concatenated string of names without spaces:

greet_person(sender = c("Bob", "Jim"))

Some things however certainly could break this process – so it is really important to check that you can handle the inputs you receive within a function before trying to use them.

The first thing we need to do is to make sure we are dealing with something that can be turned into a character. We can check that by using the is.character function – which returns TRUE if a given value is TRUE, and FALSE if it is not something that can be turned into a character.

If is.character is false, we want to stop with an error:

greet_person <- function(greeting = "Hello World", sender = "the NHS-R Community!") {

  if (!is.character(greeting)) {

    stop("greeting must be a string")

  }

  if (!is.character(sender)) {

    stop("sender must be a string")

  }


  message(greeting, " from ", sender)

}

We can test how this works by using NULL as a parameter: in real life this happens quite a lot as you try to pass a variable to your new function but forget to set the variable earlier on!

greet_person(sender = NULL)

We also know that our function actually isn’t very good at handling vectors of strings (ie where there is more than one name): it will simply shove them all together without spaces. However it works and is perfectly functional. So we have a design decision: do we want to allow that, or not? A third way might be to allow it but to use a warning – perhaps a little over the top in our example, but for complex examples that may make more sense. Whereas stop will halt the code and force you to fix your bugs, the warning() function lets the code continue but tells you to go back and do it better later. Let’s add a warning if there was more than one sender:

greet_person <- function(greeting = "Hello World", sender = "the NHS-R Community!") {

  if (!is.character(greeting)) {

    stop("greeting must be a string")

  }

  if (!is.character(sender)) {

    stop("sender must be a string")

  }


  if (length(sender) > 1) {
warning("greet_person isn't very good at handling more than one sender. It is better to use just one sender at a time.")

  }

  message(greeting, " from ", sender)

}

If we now called the function with two senders we’d be able to do so but would get politely told that it’s not a good idea:

greet_person(sender = c("Jim", "Bob"))

So – hopefully from this you’ve moved from having a script which would only do precisely what you wanted in a single set of circumstances, to now having a natty little function which will say greet whoever you want, with the type of greeting that you want.

As an exercise to complete: imagine you work in the NHS-R community welcoming team. You are tasked with sending greetings from the team on a regular basis.

You used to use a script to do this and had to remember to get the style right every time – but now you sit at your console , run your script containing your function, and greet_person() on demand.

Your boss has come to you and urgently wants you to change your way of working. Rather than sending a greeting from the team using just a single team name, he wants you to send the individual names in the greeting from both Jim and Bob.

Have a think about how you could change the function so that we can cope with multiple senders.

The greetings will continue as we then think about scaling up the NHS R Community Greetings division in our next installment.

This blog was written by:

Dr. Andrew Hill

Clinical Lead for Stroke, St Helens and Knowsley Teaching Hospitals