Promoting the use of R in the NHS

Blog Article

This post was originally published on this site

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

By Hong Ooi and Alex Kyllo

This post is to announce the availability of AzureKusto, the R interface to Azure Data Explorer (internally codenamed “Kusto”), a fast, fully managed data analytics service from Microsoft. It is available from CRAN, or you can install the development version from GitHub via devtools::install_github("cloudyr/AzureKusto").

AzureKusto provides an interface (including DBI compliant methods for connecting to Kusto clusters and submitting Kusto Query Language (KQL) statements, as well as a dbplyr style backend that translates dplyr queries into KQL statements. On the administrator side, it extends the AzureRMR framework to allow for creating clusters and managing database principals.

Connecting to a cluster

To connect to a Data Explorer cluster, call the kusto_database_endpoint() function. Once you are connected, call run_query() to execute queries and command statements.


## Connect to a Data Explorer cluster with (default) device code authentication
Samples kusto_database_endpoint(
database="Samples") res run_query(Samples,
"StormEvents | summarize EventCount = count() by State | order by State asc") head(res) ## State EventCount ## 1 ALABAMA 1315 ## 2 ALASKA 257 ## 3 AMERICAN SAMOA 16 ## 4 ARIZONA 340 ## 5 ARKANSAS 1028 ## 6 ATLANTIC NORTH 188 # run_query can also handle command statements, which begin with a '.' character res run_query(Samples, ".show tables | count") res[[1]] ## Count ## 1 5

dplyr Interface

The package also implements a dplyr-style interface for building a query upon a tbl_kusto object and then running it on the remote Kusto database and returning the result as a regular tibble object with collect(). All the standard verbs are supported.

StormEvents tbl_kusto(Samples, "StormEvents")
q %>%
    group_by(State) %>%
    summarize(EventCount=n()) %>%
show_query(q) ## database('Samples').['StormEvents'] ## | summarize ['EventCount'] = count() by ['State'] ## | order by ['State'] asc
(q) ## # A tibble: 67 x 2 ## State EventCount ## ## 1 ALABAMA 1315 ## 2 ALASKA 257 ## 3 AMERICAN SAMOA 16 ## ...

DBI interface

AzureKusto implements a subset of the DBI specification for interfacing with databases in R.

The following methods are supported:

  • Connections: dbConnect, dbDisconnect, dbCanConnect
  • Table management: dbExistsTable, dbCreateTable, dbRemoveTable, dbReadTable, dbWriteTable
  • Querying: dbGetQuery, dbSendQuery, dbFetch, dbSendStatement, dbExecute, dbListFields, dbColumnInfo

It should be noted, though, that Data Explorer is quite different to the SQL databases that DBI targets. This affects the behaviour of certain DBI methods and renders other moot.


Samples dbConnect(AzureKusto(),

## [1] "StormEvents"       "demo_make_series1" "demo_series2"     
## [4] "demo_series3"      "demo_many_series1"

dbExistsTable(Samples, "StormEvents")
##[1] TRUE

dbGetQuery(Samples, "StormEvents | summarize ct = count()")
##      ct
## 1 59066

If you have any questions, comments or other feedback, please feel free to open an issue on the GitHub repo.

And one more thing…

As of Build 2019, Data Explorer can also run R (and Python) scripts in-database. For more information on this feature, currently in public preview, see the Azure blog and the documentation article.



To leave a comment for the author, please follow the link and comment on their blog: Revolutions. offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more…

Comments are closed.