Category Archives: R

easyMTS: My First R Package (Story, and Results)

This weekend I decided to create my first R package… it’s here! easyMTS makes it possible to create and evaluate a Mahalanobis-Taguchi System (MTS) for pseudo-classification:

https://github.com/NicoleRadziwill/easyMTS

Although I’ve been using R for 15 years, developing a package has been the one thing slightly out of reach for me. Now that I’ve been through the process once, with a package that’s not completely done (but at least has a firm foundation, and is usable to some degree), I can give you some advice:

  • Make sure you know R Markdown before you begin.
  • Some experience with Git and Github will be useful. Lots of experience will be very, very useful.
  • Write the functions that will go into your package into a file that you can source into another R program and use. If your programs work when you run the code this way, you will have averted many problems early.

The process I used to make this happen was:

I hope you enjoy following along with my process, and that it helps you write packages too. If I can do it, so can you!

My First R Package (Part 3)

After refactoring my programming so that it was only about 10 lines of code, using 12 functions I wrote an loaded in via the source command, I went through all the steps in Part 1 of this blog post and Part 2 of this blog post to set up the R package infrastructure using testthis in RStudio. Then things started humming along with the rest of the setup:

> use_mit_license("Nicole Radziwill")
✔ Setting active project to 'D:/R/easyMTS'
✔ Setting License field in DESCRIPTION to 'MIT + file LICENSE'
✔ Writing 'LICENSE.md'
✔ Adding '^LICENSE\\.md$' to '.Rbuildignore'
✔ Writing 'LICENSE'

> use_testthat()
✔ Adding 'testthat' to Suggests field in DESCRIPTION
✔ Creating 'tests/testthat/'
✔ Writing 'tests/testthat.R'
● Call `use_test()` to initialize a basic test file and open it for editing.

> use_vignette("easyMTS")
✔ Adding 'knitr' to Suggests field in DESCRIPTION
✔ Setting VignetteBuilder field in DESCRIPTION to 'knitr'
✔ Adding 'inst/doc' to '.gitignore'
✔ Creating 'vignettes/'
✔ Adding '*.html', '*.R' to 'vignettes/.gitignore'
✔ Adding 'rmarkdown' to Suggests field in DESCRIPTION
✔ Writing 'vignettes/easyMTS.Rmd'
● Modify 'vignettes/easyMTS.Rmd'

> use_citation()
✔ Creating 'inst/'
✔ Writing 'inst/CITATION'
● Modify 'inst/CITATION'

Add Your Dependencies

> use_package("ggplot2")
✔ Adding 'ggplot2' to Imports field in DESCRIPTION
● Refer to functions with `ggplot2::fun()`
> use_package("dplyr")
✔ Adding 'dplyr' to Imports field in DESCRIPTION
● Refer to functions with `dplyr::fun()`

> use_package("magrittr")
✔ Adding 'magrittr' to Imports field in DESCRIPTION
● Refer to functions with `magrittr::fun()`
> use_package("tidyr")
✔ Adding 'tidyr' to Imports field in DESCRIPTION
● Refer to functions with `tidyr::fun()`

> use_package("MASS")
✔ Adding 'MASS' to Imports field in DESCRIPTION
● Refer to functions with `MASS::fun()`

> use_package("qualityTools")
✔ Adding 'qualityTools' to Imports field in DESCRIPTION
● Refer to functions with `qualityTools::fun()`

> use_package("highcharter")
Registered S3 method overwritten by 'xts':
  method     from
  as.zoo.xts zoo 
Registered S3 method overwritten by 'quantmod':
  method            from
  as.zoo.data.frame zoo 
✔ Adding 'highcharter' to Imports field in DESCRIPTION
● Refer to functions with `highcharter::fun()`

> use_package("cowplot")
✔ Adding 'cowplot' to Imports field in DESCRIPTION
● Refer to functions with `cowplot::fun()`

Adding Data to the Package

I want to include two files, one data frame containing 50 observations of a healthy group with 5 predictors each, and another data frame containing 15 observations from an abnormal or unhealthy group (also with 5 predictors). I made sure the two CSV files I wanted to add to the package were in my working directory first by using dir().

> use_data_raw()
✔ Creating 'data-raw/'
✔ Adding '^data-raw$' to '.Rbuildignore'
✔ Writing 'data-raw/DATASET.R'
● Modify 'data-raw/DATASET.R'
● Finish the data preparation script in 'data-raw/DATASET.R'
● Use `usethis::use_data()` to add prepared data to package

> mtsdata1 <- read.csv("MTS-Abnormal.csv") %>% mutate(abnormal=1)
> usethis::use_data(mtsdata1)
✔ Creating 'data/'
✔ Saving 'mtsdata1' to 'data/mtsdata1.rda'

> mtsdata2 <- read.csv("MTS-Normal.csv") %>% mutate(normal=1)
> usethis::use_data(mtsdata2)
✔ Saving 'mtsdata2' to 'data/mtsdata2.rda'

Magically, this added my two files (in .rds format) into my /data directory. (Now, though, I don’t know why the /data-raw directory is there… maybe we’ll figure that out later.) I decided it was time to commit these to my repository again:

Following the instruction above, I re-knit the README.Rmd and then it was possible to commit everything to Github again. At which point I ended up in a fistfight with git, again saved only by my software engineer partner who uses Github all the time:

I think it should be working. The next test will be if anyone can install this from github using devtools. Let me know if it works for you… it works for me locally, but you know how that goes. The next post will show you how to use it 🙂

install.packages("devtools")
install_github("NicoleRadziwill/easyMTS")

SEE WHAT WILL BECOME THE easyMTS VIGNETTE –>

My First R Package (Part 2)

In Part 1, I set up RStudio with usethis, and created my first Minimum Viable R Package (MVRP?) which was then pushed to Github to create a new repository.

I added a README:

> use_readme_rmd()
✔ Writing 'README.Rmd'
✔ Adding '^README\\.Rmd$' to '.Rbuildignore'
● Modify 'README.Rmd'
✔ Writing '.git/hooks/pre-commit'

Things were moving along just fine, until I got this unkind message (what do you mean NOT an R package???!! What have I been doing the past hour?)

> use_testthat()
Error: `use_testthat()` is designed to work with packages.
Project 'easyMTS' is not an R package.

> use_mit_license("Nicole Radziwill")
✔ Setting active project to 'D:/R/easyMTS'
Error: `use_mit_license()` is designed to work with packages.
Project 'easyMTS' is not an R package.

Making easyMTS a Real Package

I sent out a tweet hoping to find some guidance, because Stack Overflow and Google and the RStudio community were coming up blank. As soon as I did, I discovered this button in RStudio:

The first time I ran it, it complained that I needed Rtools, but that Rtools didn’t exist for version 3.6.1. I decided to try finding and installing Rtools anyway because what could I possibly lose. I went to my favorite CRAN repository and found a link for Rtools just under the link for the base install:

I’m on Windows 10, so this downloaded an .exe which I quickly right-clicked on to run… the installer did its thing, and I clicked “Finish”, assuming that all was well. Then I went back into RStudio and tried to do Build -> Clean and Rebuild… and here’s what happened:

IT WORKED!! (I think!!!)

It created a package (top right) and then loaded it into my RStudio session (bottom left)! It loaded the package name into the package console (bottom right)!

I feel like this is a huge accomplishment for now, so I’m going to move to Part 3 of my blog post. We’ll figure out how to close the gaps that I’ve invariably introduced by veering off-tutorial.

GO TO PART 3 –>

My First R Package (Part 1)

(What does this new package do? Find out here.)

I have had package-o-phobia for years, and have skillfully resisted learning how to build a new R package. However, I do have a huge collection of scripts on my hard drive with functions in them, and I keep a bunch of useful functions up on Github so anyone who wants can source and use them. I source them myself! So, really, there’s no reason to package them up and (god forbid) submit them to CRAN. I’m doing fine without packages!

Reality check: NO. As I’ve been told by so many people, if you have functions you use a lot, you should write a package. You don’t even have to think about a package as something you write so that other people can use. It is perfectly fine to write a package for an audience of one — YOU.

But I kept making excuses for myself until very recently, when I couldn’t find a package to do something I needed to do, and all the other packages were either not getting the same answers as in book examples OR they were too difficult to use. It was time.

So armed with moral support and some exciting code, I began the journey of a thousand miles with the first step, guided by Tomas Westlake and Emil Hvitfeldt and of course Hadley. I already had some of the packages I needed, but did not have the most magical one of all, usethis:

install.packages("usethis")

library(usethis)
library(roxygen2)
library(devtools)

Finding a Package Name

First, I checked to see if the package name I wanted was available. It was not available on CRAN, which was sad:

> available("MTS")
Urban Dictionary can contain potentially offensive results,
  should they be included? [Y]es / [N]o:
1: Y
-- MTS -------------------------------------------------------------------------
Name valid: ✔
Available on CRAN: ✖ 
Available on Bioconductor: ✔
Available on GitHub:  ✖ 
Abbreviations: http://www.abbreviations.com/MTS
Wikipedia: https://en.wikipedia.org/wiki/MTS
Wiktionary: https://en.wiktionary.org/wiki/MTS

My second package name was available though, and I think it’s even better. I’ve written code to easily create and evaluate diagnostic algorithms using the Mahalanobis-Taguchi System (MTS), so my target package name is easyMTS:

> available("easyMTS")
-- easyMTS ------------------------------------------------------------
Name valid: ✔
Available on CRAN: ✔ 
Available on Bioconductor: ✔
Available on GitHub:  ✔ 
Abbreviations: http://www.abbreviations.com/easy
Wikipedia: https://en.wikipedia.org/wiki/easy
Wiktionary: https://en.wiktionary.org/wiki/easy
Sentiment:+++

Create Minimum Viable Package

Next, I set up the directory structure locally. Another RStudio session started up on its own; I’m hoping this is OK.

> create_package("D:/R/easyMTS")
✔ Creating 'D:/R/easyMTS/'
✔ Setting active project to 'D:/R/easyMTS'
✔ Creating 'R/'
✔ Writing 'DESCRIPTION'
Package: easyMTS
Title: What the Package Does (One Line, Title Case)
Version: 0.0.0.9000
Authors@R (parsed):
    * First Last <first.last@example.com> [aut, cre] (<https://orcid.org/YOUR-ORCID-ID>)
Description: What the package does (one paragraph).
License: What license it uses
Encoding: UTF-8
LazyData: true
✔ Writing 'NAMESPACE'
✔ Writing 'easyMTS.Rproj'
✔ Adding '.Rproj.user' to '.gitignore'
✔ Adding '^easyMTS\\.Rproj$', '^\\.Rproj\\.user$' to '.Rbuildignore'
✔ Opening 'D:/R/easyMTS/' in new RStudio session
✔ Setting active project to '<no active project>'

Syncing with Github

use_git_config(user.name = "nicoleradziwill", user.email = "nicole.radziwill@gmail.com")

browse_github_token()

This took me to a page on Github where I entered my password, and then had to go down to the bottom of the page to click on the green button that said “Generate Token.” They said I would never be able to see it again, so I gmailed it to myself for easy searchability. Next, I put this token where it is supposed to be locally:

edit_r_environ()

A blank file popped up in RStudio, and I added this line, then saved the file to its default location (not my real token):

GITHUB_PAT=e54545x88f569fff6c89abvs333443433d

Then I had to restart R and confirm it worked:

github_token()

This revealed my token! I must have done the Github setup right. Finally I could proceed with the rest of the git setup:

> use_github()
✔ Setting active project to 'D:/R/easyMTS'
Error: Cannot detect that project is already a Git repository.
Do you need to run `use_git()`?
> use_git()
✔ Initialising Git repo
✔ Adding '.Rhistory', '.RData' to '.gitignore'
There are 5 uncommitted files:
* '.gitignore'
* '.Rbuildignore'
* 'DESCRIPTION'
* 'easyMTS.Rproj'
* 'NAMESPACE'
Is it ok to commit them?

1: No
2: Yeah
3: Not now

Selection: use_github()
Enter an item from the menu, or 0 to exit
Selection: 2
✔ Adding files
✔ Commit with message 'Initial commit'
● A restart of RStudio is required to activate the Git pane
Restart now?

1: No way
2: For sure
3: Nope

Selection: 2

When I tried to commit to Github, it was asking me if the description was OK, but it was NOT. Every time I said no, it kicked me out. Turns out it wanted me to go directly into the DESCRIPTION file and edit it, so I did. I used Notepad because this was crashing RStudio. But this caused a new problem:

Error: Uncommited changes. Please commit to git before continuing.

This is the part of the exercise where it’s great to be living with a software engineer who uses git and Github all the time. He pointed me to a tiny little tab that said “Terminal” in the bottom left corner of RStudio, just to the right of “Console”. He told me to type this, which unstuck me:

THEN, when I went back to the Console, it all worked:

> use_git()
> use_github()
✔ Checking that current branch is 'master'
Which git protocol to use? (enter 0 to exit) 

1: ssh   <-- presumes that you have set up ssh keys
2: https <-- choose this if you don't have ssh keys (or don't know if you do)

Selection: 2
● Tip: To suppress this menu in future, put
  `options(usethis.protocol = "https")`
  in your script or in a user- or project-level startup file, '.Rprofile'.
  Call `usethis::edit_r_profile()` to open it for editing.
● Check title and description
  Name:        easyMTS
  Description: 
Are title and description ok?

1: Yes
2: Negative
3: No

Selection: 1
✔ Creating GitHub repository
✔ Setting remote 'origin' to 'https://github.com/NicoleRadziwill/easyMTS.git'
✔ Pushing 'master' branch to GitHub and setting remote tracking branch
✔ Opening URL 'https://github.com/NicoleRadziwill/easyMTS'

This post is getting long, so I’ll split it into parts. See you in Part 2.

GO TO PART 2 –>

Object of Type Closure is Not Subsettable

I started using R in 2004. I started using R religiously on the day of the annular solar eclipse in Madrid (October 3, 2005) after being inspired by David Hunter’s talk at ADASS.

It took me exactly 4,889 days to figure out what this vexing error means, even though trial and error helped me move through it most every time it happened! I’m so moved by what I learned (and embarrassed that it took so long to make sense), that I have to share it with you.

This error happens when you’re trying to treat a function like a list, vector, or data frame. To fix it, start treating the function like a function.

Let’s take something that’s very obviously a function. I picked the sampling distribution simulator from a 2015 blog post I wrote. Cut and paste it into your R console:

sdm.sim <- function(n,src.dist=NULL,param1=NULL,param2=NULL) {
   r <- 10000  # Number of replications/samples - DO NOT ADJUST
   # This produces a matrix of observations with  
   # n columns and r rows. Each row is one sample:
   my.samples <- switch(src.dist,
	"E" = matrix(rexp(n*r,param1),r),
	"N" = matrix(rnorm(n*r,param1,param2),r),
	"U" = matrix(runif(n*r,param1,param2),r),
	"P" = matrix(rpois(n*r,param1),r),
        "B" = matrix(rbinom(n*r,param1,param2),r),
	"G" = matrix(rgamma(n*r,param1,param2),r),
	"X" = matrix(rchisq(n*r,param1),r),
	"T" = matrix(rt(n*r,param1),r))
   all.sample.sums <- apply(my.samples,1,sum)
   all.sample.means <- apply(my.samples,1,mean)   
   all.sample.vars <- apply(my.samples,1,var) 
   par(mfrow=c(2,2))
   hist(my.samples[1,],col="gray",main="Distribution of One Sample")
   hist(all.sample.sums,col="gray",main="Sampling Distribution\nof
	the Sum")
   hist(all.sample.means,col="gray",main="Sampling Distribution\nof the Mean")
   hist(all.sample.vars,col="gray",main="Sampling Distribution\nof
	the Variance")
}

The right thing to do with this function is use it to simulate a bunch of distributions and plot them using base R. You write the function name, followed by parenthesis, followed by each of the four arguments the function needs to work. This will generate a normal distribution with mean of 20 and standard deviation of 3, along with three sampling distributions, using a sample size of 100 and 10000 replications:

sdm.sim(100, src.dist="N", param1=20, param2=3)

(You should get four plots, arranged in a 2×2 grid.)

But what if we tried to treat sdm.sim like a list, and call the 3rd element of it? Or what if we tried to treat it like a data frame, and we wanted to call one of the variables in the column of the data frame?

> sdm.sim[3]
Error in sdm.sim[3] : object of type 'closure' is not subsettable

> sdm.sim$values
Error in sdm.sim$values : object of type 'closure' is not subsettable

SURPRISE! Object of type closure is not subsettable. This happens because sdm.sim is a function, and its data type is (shockingly) something called “closure”:

> class(sdm.sim)
[1] "function"

> typeof(sdm.sim)
[1] "closure"

I had read this before on Stack Overflow a whole bunch of times, but it never really clicked until I saw it like this! And now that I had a sense for why the error was occurring, turns out it’s super easy to reproduce with functions in base R or functions in anyfamiliar packages you use:

> ggplot$a
Error in ggplot$a : object of type 'closure' is not subsettable

> table$a
Error in table$a : object of type 'closure' is not subsettable

> ggplot[1]
Error in ggplot[1] : object of type 'closure' is not subsettable

> table[1]
Error in table[1] : object of type 'closure' is not subsettable

As a result, if you’re pulling your hair out over this problem, check and see where in your rogue line of code you’re treating something like a non-function. Or maybe you picked a variable name that competes with a function name. Or maybe you got your operations out of order. In any case, change your notation so that your function is doing function things, and your code should start working.

But as Luke Smith pointed out, this is not true for functional sequences (which you can also write as functions). Functional sequences are those chains of commands you write when you’re in tidyverse mode, all strung together with %>% pipes:

Luke’s code that you can cut and paste (and try), with the top line that got cut off by Twitter, is:

fun1 <- . %>% 
    group_by(col1) %>% 
    mutate(n=n+h) %>% 
    filter(n==max(n)) %>% 
    ungroup() %>% 
    summarize(new_col=mean(n))

fun2 <- fun1[-c(2,5)] 

Even though Luke’s fun1 and fun2 are of type closure, they are subsettable because they contain a sequence of functions:

> typeof(fun1)
[1] "closure"

> typeof(fun2)
[1] "closure"

> fun1[1]
Functional sequence with the following components:

 1. group_by(., col1)

Use 'functions' to extract the individual functions. 

> fun1[[1]]
function (.) 
group_by(., col1)

Don’t feel bad! This problem has plagued all of us for many, many hours (me: years), and yet it still happens to us more often than we would like to admit. Awareness of this issue will not prevent you from attempting things that give you this error in the future. It’s such a popular error that there have been memes about it and sad valentines written about it:

SCROLL DOWN PAST STEPH’S TWEET TO SEE THE JOKE!!

(Also, if you’ve made it this far, FOLLOW THESE GOOD PEOPLE ON TWITTER: @stephdesilva @djnavarro @lksmth – They all share great information on data science in general, and R in particular. Many thanks too to all the #rstats crowd who shared in my glee last night and didn’t make me feel like an idiot for not figuring this out for ALMOST 14 YEARS. It seems so simple now.

Steph is also a Microsoft whiz who you should definitely hire if you need anything R+ Microsoft. Thanks to all of you!)

A Simple Intro to Q-Learning in R: Floor Plan Navigation

This example is drawn from “A Painless Q-Learning Tutorial” at http://mnemstudio.org/path-finding-q-learning-tutorial.htm which explains how to manually calculate iterations using the updating equation for Q-Learning, based on the Bellman Equation (image from https://www.is.uni-freiburg.de/ressourcen/business-analytics/13_reinforcementlearning.pdf):

The example explores path-finding through a house:

The question to be answered here is: What’s the best way to get from Room 2 to Room 5 (outside)? Notice that by answering this question using reinforcement learning, we will also know how to find optimal routes from any room to outside. And if we run the iterative algorithm again for a new target state, we can find out the optimal route from any room to that new target state.

Since Q-Learning is model-free, we don’t need to know how likely it is that our agent will move between any room and any other room (the transition probabilities). If you had observed the behavior in this system over time, you might be able to find that information, but it many cases it just isn’t available. So the key for this problem is to construct a Rewards Matrix that explains the benefit (or penalty!) of attempting to go from one state (room) to another.

Assigning the rewards is somewhat arbitrary, but you should give a large positive value to your target state and negative values to states that are impossible or highly undesirable. Here’s the guideline we’ll use for this problem:

  • -1 if “you can’t get there from here”
  • 0 if the destination is not the target state
  • 100 if the destination is the target state

We’ll start constructing our rewards matrix by listing the states we’ll come FROM down the rows, and the states we’ll go TO in the columns. First, let’s fill the diagonal with -1 rewards, because we don’t want our agent to stay in the same place (that is, move from Room 1 to Room 1, or from Room 2 to Room 2, and so forth). The final one gets a 100 because if we’re already in Room 5, we want to stay there.

Next, let’s move across the first row. Starting in Room 0, we only have one choice: go to Room 4. All other possibilities are blocked (-1):

Now let’s fill in the row labeled 1. From Room 1, you have two choices: go to Room 3 (which is not great but permissible, so give it a 0) or go to Room 5 (the target, worth 100)!

Continue moving row by row, determining if you can’t get there from here (-1), you can but it’s not the target (0), or it’s the target(100). You’ll end up with a final rewards matrix that looks like this:

Now create this rewards matrix in R:

R <- matrix(c(-1, -1, -1, -1, 0, -1,
       -1, -1, -1, 0, -1, 100,
       -1, -1, -1, 0, -1, -1, 
       -1, 0, 0, -1, 0, -1,
        0, -1, -1, 0, -1, 100,
       -1, 0, -1, -1, 0, 100), nrow=6, ncol=6, byrow=TRUE)

And run the code. Notice that we’re calling the target state 6 instead of 5 because even though we have a room labeled with a zero, our matrix starts with a 1s so we have to adjust:

source("https://raw.githubusercontent.com/NicoleRadziwill/R-Functions/master/qlearn.R")

results <- q.learn(R,10000,alpha=0.1,gamma=0.8,tgt.state=6) 
> round(results)
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    0    0    0    0   80    0
[2,]    0    0    0   64    0  100
[3,]    0    0    0   64    0    0
[4,]    0   80   51    0   80    0
[5,]   64    0    0   64    0  100
[6,]    0   80    0    0   80  100

You can read this table of average value to obtain policies. A policy is a “path” through the states of the system:

  • Start at Room 0 (first row, labeled 1): Choose Room 4 (80), then from Room 4 choose Room 5 (100)
  • Start at Room 1: Choose Room 5 (100)
  • Start at Room 2: Choose Room 3 (64), from Room 3 choose Room 1 or Room 4 (80); from 1 or 4 choose 5 (100)
  • Start at Room 3: Choose Room 1 or Room 4 (80), then Room 5 (100)
  • Start at Room 4: Choose Room 5 (100)
  • Start at Room 5: Stay at Room 5 (100)

To answer the original problem, we would take route 2-3-1-5 or 2-3-4-5 to get out the quickest if we started in Room 2. This is easy to see with a simple map, but is much more complicated when the maps get bigger.

« Older Entries