Free Speech in the Internet of Things (IoT)

Image Credit: from "Reclaim Democracy" at http://reclaimdemocracy.org/who-are-citizens-united/

IF YOUR TOASTER COULD TALK, IT WOULD HAVE THE RIGHT TO FREE SPEECH. Image Credit: from “Reclaim Democracy” at http://reclaimdemocracy.org/who-are-citizens-united/

By the end of 2016, Gartner estimates that over 6.4 BILLION “things” will be connected to one another in the nascent Internet of Things (IoT). As innovation yields new products, services, and capabilities that leverage this ecosystem, we will need new conceptual models to ensure quality and support continuous improvement in this environment.

I wasn’t thinking about quality or IoT this morning… but instead, was trying to understand why so many people on Twitter and Facebook are linking Justice Scalia’s recent death to Citizens United. (I’d heard of Citizens United, but quite frankly, thought it was a soccer team. Embarrassing, I know.) I was surprised to find out that instead, Citizens United is a conservative U.S. political organization best known for its role in the 2010 Supreme Court Case Citizens United v. FEC.

That case removed many restrictions on political spending. With the “super-rich donating more than ever before to individual campaigns plus the ‘enormous’ chasm in wealth has given the super-rich the power to steer the economic and political direction of the United States and undermine its democracy.” Interesting, sure… but what’s more interesting to me is that the Citizens United case, according to this source

  • Strengthened First Amendment protection for corporations, 
  • Affirmed that Money = Speech, and
  • Affirmed that Non-Persons have the right to free speech.

The article goes on to state that “if your underpants could talk, they would be protected by free speech.”

Not too long ago, a statement like this would just be silly. But today, with immersive IoT looming, this isn’t too far-fetched. 

  • What will the world look (and feel) like when everything you interact with has a “voice”?
  • How will the “Voice of the Customer” be heard when all of that customer’s stuff ALSO has a voice?
  • What IS the “Voice of the Customer” in a world like this?

Analytic Hierarchy Process (AHP) using preferenceFunction in ahp

Yesterday, I wrote about how to use gluc‘s new ahp package on a simple Tom-Dick-Harry one level decision making problem using Analytic Hierarchy Process (AHP). One of the cool things about that package is that in addition to specifying the pairwise comparisons directly using Saaty’s scale (below, from https://kristalaace2014.wordpress.com/2014/05/14/w12_al_vendor-evaluation/)…

saaty-scale

…you can also describe each of the Alternatives in terms of descriptive variables which you can use inside a function to make the pairwise comparisons automatically. This is VERY helpful if you have lots of criteria, subcriteria, or alternatives to evaluate!! For example, I used preferenceFunction to compare 55 alternatives using 6 criteria and 4 subcriteria, and was very easily able to create functions to represent my judgments. This was much easier than manually entering all the comparisons.

This post shows HOW I replaced some of my manual comparisons with automated comparisons using preferenceFunction. (The full YAML file is included at the bottom of this post for you to use if you want to run this example yourself.) First, recall that the YAML file starts with specifying the alternatives that you are trying to choose from (at the bottom level of the decision hierarchy) and some variables that characterize those alternatives. I used the descriptions in the problem statement to come up with some assessments between 1=not great and 10=great:

#########################
# Alternatives Section
# THIS IS FOR The Tom, Dick, & Harry problem at
# https://en.wikipedia.org/wiki/Analytic_hierarchy_process_%E2%80%93_leader_example
#
Alternatives: &alternatives
# 1= not well; 10 = best possible
# Your assessment based on the paragraph descriptions may be different.
  Tom:
    age: 50
    experience: 7
    education: 4
    leadership: 10
  Dick:
    age: 60
    experience: 10
    education: 6
    leadership: 6
  Harry:
    age: 30
    experience: 5
    education: 8
    leadership: 6
#
# End of Alternatives Section
#####################################

Here is a snippet from my original YAML file specifying my AHP problem manually ():

  children: 
    Experience:
      preferences:
        - [Tom, Dick, 1/4]
        - [Tom, Harry, 4]
        - [Dick, Harry, 9]
      children: *alternatives
    Education:
      preferences:
        - [Tom, Dick, 3]
        - [Tom, Harry, 1/5]
        - [Dick, Harry, 1/7]
      children: *alternatives

And here is what I changed that snippet to, so that it would do my pairwise comparisons automatically. The functions are written in standard R (fortunately), and each function has access to a1 and a2 (the two alternatives). Recursion is supported which makes this capability particularly useful. I tried to write a function using two of the characteristics in the decision (a1$age and a1$experience) but this didn’t seen to work. I’m not sure whether the package supports it or not. Here are my comparisons rewritten as functions:

  children: 
    Experience:
          preferenceFunction: >
            ExperiencePreference <- function(a1, a2) {
              if (a1$experience < a2$experience) return (1/ExperiencePreference(a2, a1))
              ratio <- a1$experience / a2$experience
              if (ratio < 1.05) return (1)
              if (ratio < 1.2) return (2)
              if (ratio < 1.5) return (3)
              if (ratio < 1.8) return (4)
              if (ratio < 2.1) return (5) return (6) } children: *alternatives Education: preferenceFunction: >
            EducPreference <- function(a1, a2) {
              if (a1$education < a2$education) return (1/EducPreference(a2, a1))
              ratio <- a1$education / a2$education
              if (ratio < 1.05) return (1)
              if (ratio < 1.15) return (2)
              if (ratio < 1.25) return (3)
              if (ratio < 1.35) return (4)
              if (ratio < 1.55) return (5)
              return (5)
            }
          children: *alternatives

To run the AHP with functions in R, I used this code (I am including the part that gets the ahp package, in case you have not done that yet). BE CAREFUL and make sure, like in FORTRAN, that you line things up so that the words START in the appropriate columns. For example, the “p” in preferenceFunction MUST be immediately below the 7th character of your criterion’s variable name.

devtools::install_github("gluc/ahp", build_vignettes = TRUE)
install.packages("data.tree")

library(ahp)
library(data.tree)

setwd("C:/AHP/artifacts")
nofxnAhp <- LoadFile("tomdickharry.txt")
Calculate(nofxnAhp)
fxnAhp <- LoadFile("tomdickharry-fxns.txt")
Calculate(fxnAhp)

print(nofxnAhp, "weight")
print(fxnAhp, "weight")

You can see that the weights are approximately the same, indicating that I did a good job at developing functions that represent the reality of how I used the variables attached to the Alternatives to make my pairwise comparisons. The results show that Dick is now the best choice, although there is some inconsistency in our judgments for Experience that we should examine further. (I have not examined this case to see whether rank reversal could be happening).

> print(nofxnAhp, "weight")
                         levelName     weight
1  Choose the Most Suitable Leader 1.00000000
2   ¦--Experience                  0.54756924
3   ¦   ¦--Tom                     0.21716561
4   ¦   ¦--Dick                    0.71706504
5   ¦   °--Harry                   0.06576935
6   ¦--Education                   0.12655528
7   ¦   ¦--Tom                     0.18839410
8   ¦   ¦--Dick                    0.08096123
9   ¦   °--Harry                   0.73064467
10  ¦--Charisma                    0.26994992
11  ¦   ¦--Tom                     0.74286662
12  ¦   ¦--Dick                    0.19388163
13  ¦   °--Harry                   0.06325174
14  °--Age                         0.05592555
15      ¦--Tom                     0.26543334
16      ¦--Dick                    0.67162545
17      °--Harry                   0.06294121

> print(fxnAhp, "weight")
                         levelName     weight
1  Choose the Most Suitable Leader 1.00000000
2   ¦--Experience                  0.54756924
3   ¦   ¦--Tom                     0.25828499
4   ¦   ¦--Dick                    0.63698557
5   ¦   °--Harry                   0.10472943
6   ¦--Education                   0.12655528
7   ¦   ¦--Tom                     0.08273483
8   ¦   ¦--Dick                    0.26059839
9   ¦   °--Harry                   0.65666678
10  ¦--Charisma                    0.26994992
11  ¦   ¦--Tom                     0.74286662
12  ¦   ¦--Dick                    0.19388163
13  ¦   °--Harry                   0.06325174
14  °--Age                         0.05592555
15      ¦--Tom                     0.26543334
16      ¦--Dick                    0.67162545
17      °--Harry                   0.06294121

> ShowTable(fxnAhp)

tomdick-ahp-fxns

Here is the full YAML file for the “with preferenceFunction” case.

#########################
# Alternatives Section
# THIS IS FOR The Tom, Dick, & Harry problem at
# https://en.wikipedia.org/wiki/Analytic_hierarchy_process_%E2%80%93_leader_example
#
Alternatives: &alternatives
# 1= not well; 10 = best possible
# Your assessment based on the paragraph descriptions may be different.
  Tom:
    age: 50
    experience: 7
    education: 4
    leadership: 10
  Dick:
    age: 60
    experience: 10
    education: 6
    leadership: 6
  Harry:
    age: 30
    experience: 5
    education: 8
    leadership: 6
#
# End of Alternatives Section
#####################################
# Goal Section
#
Goal:
# A Goal HAS preferences (within-level comparison) and HAS Children (items in level)
  name: Choose the Most Suitable Leader
  preferences:
    # preferences are defined pairwise
    # 1 means: A is equal to B
    # 9 means: A is highly preferrable to B
    # 1/9 means: B is highly preferrable to A
    - [Experience, Education, 4]
    - [Experience, Charisma, 3]
    - [Experience, Age, 7]
    - [Education, Charisma, 1/3]
    - [Education, Age, 3]
    - [Age, Charisma, 1/5]
  children: 
    Experience:
          preferenceFunction: >
            ExperiencePreference <- function(a1, a2) {
              if (a1$experience < a2$experience) return (1/ExperiencePreference(a2, a1))
              ratio <- a1$experience / a2$experience
              if (ratio < 1.05) return (1)
              if (ratio < 1.2) return (2)
              if (ratio < 1.5) return (3)
              if (ratio < 1.8) return (4)
              if (ratio < 2.1) return (5) return (6) } children: *alternatives Education: preferenceFunction: >
            EducPreference <- function(a1, a2) {
              if (a1$education < a2$education) return (1/EducPreference(a2, a1))
              ratio <- a1$education / a2$education
              if (ratio < 1.05) return (1)
              if (ratio < 1.15) return (2)
              if (ratio < 1.25) return (3)
              if (ratio < 1.35) return (4)
              if (ratio < 1.55) return (5)
              return (5)
            }
          children: *alternatives
    Charisma:
      preferences:
        - [Tom, Dick, 5]
        - [Tom, Harry, 9]
        - [Dick, Harry, 4]
      children: *alternatives
    Age:
      preferences:
        - [Tom, Dick, 1/3]
        - [Tom, Harry, 5]
        - [Dick, Harry, 9]
      children: *alternatives
#
# End of Goal Section
#####################################

Analytic Hierarchy Process (AHP) with the ahp Package

On my December to-do list, I had “write an R package to make analytic hierarchy process (AHP) easier” — but fortunately gluc beat me to it, and saved me tons of time that I spent using AHP to do an actual research problem. First of all, thank you for writing the new ahp package! Next, I’d like to show everyone just how easy this package makes performing AHP and displaying the results. We will use the Tom, Dick, and Harry example that is described on Wikipedia. – the goal is to choose a new employee, and you can pick either Tom, Dick, or Harry. Read the problem statement on Wikipedia before proceeding.

AHP is a method for multi-criteria decision making that breaks the problem down based on decision criteria, subcriteria, and alternatives that could satisfy a particular goal. The criteria are compared to one another, the alternatives are compared to one another based on how well they comparatively satisfy the subcriteria, and then the subcriteria are examined in terms of how well they satisfy the higher-level criteria. The Tom-Dick-Harry problem is a simple hierarchy: only one level of criteria separates the goal (“Choose the Most Suitable Leader”) from the alternatives (Tom, Dick, or Harry):

tom-dick-harry

To use the ahp package, the most challenging part involves setting up the YAML file with your hierarchy and your rankings. THE MOST IMPORTANT THING TO REMEMBER IS THAT THE FIRST COLUMN IN WHICH A WORD APPEARS IS IMPORTANT. This feels like FORTRAN. YAML experts may be appalled that I just didn’t know this, but I didn’t. So most of the first 20 hours I spent stumbling through the ahp package involved coming to this very critical conclusion. The YAML AHP input file requires you to specify 1) the alternatives (along with some variables that describe the alternatives; I didn’t use them in this example, but I’ll post a second example that does use them) and 2) the goal hierarchy, which includes 2A) comparisons of all the criteria against one another FIRST, and then 2B) comparisons of the criteria against the alternatives. I saved my YAML file as tomdickharry.txt and put it in my C:/AHP/artifacts directory:

#########################
# Alternatives Section
# THIS IS FOR The Tom, Dick, & Harry problem at
# https://en.wikipedia.org/wiki/Analytic_hierarchy_process_%E2%80%93_leader_example
#
Alternatives: &alternatives
# 1= not well; 10 = best possible
# Your assessment based on the paragraph descriptions may be different.
  Tom:
    age: 50
    experience: 7
    education: 4
    leadership: 10
  Dick:
    age: 60
    experience: 10
    education: 6
    leadership: 6
  Harry:
    age: 30
    experience: 5
    education: 8
    leadership: 6
#
# End of Alternatives Section
#####################################
# Goal Section
#
Goal:
# A Goal HAS preferences (within-level comparison) and HAS Children (items in level)
  name: Choose the Most Suitable Leader
  preferences:
    # preferences are defined pairwise
    # 1 means: A is equal to B
    # 9 means: A is highly preferable to B
    # 1/9 means: B is highly preferable to A
    - [Experience, Education, 4]
    - [Experience, Charisma, 3]
    - [Experience, Age, 7]
    - [Education, Charisma, 1/3]
    - [Education, Age, 3]
    - [Age, Charisma, 1/5]
  children: 
    Experience:
      preferences:
        - [Tom, Dick, 1/4]
        - [Tom, Harry, 4]
        - [Dick, Harry, 9]
      children: *alternatives
    Education:
      preferences:
        - [Tom, Dick, 3]
        - [Tom, Harry, 1/5]
        - [Dick, Harry, 1/7]
      children: *alternatives
    Charisma:
      preferences:
        - [Tom, Dick, 5]
        - [Tom, Harry, 9]
        - [Dick, Harry, 4]
      children: *alternatives
    Age:
      preferences:
        - [Tom, Dick, 1/3]
        - [Tom, Harry, 5]
        - [Dick, Harry, 9]
      children: *alternatives
#
# End of Goal Section
#####################################

Next, I installed gluc’s ahp package and a helper package, data.tree, then loaded them into R:

devtools::install_github("gluc/ahp", build_vignettes = TRUE)
install.packages("data.tree")

library(ahp)
library(data.tree)

Running the calculations was ridiculously easy:

setwd("C:/AHP/artifacts")
myAhp <- LoadFile("tomdickharry.txt")
Calculate(myAhp)

And then generating the output was also ridiculously easy:

> GetDataFrame(myAhp)
                                  Weight  Dick   Tom Harry Consistency
1 Choose the Most Suitable Leader 100.0% 49.3% 35.8% 14.9%        4.4%
2  ¦--Experience                   54.8% 39.3% 11.9%  3.6%        3.2%
3  ¦--Education                    12.7%  1.0%  2.4%  9.2%        5.6%
4  ¦--Charisma                     27.0%  5.2% 20.1%  1.7%        6.1%
5  °--Age                           5.6%  3.8%  1.5%  0.4%        2.5%
> 
> print(myAhp, "weight", filterFun = isNotLeaf)
                        levelName     weight
1 Choose the Most Suitable Leader 1.00000000
2  ¦--Experience                  0.54756924
3  ¦--Education                   0.12655528
4  ¦--Charisma                    0.26994992
5  °--Age                         0.05592555
> print(myAhp, "weight")
                         levelName     weight
1  Choose the Most Suitable Leader 1.00000000
2   ¦--Experience                  0.54756924
3   ¦   ¦--Tom                     0.21716561
4   ¦   ¦--Dick                    0.71706504
5   ¦   °--Harry                   0.06576935
6   ¦--Education                   0.12655528
7   ¦   ¦--Tom                     0.18839410
8   ¦   ¦--Dick                    0.08096123
9   ¦   °--Harry                   0.73064467
10  ¦--Charisma                    0.26994992
11  ¦   ¦--Tom                     0.74286662
12  ¦   ¦--Dick                    0.19388163
13  ¦   °--Harry                   0.06325174
14  °--Age                         0.05592555
15      ¦--Tom                     0.26543334
16      ¦--Dick                    0.67162545
17      °--Harry                   0.06294121

You can also generate very beautiful output with the command below (but you’ll have to run the example yourself if you want to see how fantastically it turns out — maybe that will provide some motivation!)

ShowTable(myAhp)

I’ll post soon with an example of how to use AHP preference functions in the Tom, Dick, & Harry problem.

Course Materials for Statistics (The Easier Way) With R

very-quick-cover-outlineAre you an instructor with a Spring 2016 intro to statistics course coming up… and yet you haven’t started preparing? If so, I have a potential solution for you to consider. The materials (lecture slides, in-class practice problems in R, exams, syllabus) go with my book and are about 85% compiled, but good enough to get a course started this week (I will be finishing the collection by mid-January).
There is also a 36MB .zip file if you would like to download the materials.
Whether you will be using them or just considering them, please fill in the Google Form at https://docs.google.com/forms/d/1Z7djuKHg1L4k7bTtktHXI7Juduad3fUW9G69bxN6jRA/viewform so I can keep track of everyone and provide you with updates. Also, I want to make sure that I’m providing the materials to INSTRUCTORS (and not students), so please use the email account from your institution when you sign up. Once I get your contact information, I will email you the link to the materials.
If you would like permission to edit the materials, I can do that as well — I know a couple of you have expressed that you would like to add to the collection (e.g translate them to another language). If you see any issues or errors, please either fix and/or email to tell me to fix!
Thanks for your interest and participation! Also, Happy New Year!

Deploying Your Very Own Shiny Server

Nicole has been having a lot of fun the last few days creating her own Shiny apps. We work in the same space, and let’s just say her enthusiasm is very contagious. While she focused on deploying R-based web apps on ShinyApps.io, I’m more of a web development geek, so I put my energy towards setting up a server where she could host her apps. This should come in handy, since she blew through all of her free server time on ShinyApps after just a couple of days!

Before you begin, you can see a working example of this at https://shinyisat.net/sample-apps/sampdistclt/.

In this tutorial, I’m going to walk you through the process of:

  1. Setting up an Ubuntu 14.04 + NGINX server at DigitalOcean
  2. Installing and configuring R
  3. Installing and configuring Shiny and the open-source edition of Shiny Server
  4. Installing a free SSL certificate from Let’s Encrypt
  5. Securing the Shiny Server using the SSL cert and reverse proxy through NGINX
  6. Setting appropriate permissions on the files to be served
  7. Creating and launching the app Nicole created in her recent post

Setting Up an Ubuntu 14.04 Server at DigitalOcean

DigitalOcean is my new favorite web host. (Click this link to get a $10 credit when you sign up!) They specialize in high-performance, low-cost, VPS (virtual private servers) targeted at developers. If you want full control over your server, you can’t beat their $5/month offering. They also provide excellent documentation. In order to set up your server, you should start by following these tutorials:

  1. How to Create Your First DigitalOcean Droplet Virtual Server
  2. How to Connect to Your Droplet with SSH
  3. Initial Server Setup with Ubuntu 14.04
  4. Additional Recommended Steps for New Ubuntu 14.04 Servers
  5. How To Protect SSH with Fail2Ban on Ubuntu 14.04

I followed these pretty much exactly without any difficulties. I did make a few changes to their procedure, which I’ll describe next.

Allowing HTTPS with UFW

I found that the instructions for setting up ufw needed a tweak. Since HTTPS traffic uses port 443 on the server, I thought that sudo ufw allow 443/tcp should take care of letting HTTPS traffic through the firewall. Unfortunately, it doesn’t. In addition you should run the following:


$ sudo ufw allow https

$ sudo ufw enable

Your web server may not accept incoming HTTPS traffic if you do not do this. Note: you may not have noticed, but you also installed NGINX as part of the UFW tutorial.

Setting up Automatic Updates on Your Server

The default install of Ubuntu at DigitalOcean comes with the automatic updates package already installed. This means your server will get security packages and upgrades without you having to do it manually. However, this package needs to be configured. First, edit /etc/apt/apt.conf.d/50unattended-upgrades to look like this:

Unattended-Upgrade::Allowed-Origins {
   "${distro_id}:${distro_codename}-security";
   "${distro_id}:${distro_codename}-updates";
};
Unattended-Upgrade::Mail "admin@mydomain.com";
Unattended-Upgrade::Remove-Unused-Dependencies "true";
Unattended-Upgrade::Automatic-Reboot "true";
Unattended-Upgrade::Automatic-Reboot-Time "02:00";

Note, that this configuration will install upgrades and security updates, and will automatically reboot your server, if necessary, at 2:00AM, and it will purge unused packages from your system completely. Some people don’t like to have that much stuff happen automatically without supervision. Also, my /etc/apt/apt.conf.d/10periodic file looks like:

APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Download-Upgradeable-Packages "1";
APT::Periodic::AutocleanInterval "7";
APT::Periodic::Unattended-Upgrade "1";

Which sets upgrades to happen daily, and purges to happen once a week.

Installing and Configuring R

Okay, now that your server is set up (you should be able to view the default NGINX page at http://your-domain-name.com), it’s time to install R.

Set the CRAN Repository in Ubuntu’s sources.list

The first step is to add your favorite CRAN repository to Ubuntu’s sources list. This will ensure that you get the latest version of R when you install it. To open and edit the sources list, type the following:


$ sudo nano /etc/apt/sources.list

Move the cursor down to the bottom of this file using the arrow keys, and add the following line at the bottom:


deb https://cran.cnr.berkeley.edu/bin/linux/ubuntu trusty/

Of course, you can substitute your favorite CRAN repo here. I like Berkeley. Don’t miss that there is a space between “ubuntu” and “trusty”. Hit CTRL+x to exit from this file. Say “yes” when they ask if you want to save your changes. The official docs on installing R packages on Ubuntu also recommend that you activate the backports repositories as well, but I found that this was already done on my DigitalOcean install.

Add the Public Key for the Ubuntu R Package

In order for Ubuntu to be able to recognize, and therefore trust, download, and install the R packages from the CRAN repo, we need to install the public key. This can be done with the following command:


$ sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 51716619E084DAB9

Install R

Run the following:


$ sudo apt-get update

$ sudo apt-get install r-base

When this is finished, you should be able to type R –version and get back the following message:


$ R --version

R version 3.2.2 (2015-08-14) -- "Fire Safety"
Copyright (C) 2015 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under the terms of the
GNU General Public License versions 2 or 3.
For more information about these matters see
http://www.gnu.org/licenses/.

If you get this, you’ll know that R was successfully installed on your server. If not, you’ll need to do some troubleshooting.

Configure R to Use curl and Your CRAN Repository of Choice

Type the following to open up the Rprofile.site file:


$ sudo pico /etc/R/Rprofile.site

You may delete all of the content and add the following:


options(download.file.method="libcurl")

local({
    r <- getOption("repos")
    r["CRAN"] <- "https://cran.rstudio.com/"
    options(repos=r)
})

This will allow us to run install.packages('packagename') without specifying the repository later.

Install Dependencies and Packages Needed by Shiny Server

We’re going to need the devtools package, which means we need to install the libraries upon which it depends first (libcurl and libxml2):


$ sudo apt-get -y build-dep libcurl4-gnutls-dev

$ sudo apt-get -y install libcurl4-gnutls-dev

$ sudo apt-get -y build-dep libxml2-dev

$ sudo apt-get -y install libxml2-dev

Now we can install devtools, rsconnect, and rmarkdown:


$ sudo su - -c "R -e \"install.packages('devtools')\""

$ sudo su - -c "R -e \"devtools::install_github('rstudio/rsconnect')\""

$ sudo su - -c "R -e \"install.packages('rmarkdown')\""

$ sudo su - -c "R -e \"install.packages('shiny')\""

Install Shiny Server

Okay! Now we’re finally ready to install Shiny Server. Run the following:


$ cd ~ 
$ sudo apt-get install gdebi-core
$ wget https://download3.rstudio.org/ubuntu-12.04/x86_64/shiny-server-1.4.1.759-amd64.deb
$ sudo gdebi shiny-server-1.4.1.759-amd64.deb

At this point, your Shiny Server should be up and running, but we can’t visit it on the web yet because by default, it runs on port 3838, which is blocked by the firewall we set up earlier. We’re now going to secure it, and use a reverse proxy to run it through NGINX.

Install an SSL Certificate with Let’s Encrypt

Let’s Encrypt is a new, free service that will allow you to install a trusted SSL certificate on your server. Since Google and Mozilla are working hard to phase out all non-HTTPS traffic on the web, it’s a good idea to get into the habit of installing SSL certs whenever you set up a new website. First install git, then use it to download letsencrypt:


$ sudo apt-get install git
$ git clone https://github.com/letsencrypt/letsencrypt
$ cd letsencrypt

Now before we install the certificate, we have to stop our web server (NGINX). In the code below, replace yourdomain.com with your actual domain name that you registered for this site.


$ sudo service nginx stop
$ sudo ./letsencrypt-auto certonly --standalone -d yourdomain.com -d www.yourdomain.com

If all goes well, it should have installed your new certificates in the /etc/letsencrypt/live/yourdomain.com folder.

Configure the Reverse Proxy on NGINX

Open up the following file for editing:


$ sudo nano /etc/nginx/nginx.conf

And add the following lines near the bottom of the main http block, just before the section labeled “Virtual Host Configs”. In my file, this started around line 62:


...

##
# Map proxy settings for RStudio
##
map $http_upgrade $connection_upgrade {
    default upgrade;
    '' close;
}

##
# Virtual Host Configs
##
...

And then open up the default site config file:


$ sudo nano /etc/nginx/sites-available/default

And replace its contents with the following. Note you should replace yourdomain.com with your actual domain name, and 123.123.123.123 with the actual IP address of your server.


server {
   listen 80 default_server;
   listen [::]:80 default_server ipv6only=on;
   server_name yourdomain.com www.yourdomain.com;
   return 301 https://$server_name$request_uri;
}
server {
   listen 443 ssl;
   server_name yourdomain.com www.yourdomain.com;
   ssl_certificate /etc/letsencrypt/live/yourdomain.com/fullchain.pem;
   ssl_certificate_key /etc/letsencrypt/live/yourdomain.com/privkey.pem;
   ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
   ssl_prefer_server_ciphers on;
   ssl_ciphers AES256+EECDH:AES256+EDH:!aNULL;

   location / {
       proxy_pass http://123.123.123.123:3838;
       proxy_redirect http://123.123.123.123:3838/ https://$host/;
       proxy_http_version 1.1;
       proxy_set_header Upgrade $http_upgrade;
       proxy_set_header Connection $connection_upgrade;
       proxy_read_timeout 20d;
   }
}

Now start NGINX up again:


$ sudo service nginx start

And if all went well, your new Shiny Server should be up and running at https://yourdomain.com!

Note that even if you try to go to the insecure URL, traffic will be automatically redirected through HTTPS.

Setting Appropriate Permissions

Sometimes, your Shiny apps will need access to the filesystem to read or write files. Since the Shiny server runs as the user shiny, and since all the files that are being served are owned by root, then your apps will crash when they try to access files. I like Dean Attali’s solution. Run the following commands, substituting yourusername with the username you are using to access the server:


$ sudo groupadd shiny-apps
$ sudo usermod -aG shiny-apps yourusername
$ sudo usermod -aG shiny-apps shiny
$ cd /srv/shiny-server
$ sudo chown -R yourusername:shiny-apps .
$ sudo chmod g+w .
$ sudo chmod g+s .

In the future, any time you add files under /srv/shiny-server, you may need to change the permissions so the Shiny server can read them. We’ll do that in a moment.

Installing a New App

Finally, I’m going to show you how to put a new app on the server. We’re going to use the app that Nicole created and add it into the “sample apps” folder. Run the following:


$ cd /srv/shiny-server/sample-apps
$ mkdir sampdistclt
$ cd sampdistclt
$ nano server.R

This will create a new file called server.R and open it for editing. Copy and paste the second half of the code from Nicole’s post (the part that starts with ## server) into this file. Save and exit. Now create a second file in this directory called ui.R and paste the code from the first half of Nicole’s post (the part that starts with ## ui up to but not including the part that starts ## server). Save and exit.

Now you need to make sure that the permissions are set correctly:


$ chown -R :shiny-apps .

You may also need to restart the Shiny and/or NGINX servers. The commands to do that are:


$ sudo service nginx restart
$ sudo service shiny-server restart

If all has gone well, you can now view the app up and running at https://yourdomain.com/sample-apps/sampdistclt!

Conclusion

I haven’t had a lot of time to use this configuration, so please let me know if you find any bugs, or things that need to be tweaked. On the plus side, this configuration may be cheaper than using ShinyApps.io, but it also doesn’t have all the cool bells and whistles that you get there, either, like their user interface and monitoring traffic. At the very least, it should be a way to experiment, and put things out in the public for others to play with. Enjoy!

A Discrete Time Markov Chain (DTMC) SIR Model in R

Image Credit: Doug Buckley of http://hyperactive.to

Image Credit: Doug Buckley of http://hyperactive.to

There are many different techniques that be used to model physical, social, economic, and conceptual systems. The purpose of this post is to show how the Kermack-McKendrick (1927) formulation of the SIR Model for studying disease epidemics (where S stands for Susceptible, I stands for Infected, and R for Recovered) can be easily implemented in R as a discrete time Markov Chain using the markovchain package.

A Discrete Time Markov Chain (DTMC) is a model for a random process where one or more entities can change state between distinct timesteps. For example, in SIR, people can be labeled as Susceptible (haven’t gotten a disease yet, but aren’t immune), Infected (they’ve got the disease right now), or Recovered (they’ve had the disease, but no longer have it, and can’t get it because they have become immune). If they get the disease, they change states from Susceptible to Infected. If they get well, they change states from Infected to Recovered. It’s impossible to change states between Susceptible and Recovered, without first going through the Infected state. It’s totally possible to stay in the Susceptible state between successive checks on the population, because there’s not a 100% chance you’ll actually be infected between any two timesteps. You might have a particularly good immune system, or maybe you’ve been hanging out by yourself for several days programming.

Discrete time means you’re not continuously monitoring the state of the people in the system. It would get really overwhelming if you had to ask them every minute “Are you sick yet? Did you get better yet?” It makes more sense to monitor individuals’ states on a discrete basis rather than continuously, for example, like maybe once a day. (Ozgun & Barlas (2009) provide a more extensive illustration of the difference between discrete and continuous modeling, using a simple queuing system.)

To create a Markov Chain in R, all you need to know are the 1) transition probabilities, or the chance that an entity will move from one state to another between successive timesteps, 2) the initial state (that is, how many entities are in each of the states at time t=0), and 3) the markovchain package in R. Be sure to install markovchain before moving forward.

Imagine that there’s a 10% infection rate, and a 20% recovery rate. That implies that 90% of Susceptible people will remain in the Susceptible state, and 80% of those who are Infected will move to the Recovered Category, between successive timesteps. 100% of those Recovered will stay recovered. None of the people who are Recovered will become Susceptible.

Say that you start with a population of 100 people, and only 1 is infected. That means your “initial state” is that 99 are Susceptible, 1 is Infected, and 0 are Recovered. Here’s how you set up your Markov Chain:

library(markovchain)
mcSIR <- new("markovchain", states=c("S","I","R"),
    transitionMatrix=matrix(data=c(0.9,0.1,0,0,0.8,0.2,0,0,1),
    byrow=TRUE, nrow=3), name="SIR")
initialState <- c(99,1,0)

At this point, you can ask R to see your transition matrix, which shows the probability of moving FROM each of the three states (that form the rows) TO each of the three states (that form the columns).

> show(mcSIR)
SIR
 A  3 - dimensional discrete Markov Chain with following states
 S I R 
 The transition matrix   (by rows)  is defined as follows
    S   I   R
S 0.9 0.1 0.0
I 0.0 0.8 0.2
R 0.0 0.0 1.0

You can also plot your transition probabilities:

plot(mcSIR,package="diagram")

dtmc-sir-transitionnetwork

But all we’ve done so far is to create our model. We haven’t yet done a simulation, which would show us how many people are in each of the three states as you move from one discrete timestep to many others. We can set up a data frame to contain labels for each timestep, and a count of how many people are in each state at each timestep. Then, we fill that data frame with the results after each timestep i, calculated by initialState*mcSIR^i:

timesteps <- 100
sir.df <- data.frame( "timestep" = numeric(),
 "S" = numeric(), "I" = numeric(),
 "R" = numeric(), stringsAsFactors=FALSE)
 for (i in 0:timesteps) {
newrow <- as.list(c(i,round(as.numeric(initialState * mcSIR ^ i),0)))
 sir.df[nrow(sir.df) + 1, ] <- newrow
 }

Now that we have a data frame containing our SIR results (sir.df), we can display them to see what the values look like:

> head(sir.df)
  timestep  S  I  R
1        0 99  1  0
2        1 89 11  0
3        2 80 17  2
4        3 72 22  6
5        4 65 25 10
6        5 58 26 15

And then plot them to view our simulation results using this DTMC SIR Model:

plot(sir.df$timestep,sir.df$S)
points(sir.df$timestep,sir.df$I, col="red")
points(sir.df$timestep,sir.df$R, col="green")

dtmc-sir-simulation

It’s also possible to use the markovchain package to identify elements of your system as it evolves over time:

> absorbingStates(mcSIR)
[1] "R"
> transientStates(mcSIR)
[1] "S" "I"
> steadyStates(mcSIR)
     S I R
[1,] 0 0 1

And you can calculate the first timestep that your Markov Chain reaches its steady state (the “time to absorption”), which your plot should corroborate:

> ab.state <- absorbingStates(mcSIR)
> occurs.at <- min(which(sir.df[,ab.state]==max(sir.df[,ab.state])))
> (sir.df[row,]$timestep)+1
[1] 58

You can use this code to change the various transition probabilities to see what the effects are on the outputs yourself (sensitivity analysis). Also, there are methods you can use to perform uncertainty analysis, e.g. putting confidence intervals around your transition probabilities. We won’t do either of these here, nor will we create a Shiny app to run this simulation, despite the significant temptation.

My Second (R) Shiny App: Sampling Distributions & CLT

Image Credit: Doug Buckley of http://hyperactive.to

Image Credit: Doug Buckley of http://hyperactive.to

I was so excited about my initial foray into Shiny development using jennybc‘s amazing googlesheets package that I stayed up half the night last night (again) working on my second Shiny app: a Shiny-fied version of the function I shared in March to do simulations illustrating sampling distributions and the Central Limit Theorem using many different source distributions. (Note that Cauchy doesn’t play by the rules!) Hope this info is useful to all new Shiny developers.

If the app doesn’t work for you, it’s possible that I’ve exhausted my purchased hours at http://shinyapps.io — no idea how much traffic this post might generate. So if that happens to you, please try getting Shiny to work locally, cutting and pasting the code below into server.R and ui.R files, and then launching the simulation from your R console.

Here are some important lessons I learned on my 2nd attempt at Shiny development:

  • Creating a container (rv) for the server-side values that would change as a result of inputs from the UI was important. That container was then available to the portions of my Shiny code that prepared data for the UI, e.g. output$plotSample.
  • Because switch only takes arguments that are 1 character long, using radio buttons in the Shiny UI was really useful: I can map the label on each radio button to one character that will get passed into the data processing on the server side.
  • I was able to modify the CSS for the page by adding a couple lines to mainPanel() in my UI.
  • Although it was not mentally easy (for me) to convert from an R function to a Shiny app when initially presented with the problem, in retrospect, it was indeed straightforward. All I had to do was take the original function, split out the data processing from the presentation (par & hist commands), put the data processing code on the server side and the presentation code on the UI side, change the variable names on the server side so that they had the input$ prefix, and make sure the variable names were consistent between server and UI.
  • I originally tried writing one app.R file, but http://shinyapps.io did not seem to like that, so I put all the code that was not UI into the server side and tried deploying with server.R and ui.R, which worked. I don’t know what I did wrong.
  • If you want to publish to http://shinyapps.io, the directory name that hosts your files must be at least 4 characters long or you will get a “validation error” when you attempt to deployApp().
## Nicole's Second Shiny Demo App
## N. Radziwill, 12/6/2015, http://qualityandinnovation.com
## Used code from http://github.com/homerhanumat as a base
###########################################################
## ui
###########################################################

ui <- fluidPage(
titlePanel('Sampling Distributions and the Central Limit Theorem'),
sidebarPanel(
helpText('Choose your source distribution and number of items, n, in each
sample. 10000 replications will be run when you click "Sample Now".'),
h6(a("Read an article about this simulation at http://www.r-bloggers.com",
href="http://www.r-bloggers.com/sampling-distributions-and-central-limit-theorem-in-r/", target="_blank")),
sliderInput(inputId="n","Sample Size n",value=30,min=5,max=100,step=2),
radioButtons("src.dist", "Distribution type:",
c("Exponential: Param1 = mean, Param2 = not used" = "E",
"Normal: Param1 = mean, Param2 = sd" = "N",
"Uniform: Param1 = min, Param2 = max" = "U",
"Poisson: Param1 = lambda, Param2 = not used" = "P",
"Cauchy: Param1 = location, Param2 = scale" = "C",
"Binomial: Param1 = size, Param2 = success prob" = "B",
"Gamma: Param1 = shape, Param2 = scale" = "G",
"Chi Square: Param1 = df, Param2 = ncp" = "X",
"Student t: Param1 = df, Param2 = not used" = "T")),
numericInput("param1","Parameter 1:",10),
numericInput("param2","Parameter 2:",2),
actionButton("takeSample","Sample Now")
), # end sidebarPanel
mainPanel(
# Use CSS to control the background color of the entire page
tags$head(
tags$style("body {background-color: #9999aa; }")
),
plotOutput("plotSample")
) # end mainPanel
) # end UI

##############################################################
## server
##############################################################

library(shiny)
r <- 10000 # Number of replications... must be ->inf for sampling distribution!

palette(c("#E41A1C", "#377EB8", "#4DAF4A", "#984EA3",
"#FF7F00", "#FFFF33", "#A65628", "#F781BF", "#999999"))

server <- function(input, output) {
set.seed(as.numeric(Sys.time()))

# Create a reactive container for the data structures that the simulation
# will produce. The rv$variables will be available to the sections of your
# server code that prepare output for the UI, e.g. output$plotSample
rv <- reactiveValues(sample = NULL,
all.sums = NULL,
all.means = NULL,
all.vars = NULL)

# Note: We are giving observeEvent all the output connected to the UI actionButton.
# We can refer to input variables from our UI as input$variablename
observeEvent(input$takeSample,
{
my.samples <- switch(input$src.dist,
"E" = matrix(rexp(input$n*r,input$param1),r),
"N" = matrix(rnorm(input$n*r,input$param1,input$param2),r),
"U" = matrix(runif(input$n*r,input$param1,input$param2),r),
"P" = matrix(rpois(input$n*r,input$param1),r),
"C" = matrix(rcauchy(input$n*r,input$param1,input$param2),r),
"B" = matrix(rbinom(input$n*r,input$param1,input$param2),r),
"G" = matrix(rgamma(input$n*r,input$param1,input$param2),r),
"X" = matrix(rchisq(input$n*r,input$param1),r),
"T" = matrix(rt(input$n*r,input$param1),r))

# It was very important to make sure that rv contained numeric values for plotting:
rv$sample <- as.numeric(my.samples[1,])
rv$all.sums <- as.numeric(apply(my.samples,1,sum))
rv$all.means <- as.numeric(apply(my.samples,1,mean))
rv$all.vars <- as.numeric(apply(my.samples,1,var))
}
)

output$plotSample <- renderPlot({
# Plot only when user input is submitted by clicking "Sample Now"
if (input$takeSample) {
# Create a 2x2 plot area & leave a big space (5) at the top for title
par(mfrow=c(2,2), oma=c(0,0,5,0))
hist(rv$sample, main="Distribution of One Sample",
ylab="Frequency",col=1)
hist(rv$all.sums, main="Sampling Distribution of the Sum",
ylab="Frequency",col=2)
hist(rv$all.means, main="Sampling Distribution of the Mean",
ylab="Frequency",col=3)
hist(rv$all.vars, main="Sampling Distribution of the Variance",
ylab="Frequency",col=4)
mtext("Simulation Results", outer=TRUE, cex=3)
}
}, height=660, width=900) # end plotSample

} # end server
« Older Entries Recent Entries »