A Discrete Time Markov Chain (DTMC) SIR Model in R Image Credit: Doug Buckley of http://hyperactive.to

There are many different techniques that be used to model physical, social, economic, and conceptual systems. The purpose of this post is to show how the Kermack-McKendrick (1927) formulation of the SIR Model for studying disease epidemics (where S stands for Susceptible, I stands for Infected, and R for Recovered) can be easily implemented in R as a discrete time Markov Chain using the markovchain package.

A Discrete Time Markov Chain (DTMC) is a model for a random process where one or more entities can change state between distinct timesteps. For example, in SIR, people can be labeled as Susceptible (haven’t gotten a disease yet, but aren’t immune), Infected (they’ve got the disease right now), or Recovered (they’ve had the disease, but no longer have it, and can’t get it because they have become immune). If they get the disease, they change states from Susceptible to Infected. If they get well, they change states from Infected to Recovered. It’s impossible to change states between Susceptible and Recovered, without first going through the Infected state. It’s totally possible to stay in the Susceptible state between successive checks on the population, because there’s not a 100% chance you’ll actually be infected between any two timesteps. You might have a particularly good immune system, or maybe you’ve been hanging out by yourself for several days programming.

Discrete time means you’re not continuously monitoring the state of the people in the system. It would get really overwhelming if you had to ask them every minute “Are you sick yet? Did you get better yet?” It makes more sense to monitor individuals’ states on a discrete basis rather than continuously, for example, like maybe once a day. (Ozgun & Barlas (2009) provide a more extensive illustration of the difference between discrete and continuous modeling, using a simple queuing system.)

To create a Markov Chain in R, all you need to know are the 1) transition probabilities, or the chance that an entity will move from one state to another between successive timesteps, 2) the initial state (that is, how many entities are in each of the states at time t=0), and 3) the markovchain package in R. Be sure to install markovchain before moving forward.

Imagine that there’s a 10% infection rate, and a 20% recovery rate. That implies that 90% of Susceptible people will remain in the Susceptible state, and 80% of those who are Infected will move to the Recovered Category, between successive timesteps. 100% of those Recovered will stay recovered. None of the people who are Recovered will become Susceptible.

Say that you start with a population of 100 people, and only 1 is infected. That means your “initial state” is that 99 are Susceptible, 1 is Infected, and 0 are Recovered. Here’s how you set up your Markov Chain:

library(markovchain)
mcSIR <- new("markovchain", states=c("S","I","R"),
transitionMatrix=matrix(data=c(0.9,0.1,0,0,0.8,0.2,0,0,1),
byrow=TRUE, nrow=3), name="SIR")
initialState <- c(99,1,0)

At this point, you can ask R to see your transition matrix, which shows the probability of moving FROM each of the three states (that form the rows) TO each of the three states (that form the columns).

> show(mcSIR)
SIR
A  3 - dimensional discrete Markov Chain with following states
S I R
The transition matrix   (by rows)  is defined as follows
S   I   R
S 0.9 0.1 0.0
I 0.0 0.8 0.2
R 0.0 0.0 1.0

You can also plot your transition probabilities:

plot(mcSIR,package="diagram") But all we’ve done so far is to create our model. We haven’t yet done a simulation, which would show us how many people are in each of the three states as you move from one discrete timestep to many others. We can set up a data frame to contain labels for each timestep, and a count of how many people are in each state at each timestep. Then, we fill that data frame with the results after each timestep i, calculated by initialState*mcSIR^i:

timesteps <- 100
sir.df <- data.frame( "timestep" = numeric(),
"S" = numeric(), "I" = numeric(),
"R" = numeric(), stringsAsFactors=FALSE)
for (i in 0:timesteps) {
newrow <- as.list(c(i,round(as.numeric(initialState * mcSIR ^ i),0)))
sir.df[nrow(sir.df) + 1, ] <- newrow
}

Now that we have a data frame containing our SIR results (sir.df), we can display them to see what the values look like:

timestep  S  I  R
1        0 99  1  0
2        1 89 11  0
3        2 80 17  2
4        3 72 22  6
5        4 65 25 10
6        5 58 26 15

And then plot them to view our simulation results using this DTMC SIR Model:

plot(sir.df\$timestep,sir.df\$S)
points(sir.df\$timestep,sir.df\$I, col="red")
points(sir.df\$timestep,sir.df\$R, col="green") It’s also possible to use the markovchain package to identify elements of your system as it evolves over time:

> absorbingStates(mcSIR)
 "R"
> transientStates(mcSIR)
 "S" "I"
S I R
[1,] 0 0 1

And you can calculate the first timestep that your Markov Chain reaches its steady state (the “time to absorption”), which your plot should corroborate:

> ab.state <- absorbingStates(mcSIR)
> occurs.at <- min(which(sir.df[,ab.state]==max(sir.df[,ab.state])))
> (sir.df[row,]\$timestep)+1
 58

You can use this code to change the various transition probabilities to see what the effects are on the outputs yourself (sensitivity analysis). Also, there are methods you can use to perform uncertainty analysis, e.g. putting confidence intervals around your transition probabilities. We won’t do either of these here, nor will we create a Shiny app to run this simulation, despite the significant temptation.

• Reblogged this on d2ab.

• Pingback: Distilled News | Data Analytics & R

• I think the last line should read
sir.df[occurs.at,]\$timestep+1

and there is an extra &g; in front of absorbingStates(mcSIR) Are you reading from R Bloggers or from qualityandinnovation.com?

When R Bloggers picked up the post, it mangled my code and erased some characters. All of the characters should be visible if you view the original article.

• @Nicole and @Holger: Hi!

I respectfully disagree with adding 1 to:
(sir.df[row,]\$timestep) +1 # result: 58 days to reach Steady State (nope…)

I think it should simply be:
(sir.df[row,]\$timestep) # result: 57 days to reach Steady State (yes!)

Why?
Because, if you do a

line# timestamp S I R

56 55 0 0 99
57 56 0 0 99
58 57 0 0 100 <== Steady State here, for timestamp=57!
59 58 0 0 100
60 59 0 0 100

Steady State occurs at \$timestamp = 57 (the SECOND col in the head output…).
The 1st col is just the line# of the head() output…don't use that. Don't do "+1"…

The confusion arises because
line# (1st col) starts at "1" and
\$timestamp col starts at "0".
Time reference column (2nd col) is the one we should use.

Well, pls correct me if I'm wrong
You know…it's happened before. 🙂
—————————————————-

Now for my question:

when I type, as per the tutorial:

plot(mcSIR,package="diagram")

I do get a network plot of the Markov Chain
but some nodes and arrows are "cut off".

My plot is correct but it look very different
from the plot image in your article…

Both packages:
markovchain and diagram