Published by

Nicole Radziwill

Note: Everything in this article is easier with dplyr and magrittr in tidyverse. I’ll write a followup sometime this year.

I just wrote a new chapter for my students describing how to subset a data frame in R. The full text is available at https://docs.google.com/document/d/1K5U11-IKRkxNmitu_lS71Z6uLTQW_fp6QNbOMMwA5J8/edit?usp=sharing but here’s a preview:

Let’s load ChickWeight, one of R’s built-in datasets. It contains the weights of little chickens at 12 different times throughout their lives. The chickens are on different diets, numbered 1, 2, 3, and 4. Using the str command, we find that there are 578 observations in this data frame, and two categorical variables: Chick (an ordered factor) and Diet (a factor).

> data(ChickWeight)
> head(ChickWeight)
weight Time Chick Diet
1 42 0 1 1
2 51 2 1 1
3 59 4 1 1
4 64 6 1 1
5 76 8 1 1
6 93 10 1 1
> str(ChickWeight)
Classes ‘nfnGroupedData’, ‘nfGroupedData’, ‘groupedData’ and 'data.frame': 578 obs. of 4 variables:
$ weight: num 42 51 59 64 76 93 106 125 149 171 ...
$ Time : num 0 2 4 6 8 10 12 14 16 18 ...
$ Chick : Ord.factor w/ 50 levels "18"<"16"<"15"<..: 15 15 15 15 15 15 15 15 15 15 ...
$ Diet : Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 1 1 1 1 ...
- attr(*, "formula")=Class 'formula' length 3 weight ~ Time | Chick
.. ..- attr(*, ".Environment")=<environment: R_EmptyEnv>
- attr(*, "outer")=Class 'formula' length 2 ~Diet
.. ..- attr(*, ".Environment")=<environment: R_EmptyEnv>
- attr(*, "labels")=List of 2
..$ x: chr "Time"
..$ y: chr "Body weight"
- attr(*, "units")=List of 2
..$ x: chr "(days)"
..$ y: chr "(gm)"

Get One Column: Now that we have a data frame named ChickWeight loaded into R, we can take subsets of these 578 observations. First, let’s assume we just want to pull out the column of weights. There are two ways we can do this: specifying the column by name, or specifying the column by its order of appearance. The general form for pulling information from data frames is data.frame[rows,columns] so you can get the first column in either of these two ways:

ChickWeight[,1] # get all rows, but only the first column
ChickWeight[,c("weight")] # get all rows, and only the column named “weight”
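One detail worth checking for yourself: when you select a single column this way, R hands back a plain vector rather than a data frame. The sketch below assumes only base R. The names w_by_position, w_by_name, and w_df are made up for illustration, and drop = FALSE is the standard argument to [ that keeps the result as a one-column data frame.

```r
data(ChickWeight)

# Both forms return the same numeric vector of weights
w_by_position <- ChickWeight[, 1]
w_by_name     <- ChickWeight[, "weight"]
identical(w_by_position, w_by_name)

# A single column is simplified to a vector by default
is.data.frame(w_by_name)

# drop = FALSE keeps the result as a one-column data frame
w_df <- ChickWeight[, "weight", drop = FALSE]
is.data.frame(w_df)
dim(w_df)
```

If a later step expects a data frame (for example, another [rows,columns] subset), the drop = FALSE form is usually the safer choice.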

Get Multiple Columns: If you want more than one column, you can specify the column numbers or the names of the variables that you want to extract. If you want to get the weight and diet columns, you would do this:

ChickWeight[,c(1,4)] # get all rows, but only 1st and 4th columns
ChickWeight[,c("weight","Diet")] # get all rows, only “weight” & “Diet” columns

If you want more than one column and those columns are next to each other, you can do this:

ChickWeight[,1:3] # get all rows, but only columns 1 through 3
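To convince yourself that selecting by position and selecting by name pick out the same columns, you could compare the results. This is a minimal sketch using base R only; by_position and by_name are hypothetical variable names.

```r
data(ChickWeight)

# Look up which position each column name occupies
names(ChickWeight)

by_position <- ChickWeight[, c(1, 4)]
by_name     <- ChickWeight[, c("weight", "Diet")]

# The same two columns come back either way
names(by_position)
names(by_name)

# A range of adjacent columns
names(ChickWeight[, 1:3])
```

Selecting by name tends to be more robust, because it keeps working if the columns are ever reordered.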

Get One Row: Getting a row works like getting a column, except the row number goes before the comma and the column position is left empty:

ChickWeight[1,] # get first row, and all columns
ChickWeight[82,] # get 82nd row, and all columns

Get Multiple Rows: If you want more than one row, you can specify the row numbers you want like this:

> ChickWeight[c(1:6,15,18,27),]
weight Time Chick Diet
1 42 0 1 1
2 51 2 1 1
3 59 4 1 1
4 64 6 1 1
5 76 8 1 1
6 93 10 1 1
15 58 4 2 1
18 103 10 2 1
27 55 4 3 1
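Because the general form is data.frame[rows,columns], row and column selections can be combined in one step. The sketch below reuses the row numbers from the example above; rows_wanted and small are hypothetical names chosen for illustration.

```r
data(ChickWeight)

rows_wanted <- c(1:6, 15, 18, 27)

# Rows go before the comma, columns after it
small <- ChickWeight[rows_wanted, c("weight", "Time")]

dim(small)       # 9 rows, 2 columns
small$weight     # the weights for just those nine rows
```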

7 responses to “Taking a Subset of a Data Frame in R”

František Hána

December 13, 2016

I also often use it with a logical (boolean) condition, like ChickWeight[ChickWeight$Chick == 1, ]

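The logical-index form from the comment above can be tried as follows. This is only a sketch: is_chick1 and chick1 are hypothetical names, and the chick label is quoted here because Chick is stored as a factor, so the comparison is made against its labels.

```r
data(ChickWeight)

# TRUE for rows that belong to chick 1, FALSE for all others
is_chick1 <- ChickWeight$Chick == "1"

# Keep only the rows where the condition is TRUE, and all columns
chick1 <- ChickWeight[is_chick1, ]

nrow(chick1)
unique(as.character(chick1$Chick))
```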
Jeff

December 13, 2016

Dr. Radziwill,

This is a very nice resource. Thank you for sharing. Quick question regarding this bit:

> chick13 chick28 t.test(chick13,chick28)

Given how this data is set up, there are only 2 chicks here, with some number of weights attached to each (e.g. the number of rows for each chick). The t.test command you have assumes independent samples by default, but these weights are dependent, no? You wrote a book. I didn’t LOL! So I’m guessing there is something obvious here I’m missing, just want to know what it is.

thx in advance,
jeff

1. Nicole Radziwill
  
  December 14, 2016
  
  There’s no way to know whether the weights are dependent or paired, so I defaulted to independent samples. Some of the chick-weight-vectors don’t even report all the weights for each time! So it’s just an assumption based on missing context… no magic 🙂
  
Jeff

December 13, 2016

Alright,

so that cut-n-paste is borked in my previous post. The syntax I’m referring to is at the bottom of page 5, where you subset the weights for two different chicks (13 and 28), save each chick’s weight to a different vector (chick13 and chick28), then run a t.test.

Alex

December 14, 2016

From the title, I thought it would be about subset() and logical indexing.

1. Nicole Radziwill
  
  December 14, 2016
  
  What a good (and clearly logical) idea! I’ll add a section on subset() too — never hurts to be able to do things many, many different ways.
  
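For readers curious about the subset() approach mentioned in this thread, here is one way it might look next to the bracket form. This is a sketch under the assumption that only base R is loaded; with_subset and with_brackets are hypothetical names, and it is not the section the author refers to.

```r
data(ChickWeight)

# subset() evaluates the condition inside the data frame,
# so column names can be used without the ChickWeight$ prefix
with_subset <- subset(ChickWeight, Chick == "1", select = c(weight, Time))

# The same selection written with bracket indexing
with_brackets <- ChickWeight[ChickWeight$Chick == "1", c("weight", "Time")]

# Both approaches pick out the same weights
identical(with_subset$weight, with_brackets$weight)
```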
$

December 14, 2016

A third way to access a single column is through the $ symbol, e.g. ChickWeight$weight.


Quality and Innovation

Taking a Subset of a Data Frame in R