
Performance Measures for Classifiers: Precision, Recall, and F1

Here is a new, simple tutorial on how to evaluate the quality of a classifier. The attached doc shows you how to construct a confusion matrix, compute the precision, recall, and F1 scores for a classifier, and construct a precision/recall chart in R to compare the relative strengths and weaknesses of different classifiers.

performance-measures-classifiers-75-925

Granted, these measures are not perfect. Powers (2011), in the Journal of Machine Learning Technologies, advises that they should not be used without a clear understanding of the biases, especially considering the power of intelligent prediction vs. the power of the guess. However, they should provide a decent basis for practitioners to compare different classification strategies. (Notice that you don’t even need algorithms to do this… you can generate a confusion matrix from any plant operation or business activity where classification is performed!)
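
To make the arithmetic concrete, here is a minimal sketch in R of how a confusion matrix turns into precision, recall, and F1. The ten labels are made up purely for illustration (they are not taken from the attached doc), and the variable names are my own choices; treat it as one reasonable way to do the calculation rather than the only one.

```r
# Hypothetical outcomes for ten items: what really happened ("actual")
# versus what the classifier said ("predicted"). Illustrative data only.
actual <- factor(
  c("pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg", "neg", "neg"),
  levels = c("pos", "neg")
)
predicted <- factor(
  c("pos", "pos", "pos", "neg", "pos", "pos", "neg", "neg", "neg", "neg"),
  levels = c("pos", "neg")
)

# Confusion matrix: rows are predictions, columns are the truth.
cm <- table(Predicted = predicted, Actual = actual)
print(cm)

tp <- cm["pos", "pos"]  # true positives: predicted pos, actually pos
fp <- cm["pos", "neg"]  # false positives: predicted pos, actually neg
fn <- cm["neg", "pos"]  # false negatives: predicted neg, actually pos

# Precision: of everything flagged positive, how much was right?
precision <- tp / (tp + fp)
# Recall: of everything truly positive, how much did we catch?
recall <- tp / (tp + fn)
# F1: harmonic mean of precision and recall.
f1 <- 2 * precision * recall / (precision + recall)

print(c(precision = precision, recall = recall, f1 = f1))
```

The same three lines of arithmetic work on counts tallied by hand from a plant operation or business process; only the four cells of the matrix are needed.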

Access METAR Weather Data in R Statistical Software

Although this is neither quality- nor innovation-related, as a multi-decade weather geek and degreed meteorologist, I still really love my weather data. Today I wanted to learn how to retrieve historical weather data from within the R statistical software. I managed to get a vector within R that contains the METAR observations for an entire day for one observing station. Here’s how I did it!

1. First, I signed up for a KEY to use the Weather Underground API at http://www.wunderground.com/weather/api/ – I’m not going to tell you what my personal key is, but it has 16 characters and looks kind of like this: d7000XXXXXXXXXXX

2. Next, I installed the rjson package into R with install.packages("rjson").

3. Then, I used this code to find out that there were 46 observations for August 11, 2012 (the date of interest). You’ll have to try it with YOUR new Weather Underground API key in place of the d7000XXX… :


library(rjson)

# BUILD THE REQUEST URL IN PIECES SO IT CONTAINS NO SPACES OR
# CARRIAGE RETURNS, AND USE YOUR OWN API KEY IN PLACE OF d7000XXX
api.key <- "d7000XXX"
url <- paste0("http://api.wunderground.com/api/", api.key,
              "/history_20120811/q/VA/Charlottesville.json")

# rjson READS FROM A URL THROUGH THE file ARGUMENT
x <- fromJSON(file = url)

# THIS WILL TELL US HOW MANY OBSERVATIONS WE HAVE
length(x$history$observations)

# GET ALL METARS FOR THE WHOLE DAY AND STORE THEM IN A CHARACTER VECTOR
daily.metars <- rep(NA_character_, length(x$history$observations))
for (n in seq_along(x$history$observations)) {
  daily.metars[n] <- x$history$observations[[n]]$metar
}

4. Now you have a character vector in R called daily.metars that holds all of your METARs for the day as strings! Here are the first six entries from the vector that I produced:

> head(daily.metars)
[1] "SPECI KCHO 110408Z AUTO 00000KT 5SM VCTS -RA BR SCT003 BKN030 OVC075 21/19 A2983 RMK AO2 LTG DSNT SW P0001"
[2] "SPECI KCHO 110433Z AUTO 20005KT 6SM -TSRA BR FEW003 BKN033 OVC110 21/19 A2984 RMK AO2 LTG DSNT NE AND S AND SW TSE11B27 P0002"
[3] "METAR KCHO 110453Z AUTO 00000KT 5SM -TSRA BR SCT018 SCT046 OVC100 21/19 A2983 RMK AO2 LTG DSNT NE-S TSE11B27 SLP093 P0003 T02060194 402940200"
[4] "SPECI KCHO 110529Z AUTO 00000KT 8SM -RA FEW070 SCT095 BKN110 21/19 A2982 RMK AO2 LTG DSNT NE-SE TSE23 P0001"
[5] "METAR KCHO 110553Z AUTO 00000KT 8SM FEW085 21/19 A2982 RMK AO2 LTG DSNT NE-SE TSE23RAE30 SLP090 P0001 60127 T02060194 10250 20200 58004"
[6] "METAR KCHO 110653Z AUTO 18005KT 8SM BKN090 21/19 A2983 RMK AO2 SLP093 T02060194"
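
Once the raw strings are in hand, a natural next step is to pull individual groups out of each report. The following is a rough sketch, using only base R, that extracts the observation time and the temperature/dew point group from two of the METARs shown above. The helper name first.match and the regular expressions are my own assumptions for illustration; they cover the reports printed here but are not a complete METAR decoder (for example, reports with a missing dew point will come back as NA).

```r
# Two of the METAR strings printed above, copied verbatim.
metars <- c(
  "SPECI KCHO 110408Z AUTO 00000KT 5SM VCTS -RA BR SCT003 BKN030 OVC075 21/19 A2983 RMK AO2 LTG DSNT SW P0001",
  "METAR KCHO 110653Z AUTO 18005KT 8SM BKN090 21/19 A2983 RMK AO2 SLP093 T02060194"
)

# Split each report into its space-separated groups.
groups <- strsplit(metars, " ", fixed = TRUE)

# Hypothetical helper: first group in a report that matches a pattern,
# or NA if nothing matches.
first.match <- function(g, pattern) {
  hit <- grep(pattern, g, value = TRUE)
  if (length(hit) > 0) hit[1] else NA_character_
}

# Day of month plus UTC time, e.g. "110408Z" = the 11th at 04:08Z.
obs.time <- vapply(groups, first.match, character(1),
                   pattern = "^[0-9]{6}Z$")

# Temperature/dew point in whole degrees Celsius, e.g. "21/19".
# A leading "M" means minus.
temp.dew <- vapply(groups, first.match, character(1),
                   pattern = "^M?[0-9]{2}/M?[0-9]{2}$")

# Convert "21" or "M03" to a number.
to.celsius <- function(s) as.numeric(sub("^M", "-", s))

parts  <- strsplit(temp.dew, "/", fixed = TRUE)
temp.c <- vapply(parts, function(p) to.celsius(p[1]), numeric(1))
dew.c  <- vapply(parts, function(p) to.celsius(p[2]), numeric(1))

# Collect the pieces in a data frame, one row per report.
obs <- data.frame(time = obs.time, temp.c = temp.c, dew.c = dew.c,
                  stringsAsFactors = FALSE)
print(obs)
```

Replacing metars with daily.metars should apply the same extraction to the whole day, assuming every report follows the layout seen in the sample above.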