Jul 20, 2018

Yogi Berra says, “You can see a lot just by observing.”

The data explored here are daily average temperatures collected from January 1, 1955 to August 13, 2010 at weather stations in Raleigh, McGuire Air Force Base, Fairbanks and New Orleans. These slides walk you through several approaches to analyzing large data sets that contain seasonality and high day-to-day variability.

Has it gotten warmer at the four sites or not?

A good place to start is just plotting the data. Below are plots of the average daily temperatures (in degrees Fahrenheit) for all four locations. What do these plots indicate?

Figure 1: Plots of Average daily temperature

Comparing Two Different Years

The next slide has an applet that lets you pick two years and compare their daily temperatures. Can you find two years that suggest temperatures are rising? Can you find two years that suggest temperatures are falling? Is one year obviously warmer or cooler than another? Discuss any other observations.

Review of Boxplots

Boxplots are one way to summarize data to get a sense of the overall distribution.


The next slide has an applet that displays boxplots. You can choose yearly, monthly or daily. For example, choosing yearly creates one boxplot for each year, while choosing monthly creates 12 boxplots, one for each month.

Look at all three (yearly, monthly and daily) and discuss your observations.
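If you want to reproduce the grouping behind these boxplots outside the applet, the following is a minimal sketch. It uses made-up temperatures (a seasonal cycle plus random noise), not the module's data, and the helper names `synthetic_temps`, `five_number_summary` and `boxplot_stats` are hypothetical. It computes only the five-number summary (minimum, quartiles, maximum); plotting packages differ in how they define quartiles and draw whiskers, so the applet's boxes may not match these numbers exactly.

```python
import math
import random
import statistics
from collections import defaultdict
from datetime import date, timedelta


def synthetic_temps(start, end, seed=0):
    """Return {date: temperature} for a made-up station (NOT the module's data)."""
    rng = random.Random(seed)
    temps = {}
    day = start
    while day <= end:
        doy = day.timetuple().tm_yday
        # Coldest in mid-January, warmest in mid-July, plus day-to-day noise.
        seasonal = 60 - 20 * math.cos(2 * math.pi * (doy - 15) / 365.25)
        temps[day] = seasonal + rng.gauss(0, 8)
        day += timedelta(days=1)
    return temps


def five_number_summary(values):
    """Minimum, lower quartile, median, upper quartile, maximum."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


def boxplot_stats(temps, key):
    """Group the readings by key(date) and summarize each group."""
    groups = defaultdict(list)
    for day, temp in temps.items():
        groups[key(day)].append(temp)
    return {g: five_number_summary(v) for g, v in sorted(groups.items())}


temps = synthetic_temps(date(1955, 1, 1), date(1959, 12, 31))
by_year = boxplot_stats(temps, key=lambda d: d.year)    # one summary per year
by_month = boxplot_stats(temps, key=lambda d: d.month)  # 12 summaries, one per month

for month, stats in by_month.items():
    print(month, [round(s, 1) for s in stats])
```

Changing the `key` function is all it takes to switch between the yearly and monthly views.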



It can be difficult to see changes over time because of the seasonal effect. One way to eliminate this effect is to plot the temperature for the same day each year. The next slide has an applet that lets you pick a date and plots the temperature on that date for each year from 1955 to 2010. The monthly option plots the average temperature for the month you selected rather than for the particular day.

Pick a day in the winter and one in the summer. What do you notice about the variability? Does this make sense?

Is there more or less variability with the monthly option than with the daily option? Why?
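To compare the two options numerically rather than by eye, you could compute the year-to-year standard deviation of each series. The sketch below does this on made-up temperatures (a seasonal cycle plus independent daily noise), not on the module's data; the function names `one_day_series` and `monthly_mean_series` are hypothetical. Real temperatures are correlated from one day to the next, so the numbers you get from the actual stations will differ.

```python
import math
import random
import statistics
from datetime import date, timedelta

# Made-up daily temperatures (NOT the module's data): seasonal cycle plus noise.
rng = random.Random(3)
temps = {}
day = date(1955, 1, 1)
while day <= date(1984, 12, 31):
    doy = day.timetuple().tm_yday
    temps[day] = 60 - 20 * math.cos(2 * math.pi * doy / 365.25) + rng.gauss(0, 8)
    day += timedelta(days=1)

years = range(1955, 1985)


def one_day_series(temps, years, month, day_of_month):
    """Temperature on the same calendar date in every year (avoid February 29)."""
    return [temps[date(y, month, day_of_month)] for y in years]


def monthly_mean_series(temps, years, month):
    """Average temperature of the chosen month in every year."""
    return [
        statistics.fmean(t for d, t in temps.items() if d.year == y and d.month == month)
        for y in years
    ]


daily = one_day_series(temps, years, 1, 15)      # January 15 of each year
monthly = monthly_mean_series(temps, years, 1)   # January average of each year

# Year-to-year spread of each series, in degrees.
print(round(statistics.stdev(daily), 2), round(statistics.stdev(monthly), 2))
```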


Intro to Time Series

Another way to remove seasonality from a time series is to compare points that should be the same with respect to seasonal effects. For instance, there should be no seasonal effect if you plot only the temperature readings taken on January 1 of each year, or only those taken on August 17, which is what the previous applet did. Let’s look for a more holistic approach.

The temperature data has a seasonal component with a period of 365 days. Letting \(T_t\) denote the temperature reading at time \(t\), the following differences remove the seasonal component:

\[\LARGE D_t = T_t - T_{t-365}\]
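The differencing formula translates almost directly into code. The sketch below is illustrative only: the function name `seasonal_difference` is hypothetical, and the series is made up (a 365-day cycle, a small built-in trend of 0.05 degrees per year, and random noise), not the module's data. Like the formula, it treats the period as exactly 365 days and ignores leap years.

```python
import math
import random
import statistics


def seasonal_difference(series, period=365):
    """Return D_t = T_t - T_{t-period} for every t that has a value one period earlier."""
    return [series[t] - series[t - period] for t in range(period, len(series))]


# Made-up series, NOT the module's data: a 365-day cycle, an upward trend
# of 0.05 degrees per year, and day-to-day noise.
rng = random.Random(1)
trend_per_day = 0.05 / 365
n_days = 20 * 365
temps = [
    60 - 20 * math.cos(2 * math.pi * t / 365) + trend_per_day * t + rng.gauss(0, 8)
    for t in range(n_days)
]

diffs = seasonal_difference(temps)
print(len(diffs))                        # 20 * 365 - 365 = 6935 differences
print(round(statistics.mean(diffs), 3))  # compare with the 0.05 built into the series
```

The first 365 readings have no value one year earlier, so the differenced series is one year shorter than the original.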

Time Series

Does it make sense that the red line is close to 0? If there were a linear trend in temperature, then the differences should be randomly distributed about the average yearly change.
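One way to see why, under an assumed model that the module itself does not state: suppose the temperature is a straight-line trend with slope \(b\) per day, plus a seasonal term \(S_t\) that repeats every 365 days, plus noise \(\varepsilon_t\).

\[\LARGE T_t = a + b\,t + S_t + \varepsilon_t, \qquad S_t = S_{t-365}\]

Subtracting the reading from 365 days earlier cancels both the constant \(a\) and the seasonal term:

\[\LARGE D_t = T_t - T_{t-365} = 365\,b + (\varepsilon_t - \varepsilon_{t-365})\]

Under this assumption the differences scatter around \(365\,b\), the change over one year, and they scatter around 0 when there is no trend.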

Average Temperatures

We are searching for a small signal (in this case, a temperature change), if there is one, within data that are very noisy (because of day-to-day variation). One way to smooth out some of this variability is averaging. The next app plots the yearly average temperature over time.
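As a rough illustration of how much averaging smooths the series, the sketch below computes yearly means on made-up temperatures covering the same date range (not the module's data). The function name `yearly_means` and its `min_days` parameter are hypothetical. Because the record ends on August 13, 2010, the sketch drops incomplete years; whether the applet does the same is not stated here.

```python
import math
import random
import statistics
from collections import defaultdict
from datetime import date, timedelta

# Made-up daily temperatures (NOT the module's data) over the same date range.
rng = random.Random(2)
start, end = date(1955, 1, 1), date(2010, 8, 13)
temps = {}
day = start
while day <= end:
    doy = day.timetuple().tm_yday
    temps[day] = 60 - 20 * math.cos(2 * math.pi * doy / 365.25) + rng.gauss(0, 8)
    day += timedelta(days=1)


def yearly_means(temps, min_days=365):
    """Average temperature per calendar year, skipping incomplete years."""
    by_year = defaultdict(list)
    for day, temp in temps.items():
        by_year[day.year].append(temp)
    return {
        year: statistics.fmean(values)
        for year, values in sorted(by_year.items())
        if len(values) >= min_days
    }


means = yearly_means(temps)
daily_sd = statistics.stdev(list(temps.values()))
yearly_sd = statistics.stdev(list(means.values()))

print(min(means), max(means))  # 1955 2009: the partial year 2010 is dropped
print(round(daily_sd, 1), round(yearly_sd, 2))
```

Averaging over a partial year would mix in a seasonal bias, since the missing months are not a random sample of the year.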

Final Discussions

  1. Do you think you have a better sense for the data now than you did at the beginning? Explain.

  2. What questions about the data might you want to ask next?

  3. What other plots would you suggest looking at? Why?

Final Project Possibilities

  1. The module has not answered the question we began with: “Is there any observable temperature trend over this time period at the four locations?” What do you think? Support your position with evidence from the graphs.

  2. Compare two of the four locations. How are they the same and how are they different?


  1. J. M. Chambers, W. S. Cleveland, B. Kleiner, and P. Tukey, “Graphical Methods for Data Analysis,” Duxbury Press, Boston, MA, 1983.

  2. M. Frigge, D. C. Hoaglin, B. Iglewicz, “Some Implementations of the Boxplot,” The American Statistician, 43(1), pp. 50-54, 1989.

  3. R.J. Hyndman and Y. Fan, “Sample Quantiles in Statistical Packages,” The American Statistician, 50(4), pp. 361-365, 1996.

  4. R. McGill, J. W. Tukey and W. A. Larsen, “Variations of Boxplots,” The American Statistician, 32(1), pp. 12-16, 1978.

  5. NOAA, Climate data format and download instruction, 2011. ftp://ftp.ncdc.noaa.gov/pub/data/gsod/readme.txt.

  6. NOAA, Data Tables, Normal Daily Mean Temperature, Degrees F, http://www1.ncdc.noaa.gov/pub/data/ccd-data/nrmavg.txt.