
english & science & visualisation Franchu on 13 Dec 2009 10:26 pm

United Kingdom’s Met Office dataset preliminary analysis

The United Kingdom’s Met Office recently released temperature data for about 1700 weather stations across the globe from 1701 to 2009.

Since I have lately been learning how to use R and Processing, I decided to give this dataset a try.

The first thing that I tried to do was to understand how the stations of the dataset are geographically distributed.
Stations distribution

Even if the projection is not very friendly, it is possible to recognise the main land masses on Earth. The density of stations is non-uniform, with some areas over-represented and others under-represented. This might affect the stability and validity of global averages over time.

Next I was interested in seeing how much data was available for each year, so I did a quick plot:
Number of measurements per year

The number of measurements increased dramatically in the mid-20th century. But what caused that sharp increase in the amount of data?

Checking the evolution of the number of stations over time, we get our answer.
Number of stations per year

It can be seen how the number of stations has increased over time for each region.
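The two counts above (measurements per year and stations per year) can be sketched as follows. The original analysis was done in R; this Python version is only an illustration, and the record layout `(station_id, year, month, temperature)` and the sample values are assumptions, not the Met Office file format.

```python
from collections import defaultdict

def counts_per_year(records):
    """Count measurements and distinct stations for each year.

    records: iterable of (station_id, year, month, temperature) tuples
    (a hypothetical layout chosen for this sketch).
    """
    measurements = defaultdict(int)
    stations = defaultdict(set)
    for station_id, year, _month, _temperature in records:
        measurements[year] += 1          # one more monthly reading in this year
        stations[year].add(station_id)   # a set keeps each station only once
    return dict(measurements), {y: len(s) for y, s in stations.items()}

# Made-up sample: one station in 1900, three stations in 1950
records = [
    ("A", 1900, 1, 2.1), ("A", 1900, 2, 3.0),
    ("A", 1950, 1, 2.4), ("B", 1950, 1, 15.2), ("C", 1950, 1, 18.9),
]
measurements, stations = counts_per_year(records)
```

Comparing the two dictionaries year by year shows whether a jump in measurements comes from more stations or from more readings per station.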

With a better understanding of how the number of stations evolved, I wanted to see whether I could find any meaningful pattern in the temperature data. So first I did an exploratory plot of the average monthly temperature for each region.
Temperature evolution per region

The first thing that caught my attention was the seasonal variation of the temperatures. Displaying them in a scatter plot hides the fact that the right end of the plot is connected to the left end (December – January). So I decided to try polar coordinates instead.
Temperature evolution (North region)  - polar plot

Temperatures are represented radially and the angle corresponds to the month of the calendar year, while colors represent the years. The fact that the ellipse is not centered shows the seasonality of the data.
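A minimal sketch of this polar mapping, in Python rather than the R used for the plots. The angle convention (January at angle 0), the `offset` that keeps radii positive, and the monthly values are all assumptions made for illustration.

```python
import math

def to_polar_xy(month, temperature, offset=0.0):
    """Map a (month, temperature) pair to x, y coordinates on a polar chart."""
    angle = 2 * math.pi * (month - 1) / 12   # 12 months spread over a full turn
    radius = temperature + offset            # offset avoids negative radii
    return radius * math.cos(angle), radius * math.sin(angle)

def centroid(points):
    """Average position of a list of (x, y) points."""
    xs, ys = zip(*points)
    return sum(xs) / len(xs), sum(ys) / len(ys)

# Hypothetical monthly means (degrees C) for a northern station, Jan..Dec
seasonal = [0, 2, 6, 11, 16, 20, 22, 21, 17, 11, 5, 1]
flat = [10] * 12  # a place with no seasons at all

seasonal_center = centroid([to_polar_xy(m, t, offset=10.0)
                            for m, t in enumerate(seasonal, start=1)])
flat_center = centroid([to_polar_xy(m, t) for m, t in enumerate(flat, start=1)])
```

With these numbers the flat series is centered on the origin, while the seasonal one is pulled toward the summer months, which is the off-center shape described above.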

Then I decided to go one step further and show the temporal evolution of this data in an animation. With my first Processing script ever, I created the following animation.

Global average temperature evolution 1701 – 2009 from Miguel Eduardo Gil Biraud on Vimeo.

To finish the preliminary analysis, I decided to check how the temperature averages evolve over time, and I did this analysis for the North region.
Temperature yearly average (North region)

This graph shows a clear increase in the yearly temperature averages over the last 50 years, much like the visualization done by EagerEyes. But is it the real story?

Temperature yearly average (North region) with latitude information

This is the same plot as before, but with the mean latitude of the stations behind each yearly average added as a color-coded scale. Because the number of stations in the dataset changes over time, the mean latitude of the stations drifts south by almost 10 degrees. Therefore, the yearly averages do not all share the same reference level. All in all, the chart is an apples-to-oranges comparison and it tells a misleading story. If we explicitly plot the variation of the mean latitude alongside the variation of the mean temperature, we can see that the two follow each other.
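The effect can be reproduced with a toy example. In this sketch (Python, with made-up stations and values) no station ever changes temperature, yet the yearly average rises simply because warmer, more southerly stations join the dataset.

```python
def yearly_means(stations_by_year):
    """Return {year: (mean_latitude, mean_temperature)} over the reporting stations.

    stations_by_year: {year: [(latitude, temperature), ...]} (hypothetical layout).
    """
    result = {}
    for year, stations in stations_by_year.items():
        n = len(stations)
        mean_lat = sum(lat for lat, _ in stations) / n
        mean_temp = sum(temp for _, temp in stations) / n
        result[year] = (mean_lat, mean_temp)
    return result

# Two cold northern stations report in 1900; two warmer ones are added by 2000.
# Every individual station keeps exactly the same temperature.
early = [(60.0, 2.0), (55.0, 5.0)]
late = early + [(35.0, 16.0), (30.0, 19.0)]
means = yearly_means({1900: early, 2000: late})
```

Here the mean latitude drops from 57.5 to 45.0 degrees and the mean temperature climbs from 3.5 to 10.5, although nothing actually warmed.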
Temperature yearly average (North region) with latitude information

The relationship between the two quantities can be measured by the correlation coefficient, which in this case is -0.7115842. Even if correlation does not imply causation, it should be a clear signal for anyone to pay extra attention to the way in which they manipulate and present information, as it is very easy to produce visualisations that support a given idea even if the data says something different.

For all the regions, the correlations between the mean temperature and mean latitude values are:

Region     Correlation
Arctic     -0.7376852
North      -0.7115842
Tropic     -0.1567509
South       0.9392206
Antarctic   0.9336734

The difference in sign between the North and South regions is due to the fact that their latitudes have opposite signs. In the north, lower latitudes bring the position closer to the tropics (higher temperatures), while in the south this effect is achieved with higher (less negative) latitudes.
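A minimal sketch of how such a correlation coefficient and its sign behave, assuming the values above are Pearson coefficients (the post does not say which estimator was used). The series below are invented; only the sign pattern matters.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical yearly means: in both hemispheres the stations drift toward
# the tropics while the averaged temperature goes up.
north_lat = [55.0, 52.0, 50.0, 47.0, 45.0]        # positive, decreasing
south_lat = [-55.0, -52.0, -50.0, -47.0, -45.0]   # negative, increasing
temps = [4.0, 6.5, 7.0, 9.5, 11.0]

r_north = pearson(north_lat, temps)
r_south = pearson(south_lat, temps)
```

The same physical drift toward the tropics gives a negative coefficient in the north and a positive one of equal size in the south, purely because of the sign convention for latitude.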

In the coming days I will post the R scripts I used to analyse the data, as well as the Processing program, so that you can reproduce this analysis.

Climate change cannot be seen in these graphs; this post is meant as an exercise to illustrate how easy it is to produce misleading plots. Unfortunately, as of today, I lack the skills to reproduce the analyses of this dataset that have been published in peer-reviewed papers, but if you know how to do it, please go ahead and show us! I am eager to learn :)

As usual, I will be very grateful for any comments you have about how to improve the visualisations and the analysis.


5 Responses to “United Kingdom’s Met Office dataset preliminary analysis”

  1. on 15 Dec 2009 at 01:38 1.Franchu’s Lair » United Kingdom’s Met Office dataset preliminary analysis: The R script said …

    [...] promised, I want to share with you the R code I used to generate all the beautiful charts of the United Kingdom’s Met Office dataset preliminary analysis [...]

  2. on 15 Dec 2009 at 01:48 2.Manu said …

    Very nice graphs! However, if I may say, I think that this dataset alone is not sufficient to draw any conclusions.

    My reasoning goes like this: if the planet was made out of a uniform material that would have the exact same behavior when exposed to increased heat over the years, then perhaps conclusions could be drawn from solely temperature data. But as we know, there’s a bit more diversity on the surface: water, land, and ice. Together with the atmosphere of the planet, this leads to a quite complex and dynamic system.

    I think that one big “PR” problem with climate change is the misleading former name (that’s still stuck in everyone’s head) “global warming”. Of course with such a name, when talking about temperature we may expect to see it rising on various graphs. Now I would argue that without looking at supplementary data such as the temperature of the oceans (which is clearly missing from the dataset, as shown in the very first graph that shows nothing but the continents) it’s rather pointless to draw any conclusions. However I would say that water (liquid and solid) is quite receptive to increased heat, much more than soil.

    Besides this, the separation of stations into categories (Arctic, North, South, …) is a bit misleading for such an analysis, as this is a global phenomenon and a more holistic approach is necessary. Perhaps plotting the relative temperature evolution for all regions on one graph would visualize something? But then again, as just argued, without the ocean temperature it’s not that useful.

    Cheers!

  3. on 15 Dec 2009 at 02:04 3.Franchu said …

    Hi Manu,

    I also believe that this dataset by itself is not enough to draw any conclusions.

    My main goal was to show how easy it is to jump quickly to conclusions when we see a chart that shows what people expect to see.

    While the separation by regions might not be the best solution, there are few alternatives as you need a way to come up with measurements that are comparable and in which all the parameters remain stable in order to ensure that the variations are really in the temperature and not due to external factors.

    As I showed in the last graph, even with this separation in regions the mean latitude has changed over time affecting the results that are seen.

    A more holistic approach in which all the data is compared at the same time is difficult to come up with as the temperatures behave differently in each region (and that has nothing to do with climate change!).

    I will try three other approaches:

    - Make smaller clusters over which the averaging is done
    - Analyse the behaviour of each station individually and see if some global pattern is visible
    - Take a subset of the stations that remains constant over time and repeat the analysis just with them. By doing this the latitude drift bias would be avoided.
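The third approach in the list above could look roughly like this. It is only a Python sketch with a hypothetical `(station_id, year, temperature)` record layout and invented values, not the script referred to in this thread.

```python
def stable_stations(records, years):
    """Return the ids of stations that report in every one of the given years."""
    years_seen = {}
    for station_id, year, _temperature in records:
        years_seen.setdefault(station_id, set()).add(year)
    required = set(years)
    return {sid for sid, seen in years_seen.items() if required <= seen}

def yearly_mean(records, stations):
    """Mean temperature per year, using only the selected stations."""
    totals = {}
    for station_id, year, temperature in records:
        if station_id in stations:
            total, count = totals.get(year, (0.0, 0))
            totals[year] = (total + temperature, count + 1)
    return {year: total / count for year, (total, count) in totals.items()}

# Station C only appears in 2000, so it is excluded from the stable subset
records = [
    ("A", 1900, 2.0), ("A", 2000, 2.0),
    ("B", 1900, 5.0), ("B", 2000, 5.0),
    ("C", 2000, 19.0),
]
stable = stable_stations(records, [1900, 2000])
means = yearly_mean(records, stable)
```

Because the set of stations is fixed, their mean latitude is fixed too, so any change in the yearly mean has to come from the temperatures themselves.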

    That should keep me busy for a while :) If you feel adventurous, I’ve just posted the code of the script in case you want to give it a try yourself ;)

  4. on 15 Dec 2009 at 11:05 4.Joao Rei said …

    Amazing Franchu!

    I think that more than the actual results, and the conclusions you can draw from them, the way you chose to present them and their visualization is what makes it so interesting.

    Like Manu said, people are expecting to see rising temperatures, and that is mainly due to the terminology that has been used in the media.

    And yes, maybe you should focus on a subset of data only.

    keep it up!

    signed:
    your fans!

  5. on 16 Dec 2009 at 22:41 5.David said …

    Hi Franchu,

    I love this post. I am also taking my first steps with an interesting dataset, but my results don't come out nearly as cool :)

    Regards
