Category Archive: science



english & science & visualisation Franchu on 13 Dec 2009

United Kingdom’s Met Office dataset preliminary analysis

The United Kingdom’s Met Office recently released temperature data for about 1700 weather stations across the globe from 1701 to 2009.

Since I have lately been trying to learn how to use R and Processing, I decided to give this dataset a try.

The first thing that I tried to do was to understand how the stations of the dataset are geographically distributed.
Stations distribution

Even if the projection is not very friendly, it is possible to recognise the main land masses on Earth. The density of stations is non-uniform, with some areas over-represented and others under-represented. This might affect the stability and validity of global averages over time.

Next I was interested in seeing how much data was available for each year, so I did a quick plot:
Number of measurements per year

It can be seen how the number of measurements increased dramatically in the mid-20th century. But what caused that sharp increase in the amount of data?

Checking the evolution of the number of stations over time, we get our answer.
Number of stations per year

It can be seen how the number of stations has increased over time for each region.
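A minimal sketch of how such counts could be computed, written in Python rather than the R used for the original plots. The record layout `(station_id, region, year, month, temperature)` and the sample values are assumptions made up for illustration; they do not come from the Met Office files.

```python
from collections import defaultdict

# Hypothetical records: (station_id, region, year, month, temperature in Celsius).
records = [
    ("A", "North", 1900, 1, -2.0),
    ("A", "North", 1900, 2, -1.0),
    ("A", "North", 1950, 1, -2.5),
    ("B", "North", 1950, 1, 4.0),
    ("C", "Tropic", 1950, 1, 25.0),
]


def measurements_per_year(rows):
    """Count how many individual measurements exist for each year."""
    counts = defaultdict(int)
    for _station, _region, year, _month, _temp in rows:
        counts[year] += 1
    return dict(counts)


def stations_per_year_and_region(rows):
    """Count the distinct stations reporting in each (year, region) pair."""
    seen = defaultdict(set)
    for station, region, year, _month, _temp in rows:
        seen[(year, region)].add(station)
    return {key: len(stations) for key, stations in seen.items()}


print(measurements_per_year(records))         # {1900: 2, 1950: 3}
print(stations_per_year_and_region(records))  # North has 1 station in 1900, 2 in 1950
```

Counting distinct stations rather than rows is what separates the two plots: a single station reporting every month adds twelve measurements per year but only one station.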

Once I understood the evolution of the number of stations a bit better, I wanted to see whether I could find any meaningful pattern in the temperature data. So first I did an exploratory plot of the average monthly temperature for each region.
Temperature evolution per region

The first thing that caught my attention was the seasonal variation of the temperatures. Displaying it in a scatter plot hides the fact that the right end of the plot connects to the left end of the same plot (December – January), so I decided to try polar coordinates instead.
Temperature evolution (North region)  - polar plot

Temperatures are represented radially, the angle corresponds to the month of the calendar year, and the colors represent the years. The fact that the ellipse is not centered shows the seasonality of the data.
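The mapping behind such a polar plot can be sketched as follows. This is an illustration in Python, not the original R code. The function names, the `offset` parameter (useful when temperatures are negative) and the synthetic temperature series are all assumptions.

```python
import math


def polar_point(month, temperature, offset=0.0):
    """Map (month 1..12, temperature) to x/y: the month sets the angle, the temperature the radius."""
    angle = 2.0 * math.pi * (month - 1) / 12.0
    radius = temperature + offset  # offset keeps the radius positive for sub-zero values
    return radius * math.cos(angle), radius * math.sin(angle)


def centroid(points):
    """Average position of a list of (x, y) points."""
    xs, ys = zip(*points)
    return sum(xs) / len(xs), sum(ys) / len(ys)


months = range(1, 13)

# A place with no seasons: the same temperature every month gives a centered circle.
flat = [polar_point(m, 10.0) for m in months]

# A synthetic seasonal cycle peaking in July (made-up numbers, 2 to 18 degrees).
seasonal = [
    polar_point(m, 10.0 + 8.0 * math.cos(2.0 * math.pi * (m - 7) / 12.0))
    for m in months
]

print(centroid(flat))      # approximately (0, 0)
print(centroid(seasonal))  # approximately (-4, 0): the curve is shifted toward July
```

Because December and January sit next to each other on the circle, the year closes on itself, and any seasonal cycle shows up as a displacement of the curve away from the origin.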

Then I decided to go one step further and show the temporal evolution of this data in an animation. With my first Processing script ever, I created the following animation.

Global average temperature evolution 1701 – 2009 from Miguel Eduardo Gil Biraud on Vimeo.

To finish the preliminary analysis, I decided to check the evolution of the temperature averages over time, and I did that analysis for the North region.
Temperature yearly average (North region)

This graph shows a clear increase in the yearly temperature averages over the last 50 years, much like the visualization by EagerEyes does. But is it the real story?

Temperature yearly average (North region) with latitude information

This is the same plot as before, with the mean latitude value for each measurement added on a color-coded scale. Because the number of stations in the dataset changes over time, the mean latitude of the stations moves south by almost 10 degrees. Therefore, not all the temperatures share the same reference level. All in all, the chart is an apples-to-oranges comparison and it tells a misleading story. If we explicitly plot the mean latitude variation and the mean temperature variation, we can see that the two follow each other.
Temperature yearly average (North region) with latitude information
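The effect can be reproduced with a toy example. In the Python sketch below, every station has a constant temperature, so nothing warms. The station names, latitudes, temperatures and years are invented for illustration only.

```python
from statistics import mean

# Hypothetical stations: name -> (latitude in degrees, constant temperature in Celsius).
stations = {
    "north": (60.0, 2.0),
    "mid": (50.0, 8.0),
    "south": (35.0, 17.0),
}

# Hypothetical station roster: a warmer, more southern station joins later on.
active = {
    1900: ["north", "mid"],
    1960: ["north", "mid", "south"],
}


def yearly_means(stations, active):
    """Return {year: (mean latitude, mean temperature)} over the stations active that year."""
    result = {}
    for year, names in active.items():
        mean_latitude = mean(stations[name][0] for name in names)
        mean_temperature = mean(stations[name][1] for name in names)
        result[year] = (mean_latitude, mean_temperature)
    return result


for year, (latitude, temperature) in yearly_means(stations, active).items():
    print(year, round(latitude, 2), temperature)
# 1900 55.0 5.0
# 1960 48.33 9.0
```

The average rises by 4 degrees between the two years although no individual station changed at all. Only the mix of stations did, which is exactly the apples-to-oranges problem in the chart.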

The relationship between the two magnitudes can be measured with the correlation coefficient, which in this case is -0.7115842. Even if correlation does not imply causation, it should be a clear signal for anyone to pay extra attention to the way in which they manipulate and present information, as it is very easy to produce visualisations that support a given idea even if the data says something different.
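For reference, here is a small Python sketch of the Pearson correlation coefficient, assuming that is the measure used here (it is what R's `cor` computes by default). The two series are made-up yearly means, not the values from the dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length, at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (proportional to the covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to the variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical yearly means: the latitude drifts south while the temperature rises.
mean_latitude = [55.0, 53.0, 51.0, 49.0, 47.0]
mean_temperature = [5.0, 5.9, 7.1, 8.0, 9.2]

r = pearson(mean_latitude, mean_temperature)
print(r)  # strongly negative, close to -1
```

A value near -1 or +1 means the two series move almost in lockstep, while a value near 0 means there is no linear relationship between them.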

For all the regions, the correlations between the mean temperature and mean latitude values are:

Region Correlation
Arctic -0.7376852
North -0.7115842
Tropic -0.1567509
South 0.9392206
Antarctic 0.9336734

The difference in sign between North and South is due to the fact that their latitudes have opposite signs. While in the north lower latitudes bring the position closer to the tropics (higher temperatures), in the south this effect is achieved with higher latitudes.
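The sign flip can be checked with a toy model in Python. The linear rule `30 - 0.5 * |latitude|` is an invented simplification, used only to make temperature fall with distance from the equator; it is not fitted to the dataset.

```python
def toy_temperature(latitude):
    """Invented rule: 30 degrees at the equator, cooling by 0.5 per degree of latitude."""
    return 30.0 - 0.5 * abs(latitude)


def covariance(xs, ys):
    """Population covariance; its sign is the sign of the correlation."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


# Both series move from the pole toward the equator.
north_latitudes = [60.0, 50.0, 40.0, 30.0]      # decreasing values
south_latitudes = [-60.0, -50.0, -40.0, -30.0]  # increasing values

north_temps = [toy_temperature(lat) for lat in north_latitudes]
south_temps = [toy_temperature(lat) for lat in south_latitudes]

print(covariance(north_latitudes, north_temps))  # -62.5
print(covariance(south_latitudes, south_temps))  # 62.5
```

The temperatures are identical in both hemispheres, yet the covariance changes sign because southern latitudes are negative numbers that grow toward the equator.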

In the coming days I will post the R scripts I used to analyse the data, as well as the Processing program, so that you can reproduce this analysis.

Climate change cannot be seen in these graphs; the post is meant as an exercise to illustrate how easy it is to produce misleading plots. Unfortunately, as of today, I lack the skills to reproduce the analyses of this dataset that have been published in peer-reviewed papers, but if you know how to do it, please go ahead and show us! I am eager to learn :)

As usual, I will be very grateful for any comments you have about how to improve the visualisations and the analysis.

GIS & english & programming & science Franchu on 07 Jun 2009

GEOSTAT2009

Last month I had the chance to attend GEOSTAT2009, a summer school on spatio-temporal data analysis with R + SAGA + Google Earth that was held at MedILS in Split, Croatia.

The summer school was a great opportunity to get some hands-on experience with R and SAGA, as learning them on your own can be a little bit tricky. We were also very lucky to have three teachers (Roger Bivand, Olaf Conrad and Tomislav Hengl) who have a lot of experience with those pieces of software and with the field of geostatistics.

But one of the things I liked most about the course was the wide variety of fields represented: we had biologists, geologists, geographers, meteorologists, engineers and software developers! (I think I am still missing some fields…) It was an eye-opening experience for me, as the same techniques can be applied to all those sciences and only the interpretation of the outputs differs to accommodate the processes involved in each discipline.

Given my experience in software development, I was able to grasp very quickly how to write the code to implement the steps needed to solve a problem. Nevertheless, my sub-par knowledge of geostatistics was the handicap I had to fight throughout the course. That is why I really enjoyed doing the practical exercises and discussing them with other participants. They always had the theoretical background to explain what needed to be done, and we were able to implement it faster than they could. Symbiosis at its best :)

All this took place in a wonderful setting, a former residence of Tito (Villa Dalmatia), where one of the buildings of the complex has been converted into a life sciences research institute. We had our own private beach, a basketball court, table tennis, and great catering with lots of variety. In such a wonderful environment it is difficult not to have fun and forge new friendships while learning really advanced stuff.

For the weekend, we went to Brac, an island just off Split, and we had a great day walking to Pustinja Blaca (an old monastery in the mountains) and swimming in Bol.

You can see some of the photos I took during that week:

In the following days, work permitting, I will post some interesting interviews and ideas I got during the course.

general & science & spanish Franchu on 01 May 2008

A historic moment

When I saw this photo, I was surprised to see so many people relevant to science together on a single occasion, and I have always wondered whether they were aware of the importance their discoveries would have in the future.

In the photo we can see, among others:
