english & programming & visualisation Franchu on 15 Dec 2009

United Kingdom’s Met Office dataset preliminary analysis: The R script

As promised, here is the R code I used to generate all the beautiful charts in the “United Kingdom’s Met Office dataset preliminary analysis” post.

First of all, you need to download the dataset. I don’t want to hotlink directly to the MySQL dump file that this guy produced, so just head over to his site and download it from there. While you are there, you can also look at the Perl script he wrote to convert the original dataset into the MySQL dump we will be using.

Once you have downloaded the MySQL dump, uncompress it and import it into your MySQL database. In my case, I created a database called “ClimateChange” and granted all privileges on it to a user, also called “ClimateChange”, with no password.
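A minimal sketch of that setup, assuming a local MySQL server and an administrative account to run it with. The database and user names follow the post; the function name `setup_sql` and the dump file name `climate_dump.sql` used below are hypothetical placeholders. The function only prints the SQL, so you can inspect it before piping it into `mysql`.

```shell
#!/bin/sh
# Sketch: SQL for the database and password-less user described above.
# setup_sql is a hypothetical helper; the names match the post.
DB=ClimateChange
DB_USER=ClimateChange

setup_sql() {
  # CREATE USER without IDENTIFIED BY creates the account with no password,
  # as in the post; GRANT then gives it every privilege on that one database.
  cat <<EOF
CREATE DATABASE IF NOT EXISTS $DB;
CREATE USER '$DB_USER'@'localhost';
GRANT ALL PRIVILEGES ON $DB.* TO '$DB_USER'@'localhost';
EOF
}
```

Usage, assuming the uncompressed dump is called `climate_dump.sql`: run `setup_sql | mysql -u root -p` once, then `mysql -u ClimateChange ClimateChange < climate_dump.sql` to load the data.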

Once the data is in the database, you can start working in R with this script, available on GitHub.

english & science & visualisation Franchu on 13 Dec 2009

United Kingdom’s Met Office dataset preliminary analysis

The United Kingdom’s Met Office recently released temperature data for about 1700 weather stations across the globe from 1701 to 2009.

As I have lately been trying to learn R and Processing, I decided to give this dataset a try.

The first thing I tried to do was to understand how the stations in the dataset are distributed geographically.
Stations distribution

Although the projection is not very friendly, the main land masses on Earth are recognisable. The density of stations is not uniform: some areas are over-represented and others under-represented. This might affect the stability and validity of global averages over time.

Next I wanted to see how much data was available each year, so I made a quick plot:
Number of measurements per year

The number of measurements increased dramatically in the mid-20th century. But what caused that sharp increase in the amount of data?

Checking how the number of stations evolved over time gives us the answer.
Number of stations per year

The number of stations in each region has increased over time.
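Both counts come out of the same aggregation. A sketch in awk, assuming the measurements have been exported from MySQL to a CSV file with no header row and the hypothetical layout `station_id,year,month,temperature` (the column order and the function name `yearly_counts` are illustrative, not taken from the post's script):

```shell
#!/bin/sh
# Sketch: per-year number of measurements and of distinct reporting stations.
# Assumed input (hypothetical layout, no header): station_id,year,month,temperature
# Output: one line per year, "year measurements stations", sorted by year.
yearly_counts() {
  awk -F, '
    {
      n[$2]++                       # one more measurement for this year
      key = $2 SUBSEP $1            # (year, station) pair
      if (!(key in seen)) {         # first time this station reports this year
        seen[key] = 1
        stations[$2]++
      }
    }
    END {
      for (y in n) print y, n[y], stations[y]
    }' "$@" | sort -n
}
```

Plotting the second column against the first reproduces the shape of the measurements chart, and the third column the stations chart; if the two curves rise together, the extra data comes from extra stations rather than from existing stations reporting more often.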

Having understood the evolution of the number of stations a bit better, I wanted to see whether I could find any meaningful pattern in the temperature data. So first I made an exploratory plot of the average monthly temperature for each region.
Temperature evolution per region

The first thing that caught my attention was the seasonal variation of the temperatures, and that a scatter plot makes it hard to see that the right end of the plot connects to its left end (December to January). So I decided to try polar coordinates.
Temperature evolution (North region)  - polar plot

Temperatures are represented radially, the angle corresponds to the month of the calendar year, and colors represent the years. The fact that the ellipse is not centered shows the seasonality of the data.
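The coordinate mapping behind a plot like this can be sketched as follows; it is an illustration, not the post's R code. Each month becomes an angle θ = 2π(m − 1)/12, and the radius is the temperature plus a hypothetical offset `OFFSET`, assumed here so that sub-zero temperatures still get a non-negative radius. The function name `to_polar` is also made up for the example.

```shell
#!/bin/sh
# Sketch: convert "month temperature" pairs (one per line) into x/y points
# of a polar plot. January sits at angle 0 and each month adds 30 degrees.
# OFFSET (hypothetical, default 0) is added to every temperature so that
# the radius stays non-negative.
to_polar() {
  awk -v offset="${OFFSET:-0}" '
    BEGIN { pi = atan2(0, -1) }
    {
      theta = 2 * pi * ($1 - 1) / 12   # month 1..12 -> angle in radians
      r = $2 + offset                  # radius from the (shifted) temperature
      printf "%.2f %.2f\n", r * cos(theta), r * sin(theta)
    }'
}
```

With this mapping a year with no seasonal cycle would trace a circle centred on the origin; a warm summer and a cold winter pull the curve towards one side, which is the off-centre ellipse described above.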

Then I decided to go one step further and show the temporal evolution of this data in an animation. With my first ever Processing script, I created the following animation.

Global average temperature evolution 1701 – 2009 from Miguel Eduardo Gil Biraud on Vimeo.

To finish the preliminary analysis, I checked how the yearly temperature averages evolved over time, using the North region.
Temperature yearly average (North region)

This graph shows a clear increase in the yearly temperature averages over the last 50 years, much like the visualization by EagerEyes. But is that the real story?

Temperature yearly average (North region) with latitude information

This is the same plot as before, but with the mean latitude of the stations behind each measurement added as a color-coded scale. Because the set of stations in the dataset changes over time, the mean latitude of the stations drifts south (by almost 10 degrees). Therefore, not all the temperatures share the same reference level. All in all, the chart is an apples-to-oranges comparison and tells a misleading story. If we plot the mean latitude and the mean temperature explicitly, we can see that their variations track each other.
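The two series being compared can be sketched as a single aggregation. This assumes a hypothetical CSV export with no header and the layout `year,latitude,temperature` (one row per measurement, already restricted to one region); the layout and the function name `yearly_means` are illustrative assumptions.

```shell
#!/bin/sh
# Sketch: yearly mean station latitude and yearly mean temperature.
# Assumed input (hypothetical layout, no header): year,latitude,temperature
# Output: "year mean_latitude mean_temperature", sorted by year.
yearly_means() {
  awk -F, '
    { n[$1]++; lat[$1] += $2; temp[$1] += $3 }   # running sums per year
    END {
      for (y in n) printf "%s %.2f %.2f\n", y, lat[y] / n[y], temp[y] / n[y]
    }' "$@" | sort -n
}
```

Because every measurement counts once, a year in which many southern stations report pulls both the mean latitude down and the mean temperature up, which is the drift described above.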
Temperature yearly average (North region) with latitude information

This relationship between the two magnitudes can be measured with the correlation coefficient, which in this case is -0.7115842. Even though correlation does not imply causation, it should be a clear signal for anyone to pay extra attention to the way they manipulate and present information, as it is very easy to produce visualisations that support a given idea even when the data says something different.
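The coefficient quoted is presumably Pearson’s r, which is what R’s `cor()` computes by default. As a sketch, it can be obtained in one pass over pairs of yearly values (for example mean latitude in the first column, mean temperature in the second) with r = (nΣxy − ΣxΣy) / √((nΣx² − (Σx)²)(nΣy² − (Σy)²)). The function name `pearson` is hypothetical.

```shell
#!/bin/sh
# Sketch: Pearson correlation of two whitespace-separated columns on stdin.
# Uses one-pass sums, which is fine for small magnitudes such as latitudes
# and temperatures; prints NA if either column is constant.
pearson() {
  awk '
    { n++; sx += $1; sy += $2; sxx += $1 * $1; syy += $2 * $2; sxy += $1 * $2 }
    END {
      cov = n * sxy - sx * sy      # n^2 times the covariance
      vx = n * sxx - sx * sx       # n^2 times the variance of column 1
      vy = n * syy - sy * sy       # n^2 times the variance of column 2
      if (vx <= 0 || vy <= 0) { print "NA"; exit 1 }
      printf "%.4f\n", cov / sqrt(vx * vy)
    }'
}
```

A negative result means the columns move in opposite directions: in the North region, years whose stations sit further south (lower mean latitude) show higher mean temperatures.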

For all the regions, the correlations between the mean temperature and the mean latitude are:

Region       Correlation
Arctic       -0.7376852
North        -0.7115842
Tropic       -0.1567509
South         0.9392206
Antarctic     0.9336734

The difference in sign between the northern regions (Arctic, North) and the southern ones (South, Antarctic) is due to the fact that latitudes have opposite signs in each hemisphere. In the north, lower latitudes bring a position closer to the tropics (higher temperatures), while in the south the same effect is achieved with higher (less negative) latitudes.

In the next few days I will post the R scripts I used to analyse the data, as well as the Processing program, so that you can reproduce this analysis.

Climate change cannot be seen in these graphs; they are meant as an exercise to illustrate how easy it is to produce misleading plots. Unfortunately, as of today I lack the skills to reproduce the analyses that have been published in peer-reviewed papers using this dataset, but if you know how to do it, please go ahead and show us! I am eager to learn :)

As usual, I will be very grateful for any comments you have about how to improve the visualisations and the analysis.

english & linux Franchu on 12 Nov 2009

Upgrade problems from Ubuntu 6.10 (Edgy) to Ubuntu 9.10 (Intrepid)

Recently I had to upgrade a forgotten server that was still running Ubuntu 6.10 Edgy, and, as was to be expected, it didn’t go as smoothly as I would have liked.

You probably won’t find anything in this post that you couldn’t find elsewhere (that is where we found it), but having it all in one place might prove useful and save you some time.

During the upgrade, the mdadm.conf file was overwritten with a default version, and the RAID array was no longer mounted automatically after a reboot. The solution was found in a thread in the Ubuntu forums:
mdadm --examine --scan --config=mdadm.conf >> /etc/mdadm/mdadm.conf
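Because `>>` appends blindly, running that command twice leaves duplicate ARRAY lines in the file. A guarded variant, sketched below, appends only the ARRAY lines that are not already there. The function name `append_new_arrays` is made up for illustration; it reads the scan output on standard input, so it can be tried on a scratch copy of the file first.

```shell
#!/bin/sh
# Sketch: append to a config file only those ARRAY lines (read from stdin)
# that it does not already contain, character for character.
# Example (as root):
#   mdadm --examine --scan --config=mdadm.conf | append_new_arrays /etc/mdadm/mdadm.conf
append_new_arrays() {
  conf=$1
  while IFS= read -r line; do
    case $line in
      ARRAY*)
        # -x whole line, -F fixed string; a missing file counts as "not found"
        grep -qxF -- "$line" "$conf" 2>/dev/null || printf '%s\n' "$line" >> "$conf"
        ;;
    esac
  done
}
```

On Ubuntu the initramfs carries its own copy of mdadm.conf, so regenerating it afterwards with `sudo update-initramfs -u` is probably also needed for the array to be assembled at boot.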

The locale files got changed and we started to see lots of messages like:

perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
	LANGUAGE = "es_ES.ISO-8859-15@euro",
	LC_ALL = (unset),
	LANG = "es_ES.ISO-8859-15@euro"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").

The solution was found again in the Ubuntu forums. You have to edit /var/lib/locales/supported.d/es and make it look like:

es_EC.UTF-8 UTF-8
es_CL.UTF-8 UTF-8
es_DO.UTF-8 UTF-8
es_HN.UTF-8 UTF-8
es_PY.UTF-8 UTF-8
es_PR.UTF-8 UTF-8
es_NI.UTF-8 UTF-8
es_ES.UTF-8 UTF-8
es_PE.UTF-8 UTF-8
es_VE.UTF-8 UTF-8
es_GT.UTF-8 UTF-8
es_CR.UTF-8 UTF-8
es_BO.UTF-8 UTF-8
es_US.UTF-8 UTF-8
es_AR.UTF-8 UTF-8
es_PA.UTF-8 UTF-8
es_SV.UTF-8 UTF-8
es_UY.UTF-8 UTF-8
es_MX.UTF-8 UTF-8
es_CO.UTF-8 UTF-8
es_ES.ISO-8859-15@euro ISO-8859-15

Finally, you have to rebuild the locales:

sudo dpkg-reconfigure locales
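After the rebuild, `locale -a` should list the locale, but glibc prints it with the codeset normalised (lower case, punctuation removed), so es_ES.ISO-8859-15@euro shows up as es_ES.iso885915@euro. A rough sketch of that normalisation, so the check can be scripted; the function name `normalize_locale` is hypothetical, and it ignores glibc’s special case for all-digit codesets.

```shell
#!/bin/sh
# Sketch: normalise a locale name the way `locale -a` displays it:
# the codeset between "." and "@" is lower-cased and stripped of
# non-alphanumeric characters; language and modifier are kept as-is.
normalize_locale() {
  case $1 in
    *.*) ;;
    *) printf '%s\n' "$1"; return ;;   # no codeset, e.g. "C"
  esac
  lang=${1%%.*}
  rest=${1#*.}
  case $rest in
    *@*) codeset=${rest%%@*}; modifier="@${rest#*@}" ;;
    *)   codeset=$rest; modifier= ;;
  esac
  codeset=$(printf '%s' "$codeset" | tr -cd '[:alnum:]' | tr '[:upper:]' '[:lower:]')
  printf '%s.%s%s\n' "$lang" "$codeset" "$modifier"
}
```

Usage: `locale -a | grep -qx "$(normalize_locale es_ES.ISO-8859-15@euro)" && echo installed` should print `installed` once the rebuild has worked.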

And last but not least, after the upgrades we kept getting dropped into the BusyBox initramfs prompt. Apparently there is a problem with EVMS and ACPI support, so we simply disabled both until we have more time to investigate the issue. The solution was found here.

Add the following options to the end of the line in /boot/grub/menu.lst that begins with # kopt=

acpi=noacpi irqpoll

and then:

sudo update-grub
sudo aptitude remove evms
