Weather data

What to do with your “cluster”?

I have been downloading data from the UK Met Office for years using a simple script. Each file holds the observations from all UK weather stations for the preceding 24 hours, in JSON format.

So, how can we extract the data? I tried a few things, including R, but one approach is to load the data into Hadoop and define a table over it in Hive. The table definition can be found on GitHub (WordPress seems to mangle it). The Brickhouse jar is handy for operating on JSON and can easily be compiled from its git repo. To make the data more accessible we can define a second table, which presents the data as:

hive> select * from weather_t limit 10;
OK
3002    BALTASOUND  60.749  -0.854  SCOTLAND    EUROPE  15.0    2018-12-31Z 0.0 p   1016
3002    BALTASOUND  60.749  -0.854  SCOTLAND    EUROPE  15.0    2018-12-31Z 0.0 s   28
3002    BALTASOUND  60.749  -0.854  SCOTLAND    EUROPE  15.0    2018-12-31Z 0.0 d   WSW
3002    BALTASOUND  60.749  -0.854  SCOTLAND    EUROPE  15.0    2018-12-31Z 0.0 t   9.5
3002    BALTASOUND  60.749  -0.854  SCOTLAND    EUROPE  15.0    2018-12-31Z 0.0 pt  F
3002    BALTASOUND  60.749  -0.854  SCOTLAND    EUROPE  15.0    2018-12-31Z 0.0 v   9000
3002    BALTASOUND  60.749  -0.854  SCOTLAND    EUROPE  15.0    2018-12-31Z 0.0 w   8
3002    BALTASOUND  60.749  -0.854  SCOTLAND    EUROPE  15.0    2018-12-31Z 0.0 h   94.0
3002    BALTASOUND  60.749  -0.854  SCOTLAND    EUROPE  15.0    2018-12-31Z 0.0 dp  8.6
3002    BALTASOUND  60.749  -0.854  SCOTLAND    EUROPE  15.0    2018-12-31Z 1.0 p   1016
Time taken: 0.155 seconds, Fetched: 10 row(s)
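
For readers who want to see the reshaping outside Hive, here is a rough Python sketch of the same idea: turning one nested observation file into one row per station, day, hour and parameter, as in the output above. It is not the Hive table definition from the post. The JSON field names (SiteRep, DV, Location, Period, Rep, and "$" for minutes after midnight) are my assumption about the Met Office file layout, and the SAMPLE document and the flatten and as_list helpers are hypothetical, so adjust them to whatever your files actually contain.

```python
import json

# Hypothetical sample; the structure and field names are assumed,
# not taken from the post or from the Hive table definition.
SAMPLE = """
{"SiteRep": {"DV": {"Location": [
  {"i": "3002", "name": "BALTASOUND", "lat": "60.749", "lon": "-0.854",
   "country": "SCOTLAND", "continent": "EUROPE", "elevation": "15.0",
   "Period": [{"type": "Day", "value": "2018-12-31Z",
     "Rep": [{"P": "1016", "S": "28", "D": "WSW", "T": "9.5", "$": "0"},
             {"P": "1016", "T": "9.3", "$": "60"}]}]}
]}}}
"""


def as_list(value):
    """Return value as a list; a lone object becomes a one-element list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def flatten(doc):
    """Produce one tuple per (station, day, hour, parameter) observation."""
    rows = []
    for loc in as_list(doc["SiteRep"]["DV"].get("Location")):
        station = (
            loc["i"], loc["name"],
            float(loc["lat"]), float(loc["lon"]),
            loc["country"], loc["continent"],
            float(loc["elevation"]),
        )
        for period in as_list(loc.get("Period")):
            for rep in as_list(period.get("Rep")):
                # "$" is assumed to be minutes after midnight; convert to hours.
                hour = int(rep["$"]) / 60.0
                for key, value in rep.items():
                    if key == "$":
                        continue
                    # Parameter codes are lower-cased to match the Hive output.
                    rows.append(station + (period["value"], hour, key.lower(), value))
    return rows


rows = flatten(json.loads(SAMPLE))
for row in rows:
    print("\t".join(str(field) for field in row))
```

The values are left as strings because a single column has to hold both numbers (pressure, temperature) and text (wind direction), which is also why the Hive output mixes 1016 and WSW in the last column.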