[PYTHON] Use of past weather data 4 (feelings of the weather during the Tokyo Olympics)

The Japan Meteorological Agency is providing past weather data free of charge until the end of March 2020. (Reference) Usage environment of past weather data: https://www.data.jma.go.jp/developer/past_data/index.html

The basic weather data can be used "by anyone, regardless of the purpose or target of use", so let's build something with it.

The same data can also be downloaded from the Japan Meteorological Agency's "Past Meteorological Data Download" page, but this service is much more convenient because everything can be downloaded at once. The available data is listed here: https://www.data.jma.go.jp/developer/past_data/data_list_20200114.pdf The deadline is approaching, so if you need the data, download it early.

This time, I will try to express the weather during the Tokyo Olympics period in words. WordCloud (1.6.0) is used to draw the words.

Data download

As in "Use of past meteorological data 2 (change in maximum temperature during the Tokyo Olympics)", I use the "Surface weather observation" > "Hourly/daily values" data. The file format is described here: http://data.wxbc.jp/basic_data/kansoku/surface/format_surface.pdf

Download the file into the surface folder. The file is about 2 GB, so the download takes a while.

import os
import urllib.request

#Ground weather observation hourly / daily value file download
url    = 'http://data.wxbc.jp/basic_data/kansoku/surface/hourly_daily_1872-2019_v191121.tar'
folder = 'surface'
path   = 'surface/hourly_daily_1872-2019_v191121.tar'
#Create folder
os.makedirs(folder, exist_ok=True)
if not os.path.exists(path):
    #download
    urllib.request.urlretrieve(url, path)
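Because the archive is about 2 GB, a progress display can help. A minimal sketch, assuming a simple percentage printout is enough: urlretrieve accepts a reporthook callable that receives the number of blocks transferred so far, the block size, and the total size (which may be -1 when the server does not report it). The function names download_percent and progress_hook are hypothetical.

```python
def download_percent(block_num, block_size, total_size):
    """Percentage downloaded so far, or None when the total size is unknown."""
    if total_size <= 0:
        return None
    # The last block may overshoot the real size, so cap at 100 %.
    return min(100.0, block_num * block_size * 100.0 / total_size)


def progress_hook(block_num, block_size, total_size):
    """reporthook for urllib.request.urlretrieve: prints progress on one line."""
    percent = download_percent(block_num, block_size, total_size)
    if percent is None:
        print(f'\r{block_num * block_size / 1e6:.1f} MB', end='')
    else:
        print(f'\r{percent:5.1f} %', end='')


# Usage, with url and path as defined above:
# urllib.request.urlretrieve(url, path, reporthook=progress_hook)
```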

For details of the file, refer to "Use of past weather data 2 (Transition of maximum temperature during the Tokyo Olympics)".

Reading weather data

The weather data stores the weather overview (general weather conditions) separately for daytime and nighttime. This time, I use the daytime data. An overview consists of up to four weather codes joined by conjunctions. For example, a day that is fine all day long is simply "fine", while a day with several kinds of weather gives overviews such as "cloudy, occasionally fine", "cloudy, then fine", or "cloudy, occasionally fine, accompanied by thunder".

code  conjunction
0     No data
1     Blank (no conjunction)
2     Temporarily
3     Occasionally
4     Then
5     Then temporarily
6     Then occasionally
7     Accompanied by (as in ", accompanied by XX")

code  weather            code  weather          code  weather
0     No weather         10    Sleet            20    Typhoon
1     Sunny              11    Snow             21    Thunder
2     Fine               12    Heavy snow       22    Graupel
3     Light cloud        13    Blizzard         23    Hail
4     Cloudy             14    Blowing snow     24    Typhoon / thunder
5     Fog                15    Drifting snow    25    Thunder / graupel
6     Drizzle            16    Reserved         26    Thunder / hail
7     Rain               17    Reserved         27    Thunder / fog
8     Heavy rain         18    Reserved         28    No precipitation
9     Rainstorm          19    Reserved         29    Sunny spells
                                                30    Reserved
                                                31    ×

"Heavy rain" is used when there is rainfall of 30 mm or more. For details, refer to the following site. http://www.data.jma.go.jp/obd/stats/data/mdrr/man/gaikyo.html

Prepare lists for converting the codes to words.

#Conversion lists (list index = code). Each label is a single token without spaces so that
#WordCloud counts it as one word; the blank conjunction (code 1) maps to an empty string.
conjunction = ['NoData', '', 'Temporarily', 'Occasionally', 'Then', 'ThenTemporarily', 'ThenOccasionally', ',']
conjunction7 = 'Accompanied'
weather = ['NoWeather', 'Sunny', 'Fine', 'LightCloud', 'Cloudy', 'Fog', 'Drizzle', 'Rain', 'HeavyRain', 'Rainstorm',
           'Sleet', 'Snow', 'HeavySnow', 'Blizzard', 'BlowingSnow', 'DriftingSnow', 'Reserve', 'Reserve', 'Reserve', 'Reserve',
           'Typhoon', 'Thunder', 'Graupel', 'Hail', 'Typhoon・Thunder', 'Thunder・Graupel', 'Thunder・Hail', 'Thunder・Fog', 'NoPrecipitation', 'SunnySpells',
           'Reserve', '×']

Conjunctions are normally placed before the weather, but "accompanied by" (code 7) surrounds the weather (the XX part), as in ", accompanied by XX". Following the Japanese word order, the comma is placed before the weather and conjunction7 is appended after it.
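The decoding step can be tried on its own. This is a minimal sketch, assuming the 5-byte group layout used by the reading code in this article (conjunction code, its remark, a two-digit weather code, its remark) and using only remark 8, as that code does. The 20-byte sample block is made up for illustration, only a subset of the labels is included, and empty conjunctions are skipped rather than added as blank words.

```python
# Subset of the conversion tables, for illustration only.
CONJUNCTION = {1: '', 2: 'Temporarily', 3: 'Occasionally', 4: 'Then',
               5: 'ThenTemporarily', 6: 'ThenOccasionally', 7: ','}
WEATHER = {1: 'Sunny', 2: 'Fine', 4: 'Cloudy', 7: 'Rain', 21: 'Thunder'}
ACCOMPANIED = 'Accompanied'
VALID_REMARK = 8  # the only remark value the reading code accepts


def decode_overview(block):
    """Decode up to four 5-byte (conjunction, remark, weather x2, remark) groups."""
    words = []
    for p in range(0, 20, 5):
        c, c_rmk = int(block[p:p+1]), int(block[p+1:p+2])
        w, w_rmk = int(block[p+2:p+4]), int(block[p+4:p+5])
        # Add the conjunction only if it is valid and not blank.
        if c_rmk == VALID_REMARK and CONJUNCTION.get(c):
            words.append(CONJUNCTION[c])
        if w_rmk == VALID_REMARK:
            words.append(WEATHER.get(w, f'code{w}'))
            if c == 7:  # "accompanied by": the label follows the weather
                words.append(ACCOMPANIED)
    return words


# Hypothetical block: cloudy, occasionally fine, accompanied by thunder.
sample = b'18048' b'38028' b'78218' b'00000'
print(' '.join(decode_overview(sample)))
# Cloudy Occasionally Fine , Thunder Accompanied
```

With real data, block would be line[1500:1520].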

Load the data into a pandas DataFrame. The weather overview of each day starts at byte offset 1500 of the record. The numeric codes are converted to words while reading, and a space is inserted after each word for the later WordCloud display: WordCloud does not segment (Japanese) text into words by itself, so the words must be separated in advance.

#Create data frame for data storage
import pandas as pd

tokyo_df = pd.DataFrame()
#Get weather overview
import tarfile

#Point setting=Tokyo
p_no = '662'

#Get the files contained in the tar file
with tarfile.open(path, 'r') as tf:
    for tarinfo in tf:
        if tarinfo.isfile():
            #Get the files contained in each inner .tar.gz file (mode 'r' detects gzip compression automatically)
            with tarfile.open(fileobj=tf.extractfile(tarinfo), mode='r') as tf2:
                for tarinfo2 in tf2:
                    if tarinfo2.isfile():
                        #Read only files with matching points
                        if tarinfo2.name[-3:] == p_no:
                            print(tarinfo2.name)
                            #Open file
                            with tf2.extractfile(tarinfo2) as tf3:
                                lines = tf3.readlines()
                                for line in lines:
                                    #Skip lines that do not contain data
                                    if line[0:3] == b'   ':
                                        continue
                                    #Year
                                    year = line[14:18].decode()
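                                    #Month and day (MMDD); blanks are replaced with zeros
                                    date = line[18:22].decode().replace(' ', '0')
                                    #Daytime weather overview, starting at byte offset 1500
                                    conditions = ''
                                    p = 1500
                                    #Up to four (conjunction, weather) groups of 5 bytes each
                                    for i in range(4):
                                        #Conjunction code (1 byte) and its remark (1 byte); only remark 8 (normal value) is used
                                        c = int(line[p:p+1])
                                        c_rmk = int(line[p+1:p+2])
                                        if c_rmk == 8:
                                            conditions += conjunction[c] + ' '
                                        #Weather code (2 bytes) and its remark (1 byte)
                                        w = int(line[p+2:p+4])
                                        w_rmk = int(line[p+4:p+5])
                                        if w_rmk == 8:
                                            conditions += weather[w] + ' '
                                            if c == 7: #For "accompanied by", the label goes after the weather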
                                                conditions += conjunction7 + ' '
                                        p += 5

                                    #Data storage
                                    tokyo_df.loc[year, date] = conditions
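Setting one cell at a time with tokyo_df.loc[year, date] enlarges the DataFrame over and over, which can be slow for nearly 150 years of daily records. A sketch of an alternative, assuming the same nested layout as above (an outer .tar holding .tar.gz files whose members end with the station number): collect the values in a dict and build the DataFrame once with DataFrame.from_dict. The helper names, the tiny in-memory archive and toy_parse are hypothetical stand-ins; for the real file, parse_line would wrap the slicing logic above (year at bytes 14 to 18, MMDD at 18 to 22, overview from offset 1500) and return None for lines to skip.

```python
import io
import tarfile

import pandas as pd


def iter_station_files(outer, p_no):
    """Yield (name, lines) for inner-archive members whose name ends with p_no."""
    for info in outer:
        if not info.isfile():
            continue
        # mode 'r' lets tarfile detect the gzip compression of the inner archive
        with tarfile.open(fileobj=outer.extractfile(info), mode='r') as inner:
            for info2 in inner:
                if info2.isfile() and info2.name.endswith(p_no):
                    with inner.extractfile(info2) as f:
                        yield info2.name, f.readlines()


def build_frame(outer, p_no, parse_line):
    """Collect {year: {mmdd: text}} in a dict, then build the DataFrame once."""
    rows = {}
    for _, lines in iter_station_files(outer, p_no):
        for line in lines:
            parsed = parse_line(line)
            if parsed is not None:
                year, mmdd, text = parsed
                rows.setdefault(year, {})[mmdd] = text
    return pd.DataFrame.from_dict(rows, orient='index')


# --- Demo with a tiny hypothetical archive built in memory ---
def _add_bytes(tar, name, data):
    info = tarfile.TarInfo(name)
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))


inner_buf = io.BytesIO()
with tarfile.open(fileobj=inner_buf, mode='w:gz') as inner_tar:
    _add_bytes(inner_tar, 'data/sfc_662', b'1989 0724 Fine\n1989 0725 Rain\n')
    _add_bytes(inner_tar, 'data/sfc_412', b'1989 0724 Snow\n')
outer_buf = io.BytesIO()
with tarfile.open(fileobj=outer_buf, mode='w') as outer_tar:
    _add_bytes(outer_tar, 'part1.tar.gz', inner_buf.getvalue())
outer_buf.seek(0)


def toy_parse(line):
    """Stand-in for the real fixed-width parsing."""
    year, mmdd, text = line.decode().split()
    return year, mmdd, text


with tarfile.open(fileobj=outer_buf, mode='r') as outer_tar:
    df = build_frame(outer_tar, '662', toy_parse)
print(df)
```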

Check the data.

#Data confirmation
tokyo_df

Older records contain no weather overview. The current format appears to start in 1989, so I extract the data from 1989 onward and restrict the dates to the Olympic period (July 24 to August 9).

tokyo_olympic_df = tokyo_df.loc['1989':'2019','0724':'0809']
tokyo_olympic_df

The weather overviews have been read correctly.
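The slice relies on label-based slicing, which in pandas includes both endpoints, so '0809' (the closing day) is kept. A small check on a toy frame shaped like tokyo_df (string years as the index, MMDD strings as columns); it assumes the labels are in sorted order, as they are when the files are read chronologically.

```python
import pandas as pd

# Toy frame shaped like tokyo_df: years as string index, MMDD strings as columns.
toy = pd.DataFrame(
    'Fine ',
    index=['1988', '1989', '1990'],
    columns=['0723', '0724', '0809', '0810'],
)
# .loc slicing by label includes both the start and the end label.
period = toy.loc['1989':'1990', '0724':'0809']
print(period.shape)  # (2, 2)
```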

Display by Word Cloud

Let's display the weather with WordCloud. Each word's size depends on how often it appears in the weather overviews. This does not reflect the weather exactly: for example, "fine" is counted once both for a day that is fine all day long and for a day that is "fine, temporarily cloudy". Still, it should give a rough picture of the situation. The conjunctions are also included; I leave them in because they indicate changes in the weather.

First, let's look at the weather during the Olympic period for each year. For each year, the weather overview strings are joined and passed to WordCloud. Points to note:
- Specify a font that can display Japanese with font_path. In my environment I specified MS Gothic for Windows; change it to a suitable font for your environment.
- By default, single-character words are not displayed, so a regular expression that also matches them is passed with regexp.
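The effect of the regexp setting can be seen with Python's re module alone. This sketch assumes that WordCloud 1.6.0's default token pattern is r"\w[\w']+" (at least two characters; the default may differ in other versions) and uses single-character words such as 晴 (fine), 曇 (cloudy) and 雨 (rain), the kind the Japanese labels contain. Writing the pattern as the raw string r"\w+" also avoids the invalid-escape warning that "\w+" triggers in recent Python versions.

```python
import re

# "fine, temporarily cloudy, then rain", already split with spaces
text = '晴 一時 曇 後 雨'

# Assumed default-like pattern: a token needs at least two word characters.
default_like = re.findall(r"\w[\w']+", text)
# Pattern passed in this article: any run of word characters.
author = re.findall(r"\w+", text)

print(default_like)  # ['一時']
print(author)        # ['晴', '一時', '曇', '後', '雨']
```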

import matplotlib.pyplot as plt
from wordcloud import WordCloud
#Display by year
i = 1
plt.figure(figsize=(16, 33))
for row in tokyo_olympic_df.index:
    text = ''
    for column in tokyo_olympic_df.columns:
        text += tokyo_olympic_df.loc[row, column]
    #Character image creation with WordCloud
    wordcloud = WordCloud(colormap='jet', font_path="msgothic.ttc", regexp=r"\w+").generate(text)
    plt.subplot(11,3,i)
    plt.imshow(wordcloud)
    plt.title(row)
    plt.axis("off")
    i += 1
plt.show()

Here is the result. Ideally the font color would follow the weather (fine, rain, and so on), but by default the color of each word is picked at random from the colormap. There is some variation from year to year, but the words are mostly "sunny" and "cloudy".

WordCloud_year.png
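WordCloud does accept a color_func callable, so coloring by weather is possible. A minimal sketch, assuming a hypothetical keyword-to-color table matched against English or Japanese labels; words that match nothing get a neutral dark gray. The first matching entry wins, so the order of the table matters.

```python
# Hypothetical keyword -> color table; adjust to the labels you actually use.
WEATHER_COLORS = [
    (('rain', '雨'), '#1f4e9c'),              # rain: blue
    (('snow', 'sleet', '雪'), '#7fb3d5'),     # snow: light blue
    (('thunder', '雷'), '#8e44ad'),           # thunder: purple
    (('fine', 'sunny', '晴'), '#e67e22'),     # fine/sunny: orange
    (('cloud', '曇'), '#7f8c8d'),             # cloudy: gray
]
DEFAULT_COLOR = '#333333'


def weather_color_func(word, font_size, position, orientation,
                       random_state=None, **kwargs):
    """color_func for WordCloud: color each word by the weather it names."""
    lowered = word.lower()
    for keywords, color in WEATHER_COLORS:
        if any(k in lowered for k in keywords):
            return color
    return DEFAULT_COLOR
```

It can be passed as WordCloud(..., color_func=weather_color_func), in which case colormap is ignored, or applied to an existing image with wordcloud.recolor(color_func=weather_color_func).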

Next, by date.

#Display by date
i = 1
plt.figure(figsize=(16, 18))
for column in tokyo_olympic_df.columns:
    text = ''
    for row in tokyo_olympic_df.index:
        text += tokyo_olympic_df.loc[row, column]
    #Character image creation with WordCloud
    wordcloud = WordCloud(colormap='jet', font_path="msgothic.ttc", regexp=r"\w+").generate(text)
    plt.subplot(6,3,i)
    plt.imshow(wordcloud)
    plt.title(column)
    plt.axis("off")
    i += 1
plt.show()

Since it is midsummer, sunny weather dominates.

WordCloud_date.png
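The by-year and by-date loops differ only in the direction in which the strings are joined. A sketch of a hypothetical helper that does both with DataFrame.apply; it also skips missing cells, which would make text += ... raise a TypeError.

```python
import pandas as pd


def texts_by(df, axis):
    """Join the overview strings of each column (axis=0) or each row (axis=1).

    Missing cells (NaN/None) are skipped.
    """
    return df.apply(lambda s: ' '.join(v.strip() for v in s.dropna()), axis=axis)


# Toy data shaped like tokyo_olympic_df, with one missing cell.
toy = pd.DataFrame({'0724': ['Fine ', 'Rain '], '0725': ['Cloudy ', None]},
                   index=['1989', '1990'])
by_year = texts_by(toy, axis=1)   # one text per year (row)
by_date = texts_by(toy, axis=0)   # one text per date (column)
print(by_year.to_dict())  # {'1989': 'Fine Cloudy', '1990': 'Rain'}
print(by_date.to_dict())  # {'0724': 'Fine Rain', '0725': 'Cloudy'}
```

In the plotting loops, this would replace the inner loop, e.g. for row, text in texts_by(tokyo_olympic_df, axis=1).items(): ...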

Weather feeling calendar

While we're at it, let's create a weather feeling calendar covering one whole year. Because the 2019 data covers only part of the year, the calendar uses data up to 2018. February 29 of leap years is dropped.

tokyo_365_df = tokyo_df.loc['1989':'2018']
tokyo_365_df = tokyo_365_df.drop('0229', axis=1)
tokyo_365_df
#Display by date
i = 1
plt.figure(figsize=(16, 40))
for column in tokyo_365_df.columns:
    text = ''
    for row in tokyo_365_df.index:
        text += tokyo_365_df.loc[row, column]
    #Character image creation with WordCloud
    wordcloud = WordCloud(colormap='jet', font_path="msgothic.ttc", regexp=r"\w+").generate(text)
    plt.subplot(37,10,i)
    plt.imshow(wordcloud)
    plt.title(column)
    plt.axis("off")
    i += 1
plt.show()

This is the one-year calendar for Tokyo.

WordCloud_tokyo.png
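The subplot grids in this article are hard-coded (11 × 3 for 31 years, 6 × 3 for 17 dates, 37 × 10 for 365 days). A sketch of how they can be derived, plus a hypothetical helper that lists the MMDD labels of a year, which could be used to put the calendar columns in date order (e.g. tokyo_365_df[mmdd_labels(2018)], assuming every day is present as a column).

```python
import math
from datetime import date, timedelta


def grid_shape(n_panels, n_cols):
    """Smallest (rows, cols) grid with n_cols columns that holds n_panels subplots."""
    return math.ceil(n_panels / n_cols), n_cols


def mmdd_labels(year):
    """All MMDD strings of a calendar year, in date order."""
    day = date(year, 1, 1)
    labels = []
    while day.year == year:
        labels.append(day.strftime('%m%d'))
        day += timedelta(days=1)
    return labels


print(grid_shape(31, 3))        # (11, 3)  years 1989-2019
print(grid_shape(17, 3))        # (6, 3)   dates 0724-0809
print(grid_shape(365, 10))      # (37, 10) one non-leap year
print(len(mmdd_labels(2018)))   # 365
```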

During the rainy season, rain stands out. October 10, the day of the opening ceremony of the previous (1964) Tokyo Olympics, is said to be a "singular day" for fine weather (a date that is unusually likely to be sunny), and it does seem to have less rain than the surrounding dates.
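The impression from the word cloud can be checked numerically. A sketch of a hypothetical helper that computes, for each date, the share of years whose overview mentions a keyword. It uses plain case-insensitive substring matching, so 'rain' also matches labels such as "heavy rain" or "rainstorm" but not "drizzle"; with Japanese labels, '雨' would be the keyword. The toy data below is made up.

```python
import pandas as pd


def keyword_share(df, keyword):
    """Per date column: fraction of non-missing years whose text contains keyword."""
    def share(col):
        values = list(col.dropna())
        if not values:
            return float('nan')
        return sum(keyword.lower() in v.lower() for v in values) / len(values)
    return df.apply(share, axis=0)


# Hypothetical data shaped like tokyo_365_df (years x MMDD).
toy = pd.DataFrame({'1009': ['Rain ', 'Fine ', 'Cloudy Then Rain '],
                    '1010': ['Fine ', 'Fine ', 'Cloudy '],
                    '1011': ['Rain ', 'Rain ', 'Fine ']},
                   index=['2016', '2017', '2018'])
share = keyword_share(toy, 'rain')
print(share.round(2).to_dict())  # {'1009': 0.67, '1010': 0.0, '1011': 0.67}
```

On the real data, keyword_share(tokyo_365_df, 'rain')[['1008', '1009', '1010', '1011', '1012']] would compare October 10 with its neighbors.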

You can switch to another observation station by changing the station number in p_no = '662'. To create a calendar for your own area, change this number. Station numbers can be looked up in the station information history file (smaster_201909.tar.gz).

This is an example of Sapporo (412). There is a lot of snow in winter.

WordCloud_sapporo.png

This time, I tried to express the weather through the size of words. Numbers or graphs would be more accurate, but showing it through word size is intuitive and fun.

The data is available only until the end of March 2020, so if you need it, I recommend downloading it as soon as possible.
