In a previous post I read the river water level and rainfall data published by Shimane Prefecture; having received various comments, this time I would like to overlay those two datasets. Since the result also shows what happened during the recent heavy rain, the timing feels right.
Create functions to read each dataset, based on the following articles written so far:

- Visualize the river water level data published by Shimane Prefecture
- Visualize the rainfall data published by Shimane Prefecture
```python
#Library
import requests
from bs4 import BeautifulSoup
import pandas as pd

#Set domain URL
urlBase = "https://shimane-opendata.jp"

#Get all elements with a given tag from a URL
def get_tag_from_html(urlName, tag):
    url = requests.get(urlName)
    soup = BeautifulSoup(url.content, "html.parser")
    return soup.find_all(tag)

#Get the URLs of the data pages from the catalog page
def get_page_urls_from_catalog(urlName):
    urlNames = []
    elems = get_tag_from_html(urlName, "a")
    for elem in elems:
        classes = elem.get("class")
        href = elem.get("href")
        #Data-page links carry the "heading" class and point at a resource path
        if classes and "heading" in classes and href and href.find("resource") > 0:
            urlNames.append(urlBase + href)
    return urlNames

#Get the first CSV URL from a data page
def get_csv_urls_from_url(urlName):
    urlNames = []
    elems = get_tag_from_html(urlName, "a")
    for elem in elems:
        href = elem.get("href")
        if href and href.find(".csv") > 0:
            urlNames.append(href)
    return urlNames[0]

#Process the acquired CSV data
def data_cleansing(df):
    print("set timestamp as index.")
    df.index = pd.to_datetime(df["Observatory"])
    df = df.sort_index()
    print("replace words to -1.")
    df = df.replace(["Not collected", "Missing", "Maintenance"], "-1")
    print("edit name of columns.")
    cols = df.columns.tolist()
    for i in range(len(cols)):
        #Rename cumulative columns after the station column to their left
        if cols[i].find("name") > 0:
            cols[i] = cols[i - 1] + "_Accumulation"
    df.columns = cols
    print("change data type to float.")
    for col in df.columns[1:]:
        df[col] = df[col].astype("float")
    return df

#Acquisition of rainfall data
def get_rain_data():
    urlName = urlBase + "/db/dataset/010009"
    urlNames = get_page_urls_from_catalog(urlName)
    urls = [get_csv_urls_from_url(u) for u in urlNames]
    df = pd.DataFrame()
    for url in urls:
        #Only the data at 10-minute intervals
        if url.find("10min") > 0:
            df = pd.concat([df, pd.read_csv(url, encoding="Shift_JIS").iloc[2:]])
    return data_cleansing(df)

#Acquisition of river water level data
def get_level_data():
    urlName = urlBase + "/db/dataset/010010"
    urlNames = get_page_urls_from_catalog(urlName)
    urls = [get_csv_urls_from_url(u) for u in urlNames]
    df = pd.DataFrame()
    for url in urls:
        df = pd.concat([df, pd.read_csv(url, encoding="Shift_JIS").iloc[6:]])
    return data_cleansing(df)
```
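To see what the cleansing step does, here is the same pipeline applied to a two-row toy frame. The column and station names here are made up for illustration, not the prefecture's actual CSV headers:

```python
import pandas as pd

# Toy frame mimicking the published CSV layout (names are illustrative).
df = pd.DataFrame({
    "Observatory": ["2020-07-13 10:00", "2020-07-13 10:10"],
    "Matsue": ["1.5", "Missing"],
})

# Same steps as data_cleansing: timestamp index, sentinel replacement, cast.
df.index = pd.to_datetime(df["Observatory"])
df = df.sort_index()
df = df.replace(["Not collected", "Missing", "Maintenance"], "-1")
df["Matsue"] = df["Matsue"].astype("float")
print(df["Matsue"].tolist())  # → [1.5, -1.0]
```

Replacing the sentinel words with `-1` first is what makes the cast to float possible.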
This time, both the rainfall and the cumulative rainfall columns in the rainfall data will be used.
Using the functions above, fetch the river water level and rainfall data as follows.
```python
df_rain = get_rain_data()
df_level = get_level_data()
```
Prepare the plotting environment so that Japanese characters are not garbled in the graphs.
```python
#Preparing for visualization
!pip install japanize_matplotlib
import matplotlib.pyplot as plt
import japanize_matplotlib
import seaborn as sns

sns.set(font="IPAexGothic")
```
I'll pick out just five columns and plot them.
```python
cols = df_rain.columns[1:]
df_rain[cols[:5]].plot(figsize=(15, 5))
```
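Since the rainfall data comes at 10-minute intervals, it can be noisy to plot directly; resampling to hourly sums is one way to smooth it. A minimal sketch with made-up values:

```python
import pandas as pd

# Six 10-minute readings covering one hour (values are made up).
idx = pd.date_range("2020-07-13 10:00", periods=6, freq="10min")
rain = pd.Series([0.0, 0.5, 1.0, 2.0, 1.5, 0.0], index=idx)

# Aggregate the 10-minute readings into hourly rainfall totals.
hourly = rain.resample("1h").sum()
print(hourly.iloc[0])  # → 5.0
```

The same `resample` call works on the full `df_rain` frame once its index is a `DatetimeIndex`.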
You can see that the rainfall from the 13th onward was very heavy.
In the same way, I'll pick out a few water level columns and plot them.
```python
cols = df_level.columns[1:]
df_level[cols[:5]].plot(figsize=(15, 5))
```
As you would expect, the water level rises when there is a lot of rainfall.
From the timestamps of the two datasets, find the range over which both the river water level and the rainfall data exist.
```python
#Overlapping range: the later of the two start times, the earlier of the two end times
idx_min = max(df_rain.index.min(), df_level.index.min())
idx_max = min(df_rain.index.max(), df_level.index.max())
```
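The intersection logic is just "later start, earlier end"; here it is on two toy daily indexes to make that concrete:

```python
import pandas as pd

# Two overlapping daily ranges (dates are made up for illustration).
idx_rain = pd.date_range("2020-07-10", "2020-07-15", freq="D")
idx_level = pd.date_range("2020-07-12", "2020-07-17", freq="D")

# The shared range starts at the later minimum and ends at the earlier maximum.
idx_min = max(idx_rain.min(), idx_level.min())
idx_max = min(idx_rain.max(), idx_level.max())
print(idx_min, idx_max)  # → 2020-07-12 00:00:00 2020-07-15 00:00:00
```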
Create a function that extracts the river water level and rainfall data for the specified column names.

```python
def get_merged_dataframe(cols_rain, cols_level):
    #Rainfall columns, restricted to the overlapping range
    df = df_rain[idx_min:idx_max][cols_rain]
    df.columns = [col + "_rainfall[mm]" for col in cols_rain]
    #Water level columns, converted from metres to centimetres
    for col in cols_level:
        df[col + "_Water level[cm]"] = df_level[idx_min:idx_max][col] * 100
    return df
```

Using the above function, create a function that plots the specified columns.

```python
def plot(cols_rain, cols_level):
    df = get_merged_dataframe(cols_rain, cols_level)
    #Draw the entire range
    df.plot(figsize=(15, 5))
    plt.show()
    #Draw from July 12th onward
    df["2020-07-12":].plot(figsize=(15, 5))
    plt.show()
```
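An equivalent way to combine two frames that share a `DatetimeIndex` is `join` with `add_suffix`, which aligns rows by timestamp automatically. A self-contained sketch with made-up station names and values:

```python
import pandas as pd

# Two toy frames sharing an hourly DatetimeIndex (values are made up).
idx = pd.date_range("2020-07-13", periods=3, freq="h")
df_rain = pd.DataFrame({"Matsue": [0.0, 2.5, 1.0]}, index=idx)
df_level = pd.DataFrame({"Kyobashi": [1.20, 1.45, 1.80]}, index=idx)  # metres

# join aligns on the index; mul(100) converts metres to centimetres.
merged = df_rain.add_suffix("_rainfall[mm]").join(
    df_level.mul(100).add_suffix("_Water level[cm]")
)
print(merged.columns.tolist())
# → ['Matsue_rainfall[mm]', 'Kyobashi_Water level[cm]']
```

Index alignment means rows present in only one frame come out as NaN rather than silently shifting.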
Draw a graph by specifying appropriate columns.
```python
cols_rain = ["Matsue_gawa_Accumulation"]
cols_level = ["Kyobashi River"]
plot(cols_rain, cols_level)
```
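Because rainfall [mm] and water level [cm] live on very different scales, plotting them on a single axis can flatten one of the curves; a secondary y-axis is one alternative. A minimal sketch with synthetic data:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import pandas as pd

# Synthetic hourly series (values are made up).
idx = pd.date_range("2020-07-13", periods=24, freq="h")
rain = pd.Series(range(24), index=idx)                           # rainfall [mm]
level = pd.Series([100 + 5 * i for i in range(24)], index=idx)   # water level [cm]

fig, ax1 = plt.subplots(figsize=(15, 5))
ax1.plot(idx, rain, label="rainfall [mm]")
ax2 = ax1.twinx()  # second y-axis sharing the same x-axis
ax2.plot(idx, level, color="tab:red", label="water level [cm]")
fig.savefig("overlay.png")
```

Each series keeps its own scale, so the shape of the correlation is easier to judge by eye.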
Here, the river water level seems to split into stretches that look likely to correlate with rainfall and stretches that do not.
Perhaps at high tide the rise in water has a greater effect than the rainfall does, and the rainfall may only start to matter once it exceeds a certain level.
... I'm no expert, so I can't say for sure.
In any case, it turned out that, given the data, it is easy both to visualize it and to overlay separate datasets for closer examination.
If you have needs like this, we'd love to hear from you! And if you have data of your own, publishing it in the same way should make all sorts of experiments possible.