[PYTHON] If you have data on river water level and rainfall, you'll want to overlay them, right?

Introduction

The other day I posted articles on reading river water level data and rainfall data. Prompted by the feedback I received, this time I overlay the two datasets. It also seems timely, because the overlay shows what the recent heavy rain looked like.

Data reading

Create functions to read each dataset, based on the following articles I wrote earlier.

- Visualize the water level data of rivers published by Shimane Prefecture
- Visualize the rainfall data published by Shimane Prefecture

python


#Library
import requests
from bs4 import BeautifulSoup
import pandas as pd

#Get a specific tag in a URL
def get_tag_from_html(urlName, tag):
  url = requests.get(urlName)
  soup = BeautifulSoup(url.content, "html.parser")
  return soup.find_all(tag)

#Get the URL of the data page from the catalog page
def get_page_urls_from_catalog(urlName):
  urlNames = []
  elems = get_tag_from_html(urlName, "a")
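  for elem in elems:
    try:
      string = elem.get("class")[0]
      #Compare for equality; `string in "heading"` would also match substrings such as "head"
      if string == "heading":
        href = elem.get("href")
        if href.find("resource") > 0:
          urlNames.append(urlBase + href)
    except (TypeError, IndexError, AttributeError):
      #Skip links that have no class or no href
      pass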
  return urlNames

#Get CSV URL from data page
def get_csv_urls_from_url(urlName):
  urlNames = []
  elems = get_tag_from_html(urlName, "a")
  for elem in elems:
    try:
      href = elem.get("href")
      if href.find(".csv") > 0:
        urlNames.append(href)
    except AttributeError:
      #Skip links that have no href
      pass
  return urlNames[0]

#Process the acquired CSV data
def data_cleansing(df):
  
  print("set timestamp as index.")
  df.index = df["Observatory"].map(lambda _: pd.to_datetime(_))
  df = df.sort_index()
  
  print("replace words to -1.")
  df = df.replace('Not collected', '-1')
  df = df.replace('Missing', '-1')
  df = df.replace('Maintenance', '-1')

  print("edit name of columns.")
  cols = df.columns.tolist()
  for i in range(len(cols)):
    if cols[i].find("name") > 0:
      cols[i] = cols[i-1] + "_Accumulation"
  df.columns = cols

  print("change data type to float.")
  cols = df.columns[1:]
  for col in cols:
    df[col] = df[col].astype("float")

  return df

#Acquisition of rainfall data
def get_rain_data():

  urlName = urlBase + "/db/dataset/010009"

  urlNames = get_page_urls_from_catalog(urlName)

  urls = []

  for urlName in urlNames:
    urls.append(get_csv_urls_from_url(urlName))

  df = pd.DataFrame()

  for url in urls:
    #Only for data every 10 minutes
    if url.find("10min") > 0:
      df = pd.concat([df, pd.read_csv(url, encoding="Shift_JIS").iloc[2:]])

  return data_cleansing(df)

#Acquisition of river water level data
def get_level_data():

  urlName = urlBase + "/db/dataset/010010"

  urlNames = get_page_urls_from_catalog(urlName)

  urls = []

  for urlName in urlNames:
    urls.append(get_csv_urls_from_url(urlName))

  df = pd.DataFrame()

  for url in urls:
    df = pd.concat([df, pd.read_csv(url, encoding="Shift_JIS").iloc[6:]])

  return data_cleansing(df)

#Set domain URL
urlBase = "https://shimane-opendata.jp"
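
To see what the cleansing step does without downloading anything, here is a minimal sketch on a made-up frame. The column names ("Observatory", "StationA", "Unnamed: 2") and all values are assumptions for illustration, not the real CSV contents. Note that it swaps in pd.to_numeric with errors="coerce" instead of the replace-then-astype approach above, so any non-numeric marker becomes NaN (a gap in the plot) rather than -1.

```python
import pandas as pd

# Hypothetical sample mimicking the CSV layout after the header rows are dropped.
# Column names and values are made up for illustration.
raw = pd.DataFrame({
    "Observatory": ["2020/07/13 00:20", "2020/07/13 00:00", "2020/07/13 00:10"],
    "StationA": ["1.5", "0.0", "Missing"],
    "Unnamed: 2": ["3.0", "1.0", "1.5"],
})

def cleanse_sketch(df):
    df = df.copy()
    # Use the timestamp column as a sorted DatetimeIndex
    df.index = pd.to_datetime(df["Observatory"])
    df = df.sort_index()
    # Rename "Unnamed" columns after the preceding station, as data_cleansing does
    cols = df.columns.tolist()
    for i in range(1, len(cols)):
        if "name" in cols[i]:
            cols[i] = cols[i - 1] + "_Accumulation"
    df.columns = cols
    # Convert every data column to float; non-numeric markers become NaN
    for col in df.columns[1:]:
        df[col] = pd.to_numeric(df[col], errors="coerce")
    return df

cleaned = cleanse_sketch(raw)
print(cleaned.dtypes)
```

Whether -1 or NaN is the better placeholder depends on what you do next; NaN keeps missing readings out of averages and correlations.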

This time, both the rainfall and the cumulative rainfall columns of the rainfall data are kept, so either can be used.

Using the functions above, get the river water level and rainfall data as follows.

python


df_rain = get_rain_data()
df_level = get_level_data()

Visualization

Preparation

Set things up so that Japanese characters are not garbled in the graphs.

python


#Preparing for visualization
!pip install japanize_matplotlib

import matplotlib.pyplot as plt
import japanize_matplotlib 
import seaborn as sns

sns.set(font="IPAexGothic")

Rainfall

I extract only the first five columns and plot them.

python


cols = df_rain.columns[1:]
df_rain[cols[:5]].plot(figsize=(15,5))

Unknown.png

You can see that the rainfall from the 13th onward was very heavy.

River water level

In the same way, I extract the first five columns and plot them.

python


cols = df_level.columns[1:]
df_level[cols[:5]].plot(figsize=(15,5))

Unknown-2.png

Here too, you can see that the water level rises when rainfall is heavy.

Overlay

From the timestamps of the two datasets, get the range in which both river water level and rainfall data exist.

python


#The overlap starts at the later of the two start times and ends at the earlier of the two end times
idx_min = max(df_rain.index.min(), df_level.index.min())
idx_max = min(df_rain.index.max(), df_level.index.max())

Create a function to acquire the data of river water level and rainfall by specifying the column name.

python


def get_marged_dataframe(cols_rain, cols_level):

  #Copy the slice so that renaming and adding columns does not touch df_rain
  df = df_rain[idx_min:idx_max][cols_rain].copy()

  #Label the rainfall columns with their unit
  df.columns = [col + "_rainfall[mm]" for col in cols_rain]

  #Add the water level columns, multiplied by 100 to give cm
  for col in cols_level:
    df[col + "_Water level[cm]"] = df_level[idx_min: idx_max][col] * 100

  return df
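
As an alternative, the same alignment can probably be done with an inner join, which keeps only the timestamps present in both frames. This is a minimal sketch on made-up data: the frames rain and level, their column names, and the values are all hypothetical stand-ins for df_rain and df_level, and I assume the water level is in metres, as the multiplication by 100 above suggests.

```python
import pandas as pd

# Hypothetical stand-ins for df_rain and df_level (values are made up)
rain = pd.DataFrame(
    {"StationA": [0.0, 1.5, 2.0, 0.5]},
    index=pd.date_range("2020-07-13 00:00", periods=4, freq="10min"),
)
level = pd.DataFrame(
    {"RiverA": [1.20, 1.25, 1.40]},  # assumed to be metres
    index=pd.date_range("2020-07-13 00:10", periods=3, freq="10min"),
)

# Label each column with its quantity and unit, converting metres to centimetres
rain_part = rain.add_suffix("_rainfall[mm]")
level_part = (level * 100).add_suffix("_Water level[cm]")

# An inner join keeps only the timestamps that exist in both frames
merged = rain_part.join(level_part, how="inner")
print(merged)
```

Unlike slicing by the earliest and latest common timestamps, the inner join also drops timestamps that are missing from one dataset in the middle of the range.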

Using the above function, create a function to graph by specifying the column name.

python


def plot(cols_rain, cols_level):
  df = get_marged_dataframe(cols_rain, cols_level)

  #Draw the entire range
  df.plot(figsize=(15,5))
  plt.show()

  #Draw after July 12th
  df["2020-07-12":].plot(figsize=(15,5))
  plt.show()
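
Rainfall in mm and water level in cm can differ a lot in magnitude, so one curve may look flat when both share a single axis. One option is to put the water level on a secondary y-axis using the secondary_y argument of pandas plotting. The sketch below uses a made-up frame; its column names and values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical merged frame in the same shape as the output of get_marged_dataframe
idx = pd.date_range("2020-07-13 00:00", periods=6, freq="10min")
df = pd.DataFrame({
    "StationA_rainfall[mm]": [0.0, 2.0, 5.0, 3.0, 1.0, 0.0],
    "RiverA_Water level[cm]": [120.0, 121.0, 130.0, 145.0, 150.0, 148.0],
}, index=idx)

# Rainfall stays on the left axis; the water level is drawn against the right axis
ax = df.plot(figsize=(15, 5), secondary_y=["RiverA_Water level[cm]"])
ax.set_ylabel("rainfall [mm]")
ax.right_ax.set_ylabel("water level [cm]")
```

In a notebook the figure is displayed automatically; in a script, call plt.show() from matplotlib.pyplot afterwards.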

Draw the graphs by specifying appropriate columns.

python


cols_rain = ["Matsue_gawa_Accumulation"]
cols_level = ["Kyobashi River"]

plot(cols_rain, cols_level)

Unknown-3.png

Unknown-4.png

Here, the river water level seems to have stretches that appear to correlate with rainfall and stretches that do not.

Perhaps the rise in water at high tide has a greater effect than the rainfall. It may also be that rainfall only affects the water level once it exceeds a certain amount.

... I'm not an expert, so I can't say for sure.

In any case, I found that once the data is available, it is easy to visualize it and to overlay separate datasets for examination.

If you have such needs, I look forward to hearing from you! Also, if you have data of your own, I think publishing it in the same way would make it fun to try various things.
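
One way to probe this kind of guess is to compute the correlation between rainfall and the water level observed some time later, for several delays. The sketch below is only an illustration on synthetic data: the series rain and level, the 3-step delay built into them, and the helper lagged_corr are all hypothetical, and nothing here is a result from the Shimane data.

```python
import numpy as np
import pandas as pd

# Synthetic 10-minute series (made up): the level follows rainfall 3 steps later
idx = pd.date_range("2020-07-13 00:00", periods=200, freq="10min")
rng = np.random.default_rng(0)
rain = pd.Series(rng.gamma(shape=0.5, scale=2.0, size=200), index=idx)
level = 100 + 5 * rain.shift(3).fillna(0)

def lagged_corr(rain, level, max_lag):
    """Correlation between rainfall and the level observed `lag` steps later."""
    return pd.Series(
        {lag: rain.corr(level.shift(-lag)) for lag in range(max_lag + 1)}
    )

corrs = lagged_corr(rain, level, max_lag=6)
best_lag = corrs.idxmax()
print(corrs.round(2))
print("lag with the highest correlation (steps):", best_lag)
```

On real data, a clear peak at some lag would hint at a delayed response to rainfall, while a flat profile would be consistent with other factors such as tides dominating.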
