Visualize railway line data and solve the shortest path problem (Python + Pandas + NetworkX)

Introduction

The route map and the graph network go well together, and when you try to think about something on the route map, you will touch on the shortest path problem that anyone can think of.

Turn the railroad map into a simple graph problem and use Dijkstra's algorithm to find the shortest path. Dijkstra's algorithm is built into the Python library NetworkX, which is convenient, so we will use it this time.

The program is uploaded to Google Colaboratory so you can run it in your browser. Here https://nbviewer.jupyter.org/github/galileo15640215/train/blob/master/train_dijkstra.ipynb

flow

Even though it is a route map, it is too large to handle national data suddenly, and before considering routes nationwide, first extract and create Tokyo Metro routes as a small problem.

Order ・ Data processing ・ Route map within Tokyo Metro ・ The shortest route within the Tokyo Metro area ・ Nationwide route map ・ The shortest route nationwide

Required data

From station data.jp https://ekidata.jp/ Business data company20200309.csv Route data line20200306free.csv Station data station20200316free.csv Connection station data join20200306.csv Download. (CSV file name as of May 1, 2020)

We will create the necessary data based on this.

program

#Required modules
import pandas as pd
import matplotlib.pyplot as plt
import japanize_matplotlib
import numpy as np
import networkx as nx
#Create a pandsa format table from a csv file
station = pd.read_csv("station20200316free.csv")
join = pd.read_csv("join20200306.csv")
#pref = pd.read_csv("pref.csv")
line = pd.read_csv("line20200306free.csv")
company = ("company20200309.csv")

So far, we've put the necessary data into the Pandas table.

Tokyo Metro version

Next, extract the data for Tokyo Metro only.

#Tokyo Metro extracts only Tokyo Metro stations from stations nationwide...company_cd == 18
metro = station[["station_cd", "station_name", "line_cd", "lon", "lat"]]
metro = pd.merge(metro, line, on = 'line_cd')
metro = metro[metro["company_cd"] == 18]
metro = metro[["station_cd", "station_name", "line_cd", "lon_x", "lat_x", "line_name", "line_color_c", "line_color_t"]]
lon = metro["lon_x"]
lat = metro["lat_x"]
metro["lon"] = lon
metro["lat"] = lat
metro = metro[["station_cd", "station_name", "line_cd", "lon", "lat", "line_name"]]

#Route to extract the connection side of Tokyo Metro...line_cd == 28001---28010
metro_join = join[(join["line_cd"]==28001)|(join["line_cd"]==28002)|(join["line_cd"]==28003)|(join["line_cd"]==28004)|(join["line_cd"]==28005)|(join["line_cd"]==28006)|(join["line_cd"]==28007)|(join["line_cd"]==28008)|(join["line_cd"]==28009)|(join["line_cd"]==28010)]
metro_join = metro_join[["station_cd1", "station_cd2"]]

From here, create a graph of the Tokyo Metro route map.

#Graph declaration
G = nx.Graph()
#Make the apex the station name
G.add_nodes_from(metro["station_name"])
#Set the coordinates of plot
pos={}
for i, j, k in zip(metro["station_name"], metro["lon"], metro["lat"]):
  pos[i] = (j, k)
#List e to station_name and station_Store cd and link
e = []
for i, j in zip(metro["station_name"], metro["station_cd"]):
  e.append([i, j])
#Add edge information to the graph
for i, j in zip(metro_join["station_cd1"], metro_join["station_cd2"]):
    for k in e:
      if k[1] == i:
        for l in e:
          if l[1] == j:
            G.add_edge(k[0], l[0])
#Graph output settings
plt.figure(figsize=(10,10),dpi=200)
plt.title('Tokyo metro', fontsize=20)
plt.axes().set_aspect('equal', 'datalim')
nx.draw_networkx(G, pos, node_color='b', alpha=0.8, node_size=10, font_size=5, font_family='IPAexGothic')
plt.show()

image.png

I was able to visualize the route map of Tokyo Metro. From here, give the graph the information needed to solve the shortest path problem.

Give information to the graph

#Data creation to give the distance between stations as a weight on the side
dist = []
cd1_lat = []
cd1_lon = []
cd2_lat = []
cd2_lon = []
not_exist = [] #station that exists in the join table but not in the station table_Store cd
for i, j in zip(metro_join["station_cd1"], metro_join["station_cd2"]):
  flag = True
  for k, l, m in zip(metro["station_cd"], metro["lat"], metro["lon"]):
    if i == k:
      cd1_x = l
      cd1_y = m
      cd1_lat.append(l)
      cd1_lon.append(m)
      flag = False
    if j == k:
      cd2_x = l
      cd2_y = m
      cd2_lat.append(l)
      cd2_lon.append(m)
  if flag:
    not_exist.append([i, j])
    #print(i, j)
  else:
    dist.append(((cd1_x-cd2_x)**2 + (cd1_y-cd2_y)**2)**0.5)
#If you execute it as it is, an error ValueError: Length of values does not match length of index
#Apparently"station_cd" ==Removed 2800701 and 2800702 because they do not exist in the station table and are unnecessary from the join table
#The following two lines...metro_join = metro_join[metro_join["station_cd1"] != 2800701]...Equivalent to
for i in range(len(not_exist)):
  metro_join = metro_join[metro_join["station_cd1"] != not_exist[i][0]]
#Add column to join table
metro_join["cd1_lat"] = cd1_lat
metro_join["cd1_lon"] = cd1_lon
metro_join["cd2_lat"] = cd2_lat
metro_join["cd2_lon"] = cd2_lon
metro_join["distance"] = dist

Add edge weights to the graph

#nodes is station_name
#Give the graph edge weights
for i, j, m in zip(metro_join["station_cd1"], metro_join["station_cd2"], metro_join["distance"]):
    for k in e:
      if k[1] == i:
        for l in e:
          if l[1] == j:
            G.add_edge(k[0], l[0], weight=m)

The NetworkX library, Networkx.dijkstra_path (), which is needed to solve the shortest path problem, does not seem to support the Japanese apex. Change the vertices from station_name to station_cd and declare a new graph.

#nodes is station_cd
G = nx.Graph()
G.add_nodes_from(metro["station_cd"])
pos={}
for i, j, k in zip(metro["station_cd"], metro["lon"], metro["lat"]):
  pos[i] = (j, k)
for i, j, m in zip(metro_join["station_cd1"], metro_join["station_cd2"], metro_join["distance"]):
  G.add_edge(i, j, weight=m)
#station_The station with the same name when cd is the apex_Even with name, station for each line_Since cd is set, it does not have a connection side with other lines as it is
#Therefore, connect stations with the same name with a weight of 0.
for i, j in zip(metro["station_name"], metro["station_cd"]):
  for k, l in zip(metro["station_name"], metro["station_cd"]):
    if i == k and j != l:
      G.add_edge(j, l, weight=0)
#Set start and finish stations
st_name = "Ogikubo"
go_name = "Nishi-Funabashi"
#station_from name to station_Search cd
for i, j in zip(metro["station_name"], metro["station_cd"]):
  if i == st_name:
    st = j
  if i == go_name:
    go = j
#Search for the shortest path
dij = nx.dijkstra_path(G, st, go)
out = []
for k in range(len(dij)):
  for i, j in zip(metro["station_name"], metro["station_cd"]):
    if j == dij[k]:
      out.append(i)
print(out)
#Declare a graph for the shortest path
G_root = nx.Graph()
G_root.add_nodes_from(dij)
pos_root = {}
for l in dij:
  for i, j, k in zip(metro["station_cd"], metro["lon"], metro["lat"]):
    if l == i:
      pos_root[l] = (j, k)
for i in range(len(dij)-1):
  G_root.add_edge(dij[i], dij[i+1])

plt.figure(figsize=(10,10),dpi=200)
plt.title('Tokyo metro', fontsize=20)
plt.axes().set_aspect('equal', 'datalim')
nx.draw_networkx(G, pos, node_color='b', alpha=0.3, node_size=10, with_labels= False)
c = ['green' if n==st else 'red' if n!=go else'yellow' for n in G_root.nodes()]
n_size = [30 if n==st else 10 if n!=go else 30 for n in G_root.nodes()]
nx.draw_networkx(G_root, pos_root, node_color=c, alpha=0.9, node_size=n_size, with_labels= False)
plt.show()

Execution result ['Ogikubo','Minami Asagaya',' Shinkoenji','Higashikoenji','Shin Nakano','Nakano Sakagami','Nishi Shinjuku','Shinjuku','Shinjuku 3-chome','Shinjuku Gyoenmae', "Yotsuya 3-chome", "Yotsuya", "Akasaka Mitsuke", "In front of the Parliament Building", "In front of the Parliament Building", "Kasumigaseki", "Hibiya", "Hibiya", "Ginza", "Ginza", "Kyobashi" ,'Nipponbashi','Nihonbashi','Kayabacho','Monzennakacho','Kiba','Toyocho','Minamisagocho','Nishi Kasai','Kasai','Urayasu','Minami Gyotoku', 'Gyotoku','Myonori',' Log Nakayama',' Nishifunabashi']

image.png

I was able to draw the shortest route on the Tokyo Metro. How does it compare to the actual excellent transfer guidance app? image.png

image.png

The program has more transfers. However, the route is almost the same for a small one like Tokyo Metro.

Next, the national version

The same thing is done nationwide.

zen_join = join
zen = station[["station_cd", "station_name", "line_cd", "lon", "lat"]]
zen = pd.merge(zen, line, on='line_cd')
zen = zen[["station_cd", "station_name", "line_cd", "lon_x", "lat_x", "line_name"]]
lon = zen["lon_x"]
lat = zen["lat_x"]
zen["lon"] = lon
zen["lat"] = lat
zen = zen[["station_cd", "station_name", "line_cd", "lon", "lat", "line_name"]]
cd1_lat = []
cd1_lon = []
cd2_lat = []
cd2_lon = []
dist = []
not_exist = []
for i, j in zip(zen_join["station_cd1"], zen_join["station_cd2"]):
  flag = True
  for k, l, m in zip(zen["station_cd"], zen["lat"], zen["lon"]):
    if i == k:
      cd1_x = l
      cd1_y = m
      cd1_lat.append(l)
      cd1_lon.append(m)
      flag = False
    if j == k:
      cd2_x = l
      cd2_y = m
      cd2_lat.append(l)
      cd2_lon.append(m)
  if flag:
    not_exist.append([i, j])
    #print(i, j)
  else:
    dist.append(((cd1_x-cd2_x)**2 + (cd1_y-cd2_y)**2)**0.5)
#Deleted because it does not exist in the station table and is unnecessary from the join table
for i in range(len(not_exist)):
  zen_join = zen_join[zen_join["station_cd1"] != not_exist[i][0]]# & zen_join["station_cd2"] != not_exist[i][1]]
zen_join["cd1_lat"] = cd1_lat
zen_join["cd1_lon"] = cd1_lon
zen_join["cd2_lat"] = cd2_lat
zen_join["cd2_lon"] = cd2_lon
zen_join["distance"] = dist

#nodes is station_cd
G = nx.Graph()
G.add_nodes_from(zen["station_cd"])
pos={}
for i, j, k in zip(zen["station_cd"], zen["lon"], zen["lat"]):
  pos[i] = (j, k)
for i, j, m in zip(zen_join["station_cd1"], zen_join["station_cd2"], zen_join["distance"]):
  G.add_edge(i, j, weight=m)
for i, j, m in zip(zen["station_name"], zen["station_cd"], zen["lat"]):
  for k, l, n in zip(zen["station_name"], zen["station_cd"], zen["lat"]):
    if i == k and j != l and m == n:
      #print(i, j, k, l)
      G.add_edge(j, l, weight=0)
plt.figure(figsize=(10,10),dpi=200)
plt.title('All Japan', fontsize=20)
plt.axes().set_aspect('equal', 'datalim')
nx.draw_networkx(G, pos, alpha=0.0, with_labels=False)
nx.draw_networkx_edges(G, pos)
plt.show()

image.png

st_name = "Nemuro"
go_name = "Kagoshima"
for i, j in zip(zen["station_name"], zen["station_cd"]):
  if i == st_name:
    st = j
  if i == go_name:
    go = j
dij = nx.dijkstra_path(G, st, go)
out = []
for k in range(len(dij)):
  for i, j in zip(zen["station_name"], zen["station_cd"]):
    if j == dij[k]:
      out.append(i)
print(out)

#nodes is station_cd
plt.figure(figsize=(50,50),dpi=200)
G_root = nx.Graph()
G_root.add_nodes_from(dij)
pos_root = {}
for l in dij:
  for i, j, k in zip(zen["station_cd"], zen["lon"], zen["lat"]):
    if l == i:
      pos_root[l] = (j, k)
for i in range(len(dij)-1):
  G_root.add_edge(dij[i], dij[i+1])

plt.figure(figsize=(10,10),dpi=200)
plt.title('All Japan', fontsize=20)
plt.axes().set_aspect('equal', 'datalim')
nx.draw_networkx(G, pos, alpha=0.0, with_labels=False)
nx.draw_networkx_edges(G, pos, alpha=0.3)
nx.draw_networkx(G_root, pos_root, alpha=0.0, with_labels=False)
nx.draw_networkx_edges(G_root, pos_root, alpha=1.0, edge_color='r')
plt.show()

image.png

Complete. It's unrealistic to actually make a connection, and unless you're a big train lover, it's hell. It's done. Thanks to the Hokkaido Shinkansen, you can reach Honshu through the Seikan Tunnel. Speaking of the shortest time, it is natural to transfer to the Tohoku Shinkansen and Tokaido Shinkansen, but if the route is the shortest, it seems that you will go to the Sea of Japan side by conventional line without using the Tohoku Shinkansen.

Summary

The side information this time was the distance, but if you use it as the time, you can make something like the shortest time for transfer guidance. I would like to try it if I can create time data. To be continued...

Referenced site

Visualize railway line data as a graph with Cytoscape 1 https://qiita.com/keiono/items/29286f49b15a5b13c987 Please refer to here for the processing of station data.

Recommended Posts

Visualize railway line data and solve the shortest path problem (Python + Pandas + NetworkX)
Try to solve the shortest path with Python + NetworkX + social data
Solve the spiral book (algorithm and data structure) with python!
Solve the Python knapsack problem with a branch and bound method
Solve with Python [100 selected past questions that beginners and intermediates should solve] (056 --059 Shortest path problem: Dijkstra's algorithm)
Solve the maximum subarray problem in Python
Python --Read data from a numeric data file and find the multiple regression line.
Solve the Python asymmetric traveling salesman problem with a branch and bound method
Try to solve the Python class inheritance problem
Solve the knapsack problem using pyomo and glpk
Interactively visualize data with TreasureData, Pandas and Jupyter.
[Introduction to Algorithm] Find the shortest path [Python3]
Find the shortest path with the Python Dijkstra's algorithm
Solving the shortest path problem (VRP) Mixed integer programming
Try to solve the internship assignment problem with Python
Visualize railway line data as a graph with Cytoscape 2
Visualize the range of interpolation and extrapolation with python
Visualize data and understand correlation at the same time
I tried to solve the problem with Python Vol.1
Visualize plant activity from space using satellite data and Python
[Understand in the shortest time] Python basics for data analysis
Build a Python environment and transfer data to the server
Graph time series data in Python using pandas and matplotlib