We automated a store-network analysis using Python and Web APIs (the Google Geocoding API and the distance calculation API of the Geospatial Information Authority of Japan, GSI). The following three steps are automated.
① Collecting store addresses
② Converting each address to latitude and longitude
③ Calculating the distance between two points from their latitude and longitude
This makes surveys like the following practical. Example 1) Checking overlap within your own network: list your own stores within a 3 km radius of a given store. Example 2) Checking distance from competitors: list the competitor stores within a 3 km radius of a given store.
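Once a distance matrix like the one produced in step ③ exists, both examples reduce to a simple filter. The following is a minimal sketch only: the store names, the distance values, and the helper `stores_within` are hypothetical and are not part of the original programs.

```python
# Hypothetical data: store names and a symmetric distance matrix in km.
stores = ["A", "B", "C", "D"]
distance_km = [
    [0.0, 1.2, 4.8, 2.9],
    [1.2, 0.0, 3.5, 6.1],
    [4.8, 3.5, 0.0, 0.7],
    [2.9, 6.1, 0.7, 0.0],
]


def stores_within(base_index, radius_km=3.0):
    """Return (name, distance) pairs within radius_km of the base store, nearest first."""
    hits = [
        (stores[j], d)
        for j, d in enumerate(distance_km[base_index])
        if j != base_index and d <= radius_km  # skip the base store itself
    ]
    return sorted(hits, key=lambda pair: pair[1])


print(stores_within(0))  # stores within 3 km of store "A"
```

For Example 2, the rows would be your own stores and the columns the competitor's stores, so the `j != base_index` check would no longer be needed.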
The full code is also available in this repository: https://github.com/taiga518/automation-geo
Done by hand, each of steps ① to ③ is tedious. For ①, you copy each store's address from the company website, where mistakes and omissions creep in easily. For ② and ③, what you usually want is the distance rather than the coordinates themselves, so you end up typing addresses into Google Maps or a similar tool and measuring the distance between points one pair at a time. That does not scale: checking every pair among 100 stores means 4,950 measurements, which is not realistic to do manually.
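The number of distance checks grows quadratically with the number of stores. A minimal sketch of the arithmetic follows; the function names are illustrative and do not appear in the original programs.

```python
def pair_count(n_stores):
    """Number of unordered pairs among n_stores stores: n * (n - 1) / 2."""
    return n_stores * (n_stores - 1) // 2


def cross_count(n_own, n_competitor):
    """Number of own-store / competitor-store combinations."""
    return n_own * n_competitor


print(pair_count(100))       # all pairs within one network of 100 stores
print(cross_count(100, 50))  # 100 own stores against 50 competitor stores
```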
I solved this with three programs, one for each step.
① Collecting store addresses
This program is a simple example of web scraping: it automatically collects store addresses from the store list on a company website.
get_address(example).py
"""
Just an example of simple web scraping:
collecting store addresses from a company's store list.
"""
import requests
from bs4 import BeautifulSoup
import pandas as pd

# A company with many stores (Hoken no Madoguchi) is used as the example.
# The approach is basically the same for other companies.
# Store list : https://www.hokennomadoguchi.com/shop-list/
urlName = "https://www.hokennomadoguchi.com/shop-list/"
url = requests.get(urlName)
soup = BeautifulSoup(url.content, "html.parser")
shop_name_list = []
address_list = []
# select() finds elements by CSS selector (class or id).
for elem in soup.select(".shop-link"):
    # find() returns the first matching tag; get() reads an attribute of that tag.
    shop_url = "https://www.hokennomadoguchi.com" + elem.find("a").get("href")
    # contents breaks a tag down into its children so that individual parts can be picked out.
    shop_name = elem.find("a").select(".shop_name")[0].contents[0]
    print("--- "+ shop_name +" ---")
    # Open each store's own page and read the address from it.
    url = requests.get(shop_url)
    soup_shop = BeautifulSoup(url.content, "html.parser")
    address = soup_shop.select(".item2")[0].contents[2].replace(" ","").replace("\u3000","").replace("\r","").replace("\n","")
    print(address)
    shop_name_list.append(shop_name)
    address_list.append(address)
df_madoguchi = pd.DataFrame([shop_name_list,address_list]).T
df_madoguchi.columns=["shop_name", "address"]
df_madoguchi.to_csv("madoguchi_address.csv", encoding='utf_8_sig')
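The chain of `replace` calls above strips half-width spaces, full-width spaces (`\u3000`), and line breaks from the scraped address. As a sketch of a more compact alternative, the hypothetical helper `clean_address` below relies on `str.split()` with no argument, which splits on any Unicode whitespace. Note that it also removes tabs, which the original chain leaves in place.

```python
def clean_address(raw_text):
    """Remove every whitespace character, including the full-width space U+3000."""
    # str.split() with no argument splits on any Unicode whitespace,
    # so joining the pieces drops spaces, tabs, line breaks and U+3000 alike.
    return "".join(raw_text.split())


print(clean_address(" 東京都\u3000港区\r\n芝公園 4-2-8 "))
```

This suits Japanese addresses, which normally contain no meaningful spaces. For addresses written in English, collapsing runs of whitespace to a single space would be the safer choice.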
② Converting address information into latitude and longitude
This program uses the Google Geocoding API to obtain latitude and longitude automatically from an address (string).
get_lnglat_geocode.py
import json
import pandas as pd
import requests
import time
API_key = "XXXXX"
"""
You need to obtain an API key yourself to use the Google Geocoding API. Follow the guidance here :
https://developers.google.com/maps/documentation/geocoding/overview
Light personal use is unlikely to incur charges, but check the pricing carefully.
"""
def start_end_decolator(input_function):
    """Decorator to print start and end"""
    def return_function(*args, **kwargs):
        print("\n--------------start--------------")
        result = input_function(*args, **kwargs)
        print("\n---------------end---------------")
        return result
    return return_function

def progress_decolator(input_function):
    """Decorator to print * to show progress"""
    def return_function(*args, **kwargs):
        print("*", end="")
        result = input_function(*args, **kwargs)
        return result
    return return_function

@progress_decolator
def get_location(address):
    """
    Get latitude and longitude from an address using the Google Geocoding API.
    An API key is required. Follow the guidance here :
    https://developers.google.com/maps/documentation/geocoding/overview
    Check billing here: https://console.cloud.google.com/google/maps-apis/overview
    Input : address as text
        eg) "4-2-8 Shibakoen, Minato-ku, Tokyo"
    Output : tuple of address(text), latitude(float), longitude(float)
        eg) ('4-chōme-2-8 Shibakōen, Minato City, Tōkyō-to 105-0011, Japan', 35.6585769, 139.7454506)
    """
    url = "https://maps.googleapis.com/maps/api/geocode/json"
    # Passing the query through params lets requests URL-encode the address.
    result = requests.get(url, params={"address": address, "key": API_key})
    result_json = json.loads(result.text)
    # Note: an address with no match gives an empty "results" list, which raises IndexError here.
    top_result = result_json["results"][0]
    formatted_address = top_result["formatted_address"]
    # Read lat and lng by key instead of relying on the order of the dict values.
    location = top_result["geometry"]["location"]
    lat, lng = location["lat"], location["lng"]
    return (formatted_address, lat, lng)

@start_end_decolator
def add_location_info(input_df):
    """
    Get location information for a list of addresses using the get_location function.
    Input : dataframe with address information in a column named address
    Output : dataframe with formatted_address, latitude, longitude columns added
    """
    formatted_address_list = []
    lat_list = []
    lng_list = []
    for i_row in range(len(input_df)):
        formatted_address, lat, lng = get_location(input_df.loc[i_row,"address"])
        formatted_address_list.append(formatted_address)
        lat_list.append(lat)
        lng_list.append(lng)
    # Copy so that the caller's dataframe is not modified in place.
    output_df = input_df.copy()
    output_df["formatted_address"] = formatted_address_list
    output_df["latitude"] = lat_list
    output_df["longitude"] = lng_list
    return output_df

### main here
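# The input CSV needs columns named name and address.
# (The scraping example above writes shop_name, so rename that column if you reuse its output.)
df = pd.read_csv("PATH.csv")
df = df[["name","address"]]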
df_loc = add_location_info(df)
df_loc.to_csv("output.csv", encoding='utf_8_sig')
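Two details of the geocoding step can be checked without calling the API: how the address is URL-encoded, and how a response with no match is handled. The sketch below is illustrative only. The helper names `build_geocode_url` and `parse_geocode_response` are hypothetical, and the sample responses are hand-written and contain only the fields used by `get_location`, plus the top-level `status` field that the Geocoding API returns.

```python
import json
from urllib.parse import urlencode

GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(address, api_key):
    """Build the request URL with the address safely URL-encoded."""
    return GEOCODE_ENDPOINT + "?" + urlencode({"address": address, "key": api_key})


def parse_geocode_response(text):
    """Return (formatted_address, lat, lng), or None when there is no usable result."""
    data = json.loads(text)
    if data.get("status") != "OK" or not data.get("results"):
        return None
    top_result = data["results"][0]
    location = top_result["geometry"]["location"]
    return (top_result["formatted_address"], location["lat"], location["lng"])


# Hand-written sample responses (not real API output).
sample_ok = json.dumps({
    "status": "OK",
    "results": [{
        "formatted_address": "Sample address, Japan",
        "geometry": {"location": {"lat": 35.6585769, "lng": 139.7454506}},
    }],
})
sample_empty = json.dumps({"status": "ZERO_RESULTS", "results": []})

print(build_geocode_url("4-2-8 Shibakoen, Minato-ku, Tokyo", "XXXXX"))
print(parse_geocode_response(sample_ok))
print(parse_geocode_response(sample_empty))
```

Returning `None` for an unmatched address lets a batch run continue and leaves a visible gap in the output instead of stopping midway with an `IndexError`.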
③ Converting latitude and longitude information into distance
This program automatically calculates the distance between two points given as latitude and longitude.
get_distance.py
import json
import numpy as np
import pandas as pd
import requests
import time
def progress_decolator(input_function):
    """Decorator to print * to show progress"""
    def return_function(*args, **kwargs):
        print("*", end="")
        result = input_function(*args, **kwargs)
        return result
    return return_function

def get_distance_API(lat1, lng1, lat2, lng2):
    """ Get the distance between two points described with latitude and longitude.
    Details of the API can be found here: https://vldb.gsi.go.jp/sokuchi/surveycalc/api_help.html
    Validate the result using this web app : https://vldb.gsi.go.jp/sokuchi/surveycalc/surveycalc/bl2stf.html
    Input : latitude and longitude of two points (float)
        eg) 35.6585769, 139.7454506, 35.710256, 139.8107946
    Output : distance between the two input points in kilometers (float)
        eg) 8.237
    """
    url = "http://vldb.gsi.go.jp/sokuchi/surveycalc/surveycalc/bl2st_calc.pl?latitude1={}&longitude1={}&latitude2={}&longitude2={}&ellipsoid=bessel&outputType=json".format(lat1,lng1,lat2,lng2)
    # Try once, then retry up to 10 times (2 seconds apart) until the API returns HTTP 200.
    max_retry = 10
    for i_count in range(max_retry + 1):
        result = requests.get(url)
        if result.status_code == 200:
            break
        if i_count < max_retry:
            time.sleep(2)
            print("retry : {}".format(i_count + 1), end="")
    result_json = json.loads(result.text)
    # Prefixing "0" turns an empty geoLength into "0", so a failed lookup returns 0.0 instead of raising.
    distance = "0" + result_json["OutputData"]["geoLength"]
    if distance == "0":
        print("error here")
        print(url)
        print(result)
        print(result_json)
    # The API returns meters; convert to kilometers.
    return round(float(distance)/1000, 3)

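def get_distance_locally(lat_a, lon_a, lat_b, lon_b):
    """
    Approximate the distance in km between two points on an ellipsoid, computed locally without an API call.
    Note : for two identical points the c2 term divides by zero, so the result is nan.
    Credit : https://qiita.com/damyarou/items/9cb633e844c78307134a
    """
    ra = 6378.140  # equatorial radius (km)
    rb = 6356.755  # polar radius (km)
    F = (ra-rb)/ra  # flattening of the earth
    rad_lat_a = np.radians(lat_a)
    rad_lon_a = np.radians(lon_a)
    rad_lat_b = np.radians(lat_b)
    rad_lon_b = np.radians(lon_b)
    # Reduced (parametric) latitudes of the two points.
    pa = np.arctan(rb/ra*np.tan(rad_lat_a))
    pb = np.arctan(rb/ra*np.tan(rad_lat_b))
    # Angular distance on the auxiliary sphere.
    xx = np.arccos(np.sin(pa)*np.sin(pb)+np.cos(pa)*np.cos(pb)*np.cos(rad_lon_a-rad_lon_b))
    # Correction terms for the flattening.
    c1 = (np.sin(xx)-xx)*(np.sin(pa)+np.sin(pb))**2/np.cos(xx/2)**2
    c2 = (np.sin(xx)+xx)*(np.sin(pa)-np.sin(pb))**2/np.sin(xx/2)**2
    dr = F/8*(c1-c2)
    rho = ra*(xx+dr)
    return rho
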
@progress_decolator
def get_distance(lat1, lng1, lat2, lng2, method=0):
    # method=0 calls the GSI API; any other value uses the local calculation.
    if method == 0:
        return_distance = get_distance_API(lat1, lng1, lat2, lng2)
    else:
        return_distance = get_distance_locally(lat1, lng1, lat2, lng2)
    return return_distance

def create_matrix(n_row, n_col):
    """Create a matrix of the given size filled with nan
    Input : n_row(int), n_col(int)
    Output : dataframe
    """
    matrix = pd.DataFrame(index=range(n_row), columns=range(n_col))
    return matrix

# main here
df1 = pd.read_csv("PATH1.csv")
df2 = pd.read_csv("PATH2.csv")
matrix = create_matrix(len(df1), len(df2))
for i in range(len(df1)):
    for j in range(len(df2)):
        distance = get_distance(df1.loc[i,"latitude"],
                                df1.loc[i,"longitude"],
                                df2.loc[j, "latitude"],
                                df2.loc[j, "longitude"],
                                method = 0)
        if distance == 0:
            # A distance of 0 most probably indicates a problem, so print both rows for checking.
            print(df1.loc[i])
            print(df2.loc[j])
        matrix.iloc[i,j] = distance
matrix.to_csv("output.csv", encoding='utf_8_sig')
# Optional: run the following to add name and address headings to the output.
col_expanded = pd.concat([df1[["name","address"]],matrix], axis = "columns")
df_head = pd.DataFrame([[""]*2,[""]*2],columns=["name","address"])
df_head = pd.concat([df_head , df2[["name","address"]]], ignore_index=True).T.reset_index(drop=True)
df_head.columns = col_expanded.columns
df_head.index = ["name", "address"]
df_expanded = pd.concat([df_head, col_expanded])
df_expanded.to_csv("output_with_header.csv", encoding='utf_8_sig')
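A quick way to sanity-check the distances from either method is an independent spherical estimate. The sketch below uses the haversine formula, which is a different and simpler technique than the ellipsoidal calculations used above. The function name `haversine_km` and the mean Earth radius of 6371 km are assumptions introduced here, so small differences from the API result are expected.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of the spherical model


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lng2 - lng1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# The two points used in the get_distance_API docstring example.
print(round(haversine_km(35.6585769, 139.7454506, 35.710256, 139.8107946), 3))
```

The printed value should land close to the 8.237 km quoted in the docstring. A large gap between the two would suggest swapped latitude and longitude or a failed API call.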
I am open to new job opportunities. Feel free to contact me: https://www.linkedin.com/in/taigakubota/