Automating geographic research on store networks using Python and Web APIs

Overview

I automated the analysis of store locations using Python and Web APIs (the Google Geocoding API and the distance calculation API of the Geospatial Information Authority of Japan). The following three steps are automated.

  ① Collecting store addresses
  ② Obtaining latitude and longitude from each address
  ③ Calculating distance from latitude and longitude

This makes analyses such as the following possible. Example 1) Checking for overlap within your own store network: list the stores within a 3 km radius of a given store. Example 2) Checking the distance to competing stores: list the competitor stores within a 3 km radius of a given store.

The code is also available in this repository: https://github.com/taiga518/automation-geo


What was the issue

Done naively, steps ① to ③ above all have to be carried out by hand. For ①, you copy each store's address from the company's website, which invites mistakes and omissions. For ② and ③, you probably do not need the latitude and longitude themselves, but to find a distance you have to type each address into Google Maps or a similar tool and measure the distance between the two points, which takes time. With many stores this quickly becomes unrealistic: checking the distance between every pair of stores in a network of 100 stores means checking 4,950 pairs, or nearly 5,000 (!).
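
The pair count can be checked directly: n stores give n(n-1)/2 unordered pairs. The snippet below is a small illustration, not part of the original scripts, and the store names in it are made up.

```python
from itertools import combinations
from math import comb

# Number of unordered store pairs in a network of 100 stores: 100 * 99 / 2.
n_stores = 100
n_pairs = comb(n_stores, 2)
print(n_pairs)

# The same idea on a tiny, made-up network, listing the pairs explicitly.
stores = ["A", "B", "C", "D"]
pairs = list(combinations(stores, 2))
print(len(pairs), pairs)
```

Because the count grows roughly with the square of the number of stores, doubling the network about quadruples the number of distances to check.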

What I did

I solved the problem with three programs, one for each of the steps above.

① Collecting address information

This program is a simple example of web scraping: it automatically collects store addresses from the store list on a company's website.


get_address(example).py


"""
Just an example of simple web scraping.
This script acquires store addresses by scraping a store list page.
"""

import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np


"""
A company with many shops, Hoken no Madoguchi, is used as an example.
The approach is basically the same for other companies.
Store list: https://www.hokennomadoguchi.com/shop-list/
"""

urlName = "https://www.hokennomadoguchi.com/shop-list/"
url = requests.get(urlName)
soup = BeautifulSoup(url.content, "html.parser")


shop_name_list = []
address_list = []
# select() finds elements by CSS selector (class or id).
for elem in soup.select(".shop-link"):
    # find() returns the first matching tag; get() reads one of its attributes.
    shop_url = "https://www.hokennomadoguchi.com" + elem.find("a").get("href")
    # contents splits a tag into its child nodes so that one part can be picked out.
    shop_name = elem.find("a").select(".shop_name")[0].contents[0]
    print("--- "+ shop_name +" ---")
    url = requests.get(shop_url)
    soup_shop = BeautifulSoup(url.content, "html.parser")
    address = soup_shop.select(".item2")[0].contents[2].replace(" ","").replace("\u3000","").replace("\r","").replace("\n","")
    print(address)

    shop_name_list.append(shop_name)
    address_list.append(address)
    
    
df_madoguchi = pd.DataFrame([shop_name_list,address_list]).T
df_madoguchi.columns=["shop_name", "address"]

df_madoguchi.to_csv("madoguchi_address.csv", encoding='utf_8_sig')
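
The chained replace calls used to clean each scraped address could be collected into one helper. The sketch below is an assumption about how that might look, not the author's code: normalize_address is a hypothetical name, and unlike the original chain it removes every whitespace character (tabs included), not only spaces, full-width spaces, and line breaks.

```python
import re

# In Python 3 string patterns, \s also matches the full-width space U+3000.
_WHITESPACE = re.compile(r"\s+")


def normalize_address(raw):
    """Return the address with all whitespace removed (hypothetical helper)."""
    return _WHITESPACE.sub("", raw)


# Made-up input that mimics text pulled out of an HTML page.
print(normalize_address(" 東京都\u3000港区\r\n芝公園4-2-8 "))
```

Keeping the cleanup in one function makes it easier to adjust when another company's page formats its addresses differently.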

② Converting address information into latitude and longitude

This program uses the Google Geocoding API to automatically obtain the latitude and longitude for an address given as a string.


get_lnglat_geocode.py


import json
import pandas as pd
import requests
import time

API_key = "XXXXX"
"""
An API key is required to use the Google Geocoding API, and you need to obtain it yourself.
Follow the guidance here :
https://developers.google.com/maps/documentation/geocoding/overview
I do not expect personal, low-volume use to incur charges, but please check the pricing carefully.
"""


def start_end_decolator(input_function):
    """Decorator to print start and end"""
    def return_function(*args, **kwargs):
        print("\n--------------start--------------")
        result = input_function(*args, **kwargs)
        print("\n---------------end---------------")
        return result
    return return_function

def progress_decolator(input_function):
    """Decorator to print * to show progress"""
    def return_function(*args, **kwargs):
        print("*", end="")
        result = input_function(*args, **kwargs)
        return result
    return return_function

@progress_decolator
def get_location(address):
    """
    Get latitude and longitude from an address using the Google Geocoding API.

    An API key needs to be set to use the Google Geocoding API. Follow the guidance here :
    https://developers.google.com/maps/documentation/geocoding/overview
    Check billing here: https://console.cloud.google.com/google/maps-apis/overview

    Input : address as text
        eg) "4-2-8 Shibakoen, Minato-ku, Tokyo"

    Output : tuple of address(text), latitude(float), longitude(float)
        eg) ('4-chōme-2-8 Shibakōen, Minato City, Tōkyō-to 105-0011, Japan', 35.6585769, 139.7454506)
    """
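    url = "https://maps.googleapis.com/maps/api/geocode/json"
    # Pass the query as params so that requests URL-encodes the address.
    result = requests.get(url, params={"address": address, "key": API_key})
    result_json = json.loads(result.text)
    formatted_address = result_json["results"][0]["formatted_address"]
    # Read the keys explicitly rather than relying on the order of the dict values.
    location = result_json["results"][0]["geometry"]["location"]
    return (formatted_address, location["lat"], location["lng"])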


@start_end_decolator    
def add_location_info(input_df):
    """
    Get lists of location information for multiple addresses using the get_location function.

    Input : dataframe with address information in a column named address
    Output : dataframe with formatted_address, latitude, longitude columns
    """
    
    formatted_address_list = []
    lat_list = []
    lng_list = []

    for i_row in range(len(input_df)):
        formatted_address, lat, lng = get_location(input_df.loc[i_row,"address"])
        formatted_address_list.append(formatted_address)
        lat_list.append(lat)
        lng_list.append(lng)
    
    # Work on a copy so that the caller's dataframe is not modified in place.
    output_df = input_df.copy()
    output_df["formatted_address"] = formatted_address_list
    output_df["latitude"] = lat_list
    output_df["longitude"] = lng_list
    return output_df



### main here

df = pd.read_csv("PATH.csv")
df = df[["name","address"]]
df_loc = add_location_info(df)
df_loc.to_csv("output.csv", encoding='utf_8_sig')
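
get_location indexes the first element of "results" directly, so an address the API cannot resolve would raise an error partway through a long run. The sketch below shows one way the response could be checked first. It is an illustration under assumptions: parse_geocode_response is a hypothetical helper, and the two payloads are hand-written, trimmed stand-ins for real responses that keep only the "status" and "results" fields.

```python
def parse_geocode_response(payload):
    """Return (formatted_address, lat, lng), or None when nothing was found."""
    if payload.get("status") != "OK" or not payload.get("results"):
        return None
    top = payload["results"][0]
    location = top["geometry"]["location"]
    return top["formatted_address"], location["lat"], location["lng"]


# Hand-written, trimmed sample payloads (not real API output).
sample_ok = {
    "status": "OK",
    "results": [
        {
            "formatted_address": "4-chōme-2-8 Shibakōen, Minato City, Tōkyō-to 105-0011, Japan",
            "geometry": {"location": {"lat": 35.6585769, "lng": 139.7454506}},
        }
    ],
}
sample_empty = {"status": "ZERO_RESULTS", "results": []}

print(parse_geocode_response(sample_ok))
print(parse_geocode_response(sample_empty))
```

Returning None lets the calling loop record the failed address and continue instead of stopping at the first unresolved row.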


③ Converting longitude and latitude information into distance

This program automatically calculates the distance between two points given as latitude and longitude.
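
As a rough cross-check on any distance method, a spherical haversine calculation can be sketched with the standard library alone. This is not part of the original scripts: haversine_km and EARTH_RADIUS_KM are names introduced here for illustration, and because the sketch treats the Earth as a sphere, its results can differ slightly from ellipsoidal calculations.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical approximation


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in km between two points given in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against tiny floating-point overshoot.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


# Two points in Tokyo a few kilometers apart.
print(round(haversine_km(35.6585769, 139.7454506, 35.710256, 139.8107946), 3))
```

For these two points the result is roughly 8.2 km. A value far from what another method returns for the same pair usually points to swapped latitude and longitude or a bad geocoding result.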


get_distance.py


import json
import numpy as np  # needed by get_distance_locally
import pandas as pd
import requests
import time


def progress_decolator(input_function):
    """Decorator to print * to show progress"""
    def return_function(*args, **kwargs):
        print("*", end="")
        result = input_function(*args, **kwargs)
        return result
    return return_function


def get_distance_API(lat1, lng1, lat2, lng2):
    """ Get the distance between two points described by latitude and longitude.
    Details of the API can be found here: https://vldb.gsi.go.jp/sokuchi/surveycalc/api_help.html
    Validate the result using this web app : https://vldb.gsi.go.jp/sokuchi/surveycalc/surveycalc/bl2stf.html
    Input : latitude and longitude of two points (float)
        eg) 35.6585769, 139.7454506, 35.710256, 139.8107946
    Output : distance between the two input points in kilometers (float)
        eg) 8.237
    """
    url = "http://vldb.gsi.go.jp/sokuchi/surveycalc/surveycalc/bl2st_calc.pl?latitude1={}&longitude1={}&latitude2={}&longitude2={}&ellipsoid=bessel&outputType=json".format(lat1,lng1,lat2,lng2)
    i_count = 0
    while i_count <= 10:
        result = requests.get(url)
        status_code = result.status_code
        if status_code == 200:
            break
        # The request failed: wait two seconds, then try again.
        i_count += 1
        time.sleep(2)
        print("retry : {}".format(i_count), end="")
        
    result_json = json.loads(result.text)
    # Prefixing "0" turns an empty geoLength string into "0", so float() below does not fail.
    distance = "0" + result_json["OutputData"]["geoLength"]
    if distance == "0":
        print("error here")
        print(url)
        print(result)
        print(result_json)
    return round(float(distance)/1000, 3)

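def get_distance_locally(lat_a, lon_a,lat_b, lon_b):
    """
    Calculate the distance (km) between two points locally, without calling an API,
    using a great-circle distance corrected for the flattening of the earth.
    Credit : https://qiita.com/damyarou/items/9cb633e844c78307134a
    """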
    ra=6378.140  # equatorial radius (km)
    rb=6356.755  # polar radius (km)
    F=(ra-rb)/ra # flattening of the earth
    rad_lat_a=np.radians(lat_a)
    rad_lon_a=np.radians(lon_a)
    rad_lat_b=np.radians(lat_b)
    rad_lon_b=np.radians(lon_b)
    pa=np.arctan(rb/ra*np.tan(rad_lat_a))
    pb=np.arctan(rb/ra*np.tan(rad_lat_b))
    xx=np.arccos(np.sin(pa)*np.sin(pb)+np.cos(pa)*np.cos(pb)*np.cos(rad_lon_a-rad_lon_b))
    c1=(np.sin(xx)-xx)*(np.sin(pa)+np.sin(pb))**2/np.cos(xx/2)**2
    c2=(np.sin(xx)+xx)*(np.sin(pa)-np.sin(pb))**2/np.sin(xx/2)**2
    dr=F/8*(c1-c2)
    rho=ra*(xx+dr)
    return rho

@progress_decolator
def get_distance(lat1, lng1, lat2, lng2, method=0):
    if method == 0:
        return_distance = get_distance_API(lat1, lng1, lat2, lng2)
    else: 
        return_distance = get_distance_locally(lat1, lng1, lat2, lng2)
    return return_distance 


def create_matrix(n_row, n_col):
    """Create matrix filled with nan in decided size
    
    Input : n_row(int), n_col(int)
    Output : dataframe
    """
    
    matrix = pd.DataFrame(index=range(n_row), columns=range(n_col))
    return matrix



# main here

df1 = pd.read_csv("PATH1.csv")
df2 = pd.read_csv("PATH2.csv")

matrix = create_matrix(len(df1), len(df2))

for i in range(len(df1)):
    for j in range(len(df2)):
        distance = get_distance(df1.loc[i,"latitude"], 
                                df1.loc[i,"longitude"], 
                                df2.loc[j, "latitude"], 
                                df2.loc[j, "longitude"],
                               method = 0)
        if distance == 0:
            # A distance of 0 most likely indicates a problem, so print both rows for checking.
            print(df1.loc[i])
            print(df2.loc[j])
        matrix.iloc[i,j] = distance
        
matrix.to_csv("output.csv", encoding='utf_8_sig')

# If you want to decorate the output with headers, run the following (optional).

col_expanded = pd.concat([df1[["name","address"]],matrix], axis = "columns")
df_head = pd.DataFrame([[""]*2,[""]*2],columns=["name","address"])
df_head = pd.concat([df_head , df2[["name","address"]]], ignore_index=True).T.reset_index(drop=True)
df_head.columns = col_expanded.columns
df_head.index = ["name", "address"]
df_expanded = pd.concat([df_head, col_expanded])
df_expanded.to_csv("output_with_header.csv", encoding='utf_8_sig')
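
Once the distance matrix exists, the two example analyses (stores within 3 km of a given store) reduce to filtering each row. The following is a minimal sketch using plain lists rather than the dataframe itself; stores_within_radius is a hypothetical helper, and the store names and distances are made up.

```python
def stores_within_radius(origin_names, target_names, distances_km, radius_km=3.0):
    """Map each origin store to the (target, distance) pairs within radius_km.

    distances_km[i][j] is the distance in km from origin i to target j.
    A target with the same name as the origin is skipped, so a store is not
    reported as its own neighbour when both lists describe the same network.
    """
    nearby = {}
    for i, origin in enumerate(origin_names):
        hits = [
            (target, distances_km[i][j])
            for j, target in enumerate(target_names)
            if target != origin and distances_km[i][j] <= radius_km
        ]
        # Closest stores first.
        nearby[origin] = sorted(hits, key=lambda pair: pair[1])
    return nearby


# Made-up distances (km) between three hypothetical stores.
names = ["Store A", "Store B", "Store C"]
distances = [
    [0.0, 1.2, 5.4],
    [1.2, 0.0, 2.9],
    [5.4, 2.9, 0.0],
]
for origin, hits in stores_within_radius(names, names, distances).items():
    print(origin, hits)
```

With the variables from the script, the inputs could be built with df1["name"].tolist(), df2["name"].tolist(), and matrix.values.tolist(). Passing the same network for both lists corresponds to Example 1, and passing a competitor's network as the targets corresponds to Example 2.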


Closing

I am open to new job opportunities, so please get in touch if you know of any. https://www.linkedin.com/in/taigakubota/
