[PYTHON] Scraping the SBI SECURITIES portfolio page

Introduction

I used to use myTrade, but its support ended on 1/9 and I could not find a suitable replacement app, so I decided to scrape the SBI SECURITIES portfolio page myself and manage the data in a Google Sheet. This page therefore introduces the following two programs.

  1. Scraping the SBI SECURITIES portfolio page
  2. Write the scraped data to a Google Sheet

↓ The sheet ends up looking like this. [Screenshot: the resulting spreadsheet]

Environment

Prerequisites:

  1. Install the required modules

Install the required modules with pip:

pip install selenium
pip install pandas lxml html5lib BeautifulSoup4

Download the ChromeDriver that matches your version of Google Chrome and place it somewhere on your PATH (reference: setting the PATH on Mac). If there is no exact version match, use the closest one.
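If you would rather not touch your PATH, Selenium can also be pointed at the driver binary directly. A minimal sketch, assuming the driver was unpacked to /usr/local/bin/chromedriver (adjust the path to wherever you put it):

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

options = Options()
options.add_argument('--headless')
#Assumed driver location; change it to your own path
service = Service('/usr/local/bin/chromedriver')
driver = webdriver.Chrome(service=service, options=options)
driver.quit()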

  2. Google settings

The page below explains this well, so follow it exactly: https://tanuhack.com/operate-spreadsheet/#Google_Cloud_Platform
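Once the service account exists and the sheet is shared with it, a quick smoke test like the following confirms the setup works (a sketch; 'zzzzz.json' and 'hogehoge' are the same placeholders used in the code below):

import gspread
from oauth2client.service_account import ServiceAccountCredentials

scope = ['https://spreadsheets.google.com/feeds', 'https://www.googleapis.com/auth/drive']
credentials = ServiceAccountCredentials.from_json_keyfile_name('zzzzz.json', scope)
gc = gspread.authorize(credentials)
#Prints the title of sheet 1 if authentication and sharing are set up correctly
print(gc.open_by_key('hogehoge').sheet1.title)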

Code

1. Import

import time
import datetime
import gspread
import json
import pandas
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
from oauth2client.service_account import ServiceAccountCredentials
  2. The scraping part
#Holds one scraped row: a fund name and its valuation amount
class Result():
    def __init__(self, fund, amount):
        self.fund = fund
        self.amount = amount

def convert_to_list(data_frame, custody):
    #Keep only column 1 (fund name) and column 10 (valuation amount)
    data_frame = data_frame.iloc[:, [1, 10]]
    #Row 0 is the header row of the scraped table
    data_frame.drop([0], inplace=True)
    data_frame.columns = ['funds', 'amount']

    results = []
    row_num = data_frame.shape[0]
    for i in range(row_num):
        index = i + 1
        fund = data_frame.at[index, 'funds']
        amount = data_frame.at[index, 'amount']

        results.append(Result(custody + ':' + fund, amount))

    return results
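To see what convert_to_list produces, here is a minimal check with a made-up two-row frame shaped like the tables read_html returns (11 columns so that iloc[:, [1, 10]] picks out the name and the amount; the fund name and value are invented):

#Row 0 mimics the header row that drop([0]) removes
rows = [['h0', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'h7', 'h8', 'h9', 'h10'],
        ['', 'Some Fund', '', '', '', '', '', '', '', '', 12345]]
demo = pandas.DataFrame(rows)
for r in convert_to_list(demo, 'specific'):
    print(r.fund, r.amount)  #-> specific:Some Fund 12345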

def get_stocks():
    options = Options()
    #Headless mode (run Chrome without opening a window)
    options.add_argument('--headless')
    #Create a Chrome WebDriver object
    driver = webdriver.Chrome(options=options)

    #Open the top screen of SBI SECURITIES
    driver.get('https://www.sbisec.co.jp/ETGate')

    #Set user ID and password
    input_user_id = driver.find_element(By.NAME, 'user_id')
    input_user_id.send_keys('xxxx')

    input_user_password = driver.find_element(By.NAME, 'user_password')
    input_user_password.send_keys('yyyy')

    #Click the login button to log in
    #Loading the body seems to be asynchronous, so sleep a little
    driver.find_element(By.NAME, 'ACT_login').click()
    time.sleep(5)
    #The link text must match the label on the page ('ポートフォリオ' on the Japanese site)
    driver.find_element(By.LINK_TEXT, 'ポートフォリオ').click()

    #UTF character code-Convert to 8
    html = driver.page_source #.encode('utf-8')

    #Perth with Beautiful Soup
    soup = BeautifulSoup(html, "html.parser")

    #The holdings tables are identified by their (site-specific) attribute values
    table = soup.find_all("table", border="0", cellspacing="1", cellpadding="4", bgcolor="#9fbf99", width="100%")
    #The first matching table is the taxable ("specific") account, the second is NISA
    df_stocks = pandas.read_html(str(table))[0]
    stocks = convert_to_list(df_stocks, 'specific')

    df_nisa = pandas.read_html(str(table))[1]
    nisa = convert_to_list(df_nisa, 'NISA')

    result = stocks + nisa

    driver.quit()
    return result
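One caveat: get_stocks only reaches driver.quit() on the success path, so a failed login or a changed selector leaves a headless Chrome process running. A sketch of the same flow guarded with try/finally (the body is elided; it would contain the same steps as get_stocks):

def get_stocks_safely():
    options = Options()
    options.add_argument('--headless')
    driver = webdriver.Chrome(options=options)
    try:
        #...the same login, navigation and parsing steps as in get_stocks...
        return []  #placeholder; the real body returns the Result list
    finally:
        #Always shut Chrome down, even if a step above raised
        driver.quit()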
  3. Write to the spreadsheet
def write(stocks):
    scope = ['https://spreadsheets.google.com/feeds','https://www.googleapis.com/auth/drive']

    #Credential settings
    #Pass the name of the downloaded JSON key file (keep the key file somewhere easy to reach from this Python file)
    credentials = ServiceAccountCredentials.from_json_keyfile_name('zzzzz.json', scope)

    #Log in to the Google API using your OAuth2 credentials.
    gc = gspread.authorize(credentials)

    #Store the key of the shared spreadsheet in the variable SPREADSHEET_KEY
    SPREADSHEET_KEY = 'hogehoge'

    #Open Sheet 1 of a shared spreadsheet
    worksheet = gc.open_by_key(SPREADSHEET_KEY).sheet1
    #Row 1 holds the fund-name headers; column 1 holds the dates already written
    headers = worksheet.row_values(1)
    dates = worksheet.col_values(1)
    new_row_num = len(dates) + 1

    worksheet.update_cell(new_row_num, 1, datetime.datetime.today().strftime('%Y/%m/%d'))
    for stock in stocks:
        for i in range(len(headers)):
            if headers[i] == stock.fund:
                worksheet.update_cell(new_row_num, i + 1, stock.amount)
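Note that update_cell issues one API request per cell, which gets slow (and can run into Google's rate limits) as the portfolio grows. A sketch of the same write done in a single request with gspread's append_row, assuming the same sheet layout (dates in column 1, fund names in row 1):

def write_batch(stocks, worksheet):
    headers = worksheet.row_values(1)
    #Build the whole row in memory, then send it with one API call
    row = [''] * len(headers)
    row[0] = datetime.datetime.today().strftime('%Y/%m/%d')
    amounts = {s.fund: s.amount for s in stocks}
    for i, header in enumerate(headers):
        if header in amounts:
            row[i] = amounts[header]
    worksheet.append_row(row, value_input_option='USER_ENTERED')

append_row appends after the sheet's last non-empty row, which matches the original len(dates) + 1 as long as the date column is the longest one.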
  4. Combine the parts above
def main():
    #Scrape the portfolio page
    stocks = get_stocks()
    #Write the retrieved data to the spreadsheet
    write(stocks)

if __name__ == "__main__":
    main()
