[PYTHON] Scraping the SBI SECURITIES portfolio page

Introduction

I used to use myTrade, but its support ended on 1/9 and I could not find a suitable replacement app, so I decided to scrape the SBI SECURITIES portfolio page myself and manage the data in a Google Sheet. This page therefore introduces the following two programs.

  1. Scraping the SBI SECURITIES portfolio page
  2. Write the scraped data to a Google Sheet

↓ The sheet ends up looking like this. [Screenshot: the resulting spreadsheet]

Environment

Prerequisites:

  1. Install the required modules

Install the required modules with pip:

pip install selenium
pip install pandas lxml html5lib BeautifulSoup4

Download the ChromeDriver that matches your version of Google Chrome and place it somewhere on your PATH (reference: setting the PATH on Mac). If there is no exact version match, use the closest one.
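If you would rather not touch your PATH, Selenium can also be pointed at the driver binary directly. A minimal sketch, assuming the driver was unpacked to /usr/local/bin/chromedriver (adjust the path to wherever you put it):

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

options = Options()
options.add_argument('--headless')
#Assumed driver location; change it to your own path
service = Service('/usr/local/bin/chromedriver')
driver = webdriver.Chrome(service=service, options=options)
driver.quit()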

  2. Google settings

The page below explains this well, so follow it exactly: https://tanuhack.com/operate-spreadsheet/#Google_Cloud_Platform
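Once the service account exists and the sheet is shared with it, a quick smoke test like the following confirms the setup works (a sketch; 'zzzzz.json' and 'hogehoge' are the same placeholders used in the code below):

import gspread
from oauth2client.service_account import ServiceAccountCredentials

scope = ['https://spreadsheets.google.com/feeds', 'https://www.googleapis.com/auth/drive']
credentials = ServiceAccountCredentials.from_json_keyfile_name('zzzzz.json', scope)
gc = gspread.authorize(credentials)
#Prints the title of sheet 1 if authentication and sharing are set up correctly
print(gc.open_by_key('hogehoge').sheet1.title)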

Code

1. Import

import time
import datetime
import gspread
import json
import pandas
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
from oauth2client.service_account import ServiceAccountCredentials
  2. The scraping part
#Holds one scraped row: a fund name and its valuation amount
class Result():
    def __init__(self, fund, amount):
        self.fund = fund
        self.amount = amount

def convert_to_list(data_frame, custody):
    #Keep only column 1 (fund name) and column 10 (valuation amount)
    data_frame = data_frame.iloc[:, [1, 10]]
    #Row 0 is the header row of the scraped table
    data_frame.drop([0], inplace=True)
    data_frame.columns = ['funds', 'amount']

    results = []
    row_num = data_frame.shape[0]
    for i in range(row_num):
        index = i + 1
        fund = data_frame.at[index, 'funds']
        amount = data_frame.at[index, 'amount']

        results.append(Result(custody + ':' + fund, amount))

    return results
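To see what convert_to_list produces, here is a minimal check with a made-up two-row frame shaped like the tables read_html returns (11 columns so that iloc[:, [1, 10]] picks out the name and the amount; the fund name and value are invented):

#Row 0 mimics the header row that drop([0]) removes
rows = [['h0', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'h7', 'h8', 'h9', 'h10'],
        ['', 'Some Fund', '', '', '', '', '', '', '', '', 12345]]
demo = pandas.DataFrame(rows)
for r in convert_to_list(demo, 'specific'):
    print(r.fund, r.amount)  #-> specific:Some Fund 12345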

def get_stocks():
    options = Options()
    #Headless mode (run Chrome without opening a window)
    options.add_argument('--headless')
    #Create a Chrome WebDriver object
    driver = webdriver.Chrome(options=options)

    #Open the top screen of SBI SECURITIES
    driver.get('https://www.sbisec.co.jp/ETGate')

    #Set user ID and password
    input_user_id = driver.find_element(By.NAME, 'user_id')
    input_user_id.send_keys('xxxx')

    input_user_password = driver.find_element(By.NAME, 'user_password')
    input_user_password.send_keys('yyyy')

    #Click the login button to log in
    #Loading the body seems to be asynchronous, so sleep a little
    driver.find_element(By.NAME, 'ACT_login').click()
    time.sleep(5)
    #The link text must match the label on the page ('ポートフォリオ' on the Japanese site)
    driver.find_element(By.LINK_TEXT, 'ポートフォリオ').click()

    #UTF character code-Convert to 8
    html = driver.page_source #.encode('utf-8')

    #Perth with Beautiful Soup
    soup = BeautifulSoup(html, "html.parser")

    #The holdings tables are identified by their (site-specific) attribute values
    table = soup.find_all("table", border="0", cellspacing="1", cellpadding="4", bgcolor="#9fbf99", width="100%")
    #The first matching table is the taxable ("specific") account, the second is NISA
    df_stocks = pandas.read_html(str(table))[0]
    stocks = convert_to_list(df_stocks, 'specific')

    df_nisa = pandas.read_html(str(table))[1]
    nisa = convert_to_list(df_nisa, 'NISA')

    result = stocks + nisa

    driver.quit()
    return result
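One caveat: get_stocks only reaches driver.quit() on the success path, so a failed login or a changed selector leaves a headless Chrome process running. A sketch of the same flow guarded with try/finally (the body is elided; it would contain the same steps as get_stocks):

def get_stocks_safely():
    options = Options()
    options.add_argument('--headless')
    driver = webdriver.Chrome(options=options)
    try:
        #...the same login, navigation and parsing steps as in get_stocks...
        return []  #placeholder; the real body returns the Result list
    finally:
        #Always shut Chrome down, even if a step above raised
        driver.quit()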
  3. Write to the spreadsheet
def write(stocks):
    scope = ['https://spreadsheets.google.com/feeds','https://www.googleapis.com/auth/drive']

    #Credential settings
    #Pass the name of the downloaded JSON key file (keep the key file somewhere easy to reach from this Python file)
    credentials = ServiceAccountCredentials.from_json_keyfile_name('zzzzz.json', scope)

    #Log in to the Google API using your OAuth2 credentials.
    gc = gspread.authorize(credentials)

    #Store the key of the shared spreadsheet in the variable SPREADSHEET_KEY
    SPREADSHEET_KEY = 'hogehoge'

    #Open Sheet 1 of a shared spreadsheet
    worksheet = gc.open_by_key(SPREADSHEET_KEY).sheet1
    #Row 1 holds the fund-name headers; column 1 holds the dates already written
    headers = worksheet.row_values(1)
    dates = worksheet.col_values(1)
    new_row_num = len(dates) + 1

    worksheet.update_cell(new_row_num, 1, datetime.datetime.today().strftime('%Y/%m/%d'))
    for stock in stocks:
        for i in range(len(headers)):
            if headers[i] == stock.fund:
                worksheet.update_cell(new_row_num, i + 1, stock.amount)
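Note that update_cell issues one API request per cell, which gets slow (and can run into Google's rate limits) as the portfolio grows. A sketch of the same write done in a single request with gspread's append_row, assuming the same sheet layout (dates in column 1, fund names in row 1):

def write_batch(stocks, worksheet):
    headers = worksheet.row_values(1)
    #Build the whole row in memory, then send it with one API call
    row = [''] * len(headers)
    row[0] = datetime.datetime.today().strftime('%Y/%m/%d')
    amounts = {s.fund: s.amount for s in stocks}
    for i, header in enumerate(headers):
        if header in amounts:
            row[i] = amounts[header]
    worksheet.append_row(row, value_input_option='USER_ENTERED')

append_row appends after the sheet's last non-empty row, which matches the original len(dates) + 1 as long as the date column is the longest one.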
  4. Combine the parts above
def main():
    #Scrape the portfolio page
    stocks = get_stocks()
    #Write the retrieved data to the spreadsheet
    write(stocks)

if __name__ == "__main__":
    main()
