[PYTHON] Output web scraped information by voice with OpenJtalk

This time, we will ask Raspberry Pi to talk about the stock price obtained by web scraping using OpenJtalk.

What you can do with this article

  1. Web scraping
  2. Speak with OpenJtalk installed on Raspberry Pi with the acquired information

Premise

・ Python 3 and OpenJtalk can be used on Raspberry Pi (Installation of OpenJtalk is explained in this article, so if you haven't done so already!)

Operating environment

・ Raspberry Pi3 model B ・ OS: Raspbian ・ Python ver3.7

1. Get stock price by web scraping

The code is based on the following article. Web scraping with Python3: https://qiita.com/Senple/items/724e36fc1f66f5b14231 Get stock price: https://qiita.com/Azunyan1111/items/9b3d16428d2bcc7c9406

For the time being, web scraping may come into contact with the law if you make a mistake, so if you are new to web scraping, we recommend that you read this article. ..

Let's write the code.

This is a Python 3 series It is a code to get the stock price from Nikkei newspaper site and print it. For the meaning of the code, please refer to this article.

webscraping_test.py


#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import urllib3 #this is installed as default
from bs4 import BeautifulSoup
import certifi

#URL
url = "https://www.nikkei.com/markets/kabu/"

#Html to access URL is returned →<html><head><title>Economic, stock, business and political news:Nikkei electronic version</title></head><body....
html = urllib3.PoolManager(
    cert_reqs='CERT_REQUIRED',
    ca_certs=certifi.where())
r = html.request('GET', url)

#Handle html with Beautiful Soup
soup = BeautifulSoup(r.data, "html.parser")

#Extract all span elements → All span elements are put back in the array →[<span class="m-wficon triDown"></span>, <span class="l-h...
#span element->Insert ordinary sentences like p and div elements,The difference is that there are no line breaks
spans = soup.find_all("span")
for tag in spans:
    #Elements for which class is not set are tags.get("class").pop(0)Avoid the error with try as it will result in an error because you cannot do
    try:
        #tag.get("class")Get all the classes in tag and return them in a list
        string_ = tag.get("class").pop(0)

        if string_ in "mkc-stock_prices":
            stockprice = tag.string
            break
    except:
        pass

print(stockprice)

2. Output audio with openJtalk

In order to output the scraped information immediately, you need to run openJtalk from Python. (Install OpenJtalk from here](https://qiita.com/coffiego/items/4fc3b0be78fcded3eef0)) So, there was a article running openJtalk from Python, so I will use it. Copy and paste this code into the same directory as before.

jtalk.py


#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import subprocess
from datetime import datetime

def jtalk(t):
    open_jtalk=['open_jtalk']
    mech=['-x','/var/lib/mecab/dic/open-jtalk/naist-jdic']
    htsvoice=['-m','/usr/share/hts-voice/mei/mei_normal.htsvoice']
    speed=['-r','1.0']
    outwav=['-ow','open_jtalk.wav']
    cmd=open_jtalk+mech+htsvoice+speed+outwav
    c = subprocess.Popen(cmd,stdin=subprocess.PIPE)
    c.stdin.write(t.encode())
    c.stdin.close()
    c.wait()
    aplay = ['aplay','-q','open_jtalk.wav']
    wr = subprocess.Popen(aplay)

def say_datetime():
    d = datetime.now()
    text = '%s month%s day,%s time%s minutes%s seconds' % (d.month, d.day, d.hour, d.minute, d.second)
    jtalk(text)

if __name__ == '__main__':
    say_datetime()

After that, import this code in the code to webscaraping You can output voice by using jtalk.jtalk ("what you want to say")!

3. Web scraping + openJtalk

Finally combine. That said, just add a few lines, but ... Here is the resulting code.

webscraping_jtalk_test.py


#!/usr/bin/env python3
# -*- coding: utf-8 -*-
#python3
#The one who scraped the stock price is output by voice with openjtalk.
#web scraping
#This time with Python3! !!

##web scraping library####
import urllib3 #this is installed as default
from bs4 import BeautifulSoup
import certifi

### jtalk import
import jtalk #I'm importing here! !!

#URL
url = "https://www.nikkei.com/markets/kabu/"

#Html to access URL is returned →<html><head><title>Economic, stock, business and political news:Nikkei electronic version</title></head><body....
html = urllib3.PoolManager(
    cert_reqs='CERT_REQUIRED',
    ca_certs=certifi.where())
r = html.request('GET', url)

#Handle html with Beautiful Soup
soup = BeautifulSoup(r.data, "html.parser")

#Extract all span elements → All span elements are put back in the array →[<span class="m-wficon triDown"></span>, <span class="l-h...
#span element->Insert ordinary sentences like p and div elements,The difference is that there are no line breaks
spans = soup.find_all("span")
for tag in spans:
    #Elements for which class is not set are tags.get("class").pop(0)Avoid the error with try as it will result in an error because you cannot do
    try:
        #tag.get("class")Get all the classes in tag and return them in a list
        string_ = tag.get("class").pop(0)

        if string_ in "mkc-stock_prices":
            stockprice = tag.string
            break
    except:
        pass

jtalk.jtalk(str(stockprice)) #Output audio with open j talk
#print(stockprice)

In the directory jtalk.py webscraping_jtalk_test.py If you confirm that there is, you can do the following and it will tell you.

$ ls
jtalk.py webscraping_jtalk_test.py
$ python3 webscraping_jtalk_test.py

I told you! But you just read the numbers one by one. Lol For the time being, I think that it can be applied as much as possible.

Summary

This time, I tried web scraping + OpenJtalk! With this, you can also utter the information obtained by the API in the same way. Write a program so that it works at a fixed time in the morning and let it read out the weather and news that you care about, Then! [Addition] I wrote a program to talk about the weather in the following article, so if you are interested, please do! URL: https://qiita.com/coffiego/items/ec050e6106a7424c048b

Recommended Posts

Output web scraped information by voice with OpenJtalk
Get property information by scraping with python
Get boat race match information by web scraping
[Python-pptx] Output PowerPoint font information to csv with python