[Python] Get Yahoo! News titles and analyze their sentiment

Get Yahoo! News content

Scrape the content at your own responsibility.

import time
import urllib.parse

import requests
from bs4 import BeautifulSoup
from tqdm import tqdm

jp_keyword = ''  #Enter a search keyword
page_num = 1  #Enter the total number of result pages to fetch

#URL-encode the Japanese keyword
keyword = urllib.parse.quote(jp_keyword)

#Title list
title_list=[]
#Post date list
date_list=[]
#Each result page is addressed by the offset of its first item (b=1, 11, 21, ...)
for i in tqdm(range(page_num)):
    url="https://news.yahoo.co.jp/search/?p="+keyword+"&st=n&ei=UTF-8&b="+str(i*10+1)
    print(url)
    res = requests.get(url)
    #Leave a gap so as not to overload the server
    time.sleep(2)
    #Create a BeautifulSoup object from the response HTML
    soup = BeautifulSoup(res.content, 'html.parser')

    #Get the title tag string
    title_text = soup.find_all('h2')
    for x in title_text:
        title_list.append(x.text)
    
    date_text=soup.find_all('span', class_="d")
    for x in date_text:
        date_list.append(x.text)
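
The URL construction in the loop above can also be written with `urllib.parse.urlencode`, which percent-encodes the keyword and joins the query parameters in one step. The following is a minimal sketch, not the author's method: the helper name `build_search_url` and the constant `RESULTS_PER_PAGE` are hypothetical, and the value of 10 results per page is an assumption inferred from the `b=1, 11, 21, ...` pattern in the original URL.

```python
from urllib.parse import urlencode

BASE_URL = "https://news.yahoo.co.jp/search/"  # same endpoint as in the loop above
RESULTS_PER_PAGE = 10  # assumption: inferred from the b=1, 11, 21, ... pattern


def build_search_url(keyword: str, page_index: int) -> str:
    """Return the search URL for a zero-based page index (hypothetical helper)."""
    params = {
        "p": keyword,  # urlencode percent-encodes non-ASCII text as UTF-8
        "st": "n",
        "ei": "UTF-8",
        "b": page_index * RESULTS_PER_PAGE + 1,  # offset of the first item on the page
    }
    return BASE_URL + "?" + urlencode(params)


print(build_search_url("天気", 0))
print(build_search_url("天気", 2))
```

Because the helper takes the raw keyword, the separate `urllib.parse.quote` call is not needed when it is used.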
    

Create a function to perform sentiment analysis

See the following pages for how to enable the API and obtain an API key.

Quick Start: Natural Language API Settings (https://cloud.google.com/natural-language/docs/setup?hl=ja)
Use API Keys (https://cloud.google.com/docs/authentication/api-keys?hl=ja)

How to use the API

key=""#Enter API key
#API URL
url = 'https://language.googleapis.com/v1/documents:analyzeSentiment?key=' + key

def sentimental(text):
    header = {'Content-Type': 'application/json'}
    body = {
        "document": {
            "type": "PLAIN_TEXT",
            "language": "JA",#Specify language
            "content": text
        },
        "encodingType": "UTF8"
    }

    #Receive the result in json format.
    response = requests.post(url, headers=header, json=body).json()
    #Return score
    return response["documentSentiment"]["score"]
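
The function above raises a `KeyError` if the response does not contain `documentSentiment`, for example when the API key is rejected. A minimal sketch of a more defensive way to read the score follows. The helper name `extract_score` is hypothetical, the two payloads are hand-written samples for illustration rather than real API output, and the error shape `{"error": {...}}` is an assumption about how failures are reported.

```python
from typing import Optional


def extract_score(response: dict) -> Optional[float]:
    """Return the document-level score, or None if no score is available."""
    if "error" in response:
        # Assumed error shape: {"error": {"code": ..., "message": ...}}
        print("API error:", response["error"].get("message"))
        return None
    sentiment = response.get("documentSentiment")
    if sentiment is None:
        return None
    return sentiment.get("score")


# Hand-written sample payloads for illustration, not real API output
ok = {"documentSentiment": {"magnitude": 0.8, "score": 0.8}, "language": "ja"}
failed = {"error": {"code": 400, "message": "API key not valid."}}

print(extract_score(ok))
print(extract_score(failed))
```

Returning `None` instead of raising lets a long scoring loop continue past a single failed request. The caller then has to decide how to treat the missing values.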

Score each title and collect the results in a DataFrame

import pandas as pd

#Score every collected title
score_list=[]
for word in tqdm(title_list):
    score_list.append(sentimental(word))

#title_list, date_list and score_list must have the same length
df = pd.DataFrame()
df["word"]=title_list
df["date"]=date_list
df["score"]=score_list
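
The sentiment score returned by the Natural Language API ranges from -1.0 (negative) to 1.0 (positive). One way to make the collected scores easier to read is to bucket them into labels and to check that the three lists line up before combining them. The sketch below uses only the standard library. The helper names `label_score` and `pair_rows`, the threshold of 0.25, and the sample rows are all assumptions chosen for illustration.

```python
def label_score(score: float, threshold: float = 0.25) -> str:
    """Bucket a score in [-1.0, 1.0] into a coarse label (threshold is an assumption)."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


def pair_rows(titles, dates, scores):
    """Combine the three lists into row dicts, refusing mismatched lengths."""
    if not (len(titles) == len(dates) == len(scores)):
        raise ValueError("titles, dates and scores must have the same length")
    return [
        {"word": t, "date": d, "score": s, "label": label_score(s)}
        for t, d, s in zip(titles, dates, scores)
    ]


# Hypothetical sample data, not real scraping results
rows = pair_rows(["title A", "title B"], ["1/1", "1/2"], [0.6, -0.1])
for row in rows:
    print(row)
```

The length check matters because the titles and dates are scraped with separate `find_all` calls, so the two lists are not guaranteed to have the same number of entries.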

Save the DataFrame to a pickle file

import pickle

with open('sentimental_df.pickle', 'wb') as f:
    pickle.dump(df, f)

Load the data back

import pickle

with open('sentimental_df.pickle', 'rb') as f:
    df = pickle.load(f)

print(df)

References

Use API Keys (https://cloud.google.com/docs/authentication/api-keys?hl=ja)
Sentiment analysis by hitting Google Natural Language API with Python
Natural Language
Improve development efficiency! How to use pickle in Python [for beginners]
URL encoding / decoding in Python (urllib.parse.quote, unquote)
Quick Start: Natural Language API Settings (https://cloud.google.com/natural-language/docs/setup?hl=ja)
