[PYTHON] Visualize app reviews using wordcloud

When downloading an app, many people first check its reviews. However, there are often so many reviews that it is difficult to read them all. So, this time, I would like to use a technique called **wordcloud** to visualize app reviews so that you can grasp them at a glance.

What is a word cloud?

A word cloud draws the words that appear in a document in various sizes and colors, so that the features of the document can be seen in a single image. Words that appear more often are typically drawn larger.
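As a rough illustration of the idea, the sketch below counts word frequencies and scales them to a display size. The function names and the linear scaling are assumptions made for this example only; the wordcloud library uses its own, more involved layout and sizing logic.

```python
from collections import Counter


def word_frequencies(text):
    """Count how often each whitespace-separated word occurs in text."""
    return Counter(text.split())


def relative_sizes(frequencies, max_size=100):
    """Scale counts linearly so the most frequent word gets max_size.

    Illustrative only: this is not how the wordcloud library sizes words.
    """
    if not frequencies:
        return {}
    top = max(frequencies.values())
    return {word: round(max_size * count / top)
            for word, count in frequencies.items()}


sample = "freeze account freeze timeline freeze account"
freqs = word_frequencies(sample)
sizes = relative_sizes(freqs)
print(sizes)
```

The more often a word occurs, the larger its size value, which is the intuition behind the image produced later in this post.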

Collection of review data

App reviews on the App Store can be obtained in JSON format by putting the ID of the target app into the URL below.

https://itunes.apple.com/jp/rss/customerreviews/id=(app ID)/page=1/json

This time, we will target the Twitter app, whose ID is 333903271. You can get up to 10 pages by changing the number after page=.

Get the app review data with the script below.

import pandas as pd
import requests

# The {} placeholders are filled with the app ID and the page number
rss_url = 'https://itunes.apple.com/jp/rss/customerreviews/id={}/sortBy=mostRecent/page={}/json'
app_id = '333903271'  # Twitter app

def get_reviews(url):
    """
    Get a list of [score, title, text, name] from the iOS review API response.
    """
    response = requests.get(url, timeout=3.5)
    response.raise_for_status()  # stop early on HTTP errors
    response_json = response.json()
    reviews = [
        [
            int(entry['im:rating']['label']),
            entry['title']['label'],
            entry['content']['label'],
            entry['author']['name']['label'],
        ]
        for entry in response_json['feed']['entry']
    ]
    return reviews

review_list = []
# Collect reviews from pages 1 to 10
for i in range(1, 11):
    page_url = rss_url.format(app_id, i)
    reviews = get_reviews(page_url)
    review_list += reviews

review_df = pd.DataFrame(review_list, columns=['point', 'title', 'review', 'name'])

The collected data looks like this.

(Image: Screenshot 2020-06-28 17.36.11.png)
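The parsing step can also be tried without any network access. The following is a minimal sketch, assuming the response has the same fields that get_reviews reads; the function name extract_reviews, the sample payload, and the handling of missing or single entries are assumptions for illustration, not confirmed behavior of the API.

```python
def extract_reviews(payload):
    """Return [score, title, text, name] rows from a decoded review feed."""
    entries = payload.get('feed', {}).get('entry', [])
    # Assumption: a single review might arrive as a dict rather than a list
    if isinstance(entries, dict):
        entries = [entries]
    rows = []
    for entry in entries:
        # Assumption: entries without a rating are not reviews, so skip them
        if 'im:rating' not in entry:
            continue
        rows.append([
            int(entry['im:rating']['label']),
            entry['title']['label'],
            entry['content']['label'],
            entry['author']['name']['label'],
        ])
    return rows


# Hand-written sample shaped like the fields used above (not real API output)
sample_payload = {
    'feed': {
        'entry': [
            {
                'im:rating': {'label': '1'},
                'title': {'label': 'Frozen'},
                'content': {'label': 'My account was frozen'},
                'author': {'name': {'label': 'user1'}},
            },
            {
                # No rating: treated here as metadata, not a review
                'title': {'label': 'App info'},
            },
        ]
    }
}

print(extract_reviews(sample_payload))
```

Returning an empty list for a page without entries would let the collection loop continue instead of raising a KeyError, which may be convenient when an app has fewer than 10 pages of reviews.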

Creating the word cloud

Create a wordcloud image based on the collected review data. First, install MeCab for morphological analysis.

$ brew install mecab
$ brew install mecab-ipadic

Then install the wordcloud library for Python.

$ pip install wordcloud

You can create a wordcloud with the script below. For Japanese input, specify the path to a Japanese font file that exists in your execution environment.

import MeCab  # MeCab Python binding (for example, the mecab-python3 package)
import wordcloud

# Specify the path where the Japanese font is stored
FONT_PATH = '/System/Library/Fonts/ヒラギノ角ゴシック W3.ttc'

def prepare_word_list(words):
    """
    Create the input string for wordcloud.

    Args:
        words([str]): List of sentences

    Returns:
        str: A string made by extracting only the words with the specified parts of speech and joining them with spaces.
    """
    m = MeCab.Tagger('')
    parsed_words = []
    for word in words:
        items = [x.split('\t') for x in m.parse(word).splitlines()]
        for item in items:
            # 'EOS' and blank lines have no feature column
            if len(item) < 2:
                continue
            # The IPADIC dictionary reports parts of speech in Japanese:
            # 名詞 (noun), 形容詞 (adjective), 動詞 (verb)
            if item[1].split(',')[0] in ["名詞", "形容詞", "動詞"]:
                parsed_words.append(item[0])

    return ' '.join(parsed_words)


def make_wordcloud(words, file_name):
    """
    Create a wordcloud image file from the entered text.

    Args:
        words([str]): List of sentences
        file_name(str): Image file output destination path

    Returns:
        None
    """
    parsed_words = prepare_word_list(words)
    wordc = wordcloud.WordCloud(
        font_path = FONT_PATH,
        background_color='white',
        contour_width=2,
        width=800,
        height=600,
    ).generate(parsed_words)
    
    wordc.to_file(file_name)


# The ./image directory must already exist
make_wordcloud(review_df['review'], './image/twitter_wordcloud.png')
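To make the part-of-speech filtering easier to follow, here is a minimal sketch that applies the same idea to a hand-written string instead of calling MeCab. It assumes the IPADIC output format, in which each line is the surface form, a tab, and a comma-separated feature list whose first field is a Japanese part-of-speech tag such as 名詞 (noun). The function name filter_by_pos and the sample text are assumptions for illustration, not real tagger output.

```python
# Part-of-speech tags to keep: noun, adjective, verb (IPADIC uses Japanese tags)
TARGET_POS = ("名詞", "形容詞", "動詞")


def filter_by_pos(mecab_output, target_pos=TARGET_POS):
    """Return surface forms whose first feature field is in target_pos."""
    words = []
    for line in mecab_output.splitlines():
        surface, sep, features = line.partition('\t')
        if not sep:
            # 'EOS' and blank lines carry no feature column
            continue
        if features.split(',')[0] in target_pos:
            words.append(surface)
    return words


# Hand-written text that imitates the IPADIC output format
sample_output = (
    "アカウント\t名詞,一般,*,*,*,*,アカウント,アカウント,アカウント\n"
    "が\t助詞,格助詞,一般,*,*,*,が,ガ,ガ\n"
    "凍結\t名詞,サ変接続,*,*,*,*,凍結,トウケツ,トウケツ\n"
    "EOS\n"
)

print(filter_by_pos(sample_output))
```

Particles such as が are dropped, so only content words are passed on to the word cloud.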

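Here is the completed wordcloud image!

![twitter_wordcloud.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/501910/8c3531e3-7cde-a60f-8d5d-d8acd8594c6b.png)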

Words typical of Twitter stand out, such as Twitter, account, freeze, follow, and timeline.

Summary

I was able to visualize the reviews of an iOS app. Wordcloud may be a good choice when you want to quickly get an overview of a large number of documents.
