I got tired of Python, so I analyzed data with nehan (corona-related: where are those words now?)


Hello, this is sunfish. Data analysis with Python has become popular these days, but it is hard to master. Struggling with Python ends up becoming the goal, and the business improvement you originally wanted to achieve gets left behind. To address that problem, I would like to introduce an example of analyzing data with the GUI tool "nehan".

Looking back at how often specific words appear in Twitter data

More than half a year has passed since the coronavirus became a social problem. Let's trace how often related words appeared in tweet data from the past two months.


nehan can import Twitter data directly, and I used that feature this time. I will cover the feature itself later. Since July 27, 2020, 3,000 tweets containing "Corona" in the tweet text have been collected every day, giving about two months of data. Details of the data are here (https://sunfish.nehan.io/datasources_v2/3424).


1. Select only the Text and Created_At columns that will be used
port_2 = port_1[['Created_At', 'Text']]

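2. Convert Created_At to a date type
import pandas as pd
# Parse Created_At; values that cannot be parsed become NaT
port_3 = port_2.copy()
port_3['Created_At'] = pd.to_datetime(
    port_3['Created_At'], errors='coerce', format=None)
# Keep only the date part (drop the time of day)
port_3['Created_At'] = port_3['Created_At'].map(lambda x: x.date())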
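
To see what `errors='coerce'` does in this step, here is a minimal sketch on a small made-up frame. The sample values are assumptions for illustration, not rows from the actual tweet data.

```python
import pandas as pd

# Hypothetical sample rows; the real data comes from nehan's Twitter import.
sample = pd.DataFrame({
    "Created_At": ["2020-07-27 09:15:00", "not a timestamp", None],
    "Text": ["a", "b", "c"],
})

# With errors="coerce", values that cannot be parsed become NaT
# instead of raising an exception.
converted = pd.to_datetime(sample["Created_At"], errors="coerce")

# Keep only the calendar date; NaT entries stay missing.
dates = converted.dt.date
print(dates.tolist())
```

Only the first row survives as a real date here; the other two are left as missing values, which is why the next step drops them.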

3. Rows whose Created_At could not be converted to a date now hold missing values, so drop those rows.
port_4 = port_3.copy()
port_4 = port_4.dropna(subset=None, how='any')
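
Note that `subset=None` makes `dropna` look at every column, not just Created_At. The sketch below compares it with a date-only variant on a made-up frame; the `subset=['Created_At']` form is my own illustration, not something nehan exported.

```python
import pandas as pd

# Hypothetical frame: one row lacks a date, another lacks only the text.
frame = pd.DataFrame({
    "Created_At": pd.to_datetime(["2020-07-27", None, "2020-07-28"]),
    "Text": ["cluster", "cluster", None],
})

# As in the exported code: drop a row if ANY column is missing.
all_columns = frame.dropna(subset=None, how="any")

# Illustrative alternative: drop a row only if Created_At is missing.
date_only = frame.dropna(subset=["Created_At"], how="any")

print(len(all_columns), len(date_only))
```

For this analysis the difference hardly matters, since a tweet with missing Text cannot match the word filter anyway.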

Aggregate the number of words by day

4. Filter to tweets containing specific words
port_5 = port_4[(port_4['Text'].str.contains('cluster', na=False, regex=False))]
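
A small sketch of how the options in this filter behave, using made-up strings. `regex=False` matches the word as a plain substring, and `na=False` treats missing text as "no match". The `case=False` variant is an extra I added for comparison and is not part of the exported code.

```python
import pandas as pd

# Hypothetical tweet texts, including a missing value.
texts = pd.Series(["Cluster festival", "a new cluster", "c.uster", None])

# Same options as the exported code: literal, case-sensitive substring match.
literal = texts.str.contains("cluster", na=False, regex=False)

# Illustrative variant: ignore upper/lower case.
any_case = texts.str.contains("cluster", na=False, regex=False, case=False)

print(literal.tolist())
print(any_case.tolist())
```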

5. Aggregate daily
port_9 = port_5.copy()
port_9 = port_9.groupby(['Created_At']).agg(
    {'Created_At': ['size']}).reset_index()
port_9.columns = ['Created_At', 'Line count']
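
If you were writing this step by hand rather than exporting it, a shorter form would probably do the same job. This is only a sketch on made-up rows; the column name "Line count" is kept from the exported code.

```python
import datetime

import pandas as pd

# Hypothetical filtered tweets: one on 8/8 and two on 8/9.
filtered = pd.DataFrame({
    "Created_At": [
        datetime.date(2020, 8, 8),
        datetime.date(2020, 8, 9),
        datetime.date(2020, 8, 9),
    ],
    "Text": ["cluster a", "cluster b", "cluster c"],
})

# Count rows per day and name the count column in one go.
daily = filtered.groupby("Created_At").size().reset_index(name="Line count")
print(daily)
```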

Visualize and consider


[Chart: daily number of tweets containing "cluster"]
The word "cluster" is widely recognized as a symbol of explosive infection. The spike on 8/9 is probably due to the Cluster Festival held in Shibuya (https://news.yahoo.co.jp/articles/76e47dc2ce6608e018fe37bc92be296e381f76fa?page=1).


[Chart: daily number of tweets containing another word]
I also looked at this word, which made me feel nostalgic.


[Chart: daily number of tweets containing another word]
A new lifestyle is taking root, but the self-restraint mood does not seem to be completely over. The count does look like it is gradually decreasing.
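
Each chart repeats steps 4 and 5 with a different word. In plain pandas that repetition could be wrapped in a small helper, roughly like the sketch below. The function name `daily_word_count`, the sample rows, and the second word "mask" are all hypothetical placeholders, not taken from the actual analysis.

```python
import datetime

import pandas as pd


def daily_word_count(tweets: pd.DataFrame, word: str) -> pd.DataFrame:
    """Count tweets per day whose Text contains `word` (hypothetical helper)."""
    hits = tweets[tweets["Text"].str.contains(word, na=False, regex=False)]
    return hits.groupby("Created_At").size().reset_index(name="Line count")


# Made-up rows standing in for the cleaned tweet data.
tweets = pd.DataFrame({
    "Created_At": [
        datetime.date(2020, 8, 8),
        datetime.date(2020, 8, 9),
        datetime.date(2020, 8, 9),
        datetime.date(2020, 8, 9),
    ],
    "Text": ["cluster in Shibuya", "cluster festival", "wear a mask", "another cluster"],
})

# One daily table per word; "mask" is only a placeholder word.
counts = {word: daily_word_count(tweets, word) for word in ["cluster", "mask"]}
for word, table in counts.items():
    print(word)
    print(table)
```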


Getting accurate results would really require more pre-processing, but here I kept the processing simple to get a rough picture and to introduce nehan. The source code above is copied from the output of nehan's Python export function.

