[PYTHON] Instant visualization of Qiita Advent Calendar 2019

This is the day-20 article of the All About Group (All About Co., Ltd.) Advent Calendar 2019. Christmas is only a few more sleeps away. Even as an adult, I still believe Santa Claus will really come.

Nice to meet you, I'm Akidukin14. I usually work on machine learning for All About's ad delivery system and on visualizing analysis results.

This is my first time writing for an Advent Calendar, so I tried out an idea I came up with.

What I did (overview)

I visualized the article data in each calendar of the Qiita Advent Calendar (only for articles posted on Qiita...).

Contents (Topics)

- Background
- What I did (bulleted list)
- Result
- What I did (details)
  - Extract the article text of the Advent Calendar
  - Apply natural language processing to the article text (word segmentation)
  - Visualize with WordCloud
- Reference
- Advent Calendar index

Background

(As of 12/n, 2019) **Number of calendars: 770** **Number of participants: 13,414**

This is my first time taking part in an Advent Calendar, and the first thing that struck me was this:

The number of calendars is insane. So is the number of participants.

How do you pinpoint the calendars you are actually interested in? That question was my motivation.

What I did (bulleted)

  1. Extract the article text of the Advent calendar
  2. Apply natural language processing to the article text (word segmentation)
  3. Visualize with WordCloud

Result

I'll post the results first. If the results are all you need, just look here. (I did this quickly and fairly roughly, so I expect the results are biased. Sorry...)

Image 1

0_wordcloud.png

Image 2

1_wordcloud.png

Image 3

2_wordcloud.png

Image 4

3_wordcloud.png

Image 5

4_wordcloud.png

Image 6

5_wordcloud.png

Image 7

6_wordcloud.png

Image 8

7_wordcloud.png

Image 9

8_wordcloud.png

Image 10

9_wordcloud.png

Image 11

10_wordcloud.png

Image 12

11_wordcloud.png

Image 13

12_wordcloud.png

Image 14

13_wordcloud.png

Image 15

14_wordcloud.png

Image 16

15_wordcloud.png

Image 17

16_wordcloud.png

Image 18

17_wordcloud.png

Image 19

18_wordcloud.png

Image 20

19_wordcloud.png

Image 21

20_wordcloud.png

Image 22

21_wordcloud.png

Image 23

22_wordcloud.png

Image 24

23_wordcloud.png

Image 25

24_wordcloud.png

Image 26

25_wordcloud.png

Image 27

26_wordcloud.png

Image 28

27_wordcloud.png

Image 29

28_wordcloud.png

Image 30

29_wordcloud.png

Image 31

30_wordcloud.png

Image 32

31_wordcloud.png

1. Extract the article text of the Advent calendar

  1. Call the Qiita API from Python to get article data

call_qiita_api


import requests

def call_qiita_api(header, per_page = None, query = None, page = None):
    ## API endpoint that lists articles
    get_items_api = 'https://qiita.com/api/v2/items'
    params = {'per_page' : per_page
        , 'query' : query
        , 'page' : page}
    ## Parameters left as None are omitted from the query string by requests
    datas = requests.get(get_items_api, params = params, headers = header)
    return datas

  2. Keep only the articles that appear to be Advent Calendar entries and collect their data

regs_body_text


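import re

### The code is really dirty...
def regs_body_text(text):
    ## Normalization pattern: strip whitespace and common symbols.
    ## The full-width space, ！ and ？ are assumed from the duplicated ASCII entries in the published listing.
    reg_pattern = re.compile(r'[\n\t \u3000\-~`:;_*!?！？+$#\[\]]')
    tmp = reg_pattern.sub('', text.lower())
    ## Skip articles that never mention an Advent Calendar.
    ## The Japanese literals are presumed originals; the translated ones could never match the normalized text.
    target_type = re.compile('(アドベントカレンダー|adventcalendar)')
    if not target_type.search(tmp):
        return None, None
    ## Capture the calendar name between "この記事は" ("This article is") and "adventcalendar2019"
    calender_type = re.compile(r'(この記事は)\w+?(adventcalendar2019)')
    match = calender_type.search(tmp)
    if not match:
        return None, None
    get_calender_type = re.sub('(この記事は|adventcalendar2019)', '', match.group())
    return get_calender_type, tmp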

2. Apply natural language processing to the article text (word segmentation)

  1. Split the text into words with MeCab, keeping only the specified parts of speech

parse_text


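import MeCab

mecab = MeCab.Tagger('-Owakati')
mecab.parse('')  # commonly used workaround for empty node surfaces in the Python binding

def parse_text(text, parser = mecab):
    ## MeCab reports part-of-speech names in Japanese (名詞 = noun, 動名詞 = verbal noun)
    part = ['名詞','動名詞']
    parsed_text = []
    t = parser.parseToNode(text)
    while t:
        ## The first comma-separated field of the feature string is the part of speech
        parts = t.feature.split(',')
        if parts[0] in part:
            parsed_text.append(t.surface)
        t = t.next
    return parsed_text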
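
The plotting code in the next section reads `dataset[calendar]['parsed_text']` as a list of word lists, one per article, but the article does not show how `dataset` is assembled. One possible way to build it is sketched below. The function `build_dataset`, its `extract` and `tokenize` parameters, and the use of the `body` field of each API item are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def build_dataset(items: Iterable[Dict],
                  extract: Callable[[str], Tuple[Optional[str], Optional[str]]],
                  tokenize: Callable[[str], List[str]]) -> Dict[str, Dict[str, list]]:
    """Group tokenized article bodies by calendar name.

    extract plays the role of regs_body_text (returns (calendar, text) or
    (None, None)); tokenize plays the role of parse_text.
    """
    dataset: Dict[str, Dict[str, list]] = defaultdict(lambda: {'parsed_text': []})
    for item in items:
        calendar, text = extract(item['body'])
        if calendar is None:
            continue  # not recognized as an Advent Calendar article
        dataset[calendar]['parsed_text'].append(tokenize(text))
    return dict(dataset)


# Toy stand-ins so the sketch runs without MeCab or the Qiita API.
def toy_extract(body):
    name, sep, rest = body.partition(':')
    return (name, rest) if sep else (None, None)


sample = [{'body': 'python:pandas numpy'}, {'body': 'no calendar here'},
          {'body': 'python:matplotlib'}]
print(build_dataset(sample, toy_extract, str.split))
```

Passing the extractor and tokenizer in as arguments keeps the grouping logic testable without network access or a MeCab installation.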

3. Visualize with WordCloud

  1. Draw the word clouds into a 3x3 grid, nine per image, for as many images as needed

make_wordcloud


import math
import sys

import matplotlib.pyplot as plt
import wordcloud
from matplotlib.font_manager import FontProperties

## Visualize with WordCloud
## dataset, fonts (path to a Japanese font file) and check_word are defined elsewhere
keys = sorted(dataset.keys())
keys_len = len(keys)
plot_picture = math.ceil(keys_len / 9)  # number of images, nine word clouds each
fp = FontProperties(fname = fonts)
k = 0
for pp in range(plot_picture):
    fig,axes = plt.subplots(nrows = 3, ncols = 3, figsize = (10,10))
    for i in range(9):
        n,m = divmod(i, 3)  # row and column inside the 3x3 grid
        if k >= keys_len:
            axes[n,m].axis('off')  # leave unused panels blank on the last image
            continue
        sys.stdout.write('\r {}/{}'.format(k, keys_len))
        target_key = keys[k]
        wc = wordcloud.WordCloud(
                font_path = fonts
                , prefer_horizontal = 1
                , max_words = 300
                , background_color = 'white'
                , colormap = 'RdYlBu'
                , contour_color='pink'
                , width = 750
                , height = 750)
        ## Join every word of every article in this calendar, skipping words rejected by check_word
        plot_data = ' '.join([y for x in dataset[target_key]['parsed_text'] for y in x if not check_word(y)])
        wc_gen = wc.generate(plot_data)
        axes[n,m].imshow(wc_gen, interpolation = 'bilinear')
        axes[n,m].set_title('AdventCalendar : {}'.format(target_key), fontproperties = fp, color = 'gray', fontsize = 10)
        axes[n,m].axis('off')
        k += 1
    plt.subplots_adjust(left=0.1, right=0.95, bottom=0.1, top=0.95)
    plt.savefig('{}_wordcloud.png'.format(pp))
    plt.close()
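
The list comprehension above drops every word for which `check_word` returns true, but that helper is not shown in the article. A hypothetical version might filter out very short tokens, bare numbers, and a small stop-word list, roughly as follows. The stop words and the length threshold are placeholders, not the author's actual choices.

```python
import re

# Placeholder stop words; the real list is not given in the article.
STOP_WORDS = {'adventcalendar', 'qiita', 'こと', 'もの', 'ため'}
_NUMERIC = re.compile(r'[0-9]+')


def check_word(word: str) -> bool:
    """Return True when the word should be excluded from the word cloud."""
    if len(word) < 2:
        return True  # single characters carry little meaning
    if _NUMERIC.fullmatch(word):
        return True  # bare numbers such as "2019"
    return word.lower() in STOP_WORDS


words = ['python', '2019', 'こと', 'x', '可視化']
print([w for w in words if not check_word(w)])  # prints ['python', '可視化']
```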

Reference URL

Using Qiita API: https://qiita.com/arai-qiita/items/94902fc0e686e59cb8c5

Advent Calendar index

These are the calendars I used this time, listed here as an index.

Image No.  Advent Calendar No.  Advent Calendar name
1 0 1on1
1 1 2019 new graduate engineer
1 2 3dsensor
1 3 access
1 4 airccar
1 5 aizu
1 6 akerun
1 7 alh
1 8 alibabacloud
2 9 amazoneks
2 10 amazoneks2
2 11 android
2 12 android2
2 13 android for beginners
2 14 angular
2 15 angular2
2 16 ansible
2 17 ansible2
3 18 appsscript
3 19 arduino
3 20 asoview
3 21 aws
3 22 awsamplify
3 23 aws lambda and serverless1
3 24 aws beginner
3 25 azure
3 26 bitrise
4 27 blockchain
4 28 bosyu
4 29 brainpad
4 30 c
4 31 cakephp
4 32 calendargmo ad marketing
4 33 camphor
4 34 cbcloud
4 35 circleci
5 36 classi
5 37 clojure
5 38 codebaseokinawa
5 39 conoha
5 40 css
5 41 cyberagent20 new graduate
5 42 cyberagentdevelopers
5 43 dart
5 44 datadog
6 45 dena
6 46 advent calendar by dena20 graduate candidate engineer dena20 new graduate
6 47 dena20 new graduate
6 48 deno
6 49 discord
6 50 diverse
6 51 django
6 52 dmm group
6 53 dotfiles
7 54 dsl
7 55 dtp
7 56 eccube
7 57 elasticstack
7 58 elixir
7 59 elm
7 60 elm2
7 61 emacs
7 62 enebular
8 63 engineeringmanager
8 64 Talk about ethercat
8 65 filemaker
8 66 firebase
8 67 flutter
8 68 flutter2
8 69 fork
8 70 foss4g
8 71 People involved in freee data
9 72 fun
9 73 fusic
9 74 fusic part 2
9 75 git
9 76 globis
9 77 gmo pepabo
9 78 go
9 79 go3
9 80 go4
10 81 go5
10 82 go6
10 83 go7
10 84 goodpatch
10 85 hamee
10 86 haskell
10 87 heroku
10 88 houdiniapprentice
10 89 hrtech
11 90 ios2
11 91 iotlt
11 92 iplug
11 93 ipv6
11 94 The second piece of iq1
11 95 iridge
11 96 jamstack
11 97 java
11 98 javascript
12 99 javascript2
12 100 kaggle
12 101 kayac
12 102 kintone
12 103 kintone2
12 104 klab
12 105 klabengineer
12 106 kubernetes
12 107 kubernetes2
13 108 kubernetes3
13 109 kyash
13 110 kyotouniversity
13 111 laravel
13 112 laravel2
13 113 libreoffice
13 114 lifull
13 115 lifull part 3
13 116 makeit
14 117 maya
14 118 microad
14 119 microsoftazuretech
14 120 microsoftpowerbi
14 121 misoca yayoi
14 122 mohikanz
14 123 mysql
14 124 ncc
14 125 nem
15 126 nervesjp
15 127 nestjs
15 128 newspicks
15 129 nijibox
15 130 nodered
15 131 northdetail
15 132 ntt communications
15 133 ntt techno cross
15 134 n high school
16 135 obniz
16 136 office365
16 137 oicitcreateclub
16 138 openandreproduciblescience
16 139 opensaasstudio
16 140 opttechnologies
16 141 oraclecloudinfrastructure
16 142 othlotech
16 143 pandoc
17 144 pathee
17 145 perl
17 146 php
17 147 plaid
17 148 ponos
17 149 pwa
17 150 pyladiesjapan
17 151 python
17 152 python part 3
18 153 qiitagithubactions
18 154 qt
18 155 qualiarts
18 156 r
18 157 react
18 158 react2
18 159 reactnative
18 160 retty
18 161 rpa
19 162 ruby
19 163 runteq
19 164 rust
19 165 rust part 2
19 166 rust part 3
19 167 salesforceplatform
19 168 sansan
19 169 sap
19 170 satysfi
20 171 sbai
20 172 scala
20 173 sensy
20 174 sfc
20 175 sfcrg
20 176 siv3d
20 177 slack
20 178 smarthr
20 179 snowrobin
21 180 soracom
21 181 speee
21 182 splunk
21 183 sra
21 184 sre
21 185 studioztech
21 186 swift
21 187 terraform
21 188 tjbot
22 189 tokyocityuniversity
22 190 tomowarkar alone
22 191 typescript
22 192 unity
22 193 unity2
22 194 unity3
22 195 valu
22 196 vexperts
22 197 vim
23 198 vim2
23 199 visualstudiocode
23 200 vrchat
23 201 vtubertech1
23 202 vue2
23 203 wanogroup
23 204 wano group
23 205 webgl
23 206 workflow
24 207 xamarin
24 208 yamap engineer
24 209 zeals
24 210 zlab
24 211 zozo technologies
24 212 zozo Technologies 1
24 213 zozo Technologies 2
24 214 zozo Technologies 3
24 215 zozo Technologies 4
25 216 zozo Technologies 5
25 217 Uluru
25 218 Kufu Company
25 219 Sakura Internet
25 220 Just a group
25 221 Anything for the time being
25 222 Engineer who wants to spread something
25 223 Looking back
25 224 Puri Puri Appliance
26 225 Iridge
26 226 Aso View
26 227 Inception deck
26 228 Willgate
26 229 Web crew
26 230 M3 Career
26 231 AP Communications
26 232 Keyboard 1
26 233 Giftee
27 234 Fucking app
27 235 Fucking app 2
27 236 Crowdworks
27 237 Shader advent calendar
27 238 Ciscosystemsjapan by Cisco volunteers
27 239 Japan system
27 240 G's Academy
27 241 Smart speaker
27 242 Software testing
28 243 Software test tips
28 244 Dip
28 245 About Data Science by Datamix Community
28 246 Data structures and algorithms
28 247 Toreta
28 248 Domain Driven Design 1
28 249 Dwango
28 250 Nifty Group
28 251 Non-Pro Lab
29 252 Hands Lab
29 253 Fenrir Design and Technology
29 254 Photocreate
29 255 Future
29 256 Future 2
29 257 Fuller
29 258 Mynavi
29 259 Mixi 20 new graduate
29 260 Mixi group
30 261 Motivation cloud series
30 262 Your Meister
30 263 Unique Vision Co., Ltd.
30 264 Lux
30 265 Ray tracing
30 266 Personal development
30 267 thousand
30 268 Kure National College of Technology
30 269 Shinagawa
31 270 Muroran Institute of Technology Data Science Laboratory dsl
31 271 Miyazaki it related study session
31 272 Fujitsu Cloud Technologies
31 273 Attorney dot com
31 274 Access Co., Ltd.
31 275 Amazonai by Knowledgecom operated by Knowledge Communication Co., Ltd.
31 276 How did you learn machine learning by Nikkei xtech Business ai②
31 277 Delve into machine learning tools by Nikkei xtech Business ai③
31 278 Spring source club
32 279 Fukuoka young sierbc
32 280 Second Dwango
32 281 Self-made os
32 282 Natural language processing
32 283 Natural language processing 2
32 284 Ibaraki University
32 285 Certification Authorization technology
32 286 Kinki University
32 287 Suzuka National College of Technology
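
Because each image holds nine word clouds in calendar-number order, the image number and file name for a given calendar can be computed rather than looked up. The helper below is a small sketch of that arithmetic; `locate_calendar` is a name introduced here for illustration.

```python
def locate_calendar(calendar_no: int):
    """Return (image number, file name, row, column) for a calendar number."""
    file_index, cell = divmod(calendar_no, 9)  # nine word clouds per image
    row, col = divmod(cell, 3)                 # position inside the 3x3 grid
    # Image numbers in the index start at 1, file names start at 0.
    return file_index + 1, '{}_wordcloud.png'.format(file_index), row, col


print(locate_calendar(151))  # calendar "python" -> (17, '16_wordcloud.png', 2, 1)
```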
