This article is day 20 of the All About Group (All About Co., Ltd.) Advent Calendar 2019. Only a few more sleeps until Christmas. Even as an adult, I still believe Santa Claus will really come.
Nice to meet you, I'm Akidukin14. I usually work on machine learning for All About's ad delivery system and on visualizing analysis results.
This is my first time writing for an Advent Calendar, so I'll try out an idea I came up with.
Let's visualize the article data in each calendar of the Qiita Advent Calendar (only for articles posted on Qiita ...)
- Background
- What I did (bulleted list)
- Result
- What I did (details)
  - Extract the article text of the Advent Calendar
  - Process the article text with natural language processing (word segmentation)
  - Visualization with WordCloud
- Reference
- Advent Calendar Index
(As of 12/n, 2019) **Number of calendars: 770** **Number of participants: 13,414**
This is my first time participating in an Advent Calendar, and the first thing that struck me was this:
That is a crazy number of calendars. And a crazy number of participants.
How am I supposed to pinpoint the calendars I'm interested in ...?? That question was my motivation.
I'll post the results first. If the results are all you need, just look here. (I did this fairly casually, so I think the results are biased. Sorry ...)

call_qiita_api
import requests

def call_qiita_api(header, per_page=None, query=None, page=None):
    # Qiita API endpoint that lists articles (items)
    get_items_api = 'https://qiita.com/api/v2/items'
    params = {'per_page': per_page,
              'query': query,
              'page': page}
    # Returns the raw requests.Response; call .json() on it to get the articles
    datas = requests.get(get_items_api, params=params, headers=header)
    return datas
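Because the endpoint returns one page of articles per request, collecting everything that matches a query means looping over `page`. Below is a minimal sketch of such a loop, not the code actually used for this article: `fetch_all_items`, `fetch_page`, and `max_pages` are hypothetical names, and stopping at the first empty page is an assumption. The page fetcher is passed in as an argument so the loop can be tried without network access.

```python
def fetch_all_items(fetch_page, per_page=100, max_pages=100):
    """Collect items page by page until an empty page or max_pages is reached.

    fetch_page(page, per_page) is a caller-supplied function (hypothetical)
    that returns a list of items for the given 1-based page number.
    """
    items = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page, per_page)
        if not batch:
            break  # assumption: an empty page means nothing is left
        items.extend(batch)
    return items


# Stand-in for the real API: 250 fake articles served in pages.
fake_articles = [{'id': n} for n in range(250)]

def fake_fetch_page(page, per_page):
    start = (page - 1) * per_page
    return fake_articles[start:start + per_page]

all_items = fetch_all_items(fake_fetch_page, per_page=100)
print(len(all_items))
```

With the real API, `fetch_page` could wrap something like `call_qiita_api(header, per_page=per_page, query=query, page=page).json()`. The defaults of 100 reflect my understanding of the upper limits for `page` and `per_page`, so please check the Qiita API documentation before relying on them.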
regs_body_text
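import re

###The code is really dirty...
def regs_body_text(text):
    # Normalization pattern: whitespace and symbols (half- and full-width) to strip
    reg_pattern = re.compile(r'[\n\t \u3000\-~`:;_*!?!?+$#\[\]]')
    tmp = reg_pattern.sub('', text.lower())
    # Keep only articles that mention an Advent Calendar
    # ('アドベントカレンダー' is "Advent calendar" in Japanese)
    target_type = re.compile('(アドベントカレンダー|adventcalendar)')
    if not target_type.search(tmp):
        return None, None
    # Pull the calendar name out of "This article is ... Advent Calendar 2019"
    # ('この記事は' is "This article is" in Japanese)
    calender_type = re.compile(r'(この記事は)\w+?(adventcalendar2019)')
    matched = calender_type.search(tmp)
    if not matched:
        return None, None
    url_strings = matched.group()
    # Remove the fixed prefix and suffix so that only the calendar name remains
    get_calender_type = re.sub('(この記事は|adventcalendar2019)', '', url_strings)
    return get_calender_type, tmp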
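To make the normalize-then-match idea in `regs_body_text` concrete, here is a small worked example on a single sentence. It is a simplified sketch under assumptions: the names `NORMALIZE`, `CALENDAR`, and `extract_calendar_name` are hypothetical, and it assumes articles open with the usual Japanese phrase "この記事は ... Advent Calendar 2019" ("This article is ... Advent Calendar 2019").

```python
import re

# Characters removed before matching: whitespace (including full-width space)
# and common Markdown or punctuation symbols.
NORMALIZE = re.compile(r'[\s\u3000\-~`:;_*!?!?+$#\[\]]')

# After normalization the calendar name sits between the fixed prefix
# and the literal "adventcalendar2019".
CALENDAR = re.compile(r'この記事は(\w+?)adventcalendar2019')

def extract_calendar_name(text):
    """Return the calendar name found in text, or None if there is none."""
    normalized = NORMALIZE.sub('', text.lower())
    matched = CALENDAR.search(normalized)
    return matched.group(1) if matched else None

sample = 'この記事は [Python Advent Calendar 2019] の 20 日目の記事です。'
print(extract_calendar_name(sample))
```

Lowercasing and stripping symbols first means that "Advent Calendar", "advent-calendar", and "AdventCalendar" all collapse to the same string, so one pattern is enough.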
parse_text
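import MeCab

mecab = MeCab.Tagger('-Owakati')
mecab.parse('')  # commonly used workaround so that node.surface is read correctly

def parse_text(text, parser=mecab):
    # Parts of speech to keep: '名詞' (noun) and '動名詞' (verbal noun)
    part = ['名詞', '動名詞']
    parsed_text = []
    t = parser.parseToNode(text)
    # Walk the linked list of morphemes and keep the surface form of each match
    while t:
        parts = t.feature.split(',')
        if parts[0] in part:
            parsed_text.append(t.surface)
        t = t.next
    return parsed_text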
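The visualization step reads `dataset[target_key]['parsed_text']` as a list of word lists, one list per article. How `dataset` was built is not shown, so the following is only a sketch of one way to group the output of `regs_body_text` and `parse_text` into that shape. `build_dataset`, `toy_extract`, and `toy_tokenize` are hypothetical names, and the toy functions stand in for the real ones so that the sketch runs without the Qiita API or MeCab.

```python
from collections import defaultdict

def build_dataset(bodies, extract, tokenize):
    """Group tokenized article bodies by calendar name.

    extract(body)  -> (calendar_name, normalized_text) or (None, None)
    tokenize(text) -> list of words
    """
    dataset = defaultdict(lambda: {'parsed_text': []})
    for body in bodies:
        name, normalized = extract(body)
        if name is None:
            continue  # not an Advent Calendar article
        dataset[name]['parsed_text'].append(tokenize(normalized))
    return dict(dataset)


# Toy stand-ins for regs_body_text and parse_text (hypothetical behavior).
def toy_extract(body):
    name, _, rest = body.partition(':')
    return (name, rest) if rest else (None, None)

def toy_tokenize(text):
    return text.split()

dataset = build_dataset(
    ['python:pandas numpy', 'go:goroutine channel', 'python:django', 'no calendar here'],
    toy_extract,
    toy_tokenize,
)
print(sorted(dataset))
print(dataset['python']['parsed_text'])
```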
make_wordcloud
import sys
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.font_manager import FontProperties
import wordcloud

##Visualize with WordCloud
# dataset, fonts (path to a font file) and check_word (word filter) are defined elsewhere
keys_len = len(dataset.keys())
plot_picture = int(np.ceil(keys_len / 9))  # number of 3x3 images, rounded up
plot_area = np.arange(0, 9, 1).reshape(3, 3)
keys = sorted(dataset.keys())
fp = FontProperties(fname=fonts)
k = 0
for pp in range(plot_picture):
    fig, axes = plt.subplots(nrows=3, ncols=3, figsize=(10, 10))
    for i in range(9):
        # Row and column of panel i in the 3x3 grid
        n, m = [x.item() for x in np.where(plot_area == i)]
        if k >= keys_len:
            axes[n, m].axis('off')  # no calendars left: leave the panel blank
            continue
        sys.stdout.write('\r {}/{}'.format(k, keys_len))
        target_key = keys[k]
        wc = wordcloud.WordCloud(
            font_path=fonts,
            prefer_horizontal=1,
            max_words=300,
            background_color='white',
            colormap='RdYlBu',
            contour_color='pink',
            width=750,
            height=750)
        # Join every word of every article in this calendar, skipping words rejected by check_word
        plot_data = ' '.join([y for x in dataset[target_key]['parsed_text'] for y in x if not check_word(y)])
        wc_gen = wc.generate(plot_data)
        axes[n, m].imshow(wc_gen, interpolation='bilinear')
        axes[n, m].set_title('AdventCalendar : {}'.format(target_key), fontproperties=fp, color='gray', fontsize=10)
        axes[n, m].axis('off')
        k += 1
    plt.subplots_adjust(left=0.1, right=0.95, bottom=0.1, top=0.95)
    plt.savefig('{}_wordcloud.png'.format(pp))
    plt.close()
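The plotting code calls `check_word(y)` to decide which words to leave out, but that helper is not shown. Here is a rough sketch of what such a filter might look like. Everything in it is an assumption: the stop-word list, the minimum length, and the rule for dropping bare numbers are illustrative choices, not the ones actually used for the images above.

```python
import re

# Hypothetical stop words; the list actually used is not shown in the article.
STOP_WORDS = {'こと', 'もの', 'ため', 'これ', 'それ', 'よう'}

def check_word(word, stop_words=STOP_WORDS, min_length=2):
    """Return True when a word should be left out of the word cloud."""
    if word in stop_words:
        return True
    if len(word) < min_length:
        return True  # single characters carry little meaning
    if re.fullmatch(r'[0-9]+', word):
        return True  # bare numbers
    return False

words = ['python', 'こと', '2019', 'a', '機械学習']
print([w for w in words if not check_word(w)])
```

Returning `True` for words to exclude matches how the helper is used in the list comprehension (`if not check_word(y)`).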
Using Qiita API: https://qiita.com/arai-qiita/items/94902fc0e686e59cb8c5
These are the calendars I used this time, prepared as an index.
| Image No | Advent Calendar No | Advent Calendar name |
|---|---|---|
| 1 | 0 | 1on1 |
| 1 | 1 | 2019 new graduate engineer |
| 1 | 2 | 3dsensor |
| 1 | 3 | access |
| 1 | 4 | airccar |
| 1 | 5 | aizu |
| 1 | 6 | akerun |
| 1 | 7 | alh |
| 1 | 8 | alibabacloud |
| 2 | 9 | amazoneks |
| 2 | 10 | amazoneks2 |
| 2 | 11 | android |
| 2 | 12 | android2 |
| 2 | 13 | android for beginners |
| 2 | 14 | angular |
| 2 | 15 | angular2 |
| 2 | 16 | ansible |
| 2 | 17 | ansible2 |
| 3 | 18 | appsscript |
| 3 | 19 | arduino |
| 3 | 20 | asoview |
| 3 | 21 | aws |
| 3 | 22 | awsamplify |
| 3 | 23 | aws lambda and serverless1 |
| 3 | 24 | aws beginner |
| 3 | 25 | azure |
| 3 | 26 | bitrise |
| 4 | 27 | blockchain |
| 4 | 28 | bosyu |
| 4 | 29 | brainpad |
| 4 | 30 | c |
| 4 | 31 | cakephp |
| 4 | 32 | calendargmo ad marketing |
| 4 | 33 | camphor |
| 4 | 34 | cbcloud |
| 4 | 35 | circleci |
| 5 | 36 | classi |
| 5 | 37 | clojure |
| 5 | 38 | codebaseokinawa |
| 5 | 39 | conoha |
| 5 | 40 | css |
| 5 | 41 | cyberagent20 new graduate |
| 5 | 42 | cyberagentdevelopers |
| 5 | 43 | dart |
| 5 | 44 | datadog |
| 6 | 45 | dena |
| 6 | 46 | advent calendar by dena20 graduate candidate engineer dena20 new graduate |
| 6 | 47 | dena20 new graduate |
| 6 | 48 | deno |
| 6 | 49 | discord |
| 6 | 50 | diverse |
| 6 | 51 | django |
| 6 | 52 | dmm group |
| 6 | 53 | dotfiles |
| 7 | 54 | dsl |
| 7 | 55 | dtp |
| 7 | 56 | eccube |
| 7 | 57 | elasticstack |
| 7 | 58 | elixir |
| 7 | 59 | elm |
| 7 | 60 | elm2 |
| 7 | 61 | emacs |
| 7 | 62 | enebular |
| 8 | 63 | engineeringmanager |
| 8 | 64 | Talk about ethercat |
| 8 | 65 | filemaker |
| 8 | 66 | firebase |
| 8 | 67 | flutter |
| 8 | 68 | flutter2 |
| 8 | 69 | fork |
| 8 | 70 | foss4g |
| 8 | 71 | People involved in freee data |
| 9 | 72 | fun |
| 9 | 73 | fusic |
| 9 | 74 | fusic part 2 |
| 9 | 75 | git |
| 9 | 76 | globis |
| 9 | 77 | gmo pepabo |
| 9 | 78 | go |
| 9 | 79 | go3 |
| 9 | 80 | go4 |
| 10 | 81 | go5 |
| 10 | 82 | go6 |
| 10 | 83 | go7 |
| 10 | 84 | goodpatch |
| 10 | 85 | hamee |
| 10 | 86 | haskell |
| 10 | 87 | heroku |
| 10 | 88 | houdiniapprentice |
| 10 | 89 | hrtech |
| 11 | 90 | ios2 |
| 11 | 91 | iotlt |
| 11 | 92 | iplug |
| 11 | 93 | ipv6 |
| 11 | 94 | The second piece of iq1 |
| 11 | 95 | iridge |
| 11 | 96 | jamstack |
| 11 | 97 | java |
| 11 | 98 | javascript |
| 12 | 99 | javascript2 |
| 12 | 100 | kaggle |
| 12 | 101 | kayac |
| 12 | 102 | kintone |
| 12 | 103 | kintone2 |
| 12 | 104 | klab |
| 12 | 105 | klabengineer |
| 12 | 106 | kubernetes |
| 12 | 107 | kubernetes2 |
| 13 | 108 | kubernetes3 |
| 13 | 109 | kyash |
| 13 | 110 | kyotouniversity |
| 13 | 111 | laravel |
| 13 | 112 | laravel2 |
| 13 | 113 | libreoffice |
| 13 | 114 | lifull |
| 13 | 115 | lifull part 3 |
| 13 | 116 | makeit |
| 14 | 117 | maya |
| 14 | 118 | microad |
| 14 | 119 | microsoftazuretech |
| 14 | 120 | microsoftpowerbi |
| 14 | 121 | misoca yayoi |
| 14 | 122 | mohikanz |
| 14 | 123 | mysql |
| 14 | 124 | ncc |
| 14 | 125 | nem |
| 15 | 126 | nervesjp |
| 15 | 127 | nestjs |
| 15 | 128 | newspicks |
| 15 | 129 | nijibox |
| 15 | 130 | nodered |
| 15 | 131 | northdetail |
| 15 | 132 | ntt communications |
| 15 | 133 | ntt techno cross |
| 15 | 134 | n high school |
| 16 | 135 | obniz |
| 16 | 136 | office365 |
| 16 | 137 | oicitcreateclub |
| 16 | 138 | openandreproduciblescience |
| 16 | 139 | opensaasstudio |
| 16 | 140 | opttechnologies |
| 16 | 141 | oraclecloudinfrastructure |
| 16 | 142 | othlotech |
| 16 | 143 | pandoc |
| 17 | 144 | pathee |
| 17 | 145 | perl |
| 17 | 146 | php |
| 17 | 147 | plaid |
| 17 | 148 | ponos |
| 17 | 149 | pwa |
| 17 | 150 | pyladiesjapan |
| 17 | 151 | python |
| 17 | 152 | python part 3 |
| 18 | 153 | qiitagithubactions |
| 18 | 154 | qt |
| 18 | 155 | qualiarts |
| 18 | 156 | r |
| 18 | 157 | react |
| 18 | 158 | react2 |
| 18 | 159 | reactnative |
| 18 | 160 | retty |
| 18 | 161 | rpa |
| 19 | 162 | ruby |
| 19 | 163 | runteq |
| 19 | 164 | rust |
| 19 | 165 | rust part 2 |
| 19 | 166 | rust part 3 |
| 19 | 167 | salesforceplatform |
| 19 | 168 | sansan |
| 19 | 169 | sap |
| 19 | 170 | satysfi |
| 20 | 171 | sbai |
| 20 | 172 | scala |
| 20 | 173 | sensy |
| 20 | 174 | sfc |
| 20 | 175 | sfcrg |
| 20 | 176 | siv3d |
| 20 | 177 | slack |
| 20 | 178 | smarthr |
| 20 | 179 | snowrobin |
| 21 | 180 | soracom |
| 21 | 181 | speee |
| 21 | 182 | splunk |
| 21 | 183 | sra |
| 21 | 184 | sre |
| 21 | 185 | studioztech |
| 21 | 186 | swift |
| 21 | 187 | terraform |
| 21 | 188 | tjbot |
| 22 | 189 | tokyocityuniversity |
| 22 | 190 | tomowarkar alone |
| 22 | 191 | typescript |
| 22 | 192 | unity |
| 22 | 193 | unity2 |
| 22 | 194 | unity3 |
| 22 | 195 | valu |
| 22 | 196 | vexperts |
| 22 | 197 | vim |
| 23 | 198 | vim2 |
| 23 | 199 | visualstudiocode |
| 23 | 200 | vrchat |
| 23 | 201 | vtubertech1 |
| 23 | 202 | vue2 |
| 23 | 203 | wanogroup |
| 23 | 204 | wano group |
| 23 | 205 | webgl |
| 23 | 206 | workflow |
| 24 | 207 | xamarin |
| 24 | 208 | yamap engineer |
| 24 | 209 | zeals |
| 24 | 210 | zlab |
| 24 | 211 | zozo technologies |
| 24 | 212 | zozo Technologies 1 |
| 24 | 213 | zozo Technologies 2 |
| 24 | 214 | zozo Technologies 3 |
| 24 | 215 | zozo Technologies 4 |
| 25 | 216 | zozo Technologies 5 |
| 25 | 217 | Uluru |
| 25 | 218 | Kufu Company |
| 25 | 219 | Sakura Internet |
| 25 | 220 | Just a group |
| 25 | 221 | Anything for the time being |
| 25 | 222 | Engineer who wants to spread something |
| 25 | 223 | Looking back |
| 25 | 224 | Puri Puri Appliance |
| 26 | 225 | Iridge |
| 26 | 226 | Aso View |
| 26 | 227 | Inception deck |
| 26 | 228 | Willgate |
| 26 | 229 | Web crew |
| 26 | 230 | M3 Career |
| 26 | 231 | AP Communications |
| 26 | 232 | Keyboard 1 |
| 26 | 233 | Giftee |
| 27 | 234 | Fucking app |
| 27 | 235 | Fucking app 2 |
| 27 | 236 | Crowdworks |
| 27 | 237 | Shader advent calendar |
| 27 | 238 | Ciscosystemsjapan by Cisco volunteers |
| 27 | 239 | Japan system |
| 27 | 240 | G's Academy |
| 27 | 241 | Smart speaker |
| 27 | 242 | Software testing |
| 28 | 243 | Software test tips |
| 28 | 244 | Dip |
| 28 | 245 | About Data Science by Datamix Community |
| 28 | 246 | Data structures and algorithms |
| 28 | 247 | Toreta |
| 28 | 248 | Domain Driven Design 1 |
| 28 | 249 | Dwango |
| 28 | 250 | Nifty Group |
| 28 | 251 | Non-Pro Lab |
| 29 | 252 | Hands Lab |
| 29 | 253 | Fenrir Design and Technology |
| 29 | 254 | Photocreate |
| 29 | 255 | Future |
| 29 | 256 | Future 2 |
| 29 | 257 | Fuller |
| 29 | 258 | Mynavi |
| 29 | 259 | Mixi 20 new graduate |
| 29 | 260 | Mixi group |
| 30 | 261 | Motivation cloud series |
| 30 | 262 | Your Meister |
| 30 | 263 | Unique Vision Co., Ltd. |
| 30 | 264 | Lux |
| 30 | 265 | Ray tracing |
| 30 | 266 | Personal development |
| 30 | 267 | thousand |
| 30 | 268 | Kure National College of Technology |
| 30 | 269 | Shinagawa |
| 31 | 270 | Muroran Institute of Technology Data Science Laboratory dsl |
| 31 | 271 | Miyazaki it related study session |
| 31 | 272 | Fujitsu Cloud Technologies |
| 31 | 273 | Attorney dot com |
| 31 | 274 | Access Co., Ltd. |
| 31 | 275 | Amazonai by Knowledgecom operated by Knowledge Communication Co., Ltd. |
| 31 | 276 | How did you learn machine learning by Nikkei xtech Business ai② |
| 31 | 277 | Delve into machine learning tools by Nikkei xtech Business ai③ |
| 31 | 278 | Spring source club |
| 32 | 279 | Fukuoka young sierbc |
| 32 | 280 | Second Dwango |
| 32 | 281 | Self-made os |
| 32 | 282 | Natural language processing |
| 32 | 283 | Natural language processing 2 |
| 32 | 284 | Ibaraki University |
| 32 | 285 | Certification Authorization technology |
| 32 | 286 | Kinki University |
| 32 | 287 | Suzuka National College of Technology |
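
The index follows a simple rule: nine calendars per image, laid out in a 3x3 grid from left to right and top to bottom. As a small sketch of that arithmetic (the function name `locate_calendar` and its parameters are hypothetical), the image number and grid position can be computed from the calendar number:

```python
def locate_calendar(calendar_no, per_image=9, ncols=3):
    """Return (image_no, row, col) for a calendar number from the index table.

    image_no is 1-based like the "Image No" column; row and col are 0-based
    positions in the 3x3 grid, filled left to right, top to bottom.
    """
    image_index, position = divmod(calendar_no, per_image)
    row, col = divmod(position, ncols)
    return image_index + 1, row, col

print(locate_calendar(151))  # the "python" calendar
```

Note that the plotting code saves files as `'{}_wordcloud.png'.format(pp)` with `pp` starting at 0, so the file for a given image would be numbered one lower than the "Image No" column.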