Summary

-- I built a network that connects VTuber channels.
-- The weight of each edge is how much the viewers of one channel overlap with those of another channel.
-- The offices covered are Nijisanji, hololive, 774 inc., upd8, Nori Pro, and independent VTubers. (The independents were picked 100% according to my own tastes ...)
-- This time I only visualized the network; I have not analyzed it. There is still a lot I want to do, so I will get to it once my Power Pro phase calms down.
-- July 12 is the 3D debut of hololive's Tsunomaki Watame. Let's watch it.

Keeping only the edges between streamers whose weight is in the top 10% gives the network below. It shows that viewers overlap most among streamers who belong to the same office, such as Nijisanji or hololive.

graph_author_union_(90, 0).png

Motivation

Currently, the VTuber industry has many offices, such as Nijisanji, hololive, 774 inc., Nori Pro, and upd8, and each office has many streamers. Of course, there are also independent streamers who do not belong to any office. A streamer posts a video of about an hour roughly once every one to several days. Adding up the video time per office, the total length of the videos posted in a single day can easily exceed 24 hours. It is therefore practically impossible to watch every video from multiple offices. Below is my subscriptions feed on one particular day; watching all of it is hard.

ある日の登録チャンネル欄.png

Many people are probably in the same situation, so each person picks videos and channels according to their own tastes. That made me think it would be fun to visualize which channels tend to be watched by the same people.

Method

Many VTubers stream live, and viewers can post comments during the stream. For example, if you play a video (https://www.youtube.com/watch?v=Ypc_xKz--fY) by hololive's Luna Himemori (https://www.youtube.com/channel/UCa9Y57gfeY0Zro_noHRVrnw), you can see the comments that were posted during the live stream next to it, as shown below.

YouTube画面.png

This comment log contains the comment time, the viewer name, Super Chat information, and more. This time I use the viewer names to evaluate the relationships between channels. Let $U_i$ be the set of viewers who commented on channel $i$, and define the viewer overlap $w_{ij}$ between channels $i$ and $j$ as the share of viewers the two channels have in common:

$$ w_{ij} := \frac{|U_i \cap U_j|}{|U_i \cup U_j|} $$

This is used as an edge weight when visualizing as a network.
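The ratio defined above is the Jaccard index of the two viewer sets. As a small worked example, here is a minimal sketch with made-up viewer names; the function name `jaccard` and the choice to return 0.0 for two empty sets are my own assumptions, not part of the original code.

```python
def jaccard(a: set, b: set) -> float:
    """Return |a & b| / |a | b|, or 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical commenters on two channels (made-up names).
U_i = {"alice", "bob", "carol"}
U_j = {"bob", "carol", "dave", "erin"}

# 2 shared viewers out of 5 distinct viewers.
w_ij = jaccard(U_i, U_j)
print(w_ij)
```

The weight is symmetric, so the resulting network is naturally undirected, and it ranges from 0 (no shared commenters) to 1 (identical commenter sets).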

Calculation and visualization of interchannel similarity

The data covers January 1, 2020 to June 30, 2020. Note that I have not verified that the comments were retrieved correctly for every video that has comments.

Get comment data

There is a data acquisition method in the following article, so I will use it almost as it is.

https://qiita.com/tf0101/items/efb2484a0b5b1cdc8291

As an example, I save each video in the following format.

AuthorName	BaseDate	ChannelId	Timestamp	VideoId	VideoLength
Moss Max	2020-05-07	UC--A2dwZW7-M2kID0N6_lfA	2020-05-07 19:53:23	-Alnw7B1GBo	2953
Chocolate cornet	2020-05-07	UC--A2dwZW7-M2kID0N6_lfA	2020-05-07 19:54:58	-Alnw7B1GBo	2953
Black dog	2020-05-07	UC--A2dwZW7-M2kID0N6_lfA	2020-05-07 19:55:08	-Alnw7B1GBo	2953
Oguna	2020-05-07	UC--A2dwZW7-M2kID0N6_lfA	2020-05-07 19:55:56	-Alnw7B1GBo	2953
High tension friday	2020-05-07	UC--A2dwZW7-M2kID0N6_lfA	2020-05-07 19:56:05	-Alnw7B1GBo	2953

Only VideoLength is a rough value, but I don't use it this time.
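To make the layout concrete, the sketch below loads two of the sample rows above as tab-separated text and extracts the set of commenters for that video. This assumes a TSV representation purely for illustration; the article itself stores each video as a pickled DataFrame.

```python
import io

import pandas as pd

# Two rows copied from the sample above, tab-separated.
sample = (
    "AuthorName\tBaseDate\tChannelId\tTimestamp\tVideoId\tVideoLength\n"
    "Moss Max\t2020-05-07\tUC--A2dwZW7-M2kID0N6_lfA\t2020-05-07 19:53:23\t-Alnw7B1GBo\t2953\n"
    "Black dog\t2020-05-07\tUC--A2dwZW7-M2kID0N6_lfA\t2020-05-07 19:55:08\t-Alnw7B1GBo\t2953\n"
)

comments = pd.read_csv(
    io.StringIO(sample),
    sep="\t",
    parse_dates=["BaseDate", "Timestamp"],
)

# Distinct commenters on this one video.
viewers = set(comments["AuthorName"])
print(viewers)
```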

Creating a dataset

First, create a list of the viewers who commented on each channel during the data period. This can be done by merging all the comment lists obtained above. The code below also counts the number of comments; that is there for the convenience of another project and is not really relevant here.

import pandas as pd
# comment_paths: list of paths to the pickled per-video comment DataFrames
df = pd.concat([pd.read_pickle(path) for path in comment_paths])
counts = df.groupby(['AuthorName', 'ChannelId', 'VideoId', 'BaseDate']).size().to_frame('Count').reset_index()

The format is as follows.

AuthorName	ChannelId	VideoId	BaseDate	Count
chro nicle	UCwrjITPwG4q71HzihV2C7Nw	H7wgvBbxo1U	2020-06-30T00:00:00	1
Fazias	UChAnqc_AY5_I3Px5dig3X1Q	Q7DS6uaInMA	2020-06-30T00:00:00	26
Dream eating	UCuvk5PilcvDECU7dDZhQiEw	6uiQOEDmD6U	2020-06-30T00:00:00	91
Fatin Thifal	UCOmjciHZ8Au3iKMElKXCF_g	ZrFJpafDKVw	2020-06-30T00:00:00	3
Snail state of the futon	UC6oDys1BGgBsIC3WhG1BovQ	QHTLzahEiX4	2020-06-30T00:00:00	1
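The aggregation can be tried on a tiny made-up comment log. The sketch below repeats the same `groupby(...).size()` step and then derives the per-channel viewer sets; the variable `viewers_per_channel` is a name I introduce here for illustration.

```python
import pandas as pd

# Made-up comment log: one row per comment.
df = pd.DataFrame({
    "AuthorName": ["alice", "alice", "bob", "alice"],
    "ChannelId": ["ch1", "ch1", "ch1", "ch2"],
    "VideoId": ["v1", "v1", "v1", "v2"],
    "BaseDate": ["2020-06-30"] * 4,
})

# Number of comments per (viewer, channel, video, date).
counts = (
    df.groupby(["AuthorName", "ChannelId", "VideoId", "BaseDate"])
    .size()
    .to_frame("Count")
    .reset_index()
)

# The overlap only needs the distinct viewers of each channel.
viewers_per_channel = counts.groupby("ChannelId")["AuthorName"].apply(set).to_dict()
print(viewers_per_channel)
```

The `Count` column is not used for the overlap; only whether a viewer commented on a channel at all matters.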

Adjacency matrix

As mentioned earlier, this time I compute the viewer overlap between channels. This is easy to obtain by building a set of users for each channel and applying set operations.

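def corr_by_author_set_union(counts, channels):
    """Return the channel-by-channel matrix of |U_i & U_j| / |U_i | U_j|."""
    corr = pd.DataFrame().assign(Channel=channels).set_index('Channel')
    # One row per distinct (channel, viewer) pair
    tmp = counts.loc[:, ['ChannelId', 'AuthorName']].drop_duplicates()
    channelId_to_set = {ch: set(tmp[tmp.ChannelId == ch].AuthorName) for ch in channels}
    for ch1 in channels:
        # Column ch1: overlap of ch1 with every channel.
        # Assumes every channel has at least one commenter; otherwise the union is empty.
        corr[ch1] = [len(channelId_to_set[ch1] & channelId_to_set[ch2]) /
                     len(channelId_to_set[ch1] | channelId_to_set[ch2]) for ch2 in channels]
    return corr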
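The same matrix can also be obtained without the double loop. The following is a sketch of an alternative, not the code used for this article: it builds a 0/1 channel-by-viewer incidence matrix, gets every intersection size with one matrix product, and derives the union sizes from |U_i| + |U_j| - |U_i ∩ U_j|. The function name `jaccard_matrix` and the toy data are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def jaccard_matrix(counts: pd.DataFrame, channels: list) -> pd.DataFrame:
    """Pairwise Jaccard index between channels via a 0/1 incidence matrix."""
    pairs = counts.loc[
        counts["ChannelId"].isin(channels), ["ChannelId", "AuthorName"]
    ].drop_duplicates()
    # Rows: channels, columns: viewers, entry 1 if the viewer commented there.
    incidence = (
        pd.crosstab(pairs["ChannelId"], pairs["AuthorName"])
        .reindex(channels, fill_value=0)
        .to_numpy(dtype=float)
    )
    inter = incidence @ incidence.T                   # |U_i & U_j|
    sizes = incidence.sum(axis=1)                     # |U_i|
    union = sizes[:, None] + sizes[None, :] - inter   # |U_i | U_j|
    with np.errstate(divide="ignore", invalid="ignore"):
        jac = np.where(union > 0, inter / union, 0.0)
    return pd.DataFrame(jac, index=channels, columns=channels)


# Made-up data: ch1 has viewers a, b, c and ch2 has viewers b, c, d.
toy_counts = pd.DataFrame({
    "AuthorName": ["a", "b", "c", "b", "c", "d"],
    "ChannelId": ["ch1", "ch1", "ch1", "ch2", "ch2", "ch2"],
})
toy_corr = jaccard_matrix(toy_counts, ["ch1", "ch2"])
print(toy_corr)
```

The dense incidence matrix grows with channels times viewers, so for a large comment log a sparse matrix or the set-based loop may be the more practical choice.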

Graph depiction

Now, let's draw the graph. The code is almost the same as on the following pages.

-- I tried to visualize the national surname network: https://datumstudio.jp/blog/networkx
-- https://stackoverflow.com/questions/55750436/group-nodes-together-in-networkx

import itertools
import networkx as nx

def create_graph(df, threshold=0.5, is_directed=True):
    """Build a graph from a square similarity matrix, keeping edges with |weight| >= threshold."""
    assert set(df.index) == set(df.columns)

    # Create a graph
    if is_directed:
        graph = nx.DiGraph()
    else:
        graph = nx.Graph()

    # Add nodes
    for col in df.columns:
        if not graph.has_node(col):
            graph.add_node(col)

    # Add edges (each unordered pair is visited once)
    for a, b in itertools.combinations(df.columns, 2):
        if a == b or graph.has_edge(a, b):
            continue
        val = df.loc[a, b]
        if abs(val) < threshold:
            continue
        graph.add_edge(a, b, weight=val)

    return graph

import matplotlib.pyplot as plt
import numpy as np

def draw_char_graph(G, fname, edge_cmap=plt.cm.Greys, figsize=(16, 8)):
    # channel_to_color: global dict mapping node name -> color (one color per office)
    plt.figure(figsize=figsize)
    weights = [G[u][v]['weight'] for u, v in G.edges()]
    pos = nx.spring_layout(G, k=16)

    # Shift each node toward an anchor point on a circle, one anchor per color,
    # so that channels of the same office are drawn close together
    nodes = pos.keys()
    colors = list(set([channel_to_color[n] for n in nodes]))
    color_to_id = {colors[i]: i for i in range(len(colors))}
    angs = np.linspace(0, 2*np.pi, 1+len(colors))
    repos = []
    rad = 3.5
    for ea in angs:
        repos.append(np.array([rad*np.cos(ea), rad*np.sin(ea)]))
    for ea in pos.keys():
        posx = color_to_id[channel_to_color[ea]]
        pos[ea] += repos[posx]

    nx.draw(G,
            pos, 
            node_color=[channel_to_color[n] for n in G.nodes()],
            edge_cmap=edge_cmap,
            edge_vmin=-3e4,
            width=weights,
            with_labels=True,
            font_family='Yu Gothic',
            font_size=8,
            font_color='green')
    plt.savefig(fname, dpi=128)
    plt.show()
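The edge filtering can be checked on a toy matrix before running it on real data. The sketch below is an alternative construction, not the function above: it uses `nx.from_pandas_adjacency` to build an undirected graph directly from a made-up similarity matrix, then removes the self-loops that come from the diagonal and the edges below an arbitrary threshold of 0.25.

```python
import networkx as nx
import pandas as pd

# Made-up symmetric similarity matrix for three channels.
names = ["A", "B", "C"]
sim = pd.DataFrame(
    [[1.0, 0.6, 0.1],
     [0.6, 1.0, 0.3],
     [0.1, 0.3, 1.0]],
    index=names,
    columns=names,
)

# Undirected graph; each non-zero cell becomes an edge with a 'weight' attribute.
G = nx.from_pandas_adjacency(sim)

# The diagonal produces self-loops, which are not wanted here.
G.remove_edges_from(list(nx.selfloop_edges(G)))

# Drop edges whose weight is below the threshold.
threshold = 0.25
weak = [(u, v) for u, v, w in G.edges(data="weight") if abs(w) < threshold]
G.remove_edges_from(weak)

print(sorted(tuple(sorted(e)) for e in G.edges()))
```

All three nodes stay in the graph even when they lose their edges, which matches the behaviour of adding every column as a node first.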

Create and draw a graph using these.

Whole network

Thicker lines correspond to a higher share of viewers in common.

union_corr = corr_by_author_set_union(counts, channels)
# ChannelId is hard to read, so rename it to ChannelName
union_corr = rename_ChannelId_to_ChannelName(union_corr)
graph = create_graph(union_corr, threshold=0, is_directed=False)
draw_char_graph(graph, 'fig/graph_author_union.png', figsize=(16, 16))

-- Overall, the edges point toward Nijisanji: a high share of viewers watch both Nijisanji and another office.
-- Shigure Ui and Tamaki Inuyama have strong edges not only toward Nijisanji but also toward hololive.

Whole network (only the top 10% of edge weights)

The previous graph shows too many edges, so I reduce them. The 10% cutoff was chosen by feel. Since only the top 10% of edges are plotted, a line drawn here can be read as a very high viewer overlap between the two channels.

# (Edge th, betweenness_centrality)
pairs = [(90, 0)]
df = union_corr.copy()
for pair in pairs:
    th = np.percentile(df.fillna(0).values.ravel(), pair[0])
    print(pair, th)
    graph = create_graph(df, threshold=th, is_directed=False)
    draw_char_graph(graph , 'fig/graph_author_union_{}.png'.format(pair), figsize=(16, 16))
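One detail of the threshold is worth seeing on a small example. The loop above takes the percentile over every cell of the matrix, which includes the diagonal (always 1) and both mirrored copies of each pair. The sketch below compares that with a percentile over the distinct pairs only; this variant is my own suggestion, not what was used for the figures, and the matrix is made up.

```python
import numpy as np

# Made-up symmetric similarity matrix with a diagonal of ones.
sim = np.array([
    [1.0, 0.4, 0.1, 0.0],
    [0.4, 1.0, 0.2, 0.3],
    [0.1, 0.2, 1.0, 0.5],
    [0.0, 0.3, 0.5, 1.0],
])

# Threshold as in the loop above: every cell, diagonal included.
th_all = np.percentile(sim.ravel(), 90)

# Alternative: only the distinct pairs (strict upper triangle).
upper = np.triu_indices_from(sim, k=1)
th_pairs = np.percentile(sim[upper], 90)

print(th_all, th_pairs)
```

With only four channels the diagonal makes up a quarter of the cells and pushes the threshold up noticeably. With many channels the diagonal is a small fraction of the matrix, so the difference should be much smaller.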

-- Viewer overlap is high within the same office.
-- Most of Tamaki Inuyama's edges go to hololive; the connection to hololive is stronger than the one to Nijisanji.
-- The same goes for Shigure Ui.

graph_author_union_(90, 0).png

Office network

Here too, only the top 10% of edges are plotted. As a personal impression, when the connections inside an office look weak, the following explanations are possible.

-- The whole office is connected, but the viewers overlap only weakly.
-- The connections to channels outside the office are strong, so the connections look weak when only the inside of the office is visualized.

Nijisanji network

-- The lines overlap too much to make anything out.

graph_author_union_btw_Nijisanji Japan_and_Nijisanji Japan.png

Bottom 3 and top 3 channels by the weights of the edges connected to each node

-- The first three rows are the bottom 3 channels by mean weight of their connected edges.
-- The last three rows are the top 3 channels.

index	Mean	kind
Azuchi peach	0.02581	Nijisanji Japan
♥ ️♠️ Alice Mononobe ♦ ️♣️	0.03546	Nijisanji Japan
Gilzaren III Season 2	0.04463	Nijisanji Japan
Akina Saegusa/ Saegusa Akina	0.13969	Nijisanji Japan
Amamiya Kokoro/Kokoro Amamiya [Nijisanji affiliation]	0.14043	Nijisanji Japan
Gweru male girl/Gwelu Os Gar [Nijisanji]	0.14336	Nijisanji Japan
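The article does not show how these tables were produced, so the following is only a sketch of one way to build such a summary. It assumes the mean is taken over each channel's row of the similarity matrix with the diagonal excluded; the matrix, the names, and `n = 1` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up symmetric similarity matrix for four channels.
names = ["A", "B", "C", "D"]
sim = pd.DataFrame(
    [[1.0, 0.4, 0.1, 0.05],
     [0.4, 1.0, 0.2, 0.3],
     [0.1, 0.2, 1.0, 0.5],
     [0.05, 0.3, 0.5, 1.0]],
    index=names,
    columns=names,
)

# Mask the diagonal so a channel's overlap with itself is not averaged in.
off_diag = sim.mask(np.eye(len(sim), dtype=bool))
mean_weight = off_diag.mean(axis=1).sort_values().to_frame("Mean")

# Bottom n rows first, then top n rows, in the same order as the tables.
n = 1
summary = pd.concat([mean_weight.head(n), mean_weight.tail(n)])
print(summary)
```

If an office has fewer than 2n channels, `head` and `tail` return overlapping rows, so the same channels appear twice in the summary.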

Network in hololive

-- This matches my personal impression.
-- I feel that the triangles formed by overlapping viewer bases stand out.
-- Noefure

graph_author_union_btw_Hololive Japan_and_Hololive Japan.png

index	Mean	kind
Mel Channel Night sky Mel channel	0.1314	Hololive Japan
SoraCh.Tokino Sora Channel	0.1653	Hololive Japan
Nakiri Ayame Ch.Hyakuki Ayame	0.1859	Hololive Japan
Kanata Ch.Amane Kanata	0.2664	Hololive Japan
Watame Ch.For square winding	0.2684	Hololive Japan
Shion Ch.Shisaki Zion	0.2699	Hololive Japan

Network within holostars

-- This matches my personal impression.
-- Kaoru Tsukishita is good.

index	Mean	kind
Izuru Ch.Player Izuru	0.1544	Holostars
Kira Ch.Mirror Kira	0.1688	Holostars
Rikka ch.Ritsumei	0.1748	Holostars
astel ch.Astel	0.2173	Holostars
Shien Ch.Kageyama Cien	0.2178	Holostars
Temma Ch.Nobuo Kishi	0.2222	Holostars

Network in 774 inc.

-- Are the viewers split among Sugariri, Honeystrap, and AniMare?

graph_author_union_btw_774 inc._and_774 inc..png

index	Mean	kind
Patra Channel /Suo Patra [Honeystrap]	0.1307	774 inc.
Haneru Channel /Haneru Inaba [AniMare]	0.1335	774 inc.
CAMOMI Camomi Channel [Kamomi Camomi]	0.1369	774 inc.
Izumi Channel /Izumi Yuzuhara [AniMare]	0.1931	774 inc.
Anna Channel /Anna Torajo [Sugariri]	0.1949	774 inc.
Rene Channel /Ryugasaki Rin [Sugariri]	0.2055	774 inc.

Network in upd8

-- The lines are thin; the viewer bases do not overlap much.
-- The lines between the "babiniku ojisan" channels are thick.

index	Mean	kind
Engine Kazumi	0.03281	upd8
Yuuki Channel [Fucking sex education]	0.03323	upd8
Cheri High Homecoming Department	0.03345	upd8
Nora Cat Channel	0.04661	upd8
Tomari Mari channel /Tomari Mari Channel	0.04728	upd8
Tuna channel	0.04752	upd8

Nori Pro Network

-- With the top 10% no lines remain, so only here I draw the top 25% of edges.
-- Yuzuru Himesaki and Takuma Kumagai have not posted any videos yet.

index	Mean	kind
Norio Tsukudani [Tamaki Inuyama]	0.2353	Noripuro
Aimiya Milk Milk Enomiya	0.2453	Noripuro
Shirayuki Mishiro	0.2591	Noripuro
Norio Tsukudani [Tamaki Inuyama]	0.2353	Noripuro
Aimiya Milk Milk Enomiya	0.2453	Noripuro
Shirayuki Mishiro	0.2591	Noripuro

Personal network

-- I noticed after plotting that Musubime Yui and Sia Minase actually belong to an office. Their viewers also clearly overlap.

graph_author_union_btw_Other VTubers_and_Other VTubers.png

index	Mean	kind
Kobana	0.08867	Other VTubers
Kazenomiya Festival/ Matsuri Channel	0.09249	Other VTubers
Heavenly Hiyo	0.09265	Other VTubers
Makio [Individual]	0.10765	Other VTubers
Sia Minase [Sia Channel]	0.11933	Other VTubers
Musubime Yui 〖YouTube〗	0.12053	Other VTubers

Network with another office

-- Plotting every combination would produce too many images, so I only show Nijisanji versus hololive.
-- Subaru Ozora and Keisuke Maimoto are near the top by weight of connected edges; is that the influence of the Ozora family?

graph_author_union_btw_Hololive Japan_and_Nijisanji Japan.png

Among the hololive channels, the bottom 3 and top 3 by mean weight of the edges connected to Nijisanji

index	Mean	kind
Mel Channel Night sky Mel channel	0.02613	Hololive Japan
SoraCh.Tokino Sora Channel	0.03727	Hololive Japan
Towa Ch.Everlasting Towa	0.03840	Hololive Japan
Kanata Ch.Amane Kanata	0.05254	Hololive Japan
Marine Ch.Treasure bell marine	0.05539	Hololive Japan
Subaru Ch.Ozora Subaru	0.05971	Hololive Japan

Among the Nijisanji channels, the bottom 3 and top 3 by mean weight of the edges connected to hololive

index	Mean	kind
Azuchi peach	0.003585	Nijisanji Japan
Harusaki Air	0.009090	Nijisanji Japan
Gilzaren III Season 2	0.009123	Nijisanji Japan
[3rd grade 0 group] Mirei Gunmichi's classroom	0.085043	Nijisanji Japan
Keisuke Maimoto	0.087824	Nijisanji Japan
Lulu Suzuhara [Nijisanji affiliation]	0.096265	Nijisanji Japan
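A cross-office summary like the two tables above can be sketched by slicing the similarity matrix into a rectangular block, with rows from one office and columns from the other, and averaging each row. The helper `cross_office_mean`, the `kind` labels per channel, and all values below are hypothetical; the article does not show the code it used.

```python
import pandas as pd

# Made-up similarity matrix for two hololive and two Nijisanji channels.
names = ["H1", "H2", "N1", "N2"]
sim = pd.DataFrame(
    [[1.00, 0.30, 0.02, 0.04],
     [0.30, 1.00, 0.06, 0.10],
     [0.02, 0.06, 1.00, 0.20],
     [0.04, 0.10, 0.20, 1.00]],
    index=names,
    columns=names,
)

# Hypothetical office label for each channel.
kind = pd.Series({
    "H1": "Hololive Japan",
    "H2": "Hololive Japan",
    "N1": "Nijisanji Japan",
    "N2": "Nijisanji Japan",
})


def cross_office_mean(sim, kind, src, dst):
    """Mean overlap of each channel in office src with all channels in office dst."""
    rows = kind.index[kind == src]
    cols = kind.index[kind == dst]
    block = sim.loc[rows, cols]
    return block.mean(axis=1).sort_values().to_frame("Mean")


holo_to_niji = cross_office_mean(sim, kind, "Hololive Japan", "Nijisanji Japan")
niji_to_holo = cross_office_mean(sim, kind, "Nijisanji Japan", "Hololive Japan")
print(holo_to_niji)
print(niji_to_holo)
```

Because the block contains no diagonal cells, no masking is needed in this case.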

Impressions

-- The more streamers collaborate, the more their viewers overlap; that makes sense.
-- Core extraction and cluster analysis look like interesting next steps.

[PYTHON] I tried to visualize how much the viewers of VTuber channels overlap