Summary

- A brief summary of the cluster coefficient, one of the methods of network analysis.
- I applied the cluster coefficient to the network built last time, in which edges are drawn according to the proportion of viewers two channels have in common.
- Result
  - I calculated the cluster coefficients of Nijisanji and hololive with Zhang's method.
  - The top 5 channels by cluster coefficient in each office (Nijisanji first, then hololive) are as follows.

name	c_coeff	kind
Ibrahim [Nijisanji]	0.4308	Zhang
Himawari Honma- Himawari Honma -	0.4316	Zhang
Kanae Channel	0.4317	Zhang
Ars Almar-ars almal-[Nijisanji]	0.4338	Zhang
Kuzuha Channel	0.4424	Zhang

name	c_coeff	kind
Fubuki Ch. Shirakami Fubuki	0.6696	Zhang
Aqua Ch.Minato Aqua	0.6708	Zhang
Coco Ch.Kiryu Coco	0.6719	Zhang
Pekora Ch.Usada Pekora	0.6748	Zhang
Korone Ch.Inugami Korone	0.6758	Zhang

Cluster coefficient

What is the cluster coefficient?

The cluster coefficient is one way to quantify how central a node is to a cluster within a network. As the definition below shows, the underlying idea is that if a node belongs to a cluster, its neighbourhood should form a clique. The definition differs depending on whether the edge weights of the graph are taken into account. All definitions introduced below are for undirected graphs.

For undirected graphs without weights

Definition

The cluster coefficient $ C_i $ of node $ i $ is defined as follows. If the node has one or fewer adjacent nodes, $ C_i $ is set to 0.

\displaystyle{
\begin{aligned}
C_i &:= \frac{\sum_{j, k \in \Pi(i), j\neq k} a_{ij}a_{jk}a_{ki}}{k_i (k_i - 1)}\\
k_i &:= \sum_{j\in \Pi(i)} a_{ij}\\
a_{ij} &: \text{the } (i, j) \text{ component of the adjacency matrix } A\\
\Pi(i) &: \text{the set of nodes adjacent to node } i
\end{aligned}
}

Since we are dealing with an undirected graph without weights, the components of the adjacency matrix have only $ 1 $ or $ 0 $ as values.

Intuitively, a node is more central to a cluster when:

- the node and two nodes connected to it are tightly connected to one another
- in other words, the node and two of its neighbours often form a 3-clique
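The unweighted definition can be checked on a small graph. The following is a minimal sketch, not the code used for this article; the function name `cluster_coefficient` and the toy adjacency matrix are assumptions made up for illustration.

```python
from itertools import permutations


def cluster_coefficient(adj, i):
    """Unweighted cluster coefficient C_i for a 0/1 adjacency matrix `adj`."""
    neighbours = [j for j in range(len(adj)) if j != i and adj[i][j]]
    k_i = len(neighbours)
    if k_i <= 1:  # one or fewer adjacent nodes: defined as 0
        return 0.0
    # Count ordered pairs (j, k) of distinct neighbours that are adjacent.
    closed = sum(adj[j][k] for j, k in permutations(neighbours, 2))
    return closed / (k_i * (k_i - 1))


# Hypothetical toy graph: a triangle 0-1-2 plus node 3 attached only to node 0.
adj = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
coeffs = [cluster_coefficient(adj, i) for i in range(len(adj))]
print(coeffs)
```

Node 0 has three neighbours of which only one pair (1 and 2) is connected, so its coefficient is 1/3; nodes 1 and 2 sit in a 3-clique with both of their neighbours and get 1; node 3 has a single neighbour and gets 0.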

Concrete example

Omitted.

A quick web search will turn one up.

For weighted undirected graphs

There appear to be various definitions of the cluster coefficient for weighted undirected graphs. Some of them are introduced here. The weighted adjacency matrix is denoted by $ W $, and the adjacency matrix holding only connection information ($ 1 $ or $ 0 $) is denoted by $ A $. Also, let $ s_i := \sum_{j \in \Pi(i)} w_{ij} $.

Note that when every component of the weight matrix is 0 or 1 (that is, when it is an unweighted adjacency matrix), all of these definitions coincide with the unweighted cluster coefficient (the calculation is tedious, so see the reference links).

References

http://downloads.hindawi.com/journals/ddns/2008/375452.pdf
https://pdfs.semanticscholar.org/acb2/3e3c2264dbc900bac278c5b4477737741027.pdf

Definition

Zhang

\displaystyle{
\begin{aligned}
C_{i}^Z := \frac{\sum_{j, k \in \Pi(i), j\neq k} w_{ij}w_{jk}w_{ki}}{\left((\sum_{j\in\Pi(i)} w_{ij})^2 - \sum_{j\in\Pi(i)} w_{ij}^2\right) \max(w_{jk})}
\end{aligned}
}

- A straightforward extension of the cluster coefficient: the numerator simply replaces $ a $ with $ w $, and the denominator takes a similar form once rearranged.
- Useful when you want to take all of the weights of $ ij, jk, ki $ into account.
- The weight matrix $ w $ is normalized by $ \max(w_{jk}) $.

Lopez-Fernandez

\displaystyle{
\begin{aligned}
C_{i}^L := \frac{\sum_{j, k \in \Pi(i), j\neq k} w_{jk}}{k_i (k_i-1)}
\end{aligned}
}

- Useful when you want to focus only on the weights between the adjacent nodes $ j, k $.
- The weights of $ ij $ and $ ik $ are ignored.
- The weight matrix is not normalized.

Onnela

\displaystyle{
\begin{aligned}
C_{i}^O := \frac{\sum_{j, k \in \Pi(i), j\neq k} (w_{ij}w_{jk}w_{ki})^{1/3}}{k_i (k_i-1) \max(w_{jk})}
\end{aligned}
}

- Useful when you want to take all of the weights of $ ij, jk, ki $ into account.
- Because of the 1/3 power, individual weights have a weaker effect than in Zhang.
- The weight matrix is normalized by $ \max(w_{jk}) $.
- This is the definition implemented by the clustering function of NetworkX, a Python library.

Barrat

\displaystyle{
\begin{aligned}
C_{i}^B := \frac{\sum_{j, k \in \Pi(i), j\neq k} (w_{ij} + w_{ki})a_{jk}}{2s_i (k_i-1)}
\end{aligned}
}

- The weights enter as a sum, not as a product.
- The weights between $ jk $ are not considered.
- The opposite of Lopez-Fernandez: useful when you want to take the connection strengths of $ ij, ik $ into account.
- The weights are normalized by $ s_i $.

Serrano

\displaystyle{
\begin{aligned}
C_{i}^S &:= \frac{\sum_{j, k \in \Pi(i), j\neq k} w_{ij}a_{jk}w_{ki}}{s_i^2 (1-Y_i)}\\
Y_i &:= \sum_{j\in\Pi(i)} \left(\frac{w_{ij}}{s_i}\right)^2
\end{aligned}
}

- The weights between $ jk $ are not considered.
- Same spirit as Barrat.
- The weights are normalized by $ s_i $.
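The five definitions above can be compared side by side on a small matrix. The following is a minimal pure-Python sketch, assuming a symmetric weight matrix stored as a list of lists with a zero diagonal; the function name `weighted_coefficients` and both example matrices are hypothetical and are not taken from the article's data.

```python
from itertools import permutations

KINDS = ["Zhang", "Lopez_Fernandez", "Onnela", "Barrat", "Serrano"]


def weighted_coefficients(w, i):
    """Five weighted cluster coefficients of node i.

    `w` is a symmetric weight matrix (list of lists) with a zero diagonal.
    """
    nb = [j for j in range(len(w)) if j != i and w[i][j] > 0]  # Pi(i)
    k_i = len(nb)
    if k_i <= 1:  # fewer than two adjacent nodes: defined as 0
        return dict.fromkeys(KINDS, 0.0)
    s_i = sum(w[i][j] for j in nb)
    sq = sum(w[i][j] ** 2 for j in nb)
    max_w = max(max(row) for row in w)
    pairs = list(permutations(nb, 2))  # ordered pairs (j, k) with j != k

    def a(j, k):
        # Connection-only adjacency derived from the weights.
        return 1 if w[j][k] > 0 else 0

    return {
        "Zhang": sum(w[i][j] * w[j][k] * w[k][i] for j, k in pairs)
        / ((s_i ** 2 - sq) * max_w),
        "Lopez_Fernandez": sum(w[j][k] for j, k in pairs) / (k_i * (k_i - 1)),
        "Onnela": sum((w[i][j] * w[j][k] * w[k][i]) ** (1 / 3) for j, k in pairs)
        / (k_i * (k_i - 1) * max_w),
        "Barrat": sum((w[i][j] + w[k][i]) * a(j, k) for j, k in pairs)
        / (2 * s_i * (k_i - 1)),
        # s_i^2 * (1 - Y_i) equals s_i^2 minus the sum of squared weights.
        "Serrano": sum(w[i][j] * a(j, k) * w[k][i] for j, k in pairs)
        / (s_i ** 2 - sq),
    }


# Hypothetical weights on a fully connected 4-node graph.
w = [
    [0.0, 0.5, 0.4, 0.1],
    [0.5, 0.0, 0.6, 0.2],
    [0.4, 0.6, 0.0, 0.3],
    [0.1, 0.2, 0.3, 0.0],
]
for i in range(len(w)):
    print(i, weighted_coefficients(w, i))

# 0/1 weights: a triangle 0-1-2 plus node 3 attached only to node 0.
binary = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
print(weighted_coefficients(binary, 0))
```

With the 0/1 matrix, all five values for node 0 should come out as 1/3, the unweighted coefficient of that node, which illustrates the remark that the weighted definitions reduce to the unweighted one.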

Application to VTuber network

As motivation, I would like to find out whether the core streamers can be identified with these indicators, and whether the indicators work at all on the network defined last time.

The network defined last time

The edge between each pair of channels is weighted by the proportion of commenting viewers that the two channels have in common.

\displaystyle{
\begin{aligned}
w_{ij} &:= \frac{|U_i \cap U_j|}{|U_i \cup U_j|}\\
U_i &: \text{the set of users who commented on channel } i
\end{aligned}
}
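The edge weight is the Jaccard index of the two sets of commenters. A minimal sketch follows, assuming each channel's commenters are available as a Python set; the function name `jaccard_weight`, the channel names and the user IDs are hypothetical, not real data.

```python
def jaccard_weight(users_i, users_j):
    """Edge weight w_ij = |U_i & U_j| / |U_i | U_j| (0 if both sets are empty)."""
    union = users_i | users_j
    if not union:
        return 0.0
    return len(users_i & users_j) / len(union)


# Hypothetical commenter IDs per channel (not real data).
commenters = {
    "channel_a": {"u1", "u2", "u3", "u4"},
    "channel_b": {"u3", "u4", "u5"},
    "channel_c": {"u6"},
}
names = sorted(commenters)
weights = {
    (a, b): jaccard_weight(commenters[a], commenters[b])
    for a in names
    for b in names
    if a < b
}
print(weights)
```

Here `channel_a` and `channel_b` share 2 of 5 distinct users, giving a weight of 0.4, while `channel_c` shares no users with either and gets weight 0.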

Network features

- Almost fully connected (it is rare for two channels to share no users at all).
- Connections are basically strong within the same office and weak across offices.
  - The exceptions are those with strong ties to another office, such as Tamaki Inuyama and Shigure Ui.
- In a large office, the weights vary little within the office.

Data used, conditions, etc.

- Data: comments obtained from archived YouTube live streams
- Period: 2020/1/1 to 2020/6/30

The whole network looks like this (I added a few channels since last time). Line thickness corresponds to the proportion of viewers in common.

Calculation of cluster coefficient

Here the calculation is performed separately for the Nijisanji-only network and the hololive-only network. I do not calculate a cluster coefficient over a mix of offices because the number of nodes per cluster is imbalanced: nodes in a large cluster end up with large cluster coefficients. I do not (currently) know how to calculate an appropriate cluster coefficient for a network whose clusters have imbalanced numbers of nodes, so for now I narrow the scope of the calculation.

For each network and each definition, I calculate the cluster coefficients and list the bottom 5, the top 5, and summary statistics with percentiles, in that order. The code used is posted at the end of the article.
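One possible way to produce such listings from a `{name: coefficient}` dictionary is sketched below with the standard library only. The helper names `percentile` and `summarize`, the linear-interpolation rule for percentiles, and the example values are assumptions for illustration; they are not necessarily how the tables in this article were generated.

```python
import statistics


def percentile(sorted_values, q):
    """q-th quantile (0 <= q <= 1), interpolating linearly between ranks."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def summarize(coeffs, n=5):
    """Bottom n, top n and summary statistics of a {name: coefficient} dict."""
    ranked = sorted(coeffs.items(), key=lambda item: item[1])  # ascending
    values = [value for _, value in ranked]
    stats = {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation
        "min": values[0],
        "10%": percentile(values, 0.1),
        "50%": percentile(values, 0.5),
        "90%": percentile(values, 0.9),
        "max": values[-1],
    }
    return ranked[:n], ranked[-n:], stats


# Hypothetical coefficients (not values from this article).
example = {"ch_a": 0.39, "ch_b": 0.31, "ch_c": 0.44, "ch_d": 0.35, "ch_e": 0.41}
bottom, top, stats = summarize(example, n=2)
print(bottom)
print(top)
print(stats)
```

Both lists are in ascending order, matching the layout of the tables in this article, where the last row of the second table is the channel with the largest coefficient.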

Application to Nijisanji

The graph of Nijisanji looks like this. Both the number of nodes and the number of edges are large, so the graph is hard to read...

Nijisanji Japan.png

Zhang

name	c_coeff	kind
Gilzaren III Season 2	0.3171	Zhang
Azuchi peach	0.3174	Zhang
Rine- Rine Yaguruma -	0.3246	Zhang
Naruse Naru	0.3265	Zhang
Amemori Saya	0.3284	Zhang

name	c_coeff	kind
Ibrahim [Nijisanji]	0.4308	Zhang
Himawari Honma- Himawari Honma -	0.4316	Zhang
Kanae Channel	0.4317	Zhang
Ars Almar-ars almal-[Nijisanji]	0.4338	Zhang
Kuzuha Channel	0.4424	Zhang

count	mean	std	min	10%	50%	90%	max
98	0.3907	0.02992	0.3171	0.3458	0.3945	0.4229	0.4424

Lopez-Fernandez

name	c_coeff	kind
Gweru male girl/Gwelu Os Gar [Nijisanji]	0.1184	Lopez_Fernandez
Rion Takamiya	0.1184	Lopez_Fernandez
Watch at night/yorumi rena [Nijisanji affiliation]	0.1185	Lopez_Fernandez
Debidebi Debi	0.1185	Lopez_Fernandez
Akina Saegusa/ Saegusa Akina	0.1186	Lopez_Fernandez

name	c_coeff	kind
Amemori Saya	0.1207	Lopez_Fernandez
Rine- Rine Yaguruma -	0.1208	Lopez_Fernandez
Naruse Naru	0.1208	Lopez_Fernandez
Azuchi peach	0.1209	Lopez_Fernandez
Gilzaren III Season 2	0.1211	Lopez_Fernandez

count	mean	std	min	10%	50%	90%	max
98	0.1193	0.000656	0.1184	0.1186	0.1191	0.1203	0.1211

Onnela

name	c_coeff	kind
Gilzaren III Season 2	0.1460	Onnela
Azuchi peach	0.1708	Onnela
Naruse Naru	0.1752	Onnela
Rine- Rine Yaguruma -	0.1798	Onnela
Amemori Saya	0.1865	Onnela

name	c_coeff	kind
Ryushen channel	0.3967	Onnela
Debidebi Debi	0.3989	Onnela
Watch at night/yorumi rena [Nijisanji affiliation]	0.3998	Onnela
Rion Takamiya	0.4060	Onnela
Gweru male girl/Gwelu Os Gar [Nijisanji]	0.4093	Onnela

count	mean	std	min	10%	50%	90%	max
98	0.3295	0.06136	0.1460	0.2361	0.3478	0.3917	0.4093

Barrat

name	c_coeff	kind
Kou Uzuki	1.0000	Barrat
Mahiro Yukishiro/Yukishiro Mahiro [Nijisanji affiliation]	1.0000	Barrat
Haruka Onomachi ♨ Onomachi Haruka Nijisanji	1.0000	Barrat
Rine- Rine Yaguruma -	1.0000	Barrat
Fren E. Lustario	1.0000	Barrat

name	c_coeff	kind
Naruse Naru	1.000	Barrat
Ellie Conifer/Eli Conifer [Nijisanji]	1.000	Barrat
Rion Takamiya	1.000	Barrat
Watch at night/yorumi rena [Nijisanji affiliation]	1.000	Barrat
Aiba Uiha 〖Aiba Uiha〗 Nijisanji affiliation	1.000	Barrat

count	mean	std	min	10%	50%	90%	max
98	1.0000	0.000000	1.0000	1.0000	1.0000	1.000	1.000

Serrano

name	c_coeff	kind
Hanasaki Morinaka	1.0000	Serrano
Aiba Uiha 〖Aiba Uiha〗 Nijisanji affiliation	1.0000	Serrano
Amamiya Kokoro/Kokoro Amamiya [Nijisanji affiliation]	1.0000	Serrano
Haru Kaida/Kaida Haru [Nijisanji]	1.0000	Serrano
Gilzaren III Season 2	1.0000	Serrano

name	c_coeff	kind
Kou Uzuki	1.000	Serrano
Yoko Akabane	1.000	Serrano
Keisuke Maimoto	1.000	Serrano
Quarter moon Fujishiro/Genzuki Tojiro [Nijisanji]	1.000	Serrano
Kana Sukoya [Nijisanji] Kana Sukoya	1.000	Serrano

count	mean	std	min	10%	50%	90%	max
98	1.0000	0.000000	1.0000	1.0000	1.000	1.000	1.000

Application to hololive

The hololive network is shown below. It is easy to read because it has fewer nodes than Nijisanji. The Barrat and Serrano cluster coefficients are omitted here because, as with Nijisanji, they are all 1. The reason is described later.

Hololive Japan.png

Zhang

name	c_coeff	kind
Mel Channel Night sky Mel channel	0.6304	Zhang
hololive hololive- VTuber Group	0.6325	Zhang
SoraCh.Tokino Sora Channel	0.6331	Zhang
Akiroze Ch. Vtuber/Hololive affiliation	0.6366	Zhang
Choco Ch.Choco Heitsuki	0.6373	Zhang

name	c_coeff	kind
Fubuki Ch. Shirakami Fubuki	0.6696	Zhang
Aqua Ch.Minato Aqua	0.6708	Zhang
Coco Ch.Kiryu Coco	0.6719	Zhang
Pekora Ch.Usada Pekora	0.6748	Zhang
Korone Ch.Inugami Korone	0.6758	Zhang

count	mean	std	min	10%	50%	90%	max
28	0.6521	0.01336	0.6304	0.6355	0.6516	0.6711	0.6758

Lopez-Fernandez

name	c_coeff	kind
Shion Ch.Shisaki Zion	0.2024	Lopez_Fernandez
Watame Ch.For square winding	0.2025	Lopez_Fernandez
Kanata Ch.Amane Kanata	0.2025	Lopez_Fernandez
Mio Channel Ogami Mio	0.2028	Lopez_Fernandez
Flare Ch.Shiranui flare	0.2031	Lopez_Fernandez

name	c_coeff	kind
Choco Ch.Choco Heitsuki	0.2069	Lopez_Fernandez
hololive hololive- VTuber Group	0.2071	Lopez_Fernandez
Nakiri Ayame Ch.Hyakuki Ayame	0.2092	Lopez_Fernandez
SoraCh.Tokino Sora Channel	0.2108	Lopez_Fernandez
Mel Channel Night sky Mel channel	0.2133	Lopez_Fernandez

count	mean	std	min	10%	50%	90%	max
28	0.2052	0.002504	0.2024	0.2027	0.2049	0.2077	0.2133

Onnela

name	c_coeff	kind
Mel Channel Night sky Mel channel	0.3797	Onnela
SoraCh.Tokino Sora Channel	0.4607	Onnela
Nakiri Ayame Ch.Hyakuki Ayame	0.5097	Onnela
hololive hololive- VTuber Group	0.5653	Onnela
Choco Ch.Choco Heitsuki	0.5694	Onnela

name	c_coeff	kind
Flare Ch.Shiranui flare	0.6669	Onnela
Mio Channel Ogami Mio	0.6743	Onnela
Kanata Ch.Amane Kanata	0.6800	Onnela
Watame Ch.For square winding	0.6804	Onnela
Shion Ch.Shisaki Zion	0.6832	Onnela

count	mean	std	min	10%	50%	90%	max
28	0.6097	0.06734	0.3797	0.5486	0.6185	0.6760	0.6832

Impressions

- The bottom ranks (the top ranks in Lopez-Fernandez) contain many channels that stream infrequently, have not streamed for a long time, or have few live streams in the first place.
  When streams are far apart, fewer viewers overlap with other channels. Obvious.

- The top (bottom) ranks differ greatly depending on the method.
  Lopez_Fernandez depends only on the weights $ w_{jk} $ between the nodes $ j, k $ adjacent to node $ i $. In a fully connected network every other node is adjacent to $ i $, so $ C_i^L $ is the average weight over all edges that do not touch $ i $, and the nodes whose own weights $ w_{ij}, w_{ik} $ are small come out on top. Looking at hololive, for example, Yozora Mel ranks high because of a long hiatus, and the official channel ranks high because it has few live streams in the first place. Presumably the other high-ranking channels (Choco, Tokino Sora, etc.) form only weak clusters with the other nodes?

- The top (bottom) of Lopez_Fernandez resembles the bottom (top) of Onnela.
  The main difference between Zhang and Onnela is whether the cubic term in the weights is raised to the 1/3 power. Taking the 1/3 power shrinks the fluctuations of the weights. A calculation shows that the first-order contributions of the fluctuations are the same, while the second-order contributions are kept smaller (relative to Zhang) by the factor arising from the 1/3 power.

- The Barrat and Serrano cluster coefficients are all 1.
  In a fully connected network, both are 1 for every node by definition. Obvious.

- Which cluster coefficient is a good choice?
  Since this network is a weighted, fully connected network, Lopez_Fernandez assigns a large cluster coefficient even to a node that is isolated (in the sense that its weights are small). So Lopez_Fernandez seems unsuitable? The big difference between Zhang and Onnela is the presence or absence of the 1/3 power. Since the 1/3 power levels out the weights, the choice comes down to how sensitive the coefficient should be to fluctuations of the edge weights. This time the network is fully connected and the weights fluctuate little, so I want the coefficient to be sensitive to small fluctuations. So Zhang's definition looks good?

- Which channels are central to their cluster after all?
  Using Zhang's definition, the top 5 channels by cluster coefficient in Nijisanji and hololive are as follows. For hololive the result matches intuition, but I am still not familiar with Nijisanji, so I cannot tell whether this result is intuitive... Please let me know...

name	c_coeff	kind
Ibrahim [Nijisanji]	0.4308	Zhang
Himawari Honma- Himawari Honma -	0.4316	Zhang
Kanae Channel	0.4317	Zhang
Ars Almar-ars almal-[Nijisanji]	0.4338	Zhang
Kuzuha Channel	0.4424	Zhang

name	c_coeff	kind
Fubuki Ch. Shirakami Fubuki	0.6696	Zhang
Aqua Ch.Minato Aqua	0.6708	Zhang
Coco Ch.Kiryu Coco	0.6719	Zhang
Pekora Ch.Usada Pekora	0.6748	Zhang
Korone Ch.Inugami Korone	0.6758	Zhang
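The behaviour of Barrat, Serrano and Lopez-Fernandez on a fully connected network, noted in the impressions above, can be checked by hand. The following is a sketch under the assumption that every pair of distinct nodes is connected ($ a_{jk} = 1 $ for all $ j \neq k $, so $ \Pi(i) $ is every node except $ i $ and $ k_i = n - 1 $). The symbols $ n $ (the number of nodes) and $ S $ (the sum of $ w_{jk} $ over all ordered pairs of distinct nodes) are introduced here for illustration only.

\displaystyle{
\begin{aligned}
\sum_{j, k \in \Pi(i), j\neq k} (w_{ij} + w_{ki}) &= 2 (k_i - 1) \sum_{j\in\Pi(i)} w_{ij} = 2 s_i (k_i - 1) \quad\Rightarrow\quad C_i^B = 1\\
\sum_{j, k \in \Pi(i), j\neq k} w_{ij} w_{ki} &= s_i^2 - \sum_{j\in\Pi(i)} w_{ij}^2 = s_i^2 (1 - Y_i) \quad\Rightarrow\quad C_i^S = 1\\
\sum_{j, k \in \Pi(i), j\neq k} w_{jk} &= S - 2 s_i \quad\Rightarrow\quad C_i^L = \frac{S - 2 s_i}{(n - 1)(n - 2)}
\end{aligned}
}

In the first line each $ w_{ij} $ appears in $ k_i - 1 $ ordered pairs as the first term and in $ k_i - 1 $ as the second. In the third line the ordered pairs that contain $ i $ contribute $ 2 s_i $ to $ S $. Under this assumption $ C_i^L $ decreases linearly in $ s_i $, which is consistent with the most weakly tied channels ranking highest in Lopez-Fernandez.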

(Crappy) code

Here it is.

import itertools


class clustering_coefficient:
    """Weighted cluster coefficients of every node of an undirected graph.

    df: square, symmetric pandas DataFrame of edge weights w_ij whose index
    and columns hold the same node names (0 means "no edge").
    """

    def __init__(self, df):
        self.df = df.copy()
        self.names = list(self.df.columns)
        # Zero the diagonal so that terms with j == i or k == i vanish.
        for c in self.names:
            self.df.loc[c, c] = 0
        # k_i: number of adjacent nodes, s_i: sum of the adjacent edge weights.
        self.ki = {name: (self.df[name] > 0).sum() for name in self.names}
        self.si = {name: self.df[name].sum() for name in self.names}
        self.max_w = self.df.values.max()

    def _pairs(self):
        # All ordered pairs (j, k) of distinct node names.
        return itertools.permutations(self.names, 2)

    @staticmethod
    def _ratio(num, den):
        # Nodes with fewer than two adjacent nodes get 0, as in the definition.
        return num / den if den > 0 else 0.0

    def Zhang(self):
        w = self.df.loc
        return {i: self._ratio(
                    sum(w[j, k] * w[j, i] * w[i, k] for j, k in self._pairs()),
                    sum(w[j, i] * w[i, k] for j, k in self._pairs()) * self.max_w)
                for i in self.names}

    def Lopez_Fernandez(self):
        w = self.df.loc
        return {i: self._ratio(
                    sum(w[j, k] * (w[j, i] > 0) * (w[i, k] > 0) for j, k in self._pairs()),
                    self.ki[i] * (self.ki[i] - 1))
                for i in self.names}

    def Onnela(self):
        w = self.df.loc
        return {i: self._ratio(
                    sum((w[j, k] * w[j, i] * w[i, k]) ** (1 / 3) for j, k in self._pairs()),
                    self.ki[i] * (self.ki[i] - 1) * self.max_w)
                for i in self.names}

    def Barrat(self):
        w = self.df.loc
        return {i: self._ratio(
                    sum((w[j, i] + w[i, k]) * (w[i, j] > 0) * (w[i, k] > 0) * (w[j, k] > 0)
                        for j, k in self._pairs()),
                    2 * self.si[i] * (self.ki[i] - 1))
                for i in self.names}

    def Serrano(self):
        w = self.df.loc
        return {i: self._ratio(
                    sum((w[j, k] > 0) * w[j, i] * w[i, k] for j, k in self._pairs()),
                    self.si[i] ** 2 * (1 - ((self.df[i] / self.si[i]) ** 2).sum()))
                for i in self.names}

[PYTHON] Evaluation of cluster coefficient of VTuber channel
