- A brief summary of cluster coefficients, one of the methods of network analysis
- I applied the cluster coefficient to the network built last time, in which edges are drawn from the share of viewers that two channels have in common
- Result
  - The cluster coefficients of Nijisanji and hololive were calculated with Zhang's method
  - The top 5 channels by cluster coefficient in each agency are as follows.
name | c_coeff | kind |
---|---|---|
Ibrahim [Nijisanji] | 0.4308 | Zhang |
Himawari Honma- Himawari Honma - | 0.4316 | Zhang |
Kanae Channel | 0.4317 | Zhang |
Ars Almar-ars almal-[Nijisanji] | 0.4338 | Zhang |
Kuzuha Channel | 0.4424 | Zhang |
name | c_coeff | kind |
---|---|---|
Fubuki Ch. Shirakami Fubuki | 0.6696 | Zhang |
Aqua Ch.Minato Aqua | 0.6708 | Zhang |
Coco Ch.Kiryu Coco | 0.6719 | Zhang |
Pekora Ch.Usada Pekora | 0.6748 | Zhang |
Korone Ch.Inugami Korone | 0.6758 | Zhang |
The cluster coefficient is one quantitative measure of how central a node is to a cluster within the network. As the definition shows, it reflects the expectation that, if a node belongs to a cluster, its neighborhood should form a clique. The definition of the cluster coefficient differs depending on whether the edge weights of the graph are taken into account. All the definitions introduced below are for undirected graphs.
The cluster coefficient $ C_i $ of node $ i $ is defined below. If the node has one adjacent node or fewer, the coefficient is set to 0.
\displaystyle{
\begin{aligned}
C_i &:= \frac{\sum_{j, k \in \Pi(i), j\neq k} a_{ij}a_{jk}a_{ki}}{k_i (k_i - 1)}\\
k_i &:= \sum_{j\in \Pi(i)} a_{ij}\\
a_{ij} &: \text{the } (i, j) \text{ component of the adjacency matrix } A\\
\Pi(i) &: \text{the set of nodes adjacent to node } i
\end{aligned}
}
Since we are dealing with an undirected graph without weights, the components of the adjacency matrix have only $ 1 $ or $ 0 $ as values.
Intuitively, the more central a node is to a cluster:
- the more tightly the node and any two nodes connected to it are connected to each other
- (in other words) the more often the node and two nodes connected to it form a 3-clique (a triangle).
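The definition of $ C_i $ above can be sketched in a few lines of NumPy. This is only a minimal illustration, not the code used for the results in this article; the function name `unweighted_cluster_coefficient` and the small example graph (a triangle plus one pendant node) are assumptions made for the example.

```python
import numpy as np

def unweighted_cluster_coefficient(A):
    """C_i for every node of a symmetric 0/1 adjacency matrix (hypothetical helper)."""
    A = np.array(A, dtype=float)          # copy so the caller's matrix is untouched
    np.fill_diagonal(A, 0.0)              # ignore self-loops
    k = A.sum(axis=1)                     # k_i: number of adjacent nodes
    # diag(A^3)[i] = sum_{j,k} a_ij * a_jk * a_ki (j != k because the diagonal is 0)
    closed_triples = np.diag(A @ A @ A)
    C = np.zeros(len(A))
    ok = k > 1                            # nodes with k_i <= 1 are set to 0
    C[ok] = closed_triples[ok] / (k[ok] * (k[ok] - 1))
    return C

# Example graph: nodes 0, 1, 2 form a triangle, node 3 is attached to node 0 only.
A_example = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
C_example = unweighted_cluster_coefficient(A_example)
print(C_example)
```

Node 0 has three neighbors but only one of the three neighbor pairs is connected, so it gets 1/3; nodes 1 and 2 get 1; node 3 has a single neighbor and gets 0.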
Omitted
A quick web search will turn it up.
There seem to be various definitions of the cluster coefficient for weighted undirected graphs. Here are some of them. The weighted adjacency matrix is denoted by $ W $ (with components $ w_{ij} $).
Note that if every weight is 0 or 1 (that is, for an unweighted adjacency matrix), all of them coincide with the unweighted cluster coefficient (the calculation is tedious, so see the reference link).
reference
Zhang
\displaystyle{
\begin{aligned}
C_{i}^Z := \frac{\sum_{j, k \in \Pi(i), j\neq k} w_{ij}w_{jk}w_{ki}}{\left((\sum_{j\in\Pi(i)} w_{ij})^2 - \sum_{j\in\Pi(i)} w_{ij}^2\right) \max(w_{jk})}
\end{aligned}
}
- A simple extension of the cluster coefficient: the numerator simply replaces $ a $ in the cluster coefficient with $ w $, and the denominator has the same form once it is rearranged.
- Useful when you want to take all of the weights of $ ij, jk, ki $ into account
- The weight matrix $ w $ is normalized by $ \max(w_{jk}) $
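As a rough check of the formula above, here is a minimal NumPy sketch of Zhang's coefficient. The function name `zhang_cluster_coefficient` and both test matrices are assumptions for illustration. For a 0/1 matrix, $ (\sum_j a_{ij})^2 - \sum_j a_{ij}^2 = k_i^2 - k_i = k_i(k_i - 1) $, which is why the result should agree with the unweighted coefficient.

```python
import numpy as np

def zhang_cluster_coefficient(W):
    """Zhang's weighted cluster coefficient for a symmetric weight matrix (sketch)."""
    W = np.array(W, dtype=float)
    np.fill_diagonal(W, 0.0)
    numerator = np.diag(W @ W @ W)                 # sum_{j != k} w_ij * w_jk * w_ki
    s = W.sum(axis=1)                              # sum_j w_ij
    denominator = (s ** 2 - (W ** 2).sum(axis=1)) * W.max()
    C = np.zeros(len(W))
    ok = denominator > 0                           # e.g. nodes with fewer than 2 neighbors stay 0
    C[ok] = numerator[ok] / denominator[ok]
    return C

# 0/1 weights: triangle 0-1-2 plus a pendant node 3 attached to node 0.
A_example = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
zhang_unweighted = zhang_cluster_coefficient(A_example)

# A triangle whose three edges all have weight 0.5.
W_triangle = [
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.5],
    [0.5, 0.5, 0.0],
]
zhang_triangle = zhang_cluster_coefficient(W_triangle)
print(zhang_unweighted, zhang_triangle)
```

Because of the division by $ \max(w_{jk}) $, a triangle with uniform weights gets a coefficient of 1 regardless of the weight value.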
Lopez-Fernandez
\displaystyle{
\begin{aligned}
C_{i}^L := \frac{\sum_{j, k \in \Pi(i), j\neq k} w_{jk}}{k_i (k_i-1)}
\end{aligned}
}
- Useful when you want to focus only on the weight between the adjacent nodes $ j, k $
- Ignores the weights of $ ij $ and $ ik $
- The weight matrix is not normalized
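The following is a minimal sketch of the Lopez-Fernandez definition, under the assumption that an edge exists wherever the weight is positive. The function name and the 4-node example are hypothetical. The example is fully connected: nodes 0, 1, 2 are tied to each other with weight 0.8, while node 3 is tied to everyone with weight 0.1.

```python
import numpy as np

def lopez_fernandez_cluster_coefficient(W):
    """Lopez-Fernandez cluster coefficient for a symmetric weight matrix (sketch)."""
    W = np.array(W, dtype=float)
    np.fill_diagonal(W, 0.0)
    A = (W > 0).astype(float)                 # adjacency: an edge exists where w > 0
    k = A.sum(axis=1)                         # k_i
    numerator = np.diag(A @ W @ A)            # sum_{j,k in Pi(i), j != k} w_jk
    C = np.zeros(len(W))
    ok = k > 1
    C[ok] = numerator[ok] / (k[ok] * (k[ok] - 1))
    return C

W_example = [
    [0.0, 0.8, 0.8, 0.1],
    [0.8, 0.0, 0.8, 0.1],
    [0.8, 0.8, 0.0, 0.1],
    [0.1, 0.1, 0.1, 0.0],
]
lf_example = lopez_fernandez_cluster_coefficient(W_example)
print(lf_example)
```

Since $ w_{ij} $ and $ w_{ik} $ do not enter the formula, node 3 scores 0.8 (the average weight among its neighbors) even though its own edges are the weakest, while nodes 0 to 2 score 1/3.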
Onnela
\displaystyle{
\begin{aligned}
C_{i}^O := \frac{\sum_{j, k \in \Pi(i), j\neq k} (w_{ij}w_{jk}w_{ki})^{1/3}}{k_i (k_i-1) \max(w_{jk})}
\end{aligned}
}
- Useful when you want to take all of the weights of $ ij, jk, ki $ into account
- Because of the 1/3 power, the effect of each individual weight is weaker than in Zhang's definition.
- The weight matrix is normalized by $ \max(w_{jk}) $
- This is the definition implemented by the `clustering` function of NetworkX, one of the Python libraries.
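A small usage sketch of the NetworkX function mentioned above. The node names and weights are made up for illustration, and the edge attribute name `"weight"` is simply the one set by `add_weighted_edges_from`.

```python
import networkx as nx

# A single triangle with three different edge weights (made-up values).
G = nx.Graph()
G.add_weighted_edges_from([("a", "b", 1.0), ("b", "c", 0.5), ("c", "a", 0.25)])

# Passing weight=... switches nx.clustering to the weighted (geometric mean) version.
onnela = nx.clustering(G, weight="weight")
print(onnela)
```

By the formula above, every node of this triangle has $ k_i = 2 $ and a single triangle, so each coefficient should come out as $ (1.0 \cdot 0.5 \cdot 0.25)^{1/3} / 1.0 = 0.5 $.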
Barrat
\displaystyle{
\begin{aligned}
C_{i}^B := \frac{\sum_{j, k \in \Pi(i), j\neq k} (w_{ij} + w_{ki})a_{jk}}{2s_i (k_i-1)}
\end{aligned}
}
- The weights enter as sums, not products.
- Does not consider the weight between $ jk $
  - The opposite of Lopez-Fernandez. Useful when you want to consider the connection strength of $ ij, ik $
- The weights are normalized by the node strength $ s_i = \sum_{j\in\Pi(i)} w_{ij} $
Serrano
\displaystyle{
\begin{aligned}
C_{i}^S &:= \frac{\sum_{j, k \in \Pi(i), j\neq k} w_{ij}a_{jk}w_{ki}}{s_i^2 (1-Y_i)}\\
Y_i &:= \sum_{j\in\Pi(i)} \left(\frac{w_{ij}}{s_i}\right)^2
\end{aligned}
}
- Does not consider the weight between $ jk $
  - The idea is the same as Barrat's
- The weights are normalized by the node strength $ s_i = \sum_{j\in\Pi(i)} w_{ij} $
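Barrat's and Serrano's definitions use the weight between $ j $ and $ k $ only through $ a_{jk} $. The sketch below implements both, assuming $ s_i $ is the sum of the weights at node $ i $ and that an edge exists wherever the weight is positive. The function name and the random test matrix are hypothetical. On a fully connected weighted graph every $ a_{jk} $ is 1, so numerator and denominator coincide and both coefficients come out as 1.

```python
import numpy as np

def barrat_and_serrano(W):
    """Return (Barrat, Serrano) cluster coefficients for a symmetric weight matrix (sketch)."""
    W = np.array(W, dtype=float)
    np.fill_diagonal(W, 0.0)
    A = (W > 0).astype(float)                     # a_jk: 1 where an edge exists
    k = A.sum(axis=1)                             # k_i
    s = W.sum(axis=1)                             # s_i: sum of weights at node i

    # Barrat: sum_{j != k} (w_ij + w_ki) * a_jk over neighbors j, k of i
    barrat_num = np.diag(W @ A @ A) + np.diag(A @ A @ W)
    barrat_den = 2 * s * (k - 1)
    barrat = np.zeros(len(W))
    ok = barrat_den > 0
    barrat[ok] = barrat_num[ok] / barrat_den[ok]

    # Serrano: sum_{j != k} w_ij * a_jk * w_ki, divided by s_i^2 * (1 - Y_i)
    serrano_num = np.diag(W @ A @ W)
    serrano_den = s ** 2 - (W ** 2).sum(axis=1)   # equals s_i^2 * (1 - Y_i)
    serrano = np.zeros(len(W))
    ok = serrano_den > 0
    serrano[ok] = serrano_num[ok] / serrano_den[ok]
    return barrat, serrano

# Fully connected graph with random positive symmetric weights.
rng = np.random.default_rng(0)
M = rng.uniform(0.1, 1.0, size=(5, 5))
W_full = (M + M.T) / 2
barrat_full, serrano_full = barrat_and_serrano(W_full)
print(barrat_full, serrano_full)
```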
The motivation is to find out whether these measures can identify the core streamers, and whether they work at all on the network defined last time.
The edge between each pair of channels is weighted by the share of commenting viewers that the two channels have in common.
\displaystyle{
\begin{aligned}
w_{ij} &:= \frac{|U_i \cap U_j|}{|U_i \cup U_j|}\\
U_i &: \text{the set of users who commented on channel } i
\end{aligned}
}
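The edge weight above is the ratio of the intersection to the union of the two sets of commenters. A minimal sketch, with a hypothetical function name and made-up user IDs:

```python
def common_viewer_ratio(users_i, users_j):
    """w_ij = |U_i ∩ U_j| / |U_i ∪ U_j| for two collections of user IDs (sketch)."""
    users_i, users_j = set(users_i), set(users_j)
    union = users_i | users_j
    if not union:                      # neither channel has any commenter
        return 0.0
    return len(users_i & users_j) / len(union)

# Made-up example: two users in common out of five distinct users.
w_example = common_viewer_ratio({"u1", "u2", "u3"}, {"u2", "u3", "u4", "u5"})
print(w_example)
```

The weight is symmetric in $ i $ and $ j $ and lies between 0 (no common commenters) and 1 (identical sets), which fits the undirected, weighted setting used here.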
Network features
- Almost fully connected (it almost never happens that two channels share no viewers)
- Ties are basically strong within the same agency and weak across agencies
  - The exceptions are channels with strong ties to another agency, such as Tamaki Inuyama and Shigure Ui
- In a large agency, the weights vary little within the agency

Data used, conditions, etc.
- Data: comments obtained from archived YouTube live streams
- Period: 2020/1/1 to 2020/6/30

The whole network looks like this (I added a few channels since last time). Line thickness corresponds to the share of viewers in common: the thicker the line, the higher the share.
Here, the calculation is performed separately for the Nijisanji-only network and the hololive-only network. The reason for not calculating cluster coefficients on a network that mixes several agencies is that the clusters have imbalanced numbers of nodes: nodes in a large cluster end up with a large cluster coefficient. I do not (yet) know how to calculate an appropriate cluster coefficient for a network made of clusters with imbalanced node counts, so for now I narrow the calculation down.
For each network, calculate the cluster coefficient and list the top 5, bottom 5, and percentiles. Also, the code used will be posted at the end of the article.
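The bottom 5, top 5, and percentile tables can be produced along the following lines with pandas. This is a sketch, not the code used for the tables in this article: the function name `summarize`, the channel names, and the values are all made up, and the column names simply mirror the table headers used here.

```python
import pandas as pd

def summarize(coeffs, kind, n=5):
    """Return (bottom n, top n, summary statistics) for a {name: coefficient} dict."""
    table = pd.DataFrame({
        "name": list(coeffs.keys()),
        "c_coeff": list(coeffs.values()),
        "kind": kind,
    }).sort_values("c_coeff")                     # ascending, like the tables in this article
    bottom = table.head(n)
    top = table.tail(n)
    # count, mean, std, min, 10%, 50%, 90%, max
    stats = table["c_coeff"].describe(percentiles=[0.1, 0.9])
    return bottom, top, stats

# Made-up coefficients for four hypothetical channels.
sample = {"ch_a": 0.40, "ch_b": 0.32, "ch_c": 0.44, "ch_d": 0.36}
bottom, top, stats = summarize(sample, kind="Zhang", n=2)
print(bottom, top, stats, sep="\n\n")
```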
The Nijisanji graph looks like this. It has so many nodes and edges that the graph is hard to read...
Zhang
name | c_coeff | kind |
---|---|---|
Gilzaren III Season 2 | 0.3171 | Zhang |
Azuchi peach | 0.3174 | Zhang |
Rine- Rine Yaguruma - | 0.3246 | Zhang |
Naruse Naru | 0.3265 | Zhang |
Amemori Saya | 0.3284 | Zhang |
name | c_coeff | kind |
---|---|---|
Ibrahim [Nijisanji] | 0.4308 | Zhang |
Himawari Honma- Himawari Honma - | 0.4316 | Zhang |
Kanae Channel | 0.4317 | Zhang |
Ars Almar-ars almal-[Nijisanji] | 0.4338 | Zhang |
Kuzuha Channel | 0.4424 | Zhang |
count | mean | std | min | 10% | 50% | 90% | max |
---|---|---|---|---|---|---|---|
98 | 0.3907 | 0.02992 | 0.3171 | 0.3458 | 0.3945 | 0.4229 | 0.4424 |
Lopez-Fernandez
name | c_coeff | kind |
---|---|---|
Gweru male girl/Gwelu Os Gar [Nijisanji] | 0.1184 | Lopez_Fernandez |
Rion Takamiya | 0.1184 | Lopez_Fernandez |
Watch at night/yorumi rena [Nijisanji affiliation] | 0.1185 | Lopez_Fernandez |
Debidebi Debi | 0.1185 | Lopez_Fernandez |
Akina Saegusa/ Saegusa Akina | 0.1186 | Lopez_Fernandez |
name | c_coeff | kind |
---|---|---|
Amemori Saya | 0.1207 | Lopez_Fernandez |
Rine- Rine Yaguruma - | 0.1208 | Lopez_Fernandez |
Naruse Naru | 0.1208 | Lopez_Fernandez |
Azuchi peach | 0.1209 | Lopez_Fernandez |
Gilzaren III Season 2 | 0.1211 | Lopez_Fernandez |
count | mean | std | min | 10% | 50% | 90% | max |
---|---|---|---|---|---|---|---|
98 | 0.1193 | 0.000656 | 0.1184 | 0.1186 | 0.1191 | 0.1203 | 0.1211 |
Onnela
name | c_coeff | kind |
---|---|---|
Gilzaren III Season 2 | 0.1460 | Onnela |
Azuchi peach | 0.1708 | Onnela |
Naruse Naru | 0.1752 | Onnela |
Rine- Rine Yaguruma - | 0.1798 | Onnela |
Amemori Saya | 0.1865 | Onnela |
name | c_coeff | kind |
---|---|---|
Ryushen channel | 0.3967 | Onnela |
Debidebi Debi | 0.3989 | Onnela |
Watch at night/yorumi rena [Nijisanji affiliation] | 0.3998 | Onnela |
Rion Takamiya | 0.4060 | Onnela |
Gweru male girl/Gwelu Os Gar [Nijisanji] | 0.4093 | Onnela |
count | mean | std | min | 10% | 50% | 90% | max |
---|---|---|---|---|---|---|---|
98 | 0.3295 | 0.06136 | 0.1460 | 0.2361 | 0.3478 | 0.3917 | 0.4093 |
Barrat
name | c_coeff | kind |
---|---|---|
Kou Uzuki | 1.0000 | Barrat |
Mahiro Yukishiro/Yukishiro Mahiro [Nijisanji affiliation] | 1.0000 | Barrat |
Haruka Onomachi ♨ Onomachi Haruka Nijisanji | 1.0000 | Barrat |
Rine- Rine Yaguruma - | 1.0000 | Barrat |
Fren E. Lustario | 1.0000 | Barrat |
name | c_coeff | kind |
---|---|---|
Naruse Naru | 1.000 | Barrat |
Ellie Conifer/Eli Conifer [Nijisanji] | 1.000 | Barrat |
Rion Takamiya | 1.000 | Barrat |
Watch at night/yorumi rena [Nijisanji affiliation] | 1.000 | Barrat |
Aiba Uiha 〖Aiba Uiha〗 Nijisanji affiliation | 1.000 | Barrat |
count | mean | std | min | 10% | 50% | 90% | max |
---|---|---|---|---|---|---|---|
98 | 1.0000 | 0.000000 | 1.0000 | 1.0000 | 1.0000 | 1.000 | 1.000 |
Serrano
name | c_coeff | kind |
---|---|---|
Hanasaki Morinaka | 1.0000 | Serrano |
Aiba Uiha 〖Aiba Uiha〗 Nijisanji affiliation | 1.0000 | Serrano |
Amamiya Kokoro/Kokoro Amamiya [Nijisanji affiliation] | 1.0000 | Serrano |
Haru Kaida/Kaida Haru [Nijisanji] | 1.0000 | Serrano |
Gilzaren III Season 2 | 1.0000 | Serrano |
name | c_coeff | kind |
---|---|---|
Kou Uzuki | 1.000 | Serrano |
Yoko Akabane | 1.000 | Serrano |
Keisuke Maimoto | 1.000 | Serrano |
Quarter moon Fujishiro/Genzuki Tojiro [Nijisanji] | 1.000 | Serrano |
Kana Sukoya [Nijisanji] Kana Sukoya | 1.000 | Serrano |
count | mean | std | min | 10% | 50% | 90% | max |
---|---|---|---|---|---|---|---|
98 | 1.0000 | 0.000000 | 1.0000 | 1.0000 | 1.000 | 1.000 | 1.000 |
The hololive network is below. It is easier to read because it has fewer nodes than Nijisanji... The Barrat and Serrano cluster coefficients are omitted here because, just as for Nijisanji, they are all 1. The reason is described later.
Zhang
name | c_coeff | kind |
---|---|---|
Mel Channel Night sky Mel channel | 0.6304 | Zhang |
hololive hololive- VTuber Group | 0.6325 | Zhang |
SoraCh.Tokino Sora Channel | 0.6331 | Zhang |
Akiroze Ch. Vtuber/Hololive affiliation | 0.6366 | Zhang |
Choco Ch.Choco Heitsuki | 0.6373 | Zhang |
name | c_coeff | kind |
---|---|---|
Fubuki Ch. Shirakami Fubuki | 0.6696 | Zhang |
Aqua Ch.Minato Aqua | 0.6708 | Zhang |
Coco Ch.Kiryu Coco | 0.6719 | Zhang |
Pekora Ch.Usada Pekora | 0.6748 | Zhang |
Korone Ch.Inugami Korone | 0.6758 | Zhang |
count | mean | std | min | 10% | 50% | 90% | max |
---|---|---|---|---|---|---|---|
28 | 0.6521 | 0.01336 | 0.6304 | 0.6355 | 0.6516 | 0.6711 | 0.6758 |
Lopez-Fernandez
name | c_coeff | kind |
---|---|---|
Shion Ch.Shisaki Zion | 0.2024 | Lopez_Fernandez |
Watame Ch.For square winding | 0.2025 | Lopez_Fernandez |
Kanata Ch.Amane Kanata | 0.2025 | Lopez_Fernandez |
Mio Channel Ogami Mio | 0.2028 | Lopez_Fernandez |
Flare Ch.Shiranui flare | 0.2031 | Lopez_Fernandez |
name | c_coeff | kind |
---|---|---|
Choco Ch.Choco Heitsuki | 0.2069 | Lopez_Fernandez |
hololive hololive- VTuber Group | 0.2071 | Lopez_Fernandez |
Nakiri Ayame Ch.Hyakuki Ayame | 0.2092 | Lopez_Fernandez |
SoraCh.Tokino Sora Channel | 0.2108 | Lopez_Fernandez |
Mel Channel Night sky Mel channel | 0.2133 | Lopez_Fernandez |
count | mean | std | min | 10% | 50% | 90% | max |
---|---|---|---|---|---|---|---|
28 | 0.2052 | 0.002504 | 0.2024 | 0.2027 | 0.2049 | 0.2077 | 0.2133 |
Onnela
name | c_coeff | kind |
---|---|---|
Mel Channel Night sky Mel channel | 0.3797 | Onnela |
SoraCh.Tokino Sora Channel | 0.4607 | Onnela |
Nakiri Ayame Ch.Hyakuki Ayame | 0.5097 | Onnela |
hololive hololive- VTuber Group | 0.5653 | Onnela |
Choco Ch.Choco Heitsuki | 0.5694 | Onnela |
name | c_coeff | kind |
---|---|---|
Flare Ch.Shiranui flare | 0.6669 | Onnela |
Mio Channel Ogami Mio | 0.6743 | Onnela |
Kanata Ch.Amane Kanata | 0.6800 | Onnela |
Watame Ch.For square winding | 0.6804 | Onnela |
Shion Ch.Shisaki Zion | 0.6832 | Onnela |
count | mean | std | min | 10% | 50% | 90% | max |
---|---|---|---|---|---|---|---|
28 | 0.6097 | 0.06734 | 0.3797 | 0.5486 | 0.6185 | 0.6760 | 0.6832 |
- The bottom ranks (the top ranks for Lopez-Fernandez) are mostly channels that stream infrequently, have not streamed for a long time, or have few live streams in the first place.
  When there are long gaps between streams, viewers overlap less. This is to be expected.
- The top (bottom) ranks differ greatly depending on the method.
  Lopez_Fernandez depends only on the weights $ w_{jk} $ between the nodes $ j, k $ adjacent to node $ i $. Therefore, in a fully connected network, a node whose own weights $ w_{ij}, w_{ik} $ are small rises to the top of the cluster coefficient ranking. Looking at hololive, for example, Yozora Mel ranks high because of a long hiatus, and the official channel ranks high because it has few live streams in the first place. Does this mean that the other high-ranking channels (Choco, Tokino Sora, etc.) form only weak clusters with the other nodes?
- The top (bottom) of Lopez_Fernandez resembles the bottom (top) of Onnela.
  The main difference between Zhang and Onnela is whether the third-order term in the weights is raised to the 1/3 power. Taking the 1/3 power makes the fluctuation of the weights smaller. A calculation shows that the first-order contributions of the fluctuations are the same, but the second-order contribution is kept smaller than in Zhang because of the 1/3 power.
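- The Barrat and Serrano cluster coefficients are all 1.
  In a fully connected network $ a_{jk} = 1 $ for every pair, so the numerator and denominator of both definitions coincide and every cluster coefficient is 1 by definition. This is to be expected.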
- Which cluster coefficient is appropriate?
  Since this network is a weighted, fully connected network, Lopez_Fernandez yields a large cluster coefficient even for a node that is isolated (in the sense that its weights are small). So Lopez_Fernandez seems unsuitable? Zhang and Onnela differ greatly in whether the 1/3 power is applied. The 1/3 power levels out the weights, so the choice comes down to how sensitive the measure should be to fluctuations in the edge weights. This network is fully connected and its weights fluctuate little, so I want the measure to be sensitive to small fluctuations. So Zhang's method looks good?
- So which channels are central to their cluster after all?
  For hololive the result matches my intuition, but I am still not familiar with Nijisanji, so I cannot tell whether its result is intuitive... Please let me know... Using Zhang's method, the top 5 channels by cluster coefficient in Nijisanji and hololive are as follows.
name | c_coeff | kind |
---|---|---|
Ibrahim [Nijisanji] | 0.4308 | Zhang |
Himawari Honma- Himawari Honma - | 0.4316 | Zhang |
Kanae Channel | 0.4317 | Zhang |
Ars Almar-ars almal-[Nijisanji] | 0.4338 | Zhang |
Kuzuha Channel | 0.4424 | Zhang |
name | c_coeff | kind |
---|---|---|
Fubuki Ch. Shirakami Fubuki | 0.6696 | Zhang |
Aqua Ch.Minato Aqua | 0.6708 | Zhang |
Coco Ch.Kiryu Coco | 0.6719 | Zhang |
Pekora Ch.Usada Pekora | 0.6748 | Zhang |
Korone Ch.Inugami Korone | 0.6758 | Zhang |
The code used is as follows:
import itertools


class clustering_coefficient:
    # df: symmetric weight matrix as a pandas DataFrame whose index and columns are the node names.
    # Note: nodes with k_i <= 1 are not special-cased (they do not occur in a fully connected network).
    def __init__(self, df):
        self.df = df.copy()
        self.names = list(self.df.columns)
        # Remove self-loops
        for c in self.names:
            self.df.loc[c, c] = 0
        # k_i: number of adjacent nodes, s_i: sum of the weights at node i
        self.ki = {name: (self.df.loc[self.df.index[self.df.index != name], :][name] > 0).sum() for name in self.names}
        self.si = {name: self.df.loc[self.df.index[self.df.index != name], :][name].sum() for name in self.names}
        # Largest weight, used for normalization
        self.max_w = self.df.values.ravel().max()

    def Zhang(self):
        # sum of w_jk * w_ji * w_ik, divided by (sum of w_ji * w_ik) * max(w)
        return {name: sum([(self.df.loc[n1, n2] * self.df.loc[n1, name] * self.df.loc[name, n2])
                           for n1, n2 in itertools.permutations(self.names, 2)])
                / sum([(self.df.loc[n1, name] * self.df.loc[name, n2])
                       for n1, n2 in itertools.permutations(self.names, 2)]) / self.max_w
                for name in self.names}

    def Lopez_Fernandez(self):
        # sum of w_jk over pairs of neighbors j, k of i, divided by k_i * (k_i - 1)
        return {name: sum([(self.df.loc[n1, n2] * (self.df.loc[n1, name] > 0) * (self.df.loc[name, n2] > 0))
                           for n1, n2 in itertools.permutations(self.names, 2)])
                / (self.ki[name] * (self.ki[name] - 1)) for name in self.names}

    def Onnela(self):
        # sum of (w_jk * w_ji * w_ik)^(1/3), divided by k_i * (k_i - 1) * max(w)
        return {name: sum([(self.df.loc[n1, n2] * self.df.loc[n1, name] * self.df.loc[name, n2]) ** (1/3.)
                           for n1, n2 in itertools.permutations(self.names, 2)])
                / (self.ki[name] * (self.ki[name] - 1) * self.max_w) for name in self.names}

    def Barrat(self):
        # sum of (w_ji + w_ik) over connected pairs of neighbors, divided by 2 * s_i * (k_i - 1)
        return {name: sum([(self.df.loc[n1, name] + self.df.loc[name, n2]) * (self.df.loc[name, n1] > 0)
                           * (self.df.loc[name, n2] > 0) * (self.df.loc[n1, n2] > 0)
                           for n1, n2 in itertools.permutations(self.names, 2)])
                / ((self.ki[name] - 1) * self.si[name] * 2) for name in self.names}

    def Serrano(self):
        # sum of w_ji * a_jk * w_ik, divided by s_i^2 * (1 - Y_i)
        return {name: sum([((self.df.loc[n1, n2] > 0) * self.df.loc[n1, name] * self.df.loc[name, n2])
                           for n1, n2 in itertools.permutations(self.names, 2)])
                / (self.si[name] ** 2 * (1 - ((self.df[name] / self.si[name]) ** 2).sum()))
                for name in self.names}