An introduction to community detection problem with a view from matrix factorization.
Network data is presented everywhere nowadays, such as network of friends on social media, network of bees, networks of proteins. While these networks are very large and complex, their underlying structure could be seen as a set of a small groups call communities. Discovering these communities from a complex networks brings out essential characteristics of the networks, which is an important task for data analysis and exploration. Community detection problems aims at finding out membership of all nodes in a given network.
In this article, we cover how this problem is modeled using matrices, and dealt with using matrix factorization techniques.
Model and Problem Statement
We assume that there are
A simple scenario is that a node can only belong to a single group. This happens for example in classification problem, where each item has a single class label.
But different from classification problem, we only observe interactions among nodes. Particularly, nodes within a community have similar chances of being connected to each other in compared to a connection among nodes of different communities. These connections are expressed as the edges of the network, and can be directional, un-directional, unweighted, or weighted, depending on particular model and dataset.
Consider a simple case of un-directional, unweighted edges.
Then an edge between two nodes
- Membership of the two nodes belong to the same groups.
- The inherent interaction among groups.
One of the well-known models named Mixed-Membership Stochastic Blockmodels [1] assumes
The matrix
On the other hand, a off-diagonal dominant matrix
A node can belong to a single community, or to simultaneously multiple communities. Elements in the membership vector for each node indicates how strongly the node belongs to certain groups. Specifically,
We are assuming that a network of
Problem statement:
Given the binary matrix
Some Hints
There are at least two challenges in recovering
Fortunately, a key finding in [2] reveals that knowing subspace
in Matlab
and show the result in the following figure.
As you can see, as number of nodes increases, the angle between
where
Then the task boils down to estimating
The so-called separability condition has been proposed and widely used to guarantee that problem [x] has unique solution. In the context of community detection, it assumes the existence of a ‘pure node’ for each community. A pure node is a node that only belongs to a single community.
With the assumption of separability on
Algorithm
Given observation
Having
Result
We generated a network of
References
- [1] Airoldi, Edo M., et al. “Mixed membership stochastic blockmodels.” Advances in neural information processing systems 21 (2008).
- [2] Panov, Maxim, Konstantin Slavnov, and Roman Ushakov. “Consistent estimation of mixed memberships with successive projections.” Complex Networks & Their Applications VI: Proceedings of Complex Networks 2017 (The Sixth International Conference on Complex Networks and Their Applications). Springer International Publishing, 2018.
- [3] Nguyen, Tri, Xiao Fu, and Ruiyuan Wu. “Memory-efficient convex optimization for self-dictionary separable nonnegative matrix factorization: A Frank–Wolfe approach.” IEEE Transactions on Signal Processing 70 (2022): 3221-3236.
- [4] Gillis, Nicolas, and Robert Luce. “A fast gradient method for nonnegative sparse regression with self-dictionary.” IEEE Transactions on Image Processing 27.1 (2017): 24-37.