Abstract:
The problem of community detection in social media has been widely studied in the social networking community in the context of the structure of the underlying graphs. Most community detection algorithms use the links between the nodes in order to determine the dense regions in the graph. These dense regions are the communities of social media in the graph. Such methods are typically based purely on the linkage structure of the underlying social media network. Community detection algorithms are fundamental tools that allow us to uncover organizational principles in networks. When detecting communities, there are two possible sources of information one can use: the network structure, and the features and attributes of nodes. Even though communities form around nodes that have common edges and common attributes, algorithms have typically focused on only one of these two data modalities: community detection algorithms traditionally focus only on the network structure, while clustering algorithms mostly consider only node attributes. In this paper, we explore a range of network community detection methods in order to compare them and to understand their relative performance and the systematic biases in the clusters they identify. We evaluate several common objective functions that are used to formalize the notion of a network community, and we examine several different classes of approximation algorithms that aim to optimize such objective functions. In addition, rather than simply fixing an objective and asking for an approximation to the best community of any size, we consider a size-resolved version of the optimization problem. Considering community quality as a function of its size provides a much finer lens with which to examine community detection algorithms, since objective functions and approximation algorithms often have non-obvious size-dependent behavior. We also propose a new algorithm, the Fast Network Clustering Algorithm (FaNClust), intended to improve performance.
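To make the size-resolved view concrete, the following minimal sketch computes, for each community size k, the best score of any node set of exactly that size on a toy graph. It rests on several assumptions that are not taken from the abstract: conductance is used as the objective (the abstract does not name its objective functions), the search is brute-force enumeration rather than an approximation algorithm, and the helper names `build_adjacency`, `conductance`, and `size_resolved_profile` are hypothetical. It is not an implementation of FaNClust.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def conductance(adj, nodes):
    """Conductance of a node set: cut edges / min(volume inside, volume outside).

    Assumed objective for illustration only; lower values mean a
    better-separated community.
    """
    s = set(nodes)
    cut = sum(1 for u in s for v in adj[u] if v not in s)
    vol_s = sum(len(adj[u]) for u in s)
    vol_rest = sum(len(adj[u]) for u in adj) - vol_s
    denom = min(vol_s, vol_rest)
    return cut / denom if denom > 0 else float("inf")


def size_resolved_profile(adj, max_size):
    """For each size k, return (best conductance, best node set) over all k-subsets.

    Exhaustive enumeration is exponential in k, so this is only
    feasible for very small graphs.
    """
    profile = {}
    for k in range(1, max_size + 1):
        best = min(combinations(sorted(adj), k),
                   key=lambda c: conductance(adj, c))
        profile[k] = (conductance(adj, best), set(best))
    return profile


if __name__ == "__main__":
    # Toy graph: two triangles joined by the single edge (2, 3).
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    adj = build_adjacency(edges)
    for k, (score, members) in size_resolved_profile(adj, 3).items():
        print(k, round(score, 3), sorted(members))
```

On this toy graph the score is not monotone in size: the best set of size 3 (one whole triangle) scores far better than the best sets of size 1 or 2, which is the kind of size-dependent behavior a single size-agnostic optimum would hide.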