Abstract:
|
The theory of complex networks has proven to be very important inthe study of the characteristics and structure of various complex systems. In thelast two decades, a large number of researches have been directed towards thedevelopment of methods for clustering in complex networks. In 2012, Center forDiscrete Mathematics and Theoretical Computer Science (DIMACS), which is awell-known consortium of prestigious academic institutions (Rutgers University,Princeton, Colombia) and research laboratories (Microsoft, IBM, AT & T, NEC),included the problem of clustering in complex networks on the list of the mostimportant problems and challenges in computer science. Clustering in complexnetworks can be applied in a variety of contexts to achieve different goals, andtherefore, there is no generally accepted definition of a cluster. For this reason,different approaches are used in developing clustering methods. An approach thathas attracted the most attention of researchers involves two subproblems: defininga function to determine the quality of a partition and constructing methods to finda partition that has the maximum value of the defined quality function. In thisapproach, the problem of clustering is formulated as the problem of combinatorialoptimization and various methods of mathematical optimization can be used tosolve it. One of the most commonly used quality function is the modularity.Clustering by modularity maximization, i.e., finding a partition with the max-imum value of modularity, is NP-hard problem. Thus, only heuristic algorithmsare suitable of processing large datasets. In this dissertation, a novel method formodularity maximization based on the variable neighborhood search heuristic isproposed. For the purpose of efficient application in large-scale complex networks,a procedure for decomposition of the problem into smaller subproblems is devel-oped. In addition, a mechanism for overcoming local maxima of modularity isimproved using criteria for occasional acceptance of solution which is worse thanthe current one. DIMACS instances are used to test the proposed method, and theobtained results are compared with the best ones presented in the literature, ob-tained by two methods developed in DIMACS implementation challenge in 2012.In addition, the obtained results are compared with the results of six methodsdeveloped after 2012, from the literature. A comparative analysis of the resultsshows that the proposed method outperforms the existing methods for modularitymaximization and improves the best known solutions on 9 out of 13 hard instances.
Clustering by modularity maximization is not suitable for detecting small clus-ters in large networks. For this reason, a new function for measuring the qualityof a partition has been proposed in the dissertation. Through three theorems, itis shown that the new measure, called E-function, has the potential to identifyclusters in the network and overcome limits of modularity. For testing the pro-posed E-function and comparison with the modularity function, a generic variableneighborhood method is developed to optimize the considered quality function.Computational experiments are performed on generated and real instances fromthe literature for which the correct division into clusters is known. The resultsshow that the expected clusters can be identified, both on artificial and real in-stances, by maximizing the E-function. |