Community Structure and Detection

Ken Webb 2010-02-19T17:09:10Z

Most software developers are familiar with the concepts of cohesion and coupling. A module with high cohesion bundles similar things together. Modules with low coupling have relatively few connections between them. It has long been considered a best practice to write software that has high cohesion and low coupling. According to the wikipedia article on cohesion (Cohesion (computer science)), "The software quality metrics of coupling and cohesion were invented by Larry Constantine." (W. Stevens, G. Myers, L. Constantine, "Structured Design", IBM Systems Journal, 13 (2), 115-139, 1974.).

In their book (Yourdon, E. and Constantine, L. (1979) Structured Design - Fundamentals of a Discipline of Computer Program and Systems Design.), Yourdon and Constantine devote an entire chapter to cohesion and coupling. (Cohesion, p.105-141). "the cohesion of each module in isolation -- how tightly bound or related its internal elements are to one another." (p.106) "cohesion and coupling are interrelated. The greater the cohesion of individual modules in the system, the lower the coupling between modules will be." "We prefer the term "cohesion", as this is the accepted term for the identical concept in sociology, another discipline in which cohesion -- in this case, the cohesion of groups -- is important. Cohesion is used in a variety of engineering and other scientific disciplines as well, and it almost always has the same connotation as our use of it in this book."

In recent years, researchers in networks and graphs have explored the concepts of community structure and community detection.

"In the study of complex networks, a network is said to have community structure if it divides naturally into groups of nodes with dense connections within groups and sparser connections between groups." (Community structure. (2010, February 4). In Wikipedia, The Free Encyclopedia. Retrieved 13:41, February 19, 2010, from http://en.wikipedia.org/w/index.php?title=Community_structure&oldid=341878269

In an extensive review of the literature, Fortunato describes community structure as "the organization of vertices in clusters, with many edges joining vertices of the same cluster and comparatively few edges joining vertices of different clusters." (Forunato, S. (2010) Community detection in graphs. Available from http://arxiv.org/abs/0906.0612)

Fortunato further states (p.3) that "Another important aspect related to community structure is the hierarchical organization displayed by most networked systems in the real world. Real networks are usually composed by communities including smaller communities, which in turn include smaller communities, etc.".

An increasing number of algorithms are available for community detection.

Xholon applications are structured in a way that makes it straight-forward to apply these algorithms. For example, any Xholon application can be written out as a .nwb file that can be analyzed using the blondel community detection algorithm (http://lanl.arxiv.org/abs/0803.0476v2) included in the Network Workbench (http://nwb.slis.indiana.edu/) software. This is a useful way of exploring the hierarchy in a system. The hierarchy built into, or lacking in, a Xholon application may be different from the hierarchy found by this and other algorithms. Automatic community detection can be a good way to find modular and hierarchical structure in a Xholon application, that may lead to increased cohesion and reduced coupling.

Cohesion and coupling are usually applied top-down by software developers. They are part of the intentional design process. Community detection is bottom-up. It looks for the actual structure that exists, whether intentionally designed-in or not.

Genetic programming (GP), which is also available in Xholon through the ECJ library, offers another bottom-up approach to deriving modules and hierarchy. In GP evolved structures are called automatically defined functions (ADF). See for example "Evolving Modular and Hierarchical Structures" at http://cswww.essex.ac.uk/staff/poli/gp-field-guide/61EvolvingModularandHierarchicalStructures.html .

Example

This is an example of using the blondel community detection algorithm in Network Workbench. The following .nwb file can be automatically generated by Xholon from the Cell application while it's running. This output file does not include the hierarchy information that is part of the Xholon application.

# To view this file, download open-source Network Workbench from http://nwb.slis.indiana.edu
#
# Automatically generated by Xholon version 0.8.1, using Xholon2Nwb.java
# Fri Feb 12 13:00:20 EST 2010 1265997620210
# model: Cell - BioSystems paper
# www.primordion.com/Xholon
*Nodes
id*int	label*string
#0 "ExtraCellularSpace"
1 "ExtraCellularSolution"
2 "Glucose"
3 "EukaryoticCell"
4 "CellMembrane"
5 "CellBilayer"
6 "Cytoplasm"
7 "Cytosol"
8 "Glucose"
9 "Glucose_6_Phosphate"
10 "Fructose_6_Phosphate"
11 "Fructose_1x6_Biphosphate"
12 "DihydroxyacetonePhosphate"
13 "Glyceraldehyde_3_Phosphate"
14 "X1x3_BisphosphoGlycerate"
15 "X3_PhosphoGlycerate"
16 "X2_PhosphoGlycerate"
17 "PhosphoEnolPyruvate"
18 "Pyruvate"
19 "Hexokinase"
20 "PhosphoGlucoIsomerase"
21 "PhosphoFructokinase"
22 "Aldolase"
23 "TriosePhosphateIsomerase"
24 "Glyceraldehyde_3_phosphateDehydrogenase"
25 "PhosphoGlycerokinase"
26 "PhosphoGlyceromutase"
27 "Enolase"
28 "PyruvateKinase"
*DirectedEdges
source*int	target*int
5 2
5 8
19 8
19 9
20 9
20 10
21 10
21 11
22 11
22 13
22 12
23 12
23 13
24 13
24 14
25 14
25 15
26 15
26 16
27 16
27 17
28 17
28 18

The Network Workbench implementation of the blondel community detection algorithm automatically transforms this into the following structure that includes two additional columns of information.

# To view this file, download open-source Network Workbench from http://nwb.slis.indiana.edu
#
# Automatically generated by Xholon version 0.8.1, using Xholon2Nwb.java
# Fri Feb 12 13:00:20 EST 2010 1265997620210
# model: Cell - BioSystems paper
# www.primordion.com/Xholon
*Nodes
id*int	label*string	blondel_community_level_0*string	blondel_community_level_1*string	blondel_community_level_2*string
#0 "ExtraCellularSpace"
1	"ExtraCellularSolution"	"isolate_0"	"isolate_1"	"isolate_2"
2	"Glucose"	"community_0"	"community_0"	"community_0"
3	"EukaryoticCell"	"isolate_3"	"isolate_4"	"isolate_5"
4	"CellMembrane"	"isolate_6"	"isolate_7"	"isolate_8"
5	"CellBilayer"	"community_0"	"community_0"	"community_0"
6	"Cytoplasm"	"isolate_9"	"isolate_10"	"isolate_11"
7	"Cytosol"	"isolate_12"	"isolate_13"	"isolate_14"
8	"Glucose"	"community_0"	"community_0"	"community_0"
9	"Glucose_6_Phosphate"	"community_1"	"community_0"	"community_0"
10	"Fructose_6_Phosphate"	"community_2"	"community_0"	"community_0"
11	"Fructose_1x6_Biphosphate"	"community_3"	"community_1"	"community_1"
12	"DihydroxyacetonePhosphate"	"community_4"	"community_1"	"community_1"
13	"Glyceraldehyde_3_Phosphate"	"community_5"	"community_1"	"community_1"
14	"X1x3_BisphosphoGlycerate"	"community_6"	"community_2"	"community_2"
15	"X3_PhosphoGlycerate"	"community_7"	"community_2"	"community_2"
16	"X2_PhosphoGlycerate"	"community_8"	"community_3"	"community_3"
17	"PhosphoEnolPyruvate"	"community_9"	"community_3"	"community_3"
18	"Pyruvate"	"community_10"	"community_3"	"community_3"
19	"Hexokinase"	"community_1"	"community_0"	"community_0"
20	"PhosphoGlucoIsomerase"	"community_2"	"community_0"	"community_0"
21	"PhosphoFructokinase"	"community_3"	"community_1"	"community_1"
22	"Aldolase"	"community_5"	"community_1"	"community_1"
23	"TriosePhosphateIsomerase"	"community_4"	"community_1"	"community_1"
24	"Glyceraldehyde_3_phosphateDehydrogenase"	"community_6"	"community_2"	"community_2"
25	"PhosphoGlycerokinase"	"community_7"	"community_2"	"community_2"
26	"PhosphoGlyceromutase"	"community_8"	"community_3"	"community_3"
27	"Enolase"	"community_9"	"community_3"	"community_3"
28	"PyruvateKinase"	"community_10"	"community_3"	"community_3"
*DirectedEdges
source*int	target*int
5	2
5	8
19	8
19	9
20	9
20	10
21	10
21	11
22	11
22	13
22	12
23	12
23	13
24	13
24	14
25	14
25	15
26	15
26	16
27	16
27	17
28	17
28	18

This file can then be visualized using the Network Workbench implementation of Circular Hierarchy (for detailed information on this type of visualization, see the work of Danny Holten at http://www.win.tue.nl/~dholten/extravis/index.htm). The resulting diagram (see below) shows the named nodes, the edges between the nodes, and the levels of hierarchy as colored arcs in the containing circles.

Xholon can also write out a .nwb file that includes the intended hierarchy, expressed using the same blondel community detection columns but with different details.

# To view this file, download open-source Network Workbench from http://nwb.slis.indiana.edu
#
# Automatically generated by Xholon version 0.8.1, using Xholon2Nwb.java
# Fri Feb 19 10:33:32 EST 2010 1266593612915
# model: Cell - BioSystems paper
# www.primordion.com/Xholon
*Nodes
id*int	label*string	blondel_community_level_0*string	blondel_community_level_1*string	blondel_community_level_2*string
2147483647 "ExtraCellularSpace" "community_2147483647" "community_2147483647" "community_2147483647"
1 "ExtraCellularSolution" "community_2147483647" "community_2147483647" "community_2147483647"
2 "Glucose" "community_1" "community_2147483647" "community_2147483647"
3 "EukaryoticCell" "community_2147483647" "community_2147483647" "community_2147483647"
4 "CellMembrane" "community_3" "community_2147483647" "community_2147483647"
5 "CellBilayer" "community_4" "community_3" "community_2147483647"
6 "Cytoplasm" "community_3" "community_2147483647" "community_2147483647"
7 "Cytosol" "community_6" "community_3" "community_2147483647"
8 "Glucose" "community_7" "community_6" "community_3"
9 "Glucose_6_Phosphate" "community_7" "community_6" "community_3"
10 "Fructose_6_Phosphate" "community_7" "community_6" "community_3"
11 "Fructose_1x6_Biphosphate" "community_7" "community_6" "community_3"
12 "DihydroxyacetonePhosphate" "community_7" "community_6" "community_3"
13 "Glyceraldehyde_3_Phosphate" "community_7" "community_6" "community_3"
14 "X1x3_BisphosphoGlycerate" "community_7" "community_6" "community_3"
15 "X3_PhosphoGlycerate" "community_7" "community_6" "community_3"
16 "X2_PhosphoGlycerate" "community_7" "community_6" "community_3"
17 "PhosphoEnolPyruvate" "community_7" "community_6" "community_3"
18 "Pyruvate" "community_7" "community_6" "community_3"
19 "Hexokinase" "community_6" "community_3" "community_2147483647"
20 "PhosphoGlucoIsomerase" "community_6" "community_3" "community_2147483647"
21 "PhosphoFructokinase" "community_6" "community_3" "community_2147483647"
22 "Aldolase" "community_6" "community_3" "community_2147483647"
23 "TriosePhosphateIsomerase" "community_6" "community_3" "community_2147483647"
24 "Glyceraldehyde_3_phosphateDehydrogenase" "community_6" "community_3" "community_2147483647"
25 "PhosphoGlycerokinase" "community_6" "community_3" "community_2147483647"
26 "PhosphoGlyceromutase" "community_6" "community_3" "community_2147483647"
27 "Enolase" "community_6" "community_3" "community_2147483647"
28 "PyruvateKinase" "community_6" "community_3" "community_2147483647"
*DirectedEdges
source*int	target*int
5 2
5 8
19 8
19 9
20 9
20 10
21 10
21 11
22 11
22 13
22 12
23 12
23 13
24 13
24 14
25 14
25 15
26 15
26 16
27 16
27 17
28 17
28 18

This file can also be visualized in the Network Workbench.

It can be instructive to compare these two results. (TODO - write more about this)

return to main page