On the Fulfillment of Coordination Requirements in Open-Source Software Projects: An Exploratory Study

Abstract

Coordination among developers is crucial in large-scale open-source software projects, where developers are often distributed across the entire planet. By assessing the alignment of collaboration and communication in such software projects in terms of coordination requirements, we can estimate whether a state of socio-technical congruence is achieved, which is associated with software quality and project success. By means of an empirical study on a substantial set of large-scale open-source software projects (including OpenSSL, Git, and LLVM), altogether making up over 180 years of development history, we aim at shedding light on this issue. Compared to the state of the art in this research area, we identify coordination requirements arising not only from files and functions, but also from features. This way, we take a more semantic view on this phenomenon. We found that open-source developers fulfill coordination requirements on purpose, but mostly those arising from coupled source-code artifacts, while they resolve simpler ones independently. Furthermore, we found that none of the considered abstraction levels of source-code artifacts (files, functions, features) is more suitable than the others as a basis for constructing coordination requirements with respect to their fulfillment. This finding strongly indicates that features do not play as important a role in the development process as expected and commonly believed by the research community in the area of feature-oriented and feature-driven development. Finally, we identified notable evolutionary trends in the fulfillment of coordination requirements and showed that far-reaching social events have a huge impact on their fulfillment, both negatively and positively.
The key findings of our empirical study are that socio-technical relations are important to understand open-source development communities and that incorporating different abstraction levels for developer collaboration yields important insights for further improving the evolution of open-source software projects.

Keywords: coordination requirements, socio-technical congruence, social-network analysis, coronet, Codeface, open-source software systems, configurable systems, software product lines, feature-oriented software development

Research Questions and Hypotheses

Research Questions

RQ₁: Does developer communication align with artifact-based coordination requirements in real-world OSS projects such that the coordination requirements are fulfilled?
RQ₂: Does developer communication align better with feature-based coordination requirements than with function-based or file-based coordination requirements?
RQ₃: Does the degree of fulfillment of coordination requirements change for different artifact types during project evolution?

Hypotheses

RH₁: A large number of developer pairs who collaborate on the same artifacts exchange e-mails on the same threads of the mailing list, such that coordination requirements arising from any type of artifact are fulfilled not merely by chance.
RH₂: The fraction of fulfilled coordination requirements is lower for the square motif than for the triangle motif, independent of the observed artifact abstraction.
RH₃: The fraction of fulfilled coordination requirements differs between the artifact types and is significantly highest for the abstraction level of features.
RH₄: In later stages of development, the fraction of fulfilled coordination requirements is higher than in earlier stages for all motifs and artifacts.

Network Approach

Coordination-Requirement Networks and Motifs

To analyze the fulfillment of coordination requirements in a software project, we construct coordination-requirement networks, which we can analyze with network-analytic methods. We show an exemplary coordination-requirement network in Figure 1. Formally, such a network is defined as an undirected graph G = (D ∪ A, E), where developers (set D) and artifacts (set A) are encoded as vertices and E is the set of edges among the vertices. We encode the following three relations in the edges:

Developer–artifact relation: Developers work on code artifacts while committing to the project’s version control system. Such artifacts may be files, functions, or features (which may crosscut the file and function decomposition).
Artifact–artifact relation: Artifacts can be related in various ways, giving rise to interdependencies. We consider co-changes to describe logical coupling among artifacts. The term “co-changes” refers to artifacts that are changed together in a single commit in the project’s version control system.
Developer–developer relation: We consider contributions of the developers to their project’s mailing list: In line with the seminal work by Bird et al., we assume that two developers coordinate their work iff they contribute to the same thread on the mailing list [8].
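The following Python sketch illustrates how the three relations could be derived from commit and mailing-list records. It is not the Codeface or coronet implementation: the input format (one list of changed artifacts per commit, one list of posters per thread) and the function name build_network are hypothetical assumptions for illustration.

```python
from itertools import combinations

def build_network(commits, threads):
    """Derive the three edge sets of a coordination-requirement network (sketch).

    commits: list of (developer, [artifacts changed in that commit])
    threads: list of [developers who posted to that mailing-list thread]
    """
    dev_art = set()  # developer-artifact edges as (developer, artifact)
    art_art = set()  # artifact-artifact co-change edges as frozenset({a1, a2})
    dev_dev = set()  # developer-developer coordination edges as frozenset({d1, d2})
    for developer, artifacts in commits:
        for artifact in artifacts:
            dev_art.add((developer, artifact))
        # Artifacts changed in the same commit are treated as logically coupled.
        for a1, a2 in combinations(sorted(set(artifacts)), 2):
            art_art.add(frozenset((a1, a2)))
    for posters in threads:
        # Developers posting to the same thread are assumed to coordinate.
        for d1, d2 in combinations(sorted(set(posters)), 2):
            dev_dev.add(frozenset((d1, d2)))
    return dev_art, art_art, dev_dev

# Hypothetical toy input.
commits = [("alice", ["a.c", "b.c"]), ("bob", ["b.c"]), ("carol", ["c.c"])]
threads = [["alice", "bob"], ["bob", "carol", "bob"]]
dev_art, art_art, dev_dev = build_network(commits, threads)
```

Storing developer–artifact edges as ordered tuples is safe here because developers and artifacts are disjoint vertex sets; the other two relations connect vertices of the same kind and are therefore stored as unordered pairs.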

Figure 1. An exemplary coordination-requirement network. Circles represent developers; dashed edges among developers represent coordination effort. Squares represent artifacts; dashed edges among artifacts represent coupling among the connected artifacts. Developers are connected to artifacts (solid edges) if they worked on that artifact in a commit.

To automatically identify coordination requirements in coordination-requirement networks, we encode coordination requirements or, rather, the patterns they represent as network motifs. Network motifs are recurrent sub-graphs in a given network [75]. Motifs can be described formally as a set of vertices (e.g., {d₁, d₂, a₁}) with specific edges connecting them. We show two network motifs for coordination requirements in Figure 2, the triangle motif and the square motif.

Figure 2. Triangle and square motifs. Edges among artifacts represent coupling, while a developer is connected to an artifact if they worked on that artifact. An edge between two developers represents coordination; its existence indicates the fulfillment of the encoded coordination requirement.
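Given such edge sets, instances of both motifs can be enumerated as sketched below in Python. The function names are hypothetical, and the sketch makes two assumptions not stated above: squares are matched as non-induced subgraphs (a developer may also have touched the other artifact), and the coordination edge is ignored here because it only decides whether an instance is fulfilled.

```python
from itertools import combinations

def triangle_instances(dev_art):
    """Triangle motif: two developers who worked on the same artifact."""
    devs_per_artifact = {}
    for developer, artifact in dev_art:
        devs_per_artifact.setdefault(artifact, set()).add(developer)
    found = set()
    for artifact, devs in devs_per_artifact.items():
        for d1, d2 in combinations(sorted(devs), 2):
            found.add((d1, d2, artifact))
    return found

def square_instances(dev_art, art_art):
    """Square motif: two developers who worked on two coupled artifacts."""
    arts_per_dev = {}
    for developer, artifact in dev_art:
        arts_per_dev.setdefault(developer, set()).add(artifact)
    found = set()
    for d1, d2 in combinations(sorted(arts_per_dev), 2):
        for a1 in arts_per_dev[d1]:
            for a2 in arts_per_dev[d2]:
                # d1 touched a1, d2 touched a2, and a1 and a2 are coupled.
                if a1 != a2 and frozenset((a1, a2)) in art_art:
                    found.add((d1, d2, a1, a2))
    return found

# Hypothetical toy network: developer-artifact and co-change edges.
dev_art = {("alice", "a.c"), ("bob", "a.c"), ("bob", "b.c"), ("carol", "c.c")}
art_art = {frozenset(("b.c", "c.c"))}
print(sorted(triangle_instances(dev_art)))          # [('alice', 'bob', 'a.c')]
print(sorted(square_instances(dev_art, art_art)))   # [('bob', 'carol', 'b.c', 'c.c')]
```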

Abstraction Levels for Source-Code Artifacts

In previous work [14, 13], researchers have tracked concurrent contributions of developers to the same file to derive coordination requirements. We conjecture that this view may be too technical to capture the richness of coordination. Thus, we introduce the two code-artifact abstractions function and feature to infer coordination requirements at different levels of abstraction. Although these abstraction levels are based on heuristics, they have been shown to be reliable in multiple previous studies [51, 55, 43, 44]. Additionally, with regard to network construction, the file abstraction has been shown to produce dense networks, which are known to hinder community detection [10, 43]; finer-grained abstractions thus represent more precise coordination relations among developers. A function-level view has already been shown to be more accurate [44].

For illustration, we show how strongly the choice of abstraction level influences the extraction of coordination requirements, by means of the triangle motif and its manifestation in the source-code excerpt in Figure 3. We show the resulting coordination-requirement networks in Figure 4.

diff --git a/actions.c b/actions.c
index d4ea8ff..ecb9f59 100644
--- a/actions.c
+++ b/actions.c
@@ -1,8 +1,11 @@ Changes by Dev C
void delete (struct DBconnection *DBconn,
          char *command) {
 //...
+ #ifdef PERSIST
+    persist();
+ #endif
}

 #ifdef FEATURE_LOCKING
void lockOnAction(struct DBaction *action) {

@@ -11,12 +11,12 @@ Changes by Dev C
-    // old code
+    // Dev C makes changes here

@@ -13,21 +13,21 @@ Changes by Dev B
-    if (data == NULL) {
+    if (data != NULL) {
      // ...
 }

 // more code

}
 #endif

diff --git a/db.c b/db.c
index 130f79c..d1c26b1 100644
--- a/db.c
+++ b/db.c
@@ -1,6 +1,6 @@ Changes by Dev A
 #ifdef PERSIST
-void persist() {
-    // old code
+void persist(char *filename) {
+    // completely rewritten code
}
 #endif

@@ -7,10 +7,10 @@ Changes by Dev B
void execute(struct DBConn *conn,
          struct DBAction *action) {
-    // old code
+    // code with bugfixes
}

Figure 3. Code example containing two files, four functions, and two features (controlled by #ifdef directives). Lines starting with + or - indicate patch blocks applied by individual developers.

Figure 4. Coordination-requirement networks (excluding coordination edges) extracted from the source code in Figure 3 for each of the abstraction levels file, function, and feature.
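To make the effect concrete, the following Python sketch encodes our reading of Figure 3, that is, which file, function, and feature each developer's patch touches, and lists the developer pairs that form a triangle motif at each level. The mapping is an interpretation of the excerpt for illustration, not data extracted by Codeface, and triangle_pairs is a hypothetical helper.

```python
from itertools import combinations

# Our reading of Figure 3. Dev B's change to execute() lies outside any
# #ifdef block and therefore maps to no feature.
touched = {
    "file": {
        "Dev A": {"db.c"},
        "Dev B": {"actions.c", "db.c"},
        "Dev C": {"actions.c"},
    },
    "function": {
        "Dev A": {"persist"},
        "Dev B": {"lockOnAction", "execute"},
        "Dev C": {"delete", "lockOnAction"},
    },
    "feature": {
        "Dev A": {"PERSIST"},
        "Dev B": {"FEATURE_LOCKING"},
        "Dev C": {"PERSIST", "FEATURE_LOCKING"},
    },
}

def triangle_pairs(dev_to_artifacts):
    """Developer pairs sharing at least one artifact (triangle motif)."""
    pairs = set()
    for d1, d2 in combinations(sorted(dev_to_artifacts), 2):
        if dev_to_artifacts[d1] & dev_to_artifacts[d2]:
            pairs.add((d1, d2))
    return pairs

pairs_by_level = {level: triangle_pairs(m) for level, m in touched.items()}
for level, pairs in pairs_by_level.items():
    print(level, sorted(pairs))
```

Under this reading, the file level pairs Dev A with Dev B (via db.c), whereas only the feature level pairs Dev A with Dev C (via PERSIST); only the pair of Dev B and Dev C appears at all three levels.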

Statistics and Formulas

To analyze the alignment of e-mail-based developer coordination and the actual artifact-based collaboration, we measure the fraction of fulfilled coordination requirements. Given a coordination-requirement network constructed using one type of code artifact a (i.e., file, function, or feature) and a motif m to identify coordination requirements, we define the fraction frac_cr(a, m) of fulfilled coordination requirements as follows:

frac_cr(a, m) = |cr_full(a, m)| / |cr_found(a, m)|, where

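cr_found(a, m) = { c | c is a matched instance of motif m for artifact type a in the current network } and
cr_full(a, m) = { c_f | c_f ∈ cr_found(a, m), c_f is fulfilled }.

An instance is fulfilled if its two developers are connected by a coordination edge, i.e., if they contributed to a common thread on the mailing list.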
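A minimal Python sketch of frac_cr(a, m) follows. It assumes that motif instances are tuples whose first two entries are the two developers and that coordination edges are stored as unordered developer pairs; this representation and the function name are assumptions. How revision ranges without any coordination requirement are handled is not stated here, so the sketch returns None for 0/0.

```python
def frac_cr(instances, dev_dev):
    """Fraction of fulfilled coordination requirements for one artifact type and motif.

    instances: matched motif instances; entries 0 and 1 are the two developers,
               the remaining entries are the artifacts of the instance.
    dev_dev:   coordination edges as frozenset({d1, d2}).
    """
    cr_found = set(instances)
    # An instance is fulfilled if its two developers share a coordination edge.
    cr_full = {c for c in cr_found if frozenset(c[:2]) in dev_dev}
    if not cr_found:
        return None  # 0/0: no coordination requirement in this network
    return len(cr_full) / len(cr_found)

# Hypothetical triangle-motif instances and coordination edges.
found = {("alice", "bob", "a.c"), ("alice", "carol", "b.c"),
         ("bob", "carol", "b.c"), ("alice", "bob", "c.c")}
dev_dev = {frozenset(("alice", "bob"))}
print(frac_cr(found, dev_dev))  # 0.5
```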

Tools

For data extraction, we mainly use the tool Codeface. Based on the Codeface results, we construct and analyze coordination-requirement networks using our network-construction library coronet and a set of self-written R scripts. Our script setup is available in the Downloads section.

Codeface


Codeface is a framework and interactive web frontend for the social and technical analysis of software development projects.

https://siemens.github.io/codeface/

coronet

coronet is a library to construct socio-technical developer networks based on various data sources in a configurable and reproducible way.

https://github.com/se-passau/coronet/

R scripts


We developed a set of R scripts on top of Codeface and coronet for our analysis, which are available in the Downloads section.


Subject Systems

**Table 2.1.** List of subject projects.
Project	Time	# Commits	# E-Mails	# Developers
Apache HTTP	1996–2017	29704	54921	2146
BusyBox	1999–2016	14313	42013	2736
FFmpeg	2000–2017	80605	242295	5998
Git	2005–2017	34898	338500	9246
LLVM	2001–2017	158562	706716	6407
OpenSSL	1998–2016	18143	32659	4786
PostgreSQL	1996–2017	44062	320711	4647
QEMU	2003–2016	46633	430561	7205
U-Boot	1988–2017	44736	319160	7924
Wine	1993–2017	121815	111333	4087

Results

Overview on Subject Systems

**Table 2.2.** Empirical data on fulfilled coordination requirements for all subject projects.
Project	Artifact	# Artifacts	Triangle motif m_▵			Square motif m_□
Project	Artifact	# Artifacts	cr_found	cr_full	frac_cr	cr_found	cr_full	frac_cr
Apache HTTP	file	1366	5710	2818	0.49	316417	134993	0.43
	function	16869	2830	1612	0.57	173364	121281	0.70
	feature	1357	654	322	0.49	7224	4380	0.61
BusyBox	file	1370	2021	907	0.45	181236	69771	0.38
	function	10942	1661	756	0.46	164853	29122	0.18
	feature	2534	670	312	0.47	18384	3407	0.19
FFmpeg	file	3257	42978	18277	0.43	4030726	1886168	0.47
	function	33078	19303	8658	0.45	1154399	438927	0.38
	feature	2079	5742	2369	0.41	110	51972	0.47
Git	file	1740	13690	4798	0.35	240103	77443	0.32
	function	11937	7151	2473	0.35	128572	39928	0.31
	feature	175	185	119	0.64	209	177	0.85
LLVM	file	5619	119428	42219	0.35	9974743	3832917	0.38
	function	50201	17647	8457	0.48	7912	571735	0.57
	feature	937	8495	2432	0.29	26508	9420	0.36
OpenSSL	file	1444	5295	1922	0.36	712756	307113	0.43
	function	12941	3044	1195	0.39	183091	88851	0.49
	feature	1132	3445	1134	0.33	153236	68464	0.45
PostgreSQL	file	2192	21798	16671	0.76	5147123	4210	0.82
	function	34960	14237	10797	0.76	3193322	2551078	0.80
	feature	1061	1764	1339	0.76	19753	13158	0.67
QEMU	file	3227	36467	23096	0.63	3466452	2486040	0.72
	function	57955	15394	10365	0.67	1166730	810775	0.69
	feature	1753	13892	5433	0.39	156289	59658	0.38
U-Boot	file	8257	11096	5755	0.52	307816	185906	0.60
	function	63067	4664	2680	0.57	162257	415	0.62
	feature	7065	20711	6931	0.33	423328	147434	0.35
Wine	file	5568	45088	19	0.44	1463843	817654	0.56
	function	164073	23665	12506	0.53	1431211	818966	0.57
	feature	1687	6348	2328	0.37	32501	12884	0.40

**Table 2.3.** Commits per developer in subject projects (mean, standard deviation, median, .8 quantile, maximum).
Project	# Commits	# Commits per developer
Project	# Commits	Avg. ± Std. dev.	Median	.8 quantile	Max.
Apache HTTP	29671	237.37 ± 417.48	72.00	360.00	2452
BusyBox	14259	53.01 ± 433.51	1.00	5.00	6495
FFmpeg	80535	52.84 ± 560.22	2.00	9.00	19516
Git	34872	23.50 ± 159.14	2.00	10.00	3989
LLVM	158519	183.47 ± 1023.25	17.00	104.00	26580
OpenSSL	18077	62.33 ± 388.51	1.00	5.00	4535
PostgreSQL	44010	1047.86 ± 2821.8	166.50	778.60	13327
QEMU	46578	43.82 ± 188.6	3.00	18.00	2505
U-Boot	44680	27.48 ± 146.9	3.00	18.00	4154
Wine	121731	79.82 ± 532.83	2.00	17.20	14089

**Table 2.4.** Commits per artifact in subject projects (mean, standard deviation, median, .8 quantile, maximum).
Project	Artifact	# Commits per artifact
Project	Artifact	Avg. ± Std. dev.	Median	.8 quantile	Max.
Apache HTTP	file	16.37 ± 43.77	1.00	17.00	469
	function	3.08 ± 40.53	1.00	3.00	5227
	feature	8.18 ± 321.1	1.00	3.00	21200
BusyBox	file	20.81 ± 39.83	8.00	29.00	645
	function	4.96 ± 54.18	2.00	6.00	5620
	feature	6.58 ± 201.26	2.00	5.00	17700
FFmpeg	file	26.81 ± 69.74	9.00	33.00	1873
	function	4.82 ± 114.61	2.00	5.00	20796
	feature	24.07 ± 1323.09	2.00	5.00	97296
Git	file	18.24 ± 35.96	7.00	24.00	518
	function	4.59 ± 92.82	2.00	5.00	10127
	feature	85.7 ± 1796.61	1.00	3.00	39118
LLVM	file	31.2 ± 98.95	6.00	32.00	4182
	function	4.55 ± 323.23	2.00	4.00	72385
	feature	57.66 ± 3108.01	1.00	3.00	181534
OpenSSL	file	20.29 ± 32.48	11.00	28.00	392
	function	4.25 ± 40.11	2.00	5.00	4521
	feature	8.24 ± 188.34	2.00	4.00	13572
PostgreSQL	file	38.37 ± 73.76	13.00	48.00	862
	function	5.25 ± 76.96	2.00	6.00	14305
	feature	18.61 ± 740.11	2.00	5.00	43704
QEMU	file	22.26 ± 55.97	7.00	27.00	1401
	function	3.2 ± 79.15	2.00	4.00	19012
	feature	16.82 ± 843.12	1.00	4.00	66572
U-Boot	file	6.91 ± 11.13	4.00	9.00	238
	function	2.11 ± 62.79	1.00	2.00	15762
	feature	5.12 ± 257.7	1.00	3.00	41286
Wine	file	33.3 ± 68.9	12.00	45.00	1824
	function	3.65 ± 111.78	2.00	4.00	45243
	feature	41.16 ± 2579.56	1.00	4.00	192364

Hypothesis RH₁

After downloading and extracting the statistical results, the raw results for this hypothesis can be found in the folders stats/hypo1-collect/ (raw data, empirical and null model) and stats/hypo1/empirical/ (statistical tests). Both input and output data are available, along with all plots presented in this section.

Figure 5. Fraction of fulfilled coordination requirements frac_cr(a, m) per motif m and artifact a as violin plot

Figure 6. Fraction of fulfilled coordination requirements frac_cr(a, m) per motif m and artifact a as violin plot

**Table 3.** Paired Wilcoxon signed-rank test regarding Hypothesis RH₁, data paired by motif m and artifact a (H₀: frac_cr(a, m) ≤ frac_cr^null(a, m), N = 10 for all tests).
Paired Wilcoxon signed-rank test
H₀:	frac_cr(⏺, m_▵) ≤ frac_cr^null(⏺, m_▵)	frac_cr(⏺, m_□) ≤ frac_cr^null(⏺, m_□)
frac_cr(file, ⏺)	W ≈ 54, p < 0.01*, δ = 0.42	W ≈ 48, p ≈ 0.02*, δ = 0.12
frac_cr(function, ⏺)	W ≈ 53, p < 0.01*, δ = 0.48	W ≈ 53, p < 0.01*, δ = 0.24
frac_cr(feature, ⏺)	W ≈ 54, p < 0.01*, δ = 0.3	W ≈ 51, p < 0.01*, δ = 0.26
W = test value W, p = p-value, * for p < 0.05, δ = Cliff's δ effect size
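For reading Table 3, the following Python sketch computes the signed-rank sum W⁺ of a paired Wilcoxon test and Cliff's δ. It assumes that W denotes the sum of ranks of positive differences, which the table does not state, and it omits p-values, for which a statistics package would be used in practice. Under this assumption, W⁺ is at most N(N + 1)/2 = 55 for N = 10 pairs, so W ≈ 54 means that almost all differences are positive.

```python
def wilcoxon_w_plus(x, y):
    """W+ of the paired Wilcoxon signed-rank test: sum of ranks of positive differences.

    Zero differences are dropped; tied absolute differences get average ranks.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return sum(r for r, d in zip(ranks, diffs) if d > 0)

def cliffs_delta(x, y):
    """Cliff's delta: P(X > Y) - P(X < Y) over all pairs of observations."""
    greater = sum(1 for a in x for b in y if a > b)
    less = sum(1 for a in x for b in y if a < b)
    return (greater - less) / (len(x) * len(y))

# Hypothetical toy values, not taken from the study.
empirical = [3, 4, 5, 6]
null_model = [1, 2, 3, 4]
print(wilcoxon_w_plus(empirical, null_model), cliffs_delta(empirical, null_model))  # 10.0 0.75
```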

Hypothesis RH₁: Accepted. The comparison of the empirical data on the fulfillment of coordination requirements (for both types of motifs and across all artifacts) to the respective values of the null model shows that the identified coordination requirements are not fulfilled merely by chance.

Sensitivity Analysis following Kossinets (2006)

We performed a sensitivity analysis following Kossinets (2006) to investigate the stability of our results. In detail, we used the simulation algorithm "BSPC" (boundary specification problem for contexts) to simulate the absence of coordination effort from the mailing list (coordination may instead take place on other channels, such as face-to-face meetings or chats) and, thus, incomplete information sources (i.e., mailing-list data), similar to the null models (see Section 3.2.4). The algorithm removes a defined number of random e-mail threads before constructing analyzable coordination-requirement networks and calculates the metrics as previously defined. To this end, for the projects BusyBox, Git, LLVM, and OpenSSL, we randomly removed 10, 20, …, 90 percent of the e-mail threads, performed 25 iterations for better randomization, calculated mean values, and analyzed the final results. In short, we found for the selected projects that the removal of 10% of all e-mail threads produces a relative error of about 15% for frac_cr, across all revision ranges and for all motifs and source-code artifacts. With 20% of all e-mail threads randomly removed, the metric exhibits a relative error of about 25%, on average. These results indicate that the absence of crucial developers may have an immediate and extensive effect on most projects and emphasize that any coordination effort is important to fulfill coordination requirements.

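Sensitivity-analysis plots for BusyBox, Git, LLVM, and OpenSSL, for the triangle motif m_▵ and the square motif m_□, are provided in network-coordination-requirements_plots-sensitivity.zip (see Downloads).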
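The thread-removal idea can be illustrated with the following Python sketch. It is not the BSPC implementation used in the study; the function names, the reduction of a coordination requirement to a developer pair, and the aggregation (mean relative error over iterations) are assumptions.

```python
import random
from itertools import combinations

def fulfilled_fraction(pairs, threads):
    """Share of developer pairs (coordination requirements) that co-post in some thread."""
    coordinated = set()
    for posters in threads:
        coordinated.update(frozenset(p) for p in combinations(sorted(set(posters)), 2))
    if not pairs:
        return None
    return sum(1 for p in pairs if frozenset(p) in coordinated) / len(pairs)

def mean_relative_error(pairs, threads, removal_share, iterations=25, seed=0):
    """Randomly drop a share of threads and average the relative error of the fraction."""
    rng = random.Random(seed)
    baseline = fulfilled_fraction(pairs, threads)
    if not baseline:
        raise ValueError("baseline fraction must be positive")
    n_remove = round(removal_share * len(threads))
    errors = []
    for _ in range(iterations):
        kept = rng.sample(threads, len(threads) - n_remove)  # random subset of threads
        value = fulfilled_fraction(pairs, kept)
        errors.append(abs(value - baseline) / baseline)
    return sum(errors) / len(errors)

# Hypothetical toy data: three coordination requirements, three threads.
example_pairs = [("a", "b"), ("b", "c"), ("a", "c")]
example_threads = [["a", "b"], ["b", "c"], ["x", "y"]]
for share in (0.0, 1 / 3, 2 / 3):
    print(share, mean_relative_error(example_pairs, example_threads, share))
```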

Hypothesis RH₂

After downloading and extracting the statistical results, the raw results for this hypothesis can be found in the folder stats/hypo2/empirical/. Both input and output data are available, along with all plots presented in this section.

Figure 7. Fraction of fulfilled coordination requirements frac_cr(a, m) per motif m and artifact a as violin plot

Figure 8. Fraction of fulfilled coordination requirements frac_cr(a, m) per motif m and artifact a as violin plot

**Table 4.** Results regarding Hypothesis RH₂: Paired Wilcoxon signed-rank test for frac_cr(a, m_▵) and frac_cr(a, m_□), paired by subject project and artifact a.
Paired Wilcoxon signed-rank test
H₀:	frac_cr(a, m_▵) ≤ frac_cr(a, m_□)
	frac_cr(a, m_▵)
frac_cr(a, m_□)	N = 30, W = 325, p ≈ 0.9721167
N = number of pairs, W = test value W, p = p-value

Hypothesis RH₂: Rejected. The comparison of empirical data on the fulfillment of coordination requirements for the triangle motif m_▵ and the corresponding data for the square motif m_□ shows that the fulfillment of the identified coordination requirements is not higher for the triangle motif.

Hypothesis RH₃

After downloading and extracting the statistical results, the raw results for this hypothesis can be found in the folder stats/hypo3/empirical/. Both input and output data are available, along with all plots presented in this section. In particular, all data regarding unique coordination requirements per artifact abstraction level are available for all subject systems in this folder as well (file hypo3-setdiffs.txt).

Figure 9. Fraction of fulfilled coordination requirements frac_cr(a, m) per motif m and artifact a as violin plot

**Table 5.** Results regarding Hypothesis RH₃: Paired Wilcoxon signed-rank test for the metrics frac_cr(file, m), frac_cr(function, m), and frac_cr(feature, m), paired by subject project and motif m.
Paired Wilcoxon signed-rank test
H₀:	frac_cr(file, m) ≥ frac_cr(⏺, m)	frac_cr(function, m) ≥ frac_cr(⏺, m)
frac_cr(function, ⏺)	N = 10, W = 152, p ≈ 0.12	—
frac_cr(feature, ⏺)	N = 10, W = 64, p ≈ 0.98	N = 10, W = 52, p ≈ 0.98
N = number of pairs, W = test value W, p = p-value

**Table 6.** Number of unique coordination requirements per artifact abstraction level for Apache HTTP and the triangle motif m_▵, identified only for the abstraction level of the column (subcolumns cr_found), but not for the abstraction level of the row. The columns cr_full indicate how many of these unique coordination requirements are fulfilled.
not identified by …	File		Function		Feature
not identified by …	cr_found	cr_full	cr_found	cr_full	cr_found	cr_full
File	—	—	0	0	91	30
Function	1074	438	—	—	183	72
Feature	1718	753	736	357	—	—
Combined	982	396	0	0	91	30
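Table 6 compares sets of coordination requirements across abstraction levels. Since the artifacts differ between levels, the Python sketch below assumes that a coordination requirement is identified across levels by its unordered developer pair; this identity, the data layout, and the function name are assumptions. Because fulfillment depends only on the developer pair, a pair is either fulfilled or not at every level.

```python
def unique_requirements(by_level, column, rows):
    """Count coordination requirements found at `column` but at none of the `rows` levels.

    by_level maps an abstraction level to {developer_pair: is_fulfilled}.
    Returns (cr_found, cr_full) as in the columns of Table 6.
    """
    excluded = set()
    for level in rows:
        excluded |= set(by_level[level])
    unique = {pair: ok for pair, ok in by_level[column].items() if pair not in excluded}
    return len(unique), sum(unique.values())

# Hypothetical toy data: developer pairs per level and whether they are fulfilled.
by_level = {
    "file": {("A", "B"): True, ("B", "C"): False},
    "function": {("B", "C"): False},
    "feature": {("A", "C"): True, ("B", "C"): False},
}
print(unique_requirements(by_level, "file", ["function"]))             # (1, 1)
print(unique_requirements(by_level, "feature", ["file", "function"]))  # (1, 1), a "Combined" cell
```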

Hypothesis RH₃: Rejected. The hypothesis that coordination requirements at the feature level are fulfilled significantly more often than those at the other artifact abstraction levels is not supported by our data.

Hypothesis RH₄

After downloading and extracting the statistical results, the data of the statistical tests and the fractal dimension D can be found in the folder stats/hypo4/empirical/. The history plots for all subject systems and motifs are available separately in the Downloads section; after extracting the downloaded file, see the folder history-plots/.

Figure 10. Fraction of fulfilled coordination requirements frac_cr(a, m_▵) (triangle motif) for all subject projects (only revision ranges with sent e-mails shown). Panels: Apache HTTP, BusyBox, FFmpeg, Git, LLVM, OpenSSL, PostgreSQL, QEMU, U-Boot, Wine.

Figure 11. Fraction of fulfilled coordination requirements frac_cr(a, m_□) (square motif) for all subject projects (only revision ranges with sent e-mails shown). Panels: Apache HTTP, BusyBox, FFmpeg, Git, LLVM, OpenSSL, PostgreSQL, QEMU, U-Boot, Wine.

**Table 7.** Fractal-dimension values D for all subject projects and motifs, sorted by D_{m_▵} and grouped by similar values.
The groups of values are derived in combination with the plots in Figure 10 and Figure 11.
Project	D_{m_▵}	D_{m_□}
QEMU	1.39	1.43
U-Boot	1.40	1.46
FFmpeg	1.49	1.55
LLVM	1.51	1.54
PostgreSQL	1.51	1.54
Wine	1.57	1.64
BusyBox	1.59	1.60
Git	1.59	1.58
Apache HTTP	1.65	1.71
OpenSSL	1.67	1.69

Hypothesis RH₄: Rejected. The hypothesis that the fraction of fulfilled coordination requirements frac_cr(a, m) improves over time cannot be shown across the complete set of subject projects. Instead, we found several different patterns in the organizational evolution, indicating very project-specific reasons for fulfilled and unfulfilled coordination requirements, such as a change of maintainers or even an attempted project takeover.

Downloads

Note on raw-data availability

For reasons of data privacy and data size, we cannot distribute the raw data that we gathered for our subject systems, but only processed data and results. Please use the tools Codeface and coronet to produce a data set yourself. More information on the selected subject systems, the analyzed time ranges, and all further details needed can be found in our subject-system details (subject-systems-details.ods) listed below.

Downloadable assets:

Subject-system details: subject-systems-details.ods
Codeface configuration files: network-coordination-requirements_configurations.zip
Analysis scripts: network-coordination-requirements_scripts.zip
Statistical results: network-coordination-requirements_stats.zip
History plots (Hypothesis RH₄): network-coordination-requirements_plots.zip
Plots for sensitivity analysis (Hypothesis RH₁): network-coordination-requirements_plots-sensitivity.zip

Analysis Scripts

To reproduce the data for an individual subject project, the data first need to be processed by Codeface and codeface-extraction; please use the configuration files provided above. Afterwards, the output data can be processed using our analysis scripts. The main script of our analysis is analysis.R, which is used in every stage of the analysis; its command-line interface is shown in Figure 12.

usage: analysis.R [-h] [-s SELECTION_PROCESS] [-w SPLIT_WINDOW]
                  [--sliding-window] [-d CODEFACE_DATA]
                  [--null-model NULL_MODEL] [--no-remove-core]
                  [--complete-rerun] [--bootstrap-packrat]
                  [--loglevel LOGLEVEL]
                  script casestudy artifact artifact_relation

positional arguments:
  script                The subscript to execute
  casestudy             Casestudy name as in Codeface-data folder name
  artifact              Artifact type to use in the analysis
  artifact_relation     Artifact-relation type to use in the analysis

optional arguments:
  -h, --help            show this help message and exit
  -s SELECTION_PROCESS, --selection-process SELECTION_PROCESS
                        The selection process for revision windows [default
                        "releases"]
  -w SPLIT_WINDOW, --split-window SPLIT_WINDOW
                        The time-window length used for data splitting (e.g.,
                        "3 months" and "2 weeks") [default "3 months"]
  --sliding-window      Do you want overlapping time windows (i.e., sliding-
                        window approach)? [default "False"]
  -d CODEFACE_DATA, --codeface-data CODEFACE_DATA
                        Path to Codeface data [default "/local/hunsen/projects
                        /codeface-data/"]
  --null-model NULL_MODEL
                        The specific null model to use
                        ('rewire'|'random'|'errg') [default "rewire"]
  --no-remove-core      Remove core authors from null model? [default "False"]
  --complete-rerun      Do a complete re-run of the analysis? [default
                        "False"]
  --bootstrap-packrat   Bootstrap packrat library?
  --loglevel LOGLEVEL   Log level

Figure 12. Command-line interface of analysis.R

Overall, our analysis consists of five stages, which need to be run consecutively and (mostly) for one subject system at a time; the data output of each stage is cached:

data: run the empirical analysis by constructing coordination-requirement networks and searching for motifs
null: construct null-model networks and search for motifs
sensitivity: construct sensitivity model following Kossinets (2006) and search for motifs
stats: run all statistical tests and construct corresponding plots (independent of configured subject system)
history-plots: construct the history plots for Hypothesis RH₄

Based on the configuration files provided by us, the following parameters should be given to each analysis stage (see the command-line interface above):

script: a stage as described above
casestudy: subject-system name as in the Codeface configuration
artifact: the artifact abstraction level to use
artifact_relation: must be set to "cochange" to reproduce the study
-d CODEFACE_DATA: set to the path where the Codeface output is placed
-w "3 months": use three-month revision ranges
-s "threemonth": the selection process (an abstraction level in Codeface to structure analyses)
--null-model="rewire": use the null model based on rewiring
--no-remove-core: do not remove core developers in the null-model analysis
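The options -w "3 months" and --sliding-window control how the history is split into revision ranges. The following Python sketch shows one way to split timestamps into consecutive or overlapping three-month windows; anchoring at the first timestamp, the step size for sliding windows, and all names are assumptions rather than coronet's documented behavior.

```python
import calendar
from datetime import datetime

def add_months(moment, months):
    """Shift a datetime by whole calendar months, clamping the day if needed."""
    years, month_index = divmod(moment.month - 1 + months, 12)
    year, month = moment.year + years, month_index + 1
    day = min(moment.day, calendar.monthrange(year, month)[1])
    return moment.replace(year=year, month=month, day=day)

def split_windows(timestamps, window_months=3, step_months=None):
    """Split timestamps into [start, end) windows of `window_months` months.

    A step equal to the window length gives consecutive, non-overlapping windows;
    a smaller step gives overlapping (sliding) windows.
    """
    step = step_months or window_months
    stamps = sorted(timestamps)
    if not stamps:
        return []
    origin, windows, i = stamps[0], [], 0
    while True:
        start = add_months(origin, i * step)  # computed from origin to avoid day drift
        if start > stamps[-1]:
            break
        end = add_months(start, window_months)
        windows.append((start, end, [t for t in stamps if start <= t < end]))
        i += 1
    return windows

# Hypothetical commit timestamps.
example_stamps = [datetime(2017, 1, 15), datetime(2017, 3, 1),
                  datetime(2017, 4, 20), datetime(2017, 7, 1)]
for start, end, members in split_windows(example_stamps):
    print(start.date(), end.date(), len(members))
```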

As a consequence, an exemplary call to gather feature-based empirical data for Apache HTTP looks like this:

Rscript analysis.R data apache-http feature cochange -d "/path/to/codeface-data/" -s "threemonth" -w "3 months" --loglevel "DEBUG"
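Putting the stages and parameters together, the following Python sketch assembles the analysis.R calls for one subject project and, by default, only prints them (dry run). The driver and its function names are hypothetical; it uses only the positional arguments and options listed above. Since the stats stage is independent of the configured subject system, it may only need to run once overall.

```python
import shlex
import subprocess

# Stages in the order in which they need to be run.
STAGES = ["data", "null", "sensitivity", "stats", "history-plots"]

def build_command(stage, casestudy, artifact, codeface_data):
    """Assemble one analysis.R call with the parameters listed above."""
    return [
        "Rscript", "analysis.R", stage, casestudy, artifact, "cochange",
        "-d", codeface_data,
        "-s", "threemonth",
        "-w", "3 months",
        "--null-model=rewire",
        "--no-remove-core",
    ]

def run_all(casestudy, artifacts, codeface_data, dry_run=True):
    """Run (or just print) all stages for one project, stage by stage."""
    commands = [build_command(stage, casestudy, artifact, codeface_data)
                for stage in STAGES for artifact in artifacts]
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing stage
    return commands

if __name__ == "__main__":
    run_all("apache-http", ["file", "function", "feature"], "/path/to/codeface-data/")
```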

Contact

If you have any questions regarding this paper or any other related project, please do not hesitate to contact us:

Claus Hunsen (University of Passau, Passau, Germany)
Janet Siegmund (Chemnitz University of Technology, Chemnitz, Germany)
Sven Apel (Saarland University, Saarbrücken, Germany)