Subject Projects

Our experiments were run on the following open-source C/C++ projects:

Project  Domain         LOC     Commits  Authors  Revision
bison    Parser         591687  26281    253      849ba01
brotli   Compression    34833   1030     87       ce222e3
curl     Web            195685  26949    871      1803be5
grep     UNIX utils     619598  23794    262      7051705
gzip     Compression    622480  22193    242      7d3a3c0
htop     Visualization  25775   2243     156      44d1200
libpng   Image codec    74571   4098     58       a37d483
libssh   Library        95235   5126     115      cd15043
libtiff  Image codec    88561   3470     45       1373f8d
lrzip    Compression    19215   935      25       465afe8
lz4      Compression    18813   2541     130      bdc9d3b
opus     Audio codec    70267   4077     107      7b05f44
xz       Compression    38441   1298     22       e7da44d

The metrics for bison, grep, and gzip include submodules.


Central Code

These plots relate the number of insertions of each commit to its node degree in the commit interaction graph. The margins show kernel density estimates of both variables. The horizontal and vertical lines mark the 80th or 20th percentile of their respective variable, so small commits that modify central code fall into the upper-left quadrant. The red crosses highlight commit 5e4b182 of htop and commit 348e694 of opus, which are discussed as examples in Section 5 of the paper.
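
The quadrant rule can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code behind the plots. It assumes that the 20th percentile applies to insertions and the 80th percentile to node degree (our reading of the "upper-left quadrant" remark), and it uses a simple nearest-rank percentile. The function names and the commit identifiers `c1` to `c5` are hypothetical.

```python
def percentile(values, p):
    """Nearest-rank percentile: the smallest value such that at least
    p percent of the data lie at or below it (p is an integer 1..100)."""
    ordered = sorted(values)
    rank = max(1, (p * len(ordered) + 99) // 100)  # integer ceil(p * n / 100)
    return ordered[rank - 1]


def small_central_commits(commits, size_pct=20, degree_pct=80):
    """Return commits in the assumed upper-left quadrant.

    commits maps a commit id to (insertions, node_degree).
    Assumption: 'small' means insertions at or below the size_pct
    percentile, 'central' means degree at or above the degree_pct one.
    """
    insertion_cut = percentile([ins for ins, _ in commits.values()], size_pct)
    degree_cut = percentile([deg for _, deg in commits.values()], degree_pct)
    return sorted(
        commit
        for commit, (ins, deg) in commits.items()
        if ins <= insertion_cut and deg >= degree_cut
    )


# Made-up data: commit id -> (insertions, node degree)
commits = {
    "c1": (2, 40),
    "c2": (150, 35),
    "c3": (10, 3),
    "c4": (30, 5),
    "c5": (75, 8),
}
print(small_central_commits(commits))  # ['c1']
```

In this toy data set, only `c1` is both small (2 insertions) and central (degree 40), so it is the only commit that would land in the upper-left quadrant.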

Plots (one per project): bison, brotli, curl, grep, gzip, htop, libpng, libssh, libtiff, lrzip, lz4, opus, xz


Author Interactions

These plots relate the number of commits of each author to the number of other authors they interact with. The margins show kernel density estimates of both variables. In most projects, a small number of authors is responsible for most commits. However, there are also authors with only a few commits who still interact with many other authors.

Plots (one per project): bison, brotli, curl, grep, gzip, htop, libpng, libssh, libtiff, lrzip, lz4, opus, xz


The following plots visualize the differences between author interactions based on commit interactions (CI) and file-based author interactions. Each column represents an individual author. Orange boxes depict links to other authors that the file-based approach produces but CI does not infer, because no data flow connects the two authors. Blue boxes depict additional links that CI discovers through data-flow information and that the file-based approach misses.
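
The orange and blue boxes correspond to two set differences over author links. The following sketch illustrates this under the assumption that links are undirected author pairs; the helper names and the authors `alice`, `bob`, and `carol` are hypothetical, and the link sets are made-up data rather than results from the subject projects.

```python
from collections import Counter


def link(a, b):
    """An undirected link between two authors (assumed representation)."""
    return frozenset((a, b))


def compare_links(ci_links, baseline_links):
    """Split the disagreement between the two approaches.

    Returns (baseline_only, ci_only):
      baseline_only -> links found only by the baseline (orange boxes)
      ci_only       -> links found only by CI (blue boxes)
    """
    return baseline_links - ci_links, ci_links - baseline_links


def links_per_author(links):
    """Count, per author, how many of the given links involve them.
    Each author corresponds to one column in the plots."""
    return Counter(author for pair in links for author in pair)


# Made-up link sets for three authors
ci_links = {link("alice", "bob"), link("alice", "carol")}
file_links = {link("alice", "bob"), link("bob", "carol")}

baseline_only, ci_only = compare_links(ci_links, file_links)
orange = links_per_author(baseline_only)  # bob: 1, carol: 1
blue = links_per_author(ci_only)          # alice: 1, carol: 1
```

The same comparison works for any baseline that yields author pairs, so a call-graph-based link set could be passed in place of `file_links`.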

Plots (one per project): bison, brotli, curl, grep, gzip, htop, libpng, libssh, libtiff, lrzip, lz4, opus, xz


The following plots show the differences between CI-based and call-graph-based author interactions. Again, each column represents an individual author. Orange boxes depict links to other authors that the call-graph-based approach produces but the CI-based approach does not, because no data flow connects the two authors. Blue boxes depict additional links that CI discovers through data-flow information and that the call-graph-based approach misses.

Plots (one per project): bison, brotli, curl, grep, gzip, htop, libpng, libssh, libtiff, lrzip, lz4, opus, xz


Commit–Author Interactions

The following plot shows, for each commit, its number of interacting authors, normalized by the number of distinct authors in the project who have at least one commit participating in a commit interaction. That is, it visualizes how many other authors are affected by a commit.
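
The normalization can be illustrated with a short sketch. It makes two assumptions that the text does not spell out: commit interactions are treated as undirected pairs, and a commit's own author is not counted among its interacting authors. The function name, commit ids, and author names are hypothetical.

```python
def normalized_author_reach(commit_author, interactions):
    """Fraction of participating authors that each commit interacts with.

    commit_author maps a commit id to its author.
    interactions is an iterable of (commit_a, commit_b) pairs.
    Assumptions: pairs are undirected, and the commit's own author
    is excluded from its set of interacting authors.
    """
    participating = set()  # authors with >= 1 commit in any interaction
    reached = {}           # commit id -> set of other authors it touches
    for a, b in interactions:
        author_a, author_b = commit_author[a], commit_author[b]
        participating.update((author_a, author_b))
        reached.setdefault(a, set())
        reached.setdefault(b, set())
        if author_a != author_b:
            reached[a].add(author_b)
            reached[b].add(author_a)
    total = len(participating)
    return {commit: len(authors) / total for commit, authors in reached.items()}


# Made-up data: c5 takes part in no interaction, so its author
# (dave) does not count toward the normalization base.
commit_author = {"c1": "alice", "c2": "bob", "c3": "carol",
                 "c4": "alice", "c5": "dave"}
interactions = [("c1", "c2"), ("c1", "c3"), ("c2", "c4")]

reach = normalized_author_reach(commit_author, interactions)
```

Here three authors participate in interactions, so `c1`, which interacts with commits by `bob` and `carol`, gets a value of 2/3, while `c2`, `c3`, and `c4` each get 1/3.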

Plot: commit–author interactions