Subject Projects
Our experiments were run on the following open-source C/C++ projects:
Project | Domain | LOC | Commits | Authors | Revision |
---|---|---|---|---|---|
bison | Parser | 591687 | 26281 | 253 | 849ba01 |
brotli | Compression | 34833 | 1030 | 87 | ce222e3 |
curl | Web | 195685 | 26949 | 871 | 1803be5 |
grep | UNIX utils | 619598 | 23794 | 262 | 7051705 |
gzip | Compression | 622480 | 22193 | 242 | 7d3a3c0 |
htop | Visualization | 25775 | 2243 | 156 | 44d1200 |
libpng | Image codec | 74571 | 4098 | 58 | a37d483 |
libssh | Library | 95235 | 5126 | 115 | cd15043 |
libtiff | Image codec | 88561 | 3470 | 45 | 1373f8d |
lrzip | Compression | 19215 | 935 | 25 | 465afe8 |
lz4 | Compression | 18813 | 2541 | 130 | bdc9d3b |
opus | Audio codec | 70267 | 4077 | 107 | 7b05f44 |
xz | Compression | 38441 | 1298 | 22 | e7da44d |
The metrics for bison, grep, and gzip include submodules.
Central Code
These plots relate the number of insertions of each commit to its node degree in the commit interaction graph. The margins show kernel density estimates for the variables. The horizontal and vertical lines mark the 80th or 20th percentile of their respective variable, so small commits that modify central code fall into the upper-left quadrant. The red crosses highlight commit 5e4b182 of htop and commit 348e694 of opus, which are discussed as examples in Section 5 of the paper.
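The upper-left quadrant can be sketched in a few lines of Python. This is only an illustration, not the analysis code behind the plots: it assumes that the vertical line sits at the 20th percentile of insertions and the horizontal line at the 80th percentile of node degree, uses a nearest-rank percentile, and works on made-up data. The names `percentile` and `small_central_commits` are hypothetical.

```python
import math


def percentile(values, q):
    """Nearest-rank percentile: the smallest value such that at least
    q percent of the data is less than or equal to it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def small_central_commits(commits):
    """Return the commits in the upper-left quadrant.

    commits maps a commit hash to (insertions, node_degree).
    Assumption: "small" means at or below the 20th percentile of
    insertions, "central" means at or above the 80th percentile of
    node degree.
    """
    insertion_cut = percentile([ins for ins, _ in commits.values()], 20)
    degree_cut = percentile([deg for _, deg in commits.values()], 80)
    return sorted(
        commit
        for commit, (ins, deg) in commits.items()
        if ins <= insertion_cut and deg >= degree_cut
    )


# Made-up example data, not taken from any of the subject projects.
example = {
    "c1": (2, 40),    # few insertions, high degree
    "c2": (150, 35),  # many insertions, high degree
    "c3": (10, 3),
    "c4": (30, 5),
    "c5": (60, 8),
}
print(small_central_commits(example))
```

With this toy input, only `c1` is both small and central; `c2` touches central code as well but is a large commit.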
Author Interactions
These plots relate the number of commits of each author to the number of other authors they interact with. The margins show kernel density estimates for the variables. Most projects have a small number of authors who are responsible for most commits. However, there are also authors with only a few commits who still interact with many other authors.
The following plots visualize the differences between CI-based and file-based author interactions. Each column represents an individual author. Orange boxes depict links to other authors that are produced by the file-based approach but are not inferred by CI, because there is no data flow indicating a connection between the two authors. Blue boxes depict additional links that CI discovers through data-flow information and that a file-based approach does not find.
Here, we can see the differences between CI-based and call-graph-based author interactions. Again, each column represents an individual author. Orange boxes depict links to other authors that are produced by the call-graph-based approach but not by the CI-based approach, because there is no data flow indicating a connection between the two authors. Blue boxes depict additional links that CI discovers through data-flow information and that a call-graph-based approach does not find.
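In both comparisons, the orange and blue boxes amount to set differences between the author links of two approaches. The following sketch shows the idea under two assumptions: author links are treated as undirected pairs, and each approach has already produced its list of links. The function names and the author names are hypothetical.

```python
def undirected(links):
    """Normalize (author_a, author_b) pairs so that direction is ignored
    and self-links are dropped."""
    return {frozenset(pair) for pair in links if pair[0] != pair[1]}


def compare_links(ci_links, baseline_links):
    """Split author links into those found only by the baseline
    (file-based or call-graph-based) and those found only by CI."""
    ci = undirected(ci_links)
    baseline = undirected(baseline_links)
    return {
        "baseline_only": baseline - ci,  # orange boxes
        "ci_only": ci - baseline,        # blue boxes
    }


# Made-up links between three hypothetical authors.
ci_links = [("alice", "bob"), ("bob", "carol")]
file_links = [("bob", "alice"), ("alice", "carol")]

diff = compare_links(ci_links, file_links)
print(diff["baseline_only"])  # the alice/carol link
print(diff["ci_only"])        # the bob/carol link
```

The alice/bob link appears in both inputs, so it shows up in neither difference.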
Commit–Author Interactions
The following plot shows, for each commit, its number of interacting authors normalized by the number of distinct authors per project that have at least one commit participating in a commit interaction. That is, it visualizes how many other authors are affected by a commit.
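The normalization can be illustrated with a small Python sketch. It assumes that commit interactions are given as pairs of commits, that a commit's own author does not count among its interacting authors, and that the denominator is the number of distinct authors with at least one commit in any interaction. The function name and the data are hypothetical.

```python
def normalized_author_reach(interactions, author_of):
    """Map each interacting commit to the fraction of participating
    authors it interacts with.

    interactions: iterable of (commit_a, commit_b) pairs.
    author_of: dict mapping a commit to its author.
    """
    neighbours = {}
    for a, b in interactions:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    # Authors with at least one commit that takes part in an interaction.
    participating = {author_of[commit] for commit in neighbours}

    return {
        commit: len({author_of[n] for n in others} - {author_of[commit]})
        / len(participating)
        for commit, others in neighbours.items()
    }


# Made-up data: four commits by three hypothetical authors.
author_of = {"c1": "alice", "c2": "bob", "c3": "carol", "c4": "alice"}
interactions = [("c1", "c2"), ("c1", "c3"), ("c4", "c2")]

reach = normalized_author_reach(interactions, author_of)
print(reach["c1"])  # c1 interacts with bob and carol, 2 of 3 authors
```

In this toy example, `c2` interacts with two commits, but both belong to alice, so it reaches only one of the three participating authors.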