Replication evaluation, details and sources
Modification and developer metrics at the function level:
Metrics for the study of the evolution of a software project
by Gregorio Robles (1), Israel Herraiz (2), Daniel M. Germán (3) and Daniel Izquierdo-Cortázar (1)
(1) Universidad Rey Juan Carlos (Madrid, Spain); (2) Universidad Politécnica de Madrid (Madrid, Spain); (3) University of Victoria (Victoria, Canada)
Based on the criteria proposed in "On the reproducibility of empirical software engineering studies based on data retrieved from development repositories" (Open Access - Empirical Software Engineering, Volume 17, Numbers 1-2, 75-89), the attributes of this study are given in the following table:
Details
Data Source
-
Identification:
-
Description:
- GNOME repository: detailed description at http://git-scm.com. Note that at the time of the analysis, we used the GNOME Subversion repository (see Study Parameters). That repository had previously been migrated from CVS in January 2007, preserving the history of commits and committers.
- Apache repository: detailed description at http://subversion.apache.org/.
-
Availability: Public.
-
Persistence: Yes.
-
Identification: Carnarvon
-
Description:
- Carnarvon (http://carnarvon.tigris.org/) analyses the age of a software system on a per-line basis and extracts figures and indexes that make it possible to identify how "old" the software is, how much it has been maintained and how much effort maintaining it may require in the future.
-
Availability: Public.
-
Persistence: Yes.
-
Flexibility: Yes. Carnarvon 0.7.4c is released under the GPLv2 or later.
Raw Dataset
-
Identification: Databases
-
Description:
- Function evolution data for Apache 1.3
- Function evolution data for Evolution
- Carnarvon blame output for Apache 1.3
- Carnarvon blame output for Evolution
-
Availability: Public.
-
Persistence: Yes.
-
Flexibility: Yes.
Extraction Methodology
-
Identification: Carnarvon
-
Description:
- Carnarvon (http://carnarvon.tigris.org/) analyses the age of a software system on a per-line basis and extracts figures and indexes that make it possible to identify how "old" the software is, how much it has been maintained and how much effort maintaining it may require in the future.
-
Availability: Public.
-
Persistence: Yes.
-
Flexibility: Yes. Carnarvon 0.7.4c is released under the GPLv2 or later.
Study Parameters
-
Identification: Date of data retrieval.
-
Description: Date on which the repositories were retrieved. It can be obtained from the database dumps using the MySQL MAX() function.
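As an illustration of the MAX() lookup mentioned above, the following is a minimal sketch, not the procedure used in the study. It assumes the dump contains a table of commits with a timestamp column; the names `commits` and `commit_date` are hypothetical, the sample dates are invented, and an in-memory SQLite database stands in for MySQL so that the snippet runs on its own. The most recent commit timestamp is taken here as an approximation of the retrieval date.

```python
import sqlite3


def latest_commit_date(conn, table="commits", column="commit_date"):
    """Return the most recent value of `column` in `table`.

    The table and column names are hypothetical placeholders; the real
    names must be looked up in the released database dumps.
    """
    # Identifiers cannot be bound as SQL parameters, so they are
    # interpolated directly; only use trusted, hard-coded names here.
    query = f"SELECT MAX({column}) FROM {table}"
    return conn.execute(query).fetchone()[0]


# Stand-in database with invented sample data (not from the study).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE commits (id INTEGER PRIMARY KEY, commit_date TEXT)"
)
conn.executemany(
    "INSERT INTO commits (commit_date) VALUES (?)",
    [
        ("2008-03-14 10:02:11",),
        ("2009-01-05 17:45:00",),
        ("2007-11-30 08:15:42",),
    ],
)

# ISO-formatted timestamps sort chronologically as text, so MAX()
# returns the latest one.
retrieval_estimate = latest_commit_date(conn)
print(retrieval_estimate)
```

Against the actual MySQL dumps, the same SELECT MAX(...) statement would be run after loading the dump into a MySQL server, with the table and column names replaced by those found in the schema.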
Processed Dataset
-
Identification: Databases
-
Description:
- The Carnarvon blame output for Apache 1.3 and Evolution has been enriched with several additional tables by the analysis Python scripts. To avoid duplication, only one database has been released and is available under the Raw Dataset.
-
Availability: Public. See Raw Dataset.
-
Persistence: Yes.
-
Flexibility: Yes. All of them are SQL files.
Analysis Methodology
-
Identification: Scripts
-
Description:
- Set of SQL, R and Perl scripts to test hypotheses #1 and #2.
- Set of Python scripts used to query the database and analyse the data to test hypotheses #3 and #4 of the paper.
-
Availability: Public.
-
Persistence: Yes.
-
Flexibility: Yes. All Python scripts have been released under the GPLv3.
Results Dataset
-
Identification: Results tables.
-
Description: For Apache and Evolution, the following files are available:
- permonth.txt: Statistics on a per month basis
- changesHist.txt: Statistics per number of changes
- freqAuthors.txt: Statistics per committer
- *FunctionsStats.txt: Statistics per function
- statsFunctions.txt: Additional statistics per function
-
Availability: Public.
-
Persistence: Yes.
-
Flexibility: Yes. All of them are structured text files.
Comments and suggestions: Gregorio Robles < grex at gsyc.urjc.es >.
Last modified: Feb 27th 2012.