Replication evaluation, details and sources
Intensive Metrics for the Study of the Evolution of Open Source Projects
Case studies from Apache Software Foundation projects
by Santiago Gala-Pérez (1), Gregorio Robles (2), Jesús M. González-Barahona (2) and Israel Herraiz (3)
Submitted to MSR 2013
(1) Apache Software Foundation; (2) Universidad Rey Juan Carlos (Madrid, Spain); (3) Universidad Politécnica de Madrid (Madrid, Spain) (Victoria, Canada)
Based on the criteria proposed in "On the reproducibility of empirical software engineering studies based on data retrieved from development repositories" (Open Access - Empirical Software Engineering, Volume 17, Numbers 1-2, 75-89), the attributes of this study are given in the following table:
Details
Data Source
-
Identification:
-
Description:
-
Availability: Public.
-
Persistence: Yes.
Retrieval Methodology
-
Identification: get_data_from_apache.sh
-
Description: Shell script to be run on the mboxes replicated from people.apache.org.
-
Availability: Public.
-
Persistence: Yes.
-
Flexibility: Yes. The script is released under the Apache v2.0 License.
Raw Dataset
-
Identification: mails.csv (133 Kb)
-
Description: Mailing-list traffic per project on a monthly basis, in CSV (comma-separated values) format.
-
Availability: Public.
-
Persistence: Yes.
-
Flexibility: Yes.
Extraction Methodology
-
Identification: break_dataset.py
-
Description: Reads mails.csv and separates it into per-project count files, mails-<project>.csv.
-
Availability: Public.
-
Persistence: Yes.
-
Flexibility: Yes. The script is released under the Apache v2.0 License.
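The extraction step described above can be illustrated with a short sketch. This is not the released break_dataset.py; it only shows one plausible way to split a combined CSV into mails-<project>.csv files. The column names (project, month, count) and the function name split_by_project are assumptions made for illustration, since the layout of mails.csv is not documented on this page.

```python
import csv
from collections import defaultdict
from pathlib import Path


def split_by_project(source, out_dir, key="project"):
    """Split a CSV into one mails-<project>.csv file per distinct value of `key`.

    `key` is a hypothetical column name; adjust it to the real header of mails.csv.
    Returns a dict mapping each project to the number of rows written for it.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Group the rows by project while keeping the original column order.
    with open(source, newline="") as handle:
        reader = csv.DictReader(handle)
        fieldnames = reader.fieldnames or []
        rows_by_project = defaultdict(list)
        for row in reader:
            rows_by_project[row[key]].append(row)

    # Write one file per project, repeating the header in each.
    written = {}
    for project, rows in rows_by_project.items():
        target = out_dir / f"mails-{project}.csv"
        with open(target, "w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        written[project] = len(rows)
    return written
```

Under these assumptions, calling split_by_project("mails.csv", "out") would leave one file per project in the out directory, each with the same header as the combined file.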
Study Parameters
-
Identification: Date of data retrieval.
-
Description: Date when the repositories were retrieved: January 2013.
Processed Dataset
-
Identification: None.
-
Description: None.
-
Availability: No.
-
Persistence: No.
-
Flexibility: No.
Analysis Methodology
-
Identification: Scripts
-
Description:
- parse_commits.R: reads and saves in R
- adapt_zim_paths.sh: ready for publication in LaTeX
-
Availability: Public.
-
Persistence: Yes.
-
Flexibility: Yes. All scripts have been released under the Apache v2.0 License.
Results Dataset
-
Identification: None.
-
Description: None.
-
Availability: No.
-
Persistence: No.
-
Flexibility: No.
Comments and suggestions: Gregorio Robles < grex at gsyc.urjc.es >.
Last modified: Feb 21st 2013.