===
=== Working dataset for paper: Replicating MSR
=== Author: Gregorio Robles
=== Location: http://gsyc.es/~grex/msr2010
=== (scripts available at this URL as well)
=== Date: January 18th 2010
=== License: Creative Commons Attribution-ShareAlike 3.0
===
MSR: 2004
Title: Preprocessing CVS Data for Fine-Grained Analysis
Length: Short
Type: Non-experimental
===
MSR: 2004
Title: The perils and pitfalls of mining SourceForge
Length: Short
Type: Empirical/Descriptive
Public data: Yes.
Public source: Sourceforge
Working dataset: No.
Tool/scripts: No. Analysis scripts available from the author on request (footnote)
===
MSR: 2004
Title: Research Infrastructure for Empirical Science of F/OSS
Length: Short
Type: Non-experimental
===
MSR: 2004
Title: Mining CVS repositories, the softChange experience
Length: Short
Type: Empirical/Tool
Public data: Yes. Evolution, Mozilla, GCC, PostgreSQL. CVS data. No dates specified.
Public source: Evolution
Public source: Mozilla
Public source: GCC
Public source: PostgreSQL
Working dataset: Yes. The case study is really only on Evolution. Available at http://sourceforge.net/projects/sourcechange/files/ (footnote: "You can find a copy of the cvs log command for each of the reviewed projects in http://view.cs.uvic.ca/softChange/mining2004/")... but the URL gives a 404!
Tool/scripts: Yes. http://sourceforge.net/projects/sourcechange/files/
===
MSR: 2004
Title: Text is Software Too
Length: Short
Type: Non-experimental
===
MSR: 2004
Title: GlueTheos: Automating the Retrieval and Analysis of Data from Publicly Available Software Repositories
Length: Short
Type: Empirical/Tool
Public data: No.
Working dataset: No.
Tool/scripts: Yes. GlueTheos. No URL of the tool available in the paper. Available at: http://tools.libresoft.es/gluetheos
===
MSR: 2004
Title: Using CVS Historical Information to Understand How Students Develop Software
Length: Short
Type: Empirical/Case Study
Public data: No.
This paper is really an application of CVS-mining tools on a specific environment (monitoring students). The data on the students' projects is not available.
Working dataset: No.
Tool/scripts: No. They refer to tools in the paper, but do not specify which of them were used.
===
MSR: 2004
Title: Database techniques for the Analysis and Exploration of Software Repositories
Length: Short
Type: Empirical/Mining + Case study
Public data: Yes. The project that is studied is the Apache web server project, specifically the mailing lists. No date interval for the study is provided, although some results are given.
Public source: Apache web server
Working dataset: No. Database is not available. Not even database schema.
Tool/scripts: No. Tool presented is not linked (it seems its name is Minero from Figure 2!). After googling, I found "http://www.db.cs.ucdavis.edu/minero", which seems to have existed, but now gives a 404!
===
MSR: 2004
Title: Empirical Project Monitor: A Tool for Mining Multiple Project Data
Length: Short
Type: Empirical/Tool
Public data: No. Figure 6: scatterplot with 100 SF projects randomly selected from the most active projects (no date given).
Public source: Sourceforge
Working dataset: No.
Tool/scripts: No. Empirical Project Monitor (EPM, part of Empirical Software Engineering Environment, ESEE). Figure 5: comparison between two projects (SPARS and EASE). There is a reference for SPARS in the paper with URL; the URL gives a 404. Googling for "spars software product archiving" we get http://sel.ist.osaka-u.ac.jp/SPARS/index.html.en, also 404! EASE has another reference with URL, but a 404 as well. Googling for "EASE osaka" returns a paper "EASE: Evolutional Authoring Support Environment" with other research fellows from the Technical University of Eindhoven. Another search with "EASE software tool eindhoven technical university" does not provide any satisfactory result.
If you google for the tool "Empirical Project Monitor" you get to a project page (http://www.empirical.jp/research/j_download.html) in Japanese. The download link gives a 404. There seems to have been a tool at some time in the past, as a workshop even took place: "EPM Training Course" http://www.empirical.jp/research/index.html
===
MSR: 2004
Title: Mining Version Control Systems for FACs (Frequently Applied Changes)
Length: Short
Type: Empirical/Mining Technique + Case Study
Public data: Yes. Tomcat. They say after three years of Tomcat development.
Public source: Tomcat
Working dataset: No.
Tool/scripts: Partially. Uses, among others, CC-Finder in the process. No reference to the rest of the scripts used.
===
MSR: 2004
Title: Mining the Software Change Repository of a Legacy Telephony System
Length: Short
Type:
Public data: No. Switching software system (a PBX) developed by Mitel Networks corporation. Not publicly available.
Working dataset: No.
Tool/scripts: No.
===
MSR: 2004
Title: Four Interesting Ways in Which History Can Teach Us About Software
Length: Short
Type: Empirical/Case studies
Public data: Yes. Linux, GCC and 6 FLOSS projects (for the last, the paper specifies "period studying from", but not "until when")
Public source: Linux
Public source: GCC
Public source: midworld
Public source: mycore
Public source: Apache Ant
Public source: kepler
Public source: PostgreSQL
Working dataset: No.
Tool/scripts: Partially. Beagle. Reference given to two papers. But no URL. A Google search for "Beagle software tool" gives results for another tool called Beagle as well! Googling for "Beagle software tool waterloo" you get to http://www.swag.uwaterloo.ca/tools.html#beagle. And from there you can go to a page where you can download Beagle 1.0.1 (http://swag.uwaterloo.ca/~lzou/beagle/download/beagle1.0.1.tar.gz). Nothing is said in the paper about the version of Beagle used.
No Tool/scripts for cloning: used CC-Finder in part and their own metrics-based tool based on the design of others.
===
MSR: 2004
Title: Predicting Source Code Changes by Mining Revision History
Length: Short
Type: Non-experimental
Public Data: Paper not included in the proceedings per author's request!
===
MSR: 2004
Title: Mining Software Usage Data
Length: Short
Type: Non-experimental
===
MSR: 2004
Title: Bug Driven Bug Finders
Length: Short
Type: Empirical/New Technique + Case study
Public data: Yes. Apache web server 2.0 branch bug database. 200 first bugs marked as FIXED and CLOSED. Also CVS commits were researched.
Public source: Apache web server
Working dataset: No.
Tool/scripts: No. The checker tool that is talked about is not referenced. It does not even have a name (other than the generic "return value checker"). Googling for "chad williams "return value checker"" provides no result.
===
MSR: 2004
Title: Mining Repositories to Assist in Project Planning and Resource Allocation
Length: Short
Type: Empirical/Case study
Public data: No.
Working dataset: Yes. NASA's Metrics Data Program
Tool/scripts: No. No tool for the analysis is provided.
===
MSR: 2004
Title: Bug Report Networks: Varieties, Strategies, and Impacts in a F/OSS Development Community
Length: Short
Type: Empirical / New method + Case study
Public data: No. "one [FLOSS] community we are studying", but they do not say exactly which one. They selected a) a "random sample of 385 BRs was systematically drawn from a population of more than 182,000 bug reports opened in a five year period" and b) "a snapshot of over 130,000 bug reports originating from the same bug report repository"
Working dataset: No.
Tool/scripts: No (no extraction tool, no analysis tool)
===
MSR: 2004
Title: A Tool for Mining Defect-Tracking Systems to Predict Fault-Prone Files
Length: Short
Type: Tool
Public data: No.
Working dataset: No.
Tool/scripts: No.
From the summary, "The tool design is presently at a very early stage". No URL or reference to the tool. This paper just provides the design.
===
MSR: 2004
Title: Towards Understanding the Rhetoric of Small Changes
Length: Short
Type: Empirical / New method + case study
Public data: No. Lucent Technologies 5ESS(TM) switching software
Working dataset: No
Tool/scripts: No.
===
MSR: 2004
Title: Data Mining for Software Process Discovery in Open Source Software Development Communities
Length: Short
Type: Non-experimental
===
MSR: 2004
Title: Applying Social Network Analysis to the Information in CVS Repositories
Length: Short
Type: Empirical / New method + case study
Public data: Yes. Apache, GNOME and KDE. CVS data. No study dates specified.
Public source: Apache
Public source: GNOME
Public source: KDE
Working dataset: No
Tool/scripts: No
===
MSR: 2004
Title: Mining a Software Developer's Local Interaction History
Length: Short
Type: Empirical / New idea + Review of the state of the art
Public data: No. No case study.
Working dataset: No. No case study
Tool/scripts: No. Tools are presented, but they are not used for this paper.
===
MSR: 2004
Title: LASER: A Lexical Approach to Analogy in Software Reuse
Length: Short
Type: Empirical / Method + Case study
Public data: Yes. JRefactory. "contains 1259 packages". No further study dates are given.
Public source: JRefactory
Working dataset: No.
Tool/scripts: No. Extend WordNet with Java words. Extensions are not provided. Analysis scripts are not given either.
===
MSR: 2004
Title: A Case Study on Recommending Reusable Software Components using Collaborative Filtering
Length: Short
Type: Empirical / Method + Case study
Public data: No. "[40] random GUI Java applications from SourceForge"
Public source: Sourceforge
Working dataset: No
Tool/scripts: Yes. They have built a software tool (collector) that uses the "Byte Code Engineering Library".
Googling for this library, we see that it can be found at http://jakarta.apache.org/bcel/
===
MSR: 2004
Title: TEMPLATE MINING IN SOURCE-CODE DIGITAL LIBRARIES
Length: Short
Type: Empirical / Method + Case study
Public data: No. "a Java repository of 114 MB has been built by storing 30 packages in it."
Working dataset: No
Tool/scripts: No
===
MSR: 2004
Title: Multi-Project Software Engineering: An Example
Length: Short
Type: Non-experimental
===
MSR: 2005
Title: Understanding Source Code Evolution Using Abstract Syntax Tree Matching
Length: Short
Type: Empirical / Method + Case Study
Public data: Yes. Several FLOSS projects (the versions under study are indicated!)
Public source: OpenSSH
Public source: Vsftpd
Public source: Linux
Public source: BIND
Public source: Apache web server
Working dataset: No.
Tool/scripts: No. Not available, but they talk about "our tool" all the time. "Our tool is constructed using CIL, an OCaml framework for C code analysis". Analysis times are given, even the system used for the analysis: "Times are the average of 5 runs. The system used for experiments was a dual Xeon@2GHz with 1GB of RAM running Fedora Core 3."
===
MSR: 2005
Title: Recovering System Specific Rules from Software Repositories
Length: Short
Type: Empirical / Method + Case study
Public data: Yes. Wine (but no specific dates/versions are indicated)
Public source: Wine
Working dataset: No.
Tool/scripts: Partially.
* Analysis tool, no. "The tool we produced is merely a prototype to support this preliminary study. It is based on the Edison Design Group C parser." There is a reference with URL of the Edison Design Group.
* Extraction tool, no.
* But they use TouchGraph LinkBrowser [8] (and provide reference with URL)
===
MSR: 2005
Title: Mining Evolution Data of a Product Family
Length: Short
Type: Empirical / Method + Case study
Public data: Yes. 8.5 GB.
FreeBSD, NetBSD, and OpenBSD (no date of study provided)
Public source: FreeBSD
Public source: NetBSD
Public source: OpenBSD
Working dataset: No
Tool/scripts: No. The paper talks about PFEvo, but googling for it gives no results.
===
MSR: 2005
Title: Using a Clone Genealogy Extractor for Understanding and Supporting Evolution of Code Clones
Length: Short
Type: Empirical / Method + Empirical Study
Public data: Yes. Carol and DNSJava. URL given for both projects. Some versions have been studied, but it is not said which ones.
Public source: Carol
Public source: DNSJava
Working dataset: Yes. Available after googling for it (there is no reference in the paper) at http://users.ece.utexas.edu/~miryung/software.html
Tool/scripts: No. At the beginning of section 3 there are several paragraphs about the tool, but it does not even have a name. Uses CC-Finder. But: they use a model, and it is referenced as [1]. The URL given http://www.cs.washington.edu/homes/miryung/cge gives a 400. It seems the owner of the web page moved to another university. After googling for it, it can be found at http://users.ece.utexas.edu/~miryung/software.html
===
MSR: 2005
Title: When Do Changes Induce Fixes?
Length: Short
Type: Empirical / Method + Case Study
Public data: Yes. Eclipse and Mozilla. All changes and bugs until Jan 20, 2005.
Public source: Eclipse
Public source: Mozilla
Working dataset: No.
Tool/scripts: No.
===
MSR: 2005
Title: Error Detection by Refactoring Reconstruction
Length: Short
Type: Empirical / Method + Case Study
Public data: Yes. Jedit and Tomcat. Explicit transactions are specified.
Public source: Jedit
Public source: Tomcat
Working dataset: No.
Tool/scripts: No. While talking about the process, there are references to other papers, but these papers are not about tools, nor do they reference any.
===
MSR: 2005
Title: Software Repository Mining with Marmoset: An Automated Programming Project Snapshot and Testing System
Length: Short
Type: Empirical / Tool + Case study
Public data: No. Data from 73 students' projects at University of Maryland
Working dataset: No.
Tool/scripts: No. Marmoset. No URL given. After googling: http://marmoset.cs.umd.edu/. There you can find information "to register an account with our demo server and then sample our software." But this link is broken. No software download available anywhere.
===
MSR: 2005
Title: Mining Student CVS Repositories for Performance Indicators
Length: Short
Type: Empirical / Case study
Public data: No. "hundreds of completely independent repositories, one for each student."
Working dataset: No.
Tool/scripts: Yes. "This code, including our modifications to the ViewCVS parser, is freely available at www.cs.utoronto.ca/~keir/slurp-1.0.0.tar.gz."
===
MSR: 2005
Title: Toward Mining "Concept Keywords" from Identifiers in Large Software Projects
Length: Short
Type: Empirical / Method + case study
Public data: Yes. udos. There is a reference to it in the paper with a URL that works!
Public source: udos
Working dataset: No.
Tool/scripts: Yes. Identifier Exploratory Framework (IEF). There is a reference to the source code with a valid URL.
===
MSR: 2005
Title: Source code that talks: an exploration of Eclipse task comments and their implication to repository mining
Length: Short
Type: Empirical / Method + Case Study
Public data: No. IBM's Architects Workbench (AWB)
Working dataset: No.
Tool/scripts: No.
===
MSR: 2005
Title: Text Mining for Software Engineering: How Analyst Feedback Impacts Final Results
Length: Short
Type: Empirical / Method + Case Study
Public data: No.
Working dataset: Yes. MODIS dataset from NASA. It is available in the PROMISE repository
Tool/scripts: No tool described.
===
MSR: 2005
Title: Analysis of Signature Change Patterns
Length: Short
Type: Empirical / Method + Case study
Public data: Yes.
8 FLOSS projects.
Public source: Apache Portable Runtime
Public source: Apache web server
Public source: Subversion
Public source: CVS
Public source: Linux
Public source: GCC
Public source: Sendmail
Working dataset: No
Tool/scripts: No. Kenyon. Reference with URL in the paper. The URL gives a 404.
===
MSR: 2005
Title: Improving Evolvability through Refactoring
Length: Short
Type: Empirical / Method + Case study
Public data: No. Large industrial system. Picture Archiving and Communication System (PACS)
Working dataset: No.
Tool/scripts: No.
===
MSR: 2005
Title: Linear Predictive Coding and Cepstrum coefficients for mining time variant information from software repositories
Length: Short
Type: Empirical / Method + Case study
Public data: Yes. 211 Linux kernel releases (i.e., from 1.0 to 1.3.100)
Public source: Linux
Working dataset: No.
Tool/scripts: No. "The tools used in each phase are summarized in Table 1. These are all open source software integrated together allowing an almost fully automated analysis." The names of the tools are not given ("Perl Scripts", "C program"..., only "GNUplot") and no reference or URL is given!
===
MSR: 2005
Title: Repository Mining and Six Sigma for Process Improvement
Length: Short
Type: Non-experimental
===
MSR: 2005
Title: Mining Version Histories to Verify the Learning Process of Legitimate Peripheral Participants
Length: Short
Type: Empirical / Method + Case study
Public data: Yes. Several FLOSS projects. No dates of study or versions are given
Public source: awstats
Public source: phpmyadmin
Public source: moodle
Public source: filezilla
Public source: gallery
Public source: bzflag
Working dataset: No.
Tool/scripts: No.
===
MSR: 2005
Title: Towards a Taxonomy of Approaches for Mining of Source Code Repositories
Length: Short
Type: Non-experimental
===
MSR: 2005
Title: A Framework for Describing and Understanding Mining Tools in Software Development
Length: Short
Type: Non-experimental
===
MSR: 2005
Title: SCQL: A formal model and a query language for source control repositories
Length: Short
Type: Empirical / Method + Case Study
Public data: Yes. Evolution, Gnumeric, OpenSSL, Samba and modperl
Public source: Evolution
Public source: Gnumeric
Public source: OpenSSL
Public source: Samba
Public source: modperl
Working dataset: No.
Tool/scripts: No. "We have built an implementation for SCQL. In order to demonstrate the effectiveness of SCQL". But no reference/URL.
===
MSR: 2005
Title: Developer identification methods for integrated data from various sources
Length: Short
Type: Empirical / Method + Case study
Public data: Yes. GNOME project.
Public source: GNOME
Working dataset: No
Tool/scripts: No.
===
MSR: 2005
Title: Accelerating Cross-Project Knowledge Collaboration Using Collaborative Filtering and Social Networks
Length: Short
Type: Empirical / Method + Case Study
Public data: Yes. "[O]ver 90,000 projects and about 130,000 developers at SourceForge in February 2005."
Public source: Sourceforge
Working dataset: No.
Tool/scripts: Yes.
* Data collection: "autopilot tool for SourceForge.net" (footnote: "available from the third author upon your requests").
* Graphmania and NAIST. A footnote links to http://sourceforge.jp/projects/ncfe/. Version not specified
===
MSR: 2005
Title: "Collaboration Using OSSmole: a repository of FLOSS data and analyses"
Length: Short
Type: Non-experimental
===
MSR: 2006
Title: Mining Large Software Compilations over Time: Another Perspective of Software Evolution
Length: Long
Type: Empirical / Method + Case Study
Public data: Yes. Debian specific versions.
Public source: Debian
Working dataset: Not directly.
Complete results are offered on a website, but not ready for download.
Tool/scripts: Partially.
* Glue scripts are not provided.
* Analysis: SLOCCount
===
MSR: 2006
Title: Scenarios for Mining the Software Architecture Evolution
Length: Short
Type: Non-experimental
===
MSR: 2006
Title: Productivity Analysis of Japanese Enterprise Software Development Projects
Length: Short
Type: Empirical / Method + Case Study
Public data: No. 253 Japanese enterprise projects.
Working dataset: No.
Tool/scripts: No.
===
MSR: 2006
Title: Coupling and Cohesion Measures for Evaluation of Component Reusability
Length: Short
Type: Empirical / Method + Case Study
Public data: No. "For each case, 20 Java components were retrieved from a repository of about 10,000 Java components retrieved form the internet."
Working dataset: No.
Tool/scripts: No.
===
MSR: 2006
Title: TA-RE: An Exchange Language for Mining Software Repositories
Length: Short
Type: Non-experimental
===
MSR: 2006
Title: The Evolution Radar: Visualizing Integrated Logical Coupling Information
Length: Long
Type: Empirical / Method + Case Study
Public data: Yes. Mozilla, ThunderbirdTinderbox, PhoenixTinderbox
Public source: Mozilla
Public source: ThunderbirdTinderbox
Public source: PhoenixTinderbox
Working dataset: No.
Tool/scripts: No. Not downloadable. No license specified. No reference/URL given in the paper. Googling, you get to http://www.moosetechnology.org/tools/evolutionradar with a 404. After searching in the moosetechnology.org site form, you get to http://www.moosetechnology.org/tools/vw/evolutionradar
===
MSR: 2006
Title: An Open Framework for CVS Repository Querying, Analysis and Visualization
Length: Long
Type: Empirical / Method + Case Study
Public data: Yes. ArgoUML, PostgreSQL. No dates/versions specified.
Public source: ArgoUML
Public source: PostgreSQL
Working dataset: No.
Tool/scripts: Yes. CVSgrab. Reference to conf paper given. After googling: http://cvsgrab.sourceforge.net/.
But this is not the same tool as specified in the paper as it does other stuff. No page for this CVSgrab found. It says it is part of the "Visual Code Navigator" toolset. Googling for it, we get to this page (http://www.win.tue.nl/~lvoinea/VCN.html). There CVSgrab can be downloaded!
===
MSR: 2006
Title: Micro Pattern Evolution
Length: Long
Type: Empirical / Method + Case Study
Public data: Yes. ArgoUML, Columba, jEdit. Period under study specified
Public source: ArgoUML
Public source: Columba
Public source: Jedit
Working dataset: No.
Tool/scripts: Partially.
* For SCM: Kenyon + other scripts.
* Not for "Bug-introducing changes are identified by mining change logs and project history data using techniques described in [13]"
===
MSR: 2006
Title: Mining Sequences of Changed-files from Version Histories
Length: Long
Type: Empirical / Method + Case Study
Public data: Yes. Log Information of the KDE Source-Code Repository. Dates of study given.
Public source: KDE
Working dataset: No.
Tool/scripts: No.
* svn2inseqs. No reference/URL. No Google document found (besides this paper).
* sqminer. No reference/URL. No Google document found for "sqminer download"; for "sqminer" there are some more results, but not related to the paper (besides this paper, of course).
* Other tasks (listed in the paper) not available either
===
MSR: 2006
Title: MAPO: Mining API Usages from Open Source Repositories
Length: Short
Type: Empirical / Method + Tool + Case Study
Public data: Yes, not referenced in the paper. But available on the tool's web page
Working dataset: Partially, not referenced in the paper. But available on the tool's web page. The source code used is missing ("So far we manually download the source code files from www.koders.com.").
Tool/scripts: Yes. MAPO. No reference/URL in the paper. Googling for "mapo mining api usages download" leads to http://research.csc.ncsu.edu/ase/tools/mapo/. This is a sort of lab-book of the paper. Great!
===
MSR: 2006
Title: Program Element Matching for Multi-Version Program Analyses
Length: Long
Type: Empirical / Comparing Tools & Methods
Public data: No.
Working dataset: No.
Tool/scripts: No.
===
MSR: 2006
Title: Detecting Similar Java Classes Using Tree Algorithms
Length: Long
Type: Empirical / Method + Case Study
Public data: Yes. "[F]ive constructed test cases and real-world Java project" (org.eclipse.compare-plug-in versions 3.0 and 3.1).
Public source: org.eclipse.compare
Working dataset: No.
Tool/scripts: No. The paper talks about a tool, called Coogle. Googling for it, you get to http://seal.ifi.uzh.ch/43/. No download possible. Partially, the ASTParser of Eclipse is available, but the AST2FAMIX is not available. Googling for AST2FAMIX produces two results, both papers from the authoring research group.
===
MSR: 2006
Title: Mining Version Archives for Co-changed Lines
Length: Short
Type: Empirical / Method + Case Study
Public data: Yes. Eclipse project, snapshot 2005-11-23
Public source: Eclipse
Working dataset: No.
Tool/scripts: No. They talk about the hardware in a footnote ("All experiments were run on an Opteron cluster using eight processors, each with 2 Mhz and 2 GB memory.") but nothing about tools/scripts.
General: According to reference [15], there is an extended version of this paper somewhere at http://www.st.cs.uni-saarland.de/softevo/, which is the research group's main page.
===
MSR: 2006
Title: Constructing Universal Version History
Length: Short
Type: Empirical / Method + Case Study
Public data: No. Avaya's source code repositories
Working dataset: No.
Tool/scripts: No. "The algorithms are implemented in the Perl programming language and were run on a Sun V40Z machine that contains 16G RAM and dual Opteron CPUs, running SunOS 5.10. for processing more than 6M versions of more than 2M files. (Each file had its version history.)".
===
MSR: 2006
Title: Concern-Based Mining of Heterogeneous Software Repositories
Length: Long
Type: Empirical / Method + Case Study
Public data: No. Industrial system (from Nokia)
Working dataset: No.
Tool/scripts: No. There is a tool called MADE. "Eclipse-based [6] pattern driven development environment called MADE (Modelling and Architecting Development Environment)." But googling for "eclipse MADE (Modelling and Architecting Development Environment)" returns no result on this tool.
===
MSR: 2006
Title: Software Engineering Applications of Logic File System
Length: Long
Type: Empirical / Method + Case Study
Public data: Yes. Awt package
Public source: awt
Working dataset: No. Although not referenced in the paper, available at http://lisfs2008.irisa.fr/index.php?page=demo_en#java_browser, but all the links are 404s. It seems the correct URL is: http://lisfs2008.irisa.fr/demo-area/awt-source.
Tool/scripts: Yes. The tool that is presented is lisfs. Googling for "lisfs download", you get to http://lisfs2008.irisa.fr/index.php?page=demo_en#java_browser. There is a Download page at http://lisfs2008.irisa.fr/index.php?page=dl_en#lisfs. The version used in this paper is not specified.
===
MSR: 2006
Title: Mining Eclipse for Cross-Cutting Concerns
Length: Short
Type: Empirical / Method + Case Study
Public data: Yes. Eclipse. Specific transactions are noted.
Public source: Eclipse
Working dataset: No.
Tool/scripts: No.
===
MSR: 2006
Title: A Lightweight Approach to Technical Risk Estimation via Probabilistic Impact Analysis
Length: Long
Type: Empirical / Method + Case Study
Public data: No.
Working dataset: No.
Tool/scripts: No. Eclipse plug-in built. "The technique has been implemented as a plugin to the Eclipse IDE and deployed to our industrial collaborators for initial evaluation."
Googling for "technical risk eclipse tre" does not give any result for a tool.
===
MSR: 2006
Title: Fine Grained Indexing of Software Repositories to Support Impact Analysis
Length: Long
Type: Empirical / Method + Case Study
Public data: Yes. gedit, ArgoUML and Firefox.
Public source: ArgoUML
Public source: Firefox
Working dataset: No.
Tool/scripts: Yes. "We have developed an Eclipse plug-in, named Jimpa". URL is provided (http://cise.rcost.unisannio.it/updates/) and gives a 403. After googling and following some links, the correct page is http://rcost.unisannio.it/cerulo/jimpa/.
===
MSR: 2006
Title: Are Refactorings Less Error-prone Than Other Changes?
Length: Long
Type: Empirical / Method + Case Study
Public data: Yes. JEdit, Junit and ArgoUML. Date interval under study is specified.
Public source: Jedit
Public source: Junit
Public source: ArgoUML
Working dataset: No.
Tool/scripts: No.
===
MSR: 2006
Title: Predicting Defect Densities in Source Code Files with Decision Tree Learners
Length: Long
Type: Empirical / Method + Case Study
Public data: Yes. Seven releases of the Mozilla open source project (modules specified)
Public source: Mozilla
Working dataset: No.
Tool/scripts: No.
* Retrieval and cleaning: No.
* Analysis: Imagix-4D C/C++ analysis tool. URL referenced in footnote in paper. The tool is only available as free trial!
===
MSR: 2006
Title: Information Theoretic Evaluation of Change Prediction Models for Large-Scale Software
Length: Long
Type: Empirical / Method + Case Study
Public data: Yes. 6 FLOSS projects. Dates specified
Public source: OpenBSD
Public source: FreeBSD
Public source: KDE
Public source: Koffice
Public source: NetBSD
Public source: PostgreSQL
Working dataset: No.
Tool/scripts: No.
===
MSR: 2006
Title: Tracking Defect Warnings Across Versions
Length: Short
Type: Empirical / Method + Case Study
Public data: No. Sun Java Development Kit (JDK). rt.jar 116 sequential builds starting from version 1.0.2
Public source: JDK
Working dataset: No.
Tool/scripts: Yes. FindBugs. After googling: http://findbugs.sourceforge.net/. Version used is not specified.
===
MSR: 2006
Title: Mining Email Social Networks
Length: Long
Type: Empirical / Method + Case Study
Public data: Yes. Apache and GCC developer mailing list. Dates specified.
Public source: GCC
Public source: Apache web server
Working dataset: No.
Tool/scripts: No.
===
MSR: 2006
Title: Geographic Location of Developers at SourceForge
Length: Long
Type: Empirical / Method + Case Study
Public data: No.
Working dataset: No. Available on request from the University of Notre Dame. See http://www.nd.edu/~oss/Data/data.html
Tool/scripts: No. Just some hints about helper scripts.
===
MSR: 2006
Title: Textual Allusions to Artifacts in Software-related Repositories
Length: Short
Type: Empirical / Method + Case Study
Public data: No. Microsoft Windows OS. Dates of study given.
Working dataset: No.
Tool/scripts: No.
===
MSR: 2006
Title: Enriching Revision History with Interactions
Length: Short
Type: Non-experimental
===
MSR: 2006
Title: Using Evolutionary Annotations from Change Logs to Enhance Program Comprehension
Length: Short
Type: Empirical / Method + Case Study
Public data: Yes. Apache. However, the case study is too vague to be replicated.
Public source: Apache web server
Working dataset: No.
Tool/scripts: No. In the paper they talk about a "prototype of Eclipse Java environment." and even provide a snapshot, but no more information, not even the exact name of the prototype.
===
MSR: 2006
Title: A Study of the Contributors of PostgreSQL
Length: Short
Type: Empirical / Mining Challenge
Public data: Yes. PostgreSQL (CVS data). Mined twice; dates provided.
Public source: PostgreSQL
Working dataset: No.
Tool/scripts: Yes. softChange used. Available at http://sourceforge.net/projects/sourcechange/files/.
I am not sure the complete analysis and the results shown in the paper can be performed with softChange.
===
MSR: 2006
Title: Co-Change Visualization Applied to PostgreSQL and ArgoUML
Length: Short
Type: Empirical / Mining Challenge
Public data: Yes. PostgreSQL and ArgoUML. Extraction date: Feb 8 2006
Public source: PostgreSQL
Public source: ArgoUML
Working dataset: No.
Tool/scripts: Yes. CCVisu. Reference to conf paper given. Googling for it, you can find the project page at http://code.google.com/p/ccvisu/
===
MSR: 2006
Title: Mining Software Repositories with CVSgrab
Length: Short
Type: Empirical / Mining Challenge
Public data: Yes. PostgreSQL and ArgoUML. CVS data. No specific date provided.
Public source: PostgreSQL
Public source: ArgoUML
Working dataset: No.
Tool/scripts: Yes. CVSgrab. Reference to conf paper given. After googling: http://cvsgrab.sourceforge.net/. But this is not the same tool as specified in the paper as it does other stuff. No page for this CVSgrab found. It says it is part of the "Visual Code Navigator" toolset. Googling for it, we get to this page (http://www.win.tue.nl/~lvoinea/VCN.html). There CVSgrab can be downloaded!
===
MSR: 2006
Title: Mining Additions of Method Calls in ArgoUML
Length: Short
Type: Empirical / Mining Challenge
Public data: Yes. ArgoUML. CVS data. The one provided for the MSR 2006 challenge. It says recent version of the CVS repository, but no specific date provided. Actually the http://seal.ifi.uzh.ch/~pinzger/argouml.tar.gz link gives a 404!
Public source: ArgoUML
Working dataset: No.
Tool/scripts: No.
===
MSR: 2006
Title: Using Software Birthmarks to Identify Similar Classes and Major Functionalities
Length: Short
Type: Empirical / Mining Challenge
Public data: Yes. ArgoUML release 0.20
Public source: ArgoUML
Working dataset: No.
Tool/scripts: No.
===
MSR: 2006
Title: How Long Did It Take To Fix Bugs?
Length: Short
Type: Empirical / Mining Challenge
Public data: Yes. ArgoUML and PostgreSQL. Bug data.
Period of study provided Public source: ArgoUML Public source: PostgreSQL Working dataset: No. Tool/scripts: No. Use of Kenyon: not found after googling. === MSR: 2006 Title: Mining Refactorings in ARGOUML Length: Short Type: Empirical / Mining Challenge Public data: Yes. ArgoUML. CVS archive and issuezilla. Dates provided. Public source: ArgoUML Working dataset: No. Tool/scripts: No. === MSR: 2006 Title: Applying the Evolution Radar to PostgreSQL Length: Short Type: Empirical / Mining Challenge Public data: Yes. PostgreSQL. Dates provided. Public source: PostgreSQL Working dataset: No. Tool/scripts: No. Not downloadable. No license specified. No reference/URL given in the paper. Googling, you get to http://www.moosetechnology.org/tools/evolutionradar, which gives a 404. After searching in the moosetechnology.org site form, you get to http://www.moosetechnology.org/tools/vw/evolutionradar === MSR: 2006 Title: Examining the Evolution of Code Comments in PostgreSQL Length: Short Type: Empirical / Mining Challenge Public data: Yes. PostgreSQL. CVS data. No dates provided. Public source: PostgreSQL Working dataset: No. Tool/scripts: No. C-REX extractor. Reference to working paper given. But googling for it, I haven't found a page from which to download it. === MSR: 2006 Title: Analyzing OSS Developers' Working Time Using Mailing Lists Archives Length: Short Type: Empirical / Mining Challenge Public data: Yes. PostgreSQL. Mailing lists. URL given. Dates of study provided. Public source: PostgreSQL Working dataset: No. Tool/scripts: Yes. Irvine. URL provided in the paper: http://hp.vector.co.jp/authors/VA024591/. The page is in Japanese! But following the blue buttons with some Japanese text on them, you get to a zip file === MSR: 2006 Title: Where is Bug Resolution Knowledge Stored? Length: Short Type: Empirical / Mining Challenge Public data: Yes. ArgoUML. CVS and Bugzilla. No exact dates provided ("currently"!) Public source: ArgoUML Working dataset: No. Tool/scripts: Yes. 
After googling and following some links, we can find it at http://rcost.unisannio.it/cerulo/jimpa/. === MSR: 2006 Title: Mining Email Social Networks in Postgres Length: Short Type: Empirical / Mining Challenge Public data: Yes. PostgreSQL. Mailing list. Dates of study provided. Public source: PostgreSQL Working dataset: No. Tool/scripts: No. === MSR: 2007 Title: How Long will it Take to Fix This Bug? Length: Long Type: Empirical / Method + Case Study Public data: Yes. 567 issues from JBoss and four of its subprojects. Not really clear how the selection has been done. Have all bugs until 2006-05-05 (Table 1) been retrieved? Which ones does Lucene take? Public source: JBoss Working dataset: No. Tool/scripts: Partially. * Retrieval: Not specified.. * Analysis: Partially with the Lucene framework. There is a reference to another paper about Lucene. A fast Google search gives http://lucene.apache.org/java/docs/ === MSR: 2007 Title: Determining Implementation Expertise from Bug Reports Length: Long Type: Empirical / Method + Case Study Public data: Yes. Eclipse bugs. The criteria for further selection of bugs are described in the paper. Public source: Eclipse Working dataset: Partially. They have a list of heuristics used in the paper available at http://www.cs.ubc.ca/labs/spl/projects/bugTriage/assignment/heuristics.html Tool/scripts: No. No tool required. Experts' Recommendation === MSR: 2007 Title: Defect Data Analysis Based on Extended Association Rule Mining Length: Long Type: Empirical / Method + Case Study Public data: No. Industrial system. No name given. Working dataset: No. Tool/scripts: No. They talk about "a prototype implementation of the proposed method" === MSR: 2007 Title: Spam Filter Based Approach for Finding Fault-Prone Software Modules Length: Short Type: Empirical / Method + Case Study Public data: Yes. ArgoUML and Eclipse BIRT. ArgoUML is a Public dataset; for Eclipse BIRT date of database is given. 
Public source: ArgoUML Public source: Eclipse BIRT Working dataset: Partially, ArgoUML's bug database is available from the MSR Challenge 2006 (reference in the paper). Tool/scripts: Partially. * Retrieving: FPFinder. Googling for "fpfinder mizuno" and "fpfinder tool" throws only papers, no tool. * Analysis: CRM114 spam filtering software. Sourceforge project referenced in the paper. === MSR: 2007 Title: Recommending Emergent Teams Length: Long Type: Empirical / Method + Tool + Case Study Public data: Yes. Eclipse, Firefox and Bugzilla. Evolution is also talked about in the paper. No further dates or versions are indicated. Public source: Eclipse Public source: Firefox Public source: Bugzilla Public source: Evolution Working dataset: No. Tool/scripts: Partially. * Retrieval: cvs2svn. Publicly available. Link given in the paper.. * Analysis: Emergent Expertise Locator (EEL). Java Plug-in for Eclipse. No reference or link in the paper. Googling for "Emergent Expertise Locator (EEL)" provides no result about a site where it could be downloaded. === MSR: 2007 Title: Open Borders? Immigration in Open Source Projects Length: Long Type: Empirical / Method + Case Study Public data: Yes. Apache web server, Postgres, Python. Dates under study are given, but not very clearly. Public source: Apache web server Public source: PostgreSQL Public source: Python Working dataset: Partially. This can be obtained from the URL http://macbeth.cs.ucdavis.edu/hazard/ given in the paper. Missing are the sources with all the messages. Only the intermediate results are provided. Tool/scripts: Partially. This can be obtained from the URL http://macbeth.cs.ucdavis.edu/hazard/ given in the paper. === MSR: 2007 Title: Correlating Social Interactions to Release History During Software Evolution Length: Long Type: Empirical / Method + Case Study Public data: Yes. LSEdit and Apache Ant. Specific versions studied as noted in the paper. LSEdit is a tool by the SWAG research group. 
I have verified that you can download all the versions under study from here: http://www.swag.uwaterloo.ca/lsedit/downloads/index.html Public source: LSEdit Public source: Apache Ant Working dataset: No. Tool/scripts: No. "The first immediate extension would be to implement our approach as a tool" === MSR: 2007 Title: Mining CVS Repositories to Understand Open-Source Project Developer Roles Length: Short Type: Empirical / Method + Case Study Public data: Yes. ORAC-DR and Mediawiki. No dates/versions are specified. Public source: ORAC-DR Public source: Mediawiki Working dataset: No. Tool/scripts: No. === MSR: 2007 Title: Visual Data Mining in Software Archives To Detect How Developers Work Together Length: Long Type: Empirical / Method + Case Study Public data: Yes. Junit and Tomcat3. Dates of study specified. Public source: Junit Public source: Tomcat Working dataset: No. Tool/scripts: No. "Currently we have independent prototypes for each of the three visualizations. As these visualizations work hand-in-hand we are planning to integrate these into one single tool that can be used e.g. by project managers for analyzing the social networks in their project." === MSR: 2007 Title: Mining Software Repositories with iSPARQL and a Software Evolution Ontology Length: Long Type: Empirical / Method + Case Study Public data: Yes. 206 versions of org.eclipse.compare plug-in. No further detail at what specific releases is given. Public source: org.eclipse.compare Working dataset: No. Tool/scripts: Yes. iSPARQL. Available at http://www.ifi.uzh.ch/ddis/isparql.html (URL given in the paper) === MSR: 2007 Title: Mining Workspace Updates in CVS Length: Short Type: Empirical / Method + Case Study Public data: Yes. GCC, JBoss, Jedit, Python. Dates under investigation specified. Public source: GCC Public source: JBoss Public source: Jedit Public source: Python Working dataset: No. Tool/scripts: No. 
=== MSR: 2007 Title: Finding Relevant Applications For Prototyping Length: Short Type: Non-experimental === MSR: 2007 Title: Lightweight Risk Mitigation for Software Development Projects Using Repository Mining Length: Short Type: Empirical / Method + Case Study Public data: No. It is called "Pocahontas" in this paper. ("Pocahontas is a real project but its name is fictitious. To protect the confidentiality of our industrial clients and their businesses, all of the identifying information about the Pocahontas project, including quantitative information, has been disguised, in ways that do not affect the conclusions of this paper.".) Working dataset: No. Tool/scripts: No. === MSR: 2007 Title: Identifying Changed Source Code Lines from Version Repositories Length: Long Type: Empirical / Method + Case Study Public data: Yes. ArgoUML. Interval of study specified. Assessment involves the manual inspection of 100 samples. No further information about this. Public source: ArgoUML Working dataset: No. Tool/scripts: No. === MSR: 2007 Title: Mining a Change-Based Software Repository Length: Long Type: Empirical / Method + Case Study Public data: No. Project X (named as such "to preserve privacy") and Spyware. The data on which the analysis is based on is not available publicly. Public source: Spyware Working dataset: No. Tool/scripts: Yes. "We have implemented our approach in an IDE plug-in for the Squeak Smalltalk environment, under the moniker "SpyWare"." URL provided as well: http://romain.robb.es/spyware.html. Used version is not specified. === MSR: 2007 Title: Studying Versioning Information to Understand Inheritance Hierarchy Changes Length: Short Type: Empirical / Method + Case Study Public data: Yes. jEdit and ArgoUML. Versions specified. Public source: Jedit Public source: ArgoUML Working dataset: No. Tool/scripts: No. 
=== MSR: 2007 Title: Combining Single-Version and Evolutionary Dependencies for Software-Change Prediction Length: Short Type: Non-experimental === MSR: 2007 Title: Evaluating the harmfulness of cloning: a change based experiment Length: Short Type: Empirical / Method + Case Study Public data: Yes. DnsJava. No further details about dates/releases are given. Public source: DNSJava Working dataset: No. Tool/scripts: Partially:. * CloneTracker. Link given in the paper: http://mcs.open.ac.uk/alr242. Although in the download page it says it is free software, you are not allowed to modify it ("It does not permit either any type of modifications.")!. * CCFinder: soon MIT-licensed version available. * CTAGS: available as free software. * CVS commands: available as free software. * No glue scripts given === MSR: 2007 Title: Release Pattern Discovery via Partitioning: Methodology and Case Study Length: Long Type: Empirical / Method + Case Study Public data: Yes. MySQL. Studied versions specified. Public source: MySQL Working dataset: No. Tool/scripts: Partially. * Extraction: softChange for CVS repositories and bt2csv for BitKeeper repositories. * Analysis: Hiraldo-Grok + R + gnuplot. I haven't been able to find a place on the Internet from which to download Hiraldo-Grok.. * Glue scripts are not available either === MSR: 2007 Title: Comparing Approaches to Mining Source Code for Call-Usage Patterns Length: Long Type: Empirical / Method + Case Study Public data: Yes. Linux v2.6.14 Public source: Linux Working dataset: No. Tool/scripts: No. callextractor and sqminer. No reference/URL for callextractor. Googling for "callextractor" does not provide any meaningful result. No reference/URL for sqminer. No Google document found for "sqminer download" - for "sqminer" there are some more results, but not related to the paper === MSR: 2007 Title: Towards a theoretical model for software growth Length: Long Type: Empirical / Method + Case Study Public data: Yes. FreeBSD. 
Detailed version specified (16,037 ports in a FreeBSD 6.0-RELEASE system), although not sure exactly the same list of ports could be obtained again. Public source: FreeBSD Working dataset: No. Tool/scripts: Partially:. * SLOCCount: URL provided in the paper. * metrics: URL provided in the paper, http://libresoft.es/Tools/metrics, throws a 404. Googling for "metrics libresoft", you find that the new URL is http://tools.libresoft.es/cmetrics. * exuberant-ctags: well-known free software.. * No analysis scripts provided === MSR: 2007 Title: Analysis of the Linux Kernel Evolution Using Code Clone Coverage Length: Short Type: Empirical / Method + Case Study Public data: Yes. 136 versions of Linux. The specific versions should have been listed in Table 3, but there is no table 3 in the paper! Public source: Linux Working dataset: No. Tool/scripts: No. After googling for D-CCFinder I haven't found a place to download it.. * Uses du command for measuring size (although size is secondary in this paper) === MSR: 2007 Title: What can OSS mailing lists tell us? A preliminary psychometric text analysis of the Apache developer mailing list Length: Long Type: Empirical / Method + Case Study Public data: Yes. Apache mailing list. Dates are specified. Public source: Apache web server Working dataset: No. Tool/scripts: Partially. Tool used: LIWC. After googling, you get to http://www.liwc.net/. You can buy the lite version for 29$, the standard version for 89$. No data retrieval/extraction tools provided. === MSR: 2007 Title: Using software distributions to understand the relationship among free and open source software projects Length: Long Type: Empirical / Method + Case Study Public data: Yes. Fink repository. URL given. Date specified. Public source: Fink Working dataset: No. Tool/scripts: No. 
=== MSR: 2007 Title: Using Software Repositories to Investigate Socio-technical Congruence in Development Projects Length: Short Type: Non-experimental === MSR: 2007 Title: Detecting Patch Submission and Acceptance in OSS Projects Length: Short Type: Empirical / Method + Case Study Public data: Yes. Apache, Python, Postgres and MySQL (with limitations). No dates of research given. Public source: Apache Public source: Python Public source: PostgreSQL Public source: MySQL Working dataset: No. Tool/scripts: No. "We use a series of regular expressions to detect any known forms of headers in the bodies of all messages from the mailing lists." === MSR: 2007 Title: Prioritizing Warning Categories by Analyzing Software History Length: Short Type: Empirical / Method + Case Study Public data: Yes. Columba and JEdit. Period of study specified. Public source: Columba Public source: Jedit Working dataset: No. Tool/scripts: Partially. * Kenyon: not found after googling . * FindBugs. After googling: http://findbugs.sourceforge.net/. * JLint. Referenced in the paper.. * PMD. After googling: http://pmd.sourceforge.net/ === MSR: 2007 Title: Impact of the Creation of the Mozilla Foundation in the Activity of Developers Length: Short Type: Empirical / Mining Challenge Public data: Yes. Mozilla project. Period under study specified. Public source: Mozilla Working dataset: No. Tool/scripts: Yes. CVSAnalY. Googling after it provides http://tools.libresoft.es/cvsanaly === MSR: 2007 Title: Predicting Eclipse Bug Lifetimes Length: Short Type: Empirical / Mining Challenge Public data: Yes. Eclipse Bugzilla database. Dates of study are specified Public source: Eclipse Working dataset: Yes. "I am grateful to Thomas Zimmermann for providing an extracted Eclipse Bugzilla data set". I guess they used the one available for the Challenge: http://www.st.cs.uni-sb.de/softevo/challenge2007/eclipse-bugs.zip Tool/scripts: Yes. WEKA. 
Reference to URL: http://www.cs.waikato.ac.nz/ml/weka/index.html === MSR: 2007 Title: Mining Eclipse Developer Contributions via Author-Topic Models Length: Short Type: Empirical / Mining Challenge Public data: Yes. Eclipse 3.0 source code. They also refer to the Eclipse bug data available in [6], which after googling for the name of the paper, I've found that can be retrieved from http://www.st.cs.uni-saarland.de/softevo/bug-data/eclipse Public source: Eclipse Working dataset: Yes. "Complete list of topics with word and author distributions": http://sourcerer.ics.uci.edu/msr2007/index.html. And the Eclipse bug data (see Public data). Tool/scripts: No. It contains performance measures, but no reference to a tool/script. === MSR: 2007 Title: Predicting Defects and Changes with Import Relations Length: Short Type: Empirical / Mining Challenge Public data: Yes. Eclipse. Version given 3.0 Public source: Eclipse Working dataset: No. Tool/scripts: No. === MSR: 2007 Title: Forecasting the number of changes in Eclipse using time series analysis Length: Short Type: Empirical / Mining Challenge Public data: Yes. Eclipse. Date given: until January 2007 Public source: Eclipse Working dataset: No. Tool/scripts: Partially. CVSAnalY. Googling after it provides http://tools.libresoft.es/cvsanaly. Scripts for ARIMA model are not provided. === MSR: 2007 Title: Local and Global Recency Weighting Approach to Bug Prediction Length: Short Type: Empirical / Mining Challenge Public data: Yes. Eclipse. 70,000 bugs from January 2001 to January 2007. 32 components specified in the MSR Challenge 2007 were studied. Public source: Eclipse Working dataset: No. Tool/scripts: No. === MSR: 2008 Title: Determinism and Evolution Length: Long Type: Empirical / Method + Case Study Public data: Yes. 3,821 FLOSS projects. Public source: Sourceforge Working dataset: No. The dataset is said to be available at http://libresoft.es/Results/CVSAnalY_SF, but it throws a 404! Tool/scripts: No. 
=== MSR: 2008 Title: FAVE - Factor Analysis Based Approach for Detecting Product Line Variability from Change History Length: Long Type: Empirical / Method + Case Study Public data: No. Automotive Engine Control Software. Working dataset: No. Tool/scripts: No. They use R and provide some hints about factors, but not much more detail beyond this. === MSR: 2008 Title: Branching and Merging in the Repository Length: Short Type: Empirical / Method + Case Study Public data: Yes. ArgoUML. Dates of study are given. Public source: ArgoUML Working dataset: No. Tool/scripts: No. "We did modify the DiffJ tool to provide an API interface rather than running it via the command line." But no link is given. They use cvs2svn as well. === MSR: 2008 Title: Deep Intellisense: A Tool for Rehydrating Evaporated Information Length: Short Type: Empirical / Method + Case Study Public data: No. Seven months of Windows Vista. Working dataset: No. Tool/scripts: No. "Visual studio plugin called Deep Intellisense". Googling after it, you get to http://research.microsoft.com/en-us/um/people/abegel/di/index.html. No download available. === MSR: 2008 Title: Extracting Structural Information from Bug Reports Length: Short Type: Empirical / Method + Case Study Public data: Yes. Eclipse bug reports. No dates specified. Public source: Eclipse Working dataset: No. Tool/scripts: No. The paper is about a tool called "infoZilla", but no tool can be found if you google for it. === MSR: 2008 Title: A Change-Aware Development Environment by Recording Editing Operations of Source Code Length: Short Type: Empirical / Method + Case Study Public data: No. Java game with no further reference. Working dataset: No. Tool/scripts: No. "Eclipse plug-in which is called OperationRecorder". Googling for "OperationRecorder Eclipse" throws another tool with the same name! Performance is provided. 
=== MSR: 2008 Title: On the Relation of Refactoring and Software Defects Length: Short Type: Empirical / Method + Case Study Public data: Yes. ArgoUML, JBoss Cache, Liferay, Spring, XDoclet. Although no further details Public source: ArgoUML Public source: JBoss Cache Public source: Liferay Public source: Spring Public source: XDoclet Working dataset: No. "For each project we developed between 10 and 20 SQL queries to mark modifications as refactorings." But not available. Tool/scripts: No. WEKA is used as analysis tool === MSR: 2008 Title: Understanding Bug Fix Patterns in Verilog Length: Short Type: Empirical / Method + Case Study Public data: Yes. Four FLOSS projects from www.opencores.org. No date of study provided. Working dataset: No. Tool/scripts: No. === MSR: 2008 Title: Mining Software Effort Data: Preliminary Analysis of Visual Studio Team System Data Length: Short Type: Empirical / Method + Case Study Public data: No. Visual Studio Team System (VSTS) Working dataset: No. Tool/scripts: No. === MSR: 2008 Title: The FOSSology Project Length: Short Type: Empirical / Method + Case Study Public data: Yes. Abiword. Version: the one in Fedora 8. Public source: Abiword Working dataset: No. Tool/scripts: Yes. No reference in the paper, but googling for it gives http://fossology.org/. No version specified. === MSR: 2008 Title: Improving Change Descriptions with Change Contexts Length: Long Type: Empirical / Method + Case Study Public data: Yes. Cecil and ZedGraph. Number of studied versions are given, but no specific versions. Dates are also hinted, but not exactly. Public source: Cecil Public source: ZedGraph Working dataset: No. Tool/scripts: No. "We have developed a tool CILDiff which operates on C# bytecode and reports annotated byte differences. These differences are queried by our tool CILQuery." After googling for "cildiff" I get http://se.ninlabs.com/cilsuite/index.html, which I don't know if it is the same tool or a different tool with the same name. 
"cilquery" provides no meaningful result in Google. === MSR: 2008 Title: Evaluation of Source Code Copy Detection Methods on FreeBSD Length: Short Type: Empirical / Method + Case Study Public data: Yes. FreeBSD. No version or date are provided. Public source: FreeBSD Working dataset: No. Tool/scripts: Partially.. * Yes: Nilsimsa hash. No reference in the paper, but googling after it, you get at http://ixazon.dynip.com/~cmeclax/nilsimsa.html.. * No: AST "especially, Prof. A. Hassan for providing us AST approximation code." === MSR: 2008 Title: Small Patches Get In! Length: Long Type: Empirical / Method + Case Study Public data: Yes. Flac and OpenAFS. Mailing lists and CVS repository. Dates of study provided Public source: Flac Public source: OpenAFS Working dataset: No. Tool/scripts: No. "Additionaly, as now e-mail patches are available for us as a new data source, we plan to integrate it into our existing analysis tools [5] to improve the quality of the results." I haven't found the tools, so probably they are not publicly available. === MSR: 2008 Title: AMAP: Automatically Mining Abbreviation Expansions in Programs to Enhance Software Maintenance Tools Length: Long Type: Empirical / Method + Case Study Public data: Yes. 5 FLOSS Java programs. Liferay, OpenOffice.org Portable, iText.NET, Tiger Envelopes, Azureus. No version/date specified Public source: Liferay Public source: OpenOffice.org Portable Public source: iText.NET Public source: Tiger Envelopes Public source: Azureus Working dataset: Partially. "The word lists used in our implementation as well as descriptions of how they were derived are available online." http://www.cis.udel.edu/~gibson/amap/. There is other information there too. Tool/scripts: No. "Our tehcnique is fully automatic and is implemented as a Java Eclipse plugin with command line scripts for the MFE calculations." At http://www.cis.udel.edu/~gibson/amap/ you can read "We are still actively developing AMAP. 
The AMAP research prototype is currently implemented as an Eclipse plug-in and perl scripts. Please e-mail Emily for the code." === MSR: 2008 Title: An Extension of Fault-Prone Filtering Using Precise Training and a Dynamic Threshold Length: Long Type: Empirical / Method + Case Study Public data: Yes. Eclipse BIRT plugin, Eclipse Modelling Framework (EMF). Date of study given. Public source: Eclipse BIRT Public source: Eclipse Modelling Framework Working dataset: No. Tool/scripts: Partially. * CRM114 spam filtering software. Reference provided.. * FPTrainer and FPClassifier. Not Found. === MSR: 2008 Title: What Do Large Commits Tell Us? A taxonomical study of large commits Length: Long Type: Empirical / Method + Case Study Public data: Yes. 9 FLOSS projects. No dates/releases provided Public source: Boost Public source: Egroupware Public source: Enlightenment Public source: Evolution Public source: Firebird Public source: MySQL Public source: PostgreSQL Public source: Samba Public source: Spring Working dataset: No. Tool/scripts: No. === MSR: 2008 Title: SpotWeb: Detecting Framework Hotspots via Mining Open Source Repositories on the Web Length: Short Type: Empirical / Method + Case Study Public data: Yes. JUnit and Log4j. No dates/releases of study given Public source: Junit Public source: Log4j Working dataset: No. Tool/scripts: No. SpotWeb. "Therefore, our approach, called SpotWeb, leverages a code search engine (CSE) to gather related code examples from APIs of the input framework". Googling for "spotweb" and "spotweb tool" throws no results. === MSR: 2008 Title: Talk and Work: a Preliminary Report Length: Short Type: Empirical / Method + Case Study Public data: Yes. Ant, Python, Apache, Postgres. Mailing lists and SCM repositories. No URLs and dates given. Public source: Apache Ant Public source: Python Public source: Apache web server Public source: PostgreSQL Working dataset: No. Tool/scripts: No. 
=== MSR: 2008 Title: Correctness of Data Mined from CVS Length: Short Type: Empirical / Case study Public data: No. Study of 17 student projects. Working dataset: No. Tool/scripts: No. === MSR: 2008 Title: Mining Usage Expertise from Version Archives Length: Short Type: Empirical / Method + Case Study Public data: Yes. Eclipse Public source: Eclipse Working dataset: No. Tool/scripts: No. Pre-processing: Apfel Eclipse plug-in. Googling for it, we get to http://www.st.cs.uni-saarland.de/softevo/apfel/, which throws a 404. Looking in Google cache, we see that there is no way of downloading the plugin. === MSR: 2008 Title: Expertise Identification and Visualization from CVS Length: Short Type: Empirical / Method + Case Study Public data: Yes. Apache web server CVS repository. Apache 2.0 from 1996 to 2004. Public source: Apache web server Working dataset: No. Tool/scripts: No. "In this paper, we provide an exploratory tool that allows the examination of the data about expertise and contribution to open-source projects in more details.". No more detail about the tool is given. === MSR: 2008 Title: Measuring Developer Contribution from Software Repository Data Length: Short Type: Empirical / Method + Tool Public data: No. No case study Working dataset: No. No case study Tool/scripts: Yes. SQO-OSS system. From the project page, http://www.sqo-oss.org/xwiki/bin/view/Main/, you can download the software from this link http://www.sqo-oss.org/sqo-oss.tar.gz === MSR: 2009 Title: The Promises and Perils of Mining Git Length: Long Type: Empirical / Method + Case Study Public data: No. The paper tells about 30 FLOSS projects at http://git.or.cz/gitwiki/Gitprojects. Gives a "This page does not exist" in the wiki. Some of the projects are named, but not all of them. I haven't found them anywhere. Neither in the code repository (see tools) Working dataset: No. Tool/scripts: Yes. 
Link in the paper given: http://github.com/cabird/git_mining_tools === MSR: 2009 Title: Amassing and indexing a large sample of version control systems: towards the census of public source code history Length: Long Type: Empirical / Method + Case Study Public data: Yes. Several large and notable repositories (see Table II). No date specified. Working dataset: No. Tool/scripts: No. === MSR: 2009 Title: MapReduce as a General Framework to Support Research in Mining Software Repositories (MSR) Length: Long Type: Empirical / Method + Case Study Public data: Yes. Eclipse, BIRT and Datatools. No date/version specified. Public source: Eclipse Public source: Eclipse BIRT Public source: Eclipse Datatools Working dataset: No. Tool/scripts: Partially. Hadoop is available as FLOSS - no date/version specified. J-rex is available as FLOSS - no date/version specified. But D-JrexN (N=1,2,3) couldn't be found. === MSR: 2009 Title: A Platform for Software Engineering Research Length: Long Type: Empirical / Tool + Case Study Public data: Yes. 48 projects from GNOME. Not a full list of the projects. No version/date specified. Public source: GNOME Working dataset: No. Tool/scripts: No. Alitheia Core. Googling for it links to http://www.sqo-oss.org/xwiki/bin/view/About/What+Is+SQO-OSS. There following the link "Alitheia Core Download" (http://www.sqo-oss.org/builds) gives a 404. === MSR: 2009 Title: Evaluating the Relation Between Coding Standard Violations and Faults Within and Across Software Versions Length: Long Type: Empirical / Method + Case Study Public data: No. NXP TV Platform Working dataset: No. Tool/scripts: No. === MSR: 2009 Title: Tracking Concept Drift of Software Projects Using Defect Prediction Quality Length: Long Type: Empirical / Method + Case Study Public data: Yes. Eclipse, OpenOffice.org, Netbeans, Mozilla. For all of them CVS and Bugzilla data. Dates of study given. 
Public source: Eclipse Public source: OpenOffice.org Public source: Netbeans Public source: Mozilla Working dataset: No. Tool/scripts: No. === MSR: 2009 Title: Does Calling Structure Information Improve the Accuracy of Fault Prediction? Length: Long Type: Empirical / Method + Case Study Public data: No. Industrial software system. Working dataset: No. Tool/scripts: No. === MSR: 2009 Title: Mining Source Code to Automatically Split Identifiers for Software Analysis Length: Long Type: Empirical / Method + Case Study Public data: No. Randomly selected tokens from 9000 open source Java programs in SourceForge. Public source: Sourceforge Working dataset: Partially. Link in the paper to common prefixes and postfixes. Tool/scripts: No. The Samurai tool is presented. There is a link in the paper to the tool web page. No download link available. === MSR: 2009 Title: Code siblings: technical and legal implications of copying code between applications Length: Long Type: Empirical / Method + Case Study Public data: Yes. Linux, FreeBSD and OpenBSD. Date of retrieval (snapshot/release) specified. Public source: Linux Public source: FreeBSD Public source: OpenBSD Working dataset: No. Tool/scripts: Partially:. * Fossology 1.0.0. Version number included! But they found some errors/omissions and corrected them, and these corrections are not available in the original tool!. * CCFinder. === MSR: 2009 Title: Author Entropy vs. File Size in the GNOME Suite of Applications Length: Short Type: Empirical / Mining Challenge Public data: Yes. 10 projects from GNOME (SCM data). Names and versions/dates are not specified. Public source: GNOME Working dataset: No. Tool/scripts: No. They state that "In order to calculate author entropy for each file we used the original Python script implemented by Taylor et al. in their study, which introduced author entropy [3]". But I couldn't find that script. 
Nor could I find the Java tool; they say they "implemented a custom Java program to aggregate the output of each Python script execution" === MSR: 2009 Title: Evaluating Process Quality in GNOME based on Change Request Data Length: Short Type: Empirical / Mining Challenge Public data: Partially. 25 of the largest GNOME projects (Bugzilla data). Snapshot date given. Public source: GNOME Working dataset: No. Tool/scripts: Yes. * BugzillaMetrics (FLOSS). Link is given to bugzillametrics.org.. * QMetric (FLOSS). Reference paper is given, but no URL. Googling, you land at bugzillametrics.org (whose title page is QMetric), and I guess it is "QualityModel tools". No more hints are given in the paper. === MSR: 2009 Title: Mining the Coherence of GNOME Bug Reports with Statistical Topic Models Length: Short Type: Empirical / Mining Challenge Public data: Yes. GNOME (Bugzilla data). No specific date given (but number of total bugs) Public source: GNOME Working dataset: No. The URL with "complete results" provided (http://sourcerer.ics.uci.edu/msr2009/gnome_coherence.html) gives a 404! Tool/scripts: No. === MSR: 2009 Title: Visualizing Gnome With The Small Project Observatory Length: Short Type: Empirical / Mining Challenge Public data: Yes. GNOME (SCM data). No specific date given (although it says 2009 as last year studied in the paper) Public source: GNOME Working dataset: No. Tool/scripts: Yes. Tool website linked: http://spo.inf.unisi.ch/ and software available for download. === MSR: 2009 Title: On the use of Internet Relay Chat (IRC) meetings by developers of the GNOME GTK+ project Length: Short Type: Empirical / Mining Challenge Public data: No. #gtk-devel IRC channel. Working dataset: No. Tool/scripts: No. === MSR: 2009 Title: Mining Search Topics from a Code Search Engine Usage Log Length: Short Type: Empirical / Case Study Public data: No. Koders usage log. Working dataset: No. Tool/scripts: No. 
=== MSR: 2009 Title: From Work to Word: How Do Software Developers Describe Their Work? Length: Long Type: Empirical / Method + Case Study Public data: Partially. MyComp Work Management System (No), Apache 73 projects SCM log messages (Yes), Eureka Study (No) Public source: Apache Working dataset: No. Tool/scripts: No. === MSR: 2009 Title: Assigning Bug Reports using a Vocabulary-Based Expertise Model of Developers Length: Long Type: Empirical / Method + Case Study Public data: Yes. Eclipse (Bugzilla and SCM data). Dates of study specified. Public source: Eclipse Working dataset: No. Tool/scripts: Yes. Develect. Link provided: http://smallwiki.unibe.ch/develect. But I could not find it. And the link "Please follow the instructions on how to install Moose from scgStore." (http://www.moosetechnology.org/download/scgstore) gives a 404! Searching for "develec" at www.moosetechnology.org provides no results. === MSR: 2009 Title: Mining the History of Synchronous Changes to Refine Code Ownership Length: Long Type: Empirical / Method + Case Study Public data: No. Speed, a commercial project. Working dataset: No. Tool/scripts: Partially. Syde. "Syde is a client-server application that can manage and store object-oriented software systems implemented in Java." Googling for "syde hattori", you arrive at http://www.inf.usi.ch/phd/hattori/syde.html. There you can find a link to download Syde from Eclipse, using http://www.inf.unisi.ch/phd/hattori/syde/update/. This is the client. I haven't found how to download the server. === MSR: 2009 Title: Using Association Rules to Study the Co-evolution of Production & Test Code Length: Short Type: Empirical / Method + Case Study Public data: Partially. Checkstyle (FLOSS) and industrial software system. No dates given Public source: Checkstyle Working dataset: No. Tool/scripts: No. === MSR: 2009 Title: On What Basis to Recommend: Changesets or Interactions? Length: Short Type: Empirical / Method + Case Study Public data: Yes.
Eclipse Mylyn project (bug reports). Dates provided. Public source: Eclipse Mylyn Working dataset: No. Tool/scripts: No. === MSR: 2009 Title: Mining the Jazz Repository: Challenges and Opportunities Length: Short Type: Empirical / Method + Case Study Public data: Yes. Jazz project. "Jazz repository data set available so far is the Jazz development repository itself". Dates are provided. Public source: Jazz Working dataset: No. "We thank the Jazz team for providing and approving the Jazz data and constructing the legal agreement." I assume that this dataset is not publicly available. Tool/scripts: No. === MSR: 2009 Title: Using Latent Dirichlet Allocation for Automatic Categorization of Software Length: Short Type: Empirical / Method + Case Study Public data: Yes. 41 FLOSS projects (first) and 43 other FLOSS projects (second). As stated in a footnote, full results can be found in http://www.cs.wm.edu/~denys/data/msr09/msr09-appendix.htm. From there the exact projects and versions can be obtained. 
Public source: gib Public source: hotpop Public source: jwsmtp Public source: anyterm Public source: cgterm Public source: putty Public source: tatelnet Public source: exb Public source: OpenJMail Public source: j2ssh Public source: jalita Public source: pircd Public source: Mercury Public source: Web2IRC Public source: emailer Public source: squirrelmail Public source: ajaxphpterm Public source: phpterm Public source: gcells Public source: BubbleBreaker Public source: deep Public source: npp Public source: nomic Public source: Netrisk Public source: ILKMail Public source: freedroid Public source: fuzzy Public source: AkelPad Public source: pdfedit Public source: DBEdit Public source: smallsql Public source: jedit42source Public source: rtext Public source: VietPad Public source: SequelExplorer Public source: Kephra Public source: fortune Public source: phpMyAdmin Public source: FCKeditor Public source: ontext Public source: SharpTTT Public source: cses Public source: PostgreSQL Public source: bingo Public source: btechmux Public source: cinag Public source: faile Public source: gbatnav Public source: gchch Public source: icsDrone Public source: libgmonopd Public source: netships Public source: nettoe Public source: nngs Public source: Sjeng Public source: ttt Public source: clisp Public source: csl Public source: freewrapsrc53 Public source: gbdk Public source: gprolog Public source: gsoap2 Public source: jcom223 Public source: nasm Public source: pfe Public source: sdcc Public source: centrallix Public source: emdros Public source: firebird Public source: gtm Public source: leap Public source: MySQL Public source: PostgreSQL Public source: gedit Public source: gmas Public source: gnotepad+ Public source: molasses Public source: peacock Public source: dv2jpg Public source: libcu30 Public source: mjpgTools Public source: mpegsplit Public source: R Working dataset: No. Tool/scripts: Partially:. * MUDABlue. Reference to paper given. 
Googling for it, you arrive at http://www.empirical.jp/EASE_DVD/Tools/MUDABlue.html. No download link available. The authors did not use the tool. Instead, they compared against the results that the original authors obtained with that system (as published in a paper).. * GibbsLDA++. A footnote in the paper provides the following URL: http://gibbslda.sourceforge.net/ === MSR: 2009 Title: Evolution of the core team of developers in libre software projects Length: Short Type: Empirical / Method + Case Study Public data: Yes. GIMP (CVS data). No dates of study given. Public source: GIMP Working dataset: No. Tool/scripts: Yes. Googling for the tool leads to http://tools.libresoft.es/cvsanaly === MSR: 2009 Title: On Mining Data across Software Repositories Length: Short Type: Empirical / Method + Case Study Public data: Yes. Web pages (linked in the paper). Working dataset: No. Tool/scripts: No. "We used libraries from HTMLScraper9, an open source project, to implement the HTML downloader and HTML parser components of the tool. [9 is a footnote to the HTMLScraper web page, http://sourceforge.net/projects/HTMLscraper/]" === MSR: 2009 Title: Automatic Labeling of Software Components and their Evolution using Log-Likelihood Ratio of Word Frequencies in Source Code Length: Short Type: Empirical / Method + Case Study Public data: Yes. JUnit. 14 releases. Version numbers given in the paper. Public source: Junit Working dataset: No. Tool/scripts: Partially. "We implemented -2log[Sigma] analysis in a Java prototype which is available on the Hapax website." === MSR: 2009 Title: Learning from Defect Removals Length: Short Type: Empirical / Method + Case Study Public data: Yes. Eclipse Text Editor plugin, Eclipse Launching plugin (CVS and Bugzilla data), Groovy and Cherry (SVN and issue tracking system). No versions/dates explicitly specified, although "Reports were restricted to those that occurred 6 months before or after a major release, with most occurring before the release."
Public source: Eclipse Text Editor Public source: Eclipse Launching Public source: Groovy Public source: Cherry Working dataset: No. Tool/scripts: No. === MSR: 2009 Title: SourcererDB: An Aggregated Repository of Statically Analyzed and Cross-Linked Open Source Java Projects Length: Short Type: Empirical / Method + Case Study Public data: No. "local snapshots of 2,852 Java projects taken from Sourceforge, Apache and Java.net." Public source: Sourceforge Working dataset: No. "SourcererDB currently indexes 2,852 Java projects recently taken from Sourceforge, Apache and Java.net (as of early 2009). This pre-processed dataset is freely available for the research community to use." Later on: "For those who wish to use SourcererDB, we provide direct read-only access to the MySQL database." Website provided. http://sourcerer.ics.uci.edu/db/. But no data available! Tool/scripts: Partially. SourcererDB. Googling for it, you arrive at http://sourcerer.ics.uci.edu/. There is a web front-end where you can use the tool, but no way of downloading it. === MSR: 2009 Title: On the Transfer of Evolutionary Couplings to Industry Length: Short Type: Empirical / Method + Case Study Public data: No. Industrial system at Philips Healthcare MRI Working dataset: No. Tool/scripts: No. The author talks about their tool called CouplingViewer. Googling for "couplingviewer philips" returns only the MSR paper itself. ===