Reproduction Package
A Reflection on the Impact of Model Mining from GitHub
This is the reproduction package for the paper "A Reflection on the Impact of Model Mining from GitHub", which we have submitted to the IST journal. It contains all the scripts and data used in our analysis, and is meant to let others both verify our results and build on them.
Google Scholar analysis
A tarball (gscholar.tar.gz) with the following files:
- 0*.html: Google Scholar page with papers citing "The quest for open source projects that use UML: mining GitHub"
- 1*.html: Google Scholar page with papers citing "An extensive dataset of UML models in GitHub"
- expand.py: Python script that outputs the number of citations and the title of each paper
- output*: Output of the script (in different formats, unmerged, sorted, cleaned)
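As an illustration of the kind of extraction described above, here is a minimal sketch that pulls citation counts and titles out of one saved Google Scholar result page. It is not the packaged expand.py. It assumes that each result title sits in an `<h3 class="gs_rt">` element and that the citation count appears as link text of the form "Cited by N"; both are assumptions about the saved markup, and the names `ScholarResults` and `parse_page` are hypothetical.

```python
import re
from html.parser import HTMLParser


class ScholarResults(HTMLParser):
    """Collect [citations, title] pairs from one saved result page."""

    def __init__(self):
        super().__init__()
        self.results = []          # one [citations, title] entry per result
        self._in_title = False
        self._title_parts = []

    def handle_starttag(self, tag, attrs):
        # Assumption: every result title is an <h3 class="gs_rt"> element.
        classes = (dict(attrs).get("class") or "").split()
        if tag == "h3" and "gs_rt" in classes:
            self._in_title = True
            self._title_parts = []

    def handle_endtag(self, tag):
        if tag == "h3" and self._in_title:
            self._in_title = False
            # Collapse whitespace left over from nested tags.
            title = " ".join("".join(self._title_parts).split())
            self.results.append([0, title])

    def handle_data(self, data):
        if self._in_title:
            self._title_parts.append(data)
            return
        # Assumption: the count is shown as the text "Cited by N".
        match = re.fullmatch(r"\s*Cited by (\d+)\s*", data)
        if match and self.results:
            self.results[-1][0] = int(match.group(1))


def parse_page(html):
    """Return a list of (citations, title) tuples in page order."""
    parser = ScholarResults()
    parser.feed(html)
    parser.close()
    return [tuple(entry) for entry in parser.results]
```

Results without a "Cited by" link keep a count of 0. Applying `parse_page` to each of the 0*.html and 1*.html files and concatenating the lists would give one unmerged listing per cited paper.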
Paper Coding
Spreadsheet (PapersCitingOurPapers-2022-11-16.xlsx) with the coding done by the authors. The spreadsheet contains four tabs:
- List of papers (as obtained from Google Scholar)
- Categories for indirect use
- Indirect-use
- Categories for Type of Direct-Use
DBLP Analysis
Scripts used to analyze DBLP.
The scripts use the DBLP XML database dump as of Dec 7th 2022, which is converted into JSON with the help of DBLPParser (files included).
The resulting JSON file is then parsed to find the papers citing our research and their authors (dblp2reflection.py).
From the list of authors, we obtain the co-author and non-co-author networks (dblp2reflection2.py).
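The two steps above (matching citing papers in DBLP, then splitting their authors into co-authors and non-co-authors) can be sketched as follows. This is not the code of dblp2reflection.py or dblp2reflection2.py. It assumes that each DBLP record is a dict with a `title` string and an `author` list, and that a "co-author" is anyone who shares at least one DBLP record with one of the original authors; the record layout, that definition, and the names `normalize` and `split_citing_authors` are all assumptions made for illustration.

```python
def normalize(title):
    """Lowercase and drop non-alphanumerics so titles compare loosely."""
    return "".join(ch for ch in title.lower() if ch.isalnum())


def split_citing_authors(records, citing_titles, seed_authors):
    """Split the authors of citing papers into co-authors and non-co-authors.

    records       -- iterable of dicts with "title" and "author" keys (assumed layout)
    citing_titles -- titles of the papers that cite the original work
    seed_authors  -- names of the authors of the original papers
    Returns (co_authors, non_co_authors) as two sets of names.
    """
    wanted = {normalize(t) for t in citing_titles}
    seeds = set(seed_authors)

    citing_authors = set()   # everyone who authored a citing paper
    seed_coauthors = set()   # everyone who shares a record with a seed author
    for rec in records:
        authors = set(rec.get("author", []))
        if normalize(rec.get("title", "")) in wanted:
            citing_authors |= authors
        if authors & seeds:
            seed_coauthors |= authors

    citing_authors -= seeds  # the original authors are in neither group
    co_authors = citing_authors & seed_coauthors
    non_co_authors = citing_authors - seed_coauthors
    return co_authors, non_co_authors
```

Titles are normalized before comparison because DBLP titles usually carry a trailing period and may differ in case or punctuation from the titles listed by Google Scholar. Titles that still fail to match would end up in a list of missing papers.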
In the root directory:
- papers.txt: List of papers that cite the Lindholmen papers
- dblp.xml.gz: DBLP database dump (as of Dec 7th 2022, 713 MB)
- dblp.json.gz: DBLP database dump converted to JSON with the help of DBLPParser (591 MB)
- papers_in_dblp.json: JSON file with the papers from papers.txt found in DBLP
- missing.txt: Papers in papers.txt not found in DBLP
- network.txt: Set B (non-co-authors). Two entries are repeated, so the set contains 181 distinct entries
- dblp_parser.tar.gz: Tarball with the dblp_parser_python directory (see below)
In the dblp_parser_python directory:
- dblp2reflection.py: Prints the list of papers in DBLP
- dblp2reflection2.py: Creates the networks of co-authors and of non-co-authors
- dblp_parser.py: Creates JSON output from XML DBLP dump (DBLPParser project)
- main.py: Main program of dblp_parser (DBLPParser project)
- LICENSE: License of the DBLPParser project
- README.md: README.md file of the DBLPParser project
MSR and MODELS intersection script
This script calculates the intersection of the sets of authors who have submitted to MODELS and to MSR from 2016 to 2022.
To do so, it downloads all papers of both conferences from DBLP, obtains their authors, and computes the intersection of the two author sets.
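The core of that computation is a set intersection, sketched below on already-downloaded data. The download step is left out, and the input layout (a list of dicts with `venue`, `year` and `authors` keys) as well as the function name `authors_in_both` are assumptions for illustration, not the interface of the packaged script.

```python
def authors_in_both(papers, venue_a="MODELS", venue_b="MSR",
                    first_year=2016, last_year=2022):
    """Return the authors with at least one paper in each venue within the year range.

    papers -- iterable of dicts with "venue", "year" and "authors" keys (assumed layout)
    """
    by_venue = {venue_a: set(), venue_b: set()}
    for paper in papers:
        in_range = first_year <= paper["year"] <= last_year
        if paper["venue"] in by_venue and in_range:
            by_venue[paper["venue"]].update(paper["authors"])
    # Authors present in both sets form the intersection.
    return by_venue[venue_a] & by_venue[venue_b]
```

Authors are compared by exact name string here, so the result depends on names being spelled consistently across both venues.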
Contact: grex at gsyc.urjc.es
Last Modified: May 27th 2023