Reproduction Package

A Reflection on the Impact of Model Mining from GitHub

This is the reproduction package for the paper "A Reflection on the Impact of Model Mining from GitHub", which we have submitted to the IST journal. It contains all the scripts and data we analyzed, and is meant both to allow verification of our results and to serve as a basis for further work.


Google Scholar Analysis

A tarball (gscholar.tar.gz) with the following files:


Paper Coding

Spreadsheet (PapersCitingOurPapers-2022-11-16.xlsx) with the coding done by the authors. The spreadsheet contains four tabs:


DBLP Analysis

Scripts that have been used to analyze DBLP.

The scripts use the DBLP XML database dump as of Dec 7th 2022, which is converted into JSON with the help of DBLPParser (files included).

The resulting JSON file is then parsed to find the papers citing our research and their authors (dblp2reflection.py).

From the list of authors, we obtain the co-author and non-co-author networks (dblp2reflection2.py).
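As an illustration of the network step, the following is a minimal sketch and not the code in dblp2reflection2.py. It assumes that the citing papers are available as a mapping from a paper key to its list of author names, and that the "non-co-author network" links every pair of authors who never share a paper; both the data layout and that reading of the term are assumptions, and all function and variable names are hypothetical.

```python
from itertools import combinations


def coauthor_edges(papers):
    """Return the set of unordered author pairs that share at least one paper.

    `papers` maps a paper key to a list of author names (assumed layout).
    """
    edges = set()
    for authors in papers.values():
        # Sort so that each pair has a canonical (alphabetical) order.
        for pair in combinations(sorted(set(authors)), 2):
            edges.add(pair)
    return edges


def non_coauthor_edges(authors, edges):
    """Return all unordered author pairs that never appear as co-authors."""
    return {pair for pair in combinations(sorted(authors), 2) if pair not in edges}


# Toy data, invented for this example only.
papers = {
    "paper-1": ["Alice", "Bob"],
    "paper-2": ["Bob", "Carol"],
    "paper-3": ["Dave"],
}
all_authors = {name for authors in papers.values() for name in authors}

co = coauthor_edges(papers)
non_co = non_coauthor_edges(all_authors, co)

print(sorted(co))
print(sorted(non_co))
```

Storing each pair in sorted order keeps the edges undirected, so ("Alice", "Bob") and ("Bob", "Alice") count as the same link.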

In the root directory:

In the dblp_parser_python directory:


MSR and MODELS Intersection Script

This script calculates the intersection of the sets of authors with papers at MODELS and at MSR from 2016 to 2022.

To do so, it downloads all papers of both conferences from DBLP, extracts their authors, and computes the intersection.
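The core of this computation can be sketched as follows. This is a hedged example rather than the script shipped in the package: it skips the download from DBLP and assumes the papers of each conference are already available as dictionaries with "year" and "authors" fields. That record layout and all names below are hypothetical.

```python
def authors_in_range(papers, first_year, last_year):
    """Collect the authors of all papers published within the given years."""
    return {
        author
        for paper in papers
        if first_year <= paper["year"] <= last_year
        for author in paper["authors"]
    }


def author_intersection(models_papers, msr_papers, first_year=2016, last_year=2022):
    """Return the authors who appear in both conferences within the period."""
    models_authors = authors_in_range(models_papers, first_year, last_year)
    msr_authors = authors_in_range(msr_papers, first_year, last_year)
    return models_authors & msr_authors


# Toy records, invented for this example only.
models_papers = [
    {"year": 2017, "authors": ["Alice", "Bob"]},
    {"year": 2015, "authors": ["Erin"]},  # outside the period, ignored
]
msr_papers = [
    {"year": 2020, "authors": ["Bob", "Carol"]},
    {"year": 2021, "authors": ["Erin"]},
]

common = author_intersection(models_papers, msr_papers)
print(sorted(common))
```

Note that matching authors by plain name, as done here, can merge distinct people who share a name; whether the actual script relies on names or on DBLP author identifiers is not stated in this README.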


Contact: grex at gsyc.urjc.es
Last Modified: May 27th 2023