This is the reproduction package for our paper. It contains over 30,000 files totaling more than 1.3 GB of data. In addition to the raw and processed data, we provide the scripts used to retrieve, clean and analyze it.
We use GrimoireLab Perceval for the retrieval and gambit for disambiguation.
A. Raw data sources (as retrieved with Perceval)
Gzipped JSON files obtained by running Perceval on repositories from the four ecosystems under study.
Per repository, we have three JSON files: one for commits, another for pull requests and a third one for issues.
- CoAP
- LwM2M
- NB-IoT
- Zigbee
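
As a rough illustration of how these files can be read, the sketch below loads one gzipped file and counts its items per category. It assumes that each file holds a sequence of JSON objects written one after another (pretty-printed or one per line) and that each object carries a top-level "category" field, as Perceval items usually do. The function names iter_json_objects and count_by_category are hypothetical and not part of the package.

```python
import gzip
import json
from collections import Counter


def iter_json_objects(path):
    """Yield every top-level JSON value found in a gzipped text file.

    Assumption: the file is a concatenation of JSON values separated
    by whitespace, not a single JSON array.
    """
    decoder = json.JSONDecoder()
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        text = handle.read()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def count_by_category(path):
    """Count items per 'category' field (assumed to exist on each item)."""
    return Counter(
        item.get("category", "unknown") for item in iter_json_objects(path)
    )
```

The whole file is read into memory before parsing, which keeps the sketch short. For the largest files a streaming parser may be preferable.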
B. Raw data sources (as retrieved from the GitHub API)
Repository meta-information obtained by mining for repositories from the four ecosystems under study.
Each repository has one JSON file with meta-information such as owner and license, among other fields.
- CoAP
- LwM2M
- NB-IoT
- Zigbee
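
The following sketch shows one way to pull the owner and license out of such a file. It assumes that each file stores the repository object as returned by the GitHub REST API, where the owner name is under "owner" / "login" and the license identifier is under "license" / "spdx_id" (the license may be null). The helper names summarize_repo and summarize_repo_file are hypothetical.

```python
import json


def summarize_repo(meta):
    """Extract a few fields from a GitHub repository object (a dict).

    Assumption: 'meta' follows the GitHub REST API repository schema.
    Missing or null fields are reported as None.
    """
    owner = (meta.get("owner") or {}).get("login")
    license_info = meta.get("license") or {}
    return {
        "full_name": meta.get("full_name"),
        "owner": owner,
        "license": license_info.get("spdx_id"),
    }


def summarize_repo_file(path):
    """Load one metadata JSON file and summarize it."""
    with open(path, encoding="utf-8") as handle:
        return summarize_repo(json.load(handle))
```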
C. Python scripts
Python scripts used for the analysis of the ecosystems.
- CoAP
- LwM2M
- NB-IoT
- Zigbee
In particular, the scripts do the following:
- analyze-authors-jsons.py: Prints information on the ecosystems and returns several CSV files for further analysis.
- analyze-jsons.py: Returns a CSV with information on commits, issues and pull-requests for the repositories.
- analyze-projects-csv.py:
- get_*_projects.py: where * is the name of the ecosystem (coap, lwm2m, nb-iot, zigbee). Calls the GitHub API to obtain metadata on repositories from the ecosystem and stores it in the repos-info subdirectory (see above).
- get_github_repo_information.py: Extracts information from the JSONs retrieved from the GitHub API. Requires a GitHub API token.
- get_licenses.py: Extracts license information from the JSONs retrieved from the GitHub API.
- github-open-iterate.py: Calls github-open-with-error-log.py iteratively.
- github-open-with-error-log.py: Calls Perceval to retrieve JSONs with commit, issue and pull request data from repositories. Requires GitHub tokens and Perceval.
- git-retrieve-log.py: Calls Perceval to retrieve commit log information from repositories.
D. Companies information
Company information and scripts
E. Other files
Other files used for the analysis of the ecosystems.
- CoAP
- LwM2M
- NB-IoT
- Zigbee
In particular, the files are the following:
- run.sh: Obtains the list of repositories related to the standard
- github-repos-*-date-number.json: Result of run.sh. Repositories related to the * ecosystem (coap, lwm2m, nb-iot, zigbee)
- projects.csv: CSV with information on the repositories under study