PhD thesis

My PhD thesis is devoted to the analysis of temporal and behavioral patterns in the use of Wikipedia, and I want to continue this line of research by studying several topics concerning the interaction between users and the different Wikipedia editions. Among others, I am studying the time series that result from observing how the different types of requests evolve over time, the dynamics governing promotion to featured status in several editions of the encyclopedia, and the possibilities opened up by the geolocation of users' requests.


Wikipedia stands as the most important wiki-based platform and continues to provide society with a vast set of contents and media resources covering all branches of knowledge. Undoubtedly, Wikipedia constitutes one of the most remarkable milestones in the evolution of encyclopedias and a complete revolution in the area of knowledge management. Perhaps its most innovative aspect is the underlying approach, which promotes the collaboration and cooperation of users in building content in a voluntary and altruistic manner.

Neither the growth of Wikipedia nor its popularity has stopped since its beginning. In fact, the number of visits to its different editions has placed its web site among the six most visited on the Internet. This success has spread the use of Wikipedia beyond typical academic environments and has turned it into a true mass phenomenon.

Because of this relevance, Wikipedia has emerged as a topic of increasing interest for the research community. However, most of the existing research is concerned with the quality and reliability of the offered contents. This previous research focuses on subjects such as reputation and trust, or addresses topics related to the evolution of Wikipedia and its growth tendencies. By contrast, this thesis aims to provide an empirical study and an in-depth analysis of the manner in which the different editions of Wikipedia are being used by their corresponding communities of users. Our main objective is therefore to find temporal and behavioral patterns describing the different kinds of contents and interactions requested by Wikipedia users.

Users' requests are expressed in the form of URLs submitted to Wikipedia as part of the traffic directed to its supporting servers. The analysis presented here basically consists of the characterization of this traffic and has been carried out by parsing and filtering the information elements extracted from the URLs contained in it. Because the enormous volume of requests to Wikipedia forced us to work with a sample of them, we first validated our results by comparing them with trusted sources.
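As an illustration of what parsing and filtering request URLs can look like, here is a minimal Python sketch. It is not the code used in the thesis: the function name `classify_request`, the returned tuple, and the filtering rules are assumptions made for this example. It relies only on the usual MediaWiki URL forms (`/wiki/<title>` for page views and `/w/index.php` with `title`, `action`, or `search` parameters).

```python
from urllib.parse import urlsplit, parse_qs, unquote


def classify_request(url):
    """Return (edition, action, target) for a Wikipedia URL, or None if filtered out.

    Hypothetical helper for illustration only; the real filtering rules
    used in the study may differ.
    """
    parts = urlsplit(url)
    labels = (parts.hostname or "").split(".")

    # Keep only hosts of the form <edition>.wikipedia.org (e.g. en, es, de).
    if len(labels) != 3 or labels[1:] != ["wikipedia", "org"] or labels[0] == "www":
        return None
    edition = labels[0]

    if parts.path.startswith("/wiki/"):
        # Plain page view: the title is the rest of the path.
        return (edition, "view", unquote(parts.path[len("/wiki/"):]))

    if parts.path == "/w/index.php":
        query = parse_qs(parts.query)
        if "search" in query:
            # Search operation: the target is the search term.
            return (edition, "search", query["search"][0])
        title = query.get("title", [""])[0]
        # Without an explicit action parameter, treat the request as a view.
        action = query.get("action", ["view"])[0]
        return (edition, action, title)

    # Anything else (images, scripts, other paths) is discarded in this sketch.
    return None


if __name__ == "__main__":
    for u in (
        "http://en.wikipedia.org/wiki/Main_Page",
        "http://es.wikipedia.org/w/index.php?title=Madrid&action=edit",
        "http://de.wikipedia.org/w/index.php?search=Berlin",
        "http://upload.wikimedia.org/some/image.png",
    ):
        print(u, "->", classify_request(u))
```

Counting the resulting tuples per edition and per action would give a first, rough characterization of the traffic in the sense described above.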

After analyzing the traffic to Wikipedia during a whole year, this study presents a complete characterization of the different types of requests that make it up. Furthermore, we have found several patterns related to the temporal distributions of these requests as well as to the actions and contents involved in them. We also examine how the most frequently searched topics, and other contents valued by the community such as featured articles, influence the attention that articles get. Finally, we have analyzed the categories of articles that attract the most visits and search operations in the considered editions of Wikipedia.

Most of the objectives accomplished here are based on the results provided by an application developed ad hoc to feed this study. The software engineering of this tool has been undertaken under the WikiSquilter project. We expect that this application can serve as a useful tool to characterize the traffic directed to wiki-based sites, particularly to any project supported by the Wikimedia Foundation.

Prior to this work, no other analysis had been undertaken to study the use of Wikipedia in such a wide and thorough way. We hope that our efforts and results can serve as a significant contribution to the examination of the dynamics of use when interacting with knowledge management platforms like Wikipedia.

Related publications

  • Temporal characterization of the requests to Wikipedia

  • 5th International Workshop on new Challenges in Distributed Information Filtering and Retrieval (DART'11)

    Pdf   Bibtex citation

  • A quantitative examination of the impact of featured articles in Wikipedia

  • International Conference on Software and Data Technologies (ICSOFT'11)

    Pdf   Bibtex citation

  • A statistical approach to the impact of featured articles in Wikipedia

  • International Conference on Knowledge Engineering and Ontology Development (KEOD'10)

    Pdf   Bibtex citation

  • A quantitative approach to the use of the Wikipedia

  • IEEE Symposium on Computers and Communications (ISCC'09)

    Pdf   Bibtex citation

  • Quantitative analysis and characterization of Wikipedia requests

  • ACM WikiSym 2008: 4th International Symposium on Wikis (WikiSym'08)

    Pdf   Bibtex citation

  • Workshop on interdisciplinary research on Wikipedia and Wiki communities

  • ACM WikiSym 2008: 4th International Symposium on Wikis (WikiSym'08)

    Pdf   Bibtex citation


You can download the thesis as a PDF document from this link: Doctoral thesis


If you would like to cite my work, you can download the corresponding BibTeX entry from the following link: Bibtex citation

Thesis advisor

Jesús M. González Barahona (jgb_at_libresoft_dot_es)


Committee President: Carlos Delgado Kloos (cdk_at_it_dot_uc3m_dot_es)

Member: Rocío Muñoz Mansilla (rmmunoz_at_dia_dot_uned_dot_es)

Member: Israel Herraiz Tabernero (israel_dot_herraiz_at_upm_dot_es)

Member: Eloisa Vargiu (vargiu_at_diee_dot_unica_dot_it)

Committee Secretary: Gregorio Robles Martínez (grex_at_libresoft_dot_es)

Dissertation slides

You can download the dissertation slides from this link: Dissertation slides