Scalable Relevant Project Recommendation on GitHub


GitHub, one of the largest social coding platforms, fosters a flexible and collaborative development process. In practice, developers on this open source platform need to find projects relevant to their own work in order to reuse functionality, explore ideas for possible features, or analyze requirements for their projects. Recommending relevant projects to a developer is difficult because millions of projects are hosted on GitHub and different developers have different notions of relevance. In this paper, we propose a scalable and personalized approach that recommends projects by leveraging both developers’ behaviors and project features. Based on the features of the projects a developer has created and the developer’s behaviors toward other projects, our approach automatically recommends the top-N most relevant software projects to that developer. Moreover, to improve scalability, we implement our approach on a parallel processing framework (i.e., Apache Spark) so that large-scale GitHub data can be analyzed for efficient recommendation. We perform an empirical study on data crawled from GitHub, and the results show that our approach can efficiently recommend relevant software projects that match developers’ interests with relatively high precision.
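To make the idea of combining project features with developer behaviors concrete, the following is a minimal single-machine sketch of top-N recommendation. It is not the method evaluated in the paper: the abstract does not specify the scoring function, so the bag-of-words project features, the cosine similarity, the behavior weights, and the mixing parameter `alpha` are all assumptions chosen for illustration, and the names `cosine` and `recommend` are hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(projects, created, behaviors, n=10, alpha=0.5):
    """Return the top-n (project, score) pairs for one developer.

    projects:  dict mapping project name -> list of feature tokens (assumed
               to come from, e.g., descriptions or README text)
    created:   set of project names the developer created
    behaviors: dict mapping project name -> behavior weight (assumed, e.g.,
               a larger weight for a fork than for a star)
    alpha:     assumed mixing weight between the two similarity signals
    """
    vectors = {name: Counter(tokens) for name, tokens in projects.items()}

    # Profile built from the features of the developer's own projects.
    profile = Counter()
    for name in created:
        profile.update(vectors[name])

    total_weight = sum(behaviors.values())
    scores = {}
    for name, vec in vectors.items():
        if name in created or name in behaviors:
            continue  # only recommend projects the developer has not touched
        feature_sim = cosine(profile, vec)
        # Weighted average similarity to projects the developer interacted with.
        behavior_sim = (
            sum(w * cosine(vectors[p], vec) for p, w in behaviors.items())
            / total_weight
            if total_weight
            else 0.0
        )
        scores[name] = alpha * feature_sim + (1 - alpha) * behavior_sim

    # Sort by descending score, breaking ties by name for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]


if __name__ == "__main__":
    toy_projects = {
        "my-web-app": ["web", "http", "server"],
        "fast-http": ["http", "server", "async"],
        "web-router": ["web", "router"],
        "ml-toolkit": ["ml", "tensor", "training"],
        "tensor-lib": ["tensor", "ml", "gpu"],
    }
    for name, score in recommend(
        toy_projects, {"my-web-app"}, {"ml-toolkit": 1.0}, n=2, alpha=0.6
    ):
        print(f"{name}: {score:.3f}")
```

Each candidate is scored independently of the others, which is why this kind of computation lends itself to a data-parallel framework such as Apache Spark; the sketch itself makes no use of Spark.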

Proceedings of the 9th Asia-Pacific Symposium on Internetware