Macro-level software evolution: a case study of a large software compilation

  • Jesus M. Gonzalez-Barahona, Universidad Rey Juan Carlos
  • Gregorio Robles, Universidad Rey Juan Carlos
  • Martin Michlmayr, Open Source Program Office, HP
  • Juan Jose Amor, Universidad Rey Juan Carlos
  • Daniel M. German, University of Victoria


Software evolution studies have traditionally focused on individual products. In this study we scale up the idea of software evolution by considering software compilations composed of a large number of independently developed products, engineered to work together. With the success of libre (free, open source) software, these compilations have become common in the form of 'software distributions', which group hundreds or thousands of software applications and libraries into an integrated system. We have performed an exploratory case study on one of them, Debian GNU/Linux, finding some significant results. First, Debian has been doubling in size every 2 years, totalling about 300 million lines of code as of 2007. Second, the mean size of packages has remained stable over time. Third, the number of dependencies between packages has been growing quickly. Finally, while C is still by far the most commonly used programming language for applications, use of the C++, Java, and Python languages has significantly increased. The study not only helps to understand the evolution of Debian, but also yields insights into the evolution of mature libre software systems in general.



Gonzalez-Barahona, J. M., Robles, G., Michlmayr, M., Amor, J. J., German, D. M. (2009). Macro-level software evolution: a case study of a large software compilation. Empirical Software Engineering, 14(3), 262–285.