Two Engineers Explain How the EdX Code Is Evolving

More than 90% of edX’s code is open source and the rest is proprietary, Ned Batchelder, a key edX engineer, explained in an interview for the PyCharm Blog.

“Over 90% of what edX develops is open source. Most of the proprietary code concerns how edX markets and lists its courses, and Open edX includes an open source version used by our community members,” he elaborated.

edX maintains 206 repositories on GitHub, including Django packages and other libraries. The edX core team comprises 80 engineers in Cambridge, Massachusetts, plus another 70 people when counting contractors (most of them from Arbisoft, in Pakistan) and regular contributors from the community.

LANGUAGES AND TECHNOLOGIES

  • “Python is used for the vast majority of code for Open edX. We use Django for our web applications, including the Django REST framework.
  • On the front-end, we have a lot of legacy Backbone.js and Underscore.js, but are slowly moving more and more to React. We also use Sass and Bootstrap.
  • EdX.org is hosted on AWS. Some example technologies we host there include Memcached, ElasticSearch, MySQL, and Mongo. We use a mix of CloudFlare and CloudFront for CDN.
  • For development, Continuous Integration, and deployments we use a mix of Docker, GitHub, Jenkins, GoCD, Asgard, and Terraform, among others.
  • In addition to using Python with Django, we also use Python for various scripts, linters, testing frameworks, and data analysis. My colleague Cale tells me he used a combination of ipython notebooks, pandas, and ggplot for his analysis work.
  • Finally, like any undertaking this large, we’ve got our special snowflakes like our Ruby-based discussion service that no one wants to work on except to rewrite it in Python, which still hasn’t happened.”


MOVING TO PYTHON 3

More than 95% of the entire codebase of the edX platform is in Python. One of the reasons is that edX started as an MIT project, and MIT’s teaching language is Python.

“We are still using Python 2. We’ve been making some advances toward Python 3 where and when we can. As part of our Django upgrade process, we recently introduced tox in most of our repos to test against these various combinations. We’ll likely be switching to Python 3 with the rest of the Django community as they drop support for Python 2,” said Ned Batchelder.
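As a rough illustration of what testing "against these various combinations" means, the sketch below enumerates a small interpreter/framework matrix and names each entry in the hyphenated style tox uses for its environments. The version lists and the `build_matrix` helper are hypothetical and chosen only for illustration; the interview does not say which Python or Django versions edX tests.

```python
from itertools import product

# Hypothetical version lists, for illustration only; the interview does not
# state which Python or Django versions edX actually tests against.
PYTHONS = ["py27", "py35"]
DJANGOS = ["django18", "django111"]


def build_matrix(pythons, djangos):
    """Return one environment name per interpreter/framework combination,
    joined with a hyphen in the style of tox environment names."""
    return ["{}-{}".format(py, dj) for py, dj in product(pythons, djangos)]


if __name__ == "__main__":
    # Print each combination that a test run would need to cover.
    for env in build_matrix(PYTHONS, DJANGOS):
        print(env)
```

Every added interpreter or framework version multiplies the number of combinations, which is why a tool that automates the matrix helps during a staged migration.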

LEGACY SOFTWARE

“One of the main challenges is keeping track of such a large codebase. We are trying to introduce more and more best practices as we can, and move more and more of the codebase in the right direction, but we have a lot of legacy to work with at this point. Like many development efforts, we have a big monolith as one of the many components of our architecture, and we are trying to work towards an architecture that is split enough, but not too much. It is a balancing act that is difficult to get right,” explained Robert Raposa.

PYCHARM

Regarding PyCharm, Robert Raposa said:

“We probably have about 40 developers using PyCharm. Many of the other developers end up using some combination of technologies like sublime, vim, and pdb.”

“Many people choose PyCharm for its debugging capabilities, as well as having an editor that understands Python. When you watch someone debug in a modern IDE, it is hard not to want to be able to do the same. For PyCharm users, we often use debugging, refactoring, autocompletion, version control, find definition or class, and PyCharm has great support for these technologies.”

DEVELOPMENT ENVIRONMENT

“We’ve migrated our development environment from Vagrant to Docker. It was in tandem with PyCharm adding more and more Docker support. There have been some hiccups on this front, but it is nice to still be able to debug,” explained Mr. Raposa.