Lately I've consulted for several clients where I was asked to drop into
an existing large Django project code base to triage issues and build
enhancements. Working on an established code base is common, whether you're
a new developer on a team or, like me, working on consulting projects.
There are many attributes of an existing project: the code (and
documentation), infrastructure, deployment procedures, and third-party
services. While reviewing each one of these areas I get either a good feeling
I'll be able to accomplish my work quickly, or a terrible feeling that
the pre-work to triage issues will dwarf the scope of the work requested
by the client.
Here is the checklist I now use (and add to when I uncover something new)
after having gone through this process of evaluating established projects
many times.
Code base
The code repository is where most developers will dive in and try to
understand the project. A README file that screams "start here!" is a must.
README
- A brief (one paragraph or bullet points) explanation of what the project
is, its purpose, and goals to provide context for developers
- Important links to development wikis, testing, staging, and production
servers
- Contact information for anyone critical to the project
- "Getting Started" section that bootstraps developers from a blank
slate to a functional development environment. Include this even if you're
using Vagrant (to show a new developer how to use Vagrant)
- Instructions for special software required for a fully functional site.
For example, how to set up Solr and build the search indexes, or how to
locally replicate the database from a test or production environment
- Basic deployment information and/or a link to detailed deployment
instructions (even if this is just how to use the included Fabric or
Capistrano scripts)
Project Structure
- Is this the flat project structure used by Django 1.3 and earlier?
- Is this a standard Django 1.4+ project structure with urls.py, wsgi.py,
and settings.py in a subdirectory under the project's root directory?
- Does this project use a custom layout with a mangled manage.py?
For example, do the individual app directories live in the project's root
directory or are they found in a subdirectory such as "apps"?
Dependencies
- Is the requirements.txt located in the project's root folder or under
a subdirectory such as env?
- Does the requirements.txt file itself point to another text file such
as production.txt located in a subdirectory?
- Are all the requirements pegged to specific version numbers? If not,
how fast can you run away from this project?
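The pinning question above can be answered mechanically. Here is a minimal
sketch, assuming a pip-style requirements file; the function name and the
sample contents are made up for illustration. It treats only exact "=="
specifiers as pinned and skips option lines such as "-r production.txt",
which point to other files you would need to check separately.

```python
import re

# A line counts as "pinned" only if it uses an exact == version specifier.
PINNED = re.compile(r"^[A-Za-z0-9_.\-\[\]]+\s*==\s*[^\s;#]+")


def unpinned_requirements(text):
    """Return requirement lines that are not pinned to an exact version."""
    loose = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and options (-r, -e)
            continue
        if not PINNED.match(line):
            loose.append(line)
    return loose


# Hypothetical requirements.txt contents, for illustration only.
sample = """\
Django==1.5.1
South>=0.7
requests
-r production.txt
psycopg2==2.5  # database driver
"""
print(unpinned_requirements(sample))  # ['South>=0.7', 'requests']
```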
Configuration
- Is the settings.py file located in the root directory or under the
project name's subdirectory as standard in Django 1.4+?
- Is there a local_settings.py template file?
- Are environment variables required? What's the proper way to set those
variables?
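When environment variables are required, I like to see them read in one
obvious place. The following is only a sketch of what that could look like
in a settings file; the env() helper and the DJANGO_* variable names are
my own assumptions, not a Django convention.

```python
import os


def env(name, default=None):
    """Read a setting from the environment, failing loudly if it is missing."""
    value = os.environ.get(name, default)
    if value is None:
        raise RuntimeError("Missing required environment variable: %s" % name)
    return value


# Hypothetical variable names; the defaults here are for local development only.
SECRET_KEY = env("DJANGO_SECRET_KEY", "dev-only-insecure-key")
DEBUG = env("DJANGO_DEBUG", "false").lower() == "true"

# A setting with no default raises RuntimeError at startup when it is unset:
# DATABASE_PASSWORD = env("DATABASE_PASSWORD")
```

Failing at startup with the variable's name in the message tells a new
developer exactly what to set, instead of leaving them to trace a None
value through the code.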
Data
- Which database is the project tested to run with? MySQL, PostgreSQL?
- Are there custom SQL scripts that can only be used with a specific
database? For example, if the project is designed to run on MySQL, are
there primary keys declared with AUTO_INCREMENT that would need to be
translated for PostgreSQL?
- Does this project have test data? Is the test data per app, or loaded
for the entire project?
- How is test data loaded? Data generation scripts, fixtures, database
dump SQL scripts, or a database replication scheme?
- If by fixtures or unsure, use "grep -r --include='*.py' fixtures ." and
"find . -name '*.json'" to find one or more JSON fixture files.
Tests
- Are there any tests? If so, do the tests.py files contain anything more
than the boilerplate 1+1=2 example test function?
- Is there a custom test runner? Is django-nose or django-jenkins used to
run the tests?
- Are tests grouped into a single tests.py file or are they under a tests/
subdirectory under each app?
- Are there coverage reports generated for the existing code?
Infrastructure
Infrastructure is the actual physical or virtualized hardware, or
platform-as-a-service, that the project is deployed to. Production, staging,
testing, and development servers not on an individual developer's machine
are included in this category.
Note that there are certain things I'm leaving out here. I firmly believe
there is little to no value in "enterprise architecture" type diagrams that
are not generated dynamically. The number of hours I've watched
ivory tower architects waste on drawing UML diagrams makes my blood boil.
Environments
- Is there a master "environments document" that is autogenerated with
information on the infrastructure?
- What environment is the project designed to run in?
- Linux? What distribution and version? What packages need to be installed?
- Is this project meant to run on a platform-as-a-service? Which one? Heroku?
- Is a content delivery network necessary for static and media files?
How are the files synced when static files change during deployment?
- Are there separate servers for the database and web? Are there separate
caching servers? (Obviously this question can scale to many other questions
depending on the infrastructure size.)
- What are the IP addresses of existing servers?
- Are the testing, staging, and production environments in sync? How do you
know if they go out of sync?
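One cheap way to check whether environments are in sync, at least for Python
packages, is to compare the output of "pip freeze" from each server. Below
is a minimal sketch, assuming you have captured that output as text; the
function names and the sample package lists are invented for illustration.

```python
def parse_freeze(text):
    """Map lowercased package name to version from 'pip freeze' style output."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if "==" in line and not line.startswith("#"):
            name, version = line.split("==", 1)
            packages[name.lower()] = version
    return packages


def freeze_diff(a, b):
    """Return {package: (version_in_a, version_in_b)} for every mismatch."""
    pa, pb = parse_freeze(a), parse_freeze(b)
    return {
        name: (pa.get(name), pb.get(name))
        for name in sorted(set(pa) | set(pb))
        if pa.get(name) != pb.get(name)
    }


# Hypothetical captured output from two servers.
staging = "Django==1.5.1\nSouth==0.7.6\n"
production = "Django==1.4.5\nSouth==0.7.6\ngunicorn==0.17.4\n"
print(freeze_diff(staging, production))
# {'django': ('1.5.1', '1.4.5'), 'gunicorn': (None, '0.17.4')}
```

This only covers Python dependencies. System packages, configuration files,
and database schemas can drift too and need their own checks.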
Access Rights
- Are the environments accessible with a username and password?
- Is a public SSH key in the authorized_keys file needed to log in?
Who already has access to add a new key, or is there a general key?
- Is root access to a box by SSH denied (as it should be)?
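The root access question can be roughly checked by reading the server's
sshd_config. This sketch is an approximation under stated assumptions: it
looks only at top-level PermitRootLogin lines and ignores Match blocks and
included files, and the function name is my own. When the option is not set
at all, the default depends on the OpenSSH version, so treat None as "go
look it up".

```python
def root_login_setting(sshd_config_text):
    """Return the first PermitRootLogin value found, or None if it is not set."""
    for line in sshd_config_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):  # skip blanks and comments
            continue
        parts = stripped.split()
        if len(parts) >= 2 and parts[0].lower() == "permitrootlogin":
            return parts[1].lower()
    return None


# Hypothetical sshd_config contents, for illustration only.
sample_config = """\
# Example configuration
Port 22
#PermitRootLogin yes
PermitRootLogin no
"""
print(root_login_setting(sample_config))  # no
```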
Deployment
- Generally speaking, how is a deployment done?
- Are Fabric, Capistrano (railsless-deploy), or shell scripts used?
- Is Ansible, Salt Stack, Puppet, or Chef used?
- What are the purposes of various users? For example, if there is a
deployment user as well as another general purpose user, which one
should be used for debugging?
Third-party services
Most Django projects combine custom apps with third-party services,
such as Twilio, Stripe, New Relic, and Intercom.io, to create a complete
product. Which ones are used in the project, do they fail gracefully when
their APIs are down or inaccessible, and who has admin access rights to
the services?
- What third-party services are used with this project?
- How are the third-party services tested locally?
- What are the usernames and passwords for the services?
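On the question of failing gracefully, it helps to know what that could look
like in code when you go searching for it. Here is one possible shape, not
a prescribed pattern: a small wrapper that logs the failure and returns a
fallback value. The function names are hypothetical, and flaky_sms_send
stands in for a real client call.

```python
import logging

logger = logging.getLogger(__name__)


def call_with_fallback(func, fallback=None, exceptions=(Exception,)):
    """Call func(); on failure, log the error and return fallback instead."""
    try:
        return func()
    except exceptions:
        logger.exception("Third-party call failed; using fallback")
        return fallback


def flaky_sms_send():
    # Hypothetical stand-in for a real service client call.
    raise IOError("service unreachable")


# The traceback is logged; the caller just sees the fallback value.
print(call_with_fallback(flaky_sms_send, fallback=False))  # False
```

If calls to a service are made bare in views, with no timeout and no error
handling, expect the site to go down whenever that service does.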