Steven E. Newton
Crater Moon Development
A manufactured product, say a chair or a pitcher, is fixed in its form when delivered for sale. Although the maker may, over time, make improvements to the design of the manufactured object, the existing objects of the type are rarely retrofitted with the new features. It is simply too little gain for the expense.
Software, by contrast, is not fixed when delivered. There are certainly costs associated with building, shipping, and installing a patch or upgrade, but once that is done the user is, for all practical purposes, working with the same instance of the software as before. Whereas the changes possible for a manufactured item are mostly cosmetic, such as painting it a different color, changes made to software can be radical rebuildings. The function and use remain essentially the same, but with the changes incorporated: improvements, we hope.
Because most programming consists of reading and modifying existing code, programmers need ways to understand that code. Programmers may spend 80% of their time working with existing code. Perhaps surprisingly, studies suggest only about 20% of their time is spent fixing bugs. The majority of maintenance time is spent adding or modifying functionality.
Code Reading: The Open Source Perspective by Diomidis Spinellis
Be able to build and run the program. You’ll eventually need to do it anyway, so start there.
Use a code comprehension tool: your IDE’s browser, CScope, or anything that gives a visual, high-level overview of the code. Sometimes a simple count of the source files and of the lines in each file will reveal where the interesting and complex parts of the program are. If it’s an object-oriented program and you have a UML tool that can reverse engineer the language, it may be useful to generate UML diagrams.
Reformat and clean up the code according to a single coding style. Fix comments. Rename variables, methods, and anything else with a confusing or unclear name. Preferably use a good refactoring tool to ensure the changes are correct. For some languages and code bases, however, a simple search and replace is all that is available, so be careful.
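The file and line counts mentioned above can be gathered with a few lines of script. What follows is only a minimal sketch in Python: the function name `line_counts` and the default list of extensions are illustrative assumptions, not part of any particular tool.

```python
from pathlib import Path


def line_counts(root, extensions=(".py", ".c", ".h", ".java")):
    """Return (path, line_count) pairs for source files under root, largest first."""
    counts = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in extensions:
            # errors="replace" keeps odd encodings from aborting the scan.
            with path.open(errors="replace") as handle:
                counts.append((path, sum(1 for _ in handle)))
    return sorted(counts, key=lambda pair: pair[1], reverse=True)


# Example use: show the ten largest files under the current directory.
# for path, count in line_counts(".")[:10]:
#     print(f"{count:8d}  {path}")
```

The largest files are not always the most important ones, but they are often a reasonable place to start looking.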
Run the unit tests. No tests? Write some before you make changes.
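When no tests exist, one common starting point is a test that simply records what the code does today, before any change is made. The sketch below assumes a made-up function, `legacy_shipping_cost`, standing in for real legacy code; the expected values were obtained by running the code, not from any specification.

```python
import unittest


def legacy_shipping_cost(weight_kg, express=False):
    """Hypothetical stand-in for an undocumented legacy function."""
    cost = 5 + weight_kg * 2
    if express:
        cost = cost * 1.5
    if weight_kg > 20:
        cost += 10
    return cost


class ShippingCostCharacterization(unittest.TestCase):
    """Pins down what the code does today, not what it ought to do."""

    def test_light_parcel(self):
        self.assertEqual(legacy_shipping_cost(1), 7)

    def test_express_parcel(self):
        self.assertEqual(legacy_shipping_cost(2, express=True), 13.5)

    def test_heavy_surcharge_is_added_after_express_multiplier(self):
        # Possibly a bug, but recorded as-is so any later change is deliberate.
        self.assertEqual(legacy_shipping_cost(30, express=True), 107.5)
```

Saved in a file, tests like these can be run with `python -m unittest`. If a later modification makes one fail, the change in behavior is at least a visible decision and not an accident.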
Add or extend any logging or instrumentation available. Adding “test leads” to the running production code may be the best investment of effort. The JMX APIs are a good example of how to do this.
Grok the dependencies: libraries, filesystem layout, app containers. Whatever the program depends on having in its environment is probably important. Recently, in trying to build a small library of C code using sockets and FIFOs, I discovered a set of assumptions tied to GNU/Linux conventions, and it was a struggle to make it build on OS X, even though both are ostensibly Unix and the program was not otherwise platform-dependent.
Find the entry point. There may be a “main” method, or, if it’s an extension or library, there will be some protocol or interface the code adheres to that provides the primary entry point.
For libraries and components without a main, start by understanding how they implement the required protocol, interface, or API. A simple example might be a Java servlet. A servlet is defined by its implementation of methods like doGet, doPut, and so forth. Those methods would be the first candidates for examination.
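For library dependencies specifically, a first inventory can sometimes be produced mechanically. As an illustration only, and assuming the code base is Python rather than C, the standard `ast` module can list the modules a source file imports. The helper name `imported_modules` is made up for this sketch.

```python
import ast


def imported_modules(source):
    """Return the sorted top-level module names imported by the given source text."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 skips relative imports, which stay inside the project.
            modules.add(node.module.split(".")[0])
    return sorted(modules)


example = "import os.path\nimport socket, sys\nfrom collections import OrderedDict\n"
print(imported_modules(example))
```

This only finds static imports. Dependencies on the filesystem layout, environment variables, or platform conventions, like the ones in the C example above, still have to be found by reading and building.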
Questions to ask:
- Who calls this method? This gives clues to the role(s) it plays.
- Who implements this interface or subclasses this class? This gives clues to the variability in the system.
- What are this class’s superclasses? Look not only at the immediate parent, but up the hierarchy. This gives clues to the responsibilities of the class and its hierarchy.
- Where are instances of this class created, held, passed as arguments, or returned? This helps in understanding the dynamics of the system.
- Where might methods of this class be called polymorphically, through a superclass or interface? This aids in understanding dependencies and variability.
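An IDE usually answers these questions best, but some of them can also be asked of a running program. The sketch below uses Python introspection on a small, made-up hierarchy (`Shape`, `Circle`, `Square`, `UnitSquare`). The helper names are illustrative, and `__subclasses__()` only sees classes that have already been loaded.

```python
import inspect


class Shape:  # hypothetical example hierarchy
    def area(self):
        raise NotImplementedError


class Circle(Shape):
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return 3.14159 * self.radius ** 2


class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2


class UnitSquare(Square):
    def __init__(self):
        super().__init__(1)


def superclasses(cls):
    """All ancestors of cls, nearest first, excluding cls itself and object."""
    return [c for c in inspect.getmro(cls) if c not in (cls, object)]


def all_subclasses(cls):
    """Every currently loaded direct or indirect subclass of cls."""
    found = []
    for sub in cls.__subclasses__():
        found.append(sub)
        found.extend(all_subclasses(sub))
    return found


def overriders(cls, method_name):
    """Subclasses that define their own method_name: possible polymorphic targets."""
    return [sub for sub in all_subclasses(cls) if method_name in vars(sub)]
```

Here `superclasses(UnitSquare)` walks up the hierarchy, while `overriders(Shape, "area")` lists the classes a call to `area` through a `Shape` reference might actually reach.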
Things to instrument:
- Entry/exit to key sections of code.
- Assert assumptions.
- Changes to important variables.
- State changes.
- “Impossible” errors (especially empty exception catch blocks).
- Resource allocation and release.
- Security challenges, successes, and failures.
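A few of the items above (entry and exit, asserted assumptions, and errors that would otherwise vanish) can be sketched with Python's standard `logging` module. This is an illustration under assumptions, not a recommended framework: the `traced` decorator, the logger name, and the `withdraw` function are all invented for the example.

```python
import functools
import logging

logger = logging.getLogger("instrumentation")  # logger name is an arbitrary choice


def traced(func):
    """Log entry, exit, and any exception raised by the wrapped function."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.debug("enter %s args=%r kwargs=%r", func.__name__, args, kwargs)
        try:
            result = func(*args, **kwargs)
        except Exception:
            # Record the failure with its traceback, then let it propagate.
            logger.exception("error in %s", func.__name__)
            raise
        logger.debug("exit %s result=%r", func.__name__, result)
        return result
    return wrapper


@traced
def withdraw(balance, amount):
    """Hypothetical function standing in for a key section of legacy code."""
    assert amount >= 0, "amount is assumed to be non-negative"
    if amount > balance:
        raise ValueError("insufficient funds")
    return balance - amount
```

With `logging.basicConfig(level=logging.DEBUG)` in effect, each call to `withdraw` leaves an enter and exit line, and a failed call leaves a traceback in the log. Note that Python drops `assert` statements when run with `-O`, so assertions used this way are a diagnostic aid and not a safeguard.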
Automate building a set of documents from the source. Doxygen, for example, works with many languages. A tool like LXR will index and cross-reference the code.
General Cleanup Steps
Read through and reformat as you go. Fix to conform to coding standards. Fix comments and spacing. Remember to check in the reformatting before doing real work, to keep change log comments and diffs separate and clear.
Working Effectively with Legacy Code by Michael Feathers
Refactoring: Improving the Design of Existing Code by Martin Fowler