Chapter 8


The Essence of Object-Oriented Programming with Java and UML


Refactoring
 
Up to this point, we've focused on using object orientation to design and develop new programs. Understanding objects and using object-oriented techniques is one of the best ways to develop programs that are easy to understand, easy to write, and, perhaps most importantly, easy to modify and maintain.

The fact is, most programming involves maintaining and modifying existing code. If the existing code is not written in an object-oriented language, then using object orientation probably won't help much with changing that code. If the existing code is written in an object-oriented language, then there is hope. Unfortunately, as we've noted before, just because a program uses Java or C++ doesn't mean it uses OO techniques. And even if the program started out with a decent OO design, if it has been maintained and modified over time, it is likely that it has lost some of its initial elegance.

So, what are you going to do? One of the most recent object-oriented techniques to be formalized and developed into an essential programming tool is called refactoring. While programmers have always spent at least some time cleaning up their code, refactoring takes this a step further. Refactoring is a specific, disciplined approach to improving the design of existing code. With refactoring, the overall design and structure of an existing program is improved, while its observable functionality remains unchanged. Once its design has been improved, the program will be easier to maintain.

Software maintenance usually has one of two goals: fixing bugs or adding features. While the goal of refactoring is neither of these, it can make both much easier. At first it may seem a waste of time to refactor code without adding any new functionality, but modifying a poorly constructed program can take far more time and effort than first refactoring it and then modifying it. Refactoring can also reveal flaws in the structure of the existing code that are the underlying causes of bugs or incorrect behavior.

Ward Cunningham and Kent Beck were two of the first software experts to recognize the importance of refactoring and to help develop it into a formal technique. The principal refactoring resource is Refactoring: Improving the Design of Existing Code by Martin Fowler. While refactoring can be useful for almost any kind of object-oriented programming, it is an essential part of the eXtreme Programming (XP) methodology (see Chapter 9).

What is Refactoring?

Refactoring should be considered a basic principle of programming, and does not require any special methodology. Refactoring can improve the design of existing code. Over time, as code is changed, it tends to deteriorate. Changes are often made on the fly, under time pressures, without regard for the overall structure of the code. This can lead to code entropy. Refactoring can help undo code entropy.

The Basic Refactoring Process

While the basic refactoring process is not that complicated, experience with programming helps (especially object-oriented programming). The main goal of refactoring is to improve the overall design and structure of an existing program without changing its observable behavior. This means that you don't refactor and add functionality at the same time. Once the refactoring has been done, it will be easier to add functionality.

The first step of refactoring is to understand the existing code, usually by reading it. As part of this process, you will almost certainly find problems with the code that can be improved. But refactoring isn't just about finding problems and making small improvements to code; it is a well-defined and structured technique for improving the code. Each individual refactoring may make only a small difference, but the cumulative effect of applying many refactorings can greatly improve the overall quality, readability, and design of the code.

Refactorings

The developers of refactoring have compiled a catalog of known refactorings, each with a description of how it improves code. Many of these refactorings are listed and described in Martin Fowler's Refactoring book, and more can be found on the refactoring web site at www.refactoring.com. As you gain experience with refactoring, the specific refactorings will become familiar, and you will start to recognize code that can benefit from them.

Reduce Risk of Change

Any change to code is risky. You can introduce new bugs. You can change the behavior of the program. You can break things. By using disciplined refactoring, you can reduce the risk of making changes. This is one of the major differences between a simple code cleanup and the formal refactoring process. By carefully following the refactoring process, you reduce the risk of changing the code while improving its design and making it easier to change in the future.

Don't Change Functionality

One of the first rules of refactoring is that you do not change the functionality of the existing code. By functionality in this context, we mean the externally observable behavior of the code. The program should behave exactly the same before and after the refactoring. If the behavior changes, you can no longer be sure that the refactoring hasn't broken other things as well. If you need to change the behavior, do that as a separate step: improve the code with refactoring, then make the change.

One Thing at a Time

In order to be sure that you don't change behavior, it is important to apply only one refactoring at a time. While you are going over the code, you will often find several things that can benefit from refactoring. But to reduce the risk of making changes, refactoring requires that you make only one change at a time.

Test Each Step

Perhaps the single most important principle of refactoring is to test the program thoroughly after each refactoring. Testing is how you reduce the risk of change: it checks that you haven't changed functionality or broken other parts of the program. Besides the identification of a large set of known refactorings, the combination of not changing functionality, making only one change at a time, and testing after each step is one of the most important contributions of refactoring.
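The test-after-each-step discipline can be sketched in plain Java, assuming no test framework is available: run the code from before and after a single refactoring on the same inputs and report any difference. The pricing rule and the `BehaviorCheck` class are hypothetical; in practice, a project would more likely keep a suite of automated tests (JUnit, for example) and rerun it after every refactoring.

```java
import java.util.function.IntBinaryOperator;

// Hypothetical example: one pricing rule, before and after a single refactoring.
class BehaviorCheck {
    // Original version: one dense expression (amounts are integers, e.g. cents).
    static int priceBefore(int quantity, int unitPrice) {
        return quantity * unitPrice - Math.max(0, quantity - 500) * unitPrice / 20;
    }

    // Refactored version: the same rule, with the discount given a name.
    static int priceAfter(int quantity, int unitPrice) {
        int basePrice = quantity * unitPrice;
        return basePrice - bulkDiscount(quantity, unitPrice);
    }

    static int bulkDiscount(int quantity, int unitPrice) {
        return Math.max(0, quantity - 500) * unitPrice / 20;
    }

    // Runs both versions over a grid of inputs; any difference means the
    // refactoring changed observable behavior and must be fixed or undone.
    static boolean sameBehavior(IntBinaryOperator before, IntBinaryOperator after) {
        for (int quantity = 0; quantity <= 1000; quantity += 7) {
            for (int unitPrice = 0; unitPrice <= 50; unitPrice++) {
                if (before.applyAsInt(quantity, unitPrice) != after.applyAsInt(quantity, unitPrice)) {
                    System.out.println("Mismatch at quantity=" + quantity + ", unitPrice=" + unitPrice);
                    return false;
                }
            }
        }
        return true;
    }
}
```

Here `BehaviorCheck.sameBehavior(BehaviorCheck::priceBefore, BehaviorCheck::priceAfter)` should return `true`, because the refactored version only gives the discount calculation a name. A check like this can't prove that two versions agree on every possible input; it only raises confidence, which is one reason each refactoring is kept small.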

Summary

The following list summarizes the refactoring process:

1. Review code to identify refactorings.

2. Apply only one refactoring at a time without changing functionality.

3. Test the refactoring.

4. Repeat to find more refactorings.

When Do You Refactor?

To use refactoring effectively, it is important to know just when to refactor. You don't always need to refactor working code. There are some guidelines for deciding when to refactor.

First, when you are going to add some functionality to a program, you should be prepared to refactor. As we've noted before, one important benefit of object-oriented program design is that the code is easier to maintain and modify. So, when it comes time to add new functionality to a program, it is important that code be as well-designed as it can be. And this is precisely the goal of refactoring. Refactoring should be applied to the code until its design has improved enough to make it easy to modify. Then, after the refactoring, new functionality will be much easier to add.

Refactoring is also useful when you need to find bugs. Part of finding a bug means understanding a program. The fact that a program has bugs often means that the code isn't clear enough to spot them in the first place. Refactoring while you are going over code to hunt for bugs will improve the quality of the code and even reduce the number of bugs.

One important part of almost every project is the code review. In code reviews with just two or three programmers (probably the most productive kind), refactorings can be suggested and applied as the group goes over the code and gains a better understanding of its design. In fact, the pair programming of eXtreme Programming can be considered a continuous code review by two programmers, and refactoring is an important part of XP.

Code Smells

Kent Beck and Martin Fowler have also developed a list of what they call "code smells" to help determine when to refactor. These are signs in existing code that refactoring is in order. Here's a brief list of some of the smells they've identified:

Duplicate Code - the same code in more than one place usually means you need to extract it into a method.
Long Method - a long method is hard to understand; extract smaller methods from it.
Large Class - a class that does too much needs to be split.
Long Parameter List - makes it hard to read, consider passing objects.
Divergent Change - one class is changed in different ways for different reasons, which suggests it should be split.
Shotgun Surgery - a single change requires many small edits scattered across many different classes.
Feature Envy - a method is more interested in the details of another class than in those of its own class.
Data Clumps - data that is used together everywhere should be in a class of its own.
Primitive Obsession - a program can use too many primitive data types that should really be part of a class.
Switch Statements - switch statements can mean you are not using polymorphism effectively.
Parallel Inheritance Hierarchies - repeating class definitions in parallel classes is more duplication to eliminate.
Lazy Class - a class should do enough to pay its own way or be eliminated.
Speculative Generality - designing for future flexibility before it is needed can increase complexity unnecessarily.
Message Chain - too many messages in a chain are hard to follow.
Middle Man - sometimes it is better to work with an object directly.
Inappropriate Intimacy - classes shouldn't need to know too much about each other.
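Incomplete Library Class - a library class doesn't provide everything you need, so you must add the missing behavior yourself.
Data Class - a class that only holds data, with nothing but fields and getting and setting methods, needs some real behavior of its own.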
Refused Bequest - subclasses should use most of what their parents give them.
Comments - could a comment be eliminated by providing a better name for a method or variable?
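Smells are easier to recognize once you have seen one in code. Here is a sketch of the Switch Statements smell next to the polymorphic design it points toward (Fowler catalogs this move as Replace Conditional with Polymorphism). The `ShapeCodes`, `Shape`, `Circle`, and `Square` classes are hypothetical.

```java
// Hypothetical example of the Switch Statements smell.
class ShapeCodes {
    static final int CIRCLE = 0;
    static final int SQUARE = 1;

    // Smelly version: adding a new kind of shape means finding and editing
    // this switch, and every other switch on the same type code.
    static double area(int kind, double size) {
        switch (kind) {
            case CIRCLE:
                return Math.PI * size * size;
            case SQUARE:
                return size * size;
            default:
                throw new IllegalArgumentException("unknown kind: " + kind);
        }
    }
}

// Polymorphic version: each subclass supplies its own area(), so adding a
// new shape means adding a class rather than editing existing conditionals.
abstract class Shape {
    abstract double area();
}

class Circle extends Shape {
    private final double radius;

    Circle(double radius) { this.radius = radius; }

    @Override
    double area() { return Math.PI * radius * radius; }
}

class Square extends Shape {
    private final double side;

    Square(double side) { this.side = side; }

    @Override
    double area() { return side * side; }
}
```

Both designs compute the same areas; the difference is where the type-specific code lives and how much existing code must change when a new shape is added.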

When not to refactor

Knowing when not to refactor is also important. One of the main reasons not to refactor is that the code is so bad that it needs to be rewritten from scratch. Eventually, code can become so outdated, so difficult to understand, or so buggy that it would be more cost-effective to start over than to try to fix it or add new features to it.

This decision can also apply to code that is written in a non-object-oriented language. Most refactorings apply to object-oriented languages. Obviously, these refactorings will have limited use for non-OO languages. It may be time to rewrite the program using an OO language.

Some Refactorings

The identification of specific refactorings gives you a catalog of things to look for in existing code. Each refactoring has been given a name, much like the design patterns we discussed in Chapter 7. There are many more refactorings than there are design patterns, and most refactorings are much simpler and easier to understand.

In this section, we will go over a few refactorings to give you the flavor of just what they are. It is not the goal here to make you into a master of refactoring, but to give you an idea of some of the specifics. Just as design patterns belong in every good programmer's toolbox, refactoring has its place there, too.

Refactoring Categories

Over 70 specific refactorings have been identified in the Refactoring book, and many more are identified on the refactoring web site, with more being added all the time. The refactorings have been organized into the following categories. Specific individual refactorings are shown in italics. This summary mentions only a small fraction of the total number of refactorings.

Composing Methods

One common problem is code with methods that are too long. The Composing Methods group of refactorings is intended to reduce the size of methods and improve the readability of the code by replacing sequences of code with calls to new methods built from that code. Refactorings in this category include Extract Method, Inline Method, and Replace Temp with Query.

Moving Features Between Objects

During object design, one important decision is where to place various responsibilities. Sometimes, a given responsibility will be placed in the wrong class. Some classes will end up with too many responsibilities. Such refactorings as Move Method, Move Field, and Extract Class can be used to help put responsibilities where they belong.

Organizing Data

Sometimes an object can be used in place of a simple data item. Refactorings such as Replace Data Value with Object or Replace Array with Object can make a class easier to work with, and can also make it clearer what the data item is being used for.
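A minimal sketch of Replace Data Value with Object, assuming a hypothetical `Order` class that originally stored its customer as a plain `String`. The methods of `Order` keep the same signatures, so code that uses `Order` does not have to change; only the internal representation does.

```java
// Hypothetical example of Replace Data Value with Object.
// Before the refactoring, Order stored its customer as a plain String:
//     class Order { private String customer; ... }
// After it, the customer is an object, but Order looks the same to callers.
class Customer {
    private final String name;

    Customer(String name) { this.name = name; }

    String getName() { return name; }

    // Customer-specific data and behavior can be added here later.
}

class Order {
    private Customer customer;

    // Callers still pass a String, so they don't need to change.
    Order(String customerName) { this.customer = new Customer(customerName); }

    String getCustomerName() { return customer.getName(); }

    void setCustomer(String customerName) { this.customer = new Customer(customerName); }
}
```

The payoff comes later: when customers need an address or a credit rating, there is already a class to put it in.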

Simplifying Conditional Expressions

Conditional expressions can be some of the most complicated and confusing parts of any program to understand. Such refactorings as Decompose Conditional or Consolidate Duplicate Conditional Fragments can be used to simplify code.

Making Method Calls Simpler

Defining the interface to a class can be difficult. How the methods are named and how they are called can make a class either confusing or simple to use. Refactorings such as Rename Method, Add Parameter, and others from this category can help improve the interface to a class.

Dealing with Generalization

One of the guidelines we discussed for good object design was to move methods as high up the inheritance hierarchy as possible. Getting methods and subclasses in just the right place is the goal of this category of refactorings. Some of them include Pull Up Method, Push Down Method, Extract Subclass, and Extract Superclass. All are meant to help refine the inheritance hierarchy.

Some Specific Refactorings

In Refactoring, each of the individual refactoring descriptions includes the name of the refactoring, a short description of the problem, a short description of the solution, followed by a more detailed discussion of the motivation for using the refactoring, and a discussion of the mechanics for carrying out the refactoring.

Currently, refactoring is a mostly manual operation. It is up to the programmer to identify specific refactorings, and then actually rewrite the code. As this book is being written, a few refactoring software tools are beginning to appear that can help with some of the mechanical aspects of the different refactorings. The refactoring web site, www.refactoring.com, keeps information about the latest refactoring tools.

The following descriptions of some specific refactorings are neither complete, nor intended to imply that these are the most important refactorings. They were chosen simply to illustrate some typical refactorings from each category.

Extract Method

Extract Method is used when you have a code fragment that can be grouped together. That code is extracted and turned into a method whose name clearly explains its purpose. Short, well-named methods can make code clearer, and a well-named method can eliminate the need for a comment. Sometimes you will even find duplicated code that belongs in a single method.
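A minimal sketch of Extract Method in Java. The `Invoice` class is hypothetical (amounts are in cents), and the original long method is kept next to the refactored one only so the two can be compared; in real code, the old version would simply be replaced.

```java
import java.util.List;

// Hypothetical example of Extract Method (amounts are in cents).
class Invoice {
    private final String customer;
    private final List<Integer> amounts;

    Invoice(String customer, List<Integer> amounts) {
        this.customer = customer;
        this.amounts = amounts;
    }

    // Before: one method builds the banner, computes the total, and formats details.
    String describeOriginal() {
        StringBuilder out = new StringBuilder();
        out.append("*** Customer Owes ***\n");
        int outstanding = 0;
        for (int amount : amounts) {
            outstanding += amount;
        }
        out.append("name: ").append(customer).append("\n");
        out.append("amount: ").append(outstanding).append("\n");
        return out.toString();
    }

    // After: the method reads as a summary of well-named steps.
    String describe() {
        return banner() + details(outstanding());
    }

    private String banner() {
        return "*** Customer Owes ***\n";
    }

    private int outstanding() {
        int total = 0;
        for (int amount : amounts) {
            total += amount;
        }
        return total;
    }

    private String details(int outstanding) {
        return "name: " + customer + "\namount: " + outstanding + "\n";
    }
}
```

Each extracted method does one thing its name describes, and `outstanding()` is now available to any other method that needs the total.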

Replace Temp with Query

Replace Temp with Query is used when you find a temporary variable being used to hold the results of an expression. By extracting the expression into a method, and then replacing all references to the temp with the method call, the meaning can be clearer, and you will be able to reuse the method in other places.
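A sketch of Replace Temp with Query, using a hypothetical `PricedItem` class with prices in cents. Both versions are shown for comparison: the `basePrice` temporary variable becomes a `basePrice()` method that any other method in the class can call.

```java
// Hypothetical example of Replace Temp with Query (prices are in cents).
class PricedItem {
    private final int quantity;
    private final int itemPrice;

    PricedItem(int quantity, int itemPrice) {
        this.quantity = quantity;
        this.itemPrice = itemPrice;
    }

    // Before: a temp holds the result of an expression, visible only here.
    int priceWithTemp() {
        int basePrice = quantity * itemPrice;
        if (basePrice > 100_000) {
            return basePrice * 95 / 100;
        }
        return basePrice * 98 / 100;
    }

    // After: the temp is replaced by calls to a query method.
    int price() {
        if (basePrice() > 100_000) {
            return basePrice() * 95 / 100;
        }
        return basePrice() * 98 / 100;
    }

    // The query can be reused by other methods in the class.
    int basePrice() {
        return quantity * itemPrice;
    }
}
```

Calling `basePrice()` more than once repeats a cheap multiplication; if a query were expensive, that trade-off would deserve a second look.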

Move Method

If you find that a method uses, or is used by, the features of another class more than those of the class where it is defined, Move Method can be used to move the method to the other class. You then either turn the original method into a simple delegation to the new one, or remove it altogether and change its callers to use the new method.
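A sketch of Move Method with hypothetical `Account` and `AccountType` classes (charges in cents). The overdraft calculation depends mostly on the type of account, so it moves to `AccountType`; this version keeps a one-line delegation in `Account` rather than removing the original method immediately.

```java
// Hypothetical example of Move Method (charges are in cents).
class AccountType {
    private final boolean premium;

    AccountType(boolean premium) { this.premium = premium; }

    // Moved here from Account, because it depends mostly on the account type.
    // The one piece of Account data it needs is passed in as a parameter.
    int overdraftCharge(int daysOverdrawn) {
        if (premium) {
            int result = 1_000;
            if (daysOverdrawn > 7) {
                result += (daysOverdrawn - 7) * 85;
            }
            return result;
        }
        return daysOverdrawn * 175;
    }
}

class Account {
    private final AccountType type;
    private final int daysOverdrawn;

    Account(AccountType type, int daysOverdrawn) {
        this.type = type;
        this.daysOverdrawn = daysOverdrawn;
    }

    // The original method is now a simple delegation. Once all callers
    // use AccountType directly, it can be removed.
    int overdraftCharge() {
        return type.overdraftCharge(daysOverdrawn);
    }

    int bankCharge() {
        int result = 450;
        if (daysOverdrawn > 0) {
            result += overdraftCharge();
        }
        return result;
    }
}
```

Adding a new kind of account now means changing `AccountType`, the class that actually knows about account kinds.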

Extract Class

If you find that you have one class that is doing work that should really be done by two, use Extract Class. You will create a new class and move the relevant methods and attributes from the old class into the new one.
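A sketch of Extract Class, assuming a hypothetical `Person` class that originally stored and formatted its own office telephone fields. The telephone data and formatting move into a new `TelephoneNumber` class, and `Person` delegates to it.

```java
// Hypothetical example of Extract Class. Before the refactoring, Person held
// officeAreaCode and officeNumber fields and formatted them itself.
class TelephoneNumber {
    private final String areaCode;
    private final String number;

    TelephoneNumber(String areaCode, String number) {
        this.areaCode = areaCode;
        this.number = number;
    }

    // Formatting logic moved here from Person.
    String format() { return "(" + areaCode + ") " + number; }
}

class Person {
    private final String name;
    private final TelephoneNumber officeTelephone;

    Person(String name, String areaCode, String number) {
        this.name = name;
        this.officeTelephone = new TelephoneNumber(areaCode, number);
    }

    String getName() { return name; }

    // Same result as before; Person now delegates to the extracted class.
    String getTelephoneNumber() { return officeTelephone.format(); }
}
```

Callers of `Person` see no difference, and any other class that needs a telephone number can now reuse `TelephoneNumber`.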

Decompose Conditional

One way to improve complicated conditional statements is to extract the condition itself, and the code that makes up the then and else parts, into methods with meaningful names. This will reduce the complexity, make the statements easier to read, and often result in methods that can be reused.
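A sketch of Decompose Conditional using a hypothetical seasonal billing rule (amounts in cents). The original method is kept beside the decomposed one only for comparison.

```java
// Hypothetical example of Decompose Conditional (amounts are in cents).
class SeasonalRate {
    private static final int SUMMER_START = 6; // June; months are numbered 1-12
    private static final int SUMMER_END = 8;   // August

    private final int winterRate;
    private final int winterServiceCharge;
    private final int summerRate;

    SeasonalRate(int winterRate, int winterServiceCharge, int summerRate) {
        this.winterRate = winterRate;
        this.winterServiceCharge = winterServiceCharge;
        this.summerRate = summerRate;
    }

    // Before: the reader has to decode what the condition and each branch mean.
    int chargeBefore(int month, int quantity) {
        if (month < SUMMER_START || month > SUMMER_END) {
            return quantity * winterRate + winterServiceCharge;
        } else {
            return quantity * summerRate;
        }
    }

    // After: the condition, the then part, and the else part each have a name.
    int charge(int month, int quantity) {
        if (notSummer(month)) {
            return winterCharge(quantity);
        } else {
            return summerCharge(quantity);
        }
    }

    private boolean notSummer(int month) {
        return month < SUMMER_START || month > SUMMER_END;
    }

    private int winterCharge(int quantity) {
        return quantity * winterRate + winterServiceCharge;
    }

    private int summerCharge(int quantity) {
        return quantity * summerRate;
    }
}
```

The decomposed `charge` method now reads almost like the business rule it implements.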

Rename Method

Rename Method is one of the simplest refactorings, yet it can lead to code that is much easier to understand. If the name of a method (or even a variable) does not indicate its purpose, then it should be renamed so that it is meaningful.
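A sketch of Rename Method with a hypothetical `CustomerAccount` class, in which the cryptic name `getRmCr` becomes `getRemainingCredit`. Keeping the old name for a while as a `@Deprecated` method that delegates to the new one is a common way to rename a method that other code already depends on; if you control every caller, you can simply rename it everywhere.

```java
// Hypothetical example of Rename Method (amounts are in cents).
class CustomerAccount {
    private final int creditLimit;
    private final int invoicedSoFar;

    CustomerAccount(int creditLimit, int invoicedSoFar) {
        this.creditLimit = creditLimit;
        this.invoicedSoFar = invoicedSoFar;
    }

    // New name that says what the method computes.
    int getRemainingCredit() {
        return Math.max(0, creditLimit - invoicedSoFar);
    }

    // Old, cryptic name. If code you can't change all at once still calls it,
    // keep it for a while as a delegation to the new method, then delete it.
    @Deprecated
    int getRmCr() {
        return getRemainingCredit();
    }
}
```

The Java compiler warns about remaining uses of the deprecated name, which helps you find the callers that still need updating.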

Pull Up Method

If you find methods in different subclasses that have identical results, then you can use Pull Up Method to move them to the superclass. Eliminating this duplicate behavior will make the code easier to maintain and understand.
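A sketch of Pull Up Method, assuming hypothetical `Salesman` and `Engineer` subclasses that each defined an identical `annualCost()` method. After the refactoring, the method is defined once in the `Employee` superclass.

```java
// Hypothetical example of Pull Up Method (costs are in cents). Before the
// refactoring, Salesman and Engineer each had an identical annualCost() method.
abstract class Employee {
    private final int monthlyCost;

    Employee(int monthlyCost) { this.monthlyCost = monthlyCost; }

    // Pulled up: now defined once and inherited by every subclass.
    int annualCost() { return monthlyCost * 12; }

    abstract String role();
}

class Salesman extends Employee {
    Salesman(int monthlyCost) { super(monthlyCost); }

    @Override
    String role() { return "salesman"; }
}

class Engineer extends Employee {
    Engineer(int monthlyCost) { super(monthlyCost); }

    @Override
    String role() { return "engineer"; }
}
```

Because `annualCost()` uses `monthlyCost`, that field also had to live in the superclass; Fowler's catalog treats that step as a separate refactoring, Pull Up Field.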

Extract Subclass

If a class has methods that are used by only some of the instances of that class, it indicates that those instances should have their own subclass. Extract Subclass is used to extract those features into a new subclass.
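A sketch of Extract Subclass, assuming a hypothetical `JobItem` class in which only labor items used an hourly rate. The labor-specific field and behavior move into a new `LaborItem` subclass, and the `isLabor` flag disappears.

```java
// Hypothetical example of Extract Subclass (prices are in cents).
// Before the refactoring, JobItem had an isLabor flag and an hourlyRate field
// that only labor items used:
//     int getUnitPrice() { return isLabor ? hourlyRate : unitPrice; }
class JobItem {
    private final int unitPrice;
    private final int quantity;

    JobItem(int unitPrice, int quantity) {
        this.unitPrice = unitPrice;
        this.quantity = quantity;
    }

    int getTotalPrice() { return getUnitPrice() * quantity; }

    int getUnitPrice() { return unitPrice; }
}

// Labor-only data and behavior now live in their own subclass.
class LaborItem extends JobItem {
    private final int hourlyRate;

    LaborItem(int hours, int hourlyRate) {
        super(0, hours); // the unit price of a labor item comes from hourlyRate instead
        this.hourlyRate = hourlyRate;
    }

    @Override
    int getUnitPrice() { return hourlyRate; }
}
```

Code that works with `JobItem` references can treat both kinds of item the same way: `getTotalPrice()` calls the overridden `getUnitPrice()` for labor items, so no flag needs to be checked.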

Chapter Summary

Refactoring is a programming tool that can improve the design of existing code.
A major goal of refactoring is to reduce the risk of change by providing a well-defined approach to improving code.
Several things indicate you need to refactor, including "code smells."
Many specific refactorings have been identified, organized into several categories.

Resources

Refactoring: Improving the Design of Existing Code, Martin Fowler, Addison-Wesley, 1999, ISBN 0-201-48567-2.

Refactoring web site: www.refactoring.com.

The Essence of Object-Oriented Programming with Java and UML (DRAFT)

Copyright © 2001 by Addison-Wesley

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form, or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior consent of the publisher. You may view this document on a web browser, but you must not mirror or make local copies.