Friday, July 29, 2011

Find bad coding practices using regular expressions

I have been using eclipse for last 8 years and very glad to be a witness of its continuous improvement. Specifically I mean its compilation warnings. Number of warnings it is able to produce is growing from version to version and it is great. I typically enable most of them and believe that it improves my code.

Unfortunately some warnings are still absent. Sometimes people use concrete classes or too specialized interfaces where interfaces should (and almost must) be used. Here are the code examples that irritate me:

// Using concrete class to the left of the assignment
ArrayList<String> a = new ArrayList<String>();

// Using concrete class into the generics
new ArrayList<ArrayList<String>>

// Using concrete class as an method parameter or return type
HashSet<Integer> foo(ArrayList<String> list){}

I personally never write such code but I know people that do.
When I see such code I want to fix it. But how find all places? Eclipse does not create warnings about any of these problems. Fortunately Eclipse supports search using regular expressions. I decided to write several regular expressions that can help to solve this problem.

Here are the expressions:

// left side of assignment
// [A-Z]\w+(List|Set|Map|Bean|Impl)\s*(<.*?>)?\s+\w+\s*=

// nested generics
// <\w+(List|Set|Map|Bean|Impl)\b[\w,\s<>]*>

// method argument
// \s+\w+\s*\([^)]*[A-Z]\w+(List|Map|Set)\b

// return type
// \w+(List|Set|Map|Bean|Impl)\s*(<.*?>)?\s+\w+\s*\(.*\{

I used them on pretty large codebase and found useful. I tried to write pattern that will be as short as it is possible and works without false negatives and with minimum false positives.


Obviously regular expression cannot solve 100% of problems. The method assumes specific naming conventions. If for example I call my interface Worker and the class that implements it WorkerImpl this method will find expression like WorkerImpl worker = new WorkerImpl(). But it will not work for the following code sample:

class Producer implements Runnable {
    public void Runnable() {
        // some code

Producer p = new Producer();
new Thread(p).start();

Obviously that this code sample uses the instance of Producer as Runnable only and therefore should be written:

Runnable p = new Producer();
new Thread(p).start();

Unfortunately no regular expression can find this situation. 

Here is yet another limitation. Sometimes people use "wrong" interface. Here is the code sample:

List<String> list = new ArrayList<String>();
for (String s : list) {/*do somthing*/}

List is not needed here. It should be replaced by Collection:
Collection<String> list = new ArrayList<String>();
for (String s : list) {/*do somthing*/}

Why it is important? Probably in future you will decide to store the elements in Set and still be able to iterate over elements. In this case only the right part of assignment should be chaged:

Collection<String> list = new LinkedHashSet<String>();

for (String s : list) {/*do somthing*/}

It is not a problem when code that declares variable collection and iterates over it is in the same class or even method. But if the collection is create somewhre and then is passed over several layers to code that iterates over it changing method signature of 50 methods from List to Set (or, better to Collection) may take a lot of time.


Regular expression can help us to locate bad coding practices. Although the method cannot find all problems and sometimes produces "false negatives" it was tested on large code base an worked well enough. But "real" solution can be implemented only on IDE level. Here is a link to bug report that I created I hope eclipse team will implement this suggestion.

No comments:

Post a Comment