Friday, July 29, 2011

Find bad coding practices using regular expressions

I have been using eclipse for last 8 years and very glad to be a witness of its continuous improvement. Specifically I mean its compilation warnings. Number of warnings it is able to produce is growing from version to version and it is great. I typically enable most of them and believe that it improves my code.

Unfortunately some warnings are still absent. Sometimes people use concrete classes or too specialized interfaces where interfaces should (and almost must) be used. Here are the code examples that irritate me:

// Using concrete class to the left of the assignment
ArrayList<String> a = new ArrayList<String>();

// Using concrete class into the generics
new ArrayList<ArrayList<String>>

// Using concrete class as an method parameter or return type
HashSet<Integer> foo(ArrayList<String> list){}

I personally never write such code but I know people that do.
When I see such code I want to fix it. But how find all places? Eclipse does not create warnings about any of these problems. Fortunately Eclipse supports search using regular expressions. I decided to write several regular expressions that can help to solve this problem.

Here are the expressions:

// left side of assignment
// [A-Z]\w+(List|Set|Map|Bean|Impl)\s*(<.*?>)?\s+\w+\s*=

// nested generics
// <\w+(List|Set|Map|Bean|Impl)\b[\w,\s<>]*>

// method argument
// \s+\w+\s*\([^)]*[A-Z]\w+(List|Map|Set)\b

// return type
// \w+(List|Set|Map|Bean|Impl)\s*(<.*?>)?\s+\w+\s*\(.*\{

I used them on pretty large codebase and found useful. I tried to write pattern that will be as short as it is possible and works without false negatives and with minimum false positives.


Obviously regular expression cannot solve 100% of problems. The method assumes specific naming conventions. If for example I call my interface Worker and the class that implements it WorkerImpl this method will find expression like WorkerImpl worker = new WorkerImpl(). But it will not work for the following code sample:

class Producer implements Runnable {
    public void Runnable() {
        // some code

Producer p = new Producer();
new Thread(p).start();

Obviously that this code sample uses the instance of Producer as Runnable only and therefore should be written:

Runnable p = new Producer();
new Thread(p).start();

Unfortunately no regular expression can find this situation. 

Here is yet another limitation. Sometimes people use "wrong" interface. Here is the code sample:

List<String> list = new ArrayList<String>();
for (String s : list) {/*do somthing*/}

List is not needed here. It should be replaced by Collection:
Collection<String> list = new ArrayList<String>();
for (String s : list) {/*do somthing*/}

Why it is important? Probably in future you will decide to store the elements in Set and still be able to iterate over elements. In this case only the right part of assignment should be chaged:

Collection<String> list = new LinkedHashSet<String>();

for (String s : list) {/*do somthing*/}

It is not a problem when code that declares variable collection and iterates over it is in the same class or even method. But if the collection is create somewhre and then is passed over several layers to code that iterates over it changing method signature of 50 methods from List to Set (or, better to Collection) may take a lot of time.


Regular expression can help us to locate bad coding practices. Although the method cannot find all problems and sometimes produces "false negatives" it was tested on large code base an worked well enough. But "real" solution can be implemented only on IDE level. Here is a link to bug report that I created I hope eclipse team will implement this suggestion.

Tuesday, July 26, 2011

Automatic detection of debugger

Yesterday I was debugging test case that consists of a few similar unit tests. All unit tests call some API that executes tasks with background thread. I can verify the test results only when thread is done.

The "right" solution is to call the asynchronous API and then call wait(). The thread should call notify() when it is done. The test thread will continue and validate the results. In practice I cannot modify the background thread and add notify() there. It is too deep into the application I am working on. Moreover I do not want to modify production code to help myself writing unit tests. So I decided to use "simple" solution: each my test calls the asynchronous API, then invokes Thread.sleep(100L), then verifies the results. 100 milliseconds are enough for the asynchronous task to complete.

But when I am debugging my code that is invoked by background thread I do not want the test to be terminated. Actually I want my test to sleep infinitely. I changed the parameter of sleep() many times and thought that actually I need automatic mechanism that understands that code is being debugged now and chooses sleep period automatically. I checked system properties and did not see any difference when test is running normally or being debugged.

Other idea was to try JMX. Really, Runtime MBean can help here:

ManagementFactory.getRuntimeMXBean().getInputArguments(). According to javadoc this method:

Returns the input arguments passed to the Java virtual machinewhich does not include the arguments to the main method.

So this is exactly what I need!

Here are 2 examples of return value of this method:

  • Running program with remote debugger:
[-Xdebug, -Xrunjdwp:transport=dt_socket,address=8000,server=y,suspend=n]

  • Debugging program under eclipse:
[-agentlib:jdwp=transport=dt_socket,suspend=y,address=localhost:49709, -Dfile.encoding=UTF-8]

The following utility method returns true if program is running under debugger and false otherwise:

public static boolean isDebugging() {
    Pattern p = Pattern.compile("-Xdebubg|jdwp");
    for (String arg : ManagementFactory.getRuntimeMXBean().getInputArguments()) {
        if (p.matcher(arg).find()) {
            return true;
    return false;

I hope that regular expression I used here is general enough to support other IDEs.

Now we can use this code when we need different behavior of our code being executed normally or being debugged.

Tuesday, July 12, 2011

Performance of method invocation by reflection

One day discovering code in project I am working on I found the following code:

try {
    module.getClass().getMethod(methodName, Serializable.class).invoke(module, message);
} catch (Exception e) {
    throw e;

Theoretically I knew that reflection works slower than direct invocation. But how slower?
Due to this code was found in the very performance critical part of the system I decided first to perform some benchmarking. I wrote class that contains one method foo() that does nothing and implemented 3 scenarios of invocation and ran them 1 million times:

  1. direct invocation
  2. invocation using reflection when getMethod() was called once
  3. invocation using reflection when getMethod() was called on each loop iteration. 
And here are the results.
  1. direct invocation took 5 ms
  2. reflection took 38 ms
  3. getMethod() + reflection invocation too 435 ms
This means that reflection generally can be used even in systems that are required to perform fast if getMethod() is not done for each invocation separately. 

Thursday, July 7, 2011

File access: stream vs nio channel

Java provides 2 ways to access files: using streams and NIO channels. Streams implement blocked IO: the read() method is blocked until some content is available. Channels give us ability to read content without being blocked, It allows for example to avoid allocating special thread per data source (file, socket etc.)

While this advantage is very important when communicating over sockets it probably less relevant when reading and writing files. At least the code looks pretty the same. I decided to compare performance of streams an NIO when reading and writing files sequentially.

I wrote simple program that copies file using streams and NIO. When using NIO I used 2 types of buffers: regular and direct. The utility is able to read file without writing it back to disk. This allow to compare the reading and writing speed separately. The following table shows results I got when reading file of 23MB. Evaluation time is given in milliseconds.

Operation NIO NIO + Direct buffer Stream
read + write638392281
write (calculated)433315244

The results show quite clearly that good old streams work much faster when accessing files sequentially, especially for reading. Reading files using streams is almost 3 times faster than doing it using NIO event utilizing direct buffer.

I was surprised seeing such results and tried to find the reason.  The short JDK code investigation explained everything. method is declared as native. So when we call it we directly use the native mechanisms of current operating system. FileChannel used to read files is an abstract class. The real implementation is in FileChannelImpl. Its read() method calls a lot of methods implemented in java, uses synchronized blocks etc. Obviously it will work slower than native method.


NIO gives us a lot of advantages. But it cannot completely replace the good old streams. At least sequential reading and writing of regular files works much faster when implemented using streams.

Tuesday, July 5, 2011

java.util.Pattern vs regular string search

Following discussion at work where I took a role of advocate of regular expressions I decided to verify how they are really fast. I remember that I have read that once Pattern is compiled (that might take some time) it works as quickly as regular string search. So, I wrote test that calls 1 million times String.contains() and Pattern.matcher().find(). Here are the results: contains test took 102 ms while pattern test took 340.

This means that patterns are almost 3.5 times slower. Therefore although regular expressions provide very convenient way for searching within strings sometimes when pattern are very simple and code is very performance critical we should use the good old methods provided by class java.lang.String.

With that I have to say that I paid attention that method indexOf() accepts String, method contains() accepts CharSequence but is implemented as  

return indexOf(s.toString())>=0. 

Pattern at the same time works directly with CharSequence. This means that if you use StringBuilder you should be careful passing StringBuilder instance  as an argument to String methods: creating new String object may cause significant performance degradation.