Sunday, March 17, 2019

Java By Comparison: the book review

Recently I had the pleasure of reading the book "Java By Comparison" by Simon Harrer, Jörg Lenhard and Linus Dietz.
This book plays the role of a mentor that helps a Java beginner navigate the labyrinth of different solutions, coding styles, best practices, tools and libraries, and become ready to develop production-ready software.
Java is neither the first nor the last language I have learned. When I started coding in Java, I brought with me a lot of practices and habits I had acquired in other languages. Fortunately, I had a couple of more experienced colleagues who helped me become a Java programmer. This book can do the same. A reader who is familiar with Java syntax and its main concepts, but lacks professional development experience in Java, can significantly improve these skills.
Experience is not always measured in years. It also depends on the working environment, colleagues and tasks. This book may be useful for experienced programmers as well. It refers to many third-party tools everyone should know. It leads the reader step by step from simple to tricky mistakes and traps and teaches how to avoid them. The few hours required to read the book will soon be repaid by dozens of hours not wasted on debugging.
I think that everyone will find a lot of new information here. I highly recommend this book to any developer who wants to improve their coding skills in Java or other languages.

Monday, March 4, 2019

New life of old Visitor design pattern

Introduction



Visitor [1, 2] is a widely known classic design pattern. There are a lot of resources that explain it in detail. Without digging into the implementation, I will briefly recall the idea of the pattern, explain its benefits and downsides, and suggest some improvements that can easily be applied to it in the Java programming language.


Classical Visitor


[Visitor] Allows for one or more operation to be applied to a set of objects at runtime, decoupling the operations from the object structure. (Gang of Four book)

The pattern is based on an interface, typically called Visitable, that has to be implemented by each model class, and on a set of Visitors that implement a method (algorithm) for each relevant model class.

public interface Visitable {
    void accept(Visitor visitor);
}

public class Book implements Visitable {
    // ... fields and methods
    @Override public void accept(Visitor visitor) { visitor.visit(this); }
}

public class Magazine implements Visitable {
    // ... fields and methods
    @Override public void accept(Visitor visitor) { visitor.visit(this); }
}

public class Cd implements Visitable {
    // ... fields and methods
    @Override public void accept(Visitor visitor) { visitor.visit(this); }
}

public interface Visitor {
    void visit(Book book);
    void visit(Magazine magazine);
    void visit(Cd cd);
}



Now we can implement various visitors, e.g.

  • PrintVisitor that prints the provided Visitable
  • DbVisitor that stores it in a database
  • ShoppingCart that adds it to a shopping cart

etc.
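To make the classic structure concrete, here is a minimal runnable sketch of the Book/Magazine/Cd model with a PrintVisitor. The fields (title, name, album), the constructors and the demo class are assumptions for illustration; the printer collects its output in a StringBuilder instead of writing to System.out so the result is easy to inspect.

```java
import java.util.Arrays;
import java.util.List;

// Classic double dispatch: each model class calls back the visit() overload for its own type.
interface Visitor {
    void visit(Book book);
    void visit(Magazine magazine);
    void visit(Cd cd);
}

interface Visitable {
    void accept(Visitor visitor);
}

// Hypothetical model classes; the fields are assumptions for illustration.
class Book implements Visitable {
    final String title;
    Book(String title) { this.title = title; }
    @Override public void accept(Visitor visitor) { visitor.visit(this); }
}

class Magazine implements Visitable {
    final String name;
    Magazine(String name) { this.name = name; }
    @Override public void accept(Visitor visitor) { visitor.visit(this); }
}

class Cd implements Visitable {
    final String album;
    Cd(String album) { this.album = album; }
    @Override public void accept(Visitor visitor) { visitor.visit(this); }
}

// Writes one line per visited item into a buffer (System.out would work the same way).
class PrintVisitor implements Visitor {
    final StringBuilder out = new StringBuilder();
    @Override public void visit(Book book) { out.append("Book: ").append(book.title).append('\n'); }
    @Override public void visit(Magazine magazine) { out.append("Magazine: ").append(magazine.name).append('\n'); }
    @Override public void visit(Cd cd) { out.append("CD: ").append(cd.album).append('\n'); }
}

class ClassicVisitorDemo {
    public static void main(String[] args) {
        List<Visitable> items = Arrays.asList(
                new Book("Some Book"), new Magazine("Some Magazine"), new Cd("Some Album"));
        PrintVisitor printer = new PrintVisitor();
        for (Visitable item : items) {
            item.accept(printer); // ends up in visit(Book), visit(Magazine) or visit(Cd)
        }
        System.out.print(printer.out);
    }
}
```

Note that every model class repeats the same one-line accept(), and PrintVisitor must implement all three visit() methods; these are exactly the downsides listed next.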


Downsides of visitor pattern


  1. The return type of the visit() methods must be defined at design time. In fact, in most cases these methods are void.
  2. Implementations of the accept() method are identical in all classes. Obviously, we prefer to avoid code duplication.
  3. Every time a new model class is added, each visitor must be updated, so maintenance becomes hard.
  4. It is impossible to have optional implementations for a certain model class in a certain visitor. For example, software can be sent to a buyer by email while milk cannot. However, both can be delivered by traditional post. So, EmailSendingVisitor cannot implement visit(Milk) but can implement visit(Software). A possible solution is to throw UnsupportedOperationException, but the caller cannot know in advance, before calling the method, that this exception will be thrown.

Improvements to classical Visitor pattern


Return value


First, let's add a return value to the Visitor interface. The general definition can be expressed using generics.

public interface Visitable {
  public <R> R accept(Visitor<R> visitor);
}


interface Visitor<R> {
   public R visit(Book book);
   public R visit(Magazine magazine);
   public R visit(Cd cd);
}


Well, this was easy. Now we can apply to our Book any kind of Visitor that returns a value. For example, DbVisitor may return the number of changed records in the DB (Integer), and a ToJson visitor may return the JSON representation of our object as a String. (This example is probably somewhat artificial: in real life we typically use other techniques for serializing objects to JSON, but it is good enough as a theoretically possible usage of the Visitor pattern.)
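Here is a small sketch of two visitors with different return types applied to the same model classes. ToJsonVisitor and PageCountVisitor are hypothetical names (the latter stands in for a DbVisitor returning an Integer), the pages field is an assumption, and the JSON is built naively without any escaping.

```java
import java.util.Arrays;
import java.util.List;

// Visitor whose result type is chosen by each implementation.
interface Visitor<R> {
    R visit(Book book);
    R visit(Magazine magazine);
}

interface Visitable {
    <R> R accept(Visitor<R> visitor);
}

// Hypothetical model classes; the fields are assumptions.
class Book implements Visitable {
    final String title;
    final int pages;
    Book(String title, int pages) { this.title = title; this.pages = pages; }
    @Override public <R> R accept(Visitor<R> visitor) { return visitor.visit(this); }
}

class Magazine implements Visitable {
    final String name;
    final int pages;
    Magazine(String name, int pages) { this.name = name; this.pages = pages; }
    @Override public <R> R accept(Visitor<R> visitor) { return visitor.visit(this); }
}

// Returns a String. Naive JSON: quotes and special characters are not escaped.
class ToJsonVisitor implements Visitor<String> {
    @Override public String visit(Book book) {
        return "{\"title\":\"" + book.title + "\",\"pages\":" + book.pages + "}";
    }
    @Override public String visit(Magazine magazine) {
        return "{\"name\":\"" + magazine.name + "\",\"pages\":" + magazine.pages + "}";
    }
}

// Returns an Integer, standing in for the DbVisitor's "number of changed records".
class PageCountVisitor implements Visitor<Integer> {
    @Override public Integer visit(Book book) { return book.pages; }
    @Override public Integer visit(Magazine magazine) { return magazine.pages; }
}

class ReturnValueDemo {
    public static void main(String[] args) {
        List<Visitable> items = Arrays.asList(new Book("Some Book", 250), new Magazine("Some Magazine", 60));
        int totalPages = 0;
        for (Visitable item : items) {
            String json = item.accept(new ToJsonVisitor());   // R inferred as String
            totalPages += item.accept(new PageCountVisitor()); // R inferred as Integer
            System.out.println(json);
        }
        System.out.println("Total pages: " + totalPages);
    }
}
```

The compiler infers R from the visitor passed to accept(), so each call site gets a correctly typed result without casts.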

Default implementation


Next, let's thank Java 8 for default methods, which allow an interface to hold method implementations:

public interface Visitable<R> {
  default R accept(Visitor<R> visitor) {
      return visitor.visit(this);
  }
}

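Now a class that implements Visitable does not have to implement accept() itself: the default implementation is good enough in most cases. One caveat: Java resolves overloaded methods at compile time, so inside the interface visitor.visit(this) is resolved against the static type Visitable and does not compile against the classic Visitor, whose visit() methods are overloaded per model class. The default method pays off when dispatch is done by runtime class instead, as with the MapVisitor described below.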


The improvements suggested above fix downsides #1 and #2.

MonoVisitor


Let's try to apply further improvements. First, let's define the interface MonoVisitor as follows:

public interface MonoVisitor<T, R> {
    R visit(T t);
}


The name Visitor was changed to MonoVisitor to avoid a name clash and possible confusion. By the book, a visitor defines many overloaded visit() methods, one per Visitable type, each accepting an argument of a different type. Therefore, Visitor by definition cannot be generic: it has to be defined and maintained at project level. MonoVisitor defines a single method only, and type safety is guaranteed by generics. However, a single class cannot implement the same generic interface several times, even with different type arguments. This means that we have to hold several separate implementations of MonoVisitor, even if they are grouped into one class.
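Since MonoVisitor has a single abstract method, each per-type implementation can be a lambda. The sketch below assumes minimal Book and Magazine classes and a hypothetical holder class TitleVisitors that groups the implementations as fields, because one class cannot implement both MonoVisitor<Book, String> and MonoVisitor<Magazine, String>.

```java
// A visitor for exactly one model type T, producing a result of type R.
@FunctionalInterface
interface MonoVisitor<T, R> {
    R visit(T t);
}

// Minimal model classes; the fields are assumptions for illustration.
class Book {
    final String title;
    Book(String title) { this.title = title; }
}

class Magazine {
    final String name;
    Magazine(String name) { this.name = name; }
}

// Per-type implementations kept as separate lambdas, grouped in one holder class.
final class TitleVisitors {
    static final MonoVisitor<Book, String> BOOK = book -> "Book: " + book.title;
    static final MonoVisitor<Magazine, String> MAGAZINE = magazine -> "Magazine: " + magazine.name;

    private TitleVisitors() { }

    public static void main(String[] args) {
        System.out.println(BOOK.visit(new Book("Some Book")));
        System.out.println(MAGAZINE.visit(new Magazine("Some Magazine")));
    }
}
```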


Function reference instead of Visitor


Since MonoVisitor has only one business method, we have to create an implementation per model class. However, we do not want to create separate top-level classes; we prefer to group them in one class. This new visitor holds a Map between Visitable classes and their MonoVisitor implementations, and implements java.util.function.Function so that a call can be dispatched to the implementation registered for a particular class.

So, let's have a look at MapVisitor.


public class MapVisitor<R> implements 
        Function<Class<? extends Visitable>, MonoVisitor<? extends Visitable, R>> {
    private final Map<Class<? extends Visitable>, MonoVisitor<? extends Visitable, R>> visitors;

    MapVisitor(Map<Class<? extends Visitable>, MonoVisitor<? extends Visitable, R>> visitors) {
        this.visitors = visitors;
    }

    @Override
    public MonoVisitor apply(Class clazz) {
        return visitors.get(clazz);
    }
}

The MapVisitor
  • Implements Function in order to retrieve particular implementation (full generics are omitted here for readability; have a look at the code snippet for detailed definition)
  • Receives mapping between class and implementation in map
  • Retrieves particular implementation suitable for given class
MapVisitor has a package-private constructor and is initialized using a special builder, which is very simple and flexible:

        
MapVisitor<Void> printVisitor = MapVisitor.builder(Void.class)
        .with(Book.class, book -> {System.out.println(book.getTitle()); return null;})
        .with(Magazine.class, magazine -> {System.out.println(magazine.getName()); return null;})
        .build();


MapVisitor usage is similar to that of the traditional Visitor:

someBook.accept(printVisitor);
someMagazine.accept(printVisitor);
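The builder and the accept() overload that takes a MapVisitor are not shown above, so here is a self-contained sketch under assumptions: a default accept() in Visitable that looks up the MonoVisitor registered for getClass(), and a hypothetical Builder whose builder()/with()/build() calls mirror the usage above. The lookup is by exact runtime class, so a subclass of Book would need its own entry.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

@FunctionalInterface
interface MonoVisitor<T, R> {
    R visit(T t);
}

interface Visitable {
    // Looks up the MonoVisitor registered for this object's runtime class and calls it.
    // The unchecked cast is safe as long as each class is registered with a visitor for that class.
    @SuppressWarnings("unchecked")
    default <R> R accept(Function<Class<? extends Visitable>, MonoVisitor<? extends Visitable, R>> visitors) {
        MonoVisitor<Visitable, R> visitor = (MonoVisitor<Visitable, R>) visitors.apply(getClass());
        if (visitor == null) {
            throw new UnsupportedOperationException("No visitor for " + getClass().getSimpleName());
        }
        return visitor.visit(this);
    }
}

// Hypothetical model classes: no accept() needed thanks to the default method.
class Book implements Visitable {
    final String title;
    Book(String title) { this.title = title; }
}

class Magazine implements Visitable {
    final String name;
    Magazine(String name) { this.name = name; }
}

class MapVisitor<R> implements Function<Class<? extends Visitable>, MonoVisitor<? extends Visitable, R>> {
    private final Map<Class<? extends Visitable>, MonoVisitor<? extends Visitable, R>> visitors;

    private MapVisitor(Map<Class<? extends Visitable>, MonoVisitor<? extends Visitable, R>> visitors) {
        this.visitors = visitors;
    }

    @Override
    public MonoVisitor<? extends Visitable, R> apply(Class<? extends Visitable> clazz) {
        return visitors.get(clazz); // exact class match; null if nothing was registered
    }

    // The result-type argument only drives type inference, as in MapVisitor.builder(Void.class).
    static <T> Builder<T> builder(Class<T> resultType) {
        return new Builder<>();
    }

    static final class Builder<R> {
        private final Map<Class<? extends Visitable>, MonoVisitor<? extends Visitable, R>> visitors = new HashMap<>();

        // Ties the registered class to the visitor's argument type at compile time.
        <T extends Visitable> Builder<R> with(Class<T> clazz, MonoVisitor<T, R> visitor) {
            visitors.put(clazz, visitor);
            return this;
        }

        MapVisitor<R> build() {
            return new MapVisitor<R>(new HashMap<>(visitors));
        }
    }
}

class MapVisitorDemo {
    public static void main(String[] args) {
        MapVisitor<String> describe = MapVisitor.builder(String.class)
                .with(Book.class, book -> "Book: " + book.title)
                .with(Magazine.class, magazine -> "Magazine: " + magazine.name)
                .build();
        String text = new Book("Some Book").accept(describe);
        System.out.println(text);
    }
}
```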

Our MapVisitor has one more benefit. All methods declared in the interface of a traditional visitor must be implemented, yet some of them often cannot be implemented meaningfully.

For example, suppose we want to implement an application that demonstrates various actions that animals can do. The user can choose an animal and then make it do something by selecting a specific action from the menu.
Here is the list of animals: Duck, Penguin, Whale, Ostrich.
And this is the list of actions: Walk, Fly, Swim.
We decided to have a visitor per action: WalkVisitor, FlyVisitor, SwimVisitor. A duck can do all three actions, a penguin cannot fly, a whale can only swim and an ostrich can only walk. So, we decided to throw an exception if a user tries to make the whale walk or the ostrich fly. But such behavior is not user friendly: the user gets an error message only after pressing the action button. We would probably prefer to disable irrelevant buttons. MapVisitor allows this without an additional data structure or code duplication. We do not even have to define a new interface or extend an existing one. Instead, we use the standard interface java.util.function.Predicate:


public class MapVisitor<R> implements 
        Function<Class<? extends Visitable>, MonoVisitor<? extends Visitable, R>>, 
        Predicate<Class<? extends Visitable>> {
    private final Map<Class<? extends Visitable>, MonoVisitor<? extends Visitable, R>> visitors;
    // ... constructor and apply() as shown above
    @Override
    public boolean test(Class<? extends Visitable> clazz) {
        return visitors.containsKey(clazz);
    }
}

Now we can call test() to decide whether the action button for the selected animal should be enabled or shown.
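A compact sketch of the animal menu: one MapVisitor per action, with test() deciding which buttons to enable. To keep it short, this version registers implementations through a hypothetical chainable with() method on MapVisitor itself instead of a separate builder; the animal classes are empty placeholders.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

@FunctionalInterface
interface MonoVisitor<T, R> { R visit(T t); }

interface Visitable { }

class Duck implements Visitable { }
class Penguin implements Visitable { }
class Whale implements Visitable { }
class Ostrich implements Visitable { }

class MapVisitor<R> implements Function<Class<? extends Visitable>, MonoVisitor<? extends Visitable, R>>,
        Predicate<Class<? extends Visitable>> {
    private final Map<Class<? extends Visitable>, MonoVisitor<? extends Visitable, R>> visitors = new HashMap<>();

    // Hypothetical fluent registration, used here instead of the article's builder.
    <T extends Visitable> MapVisitor<R> with(Class<T> clazz, MonoVisitor<T, R> visitor) {
        visitors.put(clazz, visitor);
        return this;
    }

    @Override public MonoVisitor<? extends Visitable, R> apply(Class<? extends Visitable> clazz) {
        return visitors.get(clazz);
    }

    // true only if an implementation was registered for exactly this class
    @Override public boolean test(Class<? extends Visitable> clazz) {
        return visitors.containsKey(clazz);
    }
}

class AnimalMenu {
    static final MapVisitor<String> WALK = new MapVisitor<String>()
            .with(Duck.class, d -> "Duck waddles")
            .with(Penguin.class, p -> "Penguin waddles")
            .with(Ostrich.class, o -> "Ostrich runs");
    static final MapVisitor<String> FLY = new MapVisitor<String>()
            .with(Duck.class, d -> "Duck flies");
    static final MapVisitor<String> SWIM = new MapVisitor<String>()
            .with(Duck.class, d -> "Duck paddles")
            .with(Penguin.class, p -> "Penguin dives")
            .with(Whale.class, w -> "Whale swims");

    public static void main(String[] args) {
        Class<? extends Visitable> selected = Penguin.class;
        // Enable only the buttons whose visitor supports the selected animal.
        System.out.println("Walk enabled: " + WALK.test(selected));
        System.out.println("Fly enabled: " + FLY.test(selected));
        System.out.println("Swim enabled: " + SWIM.test(selected));
    }
}
```

The menu needs no extra "capabilities" table: the set of registered classes already is that table.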

The full source code of the examples used here is available on GitHub.

Conclusions


This article demonstrates several improvements that make the good old Visitor pattern more flexible and powerful. The suggested implementation avoids some of the boilerplate code necessary for implementing the classic Visitor pattern. Here is a brief list of the improvements explained above:
  1. The visit() methods of the Visitor described here can return values and therefore may be implemented as pure functions [3], which helps combine the Visitor pattern with the functional programming paradigm.
  2. Breaking the monolithic Visitor interface into separate blocks makes it more flexible and simplifies code maintenance.
  3. MapVisitor can be configured using a builder at runtime, so it may change its behavior depending on information known only at runtime and unavailable during development.
  4. Visitors with different return types can be applied to the same Visitable classes.
  5. Default method implementations in interfaces remove a lot of the boilerplate code typical of Visitor implementations.

References




Wednesday, February 13, 2019

Two ways to extend enum functionality

Preface

In my previous article I explained how and why to use enums instead of the switch/case control structure in Java code. Here I will show how to extend the functionality of existing enums.


Introduction

A Java enum is a kind of compiler magic. In the byte code any enum is represented as a class that extends the abstract class java.lang.Enum and has several static members. Therefore, an enum cannot extend any other class or enum: Java has no multiple class inheritance.

A class cannot extend an enum either. This limitation is enforced by the compiler.

Here is a simple enum:

enum Color {red, green, blue}

This class tries to extend it:

class SubColor extends Color {}

This is the result of an attempt to compile class SubColor:

$ javac SubColor.java 
SubColor.java:1: error: cannot inherit from final Color
class SubColor extends Color {}
                       ^
SubColor.java:1: error: enum types are not extensible
class SubColor extends Color {}
^
2 errors

An enum can neither extend nor be extended. So, how is it possible to extend its functionality? The key word is "functionality". An enum can implement methods. For example, enum Color may declare an abstract method draw() and each member can override it:
enum Color {
    red { @Override public void draw() { } },
    green { @Override public void draw() { } },
    blue { @Override public void draw() { } },
    ;
    public abstract void draw();
}

A popular usage of this technique is explained here. Unfortunately, it is not always possible to implement a method in the enum itself because:
  1. the enum may belong to a third-party library or to another team in the company
  2. the enum may already be overloaded with so much data and so many functions that it becomes unreadable
  3. the enum belongs to a module that does not have the dependencies required for implementing the method draw().
This article suggests the following solutions to this problem.


Mirror enum

We cannot modify enum Color? No problem! Let's create an enum DrawableColor that has exactly the same elements as Color. This new enum will implement our method draw():
enum DrawableColor {
    red { @Override public void draw() { } },
    green { @Override public void draw() { } },
    blue { @Override public void draw() { } },
    ;
    public abstract void draw();
}
This enum is a kind of reflection of the source enum Color, i.e. Color is its mirror.
But how do we use the new enum? All our code uses Color, not DrawableColor. The simplest way to implement this transition is to use the built-in enum methods name() and valueOf() as follows:
Color color = ...
DrawableColor.valueOf(color.name()).draw();
Since the name() method is final and cannot be overridden, and valueOf() is generated by the compiler, these methods always match each other, so no functional problems are expected here. The performance of such a transition is good as well: name() does not even create a new String but returns a pre-initialized one (see the source code of java.lang.Enum). valueOf() is implemented using a Map, so its complexity is O(1).
The code above has an obvious problem. If the source enum Color is changed, the secondary enum DrawableColor does not know about it, so the trick with name() and valueOf() will fail at runtime. We do not want this to happen. But how can we prevent the failure? We have to let DrawableColor know that its mirror is Color, and enforce this preferably at compile time, or at least during the unit test phase. Here we suggest validation during unit test execution. An enum can have a static initializer, which is executed when the enum class is initialized, i.e. the first time the enum is actually used (for example, when one of its constants or values() is accessed). This means that if the static initializer validates that DrawableColor matches Color, a test like the following is enough to catch a mismatch before the code reaches the production environment:
@Test
public void drawableColorFitsMirror() {
    DrawableColor.values(); // triggers the static initializer, which fails on a mismatch
}
The static initializer just has to compare the elements of DrawableColor and Color and throw an exception if they do not match. This code is simple and can be written for each particular case. Fortunately, a simple open source library named enumus already implements this functionality, so the task becomes trivial:
enum DrawableColor {
    ....
    static {
        Mirror.of(Color.class);
    }
}
That's it. The test will fail if the source enum and DrawableColor no longer match. The utility class Mirror has another method that takes 2 arguments: the classes of the 2 enums that have to match. This version can be called from any place in the code, not only from the enum that has to be validated.
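For readers who prefer not to add a dependency, here is a hand-written sketch of the same idea: the mirror enum, the name()/valueOf() translation (wrapped in a hypothetical of() helper) and a static-initializer check that compares constant names. checkMirror() is an illustration in the spirit of enumus' Mirror, not its actual implementation, and draw() returns a String only to make the result easy to check.

```java
import java.util.Set;
import java.util.TreeSet;

// Source enum we are not allowed to change.
enum Color { red, green, blue }

// Mirror enum: same constant names, plus the behavior missing from Color.
enum DrawableColor {
    red { @Override String draw() { return "drawing red"; } },
    green { @Override String draw() { return "drawing green"; } },
    blue { @Override String draw() { return "drawing blue"; } };

    abstract String draw();

    static {
        // Runs once, when DrawableColor is initialized (e.g. on the first call to values()).
        checkMirror(DrawableColor.class, Color.class);
    }

    // Translates a Color into its mirror by constant name (hypothetical helper).
    static DrawableColor of(Color color) {
        return valueOf(color.name());
    }

    // Hand-written check, not enumus' code: both enums must declare the same constant names.
    static void checkMirror(Class<? extends Enum<?>> a, Class<? extends Enum<?>> b) {
        Set<String> namesA = names(a);
        Set<String> namesB = names(b);
        if (!namesA.equals(namesB)) {
            throw new IllegalStateException(a.getSimpleName() + namesA
                    + " does not match " + b.getSimpleName() + namesB);
        }
    }

    private static Set<String> names(Class<? extends Enum<?>> type) {
        Set<String> names = new TreeSet<>();
        for (Enum<?> constant : type.getEnumConstants()) {
            names.add(constant.name());
        }
        return names;
    }
}
```

If the check fails, class initialization throws, so the test above fails with an ExceptionInInitializerError whose cause carries the descriptive message.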

EnumMap

Do we really have to define another enum that just holds the implementation of one method? In fact, we do not. Here is an alternative solution. Let's define the interface Drawer as follows:
public interface Drawer {
    void draw();
}
Now let's create a mapping between the enum elements and implementations of the interface Drawer:
Map<Color, Drawer> drawers = new EnumMap<>(Color.class);
drawers.put(Color.red, new Drawer() { @Override public void draw() { /* draw red */ } });
drawers.put(Color.green, new Drawer() { @Override public void draw() { /* draw green */ } });
drawers.put(Color.blue, new Drawer() { @Override public void draw() { /* draw blue */ } });

The usage is simple:

drawers.get(color).draw();

EnumMap is chosen here as the Map implementation for better performance. A Map guarantees that each enum element appears in it at most once. However, it does not guarantee that there is an entry for each enum element. But it is enough to check that the size of the map is equal to the number of enum elements:


drawers.size() == Color.values().length

Enumus provides a convenient utility for this case as well. The following code throws IllegalStateException with a descriptive message if the map does not match Color:

EnumMapValidator.validateValues(Color.class, drawers, "Colors map");


It is important to call the validator from code that is executed by unit tests. In this case the map-based solution is safe against future modifications of the source enum.
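Here is a sketch of the EnumMap approach with a hand-written completeness check. requireComplete() is a hypothetical helper that plays the role described for EnumMapValidator (it is not enumus' API) and reports exactly which constants have no entry. Drawer.draw() is assumed to return a String here so the entries can be checked.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

enum Color { red, green, blue }

// Assumption: draw() returns a description instead of drawing.
interface Drawer {
    String draw();
}

final class Drawers {
    static final Map<Color, Drawer> DRAWERS = new EnumMap<>(Color.class);

    static {
        DRAWERS.put(Color.red, () -> "drawing red");
        DRAWERS.put(Color.green, () -> "drawing green");
        DRAWERS.put(Color.blue, () -> "drawing blue");
        // Fails class initialization (and therefore any test touching Drawers) if a color is missing.
        requireComplete(Color.class, DRAWERS, "Colors map");
    }

    private Drawers() { }

    // Hypothetical helper: throws IllegalStateException naming every constant without an entry.
    static <E extends Enum<E>> void requireComplete(Class<E> type, Map<E, ?> map, String name) {
        Set<E> missing = EnumSet.allOf(type);
        missing.removeAll(map.keySet());
        if (!missing.isEmpty()) {
            throw new IllegalStateException(name + " has no entries for " + missing);
        }
    }

    public static void main(String[] args) {
        for (Color color : Color.values()) {
            System.out.println(DRAWERS.get(color).draw());
        }
    }
}
```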


EnumMap and Java 8 functional interface


In fact, we do not have to define a special interface to extend enum functionality. We can use one of the functional interfaces provided by the JDK starting from version 8 (Function, BiFunction, Consumer, BiConsumer, Supplier etc.). The choice depends on the parameters that have to be passed to the function. For example, Supplier can be used instead of Drawer defined in the previous example:

Map<Color, Supplier<Void>> drawers = new EnumMap<>(Color.class);
drawers.put(Color.red, new Supplier<Void>() { @Override public Void get() { /* draw red */ return null; } });
drawers.put(Color.green, new Supplier<Void>() { @Override public Void get() { /* draw green */ return null; } });
drawers.put(Color.blue, new Supplier<Void>() { @Override public Void get() { /* draw blue */ return null; } });

Usage of this map is pretty similar to one from the previous example:

drawers.get(color).get();

This map can be validated exactly like the map that stores instances of Drawer.
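If the operation needs an argument, Consumer<T> is a natural choice. In this sketch a StringBuilder stands in for a real drawing surface (an assumption for illustration), and the size check from the previous section is done inline in the static initializer.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.function.Consumer;

enum Color { red, green, blue }

final class ColorDrawing {
    // Each entry receives the "canvas" to draw on.
    static final Map<Color, Consumer<StringBuilder>> DRAWERS = new EnumMap<>(Color.class);

    static {
        DRAWERS.put(Color.red, canvas -> canvas.append("[red]"));
        DRAWERS.put(Color.green, canvas -> canvas.append("[green]"));
        DRAWERS.put(Color.blue, canvas -> canvas.append("[blue]"));
        // Same completeness check as for the Drawer map: one entry per constant.
        if (DRAWERS.size() != Color.values().length) {
            throw new IllegalStateException("Colors map does not cover all colors");
        }
    }

    private ColorDrawing() { }

    static String draw(Color... colors) {
        StringBuilder canvas = new StringBuilder();
        for (Color color : colors) {
            DRAWERS.get(color).accept(canvas);
        }
        return canvas.toString();
    }

    public static void main(String[] args) {
        System.out.println(draw(Color.blue, Color.red));
    }
}
```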



Conclusions


This article shows how powerful Java enums can be if we put some logic inside them. It also demonstrates two ways to expand the functionality of enums that work despite the language limitations. The article introduces the open source library enumus, which provides several useful utilities that make working with enums easier.


Featured enum instead of switch

Problem and its solution

Switch/case is a common control structure implemented in most imperative programming languages. Switch is considered more readable than a series of if/else statements.

Here is a simple example:
// Switch with int literal
switch (c) {
  case 1: one(); break;
  case 2: two(); break;
  case 3: three(); break;
  default: throw new UnsupportedOperationException(String.format("Operation %d is not supported", c));
}


Here is the list of the main problems in this code:

  1. Relationship between int literals (1, 2, 3) and executed code is not obvious.
  2. If one of the values (e.g. 2) is no longer supported and this switch is not updated accordingly, it will contain unused code forever.
  3. If a new possible value of c (e.g. 4) is introduced and the switch is not updated accordingly, the code will probably throw UnsupportedOperationException at runtime without any compile-time notification.
  4. Such a switch structure tends to be duplicated several times in the code, which makes problems 2 and 3 even worse.
The first and simplest fix is to use int constants instead of literals. First, let's define the constants:

private static final int ONE = 1;
private static final int TWO = 2;
private static final int THREE = 3;

Now the code will look like this:

switch (c) {
  case ONE: one(); break;
  case TWO: two(); break;
  case THREE: three(); break;
  default: throw new UnsupportedOperationException(String.format("Operation %d is not supported", c));
}


(Obviously, in real life the names of the constants must be self-descriptive. Note also that the constants must be final, because case labels require compile-time constants.)
This snippet is more readable, but all the other disadvantages are still relevant. The next attempt to improve the initial code snippet uses enums, introduced to the Java language in version 5 in 2004. Let's define the following enum:

enum Action {ONE, TWO, THREE}


Now the switch snippet will be slightly changed:

Action a = ...
switch (a) {
  case ONE: one(); break;
  case TWO: two(); break;
  case THREE: three(); break;
  default: throw new UnsupportedOperationException(String.format("Operation %s is not supported", a));
}


This code is a little bit better: it will produce a compilation error if one of the elements is removed from enum Action. However, it will not cause a compilation error if an additional element is added to enum Action. Some IDEs or static code analysis tools may produce a warning in this case, but who pays attention to warnings? Fortunately, an enum can declare an abstract method that has to be implemented by each element:


enum Action {
  ONE { @Override public void action() { } },
  TWO { @Override public void action() { } },
  THREE { @Override public void action() { } };
  public abstract void action();
}


Now the switch statement can be replaced by single line:


Action a = ...
a.action();


This solution does not have any of the disadvantages listed above:

  1. It is readable. The method is "attached" to the enum element; one can write as much Javadoc as needed if the method's meaning is unclear. The code that calls the method is trivial: what can be simpler than a method invocation?
  2. There is no way to remove an enum constant without removing its implementation, so no unused code will remain if some functionality is no longer relevant.
  3. A new enum element cannot be added without implementing the method action(). Code without the implementation does not compile.
  4. If several actions are required, they can all be implemented in the enum. As already mentioned, the code that calls a specific function is trivial, so there is no code duplication.
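A runnable version of the Action enum, with action() returning a String (an assumption so that the behavior is visible). Adding a constant FOUR without a body would not compile, which is exactly the check the switch could not provide.

```java
enum Action {
    ONE { @Override String action() { return "one"; } },
    TWO { @Override String action() { return "two"; } },
    THREE { @Override String action() { return "three"; } };

    // Every constant must provide a body implementing this method.
    abstract String action();
}

class ActionDemo {
    public static void main(String[] args) {
        Action a = Action.valueOf("TWO"); // e.g. the value that used to be switched on
        System.out.println(a.action());   // no switch, no default branch
    }
}
```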

Conclusion

Although the switch/case structure is well known and widely used in various programming languages, its usage may cause a lot of problems. The solution described above, which uses Java enums, does not have these disadvantages. The next article in this series shows how to extend the functionality of an existing enum.

Saturday, February 2, 2019

Syntax highlighting

I have written a lot of blog posts that contain code snippets in several programming languages (mostly Java). I separated each code snippet with an empty line and used a monospace font to improve readability. Changing the font for code snippets is annoying and does not produce the results I want: I prefer highlighted code.

So, I searched for tools that can do this work for me and found 2 types of tools:

  1. Tools that take your code snippet and produce HTML that can be embedded into any blog post
  2. Tools that do this transformation at runtime, so the code snippet remains clear text.

The tools of the first type are often very flexible and support various color themes, but they have a serious disadvantage: they generate almost uneditable HTML. If you want to change your code snippet, you usually have to regenerate its HTML representation. This also means that you have to store your original snippet for future use, for example as a GitHub gist. It is not a show stopper, but it is an obvious disadvantage.

The tools of the second type do their magic at runtime. The code snippet remains human-readable. The injected JavaScript runs when the page is loaded and changes the color of the reserved words of the programming language used in the embedded code snippet.

The most popular and best-looking syntax highlighter that I found is the one created by Alex Gorbatchev.

Here is an example of code snippet highlighted by this tool:
public class MyTest {
    @Test
    public void multiply() {
        assertEquals(4, 2* 2);
    }
}

There are 2 things I had to do to make this magic happen:

  1. Include several scripts and CSS files into HTML header
  2. Write the code snippet inside a <pre> tag with a specific class:
<pre class="brush: java">
public class MyTest {
    @Test
    public void multiply() {
        assertEquals(4, 2* 2);
    }
}
</pre>
Typically external resources (either scripts or CSS) are included by reference, i.e.
<script src='http://domain.com/path/script.js' type='text/javascript'></script> 
<link href='http://domain.com/path/style.css' rel='stylesheet' type='text/css'/> 
This works perfectly with the SyntaxHighlighter scripts in a standalone HTML document, but it did not work when I added the scripts to the theme of my blog. Investigation showed that blogger.com, for some reason, changed the absolute resource references to protocol-relative ones, so they did not work: instead of src="http://domain.com/path/script.js" that I had written, src="//domain.com/path/script.js" appeared, i.e. the http: was omitted.


So, I downloaded all the scripts to be able to put their source code directly into the body of the <script> tag. For convenience and better performance I minified the scripts using one of the online tools available on the web. The code is available here. This code should be added to the <head> of the HTML page.

Now I can enjoy the great syntax highlighter.





Thursday, January 4, 2018

Why Gradle is called Gradle

Today I asked myself: what does the name "Gradle" mean? I asked Google and here are the first 2 answers:


Answer #1

It's not an abbreviation, and doesn't have any particular meaning.
The name came from Hans Dockter (Gradle founder) who thought it sounded cool.


Answer #2

My original idea was to call it Cradle. The disadvantages of that name were:
  • too diminutive
  • not very unique
As Gradle is using Groovy for the DSL I went down the G-road and thought about calling it Gradle. Everyone I asked liked it so that became the official name. That was about 4 years ago. I'm still very happy with the name.









Conclusions: both answers are correct

  1. The name was indeed invented by the Gradle founder Hans Dockter, and IMHO he does think that the name is cool.
  2. The name does have a meaning: it comes from "Cradle", with the G taken from Groovy.



Thursday, December 7, 2017

DB #42

DB #42 or how to choose technology :)


We have to implement a new feature. No matter what kind of feature it is: we have to expose a REST API and store and retrieve some dynamic data. We spent a lot of time choosing a DB. The chief architect wanted MySQL. Well, this idea did not pass our sanity filter. The data architect wanted Aerospike, and the team leader suggested MongoDB, which was criticized by DevOps, who proposed Redis.

But as we all know, the ultimate answer to all questions is 42. So, we decided to choose DB #42 from an alphabetically sorted list of NoSQL databases.

I opened Google and typed "list of nosql databases". Then I chose the following article. I had to extract the list of database names and did it using the following command:

curl http://nosql-database.org/  | grep h3 | grep href > /tmp/db.hrefs.txt

(because all DB names in this list are surrounded by an <a> tag referencing their web page)

In one case 2 databases were written on one physical line; I fixed this manually and saved the result as /tmp/db.hrefs2.txt.
Then I ran the following command to get the alphabetical list of DB names:

cat /tmp/db.hrefs2.txt |  sed 's#</a>.*##' | sed 's/.*>//' | sort > /tmp/db.names.txt

The last step is to print this list with attached numbers:

 i=1; cat /tmp/db.names.txt | while read l; do echo "$i,$l"; i=`expr $i + 1`; done

Here is the list:
1,Accumulo
2,acid-state
3,Aerospike
4,AlchemyDB
5,allegro-C
6,AllegroGraph
7,Amazon SimpleDB
8,AmisaDB:
9,Apache Flink
10,Applied Calculus
11,ArangoDB
12,ArangoDB
13,ArangoDB
14,Axibase
15,Azure DocumentDB
16,Azure Table Storage
17,BagriDB
18,BangDB
19,BaseX
20,BayesDB
21,BergDB
22,Berkeley DB
23,Berkeley DB XML
24,Bigdata
25,BinaryRage
26,BoltDB
27,BrightstarDB
28,BrightstarDB
29,Cachelot
30,Cassandra
31,Chordless
32,Chronicle Map
33,Cloudata
34,Cloud Datastore
35,Cloudera
36,Clusterpoint Server
37,CodernityDB
38,ConcourseDB
39,CoreObject
40,CortexDB
41,Couchbase Server
42,CouchDB
43,Crate Data
44,DaggerDB
45,Datomic
46,db4o
47,DBreeze
48,densodb
49,djondb
50,Druid
51,DynamoDB
52,Dynomite
53,EJDB
54,Elassandra
55,Elastic
56,Elliptics
57,EMC Documentum xDB
58,ESENT
59,Eventsourcing for Java (es4j)
60,Event Store
61,Execom IOG
62,eXist
63,eXtremeDB
64,eXtremeDB
65,eXtremeDB Financial Edition
66,EyeDB
67,Faircom C-Tree
68,Fallen 8
69,FileDB:
70,filejson
71,FlockDB
72,FoundationDB
73,FramerD
74,GemFire
75,GemStone/S
76,GenieDB
77,Genomu
78,GigaSpaces
79,Globals:
80,GPUdb
81,GraphBase
82,GridGain
83,GT.M
84,gunDB
85,gunDB
86,gunDB
87,Hadoop / HBase
88,Hazelcast
89,Hibari
90,HPCC
91,HSS Database
92,HyperDex
93,HyperGraphDB
94,Hypertable
95,IBM Cloudant
96,IBM Informix
97,IBM Lotus/Domino
98,iBoxDB
99,Infinispan
100,Infinite Graph
101,InfinityDB
102,influxdata
103,InfoGrid
104,Informix Time Series Solution
105,Intersystems Cache
106,ISIS Family
107,JADE
108,JasDB
109,jBASE
110,JEntigrator
111,JSON ODM
112,KAI
113,kdb+
114,KirbyBase
115,KitaroDB
116,KUDU
117,LevelDB
118,LightCloud
119,LSM
120,Magma
121,MapR
122,MarcelloDB
123,MarkLogic Server
124,Maxtable
125,MemcacheDB
126,MentDB:
127,Meronymy
128,MiniM DB
129,Mnesia
130,Model 204 Database
131,MonetDB
132,MongoDB
133,Moonshadow
134,Morantex
135,NCache
136,NDatabase
137,NeDB
138,NEO
139,Neo4J
140,nessDB
141,Newt DB
142,Ninja Database Pro
143,NosDB
144,NoSQL embedded db
145,ObjectDB
146,Objectivity
147,Onyx Database
148,OpenInsight
149,OpenLDAP
150,OpenLink Virtuoso
151,OpenQM
152,Oracle Coherence
153,Oracle NOSQL Database
154,OrientDB
155,OrientDB
156,Perst
157,PickleDB
158,PicoLisp
159,Pincaster
160,pipelinedb
161,Prevayler
162,Qizx
163,quasardb
164,Queplix
165,RaptorDB
166,RaptorDB
167,rasdaman
168,RavenDB
169,RDM Embedded
170,Reality
171,ReasonDB
172,Recutils:
173,Redis
174,RethinkDB
175,Riak
176,Riak TS
177,RockallDB
178,RocksDB
179,Scalaris
180,Scalien
181,SciDB
182,SCR Siemens Common Repository
183,Scylla
184,SDB
185,Sedna
186,SequoiaDB
187,Serenety
188,SharedHashFile
189,siaqodb
190,SisoDB
191,Sophia
192,Sparksee
193,Splice Machine
194,SpreadsheetDB
195,Starcounter
196,Sterling
197,STSdb
198,Symas LMDB
199,Tarantool/Box
200,TayzGrid
201,Terrastore
202,ThruDB
203,TIBCO Active Spaces
204,Tieto TRIP
205,TigerLogic PICK
206,TITAN
207,Tokutek:
208,Tokyo Cabinet / Tyrant
209,ToroDB
210,TreodeDB
211,Trinity
212,U2
213,upscaledb
214,VaultDB
215,VelocityDB
216,Versant
217,VertexDB
218,Voldemort
219,Vyhodb
220,weaver
221,WhiteDB
222,WonderDB
223,Yserial
224,ZODB


And the winner is .... CouchDB - number 42!

Conclusions

Whenever you cannot agree on which technology to choose, just find the longest list of relevant technologies, sort it alphabetically and choose #42.

Thursday, March 17, 2016

Performance of try/catch

Yesterday I had a discussion with a friend who said that, in his opinion, the try/catch statement in Java is very expensive in terms of performance. Indeed, it is always recommended to check a value before using it instead of trying to use it and catching the exception thrown when the value is wrong.

I decided to check this and tried several code samples. All functions accept an int; those that complete normally return the argument multiplied by 2.
The differences were:

  1. just calculate the value and return it (foo())
  2. calculate the value inside a try block followed by a catch block (tryCatch())
  3. calculate the value inside a try block followed by a finally block (tryFinally())
  4. calculate the value inside a try block followed by catch and finally blocks (tryCatchFinally())
  5. divide the integer value by zero inside a try block followed by a catch block that just returns -1 (tryThrowCatch())
  6. divide the integer value by zero inside a try block followed by a catch block that re-throws the exception; an outer try/catch structure catches it and returns -1 (tryThrowCatch1())
  7. divide the integer value by zero inside a try block followed by a catch block that wraps the thrown exception in another RuntimeException and re-throws it; an outer try/catch structure catches the secondary exception and returns -1 (tryThrowCatch2())
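The seven methods can be reconstructed roughly as follows. This is a sketch based only on the descriptions above, not the original benchmark source: the non-constant field zero, the sink field and the timing helper are assumptions, and naive loops like this are sensitive to JIT warm-up, so a harness such as JMH is the more reliable tool.

```java
import java.util.function.IntUnaryOperator;

class TryCatchCost {
    static int zero = 0;        // not a constant, so the division really happens at runtime
    static volatile long sink;  // receives loop results so the measured work is actually used

    static int foo(int x) { return x * 2; }

    static int tryCatch(int x) {
        try { return x * 2; } catch (RuntimeException e) { return -1; }
    }

    static int tryFinally(int x) {
        try { return x * 2; } finally { /* nothing to clean up */ }
    }

    static int tryCatchFinally(int x) {
        try { return x * 2; } catch (RuntimeException e) { return -1; } finally { /* nothing */ }
    }

    static int tryThrowCatch(int x) {
        try { return x / zero; } catch (ArithmeticException e) { return -1; }
    }

    // The inner catch re-throws the same exception object; the outer catch returns -1.
    static int tryThrowCatch1(int x) {
        try {
            try { return x / zero; } catch (ArithmeticException e) { throw e; }
        } catch (ArithmeticException e) { return -1; }
    }

    // The inner catch wraps the exception in a new RuntimeException, which records a fresh stack trace.
    static int tryThrowCatch2(int x) {
        try {
            try { return x / zero; } catch (ArithmeticException e) { throw new RuntimeException(e); }
        } catch (RuntimeException e) { return -1; }
    }

    // Naive wall-clock timing of `iterations` calls, in milliseconds.
    static long timeMillis(IntUnaryOperator f, int iterations) {
        long checksum = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            checksum += f.applyAsInt(i);
        }
        long elapsed = (System.nanoTime() - start) / 1_000_000;
        sink = checksum;
        return elapsed;
    }

    public static void main(String[] args) {
        int n = 10_000_000; // fewer iterations than the article's 100,000,000
        System.out.println("foo             " + timeMillis(TryCatchCost::foo, n) + " ms");
        System.out.println("tryCatch        " + timeMillis(TryCatchCost::tryCatch, n) + " ms");
        System.out.println("tryFinally      " + timeMillis(TryCatchCost::tryFinally, n) + " ms");
        System.out.println("tryCatchFinally " + timeMillis(TryCatchCost::tryCatchFinally, n) + " ms");
        System.out.println("tryThrowCatch   " + timeMillis(TryCatchCost::tryThrowCatch, n) + " ms");
        System.out.println("tryThrowCatch1  " + timeMillis(TryCatchCost::tryThrowCatch1, n) + " ms");
        System.out.println("tryThrowCatch2  " + timeMillis(TryCatchCost::tryThrowCatch2, n) + " ms");
    }
}
```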
I ran each test 100,000,000 times in a loop and measured the elapsed time. Here are the results:


Test name          Elapsed time, ms
foo                46
tryCatch           45
tryFinally         45
tryCatchFinally    44
tryThrowCatch      133
tryThrowCatch1     139
tryThrowCatch2     62293


Analysis

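  1. The try/catch/finally structure itself does not cause any measurable performance degradation.
  2. Throwing and catching an exception is about 3 times more expensive than a simple method call (133 ms vs. 46 ms).
  3. Wrapping an exception with another one and re-throwing it is really expensive: about 450 times slower than re-throwing the original exception (62293 ms vs. 139 ms). The most likely reason is that constructing a new exception captures a full stack trace (Throwable.fillInStackTrace()), whereas for a frequently thrown ArithmeticException HotSpot can reuse a preallocated exception without a stack trace.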


Conclusions

A try/catch block by itself has no measurable performance penalty. Throwing an exception is indeed expensive, so validating values before using them is better not only from the design perspective but also from the performance perspective.

The important conclusion is that we should avoid a very common pattern in performance-critical code:

try {
     // some code
} catch (ThisLayerException e) {
    throw new UpperLayerException(e);
}

This pattern lets us use layer-specific exceptions on each layer of our code. An exception thrown from a lower layer can be wrapped many times; each wrapper captures its own stack trace and the printed chain of causes becomes extremely long, which causes serious performance degradation. A probably better approach is to extend our domain-level exceptions from RuntimeException and to wrap only checked exceptions, and only once (like Spring does).
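A sketch of the wrap-once approach: a hypothetical unchecked StorageException is created only where the checked IOException first appears, and upper layers let it propagate without catching and re-wrapping it. Repository and Service are illustrative names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical domain exception: unchecked, so upper layers need no catch-and-wrap boilerplate.
class StorageException extends RuntimeException {
    StorageException(String message, Throwable cause) { super(message, cause); }
}

class Repository {
    // The only place where the checked IOException is translated, exactly once.
    static byte[] load(String path) {
        try {
            return Files.readAllBytes(Paths.get(path));
        } catch (IOException e) {
            throw new StorageException("Cannot load " + path, e);
        }
    }
}

class Service {
    // No try/catch here: StorageException simply propagates to whoever can handle it.
    static int size(String path) {
        return Repository.load(path).length;
    }
}
```

Only one extra exception object (and one extra stack trace) is created per failure, no matter how many layers the exception passes through.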


Source code

The source code used here can be found on GitHub.


    Wednesday, February 24, 2016

    Dangerous String.format()

    Introduction


    The static method format(), added to java.lang.String in Java 5, has become a popular and widely used replacement for MessageFormat, string concatenation and verbose chains of StringBuilder.append() calls.

    However, when using this method we should remember that this lunch is not free.

    Performance issues

    1. This method accepts varargs and therefore creates a new Object array on each call to wrap the passed arguments. An extra object is created, and that extra object must then be collected by the GC. 
    2. It internally creates an instance of java.util.Formatter that parses the format specification: yet another object and a lot of CPU-intensive parsing. 
    3. It creates a new instance of StringBuilder to store the formatted data.
    4. At the end it calls StringBuilder.toString(), which creates yet another object and copies the builder's characters into it. 
    So a call of String.format() creates at least 4 short-lived objects and parses the format specification. In a real application it probably parses the same format millions of times. 

    Solution

    Use Formatter directly. Compare the following code snippets:



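    // requires: import java.util.Formatter;
    // n is the number of iterations, defined elsewhere (e.g. static int n = 1_000_000;)

    public static CharSequence str() {
     StringBuilder buf = new StringBuilder();
     for (int i = 0; i < n; i++) {
      // each call creates a varargs array, a Formatter, a StringBuilder and a String
      buf.append(String.format("%d\n", 1));
     }
     return buf;
    }


    public static CharSequence fmt() {
     StringBuilder buf = new StringBuilder();
     // a single Formatter writes directly into buf for all iterations
     Formatter fmt = new Formatter(buf);
     for (int i = 0; i < n; i++) {
      fmt.format("%d\n", 1);
     }
     return buf;
    }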
    


    Method fmt() is about 1.5 times faster than method str(). Even better results may be achieved by writing directly to a stream instead of creating a String and then writing it to the stream. 
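    Writing directly to a stream can be sketched by pointing a Formatter at a java.io.Writer, since Formatter accepts any Appendable. This is a minimal sketch under assumptions: the class and method names are hypothetical, and passing Locale.US explicitly is a choice of this sketch. Note that Formatter does not throw IOException from format(); it records the last one, which ioException() returns.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Formatter;
import java.util.Locale;

class StreamFormatting {
    // Hypothetical helper: formats n lines straight into the given Writer,
    // without building an intermediate String for each line.
    static void writeAll(Writer out, int n) throws IOException {
        // Formatter accepts any Appendable; a Writer is one.
        // It is deliberately not closed here: close() would also close the caller's Writer.
        Formatter fmt = new Formatter(out, Locale.US);
        for (int i = 0; i < n; i++) {
            fmt.format("%d%n", i);
        }
        fmt.flush(); // flushes the destination, since a Writer is Flushable
        IOException e = fmt.ioException(); // last IOException swallowed by format()/flush()
        if (e != null) {
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        StringWriter sw = new StringWriter();
        writeAll(sw, 3);
        System.out.print(sw);
    }
}
```

    In a real application out would typically be a BufferedWriter around a file or socket stream rather than a StringWriter.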


    String.format() is Locale sensitive

    There are 2 format() methods:

    public static String format(String format, Object... args)

    and 

    public static String format(Locale l, String format, Object... args)


    The method that does not receive a Locale argument uses the default locale, Locale.getDefault(Locale.Category.FORMAT), which depends on the machine configuration. This means that changing machine settings changes the behavior of your application and may even break it. The most common problems are:
    • decimal separator
    • digits

    Decimal separator


    Programmers are so used to the dot (.) as the decimal separator that they sometimes forget that it depends on the locale. I've written a simple code snippet that iterates over all available locales and checks which character is used as the decimal separator:

    Decimal separator    Number of locales
    Dot (.)              71
    Comma (,)            89
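    The snippet that produced this table is not shown in the post. A minimal sketch of such a count could look like the following; it assumes that DecimalFormatSymbols is the right source of truth, which is where java.util.Formatter takes its decimal separator from for a given locale. The class and method names are hypothetical, and the actual numbers depend on the JDK version and its locale data, so they may differ from the table.

```java
import java.text.DecimalFormatSymbols;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class DecimalSeparators {
    // Counts how many of the available locales use each decimal separator character.
    static Map<Character, Integer> countSeparators() {
        Map<Character, Integer> counts = new TreeMap<>();
        for (Locale locale : Locale.getAvailableLocales()) {
            char separator = DecimalFormatSymbols.getInstance(locale).getDecimalSeparator();
            counts.merge(separator, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // The exact numbers depend on the JDK version and its locale data.
        countSeparators().forEach((separator, count) ->
                System.out.println("'" + separator + "' " + count));
    }
}
```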

    If the produced string is later parsed, changing the default locale of the machine may break the parsing. 
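    The failure can be reproduced without touching machine settings by passing the locale explicitly, which stands in for two different default locales here. This is a minimal sketch; the class and method names are hypothetical. Double.parseDouble() always expects a dot, so a string formatted with a comma separator cannot be parsed back.

```java
import java.util.Locale;

class LocaleParsing {
    // Formats a double with the given locale and tries to parse it back
    // with Double.parseDouble(), which always expects '.' as the separator.
    static boolean roundTrips(Locale locale, double value) {
        String s = String.format(locale, "%.1f", value);
        try {
            return Double.parseDouble(s) == value;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrips(Locale.US, 1.5));      // true: "1.5"
        System.out.println(roundTrips(Locale.GERMANY, 1.5)); // false: "1,5"
    }
}
```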

    Digits

    Everyone knows that digits are 1, 2, 3, ... This is true, but not in every locale. Arabic, Hindi, Thai and other languages use other characters to represent the same digits. Here is a code sample:




    // requires: import java.util.Locale;
    for (Locale locale : Locale.getAvailableLocales()) {
     // format the number 1 using this locale's digits
     String one = String.format(locale, "%d", 1);
     if (!"1".equals(one)) {
       System.out.println("\t" + locale + ": " + one);
     }
    }
    


    And this is its output when running on a Linux machine with Java 8:

            hi_IN: १
            th_TH_TH_#u-nu-thai: ๑


    When executed on Android, this code produces 109 lines of output. They include:

    1. all versions of Arabic locales, 
    2. as, bn, dz, fa, ks, mr, my, ne, pa, ps, uz, including their regional variants.
    This can easily break an application in some locales. 



    Conclusions

    1. Since Java formatting is locale dependent, it should be used very carefully. In some cases it is probably better to specify the locale explicitly, e.g. Locale.US.
    2. Be careful when calling String.format() in performance-critical sections of code. Using another API (e.g. invoking class Formatter directly) may significantly improve performance. 
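    The first conclusion can be illustrated with a minimal sketch: a small helper that always passes Locale.US, so its output no longer depends on the machine's default locale. The class and method names are hypothetical; Locale.setDefault() is used only to simulate a machine with German settings.

```java
import java.util.Locale;

class StableFormat {
    // Hypothetical helper: always formats with Locale.US, so the output uses
    // ASCII digits and '.' as the decimal separator on any machine.
    static String format(String format, Object... args) {
        return String.format(Locale.US, format, args);
    }

    public static void main(String[] args) {
        Locale.setDefault(Locale.GERMANY); // simulate a machine with German settings
        System.out.println(String.format("%.2f", 3.14159)); // 3,14 (default locale)
        System.out.println(format("%.2f", 3.14159));        // 3.14 (explicit Locale.US)
    }
}
```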

    Acknowledgements

    I'd like to thank Eliav Atoun, who inspired the discussion about this issue and helped me try the code sample on Android. 

    Source code 

    Code snippets used here may be found on GitHub.




    Thursday, December 17, 2015

    Creating a self-extracting tar.gz

    Motivation


    WinZip can create a self-extracting executable. It is actually the unzip utility bundled with the content of the zip file, which is extracted when the executable runs.

    Recently I was working on a distribution package of our software for Linux. On Windows we have a huge zip file and a small script that we run once the zip is extracted. Scripts for Linux are a little more complicated because we need to grant permissions, create a user and group, etc. So I wrote a script that performs all the necessary actions, including extracting the archive. The disadvantage is that the distribution now consists of at least 2 files: the archive and the script.

    So I decided to check how to create a self-extracting archive on Linux.


    Commands used

    I started with the following exercise:

    #!/bin/sh
    echo $0
    exit
    foo bar


    This script runs and ignores the last line "foo bar", because exit stops the shell before it reaches that line. 
    Then I created a tar.gz file and appended it to this script:

    cat script.sh my.tar.gz >script.with.tar.sh

    Although the script now contains the binary content of the tar.gz, it runs as expected. 
    But I want to create a self-extracting script, so I need a way to separate the script from its "attachment". The "dd" command helps to implement this:

    dd bs=1 skip=$SCRIPT_PREFIX  if=$SELF_EXTRACTING_TAR

    But how can a script with an attachment determine the size of its scripting part? If the script contains a known number of lines (e.g. 3), we can do the following:

    head -3 $SCRIPT | wc -c


    A script can access its own name using the variable $0.
    Putting both commands together, we can write a line that extracts the attachment appended to the script:

    dd bs=1 skip=`head -3 $0 | wc -c` if=$0

    Extracting a tar.gz can be achieved using the command 

    gunzip -c  | tar -x

    So this command extracts the attached tar.gz to the current directory:

    dd bs=1 skip=`head -3 $0 | wc -c` if=$0 | gunzip -c | tar -x


    Script

    It is a good idea to create a script that takes a regular tar.gz and creates a self-extracting archive.
    The script is available here. It accepts the following arguments:

    1. mandatory path to the tar.gz 
    2. optional command that is automatically executed right after the archive is extracted. Typically it is a script packaged into the tar. 
    The resulting executable is named after the tar.gz, with the suffix ".self".



    Usage examples: 

    Create a self-extracting executable: 
    ./selftar.sh my.tar.gz

    Create a self-extracting executable with a command executed after the archive is extracted: 
    ./selftar.sh my.tar.gz "folder/install.sh"

    Both examples create the executable file my.tar.gz.self, which extracts the files initially packaged in my.tar.gz to the current directory:

    ./my.tar.gz.self


    Usage

    This script and article were inspired by my work on a distribution package based on a simple archive. Generally this technique is wrong by definition. A distribution package depends on the target platform and should be created with platform-specific tools: .msi for MS Windows, .rpm for Red Hat, .deb for Debian, etc. A self-extracting executable, however, allows creating a simple package that can be used on most Unix-based platforms, which is very convenient, especially when the packaged application is written in a cross-platform language like Java. 

    Wednesday, December 10, 2014

    Why not to develop code on Windows


    I have been developing code professionally for the last 17 years. For most of these years I used MS Windows because this operating system is the most widely used in the Israeli hi-tech industry. However, the years I spent with Linux were the most productive and fun for me. 

    So I decided to write down the reasons not to develop code on Windows. Obviously this does not apply to people who develop applications for Windows. 
    1. Windows is not free.
    2. You have to install Office, which is not free either.
    3. The shell is awful.
    4. Even Linux-like shells (Cygwin, Git Shell, etc.) do not work exactly like Linux shells. 
    5. Security restrictions drive you mad. Often you cannot remove a file even though you are an administrator. For example, on Windows 8.1 the Java method File.canWrite() returns false unless the file is under the user home, although the file is in fact writable. 
    6. Files often remain locked even after the process that locked them is killed. 
    7. The backslash as a path delimiter instead of the forward slash causes lots of mistakes.
    8. The "Program Files" folder is very important, and the space in its name makes many .bat files that do not wrap each path in quotes fail.
    9. Most open source tools are developed and tested on Linux. Even if the tools are cross-platform, some issues often happen on Windows only.
    10. It is not Linux.


    Thursday, January 23, 2014

    Usage trends of JavaScript MVC Frameworks

    Analysis

    JavaScript MVC frameworks have become very popular during the last 5 years. Naturally there are a lot of such frameworks. I have spent most of my time as a developer on the server side, so I am not very familiar with these frameworks but want to learn them. But where to start? I do not want to spend time on a framework that will be obsolete in a year, because I do not want to look like people who start learning J2ME, applet programming, the Log4J API and configuration, or start a new project with Ant and CVS these days.

    So I googled "JavaScript MVC frameworks" and found the following articles:

    I know that the list is not complete, but if it is good enough for the authors, it is good enough for me :).
    Then I wanted to use Google Trends to check whether these libraries are popular with other people too. Unfortunately Google Trends does not allow comparing more than 5 search terms, so I split the list into groups and then compared the best libraries from the first round. 

    Here are links to the results.

    Semi-final


    Final




    According to this graph AngularJS wins.

    Conclusions

    It seems that AngularJS is the most popular JS MVC framework now and its popularity is growing very fast, so I am going to learn it. 

    Thursday, July 11, 2013

    Performance of checking computer clock

    Very often we check the computer clock using either System.currentTimeMillis() or System.nanoTime(). We often call these methods to measure how long a certain part of our program runs in order to improve performance. But how much does calling these methods cost? In other words:

    How long does it take to ask "What time is it now?"

    I asked myself this question and wrote the following program.

    public static void main(String[] args) {
        long tmp = System.nanoTime(); // first call; its result is not used
        long before = System.nanoTime();
        for (int i = 0; i < 1000_000_000; i++) {
            // do the call
        }

        long after = System.nanoTime();
        System.out.println((after - before) / 1000_000); // elapsed time in ms
    }

    Then I replaced the comment "do the call" with interesting code fragments and measured the time. Here are my results.

    Code                                                                          Elapsed time, ms
    nothing                                                                       5
    call of foo() {return 0;}                                                     5
    f+=f                                                                          320
    call of foo() {return f+=f;} (f is a static field initialized to nanoTime())  325
    call of System.nanoTime()                                                     19569
    call of System.currentTimeMillis()                                            22639

    This means that:

    1. A method that just returns a constant is not executed at all: calling a method that returns 0 takes exactly the same time as doing nothing.
    2. The method call itself does not take time: executing f+=f directly and calling a method that does the same take practically the same time (320 vs 325 ms). We have to say "thanks" to the JVM, whose JIT compiler optimizes the code at runtime.
    3. A call of currentTimeMillis() is about 15% more expensive than a call of nanoTime().
    4. Each of the two clock methods costs about as much as 60-70 arithmetic operations (f+=f).

    Conclusions

    1. Checking the computer clock can itself take noticeable time when it is used to measure the performance of relatively small pieces of code, so we should be careful doing this. 
    2. Using nanoTime() is preferable for measuring time intervals not only because it gives higher precision and is not affected by changes to the computer clock, but also because it runs faster. Moreover, it returns more correct results because it uses a monotonic clock: if you perform 2 consecutive calls, the second call never returns a smaller number than the first, which is not guaranteed for currentTimeMillis().
    3. Do not try to optimize code by manually inlining your logic: the JVM does it for us at runtime. Indeed, running an arithmetic operation directly or through a method that contains only this operation takes practically the same time. 
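    Conclusions 1 and 2 can be combined in a minimal sketch that uses nanoTime() to measure an interval and estimates the cost of one nanoTime() call. The results of the timed calls are accumulated and stored in a volatile field so that the JIT cannot treat the calls as dead code. The class and method names, the warm-up run and the iteration counts are assumptions of this sketch; for serious measurements a harness such as JMH is more reliable than a hand-written loop.

```java
import java.util.concurrent.TimeUnit;

class ClockCost {
    // Written after each loop so the JIT cannot drop the timed calls.
    static volatile long sink;

    // Average cost, in nanoseconds, of one System.nanoTime() call.
    static double nanosPerNanoTimeCall(int iterations) {
        long sum = 0;
        long before = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sum += System.nanoTime();
        }
        long after = System.nanoTime();
        sink = sum;
        return (double) (after - before) / iterations;
    }

    public static void main(String[] args) {
        nanosPerNanoTimeCall(10_000_000); // warm-up so the loop gets JIT-compiled
        long start = System.nanoTime();
        double perCall = nanosPerNanoTimeCall(100_000_000);
        // nanoTime() differences are meaningful as intervals; convert them for display
        long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.printf("~%.1f ns per System.nanoTime() call, %d ms in total%n",
                perCall, elapsedMillis);
    }
}
```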

    Acknowledgements

    I would like to thank Arnon Klein for his valuable comments.