The programmer's notes

Saturday, October 16, 2021

Coding conventions

Preface

The video version of this article can be found on YouTube in English, Russian and Hebrew. The presentation is available here.

Introduction

Let's start with the definition.

Coding conventions are a set of guidelines for a specific programming language that recommend programming style, practices, and methods for each aspect of a program written in that language.

(Wikipedia)

Why do we need conventions? It is tempting to think that a programmer writes code for a computer, which does not care whether the code is nice or ugly. However, a programmer spends 40-80% of the time not on developing new code but on maintaining and refactoring existing code. What are maintenance and refactoring? They are processes in which a programmer (a human) reads existing code written by themselves or by another programmer. The faster they can do it, the higher the quality of the product, the lower its complexity, and the simpler the teamwork.

Robert Martin, known as Uncle Bob and the author of the famous book "Clean Code", put it roughly this way: it does not require an awful lot of skill to write a program that a computer understands; the skill is in writing programs that humans understand.

Objections

I have often heard various objections:

I am a professional with many years of experience
I have found my own conventions and tricks that make me more productive
Everyone can write code as he/she wants.
I have always worked this way and will work this way.

The point is that coding conventions play the role of laws. Why does humankind need laws? We need them to regulate relationships among people. Robinson Crusoe could live without laws while he was alone on the island. Likewise, a programmer who works alone, for example as an independent software vendor, could theoretically work without any conventions. However, any team member has to work according to well-known industry standards.

Conventions source

Where did conventions come from? There are different sources.

Sometimes decisions of well-known experts or traditions become industry standards.

Often the source of a convention is the wisdom of the crowd. If a lot of independent people have found the same way of using a digging stick to be the most effective one, there is a very low chance that somebody can improve this technique within a reasonable period of time. The same applies to the usage of programming languages.

Conventions and personal preferences

Sometimes personal preferences do not exactly match the well-known conventions. Conventions are a set of rules that limit us when we code in a specific language. So, if our personal rules are stricter than the well-known ones and do not directly contradict them, they can be used without any problem.

Common practices change all the time. Commonly used conventions may become less popular or even wrong. Some conventions are based on reasons that have become irrelevant. We should stop using such conventions.

There are general and language-specific conventions. We should avoid carrying habits acquired in one language (environment, framework) over to another. Respect the rules of the domain you are working in!

Conventions types

There are many types of conventions. Let's take a brief look at some of them.

Code formatting

Code formatting is the first thing we think about when we discuss coding conventions.

Discussions between people who put the left curly brace directly after the statement and those who prefer to move it to the beginning of the next line are as meaningless as the dispute among the Lilliputians over which end of the egg to break. The Lilliputians' conflict caused an endless war that was ended by Gulliver. The problem of the left curly brace fortunately does not lead to any war and can be resolved using the coding conventions of each language.

Naming

Names are important. Our code is written according to the rules of the language using its reserved words and the names that we invent for our variables, functions, classes etc. More readable names make the code better.

Best coding practices

There are best practices for each statement. Let's review one simple example. We have a for loop whose body should be executed only if some condition is true. Here is the straightforward version of the code:

for (int i = 0; i < n; i++) {
    if (condition) {
        // some code
    }
}

However, we can invert the condition used in the "if" statement, go to the next loop iteration, and put the code after the "if":

for (int i = 0; i < n; i++) {
    if (!condition) {
        continue;
    }
    // some code
}

Each approach has its advantages and disadvantages and the choice could be made using conventions.

Conventions are dynamic

The C language was born in the early 1970s. In those days, screens were small and keyboards were inconvenient. Therefore, people tended to shorten the names of functions and variables, and they often used single-character names for variables. Many years passed, and Java was invented. The Java coding conventions suggest using long, self-descriptive names. More years passed, and the Scala language appeared. Scala combines the object-oriented and functional paradigms. When we code in a functional style, the meaning of a variable is often obvious from the surrounding code, so names can be shortened without compromising readability.

Thus, we returned to short names in the next phase of language evolution.
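
The same effect can be seen in Java's own functional style. The following is only an illustrative sketch (the class and method names are invented for this example): the loop version benefits from descriptive names, while the one-expression lambda stays readable with a single-letter parameter.

```java
import java.util.Arrays;
import java.util.List;

class ShortNamesDemo {
    // Imperative style: the variables live for several lines,
    // so descriptive names help the reader.
    static int totalLengthImperative(List<String> words) {
        int totalLength = 0;
        for (String currentWord : words) {
            totalLength += currentWord.length();
        }
        return totalLength;
    }

    // Functional style: the lambda is a single expression, so the
    // short parameter name "w" is obvious from the surrounding code.
    static int totalLengthFunctional(List<String> words) {
        return words.stream().mapToInt(w -> w.length()).sum();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("code", "style");
        System.out.println(totalLengthImperative(words));
        System.out.println(totalLengthFunctional(words));
    }
}
```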

Conventions and agile

During the last 10 or even more years agile methodologies have become more and more popular.

Agile is the answer to a new challenge: we have to develop software according to ever-changing requirements. This forces us to refactor our code very frequently and to deliver versions very fast. Once upon a time it was common practice to deliver a version every several months. These days we produce a new version every week and sometimes several times a day.

Nowadays programmers do not work for the same company for decades but move from one project to another relatively often. This means that many teams have one or more new programmers who should be integrated into the team as quickly as possible. This is very difficult if the code is not written according to well-known and widely used standards.

Code ownership does not exist anymore. Each programmer can modify any part of the code, whether written by themselves or by any other team member.

Many teams were already working remotely, and during the last year of COVID more and more teams joined the club.

Members of the teams that develop the same product may live in different countries and time zones and may speak different languages. Communication in such teams happens mostly via the code itself. Working together is easier if the code is written according to coding standards.

Coding standards for different languages

Every language has its own coding standards. Here is the list of the most popular programming languages with links to documents that define coding guidelines for each language:

Conclusions

We can significantly improve our software if we follow the coding standards.

Monday, August 23, 2021

Youtube channel

I've created my YouTube channel named "The programmer's notes", exactly like this blog. I am going to post videos about various aspects of software development.

My first clip is in Russian and is about coding conventions. More clips TBD.


Sunday, March 17, 2019

Java By Comparison: the book review

Recently I had the pleasure of reading the book "Java By Comparison" by Simon Harrer, Jörg Lenhard and Linus Dietz.
This book plays the role of a mentor that helps a Java beginner find the way through the labyrinth of different solutions, coding styles, best practices, tools and libraries and become ready to develop production-ready software.
Java is neither the first nor the last language I have learned. When I started coding in Java I brought with me a lot of practices and habits I had acquired while coding in other languages. Fortunately, I had a couple of more experienced colleagues who helped me become a Java programmer. This book can do the same. A reader who is familiar with the Java syntax and its main concepts but does not have enough professional development skills in Java can significantly improve these skills.
Experience is not always measured in years. It also depends on the working environment, colleagues and tasks. This book may be useful for experienced programmers as well. It refers to a lot of third-party tools everyone should know. It leads the reader step by step from simple to tricky mistakes and traps and teaches how to avoid them. The several hours required to read the book will soon be repaid by dozens of hours that will not be wasted on debugging.
I think that everyone will find a lot of new information here. I highly recommend this book to any developer who wants to improve their coding skills in Java or other languages.

Monday, March 4, 2019

New life of old Visitor design pattern

Introduction

Visitor [1, 2] is a widely known classical design pattern. There are a lot of resources that explain it in detail. Without digging into the implementation, I will briefly recall the idea of the pattern, explain its benefits and downsides, and suggest some improvements that can easily be applied to it using the Java programming language.

Classical Visitor

[Visitor] Allows for one or more operations to be applied to a set of objects at runtime, decoupling the operations from the object structure. (Gang of Four book)

The pattern is based on an interface, typically called Visitable, that has to be implemented by each model class, and on a set of Visitors that implement a method (algorithm) for each relevant model class.

public interface Visitable {
  public void accept(Visitor visitor);
}

public class Book implements Visitable {
   // .......
   @Override public void accept(Visitor visitor) { visitor.visit(this); }
   // .......
}

public class Cd implements Visitable {
   // .......
   @Override public void accept(Visitor visitor) { visitor.visit(this); }
   // .......
}

interface Visitor {
   public void visit(Book book);
   public void visit(Magazine magazine);
   public void visit(Cd cd);
}

Now we can implement various visitors, e.g.

PrintVisitor that prints provided Visitable,
DbVisitor that stores it in database,
ShoppingCart that adds it to a shopping cart

etc.
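
As an illustration, a minimal self-contained sketch of the classical pattern with a PrintVisitor could look like the following. The fields (title, name) and the printed() accessor are assumptions made for this example, and the visitor collects its output in a StringBuilder instead of writing to the console so that the result is easy to check.

```java
interface Visitable {
    void accept(Visitor visitor);
}

interface Visitor {
    void visit(Book book);
    void visit(Magazine magazine);
    void visit(Cd cd);
}

class Book implements Visitable {
    final String title;
    Book(String title) { this.title = title; }
    // The same one-liner is repeated in every model class.
    @Override public void accept(Visitor visitor) { visitor.visit(this); }
}

class Magazine implements Visitable {
    final String name;
    Magazine(String name) { this.name = name; }
    @Override public void accept(Visitor visitor) { visitor.visit(this); }
}

class Cd implements Visitable {
    final String title;
    Cd(String title) { this.title = title; }
    @Override public void accept(Visitor visitor) { visitor.visit(this); }
}

// One algorithm ("print") implemented for every model class.
class PrintVisitor implements Visitor {
    private final StringBuilder out = new StringBuilder();

    @Override public void visit(Book book) { out.append("Book: ").append(book.title).append('\n'); }
    @Override public void visit(Magazine magazine) { out.append("Magazine: ").append(magazine.name).append('\n'); }
    @Override public void visit(Cd cd) { out.append("Cd: ").append(cd.title).append('\n'); }

    String printed() { return out.toString(); }
}
```

The overload of visit() is selected at compile time inside each accept(), which is why every model class needs its own copy of that method.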

Downsides of visitor pattern

1. The return type of the visit() methods must be defined at design time. In fact, in most cases these methods are void.
2. The implementations of the accept() method are identical in all classes. Obviously, we prefer to avoid code duplication.
3. Every time a new model class is added, each visitor must be updated, so maintenance becomes hard.
4. It is impossible to have optional implementations for a certain model class in a certain visitor. For example, software can be sent to a buyer by email while milk cannot. However, both can be delivered by traditional post. So, EmailSendingVisitor cannot implement the method visit(Milk) but can implement visit(Software). A possible solution is to throw UnsupportedOperationException, but the caller cannot know that this exception will be thrown before it calls the method.

Improvements to classical Visitor pattern

Return value

First, let's add a return value to the Visitor interface. A general definition can be made using generics.

public interface Visitable {
  public <R> R accept(Visitor<R> visitor);
}


interface Visitor<R> {
   public R visit(Book book);
   public R visit(Magazine magazine);
   public R visit(Cd cd);
}

Well, this was easy. Now we can apply to our Book any kind of Visitor that returns a value. For example, DbVisitor may return the number of changed records in the DB (Integer), and a ToJson visitor may return the JSON representation of our object as a String. (The example is probably not too organic: in real life we typically use other techniques for serializing an object to JSON, but it is good enough as a theoretically possible usage of the Visitor pattern.)
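
A minimal sketch of two visitors with different return types applied to the same model classes might look like this. DescriptionVisitor, TitleLengthVisitor and the model fields are hypothetical names chosen for the example, not taken from the original sources.

```java
interface Visitor<R> {
    R visit(Book book);
    R visit(Magazine magazine);
    R visit(Cd cd);
}

interface Visitable {
    <R> R accept(Visitor<R> visitor);
}

class Book implements Visitable {
    final String title;
    Book(String title) { this.title = title; }
    @Override public <R> R accept(Visitor<R> visitor) { return visitor.visit(this); }
}

class Magazine implements Visitable {
    final String name;
    Magazine(String name) { this.name = name; }
    @Override public <R> R accept(Visitor<R> visitor) { return visitor.visit(this); }
}

class Cd implements Visitable {
    final String title;
    Cd(String title) { this.title = title; }
    @Override public <R> R accept(Visitor<R> visitor) { return visitor.visit(this); }
}

// A visitor that returns a String ...
class DescriptionVisitor implements Visitor<String> {
    @Override public String visit(Book book) { return "Book '" + book.title + "'"; }
    @Override public String visit(Magazine magazine) { return "Magazine '" + magazine.name + "'"; }
    @Override public String visit(Cd cd) { return "Cd '" + cd.title + "'"; }
}

// ... and a visitor that returns an Integer, usable with the same classes.
class TitleLengthVisitor implements Visitor<Integer> {
    @Override public Integer visit(Book book) { return book.title.length(); }
    @Override public Integer visit(Magazine magazine) { return magazine.name.length(); }
    @Override public Integer visit(Cd cd) { return cd.title.length(); }
}
```

The return type is chosen by each visitor, so the model classes do not need to know it in advance.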

Default implementation

Next, let's thank Java 8 for its ability to hold default implementations inside the interface:

public interface Visitable<R> {
  default R accept(Visitor<R> visitor) {
      return visitor.visit(this);
  }
}

Now a class that implements Visitable does not have to implement accept() itself: the default implementation is good enough in most cases.

The improvements suggested above fix downsides #1 and #2.

MonoVisitor

Let's try to apply further improvements. First, let's define the interface MonoVisitor as follows:

public interface MonoVisitor<T, R> {
    R visit(T t);
}

The name Visitor was changed to MonoVisitor to avoid a name clash and possible confusion. By the book, a visitor defines many overloaded visit() methods, each accepting an argument of a different type, one per Visitable. Therefore, Visitor by definition cannot be generic; it has to be defined and maintained at the project level. MonoVisitor defines a single method only, and type safety is guaranteed by generics. A single class cannot implement the same interface several times, even with different generic parameters. This means that we will have to hold several separate implementations of MonoVisitor even if they are grouped into one class.
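
Because MonoVisitor has a single abstract method, each implementation can be a lambda. The following sketch groups several of them in one holder class; the model classes and constant names are invented for the illustration.

```java
interface MonoVisitor<T, R> {
    R visit(T t);
}

class Book {
    private final String title;
    Book(String title) { this.title = title; }
    String getTitle() { return title; }
}

class Cd {
    private final String title;
    Cd(String title) { this.title = title; }
    String getTitle() { return title; }
}

class MonoVisitorDemo {
    // Separate implementations, one per model class, grouped in one class.
    static final MonoVisitor<Book, String> BOOK_TO_TEXT = book -> "Book: " + book.getTitle();
    static final MonoVisitor<Cd, String> CD_TO_TEXT = cd -> "Cd: " + cd.getTitle();
    static final MonoVisitor<Book, Integer> TITLE_LENGTH = book -> book.getTitle().length();

    // The following declaration would NOT compile, because a class cannot
    // implement the same generic interface with different type arguments:
    // class TextVisitor implements MonoVisitor<Book, String>, MonoVisitor<Cd, String> { }
}
```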

Function reference instead of Visitor

Since MonoVisitor has only one business method, we have to create an implementation per model class. However, we do not want to create separate top-level classes but prefer to group them in one class. This new visitor holds a Map between the various Visitable classes and their MonoVisitor implementations (typically lambdas or method references), is itself a java.util.function.Function, and dispatches the call of the visit() method to the particular implementation.

So, let's have a look at MapVisitor.

import java.util.Map;
import java.util.function.Function;

public class MapVisitor<R> implements 
        Function<Class<? extends Visitable>, MonoVisitor<? extends Visitable, R>> {
    // Maps each model class to the visitor that handles it
    private final Map<Class<? extends Visitable>, MonoVisitor<? extends Visitable, R>> visitors;

    MapVisitor(Map<Class<? extends Visitable>, MonoVisitor<? extends Visitable, R>> visitors) {
        this.visitors = visitors;
    }

    // Returns the visitor registered for the given class, or null if there is none
    @Override
    public MonoVisitor apply(Class clazz) {
        return visitors.get(clazz);
    }
}

The MapVisitor

Implements Function in order to retrieve particular implementation (full generics are omitted here for readability; have a look at the code snippet for detailed definition)
Receives mapping between class and implementation in map
Retrieves particular implementation suitable for given class

MapVisitor has a package-private constructor. Initialization of MapVisitor, done using a special builder, is very simple and flexible:

        
MapVisitor<Void> printVisitor = MapVisitor.builder(Void.class)
        .with(Book.class, book -> {System.out.println(book.getTitle()); return null;})
        .with(Magazine.class, magazine -> {System.out.println(magazine.getName()); return null;})
        .build();

The usage of MapVisitor is similar to that of the traditional Visitor:

someBook.accept(printVisitor);
someMagazine.accept(printVisitor);
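
The builder and the accept() method are not shown above, so here is one way they might be written. This is a sketch under assumptions, not the code from the linked repository: the Builder class, the with() and build() bodies, the accept() signature and the decision to throw UnsupportedOperationException for an unregistered class are all guesses made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

interface MonoVisitor<T, R> {
    R visit(T t);
}

interface Visitable {
    // Assumed dispatch: look up the visitor registered for the runtime class and apply it.
    @SuppressWarnings("unchecked")
    default <R> R accept(Function<Class<? extends Visitable>, MonoVisitor<? extends Visitable, R>> visitors) {
        MonoVisitor<Visitable, R> visitor = (MonoVisitor<Visitable, R>) visitors.apply(getClass());
        if (visitor == null) {
            throw new UnsupportedOperationException("No visitor for " + getClass().getName());
        }
        return visitor.visit(this);
    }
}

class MapVisitor<R> implements Function<Class<? extends Visitable>, MonoVisitor<? extends Visitable, R>> {
    private final Map<Class<? extends Visitable>, MonoVisitor<? extends Visitable, R>> visitors;

    MapVisitor(Map<Class<? extends Visitable>, MonoVisitor<? extends Visitable, R>> visitors) {
        this.visitors = visitors;
    }

    @Override
    public MonoVisitor<? extends Visitable, R> apply(Class<? extends Visitable> clazz) {
        return visitors.get(clazz);
    }

    // The Class argument is only used to let the compiler infer R.
    static <R> Builder<R> builder(Class<R> returnType) {
        return new Builder<>();
    }

    static class Builder<R> {
        private final Map<Class<? extends Visitable>, MonoVisitor<? extends Visitable, R>> visitors = new HashMap<>();

        // T ties the class key to the argument type of its visitor.
        <T extends Visitable> Builder<R> with(Class<T> type, MonoVisitor<T, R> visitor) {
            visitors.put(type, visitor);
            return this;
        }

        MapVisitor<R> build() {
            return new MapVisitor<R>(new HashMap<>(visitors));
        }
    }
}

class Book implements Visitable {
    private final String title;
    Book(String title) { this.title = title; }
    String getTitle() { return title; }
}

class Magazine implements Visitable {
    private final String name;
    Magazine(String name) { this.name = name; }
    String getName() { return name; }
}

class MapVisitorDemo {
    static MapVisitor<String> titleVisitor() {
        return MapVisitor.builder(String.class)
                .with(Book.class, book -> "Book: " + book.getTitle())
                .with(Magazine.class, magazine -> "Magazine: " + magazine.getName())
                .build();
    }
}
```

The generic with() method keeps registration type safe, while the single unchecked cast is confined to accept().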

Our MapVisitor has one more benefit. All methods declared in the interface of a traditional visitor must be implemented. However, some methods often cannot be implemented.

For example, we want to implement an application that demonstrates various actions that animals can do. The user can choose an animal and then make it do something by selecting a specific action from the menu.
Here is the list of animals: Duck, Penguin, Whale, Ostrich
And this is the list of actions: Walk, Fly, Swim.
We decided to have a visitor per action: WalkVisitor, FlyVisitor, SwimVisitor. Duck can do all three actions, Penguin cannot fly, Whale can only swim and Ostrich can only walk. So, we decided to throw an exception if a user tries to make a Whale walk or an Ostrich fly. But such behavior is not user friendly. Indeed, a user will get an error message only when he presses the action button. We would probably prefer to disable irrelevant buttons. MapVisitor allows this without an additional data structure or code duplication. We do not even have to define a new interface or extend any other one. Instead, we prefer to use the standard interface java.util.function.Predicate:

public class MapVisitor<R> implements 
        Function<Class<? extends Visitable>, MonoVisitor<? extends Visitable, R>>, 
        Predicate<Class<? extends Visitable>> {
    private final Map<Class<? extends Visitable>, MonoVisitor<? extends Visitable, R>> visitors;
    ...............
    @Override
    public boolean test(Class<? extends Visitable> clazz) {
        return visitors.containsKey(clazz);
    }
}

Now we can call the method test() in order to determine whether the action button for the selected animal has to be enabled or shown.
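
The following reduced sketch shows only the Predicate part for the animal example. It replaces the builder with a simple chained with() method, and the class AnimalMenuDemo and the method isButtonEnabled() are hypothetical names introduced for this illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

interface Visitable { }

interface MonoVisitor<T, R> {
    R visit(T t);
}

class Duck implements Visitable { }
class Penguin implements Visitable { }
class Whale implements Visitable { }
class Ostrich implements Visitable { }

// Reduced MapVisitor: registration and the Predicate check only.
class MapVisitor<R> implements Predicate<Class<? extends Visitable>> {
    private final Map<Class<? extends Visitable>, MonoVisitor<? extends Visitable, R>> visitors = new HashMap<>();

    <T extends Visitable> MapVisitor<R> with(Class<T> type, MonoVisitor<T, R> visitor) {
        visitors.put(type, visitor);
        return this;
    }

    @Override
    public boolean test(Class<? extends Visitable> clazz) {
        return visitors.containsKey(clazz);
    }
}

class AnimalMenuDemo {
    static MapVisitor<String> flyVisitor() {
        return new MapVisitor<String>()
                .with(Duck.class, duck -> "The duck flies");
    }

    static MapVisitor<String> swimVisitor() {
        return new MapVisitor<String>()
                .with(Duck.class, duck -> "The duck swims")
                .with(Penguin.class, penguin -> "The penguin swims")
                .with(Whale.class, whale -> "The whale swims");
    }

    // A menu could enable a button only if the action supports the selected animal.
    static boolean isButtonEnabled(MapVisitor<?> action, Class<? extends Visitable> animal) {
        return action.test(animal);
    }
}
```

The set of supported animals is derived from the registered visitors, so it cannot drift away from the actual implementations.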

The full source code of the examples used here is available on GitHub.

Conclusions

This article demonstrates several improvements that make the good old Visitor pattern more flexible and powerful. The suggested implementation avoids some of the boilerplate code necessary for implementing the classic Visitor pattern. Here is a brief list of the improvements explained above.

visit() methods of Visitor described here can return values and therefore may be implemented as pure functions [3] that help to combine Visitor pattern with functional programming paradigm.
Breaking monolithic Visitor interface into separate blocks makes it more flexible and simplifies the code maintenance.
MapVisitor can be configured using builder at runtime, so it may change its behavior depending on information known only at runtime and unavailable during development.
Visitors with different return type can be applied to the same Visitable classes.
Default implementations of methods in interfaces remove a lot of the boilerplate code usual for a typical Visitor implementation.

References

Wednesday, February 13, 2019

Two ways to extend enum functionality

Preface

In my previous article I explained how and why to use enums instead of the switch/case control structure in Java code. Here I will show how to extend the functionality of existing enums.

Introduction

A Java enum is a kind of compiler magic. In the bytecode, any enum is represented as a class that extends the abstract class java.lang.Enum and has several static members. Therefore, an enum cannot extend any other class or enum: there is no multiple inheritance.

A class cannot extend an enum either. This limitation is enforced by the compiler.
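
Both statements can be observed with reflection. The sketch below (the class name EnumFactsDemo is invented for this example) checks the superclass of a simple enum and its final modifier, which applies to an enum whose constants have no bodies.

```java
import java.lang.reflect.Modifier;

class EnumFactsDemo {
    enum Color { red, green, blue }

    // An enum is compiled to a class whose superclass is java.lang.Enum.
    static boolean extendsJavaLangEnum() {
        return Color.class.getSuperclass() == Enum.class;
    }

    // An enum without constant-specific bodies is implicitly final.
    static boolean isFinal() {
        return Modifier.isFinal(Color.class.getModifiers());
    }

    public static void main(String[] args) {
        System.out.println("extends java.lang.Enum: " + extendsJavaLangEnum());
        System.out.println("final: " + isFinal());
    }
}
```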

Here is a simple enum:

enum Color {red, green, blue}

This class tries to extend it:

class SubColor extends Color {}

This is the result of an attempt to compile class SubColor:

$ javac SubColor.java 
SubColor.java:1: error: cannot inherit from final Color
class SubColor extends Color {}
                       ^
SubColor.java:1: error: enum types are not extensible
class SubColor extends Color {}
^
2 errors

An enum can neither extend nor be extended. So, how is it possible to extend its functionality? The key word is "functionality". An enum can implement methods. For example, the enum Color may declare an abstract method draw(), and each member can override it:
enum Color {
    red { @Override public void draw() { } },
    green { @Override public void draw() { } },
    blue { @Override public void draw() { } },
    ;
    public abstract void draw();
}

A popular usage of this technique is explained here.
Unfortunately, it is not always possible to implement the method in the enum itself because:

the enum may belong to a third-party library or to another team in the company
the enum is already overloaded with so much data and so many functions that it has become hard to read
the enum belongs to a module that does not have the dependencies required to implement the method draw().

This article suggests the following solutions to this problem.

Mirror enum

We cannot modify the enum Color? No problem! Let's create an enum DrawableColor that has exactly the same elements as Color. This new enum will implement our method draw():
enum DrawableColor {
    red { @Override public void draw() { } },
    green { @Override public void draw() { } },
    blue { @Override public void draw() { } },
    ;
    public abstract void draw();
}

This enum is a kind of reflection of the source enum Color, i.e. Color is its mirror.
But how do we use the new enum? All our code uses Color, not DrawableColor. The simplest way to implement this transition is to use the built-in enum methods name() and valueOf() as follows:
Color color = ...
DrawableColor.valueOf(color.name()).draw();


Since the name() method is final and cannot be overridden, and valueOf() is generated by the compiler, these methods always fit each other, so no functional problems are expected here. The performance of such a transition is also good: the method name() does not even create a new String but returns a pre-initialized one (see the source code of java.lang.Enum). The method valueOf() is implemented using a Map, so its complexity is O(1).

The code above contains an obvious problem. If the source enum Color is changed, the secondary enum DrawableColor does not know about it, so the trick with name() and valueOf() will fail at runtime. We do not want this to happen. But how do we prevent a possible failure? We have to let DrawableColor know that its mirror is Color and enforce this, preferably at compile time or at least in the unit test phase. Here we suggest validation during unit test execution. An enum can have a static initializer that is executed when the enum is first used in any code. This means that if the static initializer validates that the enum DrawableColor fits Color, it is enough to implement a test like the following to be sure that the code will never be broken in the production environment:
@Test
public void drawableColorFitsMirror() {
    DrawableColor.values();
}


The static initializer just has to compare the elements of DrawableColor and Color and throw an exception if they do not match. This code is simple and can be written for each particular case. Fortunately, a simple open source library named enumus already implements this functionality, so the task becomes trivial:
enum DrawableColor {
    ....
    static {
        Mirror.of(Color.class);
    }
}


That's it. The test will fail if the source enum and DrawableColor do not fit each other any more. The utility class Mirror has another method that takes 2 arguments: the classes of the 2 enums that have to fit. This version can be called from any place in the code and not only from the enum that has to be validated.
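
For cases where the library is not used, the hand-written check mentioned above could be sketched as follows. This is not the enumus implementation: the helper assertSameNames() and the choice of IllegalStateException are assumptions made for this example. Note that an exception thrown from a static initializer reaches the caller wrapped in ExceptionInInitializerError.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class MirrorCheckDemo {
    enum Color { red, green, blue }

    enum DrawableColor {
        red, green, blue;

        static {
            // Runs once, when DrawableColor is initialized.
            assertSameNames(Color.class, DrawableColor.class);
        }
    }

    // Throws if the two enums do not declare the same names in the same order.
    static <A extends Enum<A>, B extends Enum<B>> void assertSameNames(Class<A> source, Class<B> mirror) {
        List<String> sourceNames = names(source);
        List<String> mirrorNames = names(mirror);
        if (!sourceNames.equals(mirrorNames)) {
            throw new IllegalStateException(mirror.getSimpleName() + " " + mirrorNames
                    + " does not mirror " + source.getSimpleName() + " " + sourceNames);
        }
    }

    private static <E extends Enum<E>> List<String> names(Class<E> type) {
        return Arrays.stream(type.getEnumConstants())
                .map(Enum::name)
                .collect(Collectors.toList());
    }
}
```

Comparing ordered lists is stricter than necessary for the name()/valueOf() trick; comparing sets of names would be enough if the order of elements does not matter.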

EnumMap
Do we really have to define another enum that just holds the implementation of one method? In fact, we do not have to. Here is an alternative solution. Let's define the interface Drawer as follows:
public interface Drawer {
    void draw();
}


Now let's create mapping between enum elements and implementation of interface Drawer:

Map<Color, Drawer> drawers = new EnumMap<>(Color.class);
drawers.put(Color.red, new Drawer() { @Override public void draw() { } });
drawers.put(Color.green, new Drawer() { @Override public void draw() { } });
drawers.put(Color.blue, new Drawer() { @Override public void draw() { } });

The usage is simple:

drawers.get(color).draw();

EnumMap is chosen here as the Map implementation for better performance. A Map guarantees that each enum element appears there only once. However, it does not guarantee that there is an entry for each enum element. It is enough, though, to check that the size of the map is equal to the number of enum elements:

drawers.size() == Color.values().length

Enumus suggests convenient utility for this case also. The following code throws IllegalStateException with descriptive message if map does not fit Color:

EnumMapValidator.validateValues(Color.class, drawers, "Colors map");

It is important to call the validator from the code which is executed by unit test. In this case the map based solution is safe for future modifications of source enum.
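
Without the library, a slightly more informative check than comparing sizes is to compute which enum elements are missing from the map. The sketch below is an illustration only; missingKeys() and incompleteDrawers() are hypothetical helpers, not part of enumus.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

class EnumMapCheckDemo {
    enum Color { red, green, blue }

    interface Drawer {
        void draw();
    }

    // Returns the enum elements that have no entry in the map.
    static <E extends Enum<E>> Set<E> missingKeys(Class<E> type, Map<E, ?> map) {
        Set<E> missing = EnumSet.allOf(type);
        missing.removeAll(map.keySet());
        return missing;
    }

    // A deliberately incomplete map: there is no entry for blue.
    static Map<Color, Drawer> incompleteDrawers() {
        Map<Color, Drawer> drawers = new EnumMap<>(Color.class);
        drawers.put(Color.red, () -> { });
        drawers.put(Color.green, () -> { });
        return drawers;
    }
}
```

A unit test can then assert that missingKeys(Color.class, drawers) is empty and report the missing elements in the failure message.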

EnumMap and Java 8 functional interface

In fact, we do not have to define a special interface to extend enum functionality. We can use one of the functional interfaces provided by the JDK starting from version 8 (Function, BiFunction, Consumer, BiConsumer, Supplier etc.). The choice depends on the parameters that have to be passed to the function. For example, Supplier can be used instead of Drawer defined in the previous example:

Map<Color, Supplier<Void>> drawers = new EnumMap<>(Color.class);
drawers.put(Color.red, new Supplier<Void>() { @Override public Void get() { return null; } });
drawers.put(Color.green, new Supplier<Void>() { @Override public Void get() { return null; } });
drawers.put(Color.blue, new Supplier<Void>() { @Override public Void get() { return null; } });

The usage of this map is pretty similar to that in the previous example:

drawers.get(color).get();

This map can be validated exactly like the map that stores instances of Drawer.
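
When the function needs a parameter, another functional interface fits better, and lambdas keep the mapping compact. The following sketch uses Function<String, String>; the names PAINTERS and paint() and the bracketed output format are invented for this example.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.function.Function;

class FunctionalEnumMapDemo {
    enum Color { red, green, blue }

    // Each enum element is mapped to a function that takes the text to paint.
    private static final Map<Color, Function<String, String>> PAINTERS = new EnumMap<>(Color.class);

    static {
        PAINTERS.put(Color.red, text -> "[red]" + text);
        PAINTERS.put(Color.green, text -> "[green]" + text);
        PAINTERS.put(Color.blue, text -> "[blue]" + text);
    }

    static String paint(Color color, String text) {
        return PAINTERS.get(color).apply(text);
    }
}
```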

Conclusions

This article shows how powerful Java enums can be if we put some logic inside them. It also demonstrates two ways to extend the functionality of enums that work despite the language limitations. The article introduces the open source library named enumus, which provides several useful utilities that make working with enums easier.

Featured enum instead of switch

Problem and its solution

Switch/case is a common control structure implemented in most imperative programming languages. A switch is considered more readable than a series of if/else statements.

Here is a simple example:

// Switch with int literal
switch (c) {
  case 1: one(); break;
  case 2: two(); break;
  case 3: three(); break;
  default: throw new UnsupportedOperationException(String.format("Operation %d is not supported", c));
}

Here is the list of the main problems in this code:

1. The relationship between the int literals (1, 2, 3) and the executed code is not obvious.
2. If one of the values (e.g. 2) is no longer supported and this switch is not updated accordingly, it will contain the unused code forever.
3. If a new possible value of c (e.g. 4) is introduced and the switch is not updated accordingly, the code will probably throw UnsupportedOperationException at runtime without any compile-time notification.
4. Such a switch structure tends to be duplicated several times in the code, which makes problems 2 and 3 even more complicated.

The first simplest fix can be done by using int constants instead of literals. First, let's define constants:

private static final int ONE = 1;
private static final int TWO = 2;
private static final int THREE = 3;

Now the code will look like this:

switch (c) {
  case ONE: one(); break;
  case TWO: two(); break;
  case THREE: three(); break;
  default: throw new UnsupportedOperationException(String.format("Operation %d is not supported", c));
}

(Obviously, in real life the names of the constants must be self-descriptive.)
This snippet is more readable, but all the other disadvantages are still relevant. The next attempt to improve the initial code snippet uses enums, introduced to the Java language in version 5 in 2004. Let's define the following enum:

enum Action {ONE, TWO, THREE}

Now the switch snippet will be slightly changed:

Action a = ...
switch (a) {
  case ONE: one(); break;
  case TWO: two(); break;
  case THREE: three(); break;
  default: throw new UnsupportedOperationException(String.format("Operation %s is not supported", a));
}

This code is a little bit better: it will produce a compilation error if one of the elements is removed from the enum Action. However, it will not cause a compilation error if an additional element is added to the enum Action. Some IDEs or static code analysis tools may produce a warning in this case, but who pays attention to warnings? Fortunately, an enum can declare an abstract method that has to be implemented by each element:

enum Action {
  ONE { @Override public void action() { } },
  TWO { @Override public void action() { } },
  THREE { @Override public void action() { } };

  public abstract void action();
}

Now the switch statement can be replaced by single line:

Action a = ...
a.action();

This solution does not have any of the disadvantages enumerated above:

It is readable. The method is "attached" to the enum element; one can write as much javadoc as needed if the meaning of the method is unclear. The code that calls the method is trivial: what can be simpler than a method invocation?
There is no way to remove an enum constant without removing its implementation, so no unused code will remain if some functionality is no longer relevant.
A new enum element cannot be added without an implementation of the method action(). Code without the implementation cannot be compiled.
If several actions are required, they can all be implemented in the enum. As we already mentioned, the code that calls a specific function is trivial, so now there is no code duplication.
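
A complete, runnable variant of the same idea is sketched below with a hypothetical Operation enum (the names Operation, apply() and calculate() are assumptions for this example). Each constant carries its own implementation, and the caller needs no switch at all.

```java
class OperationDemo {
    enum Operation {
        PLUS { @Override public int apply(int a, int b) { return a + b; } },
        MINUS { @Override public int apply(int a, int b) { return a - b; } },
        TIMES { @Override public int apply(int a, int b) { return a * b; } };

        // Every constant must implement this method, otherwise the enum does not compile.
        public abstract int apply(int a, int b);
    }

    // The former switch statement is reduced to a lookup and a method call.
    static int calculate(String operationName, int a, int b) {
        return Operation.valueOf(operationName).apply(a, b);
    }
}
```

An unknown operation name makes valueOf() throw IllegalArgumentException, which plays the role of the default branch of the switch.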

Conclusion

Although the switch/case structure is well known and widely used in various programming languages, its usage may cause a lot of problems. The solution described above, which uses Java enums, does not have these disadvantages. The next article in this series shows how to extend the functionality of an existing enum.

Saturday, February 2, 2019

Syntax highlighting

I have written a lot of blog posts that contain code snippets in several programming languages (mostly Java). I separated each code snippet by empty line using monospace font to improve readability. Changing font type for code snippets is annoying and does not create the best results I want: I prefer highlighted code.

So, I searched for tools that can do this work for me and found 2 types of tools:

Tools that take your code snippet and produce HTML that can be embedded into any blog post
Tools that do this transformation at runtime, so the code snippet remains clear text.

The tools of the first type are often very flexible and support various color themes but have a serious disadvantage: the HTML they generate is almost impossible to edit. If you want to change your code snippet, you usually have to regenerate its HTML representation. This also means that you have to store your original snippet for future use, for example as a GitHub gist. It is not a show stopper but an obvious disadvantage.

The tools of the second type do their magic at runtime. The code snippet remains human readable. The injected JavaScript runs when the page is loaded and changes the color of the reserved words of the programming language used in the embedded code snippet.

The most popular and best-looking syntax highlighter that I found is the one created by Alex Gorbatchev.

Here is an example of code snippet highlighted by this tool:

public class MyTest {
    @Test
    public void multiply() {
        assertEquals(4, 2* 2);
    }
}

There are 2 things I had to do to make this magic happen:

Include several scripts and CSS files into HTML header
Write the code snippet into <pre> tag with specific style:

public class MyTest {
    @Test
    public void multiply() {
        assertEquals(4, 2* 2);
    }
}

Typically external resources (either scripts or CSS) are included by reference, i.e.

<script src='http://domain.com/path/script.js' type='text/javascript'></script> 
<link href='http://domain.com/path/style.css' rel='stylesheet' type='text/css'/>

This works perfectly with the Syntax highlighter scripts in a stand-alone HTML document but did not work when I added the scripts to the theme of my blog. Investigation showed that blogger.com for some reason changed the absolute resource references to protocol-relative ones, so they did not work. Instead of src="http://domain.com/path/script.js" that I had written, the following appeared: src="//domain.com/path/script.js", i.e. the http: is omitted.

So, I downloaded all the scripts to be able to put their source code directly into the body of a <script> tag. For convenience and better performance I minified the scripts using one of the online tools available on the web. The code is available here. This code should be added to the <head> of the HTML page.

Now I can enjoy the great syntax highlighter.

Thursday, January 4, 2018

Why Gradle is called Gradle

Today I asked myself: what does the name "Gradle" mean? I asked Google, and here are the first 2 answers:

Answer #1

It's not an abbreviation, and doesn't have any particular meaning.

The name came from Hans Docter (Gradle founder) who thought it sounded cool.

Answer #2

hansd (Hans Dockter)

Dec '11

My original idea was to call it Cradle. The disadvantages of that name were:

- too diminutive
- not very unique

As Gradle is using Groovy for the DSL I went down the G-road and thought about calling it Gradle. Everyone I asked liked it so that became the official name. That was about 4 years ago. I'm still very happy with the name.

Conclusions: both answers are correct

The name was indeed invented by the Gradle founder Hans Dockter, and IMHO he does think that the name is cool.
The name does have some meaning: it is "Cradle" with the G of Groovy.

Thursday, December 7, 2017

DB #42

DB #42 or how to choose technology :)

We have to implement a new feature. It does not matter what kind of feature it is; what matters is that we have to expose a REST API and store and retrieve some dynamic data. We spent a lot of time choosing a DB. The chief architect wanted MySQL, but this idea did not pass our sanity filter. The data architect wanted Aerospike, and the team leader suggested MongoDB, which was criticized by DevOps, who proposed Redis.

But as we all know, the absolute answer to all questions is 42. So, we decided to choose DB #42 from an alphabetically sorted list of NoSQL databases.

I opened Google and typed "list of nosql databases". Then I chose the following article. I had to extract the list of database names and did it using the following command:

curl http://nosql-database.org/ | grep h3 | grep href > /tmp/db.hrefs.txt

(because every DB name in this list is wrapped in an <a> tag that links to its web page)

There was one case where 2 databases were written on one physical line, and I fixed this manually (the corrected file is the /tmp/db.hrefs2.txt used in the next command).

Then I ran the following command to get the alphabetical list of DB names:

cat /tmp/db.hrefs2.txt | sed 's#</a>.*##' | sed 's/.*>//' | sort > /tmp/db.names.txt

The last phase is to print this list with numbers attached:

i=1; cat /tmp/db.names.txt | while read l; do echo "$i,$l"; i=`expr $i + 1`; done
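The sort-and-number step can also be sketched in Java. This is only an illustration: the class name Db42, the method pick() and the four sample names are hypothetical, and String.CASE_INSENSITIVE_ORDER is an assumption that only approximates the locale-aware ordering produced by the shell sort command.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class Db42 {
    // Returns a sorted copy of the names. Case-insensitive ordering is an
    // assumption; the shell `sort` uses the collation rules of the current locale.
    static List<String> sorted(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(String.CASE_INSENSITIVE_ORDER);
        return copy;
    }

    // Returns the entry at the given 1-based position of the sorted list.
    static String pick(List<String> names, int position) {
        return sorted(names).get(position - 1);
    }

    public static void main(String[] args) {
        // Hypothetical sample; the real input would be the lines of /tmp/db.names.txt.
        List<String> names = Arrays.asList("Redis", "Aerospike", "CouchDB", "MongoDB");
        List<String> sorted = sorted(names);
        for (int i = 0; i < sorted.size(); i++) {
            // Same "number,name" format as the shell loop above.
            System.out.println((i + 1) + "," + sorted.get(i));
        }
    }
}
```

With the real list, pick(names, 42) would return the entry that the shell pipeline prints on line 42, provided both orderings agree for the first 42 names.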

Here is the list:

1	Accumulo
2	acid-state
3	Aerospike
4	AlchemyDB
5	allegro-C
6	AllegroGraph
7	Amazon SimpleDB
8	AmisaDB:
9	Apache Flink
10	Applied Calculus
11	ArangoDB
12	ArangoDB
13	ArangoDB
14	Axibase
15	Azure DocumentDB
16	Azure Table Storage
17	BagriDB
18	BangDB
19	BaseX
20	BayesDB
21	BergDB
22	Berkeley DB
23	Berkeley DB XML
24	Bigdata
25	BinaryRage
26	BoltDB
27	BrightstarDB
28	BrightstarDB
29	Cachelot
30	Cassandra
31	Chordless
32	Chronicle Map
33	Cloudata
34	Cloud Datastore
35	Cloudera
36	Clusterpoint Server
37	CodernityDB
38	ConcourseDB
39	CoreObject
40	CortexDB
41	Couchbase Server
42	CouchDB
43	Crate Data
44	DaggerDB
45	Datomic
46	db4o
47	DBreeze
48	densodb
49	djondb
50	Druid
51	DynamoDB
52	Dynomite
53	EJDB
54	Elassandra
55	Elastic
56	Elliptics
57	EMC Documentum xDB
58	ESENT
59	Eventsourcing for Java (es4j)
60	Event Store
61	Execom IOG
62	eXist
63	eXtremeDB
64	eXtremeDB
65	eXtremeDB Financial Edition
66	EyeDB
67	Faircom C-Tree
68	Fallen 8
69	FileDB:
70	filejson
71	FlockDB
72	FoundationDB
73	FramerD
74	GemFire
75	GemStone/S
76	GenieDB
77	Genomu
78	GigaSpaces
79	Globals:
80	GPUdb
81	GraphBase
82	GridGain
83	GT.M
84	gunDB
85	gunDB
86	gunDB
87	Hadoop / HBase
88	Hazelcast
89	Hibari
90	HPCC
91	HSS Database
92	HyperDex
93	HyperGraphDB
94	Hypertable
95	IBM Cloudant
96	IBM Informix
97	IBM Lotus/Domino
98	iBoxDB
99	Infinispan
100	Infinite Graph
101	InfinityDB
102	influxdata
103	InfoGrid
104	Informix Time Series Solution
105	Intersystems Cache
106	ISIS Family
107	JADE
108	JasDB
109	jBASE
110	JEntigrator
111	JSON ODM
112	KAI
113	kdb+
114	KirbyBase
115	KitaroDB
116	KUDU
117	LevelDB
118	LightCloud
119	LSM
120	Magma
121	MapR
122	MarcelloDB
123	MarkLogic Server
124	Maxtable
125	MemcacheDB
126	MentDB:
127	Meronymy
128	MiniM DB
129	Mnesia
130	Model 204 Database
131	MonetDB
132	MongoDB
133	Moonshadow
134	Morantex
135	NCache
136	NDatabase
137	NeDB
138	NEO
139	Neo4J
140	nessDB
141	Newt DB
142	Ninja Database Pro
143	NosDB
144	NoSQL embedded db
145	ObjectDB
146	Objectivity
147	Onyx Database
148	OpenInsight
149	OpenLDAP
150	OpenLink Virtuoso
151	OpenQM
152	Oracle Coherence
153	Oracle NOSQL Database
154	OrientDB
155	OrientDB
156	Perst
157	PickleDB
158	PicoLisp
159	Pincaster
160	pipelinedb
161	Prevayler
162	Qizx
163	quasardb
164	Queplix
165	RaptorDB
166	RaptorDB
167	rasdaman
168	RavenDB
169	RDM Embedded
170	Reality
171	ReasonDB
172	Recutils:
173	Redis
174	RethinkDB
175	Riak
176	Riak TS
177	RockallDB
178	RocksDB
179	Scalaris
180	Scalien
181	SciDB
182	SCR Siemens Common Repository
183	Scylla
184	SDB
185	Sedna
186	SequoiaDB
187	Serenety
188	SharedHashFile
189	siaqodb
190	SisoDB
191	Sophia
192	Sparksee
193	Splice Machine
194	SpreadsheetDB
195	Starcounter
196	Sterling
197	STSdb
198	Symas LMDB
199	Tarantool/Box
200	TayzGrid
201	Terrastore
202	ThruDB
203	TIBCO Active Spaces
204	Tieto TRIP
205	TigerLogic PICK
206	TITAN
207	Tokutek:
208	Tokyo Cabinet / Tyrant
209	ToroDB
210	TreodeDB
211	Trinity
212	U2
213	upscaledb
214	VaultDB
215	VelocityDB
216	Versant
217	VertexDB
218	Voldemort
219	Vyhodb
220	weaver
221	WhiteDB
222	WonderDB
223	Yserial
224	ZODB

And the winner is .... CouchDB - number 42!

Conclusions

Every time you cannot agree which technology to choose just find the longest list of relevant technologies, sort them alphabetically and choose #42.

Thursday, March 17, 2016

Performance of try/catch

Yesterday I had a discussion with a friend, and he said that in his opinion the try/catch statement in Java is very expensive in terms of performance. Indeed, it is always recommended to check a value before using it instead of trying to use it and then catching the exception thrown because the value was wrong.

I decided to check this and tried several code samples. All functions accept an int and return the argument multiplied by 2. They differ as follows:

just calculate the value and return it (foo())
calculate the value in a try block followed by a catch block (tryCatch())
calculate the value in a try block followed by a finally block (tryFinally())
calculate the value in a try block followed by catch and finally blocks (tryCatchFinally())
divide an integer value by zero in a try block followed by a catch block that just returns -1 (tryThrowCatch())
divide an integer value by zero in a try block followed by a catch block that re-throws the exception. An outer try/catch structure catches the re-thrown exception and returns -1 (tryThrowCatch1())
divide an integer value by zero in a try block followed by a catch block that wraps the thrown exception with another RuntimeException and re-throws it. An outer try/catch structure catches the secondary exception and returns -1 (tryThrowCatch2())
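To make the list above concrete, here is a minimal sketch of how four of these functions might look. It is reconstructed from the descriptions only and is not the benchmark source itself; the class name TryCatchBench, the timing helper time() and the sink field are my assumptions, and a naive loop like this is sensitive to JIT warm-up, so absolute numbers will differ.

```java
import java.util.function.IntUnaryOperator;

class TryCatchBench {
    // Keeps the results alive so that the JIT cannot drop the loop body.
    static volatile long sink;

    // Baseline: plain calculation.
    static int foo(int x) {
        return x * 2;
    }

    // Same calculation inside try/catch; nothing is ever thrown.
    static int tryCatch(int x) {
        try {
            return x * 2;
        } catch (RuntimeException e) {
            return -1;
        }
    }

    // Integer division by zero throws ArithmeticException, which is caught here.
    static int tryThrowCatch(int x) {
        int zero = 0;
        try {
            return x / zero;
        } catch (ArithmeticException e) {
            return -1;
        }
    }

    // The exception is wrapped into a new RuntimeException and re-thrown;
    // the outer try/catch catches the wrapper.
    static int tryThrowCatch2(int x) {
        int zero = 0;
        try {
            try {
                return x / zero;
            } catch (ArithmeticException e) {
                throw new RuntimeException(e);
            }
        } catch (RuntimeException e) {
            return -1;
        }
    }

    // Naive timing loop: returns the elapsed time in milliseconds.
    static long time(IntUnaryOperator f, int iterations) {
        long start = System.nanoTime();
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += f.applyAsInt(i);
        }
        sink = sum;
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        int n = 1_000_000; // far fewer iterations than in the original test
        System.out.println("foo            " + time(TryCatchBench::foo, n));
        System.out.println("tryCatch       " + time(TryCatchBench::tryCatch, n));
        System.out.println("tryThrowCatch  " + time(TryCatchBench::tryThrowCatch, n));
        System.out.println("tryThrowCatch2 " + time(TryCatchBench::tryThrowCatch2, n));
    }
}
```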

I ran each test 100,000,000 times in a loop and measured the elapsed time. Here are the results.

Test name	Elapsed time, ms
foo	46
tryCatch	45
tryFinally	45
tryCatchFinally	44
tryThrowCatch	133
tryThrowCatch1	139
tryThrowCatch2	62293

Analysis

a try/catch/finally structure that is merely present in the code does not cause any performance degradation
throwing and catching an exception is about 3 times more expensive than a simple method call
wrapping an exception with another one and re-throwing it is really expensive: more than 1,000 times slower than the plain call in this test

Conclusions

The mere presence of a catch block does not carry any performance penalty. Throwing an exception is indeed expensive, so validating values before using them is better not only from the design perspective but also from the performance perspective.

The important conclusion is that in performance-critical code we should avoid the following very common pattern:

try {
    // some code
} catch (ThisLayerException e) {
    throw new UpperLayerException(e);
}

This pattern helps us to use layer-specific exceptions on each layer of our code. However, an exception thrown from a lower layer can be wrapped many times, which creates an extremely long stack trace and causes serious performance degradation. Probably a better approach is to derive our domain-level exceptions from RuntimeException and to wrap only checked exceptions, and only once (like Spring does).
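The "wrap only checked exceptions and only once" idea can be sketched as follows. This is my illustration rather than code from the benchmark: DomainException, call(), lowerLayer() and upperLayer() are hypothetical names, and the approach is only loosely modelled on what Spring does.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

class WrapOnce {
    // Hypothetical domain-level exception; unchecked, as suggested above.
    static class DomainException extends RuntimeException {
        DomainException(Throwable cause) {
            super(cause);
        }
    }

    // Runs the action. Unchecked exceptions pass through untouched,
    // so an already wrapped exception is never wrapped again;
    // checked exceptions are wrapped exactly once.
    static <T> T call(Callable<T> action) {
        try {
            return action.call();
        } catch (RuntimeException e) {
            throw e;
        } catch (Exception e) {
            throw new DomainException(e);
        }
    }

    // Lower layer: fails with a checked exception, which gets wrapped here.
    static String lowerLayer() {
        return call(() -> { throw new IOException("disk failure"); });
    }

    // Upper layer: goes through call() again, but no second wrapper is created.
    static String upperLayer() {
        return call(WrapOnce::lowerLayer);
    }

    public static void main(String[] args) {
        try {
            upperLayer();
        } catch (DomainException e) {
            // The direct cause is the original IOException, not another wrapper.
            System.out.println(e.getCause());
        }
    }
}
```

No matter how many layers pass through call(), the stack trace contains a single wrapper and the original cause.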

Source code

The source code used here can be found on GitHub.

Wednesday, February 24, 2016

Dangerous String.format()

Introduction

The static method format(), added to the class java.lang.String in Java 5, became a popular and widely used method that replaced MessageFormat, string concatenation and verbose calls to StringBuilder.append().

However, when using this method we should remember that this lunch is not free.

Performance issues

This method accepts varargs and therefore creates a new Object array on each call to wrap the passed arguments. An extra object is created, and this extra object must later be removed by the GC.
It internally creates an instance of java.util.Formatter that parses the format specification. Yet another object and a lot of CPU-intensive parsing.
It creates a new instance of StringBuilder used to store the formatted data.
At the end it calls StringBuilder.toString() and therefore creates yet another object, copying the builder's char array into the new String.

So, a call to String.format() creates at least 4 short-lived objects and parses the format specification. In a real application it probably parses the same format millions of times.

Solution

Use Formatter directly. Compare the following code snippets:

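import java.util.Formatter;

// Each iteration creates a new Formatter, StringBuilder and String.
// n is the number of iterations.
public static CharSequence str(int n) {
    StringBuilder buf = new StringBuilder();
    for (int i = 0; i < n; i++) {
        buf.append(String.format("%d\n", 1));
    }
    return buf;
}

// One Formatter is reused and appends directly to the buffer.
public static CharSequence fmt(int n) {
    StringBuilder buf = new StringBuilder();
    Formatter fmt = new Formatter(buf);
    for (int i = 0; i < n; i++) {
        fmt.format("%d\n", 1);
    }
    return buf;
}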

Method fmt() is about 1.5 times faster than method str(). Even better results may be achieved by writing directly to a stream instead of creating a String and then writing it to the stream.
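Writing directly to a stream might look like the sketch below. It is a minimal illustration and not part of the measured code: the class name StreamFormat and the method write() are hypothetical, and the choice of UTF-8 and Locale.ROOT is my assumption.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.Formatter;
import java.util.Locale;

class StreamFormat {
    // Formats n lines straight into the stream; no String is created per line.
    static void write(OutputStream out, int n) {
        // Formatter accepts any Appendable; a Writer on top of the stream is one.
        Formatter fmt = new Formatter(new OutputStreamWriter(out, StandardCharsets.UTF_8), Locale.ROOT);
        for (int i = 0; i < n; i++) {
            fmt.format("%d\n", i);
        }
        // Push buffered characters to the stream; the caller owns and closes it.
        fmt.flush();
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        write(out, 3);
        System.out.print(new String(out.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

Note that Formatter does not propagate IOException from the underlying stream; it only records the last one, which can be read via fmt.ioException().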

String.format() is Locale sensitive

There are 2 format() methods:

public static String format(String format, Object... args)

and

public static String format(Locale l, String format, Object... args)

The method that does not receive a Locale argument uses the default locale, Locale.getDefault(Locale.Category.FORMAT), which depends on the machine configuration. This means that changing machine settings changes the behavior of your application and may even break it. The most common problems are:

decimal separator
digits

Decimal separator

Programmers are so used to the decimal separator being a dot (.) that they sometimes forget that it depends on the locale. I've written a simple code snippet that iterates over all available locales and checks which character is used as the decimal separator:

Decimal separator	Number of locales
Dot (.)	71
Comma (,)	89

If the produced string is parsed later, the parsing may be broken by a change of the default locale of the machine.
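The snippet that produced the table is not included in this post, so here is a minimal sketch of how such a check might look, together with a demonstration of the parsing problem. The class name DecimalSeparators and both method names are hypothetical, and the counts depend on the JDK version and its locale data, so they will not necessarily match the table.

```java
import java.text.DecimalFormatSymbols;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class DecimalSeparators {
    // Counts how many available locales use each decimal separator character.
    static Map<Character, Integer> count() {
        Map<Character, Integer> counts = new TreeMap<>();
        for (Locale locale : Locale.getAvailableLocales()) {
            char separator = DecimalFormatSymbols.getInstance(locale).getDecimalSeparator();
            counts.merge(separator, 1, Integer::sum);
        }
        return counts;
    }

    // Formats 1.5 using the given locale and tries to parse the result back
    // with Double.parseDouble(), which accepts only the dot as a separator.
    static boolean roundTrips(Locale locale) {
        String formatted = String.format(locale, "%.2f", 1.5);
        try {
            return Double.parseDouble(formatted) == 1.5;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(count());
        System.out.println("US:      " + roundTrips(Locale.US));
        System.out.println("GERMANY: " + roundTrips(Locale.GERMANY));
    }
}
```

With Locale.US the value survives the round trip, while with Locale.GERMANY the formatted string contains a comma and Double.parseDouble() rejects it.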

Digits

Everyone knows that the digits are 1, 2, 3, ... This is right, but not in every locale. Arabic, Hindi, Thai and other languages use other characters to represent the same digits. Here is a code sample:

for (Locale locale : Locale.getAvailableLocales()) {
 String one = String.format(locale, "%d", 1);
 if (!"1".equals(one)) {
   System.out.println("\t" +locale + ": " + one);
 }
}

And this is its output when run on a Linux machine with Java 8:

hi_IN: १
th_TH_TH_#u-nu-thai: ๑

When executed on Android, this code produces 109 lines of output. It includes:

all versions of Arabic locales,
as, bn, dz, fa, ks, mr, my, ne, pa, ps, uz with dialects.

This may easily break an application in some locales.

Conclusions

Since Java formatting is locale-dependent, it should be used very carefully. In some cases it is probably better to specify the locale explicitly, e.g. Locale.US.
Be careful when calling String.format() in performance-critical sections of code. Using another API (e.g. direct invocation of the class Formatter) may significantly improve performance.

Acknowledgements

I'd like to thank Eliav Atoun, who inspired the discussion about this issue and helped me to try the code sample on Android.

Source code

The code snippets used here can be found on GitHub.

Thursday, December 17, 2015

Creating a self-extracting tar.gz

Motivation

WinZip can create a self-extracting executable. Actually, this is the unzip utility bundled with the content of the zip file, which is extracted when the executable runs.

Recently I was working on a distribution package of our software for Linux. On Windows we have a huge zip file and a small script that we run once the zip is extracted. The scripts for Linux are a little more complicated because they have to grant permissions, create a user and a group etc. So, I wrote a script that performs all the necessary actions, including extraction of the archive. The disadvantage is that the distribution now consists of at least 2 files: the archive and the script.

So, I decided to check how to create a self-extracting archive on Linux.

Used commands

I started from the following exercise:

#!/bin/sh

echo $0

exit

foo bar

This script runs and ignores the last line, "foo bar", because it exits before the shell reaches that line.

Then I created tar.gz file and appended it to this script:

cat script.sh my.tar.gz >script.with.tar.sh

Although the script now contains the binary content of the tar.gz, it runs as expected.

But I want to create a self-extracting script. Therefore I need a way to separate the script from its "attachment". The "dd" command helps to implement this:

dd bs=1 skip=$SCRIPT_PREFIX if=$SELF_EXTRACTING_TAR

But how can a script with an attachment find the size of its scripting part? If the script contains a known number of lines (e.g. 3), we can do the following:

head -3 $SCRIPT | wc -c

A script can access its own name using the variable $0.

Putting both commands together, we can write a line that extracts the attachment appended to the script:

dd bs=1 skip=`head -3 $0 | wc -c` if=$0

Extracting a tar.gz can be achieved using the command

gunzip -c | tar -x

So, this command extracts the attached tar.gz to the current directory:

dd bs=1 skip=`head -3 $0 | wc -c` if=$0 | gunzip -c | tar -x
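For readers who think in Java rather than in shell, the split performed by head, wc and dd can be sketched as follows. This is only an illustration of the offset logic, not a replacement for the script: the class name Attachment, both method names and the sample content are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Attachment {
    // Number of bytes occupied by the first `lines` lines, newlines included;
    // the counterpart of `head -3 file | wc -c`.
    static int prefixLength(byte[] data, int lines) {
        if (lines <= 0) {
            return 0;
        }
        int seen = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == '\n' && ++seen == lines) {
                return i + 1;
            }
        }
        return data.length; // fewer lines than requested: everything is script
    }

    // Bytes that follow the script part, i.e. what `dd bs=1 skip=...` emits.
    static byte[] payload(byte[] data, int lines) {
        return Arrays.copyOfRange(data, prefixLength(data, lines), data.length);
    }

    public static void main(String[] args) {
        // Hypothetical 3-line script followed by a fake "archive".
        byte[] file = "#!/bin/sh\necho hi\nexit\nPAYLOAD".getBytes(StandardCharsets.US_ASCII);
        System.out.println(prefixLength(file, 3));
        System.out.println(new String(payload(file, 3), StandardCharsets.US_ASCII));
    }
}
```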

Script

It is a good idea to create a script that takes a regular tar.gz and creates a self-extracting archive.
The script is available here. It accepts the following arguments:

mandatory path to the tar.gz
optional command that is automatically executed right after the archive is extracted. Typically it is a script packaged into the tar.

The name of the resulting executable is the name of the tar.gz with the suffix ".self".

Usage Examples:

Create a self-extracting executable:

./selftar.sh my.tar.gz

Create a self-extracting executable with a command that is executed after the archive is extracted:

./selftar.sh my.tar.gz "folder/install.sh"

Both examples create the executable file my.tar.gz.self, which extracts the files originally packaged into my.tar.gz to the current directory:

./my.tar.gz.self

Usage

This script and article were inspired by my work on a distribution package based on a simple archive. Generally speaking, this technique is wrong by definition: a distribution package depends on the target platform and should be created using the platform-specific tools: .msi for MS Windows, .rpm for RedHat, .deb for Debian etc. However, a self-extracting executable allows creating a simple package that can be used on most Unix-based platforms, which is very convenient, especially when the packaged application is written in a cross-platform language like Java.
