Benchmarking String.intern() with JMH

String interning overview

From the Oracle Javadocs:

String.intern() returns a canonical representation for the string object.

In other words, interned strings are pooled so that only one instance of each distinct string (the canonical representation) exists in memory. This also means interned strings can be compared using the ‘==’ operator rather than equals(), since two identical strings can never have two different memory addresses (provided all strings being compared are interned).
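A minimal sketch of that identity guarantee follows. The class name InternIdentityDemo and the helper build() are made up for illustration; the strings are assembled at runtime so that they are not compile-time constants, which the compiler would pool on its own.

```java
class InternIdentityDemo {
    // Hypothetical helper: builds the string at runtime so each call
    // returns a fresh String object rather than a pooled constant.
    static String build(String prefix, int n) {
        return new StringBuilder(prefix).append(n).toString();
    }

    public static void main(String[] args) {
        String a = build("key-", 42);
        String b = build("key-", 42);

        System.out.println(a.equals(b));              // true: same contents
        System.out.println(a == b);                   // false: two distinct objects
        System.out.println(a.intern() == b.intern()); // true: one canonical instance
    }
}
```

Note that ‘==’ is only safe when both sides have been interned; comparing an interned string to a non-interned one still compares two different references.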

The downside is that invoking intern() costs more CPU time than a plain string allocation.

The upside is that interning reduces memory consumption. How much it saves depends on how many strings the application builds dynamically and how many of those strings are unique.
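To make the relationship between total strings and unique strings concrete, here is a rough sketch that counts how many distinct String objects are retained with and without interning. The class InternMemoryDemo, the method distinctInstances() and the numbers used are illustrative assumptions, not taken from the benchmark in this post.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

class InternMemoryDemo {
    // Counts distinct String objects (by identity, not by equals) retained
    // when `total` strings are built from only `unique` different values.
    static int distinctInstances(int total, int unique, boolean intern) {
        Set<String> instances =
                Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
        for (int i = 0; i < total; i++) {
            String s = "value-" + (i % unique); // built at runtime: a new object each time
            instances.add(intern ? s.intern() : s);
        }
        return instances.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctInstances(10_000, 100, false)); // 10000
        System.out.println(distinctInstances(10_000, 100, true));  // 100
    }
}
```

With few unique values among many generated strings, interning keeps only a small number of objects alive; if nearly every string is unique, there is little to save.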

Microbenchmarking

The cost/benefit of string interning needs to be assessed on a case-by-case basis by taking appropriate time measurements.

The classic way to do so is to rely on a stopwatch: record the time before and after the operation being measured and compute the elapsed time. This technique works relatively well for large, macro benchmarks where the operation being measured takes more than a few seconds, e.g. a database lookup.
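For reference, the stopwatch approach usually looks something like the sketch below. StopwatchDemo and timeNanos() are hypothetical names, and the loop sizes are arbitrary; the point is only to show the shape of the technique, not to produce trustworthy numbers.

```java
class StopwatchDemo {
    // Naive stopwatch: elapsed wall-clock time of a single run, in nanoseconds.
    static long timeNanos(Runnable op) {
        long start = System.nanoTime();
        op.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        final int n = 100_000;
        final String[] sink = new String[n]; // keep results reachable

        long plain = timeNanos(() -> {
            for (int i = 0; i < n; i++) {
                sink[i] = "value-" + (i % 100);
            }
        });
        long interned = timeNanos(() -> {
            for (int i = 0; i < n; i++) {
                sink[i] = ("value-" + (i % 100)).intern();
            }
        });

        System.out.println("plain:    " + plain + " ns");
        System.out.println("interned: " + interned + " ns");
    }
}
```

Numbers from a single run like this vary from one execution to the next and depend on which block happens to run first, which is why they should be treated with suspicion at this scale.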

Stopwatches, however, fail to take into account the many tricks the JVM uses to optimize code at runtime (warmup, inlining, dead code elimination, loop unrolling, etc.), and this can lead to biased results when dealing with measurements in the millisecond or microsecond range. A preferred option in that case is to use a microbenchmarking framework for Java, such as Caliper or JMH, which generates benchmark code that takes the above pitfalls into account.

Benchmark example with JMH

[gist  https://gist.github.com/eleco/d4096caa751eda96bf8f /]

Results

In the scenario above, interning improves performance significantly. The CPU cost of the intern() call matters little when interning substantially reduces GC pressure overall.

[Figure: JMH_interning]