Java String concatenation using a loop

A JMH benchmark comparison

The wrong way to concatenate strings using a loop

String result = "";
String textToAppend = "WhateverWhatever";
int iterationCount = 16;
for (int i = 0; i < iterationCount; ++i) {
    result = result + textToAppend;
}

This code starts by pointing the variable called result at an empty String. Then execution enters the loop, and every iteration creates a new String object whose content is the result so far followed by the text to append. Each time a new, longer String is created, the variable result is pointed at it. This means that the older String object becomes unreachable (nothing refers to it any more), and its data sits around taking up space on the object heap until Java's garbage collector reclaims it (which could be some time).

All of this object creation and (eventual) garbage collection takes up CPU time and RAM space which could be avoided.
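To put a rough number on that waste, the sketch below counts how many characters are copied in total when the + operator is used in the loop, compared with the length of the finished text. The class and method names are hypothetical and are not part of the benchmark; the loop sizes are the ones benchmarked later in this article.

```java
// Hypothetical helper for illustration only; not part of the benchmark code.
final class CopyCost {

    // Characters copied when "result = result + fragment" runs `iterations` times:
    // iteration k builds a brand-new String that holds k fragments.
    static long charsCopiedWithPlus(int iterations, int fragmentLength) {
        long total = 0;
        for (int k = 1; k <= iterations; k++) {
            total += (long) k * fragmentLength;
        }
        return total;
    }

    // Characters in the finished text, which is all that really needs to be written.
    static long charsInResult(int iterations, int fragmentLength) {
        return (long) iterations * fragmentLength;
    }

    public static void main(String[] args) {
        int fragmentLength = "WhateverWhatever".length(); // 16 characters
        for (int n : new int[] {16, 256, 4096}) {
            System.out.println(n + " iterations: + copies "
                    + charsCopiedWithPlus(n, fragmentLength)
                    + " characters to build a result of "
                    + charsInResult(n, fragmentLength) + " characters");
        }
    }
}
```

The copied total grows with the square of the iteration count, while the finished text only grows linearly, which is why the gap widens so quickly as the loop gets larger.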

The ideal way to concatenate strings using a loop

String textToAppend = "WhateverWhatever";
int iterationCount = 16;
StringBuilder sb = new StringBuilder(iterationCount * textToAppend.length());
for (int i = 0; i < iterationCount; ++i) {
    sb.append(textToAppend);
}
String result = sb.toString();

This code creates a StringBuilder object with an initial capacity which is large enough to hold the entire finished text sequence (simply the number of iterations multiplied by the length of the text we append each time). Then every iteration of the loop just calls the StringBuilder.append(String) method to add the additional text to the pre-sized buffer within the StringBuilder. No additional objects are created within the loop, and writing characters into a pre-sized buffer is a very cheap process.
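One way to confirm that the pre-sized buffer is never replaced is to compare the value reported by StringBuilder.capacity() before and after the loop. This is only a minimal sketch: the class and method names are made up for illustration, and it assumes capacity() reports exactly the size requested in the constructor, which is how OpenJDK behaves.

```java
// Hypothetical check for illustration only; not part of the benchmark code.
final class PresizedCheck {

    // Returns true if a builder sized up front never had to grow its buffer
    // and ended up exactly full.
    static boolean capacityUnchanged(String textToAppend, int iterationCount) {
        StringBuilder sb = new StringBuilder(iterationCount * textToAppend.length());
        int capacityBefore = sb.capacity();
        for (int i = 0; i < iterationCount; ++i) {
            sb.append(textToAppend);
        }
        return sb.capacity() == capacityBefore
                && sb.length() == capacityBefore;
    }

    public static void main(String[] args) {
        System.out.println(capacityUnchanged("WhateverWhatever", 16));
    }
}
```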

What if you don't know the ideal capacity when creating the StringBuilder?

Use StringBuilder anyway. Even if you have no idea what capacity will be needed and simply opt for the default capacity (16 characters in JDK 15) you will still get better performance than you'll get from using the + concatenation operator with two String objects.

Behind the scenes, the StringBuilder object only needs to create a new storage buffer when the current one fills up, and each time this happens it grows the capacity substantially (OpenJDK roughly doubles it), so resizes do not need to occur every time additional text is appended.
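The growth behaviour can be observed directly by recording StringBuilder.capacity() after each append. The sketch below is illustrative only (the class and method names are hypothetical), and the exact capacities it prints depend on the JDK implementation, so no particular sequence should be relied on.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical trace for illustration only; not part of the benchmark code.
final class GrowthTrace {

    // Appends the fragment repeatedly to a default-capacity builder and
    // records each distinct capacity the builder reports along the way.
    static List<Integer> capacitiesSeen(String textToAppend, int iterationCount) {
        StringBuilder sb = new StringBuilder();
        List<Integer> capacities = new ArrayList<>();
        capacities.add(sb.capacity());
        for (int i = 0; i < iterationCount; ++i) {
            sb.append(textToAppend);
            int capacity = sb.capacity();
            if (capacity != capacities.get(capacities.size() - 1)) {
                capacities.add(capacity); // the buffer was replaced by a larger one
            }
        }
        return capacities;
    }

    public static void main(String[] args) {
        List<Integer> capacities = capacitiesSeen("WhateverWhatever", 256);
        System.out.println("Capacities seen: " + capacities);
        System.out.println("Resizes: " + (capacities.size() - 1) + " for 256 appends");
    }
}
```

The number of resizes should be far smaller than the number of appends, which is why the default-capacity StringBuilder stays reasonably close to the pre-sized one.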

What's the difference?

A JMH benchmark was created to compare the different approaches and measure the effect on throughput (operations per second, or how many times per second the benchmark method can run from start to finish) and on allocation rate (bytes per operation, or how many bytes of data are written to the heap each time the benchmark method runs).

The benchmark was configured to test three different loop sizes.

Small loop with 16 iterations

The "wrong" approach achieved a mean throughput of 2,179,349 ops/s (operations per second), while the "ideal" approach achieved a mean throughput of 4,979,080 ops/s. So for this small loop the "ideal" approach is 2.3 times faster than the "wrong" approach.

The "wrong" approach recorded a mean allocation rate of 2,817 bytes/op (bytes per operation), while the "ideal" approach recorded a mean allocation rate of 568 bytes/op. So the "wrong" approach writes 5.0 times as much data to the heap as the "ideal" approach.

Note that the "default" approach (using a StringBuilder with the default initial capacity) achieved throughput of 3,381,174 ops/s, and allocated 936 bytes/op. So its performance is not as good as the "ideal" (StringBuilder with the ideal initial capacity), but it's considerably better than the "wrong" approach.

(Note that the full JMH data shows that none of the 99.9% CIs overlap for the above statistics, so the differences are statistically significant. As a side note: be warned that if a benchmark shows that the CIs do overlap, this does not necessarily imply that no statistically significant difference exists, and further calculation would be needed to reach a conclusion.)
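The overlap check itself is simple arithmetic. As a minimal sketch, the snippet below applies it to the 99.9% CI bounds for throughput at 16 iterations, taken from the JMH data in this article; the class and method names are hypothetical.

```java
// Hypothetical check for illustration only; not part of the benchmark code.
final class IntervalCheck {

    // Two closed intervals overlap when each one starts before the other ends.
    static boolean overlaps(double lowA, double highA, double lowB, double highB) {
        return lowA <= highB && lowB <= highA;
    }

    public static void main(String[] args) {
        // 99.9% CI bounds for throughput (ops/s) at 16 iterations.
        double[] plus = {2_090_128, 2_268_569};
        double[] defaultBuilder = {3_123_654, 3_638_693};
        double[] idealBuilder = {4_796_509, 5_161_651};

        System.out.println(overlaps(plus[0], plus[1],
                defaultBuilder[0], defaultBuilder[1])); // false
        System.out.println(overlaps(defaultBuilder[0], defaultBuilder[1],
                idealBuilder[0], idealBuilder[1])); // false
        System.out.println(overlaps(plus[0], plus[1],
                idealBuilder[0], idealBuilder[1])); // false
    }
}
```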

Medium loop with 256 iterations

The "wrong" approach managed a throughput of only 19,651 ops/s, while the "ideal" approach achieved 343,222 ops/s. So for this medium loop the "ideal" method is 17.5 times faster than the "wrong" approach.

The "wrong" approach allocated 536,651 bytes/op, and the "ideal" allocated 8,251 bytes/op. So the "wrong" approach was writing 65 times as much data to the heap.

The "default" approach achieved 263,528 ops/s, and allocated 13,484 bytes/op. So it was somewhat slower and greedier than the "ideal" approach, but vastly better than the "wrong" approach.

(Again, the 99.9% CIs do not overlap, so the differences in the above statistics are all statistically significant.)

Large loop with 4,096 iterations

The "wrong" approach managed a throughput of only 91 ops/s, while the "ideal" approach achieved 20,727 ops/s. So the "ideal" approach is 228 times faster than the "wrong" approach for this large loop.

The "wrong" approach allocated 134,431,629 bytes/op, while the "ideal" approach allocated only 131,180 bytes/op. This means that the "wrong" approach was writing 1,025 times as much data to the heap.

Bear in mind that the final text is just 16 characters (the size of the text fragment used in the benchmark) × 4,096 (loop iterations) = 65,536 characters. Since JDK 9, a String or StringBuilder that contains only Latin-1 characters stores them at one byte per character, and the "ideal" approach has to hold the text twice: once in the StringBuilder's buffer and once in the String created by toString(). That gives 2 × 65,536 = 131,072 bytes, which means that the overhead (used by the internal structure of the one StringBuilder and the one String object) is just 108 bytes for the "ideal" approach. But the overhead of the "wrong" approach is more than 134 megabytes, all required just to hold incomplete fragments of the final text, plus the internal structural overhead of more than four thousand String objects which are created and quickly become obsolete.

The "default" approach achieved 18,182 ops/s, and allocated 213,287 bytes/op. So it is not much slower or greedier than the "ideal" approach, but hugely faster and more memory efficient than the "wrong" approach.

(Again, the 99.9% CIs do not overlap, so the differences in the above statistics are all statistically significant.)

The JMH benchmark code

Define a state object

@org.openjdk.jmh.annotations.State(Scope.Thread)
public static class State {

    @Param({"16", "256", "4096"})
    public int iterationCount;

    public String textToAppend;

    @Setup(Level.Iteration)
    public void setupIteration() {
        textToAppend = "<TEXT TO APPEND>";
    }
}

This State class is annotated with @org.openjdk.jmh.annotations.State, so JMH creates an instance of it and calls its setup methods in a way that does not affect the benchmark results. This allows the textToAppend field to be defined ready for use in the benchmark methods. It also allows the number of iterations to be parameterised, by adding the @Param annotation to the iterationCount field. JMH reads the values from the annotation and automatically runs each benchmark method with each of the parameter values, so the benchmarks will be tested with loop sizes of 16, then 256, then 4096.

The benchmark methods

@Benchmark
public void plusOperator(State state, Blackhole blackhole) {
    String result = "";
    for (int i = 0; i < state.iterationCount; ++i) {
        result = result + state.textToAppend;
    }
    blackhole.consume(result);
}

@Benchmark
public void stringBuilder(State state, Blackhole blackhole) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < state.iterationCount; ++i) {
        sb.append(state.textToAppend);
    }
    String result = sb.toString();
    blackhole.consume(result);
}

@Benchmark
public void stringBuilderIdealCapacity(State state,
        Blackhole blackhole) {
    StringBuilder sb = new StringBuilder(
            state.iterationCount * state.textToAppend.length());
    for (int i = 0; i < state.iterationCount; ++i) {
        sb.append(state.textToAppend);
    }
    String result = sb.toString();
    blackhole.consume(result);
}

JMH will run each method marked with the @Benchmark annotation. The State object gives access to the iterationCount and the textToAppend values. And the Blackhole is needed because if the result of each method is not returned or fed to the Blackhole then the Java compiler or JIT compiler is likely to optimize the code away and leave the benchmark measuring an empty method body.

All three benchmark methods outwardly behave the same way. That is: all three take a number of iterations and a text fragment, and then produce a String which represents the result of repeating that text fragment the given number of times. So it is fair and meaningful to compare these three methods with each other. The difference in performance comes from how they operate internally: one method uses the + operator, one uses a StringBuilder with its default initial capacity, and one uses a StringBuilder with an initial capacity that will exactly hold the final assembled text sequence.
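That equivalence can be checked outside JMH. The sketch below restates the three loop bodies as plain static methods that return their result (a hypothetical class, with the State and Blackhole parameters replaced by ordinary arguments) and compares their output.

```java
// Hypothetical plain-Java restatement of the three approaches; not the benchmark itself.
final class EquivalenceCheck {

    static String plusOperator(String textToAppend, int iterationCount) {
        String result = "";
        for (int i = 0; i < iterationCount; ++i) {
            result = result + textToAppend;
        }
        return result;
    }

    static String stringBuilder(String textToAppend, int iterationCount) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < iterationCount; ++i) {
            sb.append(textToAppend);
        }
        return sb.toString();
    }

    static String stringBuilderIdealCapacity(String textToAppend, int iterationCount) {
        StringBuilder sb = new StringBuilder(iterationCount * textToAppend.length());
        for (int i = 0; i < iterationCount; ++i) {
            sb.append(textToAppend);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "<TEXT TO APPEND>";
        for (int n : new int[] {16, 256, 4096}) {
            String expected = plusOperator(text, n);
            boolean same = expected.equals(stringBuilder(text, n))
                    && expected.equals(stringBuilderIdealCapacity(text, n));
            System.out.println(n + " iterations, identical output: " + same);
        }
    }
}
```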

Define a main method which will run the test

public static void main(String[] args) throws RunnerException {
    Options opt = new OptionsBuilder()
            .include(StringConcatenationInLoop.class.getSimpleName())
            .warmupIterations(3)
            .warmupTime(TimeValue.seconds(2))
            .measurementIterations(5)
            .measurementTime(TimeValue.seconds(3))
            .forks(3)
            .mode(Mode.Throughput)
            .timeUnit(TimeUnit.SECONDS)
            .shouldDoGC(true)
            .addProfiler(StackProfiler.class)
            .addProfiler(GCProfiler.class)
            .result("jmh_StringConcatenationInLoop.json")
            .resultFormat(ResultFormatType.JSON)
            .build();

    new Runner(opt).run();
}

An Options object is defined and then a Runner object created and run using those Options. The options here add the current benchmark class (named StringConcatenationInLoop), and specify that each "fork" should have 3 warmup iterations (of at least 2 seconds) followed by 5 measurement iterations (of at least 3 seconds), and there will be 3 "forks". This will give 5×3 = 15 measurement iterations in total.
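These settings also give a rough lower bound on how long the whole run takes. The sketch below does the arithmetic for three benchmark methods and three parameter values; the class and method names are hypothetical, and the estimate ignores JVM start-up, the forced garbage collections and profiler overhead, so a real run will take longer.

```java
// Hypothetical estimate for illustration only; not part of the benchmark code.
final class RunTimeEstimate {

    // Minimum time spent inside warmup and measurement iterations, in seconds.
    static long minimumSeconds(int forks,
                               int warmupIterations, int warmupSeconds,
                               int measurementIterations, int measurementSeconds,
                               int benchmarkMethods, int paramValues) {
        long perFork = (long) warmupIterations * warmupSeconds
                + (long) measurementIterations * measurementSeconds;
        return perFork * forks * benchmarkMethods * paramValues;
    }

    public static void main(String[] args) {
        // 3 forks, 3 x 2 s warmup, 5 x 3 s measurement, 3 methods, 3 loop sizes.
        long seconds = minimumSeconds(3, 3, 2, 5, 3, 3, 3);
        System.out.println("At least " + seconds + " seconds"); // At least 567 seconds
    }
}
```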

Mode.Throughput is used (to show ops/sec) and the GCProfiler is added along with a call to shouldDoGC(true) so that the Java garbage collector is run and studied in order to estimate the amount of data allocated to heap memory.

JSON file output is requested by calling result with the desired output file name, and calling resultFormat(ResultFormatType.JSON).

JMH data

Below is the data reported by the JMH benchmark when run on my machine. To make it easier to read, the numbers have been truncated (but keep at least five significant digits).

Data from the JMH benchmarks for a loop of 16 iterations.
Statistic	plusOperator	stringBuilder	stringBuilderIdeal
Throughput mean (ops/sec)	2,179,349	3,381,174	4,979,080
Throughput CI 99.9% lower	2,090,128	3,123,654	4,796,509
Throughput CI 99.9% upper	2,268,569	3,638,693	5,161,651
gc.alloc.rate.norm mean (bytes/op)	2,816.6	936.33	568.22
gc.alloc.rate.norm CI 99.9% lower	2,816.5	936.29	568.21
gc.alloc.rate.norm CI 99.9% upper	2,816.6	936.37	568.23

Data from the JMH benchmarks for a loop of 256 iterations.
Statistic	plusOperator	stringBuilder	stringBuilderIdeal
Throughput mean (ops/sec)	19,651	263,528	343,222
Throughput CI 99.9% lower	19,189	258,965	335,245
Throughput CI 99.9% upper	20,114	268,090	351,199
gc.alloc.rate.norm mean (bytes/op)	536,651	13,484	8,251.2
gc.alloc.rate.norm CI 99.9% lower	536,648	13,484	8,251.1
gc.alloc.rate.norm CI 99.9% upper	536,654	13,484	8,251.3

Data from the JMH benchmarks for a loop of 4,096 iterations.
Statistic	plusOperator	stringBuilder	stringBuilderIdeal
Throughput mean (ops/sec)	91.120	18,182	20,727
Throughput CI 99.9% lower	86.815	17,432	19,842
Throughput CI 99.9% upper	95.424	18,933	21,612
gc.alloc.rate.norm mean (bytes/op)	134,431,629	213,287	131,180
gc.alloc.rate.norm CI 99.9% lower	134,430,729	213,285	131,177
gc.alloc.rate.norm CI 99.9% upper	134,432,529	213,289	131,182

This data was gathered on a Windows 10 machine with an Intel® Core™ i5-1035G1 CPU and eight gibibytes of RAM, using JMH benchmarks compiled and run on OpenJDK 15.0.1 from within an IDE. (Be aware that for increased precision and reduced data noise, it would be recommended to run from a command line with no other applications running.)

Note that different hardware and different JDK and JVM versions will give different results, so if the findings are important then you should reproduce this benchmark on your own specific environment. However, with differences as strong as are seen in these data, it seems likely that both the "ideal" and "default" approaches will outperform the "wrong" approach on almost all systems for loops of non-trivial sizes.

Takeaway

When concatenating text multiple times using a loop, create a StringBuilder before the loop and use it to build up the text content using the append(String) method. Never use the + operator to concatenate text using a loop.