Launching a new mini-series / category on the blog: small performance hints and tips will from now on be logged under "Tales from the Dungeon".

The title of the series is a reference to Akka's internal dungeon package, where all the low-level concurrency machinery for Akka actors is implemented.

The Scala "idiomatic" style of creating arrays is often shown as:

val arr: Array[Byte] = Array.ofDim[Byte](64) // 64 element byte array  

The thing that you most likely missed though, is that this allocates something more -- a ClassTag, which is Scala's way of carrying class type information through methods where the information would otherwise be erased at runtime (due to generic type erasure on the JVM).
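As a quick standalone sketch of why JVM arrays need this treatment at all: unlike generic collections, arrays are reified, so the element type is part of the array's runtime class and must be known at allocation time.

```scala
// Arrays are reified on the JVM: a Byte array and an Int array
// have distinct runtime classes (byte[] vs int[]).
val bytes = new Array[Byte](4)
val ints  = new Array[Int](4)
assert(bytes.getClass != ints.getClass)

// This is also why a bare generic method cannot allocate one:
//   def make[T](n: Int): Array[T] = new Array[T](n)
// fails to compile with "No ClassTag available for T".
```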

As a short reminder, the signature of ofDim is:

def ofDim[T: ClassTag](n1: Int): Array[T] =  
    new Array[T](n1)

So the [T: ClassTag] here summons the tag; it is equivalent to def ofDim[T](n1: Int)(implicit tag: ClassTag[T]), which perhaps makes it clearer where and how the extra allocation happens. This class tag is materialized by the scalac compiler and then passed around, thanks to which we're able to allocate the "right" type of array (be it primitive, object, etc.). This only really matters for arrays, since they're a bit special on the JVM.
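To make the desugaring concrete, here is a minimal sketch (the makeArr names are hypothetical helpers, not part of the standard library):

```scala
import scala.reflect.ClassTag

// The context-bound form...
def makeArr[T: ClassTag](n: Int): Array[T] = new Array[T](n)

// ...is equivalent to taking the tag as an explicit implicit parameter,
// which makes it visible that a ClassTag[T] is supplied at every call site:
def makeArrExplicit[T](n: Int)(implicit tag: ClassTag[T]): Array[T] =
  tag.newArray(n)

val a = makeArr[Byte](64)         // scalac supplies ClassTag.Byte for us
val b = makeArrExplicit[Byte](64)
```

Both calls end up producing a genuine primitive byte[] at runtime.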

We can write a small benchmark to see how big the impact of this additional allocation is. As usual, we're using OpenJDK JMH via sbt-jmh, the go-to tool for microbenchmarking on the JVM. Here's the bench:

import java.util.concurrent.TimeUnit

import org.openjdk.jmh.annotations.{ Benchmark, BenchmarkMode, Mode, OutputTimeUnit }

@BenchmarkMode(Array(Mode.AverageTime))
@OutputTimeUnit(TimeUnit.NANOSECONDS)
class ArraysBenchmark {

  @Benchmark
  def create_ofDim: Array[Byte] =
    Array.ofDim[Byte](32)

  @Benchmark
  def create_new: Array[Byte] =
    new Array[Byte](32)
}

Let's have a look at the stack profiler's output for the two runs:

### ofDim ###
[info] Secondary result "scala.ArraysBenchmark.create_ofDim:·stack":
[info]  82.8%  82.8% s.g.ArraysBenchmark_create_ofDim_jmhTest.create_ofDim_avgt_jmhStub
[info]  17.1%  17.1% scala.reflect.ManifestFactory$$anon$6.newArray
[info]   0.0%   0.0%
[info]   0.0%   0.0% sun.misc.Unsafe.compareAndSwapObject
[info]   0.0%   0.0% sun.misc.Unsafe.getIntVolatile

### new ###
[info] Secondary result "scala.ArraysBenchmark.create_new:·stack":
[info]  80.6%  80.6% s.g.ArraysBenchmark_create_new_jmhTest.create_new_avgt_jmhStub
[info]  19.3%  19.3% scala.ArraysBenchmark.create_new
[info]   0.0%   0.0% sun.reflect.DelegatingMethodAccessorImpl.invoke
[info]   0.0%   0.0% java.util.concurrent.ThreadPoolExecutor.getTask
[info]   0.0%   0.0% java.lang.Class.isPrimitive

We already see a major difference in where the time is being spent -- in the ofDim case, a significant share of the time is spent in the ManifestFactory, which then performs the actual allocation.
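That ManifestFactory frame is simply the ClassTag doing its job: ofDim hands the allocation off to the tag's newArray method. A minimal sketch of the same call made directly:

```scala
import scala.reflect.ClassTag

// Calling newArray on the cached Byte tag performs the same allocation
// that ofDim routes through the manifest machinery seen in the profile.
val viaTag: Array[Byte] = ClassTag.Byte.newArray(32)
assert(viaTag.length == 32)
assert(viaTag.getClass == classOf[Array[Byte]]) // a real primitive byte[]
```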

Let's finally look at the impact of this extra allocation. We'll look at the average time of allocating one such array, and also at the GC impact, which really matters in real-world apps, where you'd like infrastructure code not to cause too much of it. We can do that by running with the gc profiler using jmh:run -f 1 -wi 10 -i 30 -prof gc .*ArraysBench.*:

### new ###
[info] Benchmark                                                  Mode  Cnt     Score     Error   Units
[info] ArraysBenchmark.create_new                                 avgt   30     7.380 ±   2.438   ns/op

[info] ArraysBenchmark.create_new:·gc.alloc.rate                  avgt   30  4476.155 ± 539.496  MB/sec
[info] ArraysBenchmark.create_new:·gc.alloc.rate.norm             avgt   30    48.000 ±   0.001    B/op
[info] ArraysBenchmark.create_new:·gc.churn.G1_Eden_Space         avgt   30  4484.695 ± 655.569  MB/sec
[info] ArraysBenchmark.create_new:·gc.churn.G1_Eden_Space.norm    avgt   30    48.079 ±   3.350    B/op
[info] ArraysBenchmark.create_new:·gc.churn.G1_Old_Gen            avgt   30     0.001 ±   0.001  MB/sec
[info] ArraysBenchmark.create_new:·gc.churn.G1_Old_Gen.norm       avgt   30    ≈ 10⁻⁵              B/op
[info] ArraysBenchmark.create_new:·gc.count                       avgt   30   110.000            counts
[info] ArraysBenchmark.create_new:·gc.time                        avgt   30   351.000                ms

### ofDim ###
[info] ArraysBenchmark.create_ofDim                               avgt   30     6.691 ±   0.357   ns/op

[info] ArraysBenchmark.create_ofDim:·gc.alloc.rate                avgt   30  4581.808 ± 231.257  MB/sec
[info] ArraysBenchmark.create_ofDim:·gc.alloc.rate.norm           avgt   30    48.000 ±   0.001    B/op
[info] ArraysBenchmark.create_ofDim:·gc.churn.G1_Eden_Space       avgt   30  4564.270 ± 425.658  MB/sec
[info] ArraysBenchmark.create_ofDim:·gc.churn.G1_Eden_Space.norm  avgt   30    47.761 ±   3.471    B/op
[info] ArraysBenchmark.create_ofDim:·gc.churn.G1_Old_Gen          avgt   30     0.002 ±   0.001  MB/sec
[info] ArraysBenchmark.create_ofDim:·gc.churn.G1_Old_Gen.norm     avgt   30    ≈ 10⁻⁵              B/op
[info] ArraysBenchmark.create_ofDim:·gc.count                     avgt   30   112.000            counts
[info] ArraysBenchmark.create_ofDim:·gc.time                      avgt   30   350.000                ms

So as you can see, the impact is not large, though it does show up a bit in the Eden churn. The results above also include absolute numbers, but I would not fixate on them; the purpose of this post is to highlight an additional allocation you may not have been aware of before. Note too that in this particular run the two scores overlap within their error bounds (the new variant actually showed the wider variance), so the average times themselves are not conclusive.

The usual caveat applies to this finding: most of the time, in application code, you don't have to worry about such a minor impact and can keep using the idiomatic way. However, if you have code which does care, now you know one more way to avoid needless allocations.

For reference, this was only run on a MacBook Pro, 3.1 GHz Intel Core i7, 16 GB 2133 MHz LPDDR3, though with some background tasks running.

Does this mean one should not use ofDim? Not really; it usually is good enough. However, if it sits in a hot path of your program, you may consider avoiding it. Another thing worth mentioning are the useful overloads that ofDim provides:

/** Creates a 2-dimensional array */
def ofDim[T: ClassTag](n1: Int, n2: Int): Array[Array[T]]

/** Creates a 3-dimensional array */
def ofDim[T: ClassTag](n1: Int, n2: Int, n3: Int): Array[Array[Array[T]]]

which are a bit nicer than having to manually loop through the outer array allocating the inner arrays.
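To illustrate what those overloads save us, here is the two-dimensional case next to a hand-rolled equivalent (a sketch of the manual nesting, not the library's actual implementation):

```scala
// ofDim(3, 4) allocates the outer array plus each of the 3 inner arrays:
val viaOfDim: Array[Array[Int]] = Array.ofDim[Int](3, 4)

// The manual equivalent we'd otherwise write by hand:
val manual: Array[Array[Int]] = {
  val outer = new Array[Array[Int]](3)
  var i = 0
  while (i < 3) {
    outer(i) = new Array[Int](4)
    i += 1
  }
  outer
}

assert(viaOfDim.length == manual.length)
assert(viaOfDim(0).length == manual(0).length)
```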

Hope this mini-post was useful; I hope to write more such mini-posts if I find the time. Until then, happy hakking!