Artenum Innovative Solutions - Multi Dimensional Java Arrays

Quick results

A Java array of arrays can induce a significant memory overhead (and slightly slower access). Each array object stores a 20-byte header, and objects are aligned to a word (32 or 64 bits) in memory.

For example, an array of 10M (10*1024*1024) one-element int arrays, int[10*1024*1024][1], holds 10M ints of 4 bytes each and should therefore occupy 40Mb. It actually uses 280Mb!

20 + 10M * ( 20 + 4 + 4 ) ~= 280Mb
(1)  (2)    (3)  (4) (5)

(1) header size of the "outer" array
(2) number of elements in the "outer" array (10M = 10*1024*1024); it multiplies the parenthesized term, which is the memory used per element of the "outer" array
(3) header size of each "inner" array
(4) size of the int stored in each "inner" array (a Java int is 4 bytes = 32 bits)
(5) padding that aligns the int data to 64 bits (8 bytes). Strangely, headers do not seem to be aligned; any input on this point is welcome.

Introduction

In many scientific applications, we have to deal with grids or matrices of two or more dimensions. A simple and practical approach for storing such objects is multidimensional arrays (array of array of array ...). We will show that this can induce a huge memory overhead when using basic Java arrays (new type[][][]...). Access times are not discussed here, but they are also slightly increased because the storage is not contiguous in memory.

A similar study has been done before, with slightly different results: http://www.javamex.com/tutorials/memory/array_memory_usage.shtml

On computing memory usage, this page is also worth reading: http://www.javamex.com/tutorials/memory/object_memory_usage.shtml

Finally we propose a data object for storing multidimensional arrays efficiently.

Conventions and notations

1 byte = 1 octet = 8 bits
Mb = Mo = 1 megabyte (1024 x 1024 bytes)
a word is architecture dependent (32 bits or 64 bits, i.e. 4 or 8 bytes)

Java array structure

A Java array is a special object with its own header and content. A 2D array is an array of arrays, which is not necessarily rectangular (rows may have different sizes). As a consequence, the content of a multidimensional array may be scattered in memory, which increases access time. The present study does not deal with arbitrary arrays of arrays but only with regular ones, where every row has the same size along each dimension (2D: rectangular grids, 3D: parallelepiped grids, ...).
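To make the "array of arrays" structure concrete, here is a minimal sketch of a non-rectangular 2D array. The class and method names (JaggedArrayDemo, triangular) are made up for this illustration. It shows that the outer array only holds references and that each row is allocated as its own array object.

```java
public class JaggedArrayDemo {

    /** Builds a triangular (non-rectangular) 2D array: row i has i + 1 elements. */
    static int[][] triangular(int rows) {
        int[][] a = new int[rows][];   // outer array of row references, all null at first
        for (int i = 0; i < rows; i++) {
            a[i] = new int[i + 1];     // each row is a separate array object
        }
        return a;
    }

    public static void main(String[] args) {
        int[][] t = triangular(3);
        for (int[] row : t) {
            System.out.println(row.length);   // prints 1, then 2, then 3
        }
    }
}
```

Because every row is a distinct object, nothing forces the rows to be adjacent in memory, and each one carries its own header.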

C/C++ array structure

C/C++ multidimensional arrays are stored as a contiguous one-dimensional array. The compiler converts the multiple coordinates into a single pointer offset. Because C/C++ multidimensional arrays are contiguous in memory, accesses are faster than in Java, which has to follow one reference per dimension.
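The same row-major layout can be imitated in Java by backing a grid with a single one-dimensional array. The following is only a sketch under that assumption, not the data object announced in the introduction; the class name FlatIntGrid and its methods are hypothetical.

```java
public class FlatIntGrid {
    private final int rows;
    private final int cols;
    private final int[] data;   // single contiguous array: one header for the whole grid

    public FlatIntGrid(int rows, int cols) {
        this.rows = rows;
        this.cols = cols;
        // Note: rows * cols must fit in an int; very large grids would need another scheme.
        this.data = new int[rows * cols];
    }

    /** Converts (row, col) into an offset in the backing array (row-major order). */
    private int index(int r, int c) {
        if (r < 0 || r >= rows || c < 0 || c >= cols) {
            throw new IndexOutOfBoundsException("(" + r + ", " + c + ")");
        }
        return r * cols + c;
    }

    public int get(int r, int c) {
        return data[index(r, c)];
    }

    public void set(int r, int c, int value) {
        data[index(r, c)] = value;
    }

    public static void main(String[] args) {
        FlatIntGrid grid = new FlatIntGrid(4, 3);
        grid.set(2, 1, 42);
        System.out.println(grid.get(2, 1));   // prints 42
    }
}
```

With this layout, the per-row headers disappear: the grid costs one array header plus 4 bytes per int, whatever its shape.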

Environment

The following results have been obtained on a core 2 duo laptop x86 64b, Fedora15, 4Gb RAM, default JVM 1.6.

> java -version
java version "1.6.0_25"
Java(TM) SE Runtime Environment (build 1.6.0_25-b06)
Java HotSpot(TM) 64-Bit Server VM (build 20.0-b11, mixed mode)
>cat /proc/version
Linux version 2.6.38.8-35.fc15.x86_64 (mockbuild@x86-09.phx2.fedoraproject.org) 
(gcc version 4.6.0 20110530 (Red Hat 4.6.0-9) (GCC) ) #1 SMP Wed Jul 6 13:58:54 
UTC 2011
>cat /proc/cpuinfo | grep "model name"
model name    : Intel(R) Core(TM)2 Duo CPU     T7500  @ 2.20GHz

Memory usage

This study deals with 2D rectangular arrays of integers. One dimension holds 10M (10 x 1024 x 1024) elements and the other dimension (X) is small, varying from 1 to 9 elements. The resulting array can be stored as int[10M][X] or int[X][10M]; these are the two cases studied below.

Theoretical analysis

A Java int primitive is stored on 4 bytes. The minimum array size is therefore 4 x 10M x X bytes, i.e. 40 x X Mb.

X (elements)	1	2	3	4	5	6	7	8	9
array size	40Mb	80Mb	120Mb	160Mb	200Mb	240Mb	280Mb	320Mb	360Mb

case 1: large x small array

In this case we have a large number of small arrays: int[][] array = new int[10*1024*1024][X]

X (elements)	1	2	3	4	5	6	7	8	9
array size	280Mb	280Mb	360Mb	360Mb	440Mb	440Mb	520Mb	520Mb	600Mb

case 2: small x large array

In this case we have a small number of large arrays: int[][] array = new int[X][10*1024*1024]

X (elements)	1	2	3	4	5	6	7	8	9
array size	40Mb	80Mb	120Mb	160Mb	200Mb	240Mb	280Mb	320Mb	360Mb

results analysis

As we can see, there is a huge, nearly constant difference between the two approaches (240Mb for odd X, 200Mb for even X) and a threshold phenomenon: in case 1 the size only grows every other value of X.

The nearly constant difference is due to Java array headers, which store the memory location, the object size and the number of elements (and maybe some other values...). This header is not imposed by the JVM specification and may vary depending on the concrete JVM implementation.
The threshold phenomenon is due to classical word alignment: the data of each small array is padded to a multiple of 8 bytes.
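The two tables can be reproduced with a small calculation that follows the formula from the quick results term by term: a 20-byte header per array, plus the int data padded to a multiple of 8 bytes. This is only a sketch of that model, with the 20-byte header taken from the measurements above rather than from any JVM documentation; the class and method names are hypothetical.

```java
public class ArrayOverheadModel {
    static final long HEADER = 20;    // per-array header size assumed by this article
    static final long INT_SIZE = 4;   // a Java int is 4 bytes
    static final long MB = 1024L * 1024L;

    /** Rounds n up to the next multiple of 8 bytes (64-bit alignment). */
    static long alignTo8(long n) {
        return (n + 7) / 8 * 8;
    }

    /** Estimated bytes used by one int[length] under the model. */
    static long intArrayBytes(long length) {
        return HEADER + alignTo8(INT_SIZE * length);
    }

    /** Estimated bytes used by new int[outer][inner] under the model. */
    static long estimate(long outer, long inner) {
        return HEADER + outer * intArrayBytes(inner);
    }

    public static void main(String[] args) {
        long n = 10L * MB;   // 10M elements in the large dimension
        System.out.println("X\tcase 1\tcase 2");
        for (int x = 1; x <= 9; x++) {
            long case1 = estimate(n, x) / MB;   // int[10M][X]
            long case2 = estimate(x, n) / MB;   // int[X][10M]
            System.out.println(x + "\t" + case1 + "Mb\t" + case2 + "Mb");
        }
    }
}
```

For X = 1 the model gives 280Mb in case 1 and 40Mb in case 2, matching the measured values. It says nothing about what the header actually contains, which remains JVM dependent.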

Sharing arrays between Java and C++ using JNI

Extracted from the Android JNI tips page (http://android.git.kernel.org/?p=platform/dalvik.git;a=blob_plain;f=docs/jni-tips.html;hb=HEAD#FAQSharing)

FAQ: Sharing raw data with native code
You may find yourself in a situation where you need to access a large buffer of raw data from code written in Java and C/C++. Common examples include manipulation of bitmaps or sound samples. There are two basic approaches.
You can store the data in a byte[]. This allows very fast access from code written in Java. On the native side, however, you're not guaranteed to be able to access the data without having to copy it. In some implementations, GetByteArrayElements and GetPrimitiveArrayCritical will return actual pointers to the raw data in the managed heap, but in others it will allocate a buffer on the native heap and copy the data over.
The alternative is to store the data in a direct byte buffer. These can be created with java.nio.ByteBuffer.allocateDirect, or the JNI NewDirectByteBuffer function. Unlike regular byte buffers, the storage is not allocated on the managed heap, and can always be accessed directly from native code (get the address with GetDirectBufferAddress). Depending on how direct byte buffer access is implemented in the VM, accessing the data from code written in Java can be very slow.
The choice of which to use depends on two factors:
Will most of the data accesses happen from code written in Java or in C/C++?
If the data is eventually being passed to a system API, what form must it be in? (For example, if the data is eventually passed to a function that takes a byte[], doing processing in a direct ByteBuffer might be unwise.)
If there's no clear winner, use a direct byte buffer. Support for them is built directly into JNI, and access to them from code written in Java can be made faster with VM improvements.
This article studies the overhead of Java multidimensional arrays for uniform grids (i.e. rows, columns and any other dimension have the same size along the grid).
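The direct byte buffer approach mentioned in the FAQ above can be sketched as follows on the Java side. The class and method names (DirectBufferDemo, allocateDirectInts) are made up for this example, and the native side (GetDirectBufferAddress) is not shown. It allocates the storage outside the managed heap and views it as ints.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

public class DirectBufferDemo {

    /** Allocates room for 'count' ints outside the managed heap and returns an int view of it. */
    static IntBuffer allocateDirectInts(int count) {
        return ByteBuffer.allocateDirect(count * 4)   // 4 bytes per int
                .order(ByteOrder.nativeOrder())       // same byte order as native code would see
                .asIntBuffer();
    }

    public static void main(String[] args) {
        int rows = 2;
        int cols = 3;
        IntBuffer grid = allocateDirectInts(rows * cols);
        grid.put(1 * cols + 1, 42);                   // element (1, 1), row-major
        System.out.println(grid.get(1 * cols + 1));   // prints 42
        System.out.println(grid.isDirect());          // prints true
    }
}
```

Since the buffer is one contiguous block, a uniform grid stored this way has no per-row header either; whether Java-side access is fast enough depends on the VM, as the FAQ points out.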

Source code

public class ArrayMemoryTest {   // class name is arbitrary; the original listing only showed main()

    // Print the JVM heap figures in megabytes.
    private static void printMemory() {
        Runtime rt = Runtime.getRuntime();
        System.out.println("Free: " + rt.freeMemory() / 1024 / 1024);
        System.out.println("Max: " + rt.maxMemory() / 1024 / 1024);
        System.out.println("Total: " + rt.totalMemory() / 1024 / 1024);
        System.out.println("Used: " + (rt.totalMemory() - rt.freeMemory()) / 1024 / 1024);
    }

    public static void main(String[] args) {
        printMemory();   // before allocation
        int sizeArray = 1024 * 1024;
        int dim = 1;
        // allocate array
        float[][] pos = new float[sizeArray][dim];
        System.out.println("dim=" + dim);
        printMemory();   // after allocation
    }
}

Multidimensional Java Arrays

posted by JeT at Jul 22, 2011 2:32 PM

This study was made using only one JVM and one 64-bit architecture. Any input concerning other configurations is welcome.