by JeT
Quick results

Java arrays of arrays can induce a significant memory overhead (and slightly increased access times). Each array object stores a 20-byte header, and objects are aligned to a word (32 or 64 bits) in memory.

For example, 10*1024*1024 arrays of 1 int (4 bytes), i.e. int[10*1024*1024][1], should theoretically take 40Mb. They actually use 280Mb!

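$$\underbrace{20}_{(1)} + \underbrace{10 \times 1024 \times 1024}_{(2)} \times \left(\underbrace{20}_{(3)} + \underbrace{4}_{(4)} + \underbrace{4}_{(5)}\right) \approx 280\,\text{Mb}$$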
  1. header size of the "outer" array
  2. number of elements in the "outer" array (10*1024*1024); it multiplies the cost of one element of the "outer" array, given in parentheses
  3. header size of each "inner" array
  4. size of the int stored in each "inner" array (a Java int is 4 bytes = 32 bits)
  5. alignment of the int to 64 bits (= 8o). Strangely, headers are not aligned… any input on this point is welcome.
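The arithmetic of the decomposition above can be replayed in Java. This sketch only restates the author's terms (1) to (5) and takes 1 Mb as 1024 × 1024 bytes. The class name is illustrative.

```java
// Restates the estimate 20 + 10*1024*1024 * (20 + 4 + 4) bytes from the text.
final class QuickEstimate {
    static final long MB = 1024L * 1024L;       // 1 Mb = 1024 * 1024 bytes
    static final long ROWS = 10L * 1024 * 1024; // number of "inner" arrays, term (2)

    // Estimated footprint in bytes of int[10*1024*1024][1], following terms (1) to (5)
    static long estimatedBytes() {
        long outerHeader = 20;           // (1) header of the "outer" array
        long perInnerArray = 20 + 4 + 4; // (3) header + (4) int + (5) alignment
        return outerHeader + ROWS * perInnerArray;
    }

    public static void main(String[] args) {
        long bytes = estimatedBytes();
        System.out.println(bytes + " bytes = " + bytes / MB + " Mb"); // 293601300 bytes = 280 Mb
    }
}
```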


Introduction

In many scientific applications, we have to deal with grids or matrices of two or more dimensions. A simple and practical way to store such objects is a multidimensional array (an array of arrays of arrays ...). We will show that, with basic Java arrays (new type[][][]...), this can induce a huge memory overhead. Access times are not discussed here, but they are also slightly increased because the storage is not contiguous in memory.

A similar study has been done before, with slightly different results: http://www.javamex.com/tutorials/memory/array_memory_usage.shtml

For computing memory usage in general, this page is also worth reading: http://www.javamex.com/tutorials/memory/object_memory_usage.shtml

Finally, we propose a data object for storing multidimensional arrays efficiently.

Conventions and notations

  • 1 byte = 1 octet = 8 bits
  • Mb = Mo
  • a word is architecture-dependent (32 or 64 bits, i.e. 4o or 8o respectively)

Java array structure

A Java array is a special object made of a header and its content. A 2D array is an array of arrays, and it is not necessarily rectangular (rows may have different sizes). As a consequence, the content of a multidimensional array may be scattered in memory, which increases access time. The present study does not deal with arbitrary arrays of arrays. It covers only regular ones, where each dimension has a constant size (2D: rectangular grids, 3D: parallelepiped grids, ...).
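The following Java snippet illustrates this point. Each row of a 2D array is a separate array object reached through a reference. Rows can therefore have different lengths, and any row can be replaced individually, even after a "rectangular" allocation. The class name is illustrative.

```java
import java.util.Arrays;

// Shows that a Java int[][] is an array of independent row arrays.
final class JaggedDemo {
    static int[][] build() {
        int[][] grid = new int[3][]; // outer array of 3 references, all null for now
        grid[0] = new int[1];        // each row is a separate object with its own header
        grid[1] = new int[4];
        grid[2] = new int[2];
        return grid;
    }

    static int[] rowLengths(int[][] grid) {
        int[] lengths = new int[grid.length];
        for (int i = 0; i < grid.length; i++) {
            lengths[i] = grid[i].length;
        }
        return lengths;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(rowLengths(build()))); // [1, 4, 2]

        // Even a "rectangular" allocation creates one object per row:
        int[][] rect = new int[2][3];
        rect[1] = new int[5]; // a row can be swapped for one of another length
        System.out.println(Arrays.toString(rowLengths(rect)));    // [3, 5]
    }
}
```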

C/C++ array structure

C/C++ multidimensional arrays (e.g. int a[N][M]) are stored as one contiguous one-dimensional block. The compiler converts the multiple coordinates into a single offset from the base pointer. Because these arrays are contiguous in memory, access is faster than in Java, which has to follow one array reference per dimension.
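The same one-offset addressing can be emulated in Java by storing a rectangular grid in a single int[] in row-major order. This is a minimal sketch of that common technique. It is not necessarily the data object the author proposes, and the class and method names are hypothetical.

```java
// Hypothetical 2D grid of ints backed by one contiguous int[] (row-major order).
final class FlatIntGrid {
    private final int rows;
    private final int cols;
    private final int[] data; // a single array object: one header for the whole grid

    FlatIntGrid(int rows, int cols) {
        long size = (long) rows * cols;
        if (rows < 0 || cols < 0 || size > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("invalid grid size");
        }
        this.rows = rows;
        this.cols = cols;
        this.data = new int[(int) size];
    }

    // Converts (row, col) into one offset, as a C compiler does for int a[rows][cols]
    private int index(int row, int col) {
        if (row < 0 || row >= rows || col < 0 || col >= cols) {
            throw new IndexOutOfBoundsException("(" + row + ", " + col + ")");
        }
        return row * cols + col;
    }

    int get(int row, int col) { return data[index(row, col)]; }

    void set(int row, int col, int value) { data[index(row, col)] = value; }

    int rows() { return rows; }

    int cols() { return cols; }
}
```

For example, new FlatIntGrid(10 * 1024 * 1024, 1) holds the same data as case 1 below in a single int[] of 10*1024*1024 elements. Only one array header is paid instead of one per row. The explicit bounds check matters: without it, an out-of-range column would silently read a neighbouring row instead of throwing.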

Environment

The following results were obtained on a Core 2 Duo x86-64 laptop running Fedora 15, with 4Gb of RAM and the default 1.6 JVM:
> java -version
java version "1.6.0_25"
Java(TM) SE Runtime Environment (build 1.6.0_25-b06)
Java HotSpot(TM) 64-Bit Server VM (build 20.0-b11, mixed mode)

> cat /proc/version
Linux version 2.6.38.8-35.fc15.x86_64 (mockbuild@x86-09.phx2.fedoraproject.org) (gcc version 4.6.0 20110530 (Red Hat 4.6.0-9) (GCC) ) #1 SMP Wed Jul 6 13:58:54 UTC 2011

> cat /proc/cpuinfo | grep "model name"
model name : Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz
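If the measurements are repeated on another machine, the relevant environment details can also be printed from inside the Java program. The sketch below uses standard system properties and Runtime methods. The class name is illustrative.

```java
// Prints the JVM and OS details that matter for memory-layout experiments.
final class EnvInfo {
    static String describe() {
        Runtime rt = Runtime.getRuntime();
        return "java.version=" + System.getProperty("java.version")
                + "\njava.vm.name=" + System.getProperty("java.vm.name")
                + "\njava.vm.version=" + System.getProperty("java.vm.version")
                + "\nos.name=" + System.getProperty("os.name")
                + "\nos.version=" + System.getProperty("os.version")
                + "\nos.arch=" + System.getProperty("os.arch")
                + "\nprocessors=" + rt.availableProcessors()
                + "\nmaxMemory(Mb)=" + rt.maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

The maximum heap size is worth recording: the largest case in the tables below needs about 600Mb.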

Memory usage

The study below deals with 2D rectangular arrays of integers. One dimension holds 10M (10*1024*1024) elements. The other dimension, X, is small and varies from 1 to 9 elements. The resulting array can be stored either as int[10*1024*1024][X] or as int[X][10*1024*1024]. These are the two cases below.

Theoretical analysis

A Java int primitive is stored on 4 bytes, so the minimum size of the data is 4 × 10M × X bytes (40Mb × X):

X            1      2      3      4      5      6      7      8      9
array size   40Mb   80Mb   120Mb  160Mb  200Mb  240Mb  280Mb  320Mb  360Mb
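The row above is simply 4 bytes × 10*1024*1024 × X. A trivial Java check of these values follows, with 1 Mb taken as 1024 × 1024 bytes. The names are illustrative.

```java
// Recomputes the theoretical minimum size: 4 bytes per int, 10 * 1024 * 1024 * X ints.
final class MinimumSize {
    static final long MB = 1024L * 1024L;    // 1 Mb = 1024 * 1024 bytes
    static final long N = 10L * 1024 * 1024; // large dimension

    static long minimumMb(int x) {
        return 4L * N * x / MB;
    }

    public static void main(String[] args) {
        for (int x = 1; x <= 9; x++) {
            System.out.println("X=" + x + ": " + minimumMb(x) + " Mb"); // 40 Mb ... 360 Mb
        }
    }
}
```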

case 1: large x small array

In this case we have a large number of small arrays: int[][] array = new int[10*1024*1024][X];

X            1      2      3      4      5      6      7      8      9
array size   280Mb  280Mb  360Mb  360Mb  440Mb  440Mb  520Mb  520Mb  600Mb

case 2: small x large array

In this case we have a small number of large arrays: int[][] array = new int[X][10*1024*1024];

X            1      2      3      4      5      6      7      8      9
array size   40Mb   80Mb   120Mb  160Mb  200Mb  240Mb  280Mb  320Mb  360Mb

results analysis

As we can see, there is a huge, almost constant difference between the two approaches: 240Mb for odd values of X and 200Mb for even values. This alternation is a threshold phenomenon between odd and even values of X.

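    • The constant part of the difference comes from the 10M inner arrays of case 1. Each one has its own header, which stores among other values a reference to the array's class and the array length. The outer array also holds one reference per inner array. This header is not imposed by the JVM specification and may vary depending on the concrete JVM implementation.
    • The odd/even threshold is due to classical word alignment. Each array is padded to a multiple of 64 bits (8o), so adding one 4-byte int to every inner array only increases the size for every other value of X.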
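The sketch below gives one layout model that reproduces every value of the two tables. It is an assumption, not something measured in this article. It supposes that the 64-bit HotSpot VM used compressed object references, which gives a 16-byte array header, 4-byte references, and objects padded to a multiple of 8 bytes. Under this model, each row of case 1 costs 4 bytes for its reference in the outer array, plus the inner array rounded up to 8 bytes. The class name is illustrative.

```java
// Layout model (assumption): 16-byte array header, 4-byte references,
// every object padded to a multiple of 8 bytes.
final class ArrayLayoutModel {
    static final long MB = 1024L * 1024L;    // 1 Mb = 1024 * 1024 bytes
    static final long N = 10L * 1024 * 1024; // large dimension
    static final long HEADER = 16;           // assumed array header size
    static final long REF = 4;               // assumed (compressed) reference size
    static final long INT = 4;               // size of a Java int
    static final long ALIGN = 8;             // assumed object alignment

    // Rounds a size up to the next multiple of ALIGN
    static long align(long bytes) {
        return (bytes + ALIGN - 1) / ALIGN * ALIGN;
    }

    // Estimated footprint of new int[rows][cols]: outer array of references + rows inner arrays
    static long bytes(long rows, long cols) {
        long outer = align(HEADER + rows * REF);
        long inner = align(HEADER + cols * INT);
        return outer + rows * inner;
    }

    static long case1Mb(int x) { return bytes(N, x) / MB; } // int[10*1024*1024][X]

    static long case2Mb(int x) { return bytes(x, N) / MB; } // int[X][10*1024*1024]

    public static void main(String[] args) {
        for (int x = 1; x <= 9; x++) {
            System.out.println("X=" + x + "  case 1: " + case1Mb(x) + " Mb  case 2: " + case2Mb(x) + " Mb");
        }
    }
}
```

Under the same assumptions, the difference between the two cases is 10M × (4 + 16) bytes = 200Mb of references and headers. For odd X, another 10M × 4 bytes = 40Mb of padding is added.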

Sharing arrays between Java and C++ using JNI

Extracted from the Android JNI tips page (http://android.git.kernel.org/?p=platform/dalvik.git;a=blob_plain;f=docs/jni-tips.html;hb=HEAD#FAQSharing):
FAQ: Sharing raw data with native code

You may find yourself in a situation where you need to access a large buffer of raw data from code written in Java and C/C++. Common examples include manipulation of bitmaps or sound samples. There are two basic approaches.

You can store the data in a byte[]. This allows very fast access from code written in Java. On the native side, however, you're not guaranteed to be able to access the data without having to copy it. In some implementations, GetByteArrayElements and GetPrimitiveArrayCritical will return actual pointers to the raw data in the managed heap, but in others it will allocate a buffer on the native heap and copy the data over.

The alternative is to store the data in a direct byte buffer. These can be created with java.nio.ByteBuffer.allocateDirect, or the JNI NewDirectByteBuffer function. Unlike regular byte buffers, the storage is not allocated on the managed heap, and can always be accessed directly from native code (get the address with GetDirectBufferAddress). Depending on how direct byte buffer access is implemented in the VM, accessing the data from code written in Java can be very slow.
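As an illustration of the direct-buffer option, the sketch below stores a rectangular int grid in a direct ByteBuffer, using the same row-major offset as C. ByteBuffer.allocateDirect, order, asIntBuffer and isDirect are standard java.nio methods. The class name and the grid layout are assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

// Hypothetical int grid stored in a direct buffer, outside the managed heap (row-major order).
final class DirectIntGrid {
    private final int rows;
    private final int cols;
    private final ByteBuffer bytes; // direct: native code can get its address (GetDirectBufferAddress)
    private final IntBuffer ints;   // int view sharing the same memory

    DirectIntGrid(int rows, int cols) {
        long size = 4L * rows * cols; // 4 bytes per int
        if (rows < 0 || cols < 0 || size > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("invalid grid size");
        }
        this.rows = rows;
        this.cols = cols;
        // native byte order, so that C/C++ code reads the ints without byte swapping
        this.bytes = ByteBuffer.allocateDirect((int) size).order(ByteOrder.nativeOrder());
        this.ints = bytes.asIntBuffer(); // the view takes the byte order set just above
    }

    private int index(int row, int col) {
        if (row < 0 || row >= rows || col < 0 || col >= cols) {
            throw new IndexOutOfBoundsException("(" + row + ", " + col + ")");
        }
        return row * cols + col;
    }

    int get(int row, int col) { return ints.get(index(row, col)); }

    void set(int row, int col, int value) { ints.put(index(row, col), value); }

    ByteBuffer buffer() { return bytes; } // e.g. to hand to a native method
}
```

Java-side accesses here go through buffer method calls. As the quoted page notes, depending on the VM this may be slower than plain array access.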

The choice of which to use depends on two factors:

Will most of the data accesses happen from code written in Java or in C/C++? If the data is eventually being passed to a system API, what form must it be in? (For example, if the data is eventually passed to a function that takes a byte[], doing processing in a direct ByteBuffer might be unwise.)

If there's no clear winner, use a direct byte buffer. Support for them is built directly into JNI, and access to them from code written in Java can be made faster with VM improvements.

This article studies the overhead of Java multidimensional arrays for uniform grids (i.e. rows, columns and any other dimension have a constant size along the grid).


Source code

public class ArrayMemory {
    // Prints the heap figures in Mb (1 Mb = 1024 * 1024 bytes)
    static void printMemory() {
        System.gc(); // only a hint: asks the JVM to collect garbage before the reading
        Runtime rt = Runtime.getRuntime();
        long mb = 1024 * 1024;
        System.out.println("Free: " + rt.freeMemory() / mb);
        System.out.println("Max: " + rt.maxMemory() / mb);
        System.out.println("Total: " + rt.totalMemory() / mb);
        System.out.println("Used: " + (rt.totalMemory() - rt.freeMemory()) / mb);
    }

    public static void main(String[] args) {
        printMemory();
        int sizeArray = 10 * 1024 * 1024; // large dimension, as in the tables above
        int dim = 1;                      // small dimension X
        int[][] pos = new int[sizeArray][dim]; // case 1; use new int[dim][sizeArray] for case 2
        System.out.println("dim=" + dim);
        printMemory();
        System.out.println("rows=" + pos.length); // keeps pos reachable until after the reading
    }
}



posted by JeT at Jul 22, 2011 2:32 PM
This study has only been made with one JVM on one 64-bit architecture. Any input from other JVMs or architectures is welcome.


