

by JeT - (no comment)

Introduction

In many scientific applications, we have to deal with grids or matrices of two or more dimensions. A simple and practical way to store such objects is a multidimensional array (an array of arrays of arrays ...). We will demonstrate that, with basic Java arrays (new type[][][]...), this can induce a large overhead in both access time and memory consumption.

A similar study has been published before, with slightly different results: http://www.javamex.com/tutorials/memory/array_memory_usage.shtml

On computing memory usage, this page is also worth reading: http://www.javamex.com/tutorials/memory/object_memory_usage.shtml

Finally we propose a data object for storing multidimensional arrays efficiently.

Java array structure

A Java array is a special object made of a header and its content. A 2D array is an array of arrays, and it is not necessarily rectangular (rows may have different sizes). Because each row is a separate object, the content of a multidimensional array may be scattered across memory, which increases access time.
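The "array of arrays" structure described above can be observed directly. The following is a minimal sketch; the class name JaggedDemo and the helper triangle are hypothetical names chosen for this illustration only.

```java
class JaggedDemo {
    /** Builds a triangular array: row i holds i + 1 ints. */
    static int[][] triangle(int rows) {
        int[][] t = new int[rows][];   // only the outer array of references is allocated here
        for (int i = 0; i < rows; i++) {
            t[i] = new int[i + 1];     // each row is its own heap object, with its own header
        }
        return t;
    }

    public static void main(String[] args) {
        int[][] t = triangle(3);
        System.out.println(t[0].length + " " + t[1].length + " " + t[2].length);

        // Even a "rectangular" array is made of independent row objects.
        int[][] r = new int[2][3];
        System.out.println(r[0] != r[1]);

        // Rows are plain references: after this assignment both indices alias one row.
        r[1] = r[0];
        r[0][2] = 7;
        System.out.println(r[1][2]);
    }
}
```

Since every row is a separate object, the JVM is free to place rows anywhere on the heap, and each element access goes through one extra reference per dimension.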

C/C++ array structure

C/C++ multidimensional arrays are stored as one contiguous one-dimensional array. The compiler converts the multiple coordinates into a single offset from the base pointer. Contrary to what Java programmers often assume, this pointer arithmetic is not free either and can also add to the access time.
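The same row-major layout can be imitated in Java by backing a grid with a single one-dimensional array and computing the offset by hand. The sketch below is only an illustration under that assumption; FlatGrid and its methods are hypothetical names, and this is not necessarily the data object the article has in mind.

```java
final class FlatGrid {
    private final int rows;
    private final int cols;
    private final int[] data;   // one contiguous block: a single header, no per-row objects

    FlatGrid(int rows, int cols) {
        this.rows = rows;
        this.cols = cols;
        // multiplyExact throws ArithmeticException instead of silently overflowing
        this.data = new int[Math.multiplyExact(rows, cols)];
    }

    /** Row-major offset, the same computation a C compiler emits for a[r][c]. */
    int index(int r, int c) {
        if (r < 0 || r >= rows || c < 0 || c >= cols) {
            throw new IndexOutOfBoundsException("(" + r + ", " + c + ")");
        }
        return r * cols + c;
    }

    int get(int r, int c) {
        return data[index(r, c)];
    }

    void set(int r, int c, int value) {
        data[index(r, c)] = value;
    }

    public static void main(String[] args) {
        FlatGrid g = new FlatGrid(4, 3);
        g.set(2, 1, 42);
        System.out.println(g.index(2, 1) + " -> " + g.get(2, 1));
    }
}
```

The trade-off is one multiplication and one addition per access in exchange for a single allocation and contiguous storage.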

Environment

The following results were obtained using a single JVM on a single 64-bit architecture.

Memory usage

This study deals with 2D rectangular arrays of integers. One dimension is large, 10M elements (10 x 1024 x 1024), and the other dimension (X) is small, varying from 1 to 9 elements. The resulting array can be stored either as int[10M][X] or as int[X][10M]; these are the two cases examined below.

theoretical analysis

A Java integer (int) primitive value is stored on 4 bytes. The minimum array size is therefore 4 x 10M x X bytes, i.e. 40 x X MB:

X (elements)  1       2       3       4       5       6       7       8       9
array size    40 MB   80 MB   120 MB  160 MB  200 MB  240 MB  280 MB  320 MB  360 MB

case 1: large x small array

In this case we have a large number of small arrays: int[][] array = new int[10*1024*1024][X]

X (elements)  1       2       3       4       5       6       7       8       9
array size    280 MB  280 MB  360 MB  360 MB  440 MB  440 MB  520 MB  520 MB  600 MB

case 2: small x large array

In this case we have a small number of large arrays: int[][] array = new int[X][10*1024*1024]

X (elements)  1       2       3       4       5       6       7       8       9
array size    40 MB   80 MB   120 MB  160 MB  200 MB  240 MB  280 MB  320 MB  360 MB
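The figures in both tables can be reproduced with a simple estimate. The sketch below assumes a 16-byte array header, 4-byte (compressed) references and 8-byte object alignment; these three constants are assumptions, not values stated in this article, and they will differ on other JVMs or settings. The class name ArrayFootprint and its methods are hypothetical.

```java
class ArrayFootprint {
    static final long HEADER = 16;  // assumed array header size, in bytes
    static final long REF = 4;      // assumed size of one reference (compressed pointers)
    static final long ALIGN = 8;    // assumed object alignment, in bytes

    /** Rounds a size up to the next multiple of ALIGN. */
    static long align(long bytes) {
        return (bytes + ALIGN - 1) / ALIGN * ALIGN;
    }

    /** Estimated heap bytes used by new int[rows][cols] under the assumptions above. */
    static long estimate(long rows, long cols) {
        long outer = align(HEADER + REF * rows);  // the array of row references
        long row = align(HEADER + 4 * cols);      // one row of 4-byte ints
        return outer + rows * row;
    }

    public static void main(String[] args) {
        long large = 10L * 1024 * 1024;
        long mb = 1024 * 1024;
        for (int x = 1; x <= 9; x++) {
            System.out.println("X=" + x
                    + "  case 1: " + estimate(large, x) / mb + " MB"
                    + "  case 2: " + estimate(x, large) / mb + " MB");
        }
    }
}
```

Under these assumptions each small row in case 1 costs its header plus alignment padding plus one reference in the outer array, which is why the size grows in steps of 80 MB for every second value of X, while in case 2 the per-row cost is negligible next to 40 MB of data.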

result analysis

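As we can see, case 2 (a small number of large arrays) matches the theoretical minimum, whereas case 1 (a large number of small arrays) needs between 280 MB and 600 MB, which for X = 1 is seven times the minimum. In case 1 the measured size also grows in steps, changing only for every second value of X.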

Sharing arrays between Java and C++ using JNI

Extracted from the Android JNI tips page (http://android.git.kernel.org/?p=platform/dalvik.git;a=blob_plain;f=docs/jni-tips.html;hb=HEAD#FAQSharing)
FAQ: Sharing raw data with native code

You may find yourself in a situation where you need to access a large buffer of raw data from code written in Java and C/C++. Common examples include manipulation of bitmaps or sound samples. There are two basic approaches.

You can store the data in a byte[]. This allows very fast access from code written in Java. On the native side, however, you're not guaranteed to be able to access the data without having to copy it. In some implementations, GetByteArrayElements and GetPrimitiveArrayCritical will return actual pointers to the raw data in the managed heap, but in others it will allocate a buffer on the native heap and copy the data over.

The alternative is to store the data in a direct byte buffer. These can be created with java.nio.ByteBuffer.allocateDirect, or the JNI NewDirectByteBuffer function. Unlike regular byte buffers, the storage is not allocated on the managed heap, and can always be accessed directly from native code (get the address with GetDirectBufferAddress). Depending on how direct byte buffer access is implemented in the VM, accessing the data from code written in Java can be very slow.

The choice of which to use depends on two factors:

1. Will most of the data accesses happen from code written in Java or in C/C++?
2. If the data is eventually being passed to a system API, what form must it be in? (For example, if the data is eventually passed to a function that takes a byte[], doing processing in a direct ByteBuffer might be unwise.)

If there's no clear winner, use a direct byte buffer. Support for them is built directly into JNI, and access to them from code written in Java can be made faster with VM improvements.
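On the Java side, the direct byte buffer approach quoted above could look like the following minimal sketch. It uses java.nio.ByteBuffer.allocateDirect as named in the FAQ; the class name DirectGrid, the allocate helper and the row-major indexing are assumptions made for this illustration. The native side, not shown here, would obtain the address with GetDirectBufferAddress.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

class DirectGrid {
    /** Allocates a rows x cols grid of ints outside the managed heap. */
    static IntBuffer allocate(int rows, int cols) {
        int cells = Math.multiplyExact(rows, cols);
        ByteBuffer bytes = ByteBuffer.allocateDirect(Math.multiplyExact(cells, Integer.BYTES));
        // Use the platform byte order so native code reads the same int values.
        bytes.order(ByteOrder.nativeOrder());
        return bytes.asIntBuffer();
    }

    public static void main(String[] args) {
        int rows = 4;
        int cols = 3;
        IntBuffer grid = allocate(rows, cols);

        // Absolute put/get with a hand-computed row-major index.
        grid.put(2 * cols + 1, 42);
        System.out.println(grid.get(2 * cols + 1));
        System.out.println(grid.isDirect());
    }
}
```

Setting the byte order before creating the int view matters: a ByteBuffer is big-endian by default, regardless of the platform.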

This article studies the overhead of Java multidimensional arrays for uniform grids (i.e. rows, columns and any other dimension have the same size throughout the grid).


Source code

public class ArrayMemory {

    // Print the JVM heap statistics, in MB.
    private static void printMemory() {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024 * 1024;
        System.out.println("Free: " + rt.freeMemory() / mb);
        System.out.println("Max: " + rt.maxMemory() / mb);
        System.out.println("Total: " + rt.totalMemory() / mb);
        System.out.println("Used: " + (rt.totalMemory() - rt.freeMemory()) / mb);
    }

    public static void main(String[] args) {
        // heap state before the allocation
        printMemory();

        int sizeArray = 1024 * 1024;
        int dim = 1;

        // allocate array
        float[][] pos = new float[sizeArray][dim];

        // heap state after the allocation
        System.out.println("dim=" + dim);
        printMemory();
    }
}



posted by JeT at Jul 22, 2011 2:32 PM
This study was carried out using only one JVM and one 64-bit architecture. Results from other configurations are welcome.


