Monday, 22 February 2016

Memory leak problem in substring method of String

Another very popular interview question.

The buildup usually starts with basic questions: what are immutable classes, are they thread safe, and can you give an example of an immutable class from the Java API?

Then the interviewer moves on to the flyweight pattern that String uses (more on this in the next blogs) and to "how many objects are created" sort of questions. After that come the substring method and the topic under discussion: the memory leak problem in the substring method of String.

Memory leak problem in String in Java 6

Before we can understand this, we need to understand what a String is. In Java, String is nothing but a class (and NOT A PRIMITIVE TYPE).

In Java 6, a String object is backed by a char array.

The String class has two other attributes apart from the char[]: offset and count.

They store the index of the first character in the array and the number of characters in the String.

For example, for String s = "TEST" the String object is in the state below:

char[] = {'T','E','S','T'};

offset = 0 and count = 4.

On the heap (inside the String literal pool, more on this in later blogs) a String object is created, and the reference s points to it. This object in turn refers to the char[] holding TEST.

Now let's see what happens when the substring method of String is called.

When substring is called, String internally calls one of its own constructors, passing the new offset, the new count and the original char[].
That constructor creates a new String object.
The main thing to note here is that the new object refers to the same array; only the offset and count change. It looks something like the code snippet below.

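// Simplified from the Java 6 String class (bounds checks omitted)
String(int offset, int count, char value[]) {
      this.value = value;    // same array as the original String, no copy
      this.offset = offset;  // index of the first character in value
      this.count = count;    // number of characters in this String
}

public String substring(int beginIndex, int endIndex) {
     // new String object, but backed by the existing char[]
     return new String(offset + beginIndex, endIndex - beginIndex, value);
}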

For example, if we call substring(0,3) on "TEST", the new String object is in the state below:

char[] = {'T','E','S','T'}
offset = 0
count = 3
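To make the sharing visible, here is a minimal sketch of a class laid out the way the Java 6 String is described above. SharedString is a hypothetical name used only for illustration; it is not the JDK source, and it leaves out almost everything the real class does.

```java
// Hypothetical model of the Java 6 String layout; NOT the real JDK code.
final class SharedString {
    final char[] value;  // backing array, shared between substrings
    final int offset;    // index of the first character in value
    final int count;     // number of characters in this string

    SharedString(String s) {
        this(0, s.length(), s.toCharArray());
    }

    private SharedString(int offset, int count, char[] value) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    SharedString substring(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > count || beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException();
        }
        // No copy is made: the new object points at the same array.
        return new SharedString(offset + beginIndex, endIndex - beginIndex, value);
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }
}
```

Calling new SharedString("TEST").substring(0, 3) gives an object whose value field is the very same array as the original's, with offset 0 and count 3, which is the state listed above.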

First we need to understand, very briefly, what a memory leak means and how the garbage collection mechanism in Java works (more on this in later blogs).

In Java we do not need to free memory ourselves; the JVM does it for us. The JVM runs a daemon thread for garbage collection, which periodically scans for objects that are no longer reachable through any reference and marks them for garbage collection.

A memory leak (more on this in later blogs) occurs when, for some reason, the GC is unable to mark an object for garbage collection although the object is functionally no longer required.

Now coming back to our problem. The String TEST from our example is far too small for any sort of problem to occur. So let us assume we have a huge String of size 100MB (which may be rare) and we call substring(0,2) on it.

Functionally, therefore, we are using only a very small part of it.

But in Java 6 the substring still points to the same 100MB char array. Even when the original String is no longer referenced anywhere, the array cannot be garbage collected as long as the small substring is alive (although we need only two characters of it). Thus a memory leak occurs.
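A common way around this on Java 6 is to wrap the result in new String(...), because the String(String) constructor there copies only the characters it needs when the backing array is larger than the string. The sketch below shows the idea; SubstringCopy and detachedSubstring are hypothetical names chosen for this example. On Java 7 update 6 and later, substring itself copies the characters, so the extra wrapping is not needed there.

```java
// Hypothetical helper; the class and method names are made up for this example.
final class SubstringCopy {

    // Returns the substring wrapped in a new String. On Java 6 the
    // String(String) constructor trims the backing array to the needed
    // characters, so the large original array is no longer referenced.
    static String detachedSubstring(String source, int beginIndex, int endIndex) {
        return new String(source.substring(beginIndex, endIndex));
    }
}
```

For the 100MB case, we would keep only SubstringCopy.detachedSubstring(huge, 0, 2) and let go of huge, so the big array becomes eligible for garbage collection.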
