Thursday, 25 February 2016

Can Hash map be used in a multithreaded environment ?

Another very popular interview question.

Well the answer to this is NO.

The basic reason for this is that since hashmap being non-synchronized would not guarantee us atomicity and fail in put if absent type scenarios.

But apart from this is another major problem. The problem of hashmap getting stuck in infinite loop.

To understand this we need to brush up a bit on how hash map works in java (click here).

Now since we all must have understood by now that hash map is backed by an array and hash map is ideally supposed to maintain a get and put of complexity o(1).

Even in the most ideal scenario this can happen only when the Entry array has only 1 Entry object per location or in other words the number of Entry objects must be equal to the array length(16 initially).

Now since we all must have experienced that hash map can(infact mostly does) store more than 16 values.

So does it mean that after adding 16 Entry objects it fails to maintain the complexity of o(1) ?

Well not really.

Hash Map internally uses the logic that when the number of Entry Objects exceeds a certain value it recreats the array(twice the size), copies the existing elements from previous map to new map(by a concept called re-hashing).

Now that certain valueis know as load factor.

Now let us take a scenario.

There are two threads T1 and T2 which are trying to put a key in the map when it has reached its load factor.

The current state of the hash map is


In the above image we have assumed that 1, 2 and 3 have the same hash keys so they are in the same array location. Also we have assumed that there are 13 other Entry objects (not shown in figure.

Now let us see what happens when these thread try to alter the Map.

When Thread T1 access the map it creates a new array and starts to copy all the component in the new array. and inserts the first Entry(1,Test1) and next point to Entry(2,Test2) whose next is Entry(3,Test3).

Now the current structure of the hash bucket (singly linked list) at the array location (given in the above figure) is

Entry(1,Test1) next is Entry(2,Test2) for which next is Entry(3,Test3).

Now let as assume at the same time another thread T2 tries to write to map. Since the put method is not thread safe T2 would be allowed access to it.

When T2 would be accessing put method, again upon finding that the map is full would make a new array and start copying to that.

To copy the above hashbucket(linked list) the Thread to avoid transversing the list again and again inserts it in a reverse order that is the new state of array is:
Now lets come back to Thread T1.

Now it tries to insert Entry(2,Test2)(next for which should be Entry(3,Test3) but is Entry(1,Test1), as changed by T2. Now it will keep on looping between Entry(2,Test2) and  Entry(1,Test1) infinitely.

Monday, 22 February 2016

Memory leak problem in substring method of String

Another very popular interview question.

The buildup would start by asking basic question like what are immutable classes, are they thread safe, Give an example of an immutable class from java api.

Then the interviewer shall ask about the flyweight pattern that String uses(more on this in next blogs), how many objects are created sort of questions. Then he asks about the substring method followed by the topic under discussion Memory leak problem in substring method of String.

Memory leak problem in String in Java 6

Now before understanding this we need to understand what a String is. In Java we must remember String is nothing but a class( and NOT A PRIMITIVE TYPE).

A String class is backed by a char array.

String class two other  attributes apart from char[]: offset and count.

They are used to store the characters first index and the number of characters in String.

For example for the String s = "TEST" we shall have the String object in below state:

char[] = {T','E','S','T'};

offset = 0 and count = 4.

In the heap(inside String Literal pool, more on this in later blogs) an object created which reference s shall point to. This object refers to the char [] Test.

Now lets see what happens when a call to substring method of String is made.

When the substring method is called then String internally calls the constructor of String which accepts the original char [], begin Index and end
index.
And in the Constructor a new String object.
But the main thing to note here is that it refers to the same array, only the offset and count changes.Something like the below code snippet.

String(int offset, int count, char value[]){
      this.value = value;
      this.offset= offset;
      this.count = count;
}

public String substring(int beginIndex, int endIndex){
     return new String(offset + beginIndex, endIndex-beginIndex,value);
}

For example if we call substring(0,3) on "TEST" then the new String object would be in below state:

char[] = {'T','E','S','T'}
offset = 0
count = 3

First in a very brief way we need to understand what memory leak means and about garbage collection mechain java(more on this in later blogs).

In Java we do not need to do anything for garbage collection, the JVM shall do this for us. JVM runs a demon thread for garbage collection, which periodically scans for the object which does not have any reference. It marks them for garbage collection.

Memory leak (more on this in later blogs) is a problem when due to some reason GC is unable to mark a object for garbage collection although it is functionally no longer required.

Now coming back to our problem. In the example we had taken the String TEST which is very negligible for any sort of problem to occur. Now lets us assume we had taken a huge String of size 100MB(which may be rare) and we call a substring(0,2) on it.

Therfore functionally we are using only a very very small part of it.

But since in Java 6 we are pointing to the same 100MB String, the huge 100MB String is blocked for garbage collection(although we need only a small part of it). Thus a memory leak shall occur.

Sunday, 21 February 2016

How Hashmap works in java ?

Hash map is one of the most important API to master if one is preparing for a java interview.
The interviewer expects the person to know each and every aspects of it.
Some questions can be:
1) Explain the get and put method of hash map or hash map works in java ?
2) What are the properties of the key you use for the hash map ?
3) Can hash map be used in a multithreaded environment ?
4) What is a concurrent hash map
5) Changes made to java 8 to hash map?
6) Complexity of get and put in hash map?
7) when is the complexity of get and put operation o(1) in java ?

In this section we shall only discuss the working of the hash map and discuss the property of the key to be used in hash map.

How hash map works in java ?

The built up starts with the interviewer asking have you worked on collections. To which we all proudly say yes. Then a follow up questions on Array List, Linked List, then the main question on how hashmap works.

Now to answer simply Hash Map works on the principle of hashing.

We need to realize hash map is nothing but a datastucture which can maintain key and values.

Hash map internally has a Entry class whose structure is something like given below:

class Entry<K,V>{
K Key;
V Value;
Entry next;
}

Here the next represents the next element in the hash bucket(more on this later).

This Entry object maintains the key and value. For example if we make a map like one below:

HashMap<String. String> map = new HashMap<String,String>();
map.put("first","example");

Now internally these key and value gets set in the Key and Value attribute of the Entry class.

So remember for now that hash map maintains an Entry Object for every key, value pair we enter.

Now since we add more than 1 key value pair hash map needs a Data Strucutre to store the Entry object for each pair.

Hashmap fulfills this by keeping an array.

So hash map class internally also has an Entry[], where it stores the Entry Objects.

Now the question is how does the Entry Object fill in the array ? That is how does it decide which location of the array does it needs to fill the Entry object ?

The answer to this actually gives the hash map its name.

For every key entered the hash map calculates its hashcode and using this hashcode it decides the position where it needs to store the Entry object.

Therefore to explain more clearly, whenever we call the put method of hash map, the hash map internally passes the key to its hash mehtod which takes in key.hashcode() as an argument. HashMap recalculates the hash value to prevent anyone from entering a very high or a very low key.

Now the value obtained from hash method is then used to calculate the array location using the & operator along with the initial capacity.

So,

KEY - > Hash  -> & with initial capacity -> array location

Then the entry is populated in this array location.

Before going any further let us disccuss a boundary case.

what if someone were to enter a null key in the hash map.

Wont it break and throw a NPE at key.hashCode() ?

Well actually HashMap has special provisions for null keys, whenever it encounters null value HashMap stores the entry for this in the 0th location of the array.
Note that HashMap can have only one entry key.
Now a problem comes when two objects have same hash key.
We need to understand that now since both keys have the same hash code the array location obtained for them would be same.
Now we shall come back to the structure of Entry class we were discussing earlier. Entry class is a single Linked List(uptill Java 7 more on this in the next blogs).

class Entry<K,V>{
K Key;
V Value;
Entry next;
}

Therefore whenever a key with the same hash is added it is added in the same array location in the linked List.

How does get method works ?
An object is retrieved from hash map in way very similar to the way its stored.
Now since we know the key value pair is stored in the Entry array the location for which is determined by the hash code of the key, so the first thing the hash map needs to do while retrieving is determining the hash code of the key object.
So whenever we call the get method internally hash map determines the hash code of the key object and determines the array location. In a simple case scenario(when there is only one object in the array, it retrieves it and the operation is o(1) (more on this in the later blogs).

The problem comes when there are more than one object in the array.

Hash Map then has to traverse the Linked List and call equals method on each object to determine the correct one. The complexity is now o(n) (more on this later)

What are the properties of the key you use for the hash map ?
To make an object as the key of hash map we need to refer to its working.
Since hash map stores the object based on its hash code and retrieves it using the equals method, The first thing that we should do is overide the equals and hash method based on the equals and hash code contract which states:

1) if two objects are equal then there hash code is always same.
2) if two objects hash code are same then they may or may not be equal.

Secondly since equal and hashcode are made using the state of an object they must ideally be immutable. As if we add mutable objects as key then there is a chance that they might never be retrieved.