Before going into internal working of HashSet in Java it is important to know two points about HashSet.
- HashSet in Java only stores unique values i.e. no duplicates are allowed.
- HashSet works on the concept of hashing just like HashMap in Java but how it differs from the HashMap is; in HashMap a (Key, Value) pair is added and the hash function is calculated using key. Where as in the HashSet hash function is calculated using the value itself. Note that in HashSet we have add(E e) method which takes just the element to be added as parameter.
Also you may have guessed by now, since hash function is calculated using value that is why only unique values are stored in the HashSet. If you try to store the same element again, the calculated hash function would be same, thus the element will be overwritten.
Now coming back to internal implementation of HashSet in Java, which is also an important Java Collections interview question, the most important point is HashSet implementation uses HashMap to store it's objects. Within the HashSet there are many constructors one without any parameter and several more with initial capacity or load factor but each one of these constructor creates a HashMap. If you know how HashMap works internally in Java it is easy to understand how HashSet works internally in Java.
HashSet Constructor snippets
In the HashSet class in Java you can see that constructors of the class do create a HashMap.
/**
* Constructs a new, empty set; the backing <tt>HashMap</tt> instance has
* default initial capacity (16) and load factor (0.75).
*/
public HashSet() {
map = new HashMap<>();
}
public HashSet(int initialCapacity, float loadFactor) {
map = new HashMap<>(initialCapacity, loadFactor);
}
And the map, which is used for storing values, is defined as
private transient HashMap<E,Object> map;
In the constructor, if you have noticed, there are parameters named initial capacity and load factor. For HashSet, default initial capacity is 16, that is an array (or bucket) of length 16 would be created and default load factor is 0.75. Where load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased.
How elements are added in HashSet
I stated in the point 2 above that HashSet calculates the hash function using value itself and there is no (Key, Value) pair in HashSet and then came the statement that HashSet internally uses HashMap to store objects. These two statements may sound contradictory as HashMap stores (key, value) pair so let's see how these these two contradictory statements hold true.
Actually from add method of HashSet class put() method of HashMap is called where the value, which has to be added in the Set, becomes Key and a constant object "PRESENT" is used as value.
That's how PRESENT is defined in HashSet implementation -
// Dummy value to associate with an Object in the backing Map
private static final Object PRESENT = new Object();
And that's how add method is implemented in HashSet class -
public boolean add(E e) {
return map.put(e, PRESENT)==null;
}
So you can see with in the internal implementation of the HashSet it's a (key, value) pair which is actually getting added. It's just that the actual value (which is added to the HashSet) becomes the key and a dummy value "PRESENT" is added as value when storing it in the backing HashMap.
One thing to note here is, in HashMap value may be duplicate but Key should be unique. That's how HashSet makes sure that only unique values are stored in it, since the value which is to be stored in the HashSet becomes the key while storing it in HashMap.
How element is removed from a HashSet in Java
When we need to remove an element from the HashSet, internally again remove method of HashSet calls remove(Object key) method of the HashMap.
That is how it is implemented in HashSet class.
public boolean remove(Object o) {
return map.remove(o)==PRESENT;
}
Here note that remove(Object key) method of the HashMap returns the Value associated with the key. Whereas the remove(Object o) method of the HashSet returns Boolean value. Also we know that for every value added in HashSet that value becomes Key and the value is always an object called PRESENT. Therefore the value that is returned from the remove(Object key) method of the HashMap is always PRESENT thus the condition map.remove(o)==PRESENT.
How elements are retrieved from HashSet in Java
In HashSet there is no get method as provided in Map or List. In HashSet iterator is there which will iterate through the values of the Set. Internally it will call the keyset of the HashMap, as values are stored as keys in the HashMap so what we'll get is the values stored in the HashSet.
That's how iterator is internally implemented in the HashSet in Java.
/**
* Returns an iterator over the elements in this set. The elements
* are returned in no particular order.
*
* @return an Iterator over the elements in this set
* @see ConcurrentModificationException
*/
public Iterator<E> iterator() {
return map.keySet().iterator();
}
Points to note
- Unlike HashMap where hash function is calculated using key HashSet uses the value itself to calculate the hash function.
- Since hash function is calculated using value that is why only unique values are stored in the HashSet.
- HashSet internally uses HashMap to store its elements.
- When element is added to HashSet using add(E e) method internally HashSet calls put() method of the HashMap where the value passed in the add method becomes key in the put() method. A dummy value “PRESENT” is passed as value in the put() method.
That's all for this topic How HashSet Works Internally in Java. If you have any doubt or any suggestions to make please drop a comment. Thanks!
Related Topics
You may also like -