17

Does anyone know how Java implements its hash tables (HashSet or HashMap)? Given the various types of objects that one may want to put in a hash table, it seems very difficult to come up with a hash function that would work well for all cases.

2
  • 13
    If only there was some way of looking at the Java source code, goddamit! Oct 29, 2009 at 23:59
  • 3
    Not everyone realises it's available, unfortunately. That's why this site is here. If you have the link, post it (as JG did below) Oct 30, 2009 at 0:09

5 Answers 5

22

HashMap and HashSet are very similar. In fact, the second contains an instance of the first.

A HashMap contains an array of buckets in order to contain its entries. Array size is always powers of 2. If you don't specify another value, initially there are 16 buckets.

When you put an entry (key and value) in it, it decides the bucket where the entry will be inserted calculating it from its key's hashcode (hashcode is not its memory address, and the the hash is not a modulus). Different entries can collide in the same bucket, so they'll be put in a list.

Entries will be inserted until they reach the load factor. This factor is 0.75 by default, and is not recommended to change it if you are not very sure of what you're doing. 0.75 as load factor means that a HashMap of 16 buckets can only contain 12 entries (16*0.75). Then, an array of buckets will be created, doubling the size of the previous. All entries will be put again in the new array. This process is known as rehashing, and can be expensive.

Therefore, a best practice, if you know how many entries will be inserted, is to construct a HashMap specifying its final size:

new HashMap(finalSize);
2
  • 1
    Technically depends upon implementation. It used to be popular to attempt a prime size of table. (Also, IIRC, the constructor to HashMap takes the capacity, so you should divide by the load factor and round up, but I think specifying the capacity is almost always more effort than it is worth.) Oct 30, 2009 at 5:17
  • 1
    Also, when there are more than one elements in the same bucket, Java then uses the equals() method to determine if the object in question in present in the HashTable. This is the reason why its recommended to override equals() and hashcode() together.
    – Nemin
    Feb 19, 2014 at 2:13
8

You can check the source of HashMap, for example.

7

Java depends on each class' implementation of the hashCode() method to distribute the objects evenly. Obviously, a bad hashCode() method will result in performance problems for large hash tables. If a class does not provide a hashCode() method, the default in the current implementation is to return some function (i.e. a hash) of the the object's address in memory. Quoting from the API doc:

As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the JavaTM programming language.)

6
  • hashcode() can return totally different ranges of values (eg 1 to 100 vs 1 to 100,000). So how does Java know how many buckets it needs to create for a hash table?
    – escalon
    Oct 30, 2009 at 0:02
  • @escalon: hashcode() returns an int (which always has range -2^31 to 2^31-1). You modulo the hash code by the current number of buckets for the table. This is how these hash tables usually work. The number of buckets can change as the hash table grows.
    – newacct
    Oct 30, 2009 at 0:07
  • escalon, your original question was how java implements hashcode, while your comment here is more on the theory of hash table design. You may want to check out the wikipedia articles on hashing.
    – Jherico
    Oct 30, 2009 at 0:24
  • If you try Object.hashCode you should see that it clearly does not return the memory address. (On reasonable implementations, the value is stored in the object header. Sun's implementation initialises the value on first use, using a slight rehash of the memory address at the time of initialisation.) Oct 30, 2009 at 5:05
  • @tackline You are of course correct. I've edited the answer. Oct 30, 2009 at 15:48
2

There are two general ways to implement a HashMap. The difference is how one deals with collisions.

The first method, which is the one Java users, makes every bucket in a the HashMap contain a singly linked list. To accomplish this, each bucket contains an Entry type, which caches the hashCode, has a pointer to the key, pointer to the value, and a pointer to the next entry. When a collision occurs in Java, another entry is added to the list.

The other method for handling collisions, is to simply put the item into the next empty bucket. The advantage of this method is it requires less space, however, it complicates removals, as if the bucket following the removed item is not empty, one has to check to see if that item is in the right or wrong bucket, and shift the item if it originally has collided with the item being removed.

1
  • 1
    IdentityHashMap and the hash map used in ThreadLocal use the probing algorithm (in Sun's JRE). Oct 30, 2009 at 5:07
1

In my own words:

An Entry object is created to hold the reference of the Key and Value.

The HashMap has an array of Entry's.

The index for the given entry is the hash returned by key.hashCode()

If there is a collision ( two keys gave the same index ) , the entry is stored in the .next attribute of the existing entry.

That's how two objects with the same hash could be stored into the collection.

From this answer we get:

   public V get(Object key) {
       if (key == null)
           return getForNullKey();
       int hash = hash(key.hashCode());
       for (Entry<K,V> e = table[indexFor(hash, table.length)];
            e != null;
            e = e.next) {
           Object k;
           if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
               return e.value;
       }
       return null;
   }

Let me know if I got something wrong.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Not the answer you're looking for? Browse other questions tagged or ask your own question.