June 17, 2022

What iteration order can you expect from a Java HashMap?

Summary:

While the iteration order of a Java HashMap is undefined (that is, there are no guarantees as to the order), one should not construe “undefined” to mean “random”. Depending on how the hashCode() of the key is implemented, the iteration order can turn out to be quite predictable and thus quite biased.

In Java, one of the most commonly-used data structures is the HashMap. Unlike LinkedHashMap (insertion-order) and TreeMap (key-based order), HashMap does not provide any guarantees of ordering when iterating over the map, e.g. when using a for-each loop, or calling forEach().

Instead, the iteration order of a HashMap depends on internal implementation details. In general, this iteration order will not be random, and may instead be quite predictable. That is, not having a guaranteed order does not mean having a random order. Instead the iteration order will be defined by the current state of the HashMap and how the key’s hashCode() method is defined.

To understand, let’s look at how HashMap is designed.

This information is based off of the source code of HashMap.java in OpenJDK 18 and its documentation. It probably applies for earlier versions of Java as well.

`HashMap` structure

Note that HashSet is also affected by everything discussed here, since internally, HashSet just delegates to HashMap for its implementation.

Java’s HashMap is based off of a hash table that uses separate chaining to resolve collisions. That means when two keys map to the same bucket (a collision), the entries are stored in that bucket using a linked list data structure. If the number of entries mapping to a single bucket gets too large (i.e. the linked list is too long), the linked list is converted into a binary search tree (BST) that uses the hash code of the key for ordering within the tree. (A red-black tree is used)

Let’s see what that looks like:

How the key-value pairs are mapped to buckets in a HashMap — How the key-value pairs are mapped to buckets in the HashMap

The hash table is implemented as an array, which we’ll call the backing array. Each element in the array represents one bucket in the hash table, and stores zero or more entries. This backing array always has a size that is a power of two - why this is will be clear soon. (The size of the backing array is called the capacity)

When you call HashMap.put(key, value), the following happens: (Some details left out for clarity)

Compute the hash of the entry by calling key.hashCode() to get a 32-bit integer.
Determine which bucket (array index) the entry should go in by calculating hashCode % numBuckets.
- When the number of buckets is a power of two, this reduces to (numBuckets - 1) & hashCode, which is a simple bitwise operation - in this case a bit masking.
- For example, if numBuckets = 128 this reduces to 127 & hashCode, which simply looking at the last seven bits of hashCode.
If the number of entries in the hash table gets too large, a new backing array of twice the current size will be allocated, and all of the entries will be re-hashed/copied into it.

These important results follow from the above details:

The hash table will perform best when the hash code values are uniformly distributed.
- This will result in the entries being uniformly distributed across the buckets and reduce the likelihood of collisions, i.e. many entries going to the same bucket.
- Collisions degrade the performance because now a linked list needs to be traversed for get() operations. (If there are too many entries, the linked list will be transformed into a BST, which helps mitigate this slow down)
This depends on the hashCode() method of the key class being implemented in such a way as to distribute the keys uniformly across all the buckets.

Iteration Order

Once you have put entries into a HashMap, the normal iterators simply iterate over the backing array, going from one bucket to another. If a bucket has multiple entries, either in a linked list or a BST, those entries are visited first before any entries in subsequent buckets.

This means that the iteration order can be predictable, especially if the hashCode() method of the key is trivial. Note that predictable in this context does not mean guaranteed (as in an API contract), since this iteration order is heavily dependent on the HashMap’s internal implementation details, which could change in between different versions of Java.

Because the iteration order proceeds sequentially over the buckets of the hash table and the hash code of the key determines which bucket a key-value pair goes into, if the hashCode() method of a key is straightforward it can produce a very predictable iteration order over the keys in a HashMap.

Some pitfalls of common key types

`Integer` or `Long` as key

When using an Integer as a key the hashCode() (a 32-bit value) is simply the value of the Integer itself. This makes the iteration order very predictable, and also may result in the hash table not being filled uniformly, if the integer values themselves are not uniformly distributed.

For example, if we put the keys 4, 0, 3, 1, and 2 into a HashMap with a capacity of at least five, no matter what the insertion order was, the iteration order will always be: [0, 1, 2, 3, 4].

This is perhaps best illustrated when we generate a shuffled list of integers, and then add them to a HashMap as keys and observe the iteration order: (Using jshell)

// Create a list of integers [0, 99].
jshell> var ints = IntStream.range(0, 100).boxed().collect(Collectors.toList())
ints ==> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 ... 3, 94, 95, 96, 97, 98, 99]
|  created variable ints : List<Integer>

// Randomly permute (shuffle) the order.
jshell> Collections.shuffle(ints)

// The list of integers [0, 99] is now shuffled into a random order.
jshell> ints
ints ==> [74, 24, 87, 78, 67, 89, 66, 13, 44, 84, 85, 9, 8, 62, 22, 75, 90, 53, 38, 72, 11, 69, 68, 4, 12, 18, 32, 52, 56, 35, 98, 20, 59, 55, 1, 64, 50, 10, 60, 45, 86, 41, 81, 2, 96, 82, 54, 40, 77, 48, 76, 14, 42, 99, 28, 65, 27, 39, 6, 19, 25, 58, 26, 70, 92, 61, 57, 15, 80, 51, 30, 37, 7, 94, 91, 0, 23, 3, 47, 29, 43, 83, 88, 5, 95, 71, 17, 16, 31, 73, 79, 49, 36, 97, 93, 46, 63, 34, 33, 21]
|  value of ints : List<Integer>

jshell> var m = new HashMap<Integer, Integer>()
m ==> {}
|  created variable m : HashMap<Integer,Integer>

// Add each of the integers above into a HashMap as the key (and value), in the random order from above:
jshell> ints.forEach(i -> m.put(i, i))

jshell> m.size()
$17 ==> 100
|  created scratch variable $17 : int

// The iteration order of the HashMap has now re-sorted the integers into ascending order.
jshell> m.forEach((k, v) -> System.out.print(k + " "))
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99

Here’s an explanation of the above behaviour:

We generate the numbers 0 to 99 inclusive into a list, and then randomly shuffle this list.
Each of these numbers is then added to a HashMap as the key.
The HashMap will be automatically resized so that its final capacity (number of buckets) is at least 128.
- The capacity will actually be 256, since the default load factor is 0.75, and 100/128 exceeds this, so the number of buckets would have to be at least 256 to keep the size/capacity ratio below this load factor.
With this number of buckets, and knowing that Integer.hashCode() is simply the value of the integer itself, it follows that each integer key will simply be mapped to that same bucket index. For example, 0 will go to the zeroth bucket, 1 will go to the first bucket, and so on.
Thus, iterating over the buckets results in the keys of the HashMap being iterated over in a very predictable order - despite the integer keys having been added randomly to the map!

Note that in this extreme example, using a HashMap or HashSet in this way is an example of a bucket sort.

A similar argument applies when using Long as the key, since its hashCode() method is just the upper 32 bits XOR’d with the lower 32

`String` as key

String.hashCode() delegates to this method, and while it’s more complicated than that of Integer.hashCode(), we can still come up with an example that illustrates how the iteration order of a String-keyed HashMap will be predictable.

In this example, we create a randomly-shuffled list of the 26 uppercase letters of the English alphabet, each as a String. These String values are then used as the keys for a HashMap:

// Create a list of the letters [A-Z] each as a String
// NOTE: Generally it's not safe to convert integer values to characters this way, but this is only done as an example for brevity.
jshell> var letters = IntStream.range(65, 91).boxed().map(Character::toString).collect(Collectors.toList())
letters ==> [A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z]
|  created variable letters : List<String>

// Shuffle so the order is random.
jshell> Collections.shuffle(letters)

// Order looks random enough.
jshell> letters
letters ==> [A, T, H, W, D, L, R, V, Y, E, Z, J, C, S, G, Q, K, M, F, B, U, O, P, I, N, X]
|  value of letters : List<String>

jshell> var m = new HashMap<String, String>()
m ==> {}
|  created variable m : HashMap<String,String>

// Add the letters (Strings) to the map in the random order.
jshell> letters.forEach(l -> m.put(l, l))

jshell> m.size()
$16 ==> 26
|  created scratch variable $16 : int

// The iteration order is again predictable and sorted.
jshell> m.forEach((k, v) -> System.out.print(k + " "))
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

As with the Integer-keyed example above, no matter what the insertion order, the iteration order is predictable and the keys end up being sorted. Although this is a contrived example using only single character String values, a similar deterministic iteration order would exist when using any set of String values, although the exact pattern would be more difficult to detect due to the more complicated implementation of String.hashCode().

As an aside, because the String.hashCode() method implementation is well-known (and because it’s not meant to be a cryptographic hash function), producing collisions (that is, two different String values which have the same hash code) is fairly straightforward. This could be leveraged by attackers to initiate DoS attacks against servers if a user-provided string is used as the key for a HashMap: The attacker can just repeatedly send requests with separate string values that have the same hash code. If these string values are continually added to a HashMap the performance of that data structure will go down.

Conclusion

While the iteration order of a Java HashMap is undefined (that is, there are no guarantees as to the order), one should not construe undefined to mean random. Depending on how the hashCode() of the key is implemented, the iteration order can turn out to be quite predictable and thus quite biased.

If you want something to be random, you will have to explicitly add in a source of randomness, by for example using Collections.shuffle() on a list or using Random as a source of that randomness.