What iteration order can you expect from a Java HashMap?
Summary:
While the iteration order of a Java
HashMap
is undefined (that is, there are no guarantees as to the order), one should not construe “undefined” to mean “random”. Depending on how thehashCode()
of the key is implemented, the iteration order can turn out to be quite predictable and thus quite biased.
In Java, one of the most commonly-used data structures is the HashMap
. Unlike LinkedHashMap
(insertion-order) and TreeMap
(key-based order), HashMap
does not provide any guarantees of ordering when iterating over the map, e.g. when using a for-each loop, or calling forEach()
.
Instead, the iteration order of a HashMap
depends on internal implementation details. In general, this iteration order will not be random, and may instead be quite predictable. That is, not having a guaranteed order does not mean having a random order. Instead the iteration order will be defined by the current state of the HashMap
and how the key’s hashCode()
method is defined.
To understand, let’s look at how HashMap
is designed.
This information is based off of the source code of
HashMap.java
in OpenJDK 18 and its documentation. It probably applies for earlier versions of Java as well.
HashMap
structure
Note that
HashSet
is also affected by everything discussed here, since internally,HashSet
just delegates toHashMap
for its implementation.
Java’s HashMap
is based off of a hash table that uses separate chaining to resolve collisions. That means when two keys map to the same bucket (a collision), the entries are stored in that bucket using a linked list data structure. If the number of entries mapping to a single bucket gets too large (i.e. the linked list is too long), the linked list is converted into a binary search tree (BST) that uses the hash code of the key for ordering within the tree. (A red-black tree is used)
Let’s see what that looks like:
The hash table is implemented as an array, which we’ll call the backing array. Each element in the array represents one bucket in the hash table, and stores zero or more entries. This backing array always has a size that is a power of two - why this is will be clear soon. (The size of the backing array is called the capacity)
When you call HashMap.put(key, value)
, the following happens: (Some details left out for clarity)
- Compute the hash of the entry by calling
key.hashCode()
to get a 32-bit integer. - Determine which bucket (array index) the entry should go in by calculating
hashCode % numBuckets
.- When the number of buckets is a power of two, this reduces to
(numBuckets - 1) & hashCode
, which is a simple bitwise operation - in this case a bit masking. - For example, if
numBuckets = 128
this reduces to127 & hashCode
, which simply looking at the last seven bits ofhashCode
.
- When the number of buckets is a power of two, this reduces to
- If the number of entries in the hash table gets too large, a new backing array of twice the current size will be allocated, and all of the entries will be re-hashed/copied into it.
These important results follow from the above details:
- The hash table will perform best when the hash code values are uniformly distributed.
- This will result in the entries being uniformly distributed across the buckets and reduce the likelihood of collisions, i.e. many entries going to the same bucket.
- Collisions degrade the performance because now a linked list needs to be traversed for
get()
operations. (If there are too many entries, the linked list will be transformed into a BST, which helps mitigate this slow down)
- This depends on the
hashCode()
method of the key class being implemented in such a way as to distribute the keys uniformly across all the buckets.
Iteration Order
Once you have put entries into a HashMap
, the normal iterators simply iterate over the backing array, going from one bucket to another. If a bucket has multiple entries, either in a linked list or a BST, those entries are visited first before any entries in subsequent buckets.
This means that the iteration order can be predictable, especially if the hashCode()
method of the key is trivial. Note that predictable in this context does not mean guaranteed (as in an API contract), since this iteration order is heavily dependent on the HashMap
’s internal implementation details, which could change in between different versions of Java.
Because the iteration order proceeds sequentially over the buckets of the hash table and the hash code of the key determines which bucket a key-value pair goes into, if the hashCode()
method of a key is straightforward it can produce a very predictable iteration order over the keys in a HashMap
.
Some pitfalls of common key types
Integer
or Long
as key
When using an Integer
as a key the hashCode()
(a 32-bit value) is simply the value of the Integer
itself. This makes the iteration order very predictable, and also may result in the hash table not being filled uniformly, if the integer values themselves are not uniformly distributed.
For example, if we put the keys 4
, 0
, 3
, 1
, and 2
into a HashMap
with a capacity of at least five, no matter what the insertion order was, the iteration order will always be: [0, 1, 2, 3, 4]
.
This is perhaps best illustrated when we generate a shuffled list of integers, and then add them to a HashMap
as keys and observe the iteration order: (Using jshell
)
// Create a list of integers [0, 99].
jshell> var ints = IntStream.range(0, 100).boxed().collect(Collectors.toList())
ints ==> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 ... 3, 94, 95, 96, 97, 98, 99]
| created variable ints : List<Integer>
// Randomly permute (shuffle) the order.
jshell> Collections.shuffle(ints)
// The list of integers [0, 99] is now shuffled into a random order.
jshell> ints
ints ==> [74, 24, 87, 78, 67, 89, 66, 13, 44, 84, 85, 9, 8, 62, 22, 75, 90, 53, 38, 72, 11, 69, 68, 4, 12, 18, 32, 52, 56, 35, 98, 20, 59, 55, 1, 64, 50, 10, 60, 45, 86, 41, 81, 2, 96, 82, 54, 40, 77, 48, 76, 14, 42, 99, 28, 65, 27, 39, 6, 19, 25, 58, 26, 70, 92, 61, 57, 15, 80, 51, 30, 37, 7, 94, 91, 0, 23, 3, 47, 29, 43, 83, 88, 5, 95, 71, 17, 16, 31, 73, 79, 49, 36, 97, 93, 46, 63, 34, 33, 21]
| value of ints : List<Integer>
jshell> var m = new HashMap<Integer, Integer>()
m ==> {}
| created variable m : HashMap<Integer,Integer>
// Add each of the integers above into a HashMap as the key (and value), in the random order from above:
jshell> ints.forEach(i -> m.put(i, i))
jshell> m.size()
$17 ==> 100
| created scratch variable $17 : int
// The iteration order of the HashMap has now re-sorted the integers into ascending order.
jshell> m.forEach((k, v) -> System.out.print(k + " "))
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
Here’s an explanation of the above behaviour:
- We generate the numbers 0 to 99 inclusive into a list, and then randomly shuffle this list.
- Each of these numbers is then added to a
HashMap
as the key. - The
HashMap
will be automatically resized so that its final capacity (number of buckets) is at least 128.- The capacity will actually be 256, since the default load factor is 0.75, and 100/128 exceeds this, so the number of buckets would have to be at least 256 to keep the size/capacity ratio below this load factor.
- With this number of buckets, and knowing that
Integer.hashCode()
is simply the value of the integer itself, it follows that each integer key will simply be mapped to that same bucket index. For example,0
will go to the zeroth bucket,1
will go to the first bucket, and so on. - Thus, iterating over the buckets results in the keys of the
HashMap
being iterated over in a very predictable order - despite the integer keys having been added randomly to the map!
Note that in this extreme example, using a HashMap
or HashSet
in this way is an example of a bucket sort.
A similar argument applies when using Long
as the key, since its hashCode()
method is just the upper 32 bits XOR’d with the lower 32
String
as key
String.hashCode()
delegates to this method, and while it’s more complicated than that of Integer.hashCode()
, we can still come up with an example that illustrates how the iteration order of a String
-keyed HashMap
will be predictable.
In this example, we create a randomly-shuffled list of the 26 uppercase letters of the English alphabet, each as a String
. These String
values are then used as the keys for a HashMap
:
// Create a list of the letters [A-Z] each as a String
// NOTE: Generally it's not safe to convert integer values to characters this way, but this is only done as an example for brevity.
jshell> var letters = IntStream.range(65, 91).boxed().map(Character::toString).collect(Collectors.toList())
letters ==> [A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z]
| created variable letters : List<String>
// Shuffle so the order is random.
jshell> Collections.shuffle(letters)
// Order looks random enough.
jshell> letters
letters ==> [A, T, H, W, D, L, R, V, Y, E, Z, J, C, S, G, Q, K, M, F, B, U, O, P, I, N, X]
| value of letters : List<String>
jshell> var m = new HashMap<String, String>()
m ==> {}
| created variable m : HashMap<String,String>
// Add the letters (Strings) to the map in the random order.
jshell> letters.forEach(l -> m.put(l, l))
jshell> m.size()
$16 ==> 26
| created scratch variable $16 : int
// The iteration order is again predictable and sorted.
jshell> m.forEach((k, v) -> System.out.print(k + " "))
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
As with the Integer
-keyed example above, no matter what the insertion order, the iteration order is predictable and the keys end up being sorted. Although this is a contrived example using only single character String
values, a similar deterministic iteration order would exist when using any set of String
values, although the exact pattern would be more difficult to detect due to the more complicated implementation of String.hashCode()
.
As an aside, because the String.hashCode()
method implementation is well-known (and because it’s not meant to be a cryptographic hash function), producing collisions (that is, two different String
values which have the same hash code) is fairly straightforward. This could be leveraged by attackers to initiate DoS attacks against servers if a user-provided string is used as the key for a HashMap
: The attacker can just repeatedly send requests with separate string values that have the same hash code. If these string values are continually added to a HashMap
the performance of that data structure will go down.
Conclusion
While the iteration order of a Java HashMap
is undefined (that is, there are no guarantees as to the order), one should not construe undefined to mean random. Depending on how the hashCode()
of the key is implemented, the iteration order can turn out to be quite predictable and thus quite biased.
If you want something to be random, you will have to explicitly add in a source of randomness, by for example using Collections.shuffle()
on a list or using Random
as a source of that randomness.