# What iteration order can you expect from a Java HashMap?

**Summary:**

While the iteration order of a Java

`HashMap`

is undefined (that is, there are no guarantees as to the order), one should not construe “undefined” to mean “random”. Depending on how the`hashCode()`

of the key is implemented, the iteration order can turn out to be quite predictable and thus quite biased.

In Java, one of the most commonly-used data structures is the `HashMap`

. Unlike `LinkedHashMap`

(insertion-order) and `TreeMap`

(key-based order), `HashMap`

does not provide any guarantees of ordering when iterating over the map, e.g. when using a for-each loop, or calling `forEach()`

.

Instead, the iteration order of a `HashMap`

depends on internal implementation details. In general, this iteration order will **not** be random, and may instead be quite predictable. That is, *not* having a guaranteed order does **not** mean having a random order. Instead the iteration order will be defined by the current state of the `HashMap`

and how the key’s `hashCode()`

method is defined.

To understand, let’s look at how `HashMap`

is designed.

This information is based off of the source code of

`HashMap.java`

in OpenJDK 18 and its documentation. It probably applies for earlier versions of Java as well.

`HashMap`

structure

Note that

`HashSet`

is also affected by everything discussed here, since internally,`HashSet`

just delegates to`HashMap`

for its implementation.

Java’s `HashMap`

is based off of a hash table that uses *separate chaining* to resolve *collisions*. That means when two keys map to the same *bucket* (a collision), the entries are stored in that bucket using a linked list data structure. If the number of entries mapping to a single bucket gets too large (i.e. the linked list is too long), the linked list is converted into a binary search tree (BST) that uses the hash code of the key for ordering within the tree. (A red-black tree is used)

Let’s see what that looks like:

The hash table is implemented as an array, which we’ll call the backing array. Each element in the array represents one bucket in the hash table, and stores zero or more entries. This backing array always has a size that is a power of two - why this is will be clear soon. (The size of the backing array is called the *capacity*)

When you call `HashMap.put(key, value)`

, the following happens: (Some details left out for clarity)

- Compute the hash of the entry by calling
`key.hashCode()`

to get a 32-bit integer. - Determine which bucket (array index) the entry should go in by calculating
`hashCode % numBuckets`

.- When the number of buckets is a power of two, this reduces to
`(numBuckets - 1) & hashCode`

, which is a simple bitwise operation - in this case a bit masking. - For example, if
`numBuckets = 128`

this reduces to`127 & hashCode`

, which simply looking at the last seven bits of`hashCode`

.

- When the number of buckets is a power of two, this reduces to
- If the number of entries in the hash table gets too large, a new backing array of twice the current size will be allocated, and all of the entries will be re-hashed/copied into it.

These important results follow from the above details:

- The hash table will perform best when the hash code values are uniformly distributed.
- This will result in the entries being uniformly distributed across the buckets and reduce the likelihood of collisions, i.e. many entries going to the same bucket.
- Collisions degrade the performance because now a linked list needs to be traversed for
`get()`

operations. (If there are too many entries, the linked list will be transformed into a BST, which helps mitigate this slow down)

- This depends on the
`hashCode()`

method of the key class being implemented in such a way as to distribute the keys uniformly across all the buckets.

## Iteration Order

Once you have put entries into a `HashMap`

, the normal iterators simply iterate over the backing array, going from one bucket to another. If a bucket has multiple entries, either in a linked list or a BST, those entries are visited first before any entries in subsequent buckets.

This means that the iteration order can be predictable, especially if the `hashCode()`

method of the key is trivial. Note that *predictable* in this context does not mean *guaranteed* (as in an API contract), since this iteration order is heavily dependent on the `HashMap`

’s internal implementation details, which could change in between different versions of Java.

Because the iteration order proceeds sequentially over the buckets of the hash table and the hash code of the key determines which bucket a key-value pair goes into, if the `hashCode()`

method of a key is straightforward it can produce a very predictable iteration order over the keys in a `HashMap`

.

# Some pitfalls of common key types

`Integer`

or `Long`

as key

When using an `Integer`

as a key the `hashCode()`

(a 32-bit value) is simply the value of the `Integer`

itself. This makes the iteration order very predictable, and also may result in the hash table not being filled uniformly, if the integer values themselves are not uniformly distributed.

For example, if we put the keys `4`

, `0`

, `3`

, `1`

, and `2`

into a `HashMap`

with a capacity of at least five, no matter what the insertion order was, the iteration order will always be: `[0, 1, 2, 3, 4]`

.

This is perhaps best illustrated when we generate a shuffled list of integers, and then add them to a `HashMap`

as keys and observe the iteration order: (Using `jshell`

)

```
// Create a list of integers [0, 99].
jshell> var ints = IntStream.range(0, 100).boxed().collect(Collectors.toList())
ints ==> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 ... 3, 94, 95, 96, 97, 98, 99]
| created variable ints : List<Integer>
// Randomly permute (shuffle) the order.
jshell> Collections.shuffle(ints)
// The list of integers [0, 99] is now shuffled into a random order.
jshell> ints
ints ==> [74, 24, 87, 78, 67, 89, 66, 13, 44, 84, 85, 9, 8, 62, 22, 75, 90, 53, 38, 72, 11, 69, 68, 4, 12, 18, 32, 52, 56, 35, 98, 20, 59, 55, 1, 64, 50, 10, 60, 45, 86, 41, 81, 2, 96, 82, 54, 40, 77, 48, 76, 14, 42, 99, 28, 65, 27, 39, 6, 19, 25, 58, 26, 70, 92, 61, 57, 15, 80, 51, 30, 37, 7, 94, 91, 0, 23, 3, 47, 29, 43, 83, 88, 5, 95, 71, 17, 16, 31, 73, 79, 49, 36, 97, 93, 46, 63, 34, 33, 21]
| value of ints : List<Integer>
jshell> var m = new HashMap<Integer, Integer>()
m ==> {}
| created variable m : HashMap<Integer,Integer>
// Add each of the integers above into a HashMap as the key (and value), in the random order from above:
jshell> ints.forEach(i -> m.put(i, i))
jshell> m.size()
$17 ==> 100
| created scratch variable $17 : int
// The iteration order of the HashMap has now re-sorted the integers into ascending order.
jshell> m.forEach((k, v) -> System.out.print(k + " "))
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
```

Here’s an explanation of the above behaviour:

- We generate the numbers 0 to 99 inclusive into a list, and then randomly shuffle this list.
- Each of these numbers is then added to a
`HashMap`

as the key. - The
`HashMap`

will be automatically resized so that its final capacity (number of buckets) is at least 128.- The capacity will actually be 256, since the default load factor is 0.75, and 100/128 exceeds this, so the number of buckets would have to be at least 256 to keep the size/capacity ratio below this load factor.

- With this number of buckets, and knowing that
`Integer.hashCode()`

is simply the value of the integer itself, it follows that each integer key will simply be mapped to that same bucket index. For example,`0`

will go to the zeroth bucket,`1`

will go to the first bucket, and so on. - Thus, iterating over the buckets results in the keys of the
`HashMap`

being iterated over in a very predictable order - despite the integer keys having been added randomly to the map!

Note that in this extreme example, using a `HashMap`

or `HashSet`

in this way is an example of a bucket sort.

A similar argument applies when using `Long`

as the key, since its `hashCode()`

method is just the upper 32 bits XOR’d with the lower 32

`String`

as key

`String.hashCode()`

delegates to this method, and while it’s more complicated than that of `Integer.hashCode()`

, we can still come up with an example that illustrates how the iteration order of a `String`

-keyed `HashMap`

will be predictable.

In this example, we create a randomly-shuffled list of the 26 uppercase letters of the English alphabet, each as a `String`

. These `String`

values are then used as the keys for a `HashMap`

:

```
// Create a list of the letters [A-Z] each as a String
// NOTE: Generally it's not safe to convert integer values to characters this way, but this is only done as an example for brevity.
jshell> var letters = IntStream.range(65, 91).boxed().map(Character::toString).collect(Collectors.toList())
letters ==> [A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z]
| created variable letters : List<String>
// Shuffle so the order is random.
jshell> Collections.shuffle(letters)
// Order looks random enough.
jshell> letters
letters ==> [A, T, H, W, D, L, R, V, Y, E, Z, J, C, S, G, Q, K, M, F, B, U, O, P, I, N, X]
| value of letters : List<String>
jshell> var m = new HashMap<String, String>()
m ==> {}
| created variable m : HashMap<String,String>
// Add the letters (Strings) to the map in the random order.
jshell> letters.forEach(l -> m.put(l, l))
jshell> m.size()
$16 ==> 26
| created scratch variable $16 : int
// The iteration order is again predictable and sorted.
jshell> m.forEach((k, v) -> System.out.print(k + " "))
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
```

As with the `Integer`

-keyed example above, no matter what the insertion order, the iteration order is predictable and the keys end up being sorted. Although this is a contrived example using only single character `String`

values, a similar deterministic iteration order would exist when using any set of `String`

values, although the exact pattern would be more difficult to detect due to the more complicated implementation of `String.hashCode()`

.

As an aside, because the `String.hashCode()`

method implementation is well-known (and because it’s not meant to be a cryptographic hash function), producing collisions (that is, two different `String`

values which have the same hash code) is fairly straightforward. This could be leveraged by attackers to initiate DoS attacks against servers if a user-provided string is used as the key for a `HashMap`

: The attacker can just repeatedly send requests with separate string values that have the same hash code. If these string values are continually added to a `HashMap`

the performance of that data structure will go down.

# Conclusion

While the iteration order of a Java `HashMap`

is undefined (that is, there are no guarantees as to the order), one should not construe *undefined* to mean *random*. Depending on how the `hashCode()`

of the key is implemented, the iteration order can turn out to be quite predictable and thus quite biased.

If you want something to be random, you will have to explicitly add in a source of randomness, by for example using `Collections.shuffle()`

on a list or using `Random`

as a source of that randomness.