Java Collections

Collection Interface

collection

List

ArrayList

Basic Concepts

  1. Allows null values.
  2. Memory is contiguous, saving memory space.
  3. Insertion time complexity: O(n) for insertion at the head because all subsequent elements need to be shifted; O(1) for insertion at the tail if no resizing is needed, otherwise O(n) to copy the original ArrayList and then O(1) for insertion. O(n) for insertion in the middle.
  4. Deletion time complexity: Corresponds to insertion time complexity.
  5. Not thread-safe.

Underlying Implementation

  1. Implemented by dynamic array. Random access to a node only requires calculating its address, so random access is O(1). Implements the RandomAccess marker interface.
  2. If no initial capacity is specified, the default initial capacity is 0. When the first element is added, the capacity becomes 10. Each dynamic expansion increases the capacity to 1.5 times the original.

Differences between ArrayList and Array

  1. ArrayList can be dynamically resized; Array cannot, its length is fixed and needs to be specified when initializing the array.
  2. ArrayList supports generics.
  3. ArrayList only supports objects; for primitive data types, their wrapper classes like Integer, Double are needed. Array supports both primitive data types and objects.

LinkedList

Underlying Implementation

  1. Implemented by a doubly linked list. Insertion/deletion at the head/tail are both O(1).
  2. Does not support random access. Time complexity for accessing a specific element is O(n).
  3. Performance is not as good as ArrayList, rarely used in actual development.
  4. Not thread-safe.
  5. Allows null values.
  6. Occupies more space than ArrayList because it stores pointers in addition to data.

HashMap

Implementation Principle

  1. Implemented by hash table + linked list/red-black tree. Hash collision resolution methods include chaining/probing, usually chaining.
  2. How is the hash value calculated? Get the hashCode() of the key. Perform an XOR operation between the original hashcode and the hashcode shifted right by 16 bits.
1
int hash = (key == null) ? 0 : (key.hashCode()) ^ ((key.hashCode()) >>> 16);
  1. Calculate array index. hash & (length - 1), equivalent to hash % length. Binary operations are faster than modulo operations.
  2. If a collision occurs, check if the key exists in the linked list or red-black tree. If it exists, replace the value directly. If not, add it. When the linked list length is 8 and the hash table length is greater than 64, it is converted to a red-black tree.
  3. If the number of red-black tree nodes is less than 6, it reverts to a linked list.
  4. Initial capacity is 16, load factor is 0.75, expands to twice the original size.
  5. Not thread-safe. Resizing may cause a circular linked list.

Time Complexity

Ideally, the average time complexity for HashMap’s insertion, deletion, and lookup operations is O(1). If converted to a red-black tree, the time complexity of operations will stabilize at O(log n).

Resizing

  1. Only one node in the bucket: directly rehash and move to the new hash table.
  2. Linked list: Iterate through the linked list, rehash each element. Their new positions will either be the original index or the original index + original capacity. So it’s equivalent to splitting into two linked lists.
  3. Red-black tree: Treated as a linked list. If the number of nodes is 6, it degenerates into a linked list.

Not Thread-Safe

put operations are not atomic. If two threads calculate the same bucket index for two elements and concurrently insert values into the hash table, one insertion operation may be overwritten by another.

If a put operation is executed by another thread while the HashMap is resizing, will it be inserted into the new table or the old table?

If resizing is not yet complete: The new thread’s put operation will attempt to operate on the old table. This is because during resizing, the old table is still accessible, and the new thread will first check the old table’s status when performing a put. If it finds that resizing is in progress, it will try to assist with the resizing operation (help migrate elements). After assisting with a portion of the resizing work, it will continue to execute the put operation, but ultimately it will still calculate the insertion position based on the old table. If resizing is complete: The new thread’s put operation will be performed on the new table. This is because after resizing is complete, the HashMap’s internal table reference already points to the new array, and subsequent operations will be executed based on the new array.

ConcurrentHashMap

  1. For single key-value pairs, CAS is generally used.
  2. When dealing with linked lists or red-black trees, synchronized is applied to their head nodes. However, other threads can freely modify other buckets at this time.
  3. HashMap’s key and value can both be null. However, ConcurrentHashMap’s key and value cannot be null.
  4. ConcurrentHashMap does not allow null values, mainly because its design philosophy is to provide strong consistency in high-concurrency scenarios. If null values were allowed, handling concurrent read/write operations would become very complex, making it difficult to determine whether a key’s absence means it truly doesn’t exist or if its value is null, which could easily lead to data inconsistency issues. Therefore, null values are simply not allowed.