Topic 7. Collections Framework (List, Set, Map, Queue)
Collections Framework in Java is not just a bag of containers. It is the core language for modeling in-memory data behavior. In backend systems, almost every flow relies on collection semantics: loading rows into List, creating lookup indexes in Map, enforcing uniqueness with Set, and scheduling work through Queue. Because of that, collection choice is not only about performance. It is also about correctness, readability, and long-term maintainability.
A weak choice often creates hidden problems: unstable ordering, accidental duplicates, repeated linear scans in hot paths, and brittle code that is hard to reason about under load. A strong choice makes business intent explicit in the type system itself. In this topic, we go layer by layer: Iterable, Collection, List, Set, Map, and Queue, with clear storage-model explanations, code snippets, and visuals for each structure.
Iterable: traversal contract behind for-each
Iterable<T> is the minimal contract that says: this data structure can be traversed sequentially. It exposes one method, iterator(), which returns an Iterator<T>. The enhanced for loop (for-each) compiles down to that iterator protocol.
This interface does not promise uniqueness, sorting, random access, or queue semantics. It only promises valid traversal. That is exactly why it is powerful: one algorithm can traverse many data structures without knowing their concrete implementation.
Iterable<String> names = List.of("Alice", "Bob", "Carol");
for (String item : names) {
System.out.println(item);
}
A practical design rule: use the weakest abstraction that is sufficient for your algorithm. If a method only needs sequential read access, accept Iterable<T> instead of List<T> or Set<T>. This reduces coupling and keeps APIs flexible.
public static void printAll(Iterable<String> data) {
for (String value : data) {
System.out.println(value);
}
}
Another important detail is fail-fast behavior. Most standard iterators throw ConcurrentModificationException if the underlying collection is structurally modified outside the iterator during traversal. That is a debugging guardrail, not thread-safety. For concurrent mutation, you need concurrent collections or explicit synchronization.
Collection: shared behavioral contract for element groups
Collection<E> extends Iterable<E> and adds common operations for element groups: add, remove, contains, size, isEmpty, clear, and iterator. This is the abstraction layer before specialization into ordering, uniqueness, or queue processing.
The critical engineering point is that Collection defines behavior, not storage strategy or complexity guarantees. The same operation can have very different costs depending on implementation. For example, contains is linear in ArrayList but usually near O(1) average in HashSet.
Collection<String> tags = new ArrayList<>();
tags.add("java");
tags.add("spring");
boolean hasJava = tags.contains("java");
int count = tags.size();
Also remember that not every collection supports mutation. Some implementations are immutable and throw UnsupportedOperationException for modifying calls. This is common with factory methods such as List.of(...) and Set.of(...). API design should make mutability expectations explicit.
At service boundaries, Collection is often the right abstraction when the caller should decide concrete storage but the callee only needs shared collection behavior.
List: ordered sequence with index and duplicates
Use List<E> when element order is meaningful and positional access matters. List explicitly allows duplicates, which is often semantically important: equal values can still represent different events because position and context differ.
The two most common implementations are ArrayList and LinkedList. ArrayList stores elements in a dynamic contiguous array. That gives very fast index reads (get(i)), but middle insertions/removals shift elements and can be expensive. Capacity growth may trigger resize and copy.
List<String> users = new ArrayList<>();
users.add("u1");
users.add("u2");
users.add(1, "u1.5"); // shifts tail elements
System.out.println(users.get(2)); // u2
LinkedList stores nodes connected by references. Structural updates near known positions or ends are cheap, but random index access is expensive because traversal is required.
List<String> events = new LinkedList<>();
events.add("A");
events.add("B");
events.remove(0); // relinks nodes, no array shift
Do not pick LinkedList only because of theoretical O(1) insertion claims. In real workloads, memory locality and iteration patterns often make ArrayList the better default. Choose LinkedList only when your actual mutation pattern clearly benefits from node-based structure.
Set: uniqueness model and membership checks
Set<E> expresses a binary membership rule: an element is either present or not present. Duplicates are rejected by contract. This is more than convenience; it is a way to encode business invariants directly into data structures.
HashSet is the typical default. Internally it uses hashing and buckets. Uniqueness depends on hashCode and equals. If these methods are inconsistent for custom objects, set behavior becomes logically broken: lookups fail, duplicates sneak in, or removals miss existing entries.
Set<String> roles = new HashSet<>();
roles.add("ADMIN");
roles.add("ADMIN");
roles.add("USER");
System.out.println(roles.size()); // 2
System.out.println(roles.contains("USER")); // true
LinkedHashSet preserves insertion iteration order. TreeSet keeps elements sorted and supports range operations, with O(log n) complexity for core operations.
Set<Integer> sorted = new TreeSet<>();
sorted.add(40);
sorted.add(10);
sorted.add(25);
System.out.println(sorted); // [10, 25, 40]
If uniqueness is part of the domain rule, model it with Set from the beginning instead of deduplicating later through ad hoc logic.
Map: key-value indexing and fast addressing
Map<K, V> does not extend Collection because it stores key-value associations rather than standalone elements. In backend architecture, Map is the primary in-memory indexing structure: user id to user data, code to config, token to session metadata, and many other lookup-heavy paths.
HashMap is the default for general lookup cases. It offers fast average put/get/remove when key hashing is well distributed.
Map<Long, String> userById = new HashMap<>();
userById.put(101L, "Alice");
userById.put(102L, "Bob");
String name = userById.get(102L); // Bob
LinkedHashMap is useful when predictable iteration order matters, and can support LRU-style behavior in access-order mode. TreeMap provides sorted keys and navigation/range operations (firstKey, ceilingKey, subMap), with O(log n) operation cost.
Think of Map as a local index. Replacing repeated linear scans with map lookups is one of the highest-impact and safest optimizations in business code.
Queue: processing order, FIFO and priority
Queue<E> models consumption order rather than simple storage. Standard queue semantics are FIFO: first in, first out. This maps directly to practical flows such as task pipelines, event handling, and command processing.
Core methods are offer (enqueue), poll (dequeue head), and peek (inspect head). In robust code, these are often preferred over exception-oriented alternatives because they handle empty-state boundaries more gracefully.
Queue<String> tasks = new ArrayDeque<>();
tasks.offer("import-users");
tasks.offer("send-report");
String first = tasks.poll(); // import-users
String next = tasks.peek(); // send-report
ArrayDeque is usually the best default queue/deque in single-threaded contexts. PriorityQueue follows different semantics: it returns the highest-priority element, not the earliest inserted one.
Queue<Integer> pq = new PriorityQueue<>();
pq.offer(50);
pq.offer(10);
pq.offer(30);
System.out.println(pq.poll()); // 10 (lowest value is highest priority by default)
That distinction is essential. If strict arrival order is required, use FIFO queue semantics. If business priority should dominate processing order, use PriorityQueue with an explicit comparator.
How to choose in real systems
Start with semantics, not with a complexity table. If you need positional order, use List. If you need uniqueness guarantees, use Set. If you need key-based addressing, use Map. If you need processing order, use Queue.
When semantics match structure, performance and correctness usually align naturally. When semantics mismatch structure, complexity leaks into business code as manual deduplication, repeated sorting, or expensive repeated scanning.
A reliable engineering pattern is: expose interfaces (List, Set, Map, Queue) in contracts, choose concrete implementations from measured workload behavior, and revisit choices when real profiling data says so. That is how collection design stays robust at production scale.