Skip to content

Java HashSet

When working with Java, efficient data storage and retrieval are essential. Java’s HashSet class, part of the Java Collections Framework, is a robust tool that offers developers an easy way to store unique elements. It’s backed by a HashMap and offers constant-time performance for basic operations like add, remove, and contains. In this blog, we’ll explore the HashSet in detail, discussing how it works, its key features, and some practical examples to help you fully understand its utility.


1. Introduction to HashSet

HashSet is part of Java’s java.util package and implements the Set interface. A set is a collection of elements that does not allow duplicate values, and HashSet is one of the most commonly used implementations. The primary reason developers choose HashSet is its excellent performance for operations like adding, removing, and searching for elements.

It’s important to note that HashSet does not maintain any order of the elements. If ordering is necessary, other implementations like TreeSet or LinkedHashSet may be more appropriate. However, if order doesn’t matter and you’re only concerned with performance and uniqueness, HashSet is a fantastic choice.

Key Points about HashSet:

  • HashSet contains only unique elements.
  • It allows null values.
  • It is not synchronized, meaning it’s not thread-safe unless manually synchronized.
  • It doesn’t maintain any insertion order.

2. Key Characteristics of HashSet

Understanding the main properties of HashSet will help you decide whether it’s the right tool for your specific use case.

  • Uniqueness: HashSet ensures that no duplicate elements are stored. It relies on the hashCode() and equals() methods of objects to determine equality.
  • Null Elements: HashSet allows exactly one null element.
  • No Guarantee of Order: Unlike lists, HashSet does not maintain any order of the elements, including insertion order.
  • Capacity and Load Factor: HashSet is backed by a HashMap, which means it has an initial capacity and a load factor that governs when the set expands.

3. How HashSet Works Internally

To understand the inner workings of a HashSet, you need to know that it’s backed by a HashMap. Each element in the HashSet is stored as a key in this HashMap, while a default value is assigned to it.

Here’s how it works:

  1. Hashing: When you add an element to a HashSet, its hashCode() method is called to determine where the element should be placed in the internal HashMap. The hashing mechanism ensures a constant time (on average) for the operations.

  2. Buckets: The HashMap uses an array of buckets. Each element is stored in a bucket based on its hash code. If two elements have the same hash code, they will be placed in the same bucket using linked lists or, after Java 8, balanced trees.

  3. Collision Resolution: If two elements have the same hash code (collision), they are compared using their equals() method. If equals() returns false, both elements are kept in the same bucket.

  4. Rehashing: If the number of elements exceeds the capacity defined by the load factor, the HashMap resizes itself (usually doubling its capacity), which can degrade performance momentarily.


4. Common HashSet Operations

Here are some of the most commonly used operations with a HashSet:

Adding Elements

To add an element to a HashSet, you use the add() method. If the element is already present, it returns false.

java
HashSet<String> set = new HashSet<>();
set.add("Java");
set.add("Python");
set.add("Java");  // Duplicate, not added

Removing Elements

Elements can be removed using the remove() method. If the element doesn’t exist, the method returns false.

java
set.remove("Python");

Checking for Membership

To check if an element is present in the set, use the contains() method.

java
if (set.contains("Java")) {
    System.out.println("Java is in the set.");
}

Size of HashSet

The size() method returns the number of elements in the HashSet.

java
System.out.println("Size of the set: " + set.size());

5. Iterating Over a HashSet

Even though HashSet doesn’t maintain order, you can still iterate through its elements using multiple approaches.

Using an Iterator:

java
Iterator<String> iterator = set.iterator();
while (iterator.hasNext()) {
    System.out.println(iterator.next());
}

Using Enhanced For Loop:

java
for (String element : set) {
    System.out.println(element);
}

Using forEach and Lambda Expressions (Java 8+):

java
set.forEach(element -> System.out.println(element));

6. Practical Examples with HashSet

Let’s explore a few practical examples to solidify our understanding.

Example 1: Remove Duplicates from a List

HashSet can be used to remove duplicates from a list since it automatically filters duplicates.

java
List<String> list = Arrays.asList("apple", "banana", "apple", "orange");
HashSet<String> uniqueFruits = new HashSet<>(list);
System.out.println(uniqueFruits); // Output: [banana, orange, apple]

Example 2: Intersection of Two Sets

You can use retainAll() to find common elements between two sets.

java
HashSet<String> set1 = new HashSet<>(Arrays.asList("a", "b", "c"));
HashSet<String> set2 = new HashSet<>(Arrays.asList("b", "c", "d"));
set1.retainAll(set2);
System.out.println(set1);  // Output: [b, c]

7. HashSet vs. Other Set Implementations

It’s important to compare HashSet with other Set implementations in Java to understand when you might want to use one over the other.

  • HashSet vs. TreeSet:

    • HashSet is faster, but it does not maintain any order of elements.
    • TreeSet is slower as it maintains elements in natural (or custom) order using a Red-Black Tree structure.
  • HashSet vs. LinkedHashSet:

    • LinkedHashSet maintains insertion order but has a slight performance overhead due to the linked list used internally.

8. When to Use HashSet

A HashSet is ideal when:

  • You need to store unique elements.
  • The order of elements is irrelevant.
  • You require high performance for operations like adding, removing, and checking for elements.
  • You are dealing with large datasets and performance is a priority.

Avoid using HashSet if:

  • You need the elements to be ordered. Instead, use TreeSet or LinkedHashSet.
  • You require thread safety. Consider using Collections.synchronizedSet() or ConcurrentHashMap.

9. Performance Considerations

Performance is one of the main reasons why developers choose HashSet. The average time complexity for basic operations like add(), remove(), and contains() is O(1), assuming the hash function disperses the elements properly among the buckets.

However, there are some cases where performance might degrade:

  • Collisions: If many elements end up in the same bucket due to poor hash functions, you may experience O(n) time complexity.
  • Rehashing: When the number of elements exceeds the threshold defined by the load factor, the underlying HashMap is resized, which can momentarily degrade performance.

10. Conclusion

The HashSet class is a powerful and flexible tool in Java’s Collection Framework that is perfect for situations where you need fast, efficient handling of unique elements. Its simplicity, combined with impressive performance, makes it a go-to choice in many applications.

Whether you’re dealing with a large dataset or simply need to eliminate duplicates, HashSet offers an excellent balance of functionality and speed. Remember to choose wisely based on your specific requirements, especially if element order or thread safety is crucial.

Understanding the internal workings and characteristics of HashSet is essential for making informed decisions on when and how to use it effectively in your Java projects.

Waytojava is designed to make learning easier. We simplify examples for better understanding. We regularly check tutorials, references, and examples to correct errors, but it's important to remember that humans can make mistakes.