Few notes about Java Collections

Hi,

In this post I will cover few of the important details about the java collections. So many times I saw myself fainted the moment my interviewer started questioning about Java collections :). Collection Framework is one of the important part of Java programming. It provides the complete solutions for the data structure. I will highlight few of the basic but important details about the collection. Hope this helps.

HashSet vs TreeSet

The Collection Framework provides two general purpose implementations often Set interface, HashSet and TreeSet.

– More often than not you will use a HashSet for storing your duplicate-free collection.

– For efficiency objects added to a HashSet need to implement the hashCode() method in a manner that properly distributes the hash codes. While most system classes override the default hashCode() implementation of the Object, when creating your own class to add to a HashSet remember to override hashCode().

The TreeSet implementations useful when you need to extract elements from a collection in a sorted manner.

It is generally faster to add elements to the HasSet then convert the collection to a TreeeSet for sorted traversal.

To optimize HashSet space usage , you can tune initial capacity and load factor.

TreeSet has no tuning options, as the tree is always balanced, ensuring log(n0 performance for insertions, deletions and queries.

*
Note that these implementation is not synchronized. If multiple threads access a set concurrently, and at least one of the threads modifies the set, it must be synchronized externally. This is typically accomplished by synchronizing on some object that naturally encapsulates the set. If no such object exists, the set should be “wrapped” using the Collections.synchronizedSet method. This is best done at creation time, to prevent accidental unsynchronized access to the set:

Set s = Collections.synchronizedSet(new HashSet(…));

SortedSet s = Collections.synchronizedSortedSet(new TreeSet(…));


ArrayList and Linked List

There are two general-purpose List implementations in the Collection Framework, ArrayList and LinkedList, which of the two List implementations you use depends on your specific needs.

– If you need to support random access, without inserting or removing elements from any place to other than the end, then ArrayList offers you the optimal collection,

the LinkedList class provides uniformly named methods to get, remove and insert an element at the beginning and end of the list.

Each ArrayList instance has a capacity. The capacity is the size of the array used to store the elements in the list. It is always at least as large as the list size. As elements are added an ArrayList, its capacity grows automatically. The details of the growth policy are not specified beyond the fact that adding an element has constant amortized time cost.

An application can increase the capacity of an ArrayList instance before adding a large number of elements using the ensureCapacity operation. This may reduce the amount of incremental reallocation.

*Note that these implementation is not synchronized. If multiple threads access a set concurrently, and at least one of the threads modifies the set, it must be synchronized externally. This is typically accomplished by synchronizing on some object that naturally encapsulates the set. If no such object exists, the set should be “wrapped” using the Collections.synchronizedSet method. This is best done at creation time, to prevent accidental unsynchronized access to the set:

List list = Collections.synchronizedList(new LinkedList(…));

List list = Collections.synchronizedList(new ArrayList(…));


Tree Map and HashMap

The Collection Framework provides two general-purpose Map implementation: HashMap and TreeMap.

As with all the concrete implementations, which implement you use depends on your specific needs.
For inserting, deleting and locating elements in a Map the HashMap offers best alternatively.

If however you need to traverse the keys in a sorted order then TreeMap is better alternative.

Depending upon your size of your collection, it may be faster to add elements to a HashMap then convert the Map to a TreeMap for sorted key traversal.

– Using a HashMap requires that the class of key added have a well-defined hashCode() implementation. With the TreeMap implementation elements added to the Map must be sortable.

To optimize HashMap usage you can tune the initial capacity and load factor.

The TreeMap has no tuning options as the tree is always balanced

An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the capacity is roughly doubled by calling the rehash method.

As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.

* Note that these implementation is not synchronized. If multiple threads access a set concurrently, and at least one of the threads modifies the set, it must be synchronized externally. This is typically accomplished by synchronizing on some object that naturally encapsulates the set. If no such object exists, the set should be “wrapped” using the Collections.synchronizedSet method. This is best done at creation time, to prevent accidental unsynchronized access to the set:

Map m = Collections.synchronizedMap(new HashMap(…));

Map m = Collections.synchronizedMap(new TreeMap(…));

Array List vs Vector

Vector and ArrayList are very similar. Both of them represent a ‘growable array‘, where you access to the elements in it through an index.

ArrayList it’s part of the Java Collection Framework, and has been added with version 1.2, while Vector it’s an object that is present since the first version of the JDK. Vector, anyway, has been retrofitted to implement the List interface.

The main difference is that Vector it’s a synchronized object, while ArrayList it’s not.

While the iterator that are returned by both classes are fail-fast (they cleanly thrown a ConcurrentModificationException when the orignal object has been modified), the Enumeration returned by Vector are not.

Unless you have strong reason to use a Vector, the suggestion is to use the ArrayList.

I will try to cover as many as in the same post, So please keep watching.

Thanks
R Vashi

Advertisements

2 thoughts on “Few notes about Java Collections

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s