Precise Java Google
Home  | J2SE  | J2EE  | Book  | Free Books  | Contact
Collections  | StringAndStringBuffer  | Serialization  | IO  | Objects  | Loops  | Exceptions

Performance Improvement techniques in Collections

 

This topic illustrates the performance improvement techniques in Collections with the following sections:

Overview of Collections

Collection is a group of objects.  java.util package provides important types of collections. There are two fundamental types of collections they are Collection and Map. Collection types hold a group of objects, Eg. Lists and Sets where as Map types hold group of objects as key, value pairs Eg. HashMap and Hashtable.

This section examples are tested on Windows 2000, millennium, 320mb RAM, JDK 1.3 and JDK1.2.2

Note: This section assumes that reader has some basic knowledge of Java Collections.

Optimization techniques in Lists

List types represent an ordered collection of objects. ArrayList, Vector, Stack and LinkedList are the List implementation classes. All  List types support basic operations - adding objects, removing objects, accessing objects and iterating through the list. So which one to choose since all the list implementations support these basic operations? Performance is different for each class based on specific operations. So your choice is driven by the performance and the requirement options. Your requirement could be

1. Thread safe collection

2. Size of collection (large or small collection)

3. Type of operation ( adding, removing, accessing or iterating )

If you want your collection to be thread safe then Vector or Stack must be used because both have synchronized methods. While ArrayList and LinkedList are not thread safe. Stack is meant for specific LIFO (last in - first out) operations, this can be filtered down based on this specific requirement. If you don't want your collection to be thread safe then you have can choose between ArrayList or LinkedList. General concept from performance point of view is that ArrayList gives better performance when accessing and iterating objects whereas LinkedList gives better performance when adding and removing objects. Although true in most cases, sometimes there is an exception.

The following are the benchmarks shown for each operation separately.

Adding objects:

The table below shows time taken when I run ListAddTest.java which adds elements at different positions and is run on different JDK versions to measure performance bench marks.  Time shown here is in milli seconds.

Size of collection Type of operation JDK version ArrayList with out initialization ArrayList with initialization Vector with out initialization Vector with initialization LinkedList
50000 objects Adding objects at end 1.2 195 15 30 25 390
1.3 70 10 60 20 50
Adding objects at middle 1.2 2488 2513 2578 2508 162413
1.3 2543 2553 2619 2574 96553
Adding objects at first 1.2 5538 6053 5472 8377 45
1.3 6499 5848 5568 6078 30
10000 objects Adding objects at end 1.2 215 35 70 60 325
Adding objects at middle 1.2 11191 11416 11001 10875 662172
Adding objects at first 1.2 115871 119962 118831 120293 85

 

conclusion is

Type of operation ArrayList with out initialization ArrayList with initialization Vector with out initialization Vector with initialization LinkedList
Adding objects at end fast (but slower than initialization) fast fast (but sligtly slower than initialization and slower than ArrayList because of synchronization) fast(but sligtly slower than ArrayList because of synchronization) fast ( but slightly slower than ArrayList and Vector)
Adding objects at middle slow ( slower than when adding objects at last) slow ( slower than when adding objects at last) slow ( slower than when adding objects at last) slow ( slower than when adding objects at last) worse( worse than every operation)
Adding objects at first slow ( slower than when adding objects at last and middle) slow ( slower than when adding objects at last and middle) slow ( slower than when adding objects at last and middle) slow ( slower than when adding objects at last and middle) slow ( slower than when adding objects at last and middle)

 

The reason is

1. JDK 1.3 gives best performance because of HotSpot Virtual Machine.

2. The initial size for ArrayList and Vector is 10. ArrayList increases its capacity by half approximately whenever its capacity reaches maximum (10) but Vector increases its capacity by double whenever its capacity reaches maximum. That is the reason why ArrayList takes more time than Vector if it is not initialized with proper size though ArrayList is not synchronized. As soon as it reaches its maximum capacity when adding objects, it creates one more bigger array ( with 15 capacity for ArrayList approximately and 20 capacity for Vector) and copies the previous and new objects into new array. Obviously it is expensive to create new array and copy objects. So best approach is to initialize the ArrayList and Vector with proper size using constructors or using ensureCapacity(int capacity) which gives good performance.  If you initialize with proper size then the ArrayList gives better performance than Vector.

ArrayList with initialization gives better performance than others because its methods are non-synchronized. Synchronized methods are bit expensive because JVM has to lock the objects whenever it finds synchronized methods.

Vector takes slightly more time than ArrayList when you use JDK1.3 Hotspot JVM ,if you are not sure that whether your collection needs to be thread safe or not then it is better to use Vector to have higher safety.

You can convert an ArrayList as thread safe collection using Collections.synchronizedList(ArrayList object) but it is more expensive than using a Vector.

 

3. ArrayList and Vector maintain internal Object array ( Object[]) to store objects. So whenever you add an object, they add it to the end of the array which is  fine as long as it doesn't reach its maximum capacity. If you want to add an object at any other position, it creates a new object array and recopies all the objects which is expensive. That is the reason why adding objects at middle and beginning of collection takes a long time than when it is adding at the end. 

4. LinkedList gives good performance when adding elements at the end and beginning but it is worse when adding objects at middle because it needs to scan the node whenever it needs to add an object. LinkedList cannot be initialized.

The constructors for ArrayList and Vector to initialize with proper size are

ArrayList( int initialcapacity)

Vector( int initialcapacity)

Vector( int initialcapacity, int capacityIncrement)

You can give incremental capacity in Vector to change the default increment in capacity.

Here is the ListAddTest.java source code.

 

package com.performance.util;

 //This class gives the List classes  benchmarks for adding objects at  end,middle and first

 import java.util.List;

import java.util.ArrayList;

import java.util.LinkedList;

import java.util.Vector;

 public class ListAddTest {

              private static final int NUM = 50000;

             private static String[] objs = null;

 public void addLast(List list) {

              long startTime = System.currentTimeMillis();

              for(int i=0;i<NUM;i++){

                          list.add(objs[i]);

             }

             long endTime = System.currentTimeMillis();

             System.out.println("Time taken for adding Objects at End: "

                                                                        + (endTime - startTime) );

              }

public void addFirst(List list) {

              long startTime = System.currentTimeMillis();

              for(int i=0;i<NUM;i++){

                          list.add(0,objs[i]);

             }

            long endTime = System.currentTimeMillis();

             System.out.println("Time taken for adding Objects at First : "

                                                                        + (endTime - startTime) );

            }

public void addMiddle(List list) {

             long startTime = System.currentTimeMillis();

             for(int i=0;i<NUM;i++){

                          list.add(i/2,objs[i]);

             }

             long endTime = System.currentTimeMillis();

             System.out.println("Time taken for adding Objects at Middle : "

                                                                        + (endTime - startTime) );

            }

public void doTest(List list) {

                          addLast(list);

                          clear(list);

                          addMiddle(list);

                          clear(list);

                          addFirst(list);

                          clear(list);

             }

public void clear(List col){

                          if(!col.isEmpty())

                                          col.clear();

}

public static void main(String[] args){

             objs = new String[NUM];

             for(int i=0;i<NUM;i++){

                          objs[i] = "Object"+i;

             }

             ListAddTest col = new ListAddTest();

             ArrayList collection1 = new ArrayList();

             col.doTest(collection1);

             ArrayList collection1A = new ArrayList(NUM);

             col.doTest(collection1A);

             Vector collection2 = new Vector();

             col.doTest(collection2);

             Vector collection2A = new Vector(NUM);

             col.doTest(collection2A);

             LinkedList collection4 = new LinkedList();

             col.doTest(collection4);

             }

}

 

 

Removing objects:

The table below shows time taken when I run ListRemoveTest.java on removing elements from different positions to measure performance bench marks.  Time shown here is in milliseconds.

 

Size of collection JDK version Type of operation ArrayList Vector LinkedList
20000 objects 1.3 Removing objects from end 0 50 12
Removing objects from middle 370 302 2240
Removing objects from first 615 617 0
50000 objects Removing objects from end 60 50 15
Removing objects from middle 1810 1810 46850
Removing objects from first 6155 5862 0
80000 objects Removing objects from end 15 52 0
Removing objects from middle 5427 5972 139952
Removing objects from first 30827 36370 27

 

The conclusion for removing objects is

  1. All classes take approximately same time when removing objects from end
  2. ArrayList and Vector give similar performance with slight difference because of JDK1.3 Hotspot JVM.
  3. LinkedList  gives worst performance when removing objects from middle (similar to adding objects at middle).
  4. LinkedList gives better performance when removing objects from the beginning.
  5. Only LinkedList gives better performance when removing objects from the beginning.

Here is the ListRemoveTest.java source code

 

package com.performance.util;

// This class shows the benchmarks of List types for removing objects at end, middle and first 

import java.util.List;

import java.util.ArrayList;

import java.util.LinkedList;

import java.util.Vector;

import java.util.Arrays;

public class ListRemoveTest {

             private static final int NUM = 20000;

             private static Object[] objs = null;

public void removeLast(List list) {

             long startTime = System.currentTimeMillis();

             for(int i=NUM;i>0;i--){

                          list.remove(i-1);

             }

             long endTime = System.currentTimeMillis();

             System.out.println("Time taken for removing Objects at End: "

                                                                        + (endTime - startTime) );

             }

public void removeFirst(List list) {

             long startTime = System.currentTimeMillis();

             for(int i=0;i<NUM;i++){

                          list.remove(0);

             }

             long endTime = System.currentTimeMillis();

             System.out.println("Time taken for removing Objects at First : "

                                                                        + (endTime - startTime) );

             }

public void removeMiddle(List list) {

             long startTime = System.currentTimeMillis();

             for(int i=0;i<NUM;i++){

                          list.remove((NUM-i)/2);

             }

             long endTime = System.currentTimeMillis();

             System.out.println("Time taken for removing Objects at Middle : "

                                                                        + (endTime - startTime) );

             }

public void doTest(List collection) {

                          collection.addAll(getList()); // fill the List

                          removeLast(collection);

                          clear(collection);

                          collection.addAll(getList()); // fill the List

                          removeMiddle(collection);

                          clear(collection);

                          collection.addAll(getList()); // fill the List

                          removeFirst(collection);

                          clear(collection);

             }

public void clear(List col){

                          if(!col.isEmpty())

                                          col.clear();

}

public List getList(){

             objs = new Object[NUM];

             for(int i=0;i<NUM;i++){

                          objs[i] = new Object();

             }

              return Arrays.asList(objs);       

}

public static void main(String[] args){

             ListRemoveTest col = new ListRemoveTest();

             ArrayList collection1 = new ArrayList();

             col.doTest(collection1);

             Vector collection2 = new Vector();

             col.doTest(collection2);

             LinkedList collection4 = new LinkedList();

             col.doTest(collection4);

             }

}

 

Accessing objects:

The table below shows time taken when I run ListAccessTest.java for accessing elemnts at different positions to measure performance bench marks.  Time shown here is in milliseconds.

 

Size of collection JDK version Type of operation ArrayList Vector LinkedList
25000 objects 1.3 Accessing objects from end 0 0 4701
Accessing objects from middle 0 0 21002
Accessing objects from first 0 0 15
50000 objects Accessing objects from end 0 45 45100
Accessing objects from middle 0 0 112100
Accessing objects from first 0 20 15
100000 objects Accessing objects from end 0 50 211850
Accessing objects from middle 0 30 457090
Accessing objects from first 0 0 15

The conclusion is

  1. ArrayList and Vector give best performance because they access objects using index. Vector takes slightly more time but it is negligible.
  2. LinkedList gives worst performance  when accessing objects at end and middle because it has to scan nodes to access objects.

Here is the ListAccessTest.java source code

 

package com.performance.util;

// This class shows the benchmarks of List types for accessing objects at end, middle and first  

import java.util.List;

import java.util.ArrayList;

import java.util.LinkedList;

import java.util.Vector;

import java.util.Arrays;

public class ListAccessTest {

             private static final int NUM = 25000;

             private static Object[] objs = null;

public void getFromLast(List list) {

             long startTime = System.currentTimeMillis();

             for(int i=NUM;i>0;i--){

                          list.get(i-1);

             }

              long endTime = System.currentTimeMillis();

             System.out.println("Time taken for getting Objects at Last: "

                                                                        + (endTime - startTime) );

             }

public void getFromFirst(List list) {

             long startTime = System.currentTimeMillis();

             for(int i=0;i<NUM;i++){

                          list.get(0);

             }

             long endTime = System.currentTimeMillis();

             System.out.println("Time taken for getting Objects at First : "

                                                                        + (endTime - startTime) );

             }

public void getFromMiddle(List list) {

             long startTime = System.currentTimeMillis();

             for(int i=0;i<NUM;i++){

                          list.get(NUM/2);

             }

             long endTime = System.currentTimeMillis();

             System.out.println("Time taken for getting Objects at Middle : "

                                                                        + (endTime - startTime) );

             }

public void doTest(List list) {

                          list.addAll(getList());

                          getFromLast(list);

                          getFromMiddle(list);

                          getFromFirst(list);

             }

public void clear(List col){

                          if(!col.isEmpty())

                                          col.clear();

}

public static List getList(){

             objs = new Object[NUM];

             for(int i=0;i<NUM;i++){

                          objs[i] = new Object();

             }

              return Arrays.asList(objs);       

}

public static void main(String[] args){

             ListAccessTest col = new ListAccessTest();

             ArrayList collection1 = new ArrayList();

             col.doTest(collection1);

             Vector collection2 = new Vector();

             col.doTest(collection2);

             LinkedList collection4 = new LinkedList();

             col.doTest(collection4);

             }

}

 

Iterating collection:

Iterating collection using all three types of classes ,ArrayList ,Vector and LinkedList gives similar performance because they need not do any extra work  they simply iterate one by one. So I did not give any benchmark results. You can use any Iterator . But using ListIterator gives more flexibility than Iterator and Enumeration. You can traverse both sides

 

Optimization techniques in Sets

Set is a collection of unique objects, it doesn't allow duplicate objects and modification of existing objects. Set types also allow basic operations like adding objects, removing objects, accessing objects, iterating objects but do not allow modification. There are two implementations of the Set interface they are HashSet and TreeSet. 

HashSet gives better performance than TreeSet because , TreeSet is an ordered collection of objects and the objects are sorted while they are inserted in the TreeSet where as in case of HashSet objects are added in an adhoc manner. It is expensive to do all basic operations in TreeSet because it has to compare and sort every object. We can get better performance by using a HashSet and converting  it to a TreeSet later on.

HashSet and TreeSet are backed by HashMap and TreeMap respectively. Whenever we use a HashSet we can specify an initial capacity and load factor using constructors. The default size for a HashSet is 11 and it's load factor is 0.75. Load factor determines at which capacity HashSet has to be resized. It's internal structure will become double in size when it reaches it's maximum capacity based on load factor. HashSet scales well when it is initialized with proper size and default load factor. When you know the the number of objects to be added, it is better to initialize with that capacity and put load factor as 1.0f. The objects in HashSet are stored and retrieved through hash code which provides constant look up time.  I did not give any bench marks for these two Sets because We don't have many options here to compare and evaluate. We have two Sets to choose. Use TreeSet if you want sorted collection otherwise use HashSet.

The constructors for the HashSet to initialize with proper size are:

HashSet(int initialcapacity)

HashSet(int initialcapacity, float loadfactor)

 

Optimization techniques in Maps

Map is a collection of key and value object associations. You can do all basic operations as in Lists and Sets such as adding , removing ,and accessing key-value pairs. There are four types of Map implementations they are HashMap, Hashtable, WeakHashMap and TreeMap.

HashMap, Hashtable and WeakHashMap have similar implementations. TreeMap is meant for sorted collection, this can be filtered down based on the requirement. Then we have other three types to choose from. The choice  again depends upon your requirement. Your requirement could be

1. Thread safe

2. Type of operation ( basic operations )

HashMap and  WeakHashMap are not synchronized whereas Hashtable is synchronized.

WeakHashMap is a special purpose map which uses an internal hashtable. When there are no more references to key object except weak reference maintained by WeakHashMap, the garbage collector reclaims the key object and mapping between key object and value object is also reclaimed, if the value object does not have any other references then the value object is also reclaimed by the garbage collector.

If you want your Map type collection to be thread safe, then you need to use Hashtable otherwise use HashMap. HashMap gives better performance than Hashtable because of it's non-synchronized methods. The reason I did not give any bench marks for Map types is that it is pretty straight forward to choose Map type depending on requirement .

You can improve performance by using proper initial size and load factor in the constructor for all types of Map types except TreeMap.

The constructors are

HashMap(int initialcapacity)

HashMap(int initialcapacity, float loadfactor)

Hashtable(int initialcapacity)

Hashtable(int initialcapacity, float loadfactor)

WeakHashMap(int initialcapacity)

WeakHashMap(int initialcapacity, float loadfactor)

When the number of objects exceed loadfactor capacity, then the capacity of the class increases to (2*capacity + 1). The default load factor is 0.75.   

All these classes work in accordance with hash values which are used to identify the value objects. The key objects are converted to integer called hash code by using hashing algorithms which is used as an index for value objects. Hash code must be same for two equal objects, i.e. when tested with the equals() method,it must return true.  The hash code is determined by using hashCode() method. The hash code of a collection is determined by cumulating all the hash codes of the associated objects.

 

Key Points

    Lists:
  1. Use ArrayList with proper initialization if you don't want thread safe for the collection whenever you  add/remove/access objects at end and middle of collection.
  2. Use Vector with proper initialization if you want thread safe for the collection whenever you  add/remove/access objects at end and middle of collection.
  3. Use LinkedList if you don't want thread safe for the collection whenever you  add/remove/access objects at beginning of collection.
  4. Use synchronized LinkedList if you want thread safe for the collection whenever you add/remove/access objects at beginning of collection.
  5. Use ListIterator than Iterator and Enumeration for List types

Sets:

  1. Use HashSet for maintaining unique objects if you don't want thread safe for the collection for all basic(add/remove/access) operations otherwise use synchronized HashSet for thread safe.
  2. Use TreeSet for ordered and sorted set of unique objects for non-thread safe collection otherwise use synchronized TreeSet for thread safe

Maps:

  1. Use HashMap for non-thread safe map collection otherwise use Hashtable for thread safe collection.
  2. Use TreeMap for non-thread safe ordered map collection otherwise use synchronized TreeMap for thread safe.

 

 

Feed back

We appreciate and welcome your comments on this section. Email commentsZZZ@precisejavaZZZ.com (remove ZZZ which is placed to prevent spam). Please note that we may not be able to reply to all the emails due to huge number of emails that we receive but we appreciate your comments and feedback.

 





Copyright © 2001-2005, Ravi Kalidindi and Rohini Kalidindi. All rights reserved.