Tuesday, October 21, 2008

Weak references

Coming from a low level background, I have always thought that memory management (allocating and freeing memory), pointers and related concepts are quite natural. The fact that you should always free what have allocated is quite deeply rooted in my programmer's soul.

Therefore, when starting to learn Java it was quite hard for me to stop thinking about this and instead trust the garbage collector to do the work for me.

However, the more I learned and the more I read about Java I started to realise that you cannot completely forget about memory management. In fact, in some situations you have to think quite hard about it!

This combined with the - for me - quite disturbing fact that most texts on Java go to great lengths to avoid even touching on the concept of "real memory" and pointers can easily lead to some confusion. The texts instead insist of talking about objects and references to objects, since that is the chosen abstraction level of the language and the only thing a Java programmer should be concerned about.

For the most part, references work really well and are indeed uncomplicated to use, but instead of managing memory you now have to manage references. This is because a reference is a kind of a "smart pointer" which keeps track of how many references there are to a given block of memory (an object) and the memory block isn't freed until the number of references reaches zero.

If you "forget" a reference somewhere, the reference counter will never reach zero and the associated memory won't be freed. A memory leak has just occured!

A typical example of where it is easy to forget a reference is in a hash map. If the hash map isn't the primary "storage point" of the object but just a way to associate the object with some key (an id for example) it is very easy to forget to remove the reference from the map once the object isn't needed anymore. As long as the map has a reference to the object, the object's reference counter won't reach zero and the object's memory won't be freed.

To make things worse, each entry in an map has two references; one to the key object and one to the value object. Thus, there are two references with the risk of being forgotten.

In Java it is especially easy to forget to remove these references, since the language often handles it for you (for example when you assign a new value to a refernce) and you aren't supposed to think too hard about it. In low level programming where you manage memory manually you think about this all the time and it's therefore less likely that you forget a reference somewhere.

The example below illustrates this problem by creating a hash map and then entering an endless loop which creates a string object (value) and an id object (key) for the string and then adds the {id, string} entry to the map.

The main storage point of the string in this example is the local reference 'str'. Until 'str' is overwritten in the next iteration of the loop, there exists two references to the string; 'str' and the map entry's ref. When 'str' is overwritten, only one reference exists. However, the reference count never reaches zero and thus, sooner or later we will get an OutOfMemoryError exception. The id object's reference 'key' work exactly the same way.


import java.util.Map;
import java.util.HashMap;

/**
* key value
* 0 "string for id 0"
* 1 "string for id 1"
* ...
*/
public class Hash {
public static void main(String[] args) {
Map map = new HashMap();
long id = 0;
while ( true ) {
Long key = new Long(id); // Long(id) object's ref count = 1
String str = "string for id " + id; // "string..." object's ref count = 1
map.put(key, str); // Long(id) ref. cnt. = 2, "string..." ref count = 2
id++;
}
}
}


To solve this, you have to explicitly remove the reference from the map by calling map.remove(key) or clearing the whole map with map.clear(). The first alternative is demonstrated in the slightly modified version of the program below.


import java.util.Map;
import java.util.HashMap;

/**
* key value
* 0 "string for id 0"
* 1 "string for id 1"
* ...
*/
public class HashWithRemove {
public static void main(String[] args) {
Map map = new HashMap();
long id = 0;
while ( true ) {
Long key = new Long(id); // ref count = 1
String str = "string for id " + id; // ref count = 1
map.put(key, str); // ref counts = 2
id++;
map.remove(key); // ref counts = 1
}
}
}


This manual management of references is a bit un-Java like, so therefore there exists an alternative solution which uses so called weak references (WeakReference)and a special hash map which uses weak references internally (WeakHashMap).

Simply put, a weak reference is a way to keep a reference to an object, while saying that you don't really care if the object is still available when you "need" it (try to access it) sometime in the future. Clearly, you should not use weak references for objects that you actually do need in the future. For such objects, you should still use normal strong references.

By simply exchanging the HashMap with a WeakHashMap you don't have to worry about removing the reference from the map anymore, and you won't run out of memory any longer.


import java.util.Map;
import java.util.WeakHashMap;

/**
* key value
* 0 "string for id 0"
* 1 "string for id 1"
* ...
*/
public class WeakHash {
public static void main(String[] args) {
Map map = new WeakHashMap();
long id = 0;
while ( true ) {
Long key = new Long(id); // strong reference
String str = "string for id " + id; // strong reference
map.put(key, str); // weak references
id++;
}
}
}


What happens in the WeakHashMap version is that the strong references 'str' and 'key' are removed, while the weak references in the map are kept. Instead of running out of memory the garbage collector will automatically remove the weak references to the objects and thus bring the reference counts down to zero and freeing the objects' memory.

This means that the entry for the object in the map can suddenly "disappear". If you try to retrieve the object from the map using its key the map.get(key) method will simply return null.

The program below demonstrates this "disappearance" by continuously checking the value for the key 0 and printing a message when it no longer can get it from the map.


import java.util.Map;
import java.util.WeakHashMap;

/**
* key value
* 0 "string for id 0"
* 1 "string for id 1"
*/
public class WeakHashCheck {
public static void main(String[] args) {
Map map = new WeakHashMap();
long id = 0;
boolean isCheck = true;
while ( true ) {
Long key = new Long(id);
String str = "string for id " + id;
map.put(key, str);
if ( isCheck && map.get(new Long(0)) == null ) {
isCheck = false;
System.out.println("*** key 0 no longer available in map; current id=" + id);
}
id++;
}
}
}


For situations like these, when you want to keep a map of objects alongside with the primary storage of the objects, weak references in general and WeakHashMap in particular provide a very elegant solution to an otherwise quite messy memory management problem.

For my own part, after discovering this, I have a new tool in my Java toolbox which I hopefully will find some use for in the future.