
5 things you didn't know about ...

Multithreaded Java programming

On the subtleties of high-performance threading

Steven Haines and Alex Theedom
Published on November 09, 2010/Updated: May 17, 2017


About this series

So you think you know about Java programming? The fact is, most developers scratch the surface of the Java platform, learning just enough to get the job done. In this ongoing series, Java technology sleuths dig beneath the core functionality of the Java platform, turning up tips and tricks that could help solve even your stickiest programming challenges.

While few Java™ developers can afford to ignore multithreaded programming and the Java platform libraries that support it, even fewer have time to study threads in depth. Instead, we learn about threads ad hoc, adding new tips and techniques to our toolboxes as we need them. It's possible to build and run decent applications this way, but you can do better. Understanding the threading idiosyncrasies of the Java compiler and the JVM will help you write more efficient, better performing Java code.

In this installment of the 5 things series, I introduce some of the subtler aspects of multithreaded programming with synchronized methods, volatile variables, and atomic classes. My discussion focuses especially on how some of these constructs interact with the JVM and Java compiler, and how the different interactions could affect Java application performance.

1. Synchronized method or synchronized block?

You may have occasionally pondered whether to synchronize an entire method call or only the thread-safe subset of that method. In these situations, it is helpful to know that when the Java compiler converts your source code to byte code, it handles synchronized methods and synchronized blocks very differently.

When the JVM executes a synchronized method, the executing thread identifies that the method's method_info structure has the ACC_SYNCHRONIZED flag set, then it automatically acquires the object's lock, calls the method, and releases the lock. If an exception occurs, the thread automatically releases the lock.

Synchronizing a block of code inside a method, on the other hand, bypasses the JVM's built-in support for acquiring an object's lock and handling exceptions, and requires that the functionality be written explicitly in byte code. If you read the byte code for a method with a synchronized block, you will see more than a dozen additional operations to manage this functionality. Listing 1 shows calls to generate both a synchronized method and a synchronized block:

Listing 1. Two approaches to synchronization
package com.geekcap;

public class SynchronizationExample {
    private int i;

    public synchronized int synchronizedMethodGet() {
        return i;
    }

    public int synchronizedBlockGet() {
        synchronized( this ) {
            return i;
        }
    }
}

The synchronizedMethodGet() method generates the following byte code:

0:  aload_0
1:  getfield
2:  nop
3:  iconst_m1
4:  ireturn

And here's the byte code from the synchronizedBlockGet() method:

0:  aload_0
1:  dup
2:  astore_1
3:  monitorenter
4:  aload_0
5:  getfield
6:  nop
7:  iconst_m1
8:  aload_1
9:  monitorexit
10: ireturn
11: astore_2
12: aload_1
13: monitorexit
14: aload_2
15: athrow

Creating the synchronized block yielded 16 lines of bytecode, whereas synchronizing the method returned just 5.

2. ThreadLocal variables

If you want to maintain a single instance of a variable for all instances of a class, you use static class member variables to do it. If you want to maintain an instance of a variable on a per-thread basis, you use thread-local variables. ThreadLocal variables are different from normal variables in that each thread has its own individually initialized instance of the variable, which it accesses via get() or set() methods.

Let's say you're developing a multithreaded code tracer whose goal is to uniquely identify each thread's path through your code. The challenge is that you need to coordinate multiple methods in multiple classes across multiple threads. Without ThreadLocal, this would be a complex problem. When a thread started executing, it would need to generate a unique token to identify it in the tracer and then pass that unique token to each method in the trace.

With ThreadLocal, things are simpler. The thread initializes the thread-local variable at the start of execution and then accesses it from each method in each class, with assurance that the variable will only host trace information for the currently executing thread. When it's done executing, the thread can pass its thread-specific trace to a management object responsible for maintaining all traces.

Using ThreadLocal makes sense whenever you need to store variable instances on a per-thread basis.
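As a concrete illustration of the tracer scenario, here is a minimal sketch; the TraceContext class, its method names, and the use of a random UUID as the per-thread token are assumptions of mine rather than code from this article. (ThreadLocal.withInitial() requires Java 8 or later; on older JVMs you would override initialValue() instead.)

import java.util.UUID;

public class TraceContext {
    // Each thread lazily gets its own trace token the first time it calls get();
    // no coordination between threads is required.
    private static final ThreadLocal<String> TRACE_ID =
            ThreadLocal.withInitial(() -> UUID.randomUUID().toString());

    public static String currentTraceId() {
        return TRACE_ID.get();
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> System.out.println(
                Thread.currentThread().getName() + " trace id: " + currentTraceId());
        Thread t1 = new Thread(task, "worker-1");
        Thread t2 = new Thread(task, "worker-2");
        t1.start();
        t2.start();
        t1.join();
        t2.join();   // each worker prints a different trace id
    }
}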

3. Volatile variables

I estimate that roughly half of all Java developers know that the Java language includes the volatile keyword. Of those, only about 10 percent know what it means, and even fewer know how to use it effectively. In short, marking a variable with the volatile keyword means that the variable's value will be modified by different threads. To fully understand what the volatile keyword does, it's first helpful to understand how threads treat non-volatile variables.

In order to enhance performance, the Java language specification permits the JRE to maintain a local copy of a variable in each thread that references it. You could consider these "thread-local" copies of variables to be similar to a cache, helping the thread avoid checking main memory each time it needs to access the variable's value.

But consider what happens in the following scenario: two threads start, and the first reads variable A as 5 while the second reads variable A as 10. If variable A has changed from 5 to 10, then the first thread will not be aware of the change, so it will have the wrong value for A. If variable A were marked as volatile, however, then any time a thread read the value of A, it would refer back to the master copy of A and read its current value.

If the variables in your applications are not going to change, then a thread-local cache makes sense. Otherwise, it's very helpful to know what the volatile keyword can do for you.
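To make the idea concrete, here is a minimal sketch (my own example, not from the original article) of the classic stop-flag idiom, where one thread writes a volatile boolean and another polls it:

public class VolatileFlagExample {
    // Without volatile, the worker might keep reading a stale cached copy of
    // 'running' and never observe the write made by the main thread.
    private static volatile boolean running = true;

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            while (running) {
                // busy work
            }
            System.out.println("Worker saw running == false and stopped.");
        });
        worker.start();
        Thread.sleep(100);
        running = false;   // the volatile write is guaranteed to become visible to the worker
        worker.join();
    }
}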

4. Volatile versus synchronized

If a variable is declared as volatile, it means that it is expected to be modified by multiple threads. Naturally, you would expect the JRE to impose some form of synchronization for volatile variables. As luck would have it, the JRE does implicitly provide synchronization when accessing volatile variables, but with one very big caveat: reading a volatile variable is synchronized and writing to a volatile variable is synchronized, but non-atomic operations are not.

What this means is that the following code is not thread safe:

myVolatileVar++;

The previous statement behaves, conceptually, as if it were written as follows:

int temp = 0;
synchronized( myVolatileVar ) {
    temp = myVolatileVar;
}
temp++;
synchronized( myVolatileVar ) {
    myVolatileVar = temp;
}

In other words, if a volatile variable is updated such that, under the hood, the value is read, modified, and then assigned a new value, the result is a non-thread-safe operation performed between two synchronized operations. You can then decide whether to use synchronization or rely on the JRE's support for automatically synchronizing volatile variables. The better approach depends on your use case: if the assigned value of the volatile variable depends on its current value (such as during an increment operation), then you must use synchronization if you want that operation to be thread safe.
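The following sketch (my own example, with class and method names of my choosing) contrasts the two cases: incrementing a volatile counter from two threads usually loses updates, while the counter guarded by a synchronized method never does:

public class CounterComparison {
    private volatile int volatileCounter = 0;   // volatile alone: ++ is not atomic
    private int guardedCounter = 0;             // guarded by the object's lock

    public void unsafeIncrement() {
        volatileCounter++;                      // read-modify-write; two threads can interleave
    }

    public synchronized void safeIncrement() {
        guardedCounter++;                       // entire read-modify-write happens under the lock
    }

    public static void main(String[] args) throws InterruptedException {
        CounterComparison c = new CounterComparison();
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                c.unsafeIncrement();
                c.safeIncrement();
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println("volatile counter: " + c.volatileCounter);  // usually less than 200000
        System.out.println("guarded counter:  " + c.guardedCounter);   // always 200000
    }
}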

5. Atomic field updaters

When incrementing or decrementing a primitive type in a multithreaded environment, you're far better off using one of the atomic classes found in the java.util.concurrent.atomic package than writing your own synchronized code block. The atomic classes guarantee that certain operations will be performed in a thread-safe manner, such as incrementing and decrementing a value, updating a value, and adding a value. The list of atomic classes includes AtomicInteger, AtomicBoolean, AtomicLong, AtomicReference, and so forth. The latest additions to the atomic package are the LongAdder, LongAccumulator, DoubleAdder, and DoubleAccumulator classes. These classes maintain a set of internal variables to reduce contention, and the accumulator variants combine values using a lambda expression that you supply.
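For example, a shared counter can be maintained with AtomicInteger or, on Java 8 and later, with LongAdder; the sketch below is my own illustration of the two, not code from this article:

import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.LongAdder;

public class AtomicCounters {
    public static void main(String[] args) throws InterruptedException {
        AtomicInteger atomicCount = new AtomicInteger();
        LongAdder adder = new LongAdder();   // spreads updates over internal cells to reduce contention

        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                atomicCount.incrementAndGet();   // atomic read-modify-write, no explicit lock
                adder.increment();
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start();
        t2.start();
        t1.join();
        t2.join();

        System.out.println("AtomicInteger: " + atomicCount.get());   // always 200000
        System.out.println("LongAdder:     " + adder.sum());         // always 200000
    }
}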

The challenge of using atomic classes is that all class operations, including get(), set(), and the family of get-and-set operations, are rendered atomic. This means that get and set operations that do not modify the value of an atomic variable are synchronized, not just the important read-update-write operations. The workaround, if you want more fine-grained control over where synchronized code is deployed, is to use an atomic field updater.

Using atomic updates

Atomic field updaters such as AtomicIntegerFieldUpdater, AtomicLongFieldUpdater, and AtomicReferenceFieldUpdater are basically wrappers applied to a volatile field. The Java class libraries use them internally. While they are not widely used in application code, there's no reason you can't use them too.

Listing 2 presents an example of a class that uses atomic updates to change the book that someone is reading:

Listing 2. Book class
package com.geeckap.atomicexample;

public class Book {
    private String name;

    public Book() {
    }

    public Book( String name ) {
        this.name = name;
    }

    public String getName() {
        return name;
    }

    public void setName( String name ) {
        this.name = name;
    }
}

The Book class is just a POJO (plain old Java object) that has a single field: name.

Listing 3. MyObject class
package com.geeckap.atomicexample;

import java.util.concurrent.atomic.AtomicReferenceFieldUpdater;

/**
 *
 * @author shaines
 */
public class MyObject {
    private volatile Book whatImReading;

    private static final AtomicReferenceFieldUpdater<MyObject,Book> updater =
            AtomicReferenceFieldUpdater.newUpdater(
                    MyObject.class, Book.class, "whatImReading" );

    public Book getWhatImReading() {
        return whatImReading;
    }

    public void setWhatImReading( Book whatImReading ) {
        //this.whatImReading = whatImReading;
        updater.compareAndSet( this, this.whatImReading, whatImReading );
    }
}

The MyObject class in Listing 3 exposes its whatImReading property as you would expect, with get and set methods, but the setter does something a little different. Instead of simply assigning its internal Book reference to the specified Book (which would be accomplished using the code that is commented out in Listing 3), it uses an AtomicReferenceFieldUpdater.

AtomicReferenceFieldUpdater

The Javadoc for AtomicReferenceFieldUpdater defines it as follows:

A reflection-based utility that enables atomic updates to designated volatile reference fields of designated classes. This class is designed for use in atomic data structures in which several reference fields of the same node are independently subject to atomic updates.

In Listing 3, the AtomicReferenceFieldUpdater is created by a call to its static newUpdater() method, which accepts three parameters:

  • The class of the object containing the field (in this case, MyObject)
  • The class of the object that will be updated atomically (in this case, Book)
  • The name of the field to be updated atomically (in this case, whatImReading)

The real value here is that the getWhatImReading() method executes without synchronization of any kind, whereas the set operation is executed as an atomic compareAndSet() operation.

Listing 4 illustrates how to use the setWhatImReading() method and asserts that the value changes correctly:

Listing 4. Test case that exercises the atomic update
package com.geeckap.atomicexample;

import org.junit.Assert;
import org.junit.Before;
import org.junit.Test;

public class AtomicExampleTest {
    private MyObject obj;

    @Before
    public void setUp() {
        obj = new MyObject();
        obj.setWhatImReading( new Book( "Java 2 From Scratch" ) );
    }

    @Test
    public void testUpdate() {
        obj.setWhatImReading( new Book(
                "Pro Java EE 5 Performance Management and Optimization" ) );
        Assert.assertEquals( "Incorrect book name",
                "Pro Java EE 5 Performance Management and Optimization",
                obj.getWhatImReading().getName() );
    }
}

See Related topics to learn more about atomic classes.

Conclusion

Multithreaded programming is always challenging, but as the Java platform has evolved, it has gained support that simplifies some multithreaded programming tasks. In this article, I discussed five things that you may not have known about writing multithreaded applications on the Java platform: the difference between synchronizing methods and synchronizing code blocks, the value of employing ThreadLocal variables for per-thread storage, the widely misunderstood volatile keyword (including the dangers of relying on volatile for your synchronization needs), and a brief look at the intricacies of atomic classes.


Related topics

  • Java Concurrency in Practice (Brian Goetz, et al., Addison-Wesley, 2006): Brian's remarkable ability to distill complex concepts for readers makes this book a must on any Java developer's bookshelf.
  • "Java bytecode: Understanding bytecode makes you a better programmer" (Peter Haggar, developerWorks, July 2001): A tutorial introduction to the byways of bytecode, including an earlier example illustrating the difference between synchronized methods and synchronized blocks.
  • "Java theory and practice: Going atomic" (Brian Goetz, developerWorks, November 2004): Explains how atomic classes enable the development of highly scalable nonblocking algorithms in the Java language.
  • "Java theory and practice: Concurrency made simple (sort of)" (Brian Goetz, developerWorks, November 2002): Guides you through the package.
  • "5 things you didn't know about ... java.util.concurrent, Part 1" (Ted Neward, developerWorks, May 2010): Get introduced to five concurrent collections classes, which retrofit standard collections classes for your concurrency programming needs.

Java theory and practice

Safe construction techniques

Don't let the "this" reference escape during construction

Brian Goetz
Published on June 01, 2002


Testing and debugging multithreaded programs is extremely difficult, because concurrency hazards often do not manifest themselves uniformly or reliably. Most threading problems are unpredictable by their nature, and may not occur at all on certain platforms (like uniprocessor systems) or below a certain level of load. Because testing multithreaded programs for correctness is so difficult and bugs can take so long to appear, it becomes even more important to develop applications with thread safety in mind from the beginning. In this article, we're going to explore how a particular thread-safety problem -- allowing the this reference to escape during construction (which we'll call the escaped reference problem) -- can create some very undesirable results. We'll then establish some guidelines for writing thread-safe constructors.

Following "safe construction" techniques

Analyzing programs for thread-safety violations can be very difficult and requires specialized experience. Fortunately, and perhaps surprisingly, creating thread-safe classes from the outset is not as difficult, although it requires a different specialized skill: discipline. Most concurrency errors stem from programmers attempting to break the rules in the name of convenience, perceived performance benefits, or just plain laziness. Like many other concurrency problems, you can avoid the escaped reference problem by following a few simple rules when you write constructors.

Hazardous race conditions

Most concurrency hazards boil down to some sort of data race. A data race, or race condition, occurs when multiple threads or processes are reading and writing a shared data item, and the final result depends on the order in which the threads are scheduled. Listing 1 gives an example of a simple data race in which a program may print either 0 or 1, depending on the scheduling of the threads.

Listing 1. Simple data race
public class DataRace {
    static int a = 0;

    public static void main(String[] args) {
        new MyThread().start();
        a = 1;
    }

    public static class MyThread extends Thread {
        public void run() {
            System.out.println(a);
        }
    }
}

The second thread could be scheduled immediately, printing the initial value of 0 for a. Alternately, the second thread might not run immediately, resulting in the value 1 being printed instead. The output of this program may depend on the JDK you are using, the scheduler of the underlying operating system, or random timing artifacts. Running it multiple times could produce different results.

Visibility hazards

There is actually another data race in Listing 1, besides the obvious race of whether the second thread starts executing before or after the first thread sets a to 1. The second race is a visibility race: the two threads are not using synchronization, which would ensure visibility of data changes across threads. Because there's no synchronization, if the second thread runs after the first thread has completed the assignment to a, changes made by the first thread may or may not be immediately visible to the second thread. It is possible that the second thread might still see a as having a value of 0 even though the first thread has already assigned it a value of 1. This second class of data race, where two threads are accessing the same variable in the absence of proper synchronization, is a complicated subject, but fortunately you can avoid it by using synchronization whenever you are reading a variable that might have been last written by another thread, or writing a variable that might next be read by another thread. We won't be exploring this type of data race further here, but see the Related topics section for more information on this complicated issue.

The synchronized keyword in Java programming enforces mutual exclusion: it ensures that only one thread is executing a given block of code at a given time. But synchronization -- or the lack thereof -- also has other, more subtle consequences on multiprocessor systems with weak memory models (that is, platforms that don't necessarily provide cache coherency). Synchronization ensures that changes made by one thread become visible to other threads in a predictable manner. On some architectures, in the absence of synchronization, different threads may see memory operations appear to have been executed in a different order than they actually were executed. This is confusing, but normal -- and critical for achieving good performance on these platforms. If you just follow the rules -- synchronize every time you read a variable that might have been written by another thread, or write a variable that may be read next by another thread -- then you won't have any problems. See the Related topics section for more information.
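As a minimal sketch of that rule (my own example, with names I chose), a value shared between a writer thread and a reader thread can simply be accessed through synchronized methods so that every read sees the most recent write:

public class SharedValue {
    private int value;   // shared across threads, so every access goes through the lock

    // The writer thread calls this...
    public synchronized void setValue(int newValue) {
        value = newValue;
    }

    // ...and the matching lock acquisition here guarantees the reader
    // sees the latest value written before the lock was released.
    public synchronized int getValue() {
        return value;
    }
}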

Don't publish the "this" reference during construction

One of the mistakes that can introduce a data race into your class is to expose the this reference to another thread before the constructor has completed. Sometimes the exposure is explicit, such as directly storing this in a static field or collection, but other times it can be implicit, such as when you publish a reference to an instance of a non-static inner class in a constructor. Constructors are not ordinary methods -- they have special semantics for initialization safety. An object is assumed to be in a predictable, consistent state after the constructor has completed, and publishing a reference to an incompletely constructed object is dangerous. Listing 2 shows an example of introducing this sort of race condition into a constructor. It may look harmless, but it contains the seeds of serious concurrency problems.

Listing 2. Introducing race condition into a constructor
public class EventListener {

    public EventListener(EventSource eventSource) {
        // do our initialization
        ...

        // register ourselves with the event source
        eventSource.registerListener(this);
    }

    public void onEvent(Event e) {
        // handle the event
    }
}

On first inspection, the EventListener class looks harmless. The registration of the listener, which publishes a reference to the new object where other threads might be able to see it, is the last thing that the constructor does. But even ignoring all the Java Memory Model (JMM) issues, such as differences in visibility across threads and memory access reordering, this code is still in danger of exposing an incompletely constructed object to other threads. Consider what happens when EventListener is subclassed, as in Listing 3:

Listing 3. Subclassing EventListener
public class RecordingEventListener extends EventListener {
    private final List list;

    public RecordingEventListener(EventSource eventSource) {
        super(eventSource);
        list = Collections.synchronizedList(new ArrayList());
    }

    public void onEvent(Event e) {
        list.add(e);
        super.onEvent(e);
    }

    public Event[] getEvents() {
        return (Event[]) list.toArray(new Event[0]);
    }
}

Because the Java language specification requires that a call to super() be the first statement in a subclass constructor, our not-yet-constructed event listener is already registered with the event source before we can finish the initialization of the subclass fields. Now we have a data race for the list field. If the event listener decides to send an event from within the registration call, or we just get unlucky and an event arrives at exactly the wrong moment, onEvent() could get called while list still has the default value of null, and would then throw a NullPointerException. Class methods like onEvent() shouldn't have to code against final fields not being initialized.

The problem with Listing 2 is that the constructor published a reference to the object being constructed before construction was complete. While it might have looked like the object was almost fully constructed, and therefore passing this to the event source seemed safe, looks can be deceiving. Publishing the this reference from within the constructor, as in Listing 2, is a time bomb waiting to explode.
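One common way to defuse that time bomb (not shown in the original listings, so treat this as an illustrative sketch that reuses the EventSource and Event types from Listing 2) is a static factory method: the constructor finishes first, and only then does the factory register the fully built listener:

public class SafeEventListener {

    private SafeEventListener() {
        // do our initialization, but hand 'this' to no one
    }

    public void onEvent(Event e) {
        // handle the event
    }

    // Registration happens only after the constructor has returned, so no other
    // thread can ever observe a partially constructed listener.
    public static SafeEventListener newInstance(EventSource eventSource) {
        SafeEventListener listener = new SafeEventListener();
        eventSource.registerListener(listener);
        return listener;
    }
}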

Don't implicitly expose the "this" reference

It is possible to create the escaped reference problem without using the this reference at all. Non-static inner classes maintain an implicit copy of the this reference of their parent object, so creating an anonymous inner class instance and passing it to an object visible from outside the current thread has all the same risks as exposing the this reference itself. Consider Listing 4, which has the same basic problem as Listing 2, but without explicit use of the this reference:

Listing 4. No explicit use of this reference
public class EventListener2 {
    public EventListener2(EventSource eventSource) {
        eventSource.registerListener(
            new EventListener() {
                public void onEvent(Event e) {
                    eventReceived(e);
                }
            });
    }

    public void eventReceived(Event e) {
    }
}

The EventListener2 class has the same disease as its cousin in Listing 2: a reference to the object under construction is being published -- in this case indirectly -- where another thread can see it. If we were to subclass EventListener2, we would have the same problem, where the subclass's eventReceived() method could be called before the subclass constructor completes.

Don't start threads from within constructors

A special case of the problem in Listing 4 is starting a thread from within a constructor, because often when an object owns a thread, either that thread is an inner class or we pass the this reference to its constructor (or the class itself extends the Thread class). If an object is going to own a thread, it is best if the object provides a start() method, just as Thread does, and starts the thread from the start() method instead of from the constructor. While this does expose some implementation details of the class (such as the possible existence of an owned thread) via the interface, which is often not desirable, in this case the risks of starting the thread from the constructor outweigh the benefit of implementation hiding.
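A minimal sketch of that pattern (my own example, with hypothetical names) looks like this: the constructor may create the owned thread, but only a separate start() method actually starts it:

public class OwnedThreadExample {
    private final String name;
    private final Thread worker;

    public OwnedThreadExample(String name) {
        this.name = name;
        // Creating the thread here is fine: it captures 'this' implicitly,
        // but it cannot run until start() is called after construction completes.
        worker = new Thread(() -> System.out.println("worker running for " + this.name));
    }

    // The owned thread is started from a separate method, just as Thread itself
    // does, never from within the constructor.
    public void start() {
        worker.start();
    }

    public static void main(String[] args) {
        new OwnedThreadExample("demo").start();
    }
}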

What do you mean by "publish"?

Not all uses of the this reference during construction are harmful, only those that publish the reference where other threads can see it. Determining whether it is safe to share the this reference with another object requires detailed understanding of that object's visibility and what it will do with the reference. Listing 5 contains some examples of safe and unsafe practices with respect to letting the this reference escape during construction:

Listing 5. Safe and unsafe practices with this
public class Safe {
    private Object me;
    private Set set = new HashSet();
    private Thread thread;

    public Safe() {
        // Safe because "me" is not visible from any other thread
        me = this;
        // Safe because "set" is not visible from any other thread
        set.add(this);
        // Safe because MyThread won't start until construction is complete
        // and the constructor doesn't publish the reference
        thread = new MyThread(this);
    }

    public void start() {
        thread.start();
    }

    private class MyThread extends Thread {
        private Object theObject;

        public MyThread(Object o) {
            this.theObject = o;
        }
        ...
    }
}

public class Unsafe {
    public static Unsafe anInstance;
    public static Set set = new HashSet();
    private Set mySet = new HashSet();
    private Thread thread;

    public Unsafe() {
        // Unsafe because anInstance is globally visible
        anInstance = this;
        // Unsafe because SomeOtherClass.anInstance is globally visible
        SomeOtherClass.anInstance = this;
        // Unsafe because SomeOtherClass might save the "this" reference
        // where another thread could see it
        SomeOtherClass.registerObject(this);
        // Unsafe because set is globally visible
        set.add(this);
        // Unsafe because we are publishing a reference to mySet
        mySet.add(this);
        SomeOtherClass.someMethod(mySet);
        // Unsafe because the "this" object will be visible from the new
        // thread before the constructor completes
        thread = new MyThread(this);
        thread.start();
    }

    public Unsafe(Collection c) {
        // Unsafe because "c" may be visible from other threads
        c.add(this);
    }
}

As you can see, many of the unsafe constructs in the Unsafe class bear a significant resemblance to the safe constructs in the Safe class. Determining whether the this reference can become visible to another thread can be tricky. The best strategy is to avoid using the this reference at all (directly or indirectly) in constructors. In reality, however, that's not always possible. Just remember to be very careful with the this reference and with creating instances of non-static inner classes in constructors.

More reasons not to let references escape during construction

The practices detailed above for thread-safe construction take on even more importance when we consider the effects of synchronization. For example, when thread A starts thread B, the Java Language Specification (JLS) guarantees that all variables that were visible to thread A when it started thread B are visible to thread B, which is effectively like having an implicit synchronization in the call to Thread.start(). If we start a thread from within a constructor, the object under construction is not completely constructed, and so we lose these visibility guarantees.

Because of some of its more confusing aspects, the JMM is being revised under Java Community Process JSR 133, which will (among other things) change the semantics of volatile and final to bring them more in line with general intuition. For example, under the current JMM semantics, it is possible for a thread to see a final field have more than one value over its lifetime. The new memory model semantics will prevent this, but only if a constructor is defined properly -- which means not letting the this reference escape during construction.

Conclusion

Making a reference to an incompletely constructed object visible to another thread is clearly undesirable. After all, how can we tell the properly constructed objects from the incomplete ones? But by publishing a reference to this from inside a constructor -- either directly or indirectly through inner classes -- we do just that, and invite unpredictable results. To prevent this hazard, try to avoid using the this reference, creating instances of non-static inner classes, or starting threads from constructors. If you cannot avoid using this either directly or indirectly in a constructor, be very sure that you are not making the reference visible to other threads.


Related topics

  • Doug Lea's Concurrent Programming in Java, Second Edition (Addison-Wesley, 1999) is a masterful book on the subtle issues surrounding multithreaded programming in Java applications.
  • Synchronization and the Java Memory Model is an excerpt from Doug Lea's book that focuses on the actual meaning of synchronized.
  • "Double-checked locking: Clever, but broken" (JavaWorld, February 2001) and "Can double-checked locking be fixed?" (JavaWorld, May 2001) explore the JMM and the surprising consequences of failing to synchronize in certain situations.
  • In "Double-checked locking and the Singleton pattern" (developerWorks, May 2002), Peter Haggar gives a step-by-step explanation of how strange things can happen when you fail to synchronize.
  • Semantics of Multithreaded Java (PDF) details the proposed changes in the Java Memory Model as a result of JSR 133.
  • In "Writing multithreaded Java applications" (developerWorks, February 2001), Alex Roetter gives a basic overview of threads, synchronization, and locking in Java classes.
  • Find other Java technology content in the developerWorks Java technology zone.
