|
In any multithreaded system that includes memory shared among threads, there needs to be a specification (called a memory model) that states exactly how memory accesses from various threads interact. The simplest such model is called sequential consistency. In this model, memory is independent of any of the processors (threads) that use it. The memory is connected to each of the threads by a controller that feeds read and write requests from each thread. The reads and writes from a single thread reach memory in exactly the order specified by the thread, but they might be interleaved with reads and writes from other threads in an unspecified way.
Clearly, the ability to arbitrarily move memory accesses anywhere at all would lead to chaos, so all practical memory models have the following three fundamental rules:
1. The behavior of the thread when run in isolation is not changed. Typically, this means that a read or a write from a given thread to a given location cannot pass a write from the same thread to the same location. 2. Reads cannot move before entering a lock. 3. Writes cannot move after exiting a lock.
This is one more place where the locking protocol really adds value. The protocol ensures that every access to thread-shared, read/write memory only occurs when holding the associated lock. When a thread exits the lock, the third rule ensures that any writes made while the lock was held are visible to all processors. Before the memory is accessed by another thread, the reading thread will enter a lock and the second rule ensures that the reads happen logically after the lock was taken. While the lock is held, no other thread will access the memory protected by the lock, so the first rule ensures that during that time the program will behave as a sequential program.
The result is that programs that follow the locking protocol have the same behavior on any memory model having these three rules. This is an extremely valuable property. It is hard enough to write correct concurrent programs without having to think about the ways the compiler or memory system can rearrange reads and writes. Programmers who follow the locking protocol don't need to think about any of this. Once you deviate from the locking protocol, however, you must specify and consider what transformations hardware or a compiler might do to reads and writes.
Strong Model 1: x86 Behavior Unfortunately, programs that correctly follow the locking protocol are more the exception than the rule. When multiprocessor systems based on, the x86architecture were being designed, the designers needed a memory model that would make most programs just work, while still allowing the hardware to be reasonably efficient. The resulting specification requires writes from a single processor to remain in order with respect to other writes, but does not constrain reads at all.
Unfortunately, a guarantee about write order means nothing if reads are unconstrained ,after all, it does not matter that A is written before B if every reader reading B followed by A has reads reordered so that the pre-update value of B and the post-update value of A is seen. The end result is the same write order seems reversed. Thus, as specified, the x86model does not provide any stronger guarantees than the ECMA model.
It is my belief, however, that the x86processor actually implements a slightly different memory model than is documented. While this model has never failed to correctly predict behavior in my experiments, and it is consistent with what is publicly known about how the hardware works, it is not in the official specification. New processors might break it. In this model, in addition to the three fundamental memory model rules, these rules also come into play.
1. A write can only move later in time. 2. A write cannot move past another write from the same thread. 3. A write cannot move past a read from the same thread to the same location. 4. A read can only move by going later in time to stay after a write to keep from breaking rule 3 as that write moves later in time.
This model effectively captures a system with a write buffer queue with read snooping. Writes to memory are not written to memory immediately, but instead are placed in the queue and kept in order. These writes may be delayed and thus move with respect to other reads (and reads/writes from other threads), but will keep their order with respect to other writes from the same thread. Reads effectively don't move except to allow snooping of the write queue. Every read snoop the write buffer to see if the processor recently wrote the value being read, and uses the value in the write buffer if it is found. Since the write in the write buffer does not logically happen until the entry is flushed to main memory, this read effectively gets pushed later to that time as well. Rule 4 specifically allows this action.
Strong Model 2: NET Framework 2.0 Although the ECMA memory model for the NET Framework specifies only weak guarantees, the only implementation of the NET Framework 1.x runtime was on x86 machines, which meant the NET runtime implemented a model much closer to the x86 model (it was not quite this model, however, because of just-in-time or JIT compiler optimizations). In version 2.0 of the runtime, support for weak memory model machines like those using Intel IA-64 processors creates a problem. Users of the NET Framework could easily have been relying on the strong memory model provided by the x86 implementation and their code would break (fail none deterministically) when run on platforms like the IA-64. Microsoft concluded that customers would be best served by a stronger memory model that would make most code that worked on x86 machines function correctly on other platforms as well. The x86model, however, was too strong. It does not allow reads to move much at all, and optimizing compilers really need the flexibility of rearranging reads to do its job well. Another memory model was needed, and the result is the NET Framework 2.0 runtime memory model. The rules for this model are:
1. All the rules that are contained in the ECMA model, in particular the three fundamental memory model rules as well as the ECMA rules for volatile. 2. Reads and writes cannot be introduced. 3. A read can only be removed if it is adjacent to another read to the same location from the same thread. A write can only be removed if it is adjacent to another write to the same location from the same thread. Rule 5 can be used to make reads or writes adjacent before applying this rule. 4. Writes cannot move past other writes from the same thread. 5. Reads can only move earlier in time, but never past a write to the same memory location from the same thread.
Reasoning about multithreaded programming is already hard enough. Even when using the locking protocol, you need to worry about how updates from multiple threads will interleave, but at least when using locks the granularity can be as fine or as coarse as you need..Without locks, the level of granularity is that of a single memory read or write, which increases the number of permutations dramatically. On top of this, in any memory model weaker than sequential consistency (which is every practical one), reads and writes can move with respect to one another, increasing the set of possible permutations even more dramatically. This is why so few low-lock techniques are actually correct, and why the problems can be so subtle
|