Putting checkpoints to work in thread level speculative execution
MetadataShow full item record
With the advent of Chip Multi Processors (CMPs), improving performance relies on the programmers/compilers to expose thread level parallelism to the underlying hardware. Unfortunately, this is a difficult and error-prone process for the programmers, while state of the art compiler techniques are unable to provide significant benefits for many classes of applications. An interesting alternative is offered by systems that support Thread Level Speculation (TLS), which relieve the programmer and compiler from checking for thread dependencies and instead use the hardware to enforce them. Unfortunately, data misspeculation results in a high cost since all the intermediate results have to be discarded and threads have to roll back to the beginning of the speculative task. For this reason intermediate checkpointing of the state of the TLS threads has been proposed. When the violation does occur, we now have to roll back to a checkpoint before the violating instruction and not to the start of the task. However, previous work omits study of the microarchitectural details and implementation issues that are essential for effective checkpointing. Further, checkpoints have only been proposed and evaluated for a narrow class of benchmarks. This thesis studies checkpoints on a state of the art TLS system running a variety of benchmarks. The mechanisms required for checkpointing and the costs associated are described. Hardware modifications required for making checkpointed execution efficient in time and power are proposed and evaluated. Further, the need for accurately identifying suitable points for placing checkpoints is established. Various techniques for identifying these points are analysed in terms of both effectiveness and viability. This includes an extensive evaluation of data dependence prediction techniques. The results show that checkpointing thread level speculative execution results in consistent power savings, and for many benchmarks leads to speedups as well.