Simultaneous multithreading

Simultaneous multithreading (SMT) is a technique for improving the overall efficiency of superscalar CPUs with hardware multithreading. SMT lets multiple independent threads of execution make better use of the resources provided by modern computer architectures.

Multithreading is similar in concept to multitasking but is implemented at the thread level of execution in modern superscalar processors.

In processor design, there are two ways to increase on-chip parallelism with fewer resource requirements:

  1. Superscalar technique: tries to increase instruction-level parallelism (ILP) by executing multiple instructions at the same time. It does so by dispatching instructions simultaneously to multiple redundant execution units built into the processor.
  2. Chip-level multithreading (CMT) technique: uses thread-level parallelism (TLP) to execute instructions from multiple threads within one processor chip at the same time.
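
As a rough illustration of the superscalar technique, the following Python sketch models a processor that can dispatch up to `width` instructions per cycle, as long as the results they depend on are already available. The function name `schedule`, the five-instruction `PROGRAM`, and the one-cycle latency are assumptions made for this example and do not describe any real processor.

```python
from typing import List, Set, Tuple

# Hypothetical program: each entry is (instruction name, names it depends on).
PROGRAM: List[Tuple[str, Set[str]]] = [
    ("a", set()),
    ("b", set()),
    ("c", {"a", "b"}),
    ("d", set()),
    ("e", {"c", "d"}),
]


def schedule(program: List[Tuple[str, Set[str]]], width: int) -> List[List[str]]:
    """Return, for each cycle, the instructions dispatched together.

    Toy model (assumption): every instruction takes one cycle, and any
    instruction whose dependencies finished in an earlier cycle may issue.
    """
    done: Set[str] = set()
    pending = list(program)
    cycles: List[List[str]] = []
    while pending:
        issued: List[str] = []
        for name, deps in pending:
            if len(issued) == width:
                break  # all issue slots for this cycle are used
            if deps <= done:  # results needed by this instruction exist
                issued.append(name)
        if not issued:
            raise ValueError("dependency can never be satisfied")
        issued_now = set(issued)
        pending = [(n, d) for n, d in pending if n not in issued_now]
        done |= issued_now  # results become visible in the next cycle
        cycles.append(issued)
    return cycles


if __name__ == "__main__":
    for width in (1, 2, 4):
        print(width, schedule(PROGRAM, width))
```

With this made-up program, a width of 2 finishes in 3 cycles instead of 5, but a width of 4 is no faster, because the dependencies between instructions limit the available ILP. Issue slots left empty in this way are what multithreading tries to fill with instructions from other threads.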

There are many ways to support more than one thread inside a chip, namely:

  1. Interleaved multithreading (IMT): interleaved issue of multiple instructions from different threads, also referred to as temporal multithreading. It can be further divided into fine-grain and coarse-grain multithreading, depending on how often the issuing thread changes. Fine-grain multithreading issues instructions from a different thread every cycle, while coarse-grain multithreading switches to another thread only when the currently executing thread causes a long-latency event (such as a page fault). Coarse-grain multithreading is more common because it needs fewer context switches between threads. For processors with one pipeline per core, interleaved multithreading is the only possible approach, because such a processor can issue at most one instruction per cycle.
  2. Simultaneous multithreading (SMT): issues multiple instructions from multiple threads in one cycle. The processor must be superscalar to do so.
  3. Chip-level multiprocessing (CMP, or multi-core processor): integrates two or more superscalar processors into one chip, each of which executes threads independently.
  4. Any combination of IMT, SMT, and CMP.

The key to distinguishing them is how many instructions the processor can issue in one cycle and how many threads those instructions come from.
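
The difference between the issue policies can be sketched with a small Python model. Everything here is an assumption made for illustration: the `READY` table (how many instructions each thread could issue in each cycle, with 0 meaning the thread is stalled), the strict round-robin choice for fine-grain multithreading, and the zero-cost thread switch for coarse-grain multithreading. Each function returns, per cycle, the thread IDs that occupy the issue slots.

```python
from typing import List

# Hypothetical workload: READY[t][c] is the number of instructions
# thread t could issue in cycle c (0 means the thread is stalled).
READY: List[List[int]] = [
    [2, 2, 0, 2],  # thread 0
    [1, 0, 3, 1],  # thread 1
]
WIDTH = 4  # assumed issue width of the processor


def fine_grain(ready: List[List[int]], width: int) -> List[List[int]]:
    """Interleaved: a different thread owns the issue slots every cycle."""
    n = len(ready)
    return [
        [c % n] * min(width, ready[c % n][c])
        for c in range(len(ready[0]))
    ]


def coarse_grain(ready: List[List[int]], width: int) -> List[List[int]]:
    """Interleaved: switch threads only when the current one stalls."""
    n, current, out = len(ready), 0, []
    for c in range(len(ready[0])):
        if ready[current][c] == 0:  # long-latency event: switch thread
            current = (current + 1) % n
        out.append([current] * min(width, ready[current][c]))
    return out


def smt(ready: List[List[int]], width: int) -> List[List[int]]:
    """Simultaneous: one cycle's slots may be shared by several threads."""
    out = []
    for c in range(len(ready[0])):
        slots: List[int] = []
        for t, trace in enumerate(ready):
            take = min(trace[c], width - len(slots))
            slots.extend([t] * take)
        out.append(slots)
    return out


def utilization(slots: List[List[int]], width: int) -> float:
    """Fraction of all issue slots that were actually used."""
    return sum(len(s) for s in slots) / (width * len(slots))


if __name__ == "__main__":
    for policy in (fine_grain, coarse_grain, smt):
        result = policy(READY, WIDTH)
        print(policy.__name__, result, utilization(result, WIDTH))
```

In this sketch only `smt` ever places two different thread IDs in the same cycle, which is the distinguishing property described above. For the made-up `READY` table, it also fills the most issue slots (11 of 16, against 8 for coarse-grain and 3 for fine-grain); those numbers depend entirely on the assumed workload.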

Examples of modern SMT CPUs

  1. The Intel Pentium 4 was the first modern desktop processor to implement simultaneous multithreading, starting with the 3.06 GHz model released in 2002. Intel has since introduced the feature into a number of its processors. Intel calls the functionality Hyper-Threading Technology (HTT), which provides a basic two-thread SMT engine. Intel claims up to a 30% speed improvement compared with an otherwise identical, non-SMT Pentium 4.
  2. The latest MIPS architecture designs include an SMT system known as "MIPS MT".
  3. The IBM POWER5, announced in May 2004, comes as either a dual-core DCM or a quad-core or 8-core MCM, with each core including a two-thread SMT engine. IBM's implementation is more sophisticated than the previous ones: it can assign different priorities to the various threads, is more fine-grained, and lets the SMT engine be turned on and off dynamically to better execute workloads where an SMT processor would not increase performance. This is IBM's second implementation of generally available hardware multithreading.
  4. The Intel Atom, released in 2008, is the first Intel product to feature SMT (marketed as Hyper-Threading) without supporting instruction reordering, speculative execution, or register renaming.
