- Title: Idempotent Processor Architecture
- Authors: Marc de Kruijf, Karthikeyan Sankaralingam
- Venue: MICRO 2011
- Keywords: Microarchitecture Design, In-Order Execution, Performance Optimization, Antidependence, Pipeline
Problem to be solved
Many features of modern processors, such as speculative execution, depend on in-order retirement. However, instructions may complete in a different order from the one in which they were issued. To support in-order retirement anyway, modern processors carry complex and costly structures such as the reorder buffer and speculative register files. This paper tries to simplify these designs without affecting the correctness of program execution.
The key observation of this paper is that we can construct code regions (sequences of instructions) that can be executed many times and still produce the same result as a single execution. For example, a region whose dependence chain contains only RAW dependences, or a WAR dependence that comes after a RAW dependence on the same location, can be idempotent. The authors also designed an idempotent processor that retires instructions out of order inside idempotent regions and, when an exception is encountered, recovers by re-executing the region.
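To make the observation concrete, here is a minimal sketch in Python. The three region functions and the dictionary standing in for registers and memory are my own illustration, not code from the paper; they only show which dependence patterns survive re-execution.

```python
def region_raw_only(mem):
    # RAW only: t is written, then read. The input "a" is never overwritten.
    t = mem["a"] + 1
    mem["b"] = t * 2
    return mem

def region_war_after_raw(mem):
    # x is written first, then read (RAW), then written again (WAR).
    # x is not an input of the region, so re-execution simply rewrites it.
    mem["x"] = 5
    y = mem["x"] + 1
    mem["x"] = y
    return mem

def region_clobber(mem):
    # "a" is read and then overwritten with no earlier write in the region:
    # a WAR that clobbers a region input, so the region is not idempotent.
    t = mem["a"] + 1
    mem["a"] = t
    return mem

def run(region, mem, times):
    # Run the region `times` times on a private copy of the initial state.
    state = dict(mem)
    for _ in range(times):
        region(state)
    return state

def is_idempotent_on(region, mem):
    # Compare one execution against two back-to-back executions.
    return run(region, mem, 1) == run(region, mem, 2)
```

Checking one input does not prove idempotence in general, but it is enough to show the difference: the first two regions give the same state after one or two runs, while the clobbering region keeps incrementing `a`.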
This paper introduces the idempotent architecture, which allows out-of-order retirement and uses idempotent regions to recover easily from exceptions on in-order processors. The design covers how to construct idempotent regions, how the idempotent compiler works, and the design of the idempotent processor.
The compiler checks data dependences to find a special kind of dependence called a clobber antidependence (a WAR dependence that is not preceded by a RAW dependence). Idempotent regions are then formed around those clobber antidependences, so that no region contains one. The paper uses LLVM to construct idempotent regions, removing the clobber antidependences that are not strictly required by program semantics. More details are discussed in another paper: Compiler construction of idempotent regions.
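The region-cutting idea can be sketched for straight-line code as follows. This is a simplification under my own assumptions, not the paper's LLVM pass: the `(destination, sources)` instruction encoding and the `cut_regions` name are hypothetical, control flow is ignored, and an instruction that reads and writes the same location is rejected rather than renamed.

```python
from typing import List, Tuple

# Hypothetical encoding: (destination, sources) for one instruction.
Instr = Tuple[str, Tuple[str, ...]]

def cut_regions(instrs: List[Instr]) -> List[List[Instr]]:
    """Split straight-line code so that no region contains a clobber antidependence."""
    regions: List[List[Instr]] = []
    current: List[Instr] = []
    written = set()        # locations written so far in the current region
    live_in_reads = set()  # locations read before any write in the current region
    for dst, srcs in instrs:
        if dst in srcs:
            # Cutting cannot help here; the instruction would need splitting/renaming.
            raise ValueError(f"{dst} is read and written by the same instruction")
        if dst in live_in_reads:
            # Writing dst would clobber a region input: start a new region here.
            regions.append(current)
            current, written, live_in_reads = [], set(), set()
        for s in srcs:
            if s not in written:
                live_in_reads.add(s)
        written.add(dst)
        current.append((dst, srcs))
    if current:
        regions.append(current)
    return regions

# Example: the third instruction overwrites "a", which the first one read.
program = [
    ("t", ("a",)),
    ("b", ("t",)),
    ("a", ("b",)),
    ("c", ("a",)),
]
```

On `program`, the sketch cuts before the write to `a`, giving two regions of two instructions each. A write that follows an earlier write in the same region (the WAR-after-RAW case) does not trigger a cut.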
Once a region is idempotent, it is safe to retire its instructions out of order, which makes it possible to simplify the processor pipeline. The paper explains how in-order retirement complicates the design of the pipeline and of cache miss handling, then presents the design of an idempotent processor, which is much simpler. The idempotent processor can retire instructions out of order within an idempotent region; once all instructions in a region have retired, the processor moves on to the next region. The paper also explains some special design elements of the idempotent processor, for instance the slice data buffer (SDB), which can improve performance when instructions issue out of order.
The idempotent architecture can simplify the design of in-order processors. It can also improve their performance (by 4.4% on average and up to 25%).
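The recovery idea can be illustrated with a toy model. Everything here is an assumption made for illustration (the `Fault` exception, the `(dst, fn, srcs)` encoding, the explicit `order` list standing in for out-of-order completion); it does not model the paper's actual pipeline or the SDB.

```python
class Fault(Exception):
    """Stands in for an exception raised by one instruction (e.g. a page fault)."""

def run_region(region, state, order=None, fault_op=None):
    # Execute ops in the given completion order; op number `fault_op` raises.
    order = range(len(region)) if order is None else order
    for i in order:
        if i == fault_op:
            raise Fault(i)
        dst, fn, srcs = region[i]
        state[dst] = fn(*(state[s] for s in srcs))

def run_with_recovery(region, state, order, fault_op):
    try:
        run_region(region, state, order, fault_op)
    except Fault:
        # No checkpoint and no rollback: keep the imprecise partial state
        # and re-execute the whole region from its entry, in program order.
        run_region(region, state)
    return state

# An idempotent region: the input "a" is only read, never overwritten.
region = [
    ("t", lambda a: a + 1, ("a",)),      # op 0
    ("u", lambda a: a * 10, ("a",)),     # op 1, independent of op 0
    ("b", lambda t, u: t + u, ("t", "u")),  # op 2
]

# Op 1 completes before the older op 0, which then faults.
recovered = run_with_recovery(region, {"a": 3}, order=[1, 0, 2], fault_op=0)

# Fault-free reference run for comparison.
reference = {"a": 3}
run_region(region, reference)
```

At the moment of the fault, `u` has already been updated even though the older op 0 has not completed, so the state is imprecise. Because the region is idempotent, re-executing it from the start still ends in the same state as the fault-free run.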
- The idea is novel: it removes many of the structures that a conventional architecture needs to maintain precise exceptions.
- It not only reduces hardware complexity but also improves the performance of in-order processors.
- The idempotent design does not adapt well to modern heterogeneous processors such as big.LITTLE, because the idempotent compiler cannot predict the behavior of the scheduler. (The compiler could perhaps flag the code as “never run on OoO cores”, but that would cost performance.)
- To ensure security, the idempotence-aware compiler may split one instruction into two, which will lead to more instructions and larger binary files.
- The idempotent region sizes are limited.
- The core of the idempotent architecture takes up only about half of the paper; the rest is background (which is not bad in itself). I’m not sure whether papers submitted to conferences are held to a strict page count, with neither more nor fewer pages allowed.
- I hate the font of the template; it’s too thin to read. (I know that’s not a paper presentation problem, but it slowed my reading down a lot, which made me uncomfortable.)
From the perspective of 2022, the background differs from that of the paper. For instance, Moore’s law is reaching its end, and side-channel attacks like Meltdown and Spectre have been discovered. (This could also be listed under “weakness”, but I don’t think a paper from 2011 needed the foresight to consider side-channel attacks, so that’s fine.)
Another aspect is that giving code the ability to control re-execution may lead to security problems. But it’s not fair to require an 11-year-old paper to consider side-channel attacks.
Takeaways and questions
When I read the paper, I thought the idea was novel. But the paper was actually written more than 10 years ago, and not much research has been built on it. Why? Are there difficulties the authors didn’t anticipate?
By the way, this paper pushed me to read a bit more about precise exceptions and in-order retirement complexities.