

CPU Cache Flushing Fallacy

Even from highly experienced technologists I often hear talk about how certain operations cause a CPU cache to “flush”.  This seems to be illustrating a very common fallacy about how CPU caches work, and how the cache sub-system interacts with the execution cores.  In this article I will attempt to explain the function CPU caches fulfil, and how the cores, which execute our programs of instructions, interact with them.  For a concrete example I will dive into one of the latest Intel x86 server CPUs.  Other CPUs use similar techniques to achieve the same ends.

Most modern systems that execute our programs are shared-memory multi-processor systems in design.  A shared-memory system has a single memory resource that is accessed by 2 or more independent CPU cores.  Latency to main memory is highly variable from 10s to 100s of nanoseconds.  Within 100ns it is possible for a 3.0GHz CPU to process up to 1200 instructions.  Each Sandy Bridge core is capable of retiring up to 4 instructions-per-cycle (IPC) in parallel.  CPUs employ cache sub-systems to hide this latency and allow them to exercise their huge capacity to process instructions.  Some of these caches are small, very fast, and local to each core; others are slower, larger, and shared across cores.  Together with registers and main-memory, these caches make up our non-persistent memory hierarchy.
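The cost of that variability is easy to measure for yourself. The following sketch (plain C on a POSIX system; the buffer size, step count, and helper names are illustrative assumptions, not anything from the article) chases a chain of dependent loads through a buffer far larger than the last-level cache. The chain is laid out as a single random cycle, so the hardware prefetchers cannot predict the next address, and the time per step approximates main-memory load latency:

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N     (64UL * 1024 * 1024 / sizeof(size_t)) /* ~64 MiB: far larger than the LLC */
#define STEPS (1L << 24)                            /* number of dependent loads to time */

/* Simple xorshift64 PRNG: avoids the small RAND_MAX on some platforms. */
static size_t xorshift(size_t *s)
{
    *s ^= *s << 13; *s ^= *s >> 7; *s ^= *s << 17;
    return *s;
}

int main(void)
{
    size_t *chain = malloc(N * sizeof(size_t));
    if (!chain) return 1;

    /* Sattolo's algorithm: shuffle into one big cycle so every load
       depends on the previous one and the prefetcher cannot help. */
    for (size_t i = 0; i < N; i++) chain[i] = i;
    size_t seed = 88172645463325252UL;
    for (size_t i = N - 1; i > 0; i--) {
        size_t j = xorshift(&seed) % i;              /* j in [0, i) */
        size_t tmp = chain[i]; chain[i] = chain[j]; chain[j] = tmp;
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    size_t p = 0;
    for (long s = 0; s < STEPS; s++) p = chain[p];   /* the dependent-load chain */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (double)(t1.tv_nsec - t0.tv_nsec);
    printf("avg time per dependent load: %.1f ns (sink=%zu)\n", ns / STEPS, p);

    free(chain);
    return 0;
}
```

Compiled with `cc -O2`, a run on typical server hardware should land in the tens-to-hundreds-of-nanoseconds range quoted above; shrinking N until the buffer fits in L1 or L2 should show the latency collapse that the cache hierarchy provides.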

Next time you are developing an important algorithm, try pondering that a cache-miss is a lost opportunity to have executed ~500 CPU instructions!  This is for a single-socket system; on a multi-socket system you can effectively double the lost opportunity as memory requests cross the socket interconnect.
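As a back-of-envelope check on those numbers, the sketch below uses only the figures quoted above (a 3.0GHz clock, a 4-wide peak retirement rate, a 100ns miss): a miss stalls 300 cycles, which at peak retirement is 1200 instructions. The ~500 figure presumably reflects a sustained IPC well below the 4-wide peak; that reading is my assumption, not the author's stated derivation.

```c
#include <stdio.h>

int main(void)
{
    const double clock_ghz = 3.0;   /* cycles per nanosecond, from the text   */
    const double peak_ipc  = 4.0;   /* Sandy Bridge peak retirement rate      */
    const double miss_ns   = 100.0; /* assumed single-socket miss latency     */

    double stall_cycles = miss_ns * clock_ghz;     /* 300 stalled cycles        */
    double lost_peak    = stall_cycles * peak_ipc; /* 1200 instructions at peak */

    printf("single socket: %.0f cycles stalled, up to %.0f instructions lost\n",
           stall_cycles, lost_peak);
    printf("cross socket : roughly double, ~%.0f instructions\n", 2.0 * lost_peak);
    return 0;
}
```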


Reposted from the Concurrent Programming Network (ifeve.com).

Last updated: 2017-05-23 10:31:57
