

CPU Cache Flushing Fallacy

Even from highly experienced technologists I often hear talk about how certain operations cause a CPU cache to “flush”.  This seems to be illustrating a very common fallacy about how CPU caches work, and how the cache sub-system interacts with the execution cores.  In this article I will attempt to explain the function CPU caches fulfil, and how the cores, which execute our programs of instructions, interact with them.  For a concrete example I will dive into one of the latest Intel x86 server CPUs.  Other CPUs use similar techniques to achieve the same ends.

Most modern systems that execute our programs are shared-memory multi-processor systems in design.  A shared-memory system has a single memory resource that is accessed by 2 or more independent CPU cores.  Latency to main memory is highly variable from 10s to 100s of nanoseconds.  Within 100ns it is possible for a 3.0GHz CPU to process up to 1200 instructions.  Each Sandy Bridge core is capable of retiring up to 4 instructions-per-cycle (IPC) in parallel.  CPUs employ cache sub-systems to hide this latency and allow them to exercise their huge capacity to process instructions.  Some of these caches are small, very fast, and local to each core; others are slower, larger, and shared across cores.  Together with registers and main-memory, these caches make up our non-persistent memory hierarchy.
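The cost of that variability is easy to measure for yourself. The following sketch (plain C on a POSIX system; the buffer size, step count, and helper names are illustrative assumptions, not anything from the article) chases a chain of dependent loads through a buffer far larger than the last-level cache. The chain is laid out as a single random cycle, so the hardware prefetchers cannot predict the next address, and the time per step approximates main-memory load latency:

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N     (64UL * 1024 * 1024 / sizeof(size_t)) /* ~64 MiB: far larger than the LLC */
#define STEPS (1L << 24)                            /* number of dependent loads to time */

/* Simple xorshift64 PRNG: avoids the small RAND_MAX on some platforms. */
static size_t xorshift(size_t *s)
{
    *s ^= *s << 13; *s ^= *s >> 7; *s ^= *s << 17;
    return *s;
}

int main(void)
{
    size_t *chain = malloc(N * sizeof(size_t));
    if (!chain) return 1;

    /* Sattolo's algorithm: shuffle into one big cycle so every load
       depends on the previous one and the prefetcher cannot help. */
    for (size_t i = 0; i < N; i++) chain[i] = i;
    size_t seed = 88172645463325252UL;
    for (size_t i = N - 1; i > 0; i--) {
        size_t j = xorshift(&seed) % i;              /* j in [0, i) */
        size_t tmp = chain[i]; chain[i] = chain[j]; chain[j] = tmp;
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    size_t p = 0;
    for (long s = 0; s < STEPS; s++) p = chain[p];   /* the dependent-load chain */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (double)(t1.tv_nsec - t0.tv_nsec);
    printf("avg time per dependent load: %.1f ns (sink=%zu)\n", ns / STEPS, p);

    free(chain);
    return 0;
}
```

Compiled with `cc -O2`, a run on typical server hardware should land in the tens-to-hundreds-of-nanoseconds range quoted above; shrinking N until the buffer fits in L1 or L2 should show the latency collapse that the cache hierarchy provides.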

Next time you are developing an important algorithm, try pondering that a cache-miss is a lost opportunity to have executed ~500 CPU instructions!  This is for a single-socket system; on a multi-socket system you can effectively double the lost opportunity as memory requests cross the socket interconnect.
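As a back-of-envelope check on those numbers, the sketch below uses only the figures quoted above (a 3.0GHz clock, a 4-wide peak retirement rate, a 100ns miss): a miss stalls 300 cycles, which at peak retirement is 1200 instructions. The ~500 figure presumably reflects a sustained IPC well below the 4-wide peak; that reading is my assumption, not the author's stated derivation.

```c
#include <stdio.h>

int main(void)
{
    const double clock_ghz = 3.0;   /* cycles per nanosecond, from the text   */
    const double peak_ipc  = 4.0;   /* Sandy Bridge peak retirement rate      */
    const double miss_ns   = 100.0; /* assumed single-socket miss latency     */

    double stall_cycles = miss_ns * clock_ghz;     /* 300 stalled cycles        */
    double lost_peak    = stall_cycles * peak_ipc; /* 1200 instructions at peak */

    printf("single socket: %.0f cycles stalled, up to %.0f instructions lost\n",
           stall_cycles, lost_peak);
    printf("cross socket : roughly double, ~%.0f instructions\n", 2.0 * lost_peak);
    return 0;
}
```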


Reposted from the Concurrent Programming Network (ifeve.com).

Last updated: 2017-05-23 10:31:57
