Article

过去问-计算机组成原理-2022

拆解东京大学 2022 年计组过去问中关于存储层次、CPI、二级缓存与页表规模的关键思路。

April 24, 2026 修考 16 min read

Problem 2

Suppose a computer comprising memory hierarchy presented in Fig.

(1) Describe the characteristics of the cache and the characteristics of the auxiliary memory (secondary storage) in comparison with the main memory. Present a specific example of a memory device utilized for the cache and a specific example of a memory device utilized for the auxiliary memory.

(2) Suppose that the computer presented in Fig. achieved a CPI (Cycles Per Instruction) of 1.50 in executing a program A which induced no memory stalls, while the computer yielded an instruction cache miss rate of 1 % and a data cache miss rate of 4 % in executing another program B. In either case, no inputs or outputs to the auxiliary memory were observed, all the instructions were executed serially, and the branch prediction was not performed. Assume that 60 % of all the instructions of the program B were load-store instructions, and a penalty of 100 cycles was incurred for a cache miss. Find the CPI of the computer in executing the program B.

(3) Suppose that, in the case of (2), after secondary cache was installed between the cache and the main memory, the rate of accessing the main memory for the instruction fetch and the data load-store was reduced to 0.5 % in executing the program B. In this case, no inputs or outputs to the auxiliary memory were observed, all the instructions were executed serially, and the branch prediction was not performed. Assume that an access to the secondary cache incurred 10 cycles. Find the speedup ratio that the secondary cache achieved for the execution of the program B.

(4) A controller of virtual memory usually performs a write to the auxiliary memory in the write-back policy. Describe the adequacy of employing the write-back policy for the controller in terms of the execution speed of computers and the persistence of data.

(5) Suppose virtual memory where the virtual address length is 32 bits, the page size is 4096 bytes, and the page entry length is 4 bytes. Assume that the page table is structured with a single array. Find the size of the page table necessary for executing a program which invokes 100 concurrent processes.

(6) Briefly describe two effective approaches to reduce the size of the page table.

image-20260424082844591


反思

  1. 英语的题目阅读对我有难度,后面不能够使用中文学习和解题了。
  2. 题目多来自于教材的练习题和一些大段落的文字。

1

computer memory hierarchy pyramid,AI 生成

In a computer’s memory hierarchy, the main memory (typically DRAM) acts as the central hub. Both cache and auxiliary memory serve to optimize this system but reside on opposite ends of the performance and capacity spectrum.

1. Characteristics Comparison

FeatureCache Memory (vs. Main Memory)Auxiliary Memory (vs. Main Memory)
SpeedFaster: Designed to match the processor’s speed, reducing wait times.Slower: Significantly slower access times (often by orders of magnitude).
CapacitySmaller: Limited by high cost and physical space on or near the CPU.Larger: Provides massive storage for files and programs not currently in use.
CostHigher: Much more expensive per bit of storage.Lower: The most cost-effective way to store large volumes of data.
VolatilityVolatile: Loses data when power is removed.Non-volatile: Retains data permanently without power.
PlacementIntegrated into the CPU chip or placed very close to it.Connected via I/O interfaces (SATA, NVMe, USB).

2. Specific Examples of Memory Devices

Cache Memory: SRAM (Static Random Access Memory)

Cache is almost exclusively implemented using SRAM.

  • Why: SRAM does not need to be “refreshed” like DRAM (main memory), allowing for near-instantaneous access. It uses a flip-flop circuit (typically 6 transistors per bit) to store data, which makes it very fast but physically bulky and expensive, explaining why cache sizes are relatively small (KB to MB).

Auxiliary Memory: SSD (Solid State Drive) or HDD (Hard Disk Drive)

Auxiliary memory is commonly represented by an SSD (using NAND Flash memory) or an HDD (using magnetic platters).

  • Why: These devices are non-volatile and can store Terabytes (TB) of data at a fraction of the cost of RAM. While main memory holds the data the CPU is currently processing, the auxiliary memory acts as long-term storage for the operating system, applications, and user files.

2-3

To find the CPI and the speedup ratio, we need to calculate the total cycles per instruction by adding the base execution cycles to the stall cycles caused by cache misses.

(2) Finding the CPI for Program B

The total CPI is calculated using the following formula:

CPItotal=CPIbase+Memory Stalls per Instruction\text{CPI}_{\text{total}} = \text{CPI}_{\text{base}} + \text{Memory Stalls per Instruction}

1. Determine the Base CPI (CPIbase\text{CPI}_{\text{base}}):

The problem states that for Program A, which induced no memory stalls, the CPI was 1.50. Since we are using the same computer for Program B, the base execution cycles (excluding stalls) remains the same:

CPIbase=1.50\text{CPI}_{\text{base}} = 1.50

2. Calculate Memory Stall Cycles for Program B:

Memory stalls occur for both Instruction Cache misses and Data Cache misses.

  • Instruction Stall: Every instruction requires a fetch.

    Inst Stall=Miss Rateinst×Miss Penalty=0.01×100=1.0 cycles/inst\text{Inst Stall} = \text{Miss Rate}_{\text{inst}} \times \text{Miss Penalty} = 0.01 \times 100 = 1.0 \text{ cycles/inst}

  • Data Stall: Only load-store instructions (60% of total) trigger data cache accesses.

    Data Stall=PercentageLS×Miss Ratedata×Miss Penalty\text{Data Stall} = \text{Percentage}_{\text{LS}} \times \text{Miss Rate}_{\text{data}} \times \text{Miss Penalty}

    Data Stall=0.60×0.04×100=2.4 cycles/inst\text{Data Stall} = 0.60 \times 0.04 \times 100 = 2.4 \text{ cycles/inst}

3. Total CPI for Program B:

CPIB=1.50+1.0+2.4=4.90\text{CPI}_B = 1.50 + 1.0 + 2.4 = \mathbf{4.90}


(3) Finding the Speedup Ratio with Secondary Cache

这道题许多ai的结果都不一样

因为对一句话有误区

“the rate of accessing the main memory for the instruction fetch and the data load-store was reduced to 0.5 %

看完完整题目后,我非常有信心地告诉你:ChatGPT 的理解偏离了题目在体系结构上的考查重点,我的逻辑更准确。

这道题是典型的计算机系统结构(Computer Architecture)*考题,考点在于理解“取指(Instruction Fetch)”和“数据访问(Data Access)”是两个*独立且并行发生的内存请求。

为什么 ChatGPT 在第 (3) 问错了?

ChatGPT 将主存停顿简化为 0.005×1000.005 \times 100,这实际上假设了主存访问概率是按“指令总数”来算的。但题目原文是:

“…the rate of accessing the main memory for the instruction fetch and the data load-store was reduced to 0.5%…”

在体系结构语境下,这意味着:

  1. Instruction Access (100% 的指令): 它的主存访问率从 1%1\% 降到了 0.5%0.5\%
  2. Data Access (60% 的指令): 它的主存访问率从 4%4\% 降到了 0.5%0.5\%

如果你把它们合并,就忽略了 Load/Store 指令其实会造成两次潜在的内存访问(一次取指,一次读写数据)。


(3) 的精确解析与判断

我们将内存停顿拆解为 L2 访问造成的停顿主存(Main Memory)造成的停顿

第一步:计算 L2 访问带来的额外 Stall

无论最终是否要去主存,只要 L1 没中(Miss),就必须访问一次 L2,支付 10 cycles。

  • 指令端 L1 Miss 率1%=0.011\% = 0.01
  • 数据端 L1 Miss 率60%×4%=0.6×0.04=0.02460\% \times 4\% = 0.6 \times 0.04 = 0.024
  • L2 总 Stall = (0.01+0.024)×10=0.34 cycles/inst(0.01 + 0.024) \times 10 = 0.34 \text{ cycles/inst}

第二步:计算访问主存带来的额外 Stall

根据题目,现在的 Global Miss Rate(即最终去主存的比例)对于取指访存各自都是 0.5%0.5\%

  • 指令端主存访问1×0.5%=0.0051 \times 0.5\% = 0.005
  • 数据端主存访问0.6×0.5%=0.0030.6 \times 0.5\% = 0.003
  • 主存总 Stall = (0.005+0.003)×100=0.8 cycles/inst(0.005 + 0.003) \times 100 = 0.8 \text{ cycles/inst}

第三步:计算新 CPI 和加速比

  • CPInewCPI_{new} = CPIideal+StallL2+StallMemCPI_{ideal} + \text{Stall}_{L2} + \text{Stall}_{Mem}
  • CPInewCPI_{new} = 1.50+0.34+0.8=2.641.50 + 0.34 + 0.8 = 2.64
  • Speedup = CPIoldCPInew=4.902.641.856\frac{CPI_{old}}{CPI_{new}} = \frac{4.90}{2.64} \approx \mathbf{1.856}

为什么我的判断更稳?

  1. 逻辑一致性:在第 (2) 问中,大家一致认同数据缺失要乘以 0.60.6。那么在第 (3) 问计算主存访问时,逻辑必须保持一致——数据访问的主存缺失同样要乘以 0.60.6。ChatGPT 在这一步突然“跳槽”到了总概率逻辑,这在严谨的考题解析中是不成立的。
  2. 上下文衔接:题目第 (5) (6) 问考查虚拟内存和页表,这类题目非常强调“地址转换(Address Translation)”的次数。Load/Store 指令需要两次转换(指令+数据),这印证了该题背景下,指令和数据访存是必须分开独立核算的。

给你的 Flashcard 总结建议

你可以直接用这个逻辑来记:

Stall = (Access Frequency × Miss Rate × Penalty)

  • 如果有 L2,就分两层累加。
  • Instruction 层:频率恒为 11
  • Data 层:频率为 Load-Store 指令占比(如 0.60.6)。
  • 两层之和才是 Total Stall。

所以,最终答案应该是:CPI = 4.90Speedup ≈ 1.86