[From the last episode: We looked at subroutines (small, named portions of a program that isolate key code in an easy-to-understand way and make changes easier and less error prone; also called functions) in computer programs.]
We saw a couple of weeks ago that some memories are big, but slow (flash memory, which retains its contents even with the power off; thumb drives use it). Others are fast, but not so big – and they’re energy-hungry to boot (SRAM, or static random-access memory, which is very fast but expensive and power-hungry, so you don’t get much of it). This sets up an interesting problem.
When the processor needs to get to something stored in slow memory, does that just slow the whole machine down? Well, it could – if it had to go back to slow memory every time. But a number of strategies have been devised to minimize this speed penalty. They all involve some form of caching.
Caching the Program
We already saw that we have to store the program in flash so that it stays around when the power is off. In order to run more quickly than flash allows, the instructions can be cached in – that is, copied out into – faster memory. Now, when the processor needs an instruction, it has faster access.
With larger processors, however, the program might be stored in DRAM (dynamic random-access memory) instead of flash. DRAM is temporary working memory: when the power goes off, its contents are lost. It’s not super fast, but it’s very cheap, so there’s lots of it. On your laptop, for instance, when you start a program, the machine loads it from the hard drive (persistent memory built from rotating platters and read heads that sense the data) and stores it in DRAM. But that DRAM is still pretty slow, so you’d want some way to get at it faster – especially since the bigger processors are usually also intended to go really fast.
So here’s where things get interesting – and complicated (most of which we’ll steer clear of). You really want the instructions in SRAM, which is the fastest memory. But there’s not much of it, so you may not be able to simply take a big chunk of the program and clog up SRAM with it.
Branching and Subroutines
The other problem is that, as we’ve seen, programs don’t just execute in a straight line from beginning to end. We already saw the issue of branching back in the discussion of speculation. And, as we saw last week, programs aren’t just one big thing: they tend to get broken up into a small program that calls a bunch of subroutines or functions, each of which calls other subroutines or functions. And so on. One rule of thumb is that an entire (sub)routine should fit on a single screen. (I’m sure not everyone agrees on that.) That makes it easier to follow. If it’s too big, break it up somehow and put some of it into subroutines.
But here’s the thing with subroutines, as we saw last week: they take the branching issue to a whole new level. As subroutines get called, execution bounces all around the memory. It’s nothing close to a straight line. So just copying one big chunk of the program into cache wouldn’t work; you probably wouldn’t get all of the subroutines in there.
Reading and Caching Instructions
So, instead, computer designers invented a special way of organizing memory; it’s called – surprise – a cache, and it’s built out of SRAM, but organized differently from a simple block of SRAM. Getting something from memory then looks something like this:
- Ask for the data (which might be an instruction)
- The cache checks to see if the data is already there; if it is, sweet! Grab it and off you go.
- If it’s not in the cache, then go get a line out of memory (a line being a bunch of memory cells around the one you want – useful if you’re going to need the instructions that follow the one you’re asking for). Put that line into the cache, and now you have fast access for next time (or for the following instructions that came in with that line).
- What if the cache is already full of lines from before? Well, use some kind of gauge – like which one has been there the longest, or which one has gone the longest without being used. Evict that line and replace it with the new one.
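The steps above can be sketched in a few lines of Python. This is just an illustration, not how hardware actually works – a real cache is wired circuitry, and lines are typically 32 or 64 bytes. The tiny line size and line count here are made-up numbers, chosen so an eviction happens quickly:

```python
from collections import OrderedDict

LINE_SIZE = 4   # words per cache line (real caches use e.g. 64 bytes)
NUM_LINES = 2   # deliberately tiny so evictions happen quickly

# The backing "slow" memory: address -> stored value.
memory = {addr: addr * 10 for addr in range(32)}

# The cache maps a line number to that line's words. The OrderedDict's
# order tracks recency, so we can evict the least-recently-used line.
cache = OrderedDict()

def read(addr):
    """Read one word of memory, going through the cache."""
    line_no = addr // LINE_SIZE
    if line_no in cache:                      # hit: sweet! go grab it
        cache.move_to_end(line_no)            # mark the line as recently used
        return cache[line_no][addr % LINE_SIZE]
    # Miss: fetch the whole line from slow memory, not just one word.
    base = line_no * LINE_SIZE
    line = [memory[base + i] for i in range(LINE_SIZE)]
    if len(cache) >= NUM_LINES:               # cache full? evict the line
        cache.popitem(last=False)             #   unused for the longest time
    cache[line_no] = line
    return line[addr % LINE_SIZE]
```

Reading address 0 pulls in the whole line holding addresses 0 through 3, so a follow-up read of address 1 is a hit; once both lines are occupied, reading a third line evicts whichever was used least recently.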
Writing Data
This is pretty straightforward for instructions, since you only ever read them. But data gets cached as well – and while you don’t change instructions, you may well change data. If you’ve pulled data out of slow memory and into the cache, and then you need to change it, what happens?
There are a number of strategies for dealing with this – part of the complication we won’t get into – but the idea is that you make the change in cache, and then, while the processor is off doing other things, that change gets written back into the original memory. That way the processor doesn’t have to wait around for a long write into memory.
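One common version of that idea is called write-back caching; here is a sketch in Python. The “dirty” flag is the key trick: it records that the cached copy differs from slow memory, so the slow write can be put off until later. This is a simplified, one-word-at-a-time illustration (real caches track dirtiness per line), and the names are made up:

```python
# The slow backing store, and a cache mapping address -> entry.
memory = {addr: 0 for addr in range(8)}
cache = {}

def write(addr, value):
    """Write goes to the cache only; slow memory is updated later."""
    entry = cache.setdefault(addr, {"data": memory[addr], "dirty": False})
    entry["data"] = value
    entry["dirty"] = True        # remember that we owe memory an update

def flush():
    """Later, while the processor is off doing other things,
    push the dirty entries back into slow memory."""
    for addr, entry in cache.items():
        if entry["dirty"]:
            memory[addr] = entry["data"]
            entry["dirty"] = False
```

Between `write` and `flush`, the cache and slow memory disagree; that window is exactly what the “complications” in real designs are about, but the processor never has to stand around waiting for the slow write.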
You can get an idea of how cache works in the following video.
It’s actually possible to have multiple levels of cache, but we’ll put that discussion off to another day. As we’ll see, the kinds of processors used in simple IoT (Internet of Things) devices may run relatively slowly, and may therefore not need a cache at all. Processors in the cloud (the large banks of computers located far away and accessed over the internet), by contrast, need all the cache they can get.