Distributed Computing: When One Computer Won't Do

[From the last episode: We looked at how the “concurrency” of multiple threads on a single CPU was actually illusory – but still useful.]

Last time we talked about concurrency, by which we mean multiple threads or programs being executed at the same time. As we saw, that can’t literally happen if you have only one CPU, since a CPU can do only one thing at a time.

So… how might we be able to do more than one thing at a time if we want to? Well, one obvious way is to use more than one computer. And this is probably a good time to make an important distinction when we’re talking about more than one “thing,” and we’ll use a personal-computer example to illustrate the concepts.

Different Program; Different Computer

Let’s say that you’ve got a monstrous Excel spreadsheet on an old computer. And when it has to calculate, it goes off and disappears for a good long while. (This isn’t too far-fetched: I have a spreadsheet that’s complex enough that, even on a modernish machine, inserting a column can result in Excel “disappearing” for five minutes or so to sort everything out.)

Because it’s an old computer doing only one thing at a time, the risk is that this Excel stuff will take over the computer, making it unavailable for anything else until it finishes the calculation. But you want to work on a document while that’s happening – and you can’t do that because the Excel computation has locked everything out temporarily.

Well, one obvious solution would be to use a different machine for the document work. One machine is running Excel; the other is running a word-processing program. And they’re literally happening at the same time – because each gets its own CPU.

That’s pretty obvious. (Even if not convenient.) And, even though it’s an example of two things happening concurrently, it’s not really what we’re going to focus on. What is important for us is when a single program (or process) does more than one thing at a time. For example, what if we could use that second computer to help the Excel calculation finish faster?

That’s a harder problem. If you’re running two different programs on the two machines, then neither program knows or cares about what’s happening on the other computer. But splitting a single program across the two computers? That takes some careful planning due to dependencies, where one piece of the program depends on the result of some other part of the program.

Dependencies in Excel

And Excel is probably the best illustration of dependencies that we could get. Many of the spreadsheet cells will have formulas that refer to other cells. If cell B3 uses the contents of cell A3, then B3 is dependent on A3. Excel keeps track of all of those dependencies so that it can calculate efficiently. For instance, it would be dumb for it to try to calculate B3 before A3 was ready. You could try, starting with B3 and then realizing you need A3 first, and then park the B3 calculation temporarily until A3 is ready. But it’s better if you know to start with A3.
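Here's a minimal sketch of that "know where to start" idea in Python. To be clear, this isn't how Excel does it internally; it just uses Python's standard graphlib module to pick an order in which every cell comes after the cells it depends on. A3 and B3 are from our example; C3 and D1 are made-up cells added only to give the sketch something more to sort.

```python
from graphlib import TopologicalSorter

# Each cell maps to the set of cells it depends on (its precedents).
# A3 and B3 come from the text; C3 and D1 are hypothetical extras.
precedents = {
    "A3": set(),
    "B3": {"A3"},
    "C3": {"B3"},
    "D1": set(),
}

def calculation_order(deps):
    """Return one valid order in which to calculate the cells."""
    return list(TopologicalSorter(deps).static_order())

order = calculation_order(precedents)
print(order)
```

Whatever order comes out, A3 will land before B3, and B3 before C3. D1 doesn't depend on anything in that chain, so it can land anywhere.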

You can trace all of these dependencies yourself in Excel. Pick a cell with a formula in it and then go to the Formulas ribbon. In the “Formula Auditing” section, you can “Trace Precedents” and “Trace Dependents.” In our example, if you selected B3 and traced precedents, it would show an arrow between the cells. Same thing if you selected A3 and traced dependents. (You have to “clear arrows” between these steps.)

The next figure shows this. In the example, A3=Today(), a built-in Excel function. B3 shows a week from today: B3=A3+7. So before B3 can calculate, Excel has to calculate A3, which means getting today’s date. Whether you select A3 and trace dependents or select B3 and trace precedents, you get the same thing: an arrow that points from the first thing that needs calculating to the thing that relies on it – in this case, an arrow from A3 to B3.
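The same two cells can be written out as a tiny Python sketch, assuming we treat each cell as a plain variable (the names a3 and b3 are just stand-ins for the cells):

```python
from datetime import date, timedelta

# A3 = TODAY()
a3 = date.today()

# B3 = A3 + 7  (this line can't be evaluated until a3 exists)
b3 = a3 + timedelta(days=7)

print("A3:", a3)
print("B3:", b3)
```

If you swapped the two assignments, Python would stop with a NameError, because b3 would be asking for a value that hasn't been calculated yet. That's the dependency, in miniature.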

How to Split this Up

Let’s bring this back to our main topic. Let’s say we wanted to use two computers for this. That probably wouldn’t really help, since we can’t do the two calculations at the same time. We need A3 to be complete before doing B3, so they have to go one after the other. That’s an important point: sometimes things simply have to go in order. But there may be other calculations that are completely independent of this one, so perhaps we could do the A3/B3 thing and that other thing at the same time.

In order for that to happen, something has to act as a “dispatcher.” That dispatcher would say, “OK, we’re going to do the chain of calculations that starts with A3, and I’m going to assign that to this computer. At the same time, I’m going to start that other independent chain of calculations, and I’m going to assign it to the other computer. So to do that, I’m going to send those numbers to the other computer so that it can do its work and return the answer to me.”
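A rough sketch of what that dispatcher does might look like the following. One big assumption: two worker threads inside a single Python program stand in for the two computers, so nothing here is truly distributed. On real hardware, the "submit" and "result" steps would be messages sent between machines. The function names and the numbers are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def chain_one(start):
    """First chain: two steps that must run in order (like A3, then B3)."""
    a = start * 2      # stand-in for the A3 calculation
    b = a + 7          # stand-in for B3, which needs a
    return b

def chain_two(values):
    """Second chain: independent of the first."""
    return sum(values)

def dispatch():
    # Two workers stand in for the two computers.
    with ThreadPoolExecutor(max_workers=2) as pool:
        # "Send the numbers" to each worker...
        first = pool.submit(chain_one, 5)
        second = pool.submit(chain_two, [1, 2, 3])
        # ...and wait for each one to return its answer.
        return first.result(), second.result()

results = dispatch()
print(results)
```

Notice that the steps inside chain_one still run one after the other. The dispatcher only overlaps work that doesn't depend on anything else.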

And that last bit is a key thing: there has to be a way of communicating between the two computers. Yeah, that could be over a regular network or WiFi, but that would be pretty slow. Where we expect serious work – like in the cloud – companies build what are called computer farms or data centers: racks of computers interconnected with high-speed connections like PCI Express or ones that are even more specialized.

This process of taking a problem and breaking it up, with different computers solving different parts of the problem, is referred to as distributed computing. We take a problem and distribute the solution across a bunch of computers. And it’s important – especially historically – but there’s a more effective option these days that we’ll talk about next.

Dependencies Can Break your Brain!

By the way, this whole dependency thing can get really complex. We looked at an obvious example where you can’t calculate one thing until something else is ready. There’s also kind of a reverse situation, where you, say, calculate A3 – which B3 will use – but then, after some time, A3 will change to a new value for some reason. But you can’t really do that until B3 has finished using the old value of A3. So, in the first case, B3 had to wait for A3 to be ready; in this second case, the new value of A3 has to wait until B3 is finished using the old value. And there are more kinds of situations than this – it’s kind of a mind-bending topic, and brilliant engineers can get tripped up over it. So, if you have to read this a couple times for it to make sense, well, you’re in really good company.
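If it helps, here's a small Python sketch of that second case. It runs the same two steps in both possible orders so you can see that the order changes the answer. The values 10 and 100 are made up, and the step names are just labels for this illustration.

```python
def run(order):
    """Apply the two steps in the given order and return B3's result."""
    cells = {"A3": 10, "B3": None}   # 10 is a made-up "old" value

    def b3_uses_a3():
        cells["B3"] = cells["A3"] + 7

    def a3_gets_new_value():
        cells["A3"] = 100            # 100 is a made-up "new" value

    steps = {"b3_uses_a3": b3_uses_a3, "a3_gets_new_value": a3_gets_new_value}
    for name in order:
        steps[name]()
    return cells["B3"]

# B3 finishes with the old A3 first, then A3 changes.
intended = run(["b3_uses_a3", "a3_gets_new_value"])

# A3 changes too early, so B3 picks up the wrong value.
too_early = run(["a3_gets_new_value", "b3_uses_a3"])

print(intended, too_early)
```

The first order gives B3 the value 17; the second gives 107. Same steps, different order, different answer, which is why the new A3 has to wait.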

Bryon Moyer, Technology Writer
