You may have heard that developers need skills with algorithms. What is an algorithm? And that data structures are important. What the heck do those two words even mean? These concepts are so fundamental to programming that experienced developers often use the words without realizing they can be foreign, confusing, and abstract for someone starting out.
As Donald Knuth wrote in The Art of Computer Programming– which could be described as the encyclopedia of algorithms and data structures– it’s a confusing word that looks like someone mashed up logarithm and arithmetic.
The word’s origin is in mathematics, but in computing, its usage is a little different. The term came about to describe the Euclidean Algorithm, which is a step-by-step process that you can use to find the greatest common divisor of two numbers.
But no need to fear! When writing computer programs, it’s very seldom that you’ll need to worry about greatest common divisors, factoring, or crazy mathematicians from 300 BC.
What is an algorithm?
Instead, the word algorithm is used to describe the “step-by-step” approach where there is exactly one correct next step. In an algorithm, given the current phase of the process and the steps that are outlined, there is one single, correct way to proceed.
Let’s jump into a real-world example of an algorithm that is used very frequently. We’re going to cover the algorithm that is used to describe the concept in CS courses across the country. And that’s the Binary Search Algorithm. Sound scary? It really isn’t.
Let’s rewind 10 years and pretend that phone books are actually practical. Phone books have an interesting property: the names in the phone book are sorted alphabetically by last name. Say you want to give Justin Bieber a ring on the phone. Assuming Justin is in the phone book, finding his name could be achieved in a few different ways.
Linear Search Algorithm
The simplest way to find his number would be to look through every single entry and compare it to the name you’re looking for. For example:
James Aarner, that’s not a match.
Justin Aark, that’s not a match.
And continue the process indefinitely until you find Justin Bieber, likely after tens of thousands of comparisons. And if you wanted to have a chat with Warren Zevon, you’d likely have hundreds of thousands or even millions of comparisons since his last name starts with a Z.
Linear search is the process of going start to finish through a list and comparing values. It’s brute force. But it’s also quite simple. And there are many situations in which using this approach to find an item in a list makes sense.
Rather than searching through a full phone book, if you were searching for your friend’s phone number on a sheet of paper with only 10 other people on it, going from top to bottom would likely be a smart move.
Chunking Search Algorithm
Most humans don’t have the patience to perform a full linear search algorithm to find a name in the phone book. If I were to hand most humans a phone book, they’d take a different, more practical approach: chunking.
The process of chunking involves first finding the general area where an entry would be, then proceeding to check every entry. Say you were looking for Bill Maher in the phone book. You would start by moving 100 pages in. You might see that the last names in this area start with C. Next, you could move 200 pages further, and you might see last names starting with K. If you moved another 75 pages in, you might get to L. The letter L is only 1 letter away from M, so from there, you could start moving slower.
This process is how most reasonable people search for elements in a phone book. But we as humans often perform behaviors in suboptimal ways, and this is one such case.
Binary Search Algorithm
The most efficient way to find a person in the phone book is to metaphorically split the phone book in half, determine which half of the phone book the entry is in, quickly removing the entire other half from the equation.
Say the phone book is 400 pages. If you’re looking for Vince Offer in the phone book, and he happens to be on page 291, you could find him by using a binary search. You would split the phone book in half and likely see an M or an N, the 13th and 14th letters of the alphabet.
O is after M and N. That means we can state for a fact:
Vince Offer is located on pages [200-400].
As a human, you know that O is right after the letter N. You might be tempted to look forward a few pages. But following a binary search algorithm, you don’t. You would keep halving the problem. That means you would check page 300. You would probably end up around the letter S. The letter O is before the letter S, so now you know the following:
Vince Offer is located on pages [200-300].
Initially, we started out with 400 pages. Now, we’ve narrowed it down with just 2 comparisons to reduce the problem to ¼ its original size. Given he’s located on page 291, we’d make the following comparisons:
[200-300] → [250 – 300] → [275-300] → [287 – 300] → [287 – 293] → [290, 293] → [290, 291] → 291
This means with a 400-page phone book, you can isolate the specific page you’re looking for in about 8 comparisons. That’s an astonishingly small number of comparisons we’d have to make.
The number of comparisons you need to make when systematically halving a problem like we did here can be calculated by log2(n). We found Vince Offer in 8 comparisons.
log2(400) = 8.64385618977, which means that in the worst case rounding up, it would take 9 comparisons.
Let’s talk about a bigger phone book. Let’s go huge. Let’s take a phone book with 4 million pages. That’s, 4,000,000 pages! Take a guess. How many comparisons would you need to make to find the specific page with Vince Offer?
log2(4,000,000) = 21.9315685693, which means it would take at most 22 comparisons.
Compare finding the element using a linear approach vs. our “halving approach.”
The worst case if we just went through the dictionary from beginning to end would be if the entry happened to be on the last page. For a phone book with 4,000,000 pages, it would take 4,000,000 comparisons. However, with the halving approach, it would only take 22 comparisons.
Learning algorithms is learning how to break problems down in the manner we outlined here. It’s all about breaking things down into different steps, and in a manner where there is exactly one behavior that someone could take given a scenario.
Algorithms are all about understanding the process that was taken, and then converting that process into code. Generally speaking, deeply understanding the process is harder than converting it into working code. The above process can be performed in ruby in about 10 lines of code.
Tricks that you need in your tool belt
We didn’t invent the concept of a binary search for an element. Various computer scientists have a number of tricks that all programmers should have in their arsenal. Learning algorithms is all about mastering the art of process development and converting it to code that can be run by a computer.
As with the example above, you can see that using the right trick in the right scenario can be the difference between something taking 4,000,000 comparisons and 22.
And once you can convert a handful of the common computer science algorithms into code in any programming language, you’ll have the skills you need to conquer any code the world has to throw at you.
Other useful tricks of the trade, which you’d be unlikely to figure out on your own (and instead should learn from computer scientists) are algorithms like:
- Depth-first search
- Breadth-first search
- Writing sorting algorithms
The key to solving them is to first understand how they work intuitively, like we did when we described the phone book example, and then translate it into code.
You might be wondering where “Data Structures” fall into the mix. It turns out that in order to implement some of the algorithms that computer scientists came up with, you often need additional tools in your tool belt. To better understand a concrete data structure, let’s talk about queues.
A queue is a concrete data structure that you NEED if you’re going to implement the breadth-first search algorithm. If you don’t understand the various data structures, you’re operating as a programmer with tools missing from your metaphorical tool belt. And you will have a difficult time implementing certain algorithms.
You can find a great example of a queue simply by thinking about a typical experience at a deli. When you get in line, you pull a number and wait for your number to be called. Everyone who pulled a number before you will be given service before you, but when you’re the customer who has been in line the longest, your number will be called next.
Queues have two properties:
- Enqueue is when you “start waiting.” It’s the equivalent of pulling a number in the deli example.
- Dequeue is when it’s your turn to be served. It’s the equivalent of having your number called at the deli.
And in ruby code, you can easily implement the “deli queue” behavior by using an array and two simple methods. Queues are said to be FIFO, or “first-in-first-out.”
So in short, algorithms are the patterns and procedures used to accomplish the goal at hand. Data structures are like tools in your tool belt. You don’t necessarily need to know about them, but using the right tool for the job will make your code cleaner and easier to write. This will make you a better programmer.
Now that you understand what algorithms are and why they’re important, share this post with a friend!