Binary Search: Useful Variants

algorithm

Binary search is one of the most fundamental algorithms in computer science, and often one of the earliest taught algorithms in University (depending on major, of course). Alone, binary search lets us quickly determine the index of some value in a sorted dataset. If the value doesn’t exist, ideally it will return some sentinel value indicative of that. So in short:

In: (sortedArray, value)
Out: position | -1

A classic implementation binary search in C++ is as follows:

We set up our left and right bound index variables, and while they are not inverted, we keep zeroing in on the element we’re trying to find. Note, for an input array with no elements, the left and right variables are inverted by default, and thus the while loop will not be entered.

Limitations & Useful Variants

Even the return value of binary search is quite binary. We either get the index of the value we’re searching for, or something telling us it wasn’t there. Furthermore, if the dataset contains duplicates, we get an index of a random occurrence of the one we’re searching for. Often this is fine, but in a lot of competitive programming questions it would be nice to handle duplicates in a way that is more useful. Perhaps getting information like the index of:

The first occurrence of a value
The last occurrence of a value

… or in the case where the value we’re searching for doesn’t exist (DNE), the index of the:

Closest value
Next largest value
Next smallest value

… or some combination of the above sections! The rest of this article will be focused on building useful variants of the classic binary search, and discussing some of their underlying logic and intuition.

First Occurrence

Binary search typically stops immediately when arr[mid] is the value we’re looking for. In this variant, we want to modify the logic hit when we find an occurrence to not immediately return the index of mid, but instead begin searching for earlier occurrences that may appear in the range [left, mid]. If we’re interested in the range [left, mid] (inclusive) after we find one occurrence, we can just set right = mid to continue searching. Setting right = mid (instead of right = mid - 1) includes the occurrence we just found when searching our “lower half”, so in the following possible cases:

All values in the range [left, mid - 1] happen to be less than arr[mid], we don’t lose the occurrence we found
There is an earlier occurrence in the range [left, mid - 1], we’ll find it with the rest of our logic

It’s important to know when to stop searching though; for example, how do we know when we’ve found the earliest occurrence of a value? Well, when mid == 1eft (after we’ve found an occurrence), there is no earlier range to search through because our “lower half” is effectively gone. So at this point, arr[mid] must be the earliest occurrence, and we’ll return it!

Last Occurrence

We’re going to use a similar tactic as we did above to find the last occurrence of a value appearing in a list potential duplicates, by bringing the lower boundary index variable up to mid (inclusive) when we’ve found an occurrence, we give the second half of the list a fair chance of producing another occurrence. In the last example, once we’ve found the value we’re looking for, we check to see if we have room in the lower half to check for earlier occurrences with:

if (left != mid) { //...

… and set right = mid if that condition is true.

For this variant, it’s tempting to use if (mid != right) to see if we have room in the second half of our list to check for later occurrences, and then set left = mid if so, but consider the array [1, 1]. Our index variables are initialized as left = 0, right = 1, and then mid will be set to 0. Here it’s true that mid != right, which may indicate that we have room in our second half, but if we set left = mid, in this case, we’re never changing our bounds, and we enter an infinite loop. This is because the only time mid == right is when all index variables are pointing to the same element, making the check mid != right not indicative of a viable second half.

Instead we should be checking left != mid again. If left != mid, we can set left = mid safely and search our second half, but if left == mid our boundaries consist of either one or two elements. We know the occurrence we’re looking for is definitely at index left (also mid), and that the only later place it could appear is right. If the list bounds contain:

Two elements at this point, we’ll give preference to right and see if it is a later occurrence
One element at this point, giving preference to right is the same as returning left or mid

So if left == mid, we can essentially perform return (arr[right] == value) ? right : left.

Closest Value

Finding the index of the closest value if the value we’re searching for DNE is actually fairly simple. Consider the case in which the value we are attempting to find DNE. Here we’ll never be returning from within our while loop, and will eventually be thrown outside of it. We know that once we’re thrown outside of the loop, our left and right index variables must be inverted. This can happen in one of the following ways:

All values in the array are greater than what we’re looking for, in which right = -1, left = 0
All values in the array are lower than what we’re looking for, in which left = arr.size(), right = arr.size() - 1
We just couldn’t find it somewhere in the middle; left and right are both safe indices, just inverted

The first two are easy to check for and save us from returning an invalid index, since only left or right can be valid and thus safe to return. The final case means the value we are searching for would appear between indices left and right, leaving us needing to test which value is closest to the one we’re looking for.

Next Largest & Next Smallest

I’ll leave these as an exercise to the reader, since they’re simple variants of the already-simple Closest Value variant. To see some of these algorithms in action, check out the Searching and Sorting folder of my domfarolino/algorithms repository on GitHub :).

Written on August 15, 2018