17 Radix Sort¶
In this chapter, you’ll look at a completely different model of sorting. So far, you’ve been relying on comparisons to determine the sorting order. Radix sort is a non-comparative algorithm for sorting integers. The word radix means base, as in the base of a number system. For example, decimal is base 10 and binary is base 2. You can use any base with a radix sort, but to keep things simple, this chapter will focus on sorting base-10 integers.
Radix sort relies on the position of digits within a number. For example, there are four digits in the number 1772. The least significant digit is 2 since this is in the ones place. The most significant digit is 1 since this is the thousands-place value, greater than any other place value in this particular number. The following diagram shows the place values of a four-digit number:
1thousands7hundreds7tens2ones
There are multiple implementations of radix sort that focus on different problems. One type sorts by the least significant digit (LSD) and another by the most significant digit (MSD).
You’ll learn about both LSD- and MSD-radix sorting in this chapter.
Sorting by Least Significant Digit¶
Before you implement a radix sort in code, go through the following example to get an intuitive idea of how the sort works.
Example¶
An LSD-radix sort starts by looking at the least significant digits in a list of numbers. For example, the image below shows the numbers 88, 410, 1772 and 20vertically aligned with a box drawn around the ones place:
80028127471
You’ve got one 8 from the final digit of 88, a 0 from the end of 410, a 2 from the last digit of 1772 and another 0 from the least significant digit of 20. A radix sort goes through multiple rounds of sorting, but these final digits are what you’ll use to sort the numbers in the first round.
Round One¶
You start by making ten buckets to sort the numbers into. You can think of this like sorting fruit. You put the apples in the apple bucket, the oranges in the oranges bucket, and the pears in the pear bucket.
Note
You use ten buckets because you’re working with decimal numbers. If you were using binary numbers, then you would use two buckets. Or if you were using hexadecimal numbers, then 16 buckets.
With an LSD-radix sort, you put the numbers that end in 0 in the zero bucket, the numbers that end in 1 in the ones bucket, the numbers that end in 2 in the twos bucket, and so on up to the nines bucket. This is known as a bucket sort.
0123456789
In this case, since 88 ends in 8, you put it in the eights bucket. Likewise, 410 goes in the zeros bucket since it ends with 0, and 1772 goes in the twos bucket. Since 20 also ends in 0, it goes in the zeros bucket along with 410. The buckets maintain insertion order so 20 is after 410 in the zeros bucket.
020410177288123456789
Note
The radix sort algorithm described here is stable since the order within the buckets is maintained. Even though 410 and 20 are different numbers, the 0 digit used for this round of the sort is the same.
If you extract all of the numbers out of the buckets in order now, you have the list 410, 20, 1772, 88:
00821428771
This finishes round one.
Round Two¶
For round two, you look at the next significant digit, the tens-place value:
00824128771
Again, you make ten buckets for the ten digits of base ten:
0123456789
Sort the numbers starting at the beginning of the list. That means 410 is first. Since the second digit of 410 is 1, it goes in the ones bucket. Likewise, 20 goes in the twos bucket, 1772 goes in the sevens bucket, and 88 goes in the eights bucket:
020881234567894101772
This time it turns out they are all in different buckets. All that sorting you did in round one was for nothing, but you didn’t know that at the time.
Combining the numbers in order, you still have 410, 20, 1772, 88. The order didn’t change.
00824128771
Time for round three now.
Round Three¶
This time look at the hundreds place. Since 20 and 88 are less than 100, you can use 0s for the hundreds place:
0082128700471
Again, make ten buckets and add the numbers in order according to their hundreds-place digit. Although 20 and 88 are both in the zeros bucket (020 and 088), 20 was first in the list, so it also maintains the first position in the bucket.
012345678941088201772
Combine the numbers in order from the buckets and you have the new order of 20, 88, 410, 1772:
0820287107041
The list is already sorted in this case, but radix sort doesn’t know that yet. There’s still one more round to go.
Round Four¶
In this algorithm of radix sort, there’s one round for every significant digit. Since 1722 has four digits, this fourth round will ensure that the thousands-place value is also sorted. Pad any numbers less than one thousand with zeros in front:
0820287100700401
Make your ten buckets again. 20, 88 and 410 all go in the zeros bucket since they’re less than 1000. However, within the bucket, they maintain their previous order:
012345678917728820410
Finally, you can combine the numbers from the buckets in order, and they’re indeed sorted as 20, 88, 410, and 1772:
08202871741
The entire list was sorted without any comparisons!
Implementation¶
To implement LSD-radix sort, you’ll use a while
loop for each round as you step through the place values. Your ten buckets will be ten integer lists.
Open up the starter project for this chapter. Create a folder named lib in the root of the project. Then create a new file named radix_sort.dart in lib.
Add the following to the file:
extension RadixSort on List<int> {
void radixSort() {
const base = 10;
var done = false;
// 1
var place = 1;
while (!done) {
done = true;
// 2
final buckets = List.generate(base, (_) => <int>[]);
forEach((number) {
// 3
final remainingPart = number ~/ place;
final digit = remainingPart % base;
// 4
buckets[digit].add(number);
if (remainingPart ~/ base > 0) {
done = false;
}
});
// 5
place *= base;
clear();
addAll(buckets.expand((element) => element));
}
}
}
Here are the main points:
- Loop through each place value, where
place
is first1
, then10
, then100
, and so on through the largest place value in the list. - Create your ten buckets. The type for
buckets
isList<List<int>>
. - Find the significant digit of the current
number
. - Put
number
in the appropriate bucket. - Take the numbers from the buckets in their new order and put them back in the original list. Since
buckets
is a list of lists,expand
helps you flatten them back into a single-dimensional list.
Testing it Out¶
With that, you’ve implemented your first non-comparative sorting algorithm. Head back to bin/start.dart and replace the contents of the file with the following code:
import 'package:starter/radix_sort.dart';
void main() {
final list = [88, 410, 1772, 20];
print("Original list: $list");
list.radixSort();
print("Radix sorted: $list");
}
Run that and you should see the following console output:
Original: [88, 410, 1772, 20]
Radix sorted: [20, 88, 410, 1772]
Radix sort is one of the fastest sorting algorithms. The average time complexity of this LSD-radix sort is O(k × n), where k is the number of significant digits in the largest number, and n is the number of integers in the list.
Sorting by Most Significant Digit¶
The MSD-radix sort uses the most significant digit to sort a list. You might think that’s a little strange when you look at a list of numbers sorted in this way:
36909514509431459
What, 459 comes after 1345? Yes, it does. The reason is that the most significant digit of 459 is 4, which comes after 1, the most significant digit of 1345.
Although MSD sorting might look strange for numbers, it makes a lot more sense when you’re sorting words:
daodptedaneodleeooz
You don’t want the short words sorted before the long words. You want them sorted in alphabetical order, or if you use the more technical term, lexicographical order.
Note
Some implementations of lexicographical ordering do sort shorter elements before longer ones, regardless of the elements’ initial values. This chapter will not.
A key point about words, at least English words, is that they’re a sequence of letters. And when you have a sequence of letters, you can treat them like a sequence of digits. That means you can use the MSD-radix sort for lexicographical ordering of a list of words. For the sake of simplicity, though, the following example will use numbers to show you how MSD-radix sort works.
Example¶
Start with the following unsorted list:
[46, 500, 459, 1345, 13, 999]
Beginning the Sort¶
As with LSD-radix sort, you first create ten buckets. However, this time you add the numbers to the buckets according to their most significant digit, that is, the first digit on their left:
012345678913134550099945946
If the values all had the same number of digits, you could proceed as you did with the LSD-radix sort, looping round after round. However, the values don’t have the same number of digits. Imagine a list with mostly small numbers but a few really large numbers. It would be inefficient to keep looping over all of the short numbers. That kind of thing can easily happen with lists of strings, for example.
So, instead of doing a full bucket sort for every digit, you’ll do a recursive bucket sort. That means if any buckets in a particular round have more than one number, you’ll recursively perform another bucket sort on that bucket.
Recursively Sorting 1345 and 13¶
In the previous image, the ones bucket and the fours bucket both have more than one element. Start with the ones bucket, which contains 1345 and 13. Take those numbers and perform another bucket sort, this time on the second most significant digit:
0123456789131345
1345 and 13 are still together because the second digit for both of them is 3.
Perform another bucket sort on the next digit. Since the next digit of 1345 is 4, it goes in the fours bucket. The number 13 doesn’t have a third digit, so there is no place to put it. You can’t put it in the zeros bucket because then how would you distinguish 13 and 130? Instead, you’ll put it in a special priority bucket. It gets sorted before any of the other buckets.
0123456789priority131345
There’s nothing more to sort in this recursive branch, so pop back out a few levels, returning the sorted list [13, 1345]
.
Recursively Sorting 46 and 459¶
Now you need to do the same thing with 46 and 459. Perform a bucket sort on the second digit. That gives the following:
012345678945946
Since none of the buckets have more than one item, you can stop this branch of the recursion here and return the sorted sublist [459, 46]
.
Combining All the Buckets¶
There were no other buckets with more than one number from the original round, so you’re finished. Combine all of the buckets and sorted sublists into the final sorted result of [13, 1345, 459, 46, 500, 999]
:
36909514509431459
Hopefully you have a better intuitive grasp of how MSD-radix sort works now. You’ll implement this algorithm in the next section.
Implementation¶
To start off, you’ll write a few helper methods.
Getting the Number of Digits¶
Open lib/radix_sort.dart again and add the following int
extension to the file:
extension Digits on int {
static const _base = 10;
int digits() {
var count = 0;
var number = this;
while (number != 0) {
count += 1;
number ~/= _base;
}
return count;
}
}
This helper method tells you how many digits are in a particular integer. Since there isn’t a length
property on the int
type, you count how many times you have to divide by 10
before you get 0
.
Go back to bin/starter.dart and run the following in main
to try out a few examples:
print(13.digits()); // 2
print(999.digits()); // 3
print(1345.digits()); // 4
Finding the Digit at Some Position¶
Add the following import to lib/radix_sort.dart:
import 'dart:math';
Then add the following additional method to the Digits
extension on int
:
int? digitAt(int position) {
if (position >= digits()) {
return null;
}
var number = this;
while (number ~/ pow(_base, position + 1) != 0) {
number ~/= _base;
}
return number % _base;
}
digitAt
returns the digit at a given position. Like with lists, the leftmost position is zero. Thus, the digit for position zero of the value 1345 is 1. The digit for position three is 5. Since there are only four digits, the digit for position five will return null
.
The implementation of digitAt
works by repeatedly chopping a digit off the end of the number until the requested digit is at the end. It’s then extracted using the remainder operator.
Test your new int
extension out in bin/starter.dart with the following code:
print(1345.digitAt(0)); // 1
print(1345.digitAt(1)); // 3
print(1345.digitAt(2)); // 4
print(1345.digitAt(3)); // 5
print(1345.digitAt(4)); // null
print(1345.digitAt(5)); // null
Finding the Max Digits in the List¶
Go back to lib/radix_sort.dart and add a new List
extension with the following method:
extension MsdRadixSort on List<int> {
int maxDigits() {
if (isEmpty) return 0;
return reduce(max).digits();
}
// more to come
}
Given some list of numbers, maxDigits
will tell you the number of digits in the largest number.
Test in out in main
like so:
final list = [46, 500, 459, 1345, 13, 999];
print(list.maxDigits()); // 4
The result is 4 since the largest number, 1345, has four digits.
Now you’re ready to write the actual sort implementation.
Adding the Recursive Sort Methods¶
Go back to the MsdRadixSort
extension on List
in lib/radix_sort.dart. You’re going to add two more methods to this extension. One will be public and the other a private recursive helper method.
First, add the public lexicographicalSort
extension method to MsdRadixSort
:
void lexicographicalSort() {
final sorted = _msdRadixSorted(this, 0);
clear();
addAll(sorted);
}
// more to come
This method wraps a currently unimplemented recursive helper method that does the real work of sorting the list.
Next add that helper method after lexicographicalSort
:
// 1
List<int> _msdRadixSorted(List<int> list, int position) {
// 2
if (list.length < 2 || position >= list.maxDigits()) {
return list;
}
// 3
final buckets = List.generate(10, (_) => <int>[]);
// 4
var priorityBucket = <int>[];
// more to come
}
Here’s where the real work starts happening:
- The
list
that you pass to this recursive method will be the full list on the first round (whenposition
is0
), but after that, it’ll be the smaller bucket lists. If writing private MSD recursive methods was on your bucket list, you can cross it off now. - As with all recursive operations, you need to set a terminating condition that stops the recursion. Recursion should halt if there’s only one element in the list or if you’ve exceeded the max number of digits.
- Similar to the LSD-radix sort, you instantiate a two-dimensional list for the buckets.
- The
priorityBucket
is a special bucket that stores values with fewer digits than the current position. Values that go inpriorityBucket
will be ordered first.
Continue by adding the following for
loop at the bottom of _msdRadixSorted
:
for (var number in list) {
final digit = number.digitAt(position);
if (digit == null) {
priorityBucket.add(number);
continue;
}
buckets[digit].add(number);
}
// more to come
For every number in the list, you find the digit at the current position and use it to place the number in the appropriate bucket.
Finally, finish off _msdRadixSorted
by adding the following code at the bottom of the method:
// 1
final bucketOrder = buckets.reduce((result, bucket) {
if (bucket.isEmpty) return result;
// 2
final sorted = _msdRadixSorted(bucket, position + 1);
return result..addAll(sorted);
});
// 3
return priorityBucket..addAll(bucketOrder);
This bit of code is perhaps the hardest to understand because it includes both an anonymous function and a recursive method. It’s not so terrible when you grasp the parts, though:
- The higher-order function
reduce
takes a collection and reduces it to a single value. Since your collection here is a list of lists,reduce
will give you a single list. The values in that list will be numbers in the order that they came from the buckets.reduce
works by keeping a runningresult
list which you can add to based on the currentbucket
that you’re iterating to. Ifreduce
and its anonymous function are too confusing, you could rewrite them as afor
loop. - For every non-empty bucket that you come to, recursively sort that bucket at the next digit position.
- Everything in the priority bucket goes first, but then add any other values that were in the other buckets to the end of the list.
That was a bit more involved than LSD-radix sort, wasn’t it? You’re done now, though, so it’s time to try it out!
Testing it Out¶
Open bin/starter.dart again and run the following code in main
:
final list = [46, 500, 459, 1345, 13, 999];
list.lexicographicalSort();
print(list);
You should see the output below in the console:
[13, 1345, 459, 46, 500, 999]
And that’s what you want if you’re asking for the lexicographical order!
Challenges¶
The challenges below will help strengthen your understanding of radix sorts. You can find the answers in the Challenge Solutions section or in the supplementary materials that accompany this book.
Challenge 1: What Are in the Buckets?¶
Add a print
statement to your radixSort
implementation so that it’ll tell you what’s in the buckets after each round of sorting.
Do the same for each recursion of lexicographicalSort
.
Use the following list for both sorts:
var list = [46, 500, 459, 1345, 13, 999];
Challenge 2: Unique Characters¶
Write a function that returns the total number of unique characters used in a list of words.
You can use the following list:
final words = ['done', 'ad', 'eel', 'zoo', 'adept', 'do'];
If you had a bucket for each unique character, how many buckets would you need?
Challenge 3: Optimization¶
Given the following list:
[88, 410, 1772, 20, 123456789876543210]
Your current implementation of radixSort
would take 18 rounds, 14 of which are completely unnecessary. How could you optimize radix sort for cases where a single number is much larger than the others.
Key Points¶
- Unlike the other sorting algorithms you’ve implemented in previous chapters, radix sort doesn’t rely on comparing two values. It leverages bucket sort, which is a way to sort numbers by their digits.
- The word radix means base, as in base-10 or base-2 numbering systems. The internal bucket sort will use one bucket for each base.
- Radix sort can be one of the fastest sorting algorithms for sorting values with positional notation.
- A least-significant-digit (LSD) radix sort begins sorting with the right-most digit.
- Another way to implement radix sort is the most-significant-digit (MSD) form. This form sorts by prioritizing the left-most digits over the lesser ones to the right. It’s best illustrated by the sorting behavior of the
String
type.