跳转至

17 Radix Sort

In this chapter, you’ll look at a completely different model of sorting. So far, you’ve been relying on comparisons to determine the sorting order. Radix sort is a non-comparative algorithm for sorting integers. The word radix means base, as in the base of a number system. For example, decimal is base 10 and binary is base 2. You can use any base with a radix sort, but to keep things simple, this chapter will focus on sorting base-10 integers.

Radix sort relies on the position of digits within a number. For example, there are four digits in the number 1772. The least significant digit is 2 since this is in the ones place. The most significant digit is 1 since this is the thousands-place value, greater than any other place value in this particular number. The following diagram shows the place values of a four-digit number:

1thousands7hundreds7tens2ones

There are multiple implementations of radix sort that focus on different problems. One type sorts by the least significant digit (LSD) and another by the most significant digit (MSD).

You’ll learn about both LSD- and MSD-radix sorting in this chapter.

Sorting by Least Significant Digit

Before you implement a radix sort in code, go through the following example to get an intuitive idea of how the sort works.

Example

An LSD-radix sort starts by looking at the least significant digits in a list of numbers. For example, the image below shows the numbers 88, 410, 1772 and 20vertically aligned with a box drawn around the ones place:

80028127471

You’ve got one 8 from the final digit of 88, a 0 from the end of 410, a 2 from the last digit of 1772 and another 0 from the least significant digit of 20. A radix sort goes through multiple rounds of sorting, but these final digits are what you’ll use to sort the numbers in the first round.

Round One

You start by making ten buckets to sort the numbers into. You can think of this like sorting fruit. You put the apples in the apple bucket, the oranges in the oranges bucket, and the pears in the pear bucket.

Note

You use ten buckets because you’re working with decimal numbers. If you were using binary numbers, then you would use two buckets. Or if you were using hexadecimal numbers, then 16 buckets.

With an LSD-radix sort, you put the numbers that end in 0 in the zero bucket, the numbers that end in 1 in the ones bucket, the numbers that end in 2 in the twos bucket, and so on up to the nines bucket. This is known as a bucket sort.

0123456789

In this case, since 88 ends in 8, you put it in the eights bucket. Likewise, 410 goes in the zeros bucket since it ends with 0, and 1772 goes in the twos bucket. Since 20 also ends in 0, it goes in the zeros bucket along with 410. The buckets maintain insertion order so 20 is after 410 in the zeros bucket.

020410177288123456789

Note

The radix sort algorithm described here is stable since the order within the buckets is maintained. Even though 410 and 20 are different numbers, the 0 digit used for this round of the sort is the same.

If you extract all of the numbers out of the buckets in order now, you have the list 410, 20, 1772, 88:

00821428771

This finishes round one.

Round Two

For round two, you look at the next significant digit, the tens-place value:

00824128771

Again, you make ten buckets for the ten digits of base ten:

0123456789

Sort the numbers starting at the beginning of the list. That means 410 is first. Since the second digit of 410 is 1, it goes in the ones bucket. Likewise, 20 goes in the twos bucket, 1772 goes in the sevens bucket, and 88 goes in the eights bucket:

020881234567894101772

This time it turns out they are all in different buckets. All that sorting you did in round one was for nothing, but you didn’t know that at the time.

Combining the numbers in order, you still have 410, 20, 1772, 88. The order didn’t change.

00824128771

Time for round three now.

Round Three

This time look at the hundreds place. Since 20 and 88 are less than 100, you can use 0s for the hundreds place:

0082128700471

Again, make ten buckets and add the numbers in order according to their hundreds-place digit. Although 20 and 88 are both in the zeros bucket (020 and 088), 20 was first in the list, so it also maintains the first position in the bucket.

012345678941088201772

Combine the numbers in order from the buckets and you have the new order of 20, 88, 410, 1772:

0820287107041

The list is already sorted in this case, but radix sort doesn’t know that yet. There’s still one more round to go.

Round Four

In this algorithm of radix sort, there’s one round for every significant digit. Since 1722 has four digits, this fourth round will ensure that the thousands-place value is also sorted. Pad any numbers less than one thousand with zeros in front:

0820287100700401

Make your ten buckets again. 20, 88 and 410 all go in the zeros bucket since they’re less than 1000. However, within the bucket, they maintain their previous order:

012345678917728820410

Finally, you can combine the numbers from the buckets in order, and they’re indeed sorted as 20, 88, 410, and 1772:

08202871741

The entire list was sorted without any comparisons!

Implementation

To implement LSD-radix sort, you’ll use a while loop for each round as you step through the place values. Your ten buckets will be ten integer lists.

Open up the starter project for this chapter. Create a folder named lib in the root of the project. Then create a new file named radix_sort.dart in lib.

Add the following to the file:

extension RadixSort on List<int> {
  void radixSort() {
    const base = 10;
    var done = false;
    // 1
    var place = 1;
    while (!done) {
      done = true;
      // 2
      final buckets = List.generate(base, (_) => <int>[]);
      forEach((number) {
        // 3
        final remainingPart = number ~/ place;
        final digit = remainingPart % base;
        // 4
        buckets[digit].add(number);
        if (remainingPart ~/ base > 0) {
          done = false;
        }
      });
      // 5
      place *= base;
      clear();
      addAll(buckets.expand((element) => element));
    }
  }
}

Here are the main points:

  1. Loop through each place value, where place is first 1, then 10, then 100, and so on through the largest place value in the list.
  2. Create your ten buckets. The type for buckets is List<List<int>>.
  3. Find the significant digit of the current number.
  4. Put number in the appropriate bucket.
  5. Take the numbers from the buckets in their new order and put them back in the original list. Since buckets is a list of lists, expand helps you flatten them back into a single-dimensional list.

Testing it Out

With that, you’ve implemented your first non-comparative sorting algorithm. Head back to bin/start.dart and replace the contents of the file with the following code:

import 'package:starter/radix_sort.dart';

void main() {
  final list = [88, 410, 1772, 20];
  print("Original list: $list");
  list.radixSort();
  print("Radix sorted: $list");
}

Run that and you should see the following console output:

Original: [88, 410, 1772, 20]
Radix sorted: [20, 88, 410, 1772]

Radix sort is one of the fastest sorting algorithms. The average time complexity of this LSD-radix sort is O(k × n), where k is the number of significant digits in the largest number, and n is the number of integers in the list.

Sorting by Most Significant Digit

The MSD-radix sort uses the most significant digit to sort a list. You might think that’s a little strange when you look at a list of numbers sorted in this way:

36909514509431459

What, 459 comes after 1345? Yes, it does. The reason is that the most significant digit of 459 is 4, which comes after 1, the most significant digit of 1345.

Although MSD sorting might look strange for numbers, it makes a lot more sense when you’re sorting words:

daodptedaneodleeooz

You don’t want the short words sorted before the long words. You want them sorted in alphabetical order, or if you use the more technical term, lexicographical order.

Note

Some implementations of lexicographical ordering do sort shorter elements before longer ones, regardless of the elements’ initial values. This chapter will not.

A key point about words, at least English words, is that they’re a sequence of letters. And when you have a sequence of letters, you can treat them like a sequence of digits. That means you can use the MSD-radix sort for lexicographical ordering of a list of words. For the sake of simplicity, though, the following example will use numbers to show you how MSD-radix sort works.

Example

Start with the following unsorted list:

[46, 500, 459, 1345, 13, 999]

Beginning the Sort

As with LSD-radix sort, you first create ten buckets. However, this time you add the numbers to the buckets according to their most significant digit, that is, the first digit on their left:

012345678913134550099945946

If the values all had the same number of digits, you could proceed as you did with the LSD-radix sort, looping round after round. However, the values don’t have the same number of digits. Imagine a list with mostly small numbers but a few really large numbers. It would be inefficient to keep looping over all of the short numbers. That kind of thing can easily happen with lists of strings, for example.

So, instead of doing a full bucket sort for every digit, you’ll do a recursive bucket sort. That means if any buckets in a particular round have more than one number, you’ll recursively perform another bucket sort on that bucket.

Recursively Sorting 1345 and 13

In the previous image, the ones bucket and the fours bucket both have more than one element. Start with the ones bucket, which contains 1345 and 13. Take those numbers and perform another bucket sort, this time on the second most significant digit:

0123456789131345

1345 and 13 are still together because the second digit for both of them is 3.

Perform another bucket sort on the next digit. Since the next digit of 1345 is 4, it goes in the fours bucket. The number 13 doesn’t have a third digit, so there is no place to put it. You can’t put it in the zeros bucket because then how would you distinguish 13 and 130? Instead, you’ll put it in a special priority bucket. It gets sorted before any of the other buckets.

0123456789priority131345

There’s nothing more to sort in this recursive branch, so pop back out a few levels, returning the sorted list [13, 1345].

Recursively Sorting 46 and 459

Now you need to do the same thing with 46 and 459. Perform a bucket sort on the second digit. That gives the following:

012345678945946

Since none of the buckets have more than one item, you can stop this branch of the recursion here and return the sorted sublist [459, 46].

Combining All the Buckets

There were no other buckets with more than one number from the original round, so you’re finished. Combine all of the buckets and sorted sublists into the final sorted result of [13, 1345, 459, 46, 500, 999]:

36909514509431459

Hopefully you have a better intuitive grasp of how MSD-radix sort works now. You’ll implement this algorithm in the next section.

Implementation

To start off, you’ll write a few helper methods.

Getting the Number of Digits

Open lib/radix_sort.dart again and add the following int extension to the file:

extension Digits on int {
  static const _base = 10;

  int digits() {
    var count = 0;
    var number = this;
    while (number != 0) {
      count += 1;
      number ~/= _base;
    }
    return count;
  }
}

This helper method tells you how many digits are in a particular integer. Since there isn’t a length property on the int type, you count how many times you have to divide by 10 before you get 0.

Go back to bin/starter.dart and run the following in main to try out a few examples:

print(13.digits());    // 2
print(999.digits());   // 3
print(1345.digits());  // 4

Finding the Digit at Some Position

Add the following import to lib/radix_sort.dart:

import 'dart:math';

Then add the following additional method to the Digits extension on int:

int? digitAt(int position) {
  if (position >= digits()) {
    return null;
  }
  var number = this;
  while (number ~/ pow(_base, position + 1) != 0) {
    number ~/= _base;
  }
  return number % _base;
}

digitAt returns the digit at a given position. Like with lists, the leftmost position is zero. Thus, the digit for position zero of the value 1345 is 1. The digit for position three is 5. Since there are only four digits, the digit for position five will return null.

The implementation of digitAt works by repeatedly chopping a digit off the end of the number until the requested digit is at the end. It’s then extracted using the remainder operator.

Test your new int extension out in bin/starter.dart with the following code:

print(1345.digitAt(0)); // 1
print(1345.digitAt(1)); // 3
print(1345.digitAt(2)); // 4
print(1345.digitAt(3)); // 5
print(1345.digitAt(4)); // null
print(1345.digitAt(5)); // null

Finding the Max Digits in the List

Go back to lib/radix_sort.dart and add a new List extension with the following method:

extension MsdRadixSort on List<int> {
  int maxDigits() {
    if (isEmpty) return 0;
    return reduce(max).digits();
  }

  // more to come
}

Given some list of numbers, maxDigits will tell you the number of digits in the largest number.

Test in out in main like so:

final list = [46, 500, 459, 1345, 13, 999];
print(list.maxDigits()); // 4

The result is 4 since the largest number, 1345, has four digits.

Now you’re ready to write the actual sort implementation.

Adding the Recursive Sort Methods

Go back to the MsdRadixSort extension on List in lib/radix_sort.dart. You’re going to add two more methods to this extension. One will be public and the other a private recursive helper method.

First, add the public lexicographicalSort extension method to MsdRadixSort:

void lexicographicalSort() {
  final sorted = _msdRadixSorted(this, 0);
  clear();
  addAll(sorted);
}

// more to come

This method wraps a currently unimplemented recursive helper method that does the real work of sorting the list.

Next add that helper method after lexicographicalSort:

// 1
List<int> _msdRadixSorted(List<int> list, int position) {
  // 2
  if (list.length < 2 || position >= list.maxDigits()) {
    return list;
  }
  // 3
  final buckets = List.generate(10, (_) => <int>[]);
  // 4
  var priorityBucket = <int>[];

  // more to come
}

Here’s where the real work starts happening:

  1. The list that you pass to this recursive method will be the full list on the first round (when position is 0), but after that, it’ll be the smaller bucket lists. If writing private MSD recursive methods was on your bucket list, you can cross it off now.
  2. As with all recursive operations, you need to set a terminating condition that stops the recursion. Recursion should halt if there’s only one element in the list or if you’ve exceeded the max number of digits.
  3. Similar to the LSD-radix sort, you instantiate a two-dimensional list for the buckets.
  4. The priorityBucket is a special bucket that stores values with fewer digits than the current position. Values that go in priorityBucket will be ordered first.

Continue by adding the following for loop at the bottom of _msdRadixSorted:

for (var number in list) {
  final digit = number.digitAt(position);
  if (digit == null) {
    priorityBucket.add(number);
    continue;
  }
  buckets[digit].add(number);
}

// more to come

For every number in the list, you find the digit at the current position and use it to place the number in the appropriate bucket.

Finally, finish off _msdRadixSorted by adding the following code at the bottom of the method:

// 1
final bucketOrder = buckets.reduce((result, bucket) {
  if (bucket.isEmpty) return result;
  // 2
  final sorted = _msdRadixSorted(bucket, position + 1);
  return result..addAll(sorted);
});
// 3
return priorityBucket..addAll(bucketOrder);

This bit of code is perhaps the hardest to understand because it includes both an anonymous function and a recursive method. It’s not so terrible when you grasp the parts, though:

  1. The higher-order function reduce takes a collection and reduces it to a single value. Since your collection here is a list of lists, reduce will give you a single list. The values in that list will be numbers in the order that they came from the buckets. reduce works by keeping a running result list which you can add to based on the current bucket that you’re iterating to. If reduce and its anonymous function are too confusing, you could rewrite them as a for loop.
  2. For every non-empty bucket that you come to, recursively sort that bucket at the next digit position.
  3. Everything in the priority bucket goes first, but then add any other values that were in the other buckets to the end of the list.

That was a bit more involved than LSD-radix sort, wasn’t it? You’re done now, though, so it’s time to try it out!

Testing it Out

Open bin/starter.dart again and run the following code in main:

final list = [46, 500, 459, 1345, 13, 999];
list.lexicographicalSort();
print(list);

You should see the output below in the console:

[13, 1345, 459, 46, 500, 999]

And that’s what you want if you’re asking for the lexicographical order!

Challenges

The challenges below will help strengthen your understanding of radix sorts. You can find the answers in the Challenge Solutions section or in the supplementary materials that accompany this book.

Challenge 1: What Are in the Buckets?

Add a print statement to your radixSort implementation so that it’ll tell you what’s in the buckets after each round of sorting.

Do the same for each recursion of lexicographicalSort.

Use the following list for both sorts:

var list = [46, 500, 459, 1345, 13, 999];

Challenge 2: Unique Characters

Write a function that returns the total number of unique characters used in a list of words.

You can use the following list:

final words = ['done', 'ad', 'eel', 'zoo', 'adept', 'do'];

If you had a bucket for each unique character, how many buckets would you need?

Challenge 3: Optimization

Given the following list:

[88, 410, 1772, 20, 123456789876543210]

Your current implementation of radixSort would take 18 rounds, 14 of which are completely unnecessary. How could you optimize radix sort for cases where a single number is much larger than the others.

Key Points

  • Unlike the other sorting algorithms you’ve implemented in previous chapters, radix sort doesn’t rely on comparing two values. It leverages bucket sort, which is a way to sort numbers by their digits.
  • The word radix means base, as in base-10 or base-2 numbering systems. The internal bucket sort will use one bucket for each base.
  • Radix sort can be one of the fastest sorting algorithms for sorting values with positional notation.
  • A least-significant-digit (LSD) radix sort begins sorting with the right-most digit.
  • Another way to implement radix sort is the most-significant-digit (MSD) form. This form sorts by prioritizing the left-most digits over the lesser ones to the right. It’s best illustrated by the sorting behavior of the String type.