4 Strings

Numbers are essential in programming, but they aren’t the only type of data you need to work with in your apps. Text is also a common data type, representing things such as people’s names, their addresses, or even the complete text of a book. All of these are examples of text that an app might have to handle.

Most computer programming languages store text in a data type called a string. This chapter introduces you to strings, first by giving you background on the concept, and then by showing you how to use them in Dart.

How Computers Represent Strings

Computers think of strings as a collection of individual characters. Numbers are the language of CPUs, and all code, in every programming language, can be reduced to raw numbers. Strings are no different.

That may sound very strange. How can characters be numbers? At its base, a computer needs to be able to translate a character into the computer’s own language, and it does so by assigning each character a different number. This two-way mapping between characters and numbers is called a character set.

When you press a character key on your keyboard, you’re actually communicating the number of the character to the computer. Your computer converts that number into a picture of the character and finally presents that picture to you.

Unicode

In isolation, a computer is free to choose whatever character set mapping it likes. If the computer wants the letter a to equal the number 10, then so be it. But when computers start talking to each other, they need to use a common character set.

If two computers used different character sets, then when one computer transferred a string to another computer, they would end up thinking the strings contained different characters.

There have been several standards over the years, but the modern standard is Unicode. It defines the character set mapping that almost all computers use today.

Note

You can read more about Unicode at its official website, unicode.org.

As an example, consider the word cafe. The Unicode standard tells us that the letters of this word should be mapped to numbers like so:

[Figure: the letters c, a, f and e mapped to code points 99, 97, 102 and 101]

The number associated with each character is called a code point. So in the example above, c uses code point 99, a uses code point 97, and so on.

Of course, Unicode is not just for the simple Latin characters used in English, such as c, a, f and e. It also lets you map characters from languages around the world. The word cafe, as you’re probably aware, is derived from French, in which it’s written as café. Unicode maps these characters like so:

[Figure: the letters c, a, f and é mapped to code points 99, 97, 102 and 233]

And here’s an example using simplified Chinese characters that mean “I love you”:

[Figure: simplified Chinese characters mapped to their code points]

You’re familiar with the small pictures called emojis that you can use when texting your friends. These pictures are, in fact, just normal characters and are also mapped by Unicode. For example:

[Figure: two emojis mapped to their code points]

These are only two characters. The code points for them are very large numbers, but each is still only a single code point. The computer considers them no different than any other two characters.

Note

The word “emoji” comes from the Japanese 絵文字, where “e” means picture and “moji” means character.

The numbers for each of the characters above were written in decimal notation, but you usually write Unicode code points in hexadecimal format. Here they are again in hex:

[Figure: the same code points written in hexadecimal]

Using base-16 makes the numbers more compact, easier to find in the Unicode character code charts and generally nicer to work with while programming.
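As a quick preview, here’s a minimal sketch of how you might convert between the two notations in Dart. It relies on the runes property, which you’ll meet later in this chapter, and on toRadixString from the int type. The helper name codePointsInHex is made up for this illustration.

```dart
// Hypothetical helper: returns the Unicode code points of [text]
// as uppercase hexadecimal strings.
List<String> codePointsInHex(String text) {
  return text.runes
      .map((codePoint) => codePoint.toRadixString(16).toUpperCase())
      .toList();
}

void main() {
  const word = 'caf\u00E9'; // café, with é written as an escape
  print(word.runes.toList()); // [99, 97, 102, 233]
  print(codePointsInHex(word)); // [63, 61, 66, E9]
}
```

Notice that 233 shrinks to the two-digit E9, which is the form you’d look up in the Unicode charts.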

Strings and Characters in Dart

Dart, like any good programming language, can work directly with strings. It does so through the String data type. In the remainder of this chapter, you’ll learn about this data type and how to work with it.

You’ve already seen a Dart string back in Chapter 1 where you printed one:

print('Hello, Dart!');

You can extract that same string as a named variable:

var greeting = 'Hello, Dart!';
print(greeting);

The right-hand side of this assignment is known as a string literal. Due to type inference, Dart knows that greeting is of type String. Since you used the var keyword, you’re allowed to reassign the value of greeting as long as the new value is still a string.

var greeting = 'Hello, Dart!';
greeting = 'Hello, Flutter!';

Even though you changed the value of greeting here, you didn’t modify the string itself. That’s because strings are immutable in Dart. It’s not like you replaced Dart in the first string with Flutter. No, you completely discarded the string 'Hello, Dart!' and replaced it with a whole new string whose value was 'Hello, Flutter!'.
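One way to convince yourself of this is to keep a second reference to the original string. The following is only a small sketch; the names original and shouted are invented for the illustration.

```dart
void main() {
  var greeting = 'Hello, Dart!';
  final original = greeting; // a second reference to the same string

  greeting = 'Hello, Flutter!'; // greeting now refers to a new string

  print(original); // Hello, Dart!
  print(greeting); // Hello, Flutter!

  // String methods return a new string instead of changing the receiver.
  final shouted = original.toUpperCase();
  print(original); // Hello, Dart!
  print(shouted); // HELLO, DART!
}
```

The first string is still intact after the reassignment, and toUpperCase hands back a new string rather than altering the old one.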

Note

The code examples that follow contain emoji characters that may be difficult to input on your keyboard. You can find all of them to conveniently copy-and-paste by opening starter/bin/starter.dart in the Chapter 4 supplemental materials for this book.

Alternatively, you can use emojipedia.org with the following search terms: “dart”, “Mongolia flag” and “man woman girl boy”. Or on macOS, you can also press Command-Control-Space to open the Character Viewer and search for emojis.

Getting Characters

If you’re familiar with other programming languages, you may be wondering about a Character or char type. Dart doesn’t have that. Take a look at this example:

const letter = 'a';

So here, even though letter is only one character long, it’s still of type String.

But strings are a collection of characters, right? What if you want to know the underlying number value of the character? No problem. Keep reading.

Dart strings are a collection of UTF-16 code units. UTF-16 is a way to encode Unicode values by using 16-bit numbers. If you want to find out what those UTF-16 codes are, you can do it like so:

var salutation = 'Hello!';
print(salutation.codeUnits);

This will print the following list of numbers in decimal notation:

[72, 101, 108, 108, 111, 33]

H is 72, e is 101, and so on.

These UTF-16 code units have the same value as Unicode code points for most of the characters you see on a day-to-day basis. However, 16 bits only give you 65,536 combinations, and believe it or not, there are more than 65,536 characters in the world! Remember the large numbers that the emojis had in the last section? You’ll need more than 16 bits to represent those values.

UTF-16 has a special way of encoding code points higher than 65,535: It uses two code units, which together are called a surrogate pair.

const dart = '🎯';
print(dart.codeUnits);
// [55356, 57263] 

The code point for 🎯 is 127919, but the surrogate pair for that in UTF-16 is 55356 and 57263. No one wants to mess with surrogate pairs. It would be much nicer to just get Unicode code points directly. And you can! Dart calls them runes.

const dart = '🎯';
print(dart.runes);
// (127919) 
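If you’re curious where 55356 and 57263 come from, the sketch below reproduces the standard UTF-16 arithmetic: subtract hexadecimal 10000 from the code point, then split the remaining 20 bits into two 10-bit halves. You’d rarely need to do this by hand, and toSurrogatePair is a hypothetical helper written only for this illustration. It assumes the code point is above hexadecimal FFFF.

```dart
// Hypothetical helper: splits a code point above 0xFFFF into its
// UTF-16 surrogate pair.
List<int> toSurrogatePair(int codePoint) {
  final offset = codePoint - 0x10000;
  final high = 0xD800 + (offset >> 10); // top 10 bits
  final low = 0xDC00 + (offset & 0x3FF); // bottom 10 bits
  return [high, low];
}

void main() {
  print(toSurrogatePair(127919)); // [55356, 57263]
  print('\u{1F3AF}'.codeUnits); // [55356, 57263]
}
```

Both lines print the same pair, which is what codeUnits reported earlier.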

Problem solved, right? If only it were.

Unicode Grapheme Clusters

Unfortunately, language is messy and so is Unicode. Have a look at this example:

const flag = '🇲🇳';
print(flag.runes);
// (127474, 127475)  

Why are there two Unicode code points!? Well, it’s because Unicode doesn’t create a new code point every time there is a new country flag. It uses a pair of code points called regional indicator symbols to represent a flag. That’s what you see in the example for the Mongolian flag above. 127474 is the code for the regional indicator symbol letter M, and 127475 is the code for the regional indicator symbol letter N. MN represents Mongolia.
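Since the regional indicator symbols run in alphabetical order, starting with the letter A at code point 127462, you can turn a flag back into its two-letter code with a little arithmetic. Here’s a rough sketch. The names regionalIndicatorA and regionCode are invented for this example, and the function assumes that every code point in the string is a regional indicator symbol.

```dart
// Code point of REGIONAL INDICATOR SYMBOL LETTER A (U+1F1E6).
const regionalIndicatorA = 127462;

// Hypothetical helper: converts a flag emoji into its two-letter code.
// Assumes every code point in [flag] is a regional indicator symbol.
String regionCode(String flag) {
  final letters = flag.runes.map(
    (codePoint) => 'A'.codeUnitAt(0) + codePoint - regionalIndicatorA,
  );
  return String.fromCharCodes(letters);
}

void main() {
  const flag = '\u{1F1F2}\u{1F1F3}'; // the Mongolian flag
  print(flag.runes); // (127474, 127475)
  print(regionCode(flag)); // MN
}
```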

Note

Windows computers may not display the flag emojis. See the forums for more information.

If you thought that was complicated, look at this one:

const family = '👨‍👩‍👧‍👦';
print(family.runes);
// (128104, 8205, 128105, 8205, 128103, 8205, 128102)         

That list of Unicode code points is a man, a woman, a girl and a boy all glued together with a character called a Zero Width Joiner or ZWJ.

Now imagine trying to find the length of that string:

const family = '👨‍👩‍👧‍👦';

family.length;           // 11
family.codeUnits.length; // 11
family.runes.length;     // 7         

Getting the length of the string with family.length is equivalent to finding the number of UTF-16 code units: There are surrogate pairs for each of the four people plus the three ZWJ characters for a total of 11. Finding the runes gives you the seven Unicode code points that make up the emoji: man + ZWJ + woman + ZWJ + girl + ZWJ + boy. However, neither 11 nor 7 is what you’d expect. The family emoji looks like it’s just one character. You’d think the length should be one!

When a string with multiple code points looks like a single character, this is known as a user-perceived character. In technical terms, it’s called a Unicode extended grapheme cluster, or more commonly, just grapheme cluster.

Although the creators of Dart didn’t support grapheme clusters in the language itself, they did make an add-on package that handles them.

Adding the Characters Package

This is a good opportunity to try out your first Pub package. In the root folder of your project, open pubspec.yaml.

Note

If you don’t see pubspec.yaml, go back to Chapter 1 to see how to create a new project. Alternatively, open the starter project that comes with the supplemental materials for this chapter.

Add the following two lines at the bottom of the file:

dependencies:
  characters: ^1.2.1

Here are some things to note:

  • dependencies is the section name. Large Dart projects will typically include multiple dependencies. If the project you’re working on already has a dependencies section, then you only need to add the characters line.
  • You may also notice another section called dev_dependencies. This is the list of dependencies you’ll use during development but won’t need in a published app. For example, the lints package helps you find problems in your code as you write it.
  • Indentation is important in .yaml files, so make sure to indent the package name with two spaces. dependencies should have no spaces in front of it.
  • The ^ caret character means that any version higher than or equal to 1.2.1 but lower than 2.0.0 is OK to use in your project. This range follows the rules of semantic versioning.
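To make the caret shorthand concrete, here’s the same constraint spelled out as an explicit range. This is only an illustration of what ^1.2.1 stands for; you don’t need to write it this way in your own pubspec.yaml.

```yaml
dependencies:
  # Equivalent to ^1.2.1: at least 1.2.1, but below 2.0.0.
  characters: '>=1.2.1 <2.0.0'
```

The caret form is shorter and is what you’ll usually see in practice.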

Now press Command+S on a Mac or Control+S on a PC to save the changes to pubspec.yaml. VS Code will automatically fetch the package from Pub. Alternatively, you can press the Get Packages button in the top right. It looks like a down arrow:

[Image: the Get Packages button in VS Code]

Both of these methods are equivalent to running the following command in the root folder of your project using the terminal:

dart pub get

Note

Whenever you download and open a new Dart project that contains Pub packages, you’ll need to run dart pub get first. This includes the final and challenge projects included in the supplemental materials for this chapter.

Now that you’ve added the characters package to your project, go back to your Dart code file and add the following import to the top of the file:

import 'package:characters/characters.dart';

Now you can use the code in the characters package to handle grapheme clusters. This package adds extra functionality to the String type.

const family = '👨‍👩‍👧‍👦';
family.characters.length; // 1         

Aha! Now that’s what you’d hope to see: just one character for the family emoji. The characters package extended String to include a collection of grapheme clusters called characters.

In your own projects, you can decide whether you want to work with UTF-16 code units, Unicode code points or grapheme clusters. However, as a general rule, you should strongly consider using grapheme clusters any time you’re receiving text input from the outside world. That includes fetching data over the network or users typing things into your app.

Single Quotes vs. Double Quotes

Dart allows you to use either single quotes or double quotes for string literals. Both of these are fine:

'I like cats'
"I like cats"

Although Dart doesn’t have a recommended practice, the Flutter style guide does recommend using single quotes, so this book will also follow that practice.

You might want to use double quotes, though, if your string includes any apostrophes.

"my cat's food"

Otherwise, you would need to use the backslash \ as an escape character so that Dart knows that the string isn’t ending early:

'my cat\'s food'
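Whichever style you pick, the resulting string is the same. The quick check below is just a sketch, with variable names chosen for the example:

```dart
void main() {
  const doubleQuoted = "my cat's food";
  const escaped = 'my cat\'s food';

  // The backslash isn't part of the string; it only escapes the quote.
  print(doubleQuoted == escaped); // true
  print(escaped.length); // 13
}
```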

Concatenation

You can do much more than create simple strings. Sometimes you need to manipulate a string, and one common way to do so is to combine it with another string. This is called concatenation…with no relation to the aforementioned felines.

In Dart, you can concatenate strings simply by using the addition operator. Just as you can add numbers, you can add strings:

var message = 'Hello' + ' my name is ';
const name = 'Ray';
message += name;
// 'Hello my name is Ray'

You need to declare message as a variable, rather than a constant, because you want to modify it. You can add string literals together, as in the first line, and you can add string variables or constants together, as in the third line.

If you find yourself doing a lot of concatenation, you should use a string buffer, which is more efficient. You’ll learn how to do this in Chapter 1, “String Manipulation”, in Dart Apprentice: Beyond the Basics.
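As a small taste of what that looks like, here’s a minimal sketch that uses the StringBuffer class from Dart’s core library. The joinWords function is a made-up helper for this illustration, not something from the book’s sample code.

```dart
// Hypothetical helper: joins [words] with spaces using a StringBuffer.
String joinWords(List<String> words) {
  final buffer = StringBuffer();
  for (var i = 0; i < words.length; i++) {
    if (i > 0) buffer.write(' '); // separator between words
    buffer.write(words[i]);
  }
  return buffer.toString(); // builds the final string once
}

void main() {
  print(joinWords(['Hello', 'my', 'name', 'is', 'Ray']));
  // Hello my name is Ray
}
```

The buffer collects the pieces and only creates the finished string when you call toString, rather than creating a new intermediate string on every +=.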

Interpolation

You can also build up a string by using interpolation, which is a special Dart syntax that lets you build a string in a manner that’s easy for other people reading your code to understand:

const name = 'Ray';
const introduction = 'Hello my name is $name';
// 'Hello my name is Ray'

This is much more readable than the example in the previous section. It’s an extension of the string literal syntax, in which you replace certain parts of the string with other values. You add a dollar sign ($) in front of the value that you want to insert.

The syntax works in the same way to build a string from other data types such as numbers:

const oneThird = 1 / 3;
const sentence = 'One third is $oneThird.';

Here, you use a double for the interpolation. Your sentence constant will contain the following value:

One third is 0.3333333333333333.

Of course, it would actually take an infinite number of characters to represent one-third as a decimal because it’s a repeating decimal. You can control the number of decimal places shown on a double by using toStringAsFixed along with the number of decimal places to show:

final sentence = 'One third is ${oneThird.toStringAsFixed(3)}.';

There are a few items of interest here:

  • You’re requesting the string to show only three decimal places.
  • Since you’re performing an operation on oneThird, you need to surround the expression with curly braces after the dollar sign. This lets Dart know that the dot (.) after oneThird isn’t just a regular period.
  • The sentence variable needs to be final now instead of const because toStringAsFixed(3) is calculated at runtime.

Here’s the result:

One third is 0.333.
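To see why the curly braces matter, compare what happens with and without them. This is only a sketch, and the variable names withoutBraces and withBraces are invented for the comparison:

```dart
void main() {
  const oneThird = 1 / 3;

  // Without braces, only the identifier oneThird is interpolated.
  // Everything after it is treated as plain text.
  final withoutBraces = 'One third is $oneThird.toStringAsFixed(3)';

  // With braces, the whole expression is evaluated first.
  final withBraces = 'One third is ${oneThird.toStringAsFixed(3)}';

  print(withoutBraces);
  // One third is 0.3333333333333333.toStringAsFixed(3)
  print(withBraces);
  // One third is 0.333
}
```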

Exercises

  1. Create a string constant called firstName and initialize it to your first name. Also create a string constant called lastName and initialize it to your last name.
  2. Create a string constant called fullName by adding the firstName and lastName constants together, separated by a space.
  3. Using interpolation, create a string constant called myDetails that uses the fullName constant to create a string introducing yourself. For example, Ray Wenderlich’s string would read: Hello, my name is Ray Wenderlich.

Multi-Line Strings

Dart has a neat way to express strings that contain multiple lines, which can be rather useful when you need to use very long strings in your code.

You can support multi-line text like so:

const bigString = '''
You can have a string
that contains multiple
lines
by
doing this.''';
print(bigString);

The three single quotes (''') signify that this is a multi-line string. Three double quotes (""") would have done the same thing.

The example above will print the following:

You can have a string
that contains multiple
lines
by
doing this.
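You may have noticed that the output doesn’t start with a blank line, even though the text begins on the line after the opening quotes. Dart skips a line break that immediately follows the opening triple quotes. Here’s a small sketch that checks this by splitting the string into lines:

```dart
const bigString = '''
You can have a string
that contains multiple
lines
by
doing this.''';

void main() {
  final lines = bigString.split('\n');
  print(lines.length); // 5
  print(lines.first); // You can have a string
}
```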

Notice that all of the newline locations are preserved. If you just want to use multiple lines in code but don’t want the output string to contain newline characters, then you can surround each line with single quotes:

const oneLine = 'This is only '
    'a single '
    'line '
    'at runtime.';

That’s because Dart automatically joins adjacent string literals and ignores the whitespace between them. This does the same thing as if you concatenated each of those lines with the + operator:

const oneLine = 'This is only ' +
    'a single ' +
    'line ' +
    'at runtime.';

Either way, this is what you get:

This is only a single line at runtime.

Like many languages, if you want to insert a newline character, you can use \n.

const twoLines = 'This is\ntwo lines.';

Printing this gives:

This is
two lines.

But sometimes you want to ignore any special characters that a string might contain. To do that, you can create a raw string by putting r in front of the string literal.

const rawString = r'My name \n is $name.';

And that’s exactly what you get:

My name \n is $name.
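Another way to see the difference is to compare lengths. In the sketch below, the names normal and raw are just for illustration:

```dart
void main() {
  const normal = 'a\nb'; // a, newline, b
  const raw = r'a\nb'; // a, backslash, n, b

  print(normal.length); // 3
  print(raw.length); // 4

  // The raw string equals a normal string with an escaped backslash.
  print(raw == 'a\\nb'); // true
}
```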

Inserting Characters From Their Codes

Similar to the way you can insert a newline character into a string using the \n escape sequence, you can also add Unicode characters if you know their codes. Take the following example:

print('I \u2764 Dart\u0021');

Here, you’ve used \u, followed by a four-digit hexadecimal code unit value. 2764 is the hex value for the heart emoji, and 21 is the hex value for an exclamation mark. Since 21 is only two digits, you pad it with extra zeros as 0021.

This prints:

I ❤ Dart!

For code points with values higher than hexadecimal FFFF, you need to surround the code with curly braces:

print('I love \u{1F3AF}');

This prints:

I love 🎯

In this way, you can form any Unicode string from its codes.
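Escape sequences only work inside string literals. If the code is a number that you only know at runtime, one option is the String.fromCharCode constructor from Dart’s core library, sketched here with the same two codes:

```dart
void main() {
  final heart = String.fromCharCode(0x2764);
  final target = String.fromCharCode(0x1F3AF);

  print(heart == '\u2764'); // true
  print(target == '\u{1F3AF}'); // true

  // A code point above FFFF still becomes a surrogate pair.
  print(target.codeUnits.length); // 2

  // String.fromCharCodes builds a string from several codes at once.
  print(String.fromCharCodes([0x44, 0x61, 0x72, 0x74])); // Dart
}
```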

You’ve come to the end of this chapter, but you can look forward to Chapter 1, “String Manipulation”, in Dart Apprentice: Beyond the Basics to take you to the next level of working with text in Dart.

Challenges

Before moving on, here are some challenges to test your knowledge of strings. It’s best if you try to solve them yourself, but solutions are available with the supplementary materials for this book if you get stuck.

As described in the note just before the Getting Characters section above, you can find the required emoji characters in the starter project or from emojipedia.org where you can use the search terms “Chad flag”, “Romania flag” and “thumbs up dark skin tone”.

Challenge 1: Same Same, but Different

This string has two flags that look the same. But they aren’t! One of them is the flag of Chad and the other is the flag of Romania.

const twoCountries = '🇹🇩🇷🇴';      

Which is which?

Hint: Romania’s regional indicator sequence is RO, and R is 127479 in decimal. Chad, which is Tishād in Arabic and Tchad in French, has a regional indicator sequence of TD. Sequence letter T is 127481 in decimal.

Challenge 2: How Many?

Given the following string:

const vote = 'Thumbs up! 👍🏿';

  • How many UTF-16 code units are there?
  • How many Unicode code points are there?
  • How many Unicode grapheme clusters are there?

Challenge 3: Find the Error

What is wrong with the following code?

const name = 'Ray';
name += ' Wenderlich';

Challenge 4: In Summary

What is the value of the constant named summary?

const number = 10;
const multiplier = 5;
final summary = '$number \u00D7 $multiplier = ${number * multiplier}';

Key Points

  • Unicode is the standard representation for mapping characters to numbers.
  • Dart uses UTF-16 values known as code units to encode Unicode strings.
  • A single mapping in Unicode is called a code point, which is known as a rune in Dart.
  • User-perceived characters may be composed of one or more code points and are called grapheme clusters.
  • You can combine strings by using the addition operator.
  • You can make multi-line strings using three single quotes or three double quotes.
  • You can use string interpolation to build a string in place.