TCLWISE
An introduction to the Tcl programming language



6. LISTS AND STRINGS


Because the list is such a central data structure in Tcl, it's often useful to write programs in terms of list manipulation even when the goal is to work with a different kind of data. This is especially true when performing complex operations on strings: instead of working directly on the string, it's possible to convert the string into a list, perform operations on the resulting list, and then convert it back to a string. The natural way to convert a string into a list is to make every character of the original string an element of the list. The same concept is often used in Lisp dialects, like the Scheme programming language. Before continuing to explore this powerful programming concept, you need to learn two new Tcl commands that convert a string into a list, and vice versa. These two commands are split and join.

6.1 Converting strings to lists


The split command converts a string into a list, where every element of the list is obtained by splitting the string into parts. The positions where the string is split are specified as a set of characters: the string is split at every position where one of the specified characters appears. The structure of the split command is:


split string ?splitChars?


If splitChars is omitted, it defaults to the standard white-space characters (such as space, tab and newline), so the string is converted into a list whose elements were separated by white space in the original string.

For example, the string "abracadabra" can be split into a six-element list using "a" as the splitChars argument:

% split abracadabra a
{} br c d br {}

Note how the first and last elements of the list are empty strings, because the string we split starts and ends with a. Splitting on b instead, the result is the following:

% split abracadabra b
a racada ra

We obtained a three-element list. What if we specify both b and c as split chars?

% split abracadabra bc
a ra ada ra

The string is split wherever a b or a c character appears, not where the string "bc" appears. The following diagram shows the split points in the previous example:

abracadabra
 |  |   |
 1  2   3

splitChars is "bc" in the example, so the string is split at every b or c character. Because three characters in the string are either b or c, there are three split points, as shown in the diagram, and the resulting list has four elements: "a", "ra", "ada", "ra".
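
Because split treats splitChars as a set of characters, splitting on a multi-character separator needs one extra step. The following is a minimal sketch, assuming that the character \x1f (the ASCII unit separator, chosen here arbitrarily) never appears in the data. splitOnString is a hypothetical helper name, not a built-in command:

```tcl
# Hypothetical helper: split str wherever the whole string sep appears.
# Assumes the placeholder character \x1f does not occur in str.
proc splitOnString {str sep} {
    # Replace every occurrence of sep with a single placeholder character,
    # then split on that one character.
    split [string map [list $sep \x1f] $str] \x1f
}

puts [splitOnString "a::b:c::d" "::"]   ;# prints: a b:c d
```

The single ":" inside "b:c" is left alone, because only the full "::" sequence is mapped to the placeholder.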

As an example of using split, we can try to parse a line of the /etc/passwd file found on Unix systems. The first line on my system looks like this:

root:x:0:0:root:/root:/bin/zsh

Suppose we need the home directory, which is the sixth field. How do we get it? One of the simplest solutions is to convert the string into a list, using ":" as the element separator, and then use lindex with index 5 (list indexes start at 0) to get the sixth element.

% set line root:x:0:0:root:/root:/bin/zsh
root:x:0:0:root:/root:/bin/zsh
% lindex [split $line :] 5
/root

The output of split is a list that is used as input for lindex. This code is idiomatic in Tcl, being the simplest way to parse many kinds of strings containing fields separated by single characters.
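
When several fields are needed at once, the list produced by split can be unpacked into variables in one step. This is a sketch that assumes Tcl 8.5 or later, where the lassign command is available. The variable names are only illustrative labels for the seven fields:

```tcl
set line "root:x:0:0:root:/root:/bin/zsh"

# Unpack the seven colon-separated fields into named variables.
lassign [split $line :] user password uid gid gecos home shell

puts "$user logs in with $shell and lives in $home"
# prints: root logs in with /bin/zsh and lives in /root
```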

Note that there is a class of strings where the separator between elements is just a space, like "this is a string". You may wonder why split is useful in such a case, because this string is already a valid Tcl list, so it is possible to use lindex or other list manipulation commands on it directly:

% lindex "this is a string" 1
is

But that is the simple case: there are strings containing space-separated fields that are not valid Tcl lists:

% set mystring "this is { a string"
this is { a string
% lindex $mystring 1
unmatched open brace in list
while evaluating {lindex $mystring 1}

Because $mystring doesn't contain a valid Tcl list (there is an open brace without the corresponding close brace), calling lindex on it results in an error. Also note that even with the closing brace the meaning could still be different from what we want: if the string is a space-separated list of fields, the { character should be just an element like any other.

To fix all these problems, just call split on space-separated strings exactly as we did for the /etc/passwd line:

% set mystring "this is { a string"
this is { a string
% lindex [split $mystring] 2
{

Note that in the above example the splitChars argument was omitted, because white space is the default separator.
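
One caveat: split produces an element for every separator it finds, so runs of adjacent spaces yield empty elements. The following short sketch shows this behaviour, together with a loop that drops the empty elements (the variable names are illustrative):

```tcl
# Two spaces after "this" and three after "is": five separators in total.
set fields [split "this  is   spaced"]
puts [llength $fields]   ;# prints: 6

# Keep only the non-empty elements.
set words {}
foreach f $fields {
    if {$f ne ""} {lappend words $f}
}
puts $words              ;# prints: this is spaced
```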

6.2 From strings to list of chars


When the split command is called with an empty string as the splitChars argument, it converts the string into a list where every character of the original string is an element of the list:

% split "foobar" {}
f o o b a r

This allows the programmer to manipulate strings as lists, as we will see later in this chapter. Before covering this important topic, it's better to understand how the reverse conversion is done: how to turn lists into strings.

6.3 Converting lists to strings


The join command is the complement of split. It joins the elements of a list into a string, using another string as the separator between elements. The command structure is:


join list ?joinString?


The joinString argument can be omitted and defaults to a single space character. The command usage is very simple:

% join [list 1 2 3] .
1.2.3
% join [list 1 2 3] "000"
100020003
% join [list 1 2 3] {}
123

The last example is interesting because it's the reverse of splitting a string using an empty separator. If every element of a list is a single character, using join with an empty string as the joinString argument converts the list back to a string.
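
The round trip can be checked directly. This is a small sketch (the variable names are illustrative); its last line also shows that when joinString is omitted the elements are separated by a single space:

```tcl
set word  "foobar"
set chars [split $word {}]        ;# the list: f o o b a r
set back  [join $chars {}]        ;# the string foobar again

puts [string equal $word $back]   ;# prints: 1
puts [join [list 1 2 3]]          ;# prints: 1 2 3
```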

Another interesting use of join is related to expr. Frequently you have a Tcl list containing numbers, and you want to compute the sum of all the numbers in the list. The first solution you may think of is something like this:

set list {1 2 3 4 5}
set sum 0
for {set i 0} {$i < [llength $list]} {incr i} {
    incr sum [lindex $list $i]
}

This is, more or less, how you would do it in C, but in Tcl there are other, more convenient solutions. One involves the use of join, with the "+" string as the element separator:

% join {1 2 3 4 5} +
1+2+3+4+5

As you can see, the resulting string is a valid expr expression, so we can rewrite the above code in this more concise way:

set list {1 2 3 4 5}
set sum [expr [join $list +]]
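
The join-based version builds a string and lets expr parse it, which is compact but relies on the list being non-empty and containing only numbers. For comparison, here is a sketch of a loop-based alternative that also handles the empty list. sumList is a hypothetical helper name, not a command from this book:

```tcl
# Hypothetical helper: sum the numbers in a list, returning 0 for an empty list.
proc sumList {numbers} {
    set sum 0
    foreach n $numbers {
        # Braced expression: $n is treated as a value, never re-parsed as code.
        set sum [expr {$sum + $n}]
    }
    return $sum
}

puts [sumList {1 2 3 4 5}]        ;# prints: 15
puts [expr [join {1 2 3 4 5} +]]  ;# prints: 15
puts [sumList {}]                 ;# prints: 0
```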

6.4 Manipulating strings as lists


Now we have enough knowledge to turn a string into a list, and then back into a string. This simple fact is very useful because Tcl's list commands are, in some ways, more powerful than the string commands: for instance, the foreach and lsort commands have no equivalent for strings.

Let's start with a real example: the goal is to write a Tcl script that, given a string, returns a string composed of all the characters that appear in it, each one only once and in sorted order. That is the alphabet needed to write the word. For example, for the string "apple" the alphabet is "aelp", while for the string "Tcl is cool" the alphabet, ignoring case and spaces, is "cilost". This is the simple code we need to perform the operation:

% join [lsort -unique [split "supercalifragilistichespiralitoso" {}]] {}
acefghiloprstu

So the word "supercalifragilistichespiralitoso" (the Italian version of the word) is composed of only 14 different characters. Now that the effect of the script is known, let's see how it works. The split command converts the string into a list of characters, which is processed by lsort with the -unique option: the effect of this command is to sort the list and remove the duplicated elements. This new list is then converted back to a string using the join command.
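
The same pipeline can be wrapped in a procedure. The sketch below uses two hypothetical names, alphabet and alphabetNoCase. The second one lowercases the string and removes spaces first, which is an assumption about what "alphabet" should mean for a phrase such as "Tcl is cool":

```tcl
# Hypothetical helper: the sorted set of characters used in str.
proc alphabet {str} {
    join [lsort -unique [split $str {}]] {}
}

# Variant that ignores case and spaces before building the alphabet.
proc alphabetNoCase {str} {
    alphabet [string map {" " ""} [string tolower $str]]
}

puts [alphabet apple]                ;# prints: aelp
puts [alphabetNoCase "Tcl is cool"]  ;# prints: cilost
```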

We can use a similar technique to write a procedure that tests whether two strings are anagrams of each other:

proc isanagram {word1 word2} {
    set sorted1 [join [lsort [split $word1 {}]] {}]
    set sorted2 [join [lsort [split $word2 {}]] {}]
    string equal $sorted1 $sorted2
}

This time we don't use the -unique option for lsort, so sorted1 and sorted2 will just contain the characters of word1 and word2 in sorted order: if the two strings are composed of exactly the same letters, the sorted versions of the two strings will be the same.

For example, an anagram of "more" is "rome". We can check that this is true using tclsh directly:

% join [lsort [split rome {}]] {}
emor
% join [lsort [split more {}]] {}
emor

The sorted version of the two words is the same. The procedure isanagram returns 1 if word1 is an anagram of word2, otherwise it returns 0. This happens because of its last line, where string equal compares the two sorted words (remember that, by default, a Tcl procedure returns the return value of the last command executed if there isn't a return command in the execution path).
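
A short usage sketch of the procedure follows. The definition is repeated so that the example can be run on its own, and the test words are arbitrary:

```tcl
proc isanagram {word1 word2} {
    set sorted1 [join [lsort [split $word1 {}]] {}]
    set sorted2 [join [lsort [split $word2 {}]] {}]
    string equal $sorted1 $sorted2
}

puts [isanagram rome more]    ;# prints: 1
puts [isanagram rome moore]   ;# prints: 0
puts [isanagram Rome more]    ;# prints: 0 (the comparison is case sensitive)
```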

Like lsort, the foreach command can be used on lists of characters. The trivial case iterates over every character of a string:

% foreach x [split "mystring" {}] {puts $x}
m
y
s
t
r
i
n
g
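
Iterating over characters is also enough to build simple statistics. The following sketch counts how many times each character occurs. charCount is a hypothetical helper name, and it uses a Tcl array as a counter table, which is an assumption about what the reader already knows:

```tcl
# Hypothetical helper: return a list of character/count pairs, sorted by character.
proc charCount {str} {
    array set count {}
    foreach ch [split $str {}] {
        if {![info exists count($ch)]} {
            set count($ch) 0
        }
        incr count($ch)
    }
    set result {}
    foreach ch [lsort [array names count]] {
        lappend result $ch $count($ch)
    }
    return $result
}

puts [charCount apple]   ;# prints: a 1 e 1 l 1 p 2
```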

There is no reason not to exploit the full power of foreach when needed. The following program swaps the characters of every pair of adjacent characters in a string:

% set var {}
% foreach {a b} [split "mystring" {}] {append var $b$a}
% set var
ymtsirgn

Tcl strings are binary safe, so you can, for instance, use the above code to convert a file of 16-bit numbers stored in little-endian order into 16-bit numbers stored in big-endian order.
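
As a sketch of that idea, the following code builds a small piece of binary data in memory instead of reading a file, and assumes the data contains an even number of bytes. The binary format and binary scan commands are used only to create the sample data and to check the result:

```tcl
# Two 16-bit integers (1 and 2) packed in little-endian order: 01 00 02 00
set little [binary format s2 {1 2}]

# Swap every pair of bytes, exactly as in the string example above.
set big {}
foreach {lo hi} [split $little {}] {
    append big $hi$lo
}

# Read the result back as big-endian 16-bit integers.
binary scan $big S2 values
puts $values   ;# prints: 1 2
```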

Another example involves the lreverse command we wrote some sections earlier (it just returns a copy of the input list with the order of the elements reversed):

% set string "hello world"
hello world
% join [lreverse [split $string {}]] {}
dlrow olleh

This is a convenient way to reverse a string.
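
The following sketch assumes Tcl 8.5 or later, where both lreverse and string reverse are built-in commands, and compares the list-based approach with the built-in one (the variable names are illustrative):

```tcl
set string "hello world"

# Reverse through a list of characters, as in the example above.
set viaList    [join [lreverse [split $string {}]] {}]
# Reverse with the built-in string subcommand.
set viaBuiltin [string reverse $string]

puts $viaList                              ;# prints: dlrow olleh
puts [string equal $viaList $viaBuiltin]   ;# prints: 1
```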

The last example is a bit more complex, but may also be more interesting. Assume you have two strings and want to know whether they have at least one character in common. It's possible to write such code in a compact form using Tcl's list commands and the ability to convert strings to lists:

proc commonChars {a b} {
    set a [split $a {}]
    set b [split $b {}]
    set union [concat [lsort -unique $a] [lsort -unique $b]]
    expr {[llength $union] != [llength [lsort -unique $union]]}
}

The concat command joins strings together using a space as separator: if the strings are valid lists, the resulting string is a valid list too, so concat can be used to create a list that is the concatenation of several lists. Now that you know this new command, it should not be too hard to understand how the commonChars procedure works. The first two lines just convert 'a' and 'b' to lists of characters. The third line removes the duplicates from each list with lsort -unique and puts the concatenation of the two results in the variable 'union'. This is just a list of characters containing the distinct characters of 'a' followed by the distinct characters of 'b'. If 'a' and 'b' have characters in common, the command [lsort -unique $union] makes the list shorter (because there are duplicated elements), so the last line of the procedure compares the length of the original list in 'union' with the length of the list after the duplicated elements are removed. If the two lengths are the same, the two strings don't have common characters, otherwise they do.

Here is how it is used:

% commonChars foo bar
0
% commonChars tcl char
1
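
A related sketch returns the shared characters themselves rather than just a boolean. commonCharList is a hypothetical name, and the approach (searching one list of unique characters for each element of the other) is only one of several possible ways to compute the intersection:

```tcl
# Hypothetical helper: return the sorted list of characters found in both strings.
proc commonCharList {a b} {
    set bChars [lsort -unique [split $b {}]]
    set result {}
    foreach ch [lsort -unique [split $a {}]] {
        # lsearch returns -1 when ch is not an element of bChars.
        if {[lsearch -exact $bChars $ch] >= 0} {
            lappend result $ch
        }
    }
    return $result
}

puts [commonCharList tcl char]           ;# prints: c
puts [llength [commonCharList foo bar]]  ;# prints: 0
```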

That's all. I hope this chapter shows that strings and lists are closely related in Tcl, and that it can be a good idea to convert strings into lists of characters in order to exploit Tcl's powerful list commands.


Copyright © 2004 Salvatore Sanfilippo. All rights reserved.
This online book is for personal use only.
It cannot be copied to other web sites or further distributed in any form.