TCLWISE
An introduction to the Tcl programming language
6. LISTS AND STRINGS
Because the list is such a central data structure in Tcl, it's often
of interest to write programs in terms of list manipulation even
if the goal is to work with a different kind of data. This is especially
true when performing complex operations on strings: instead of working
directly on the string, it's possible to convert the string into a list,
perform operations on the resulting list, and then convert it back
to a string. The natural way to convert a string into a list is to
build a list where every character of the original string
is an element. The same concept is often used in Lisp dialects,
like the Scheme programming language. Before we continue to explore
this powerful programming concept, you need to learn two new Tcl commands
that convert a string into a list, and vice versa.
These two commands are split and join.
6.1 Converting strings to lists

The split command converts a string into a list, where every element
of the list is obtained by splitting the string into parts. The positions
where the string is split are specified as a set of characters:
the string is split at every position where one of the specified characters
appears. The structure of the split command is:

split string ?splitChars?

If splitChars is omitted, it defaults to whitespace characters,
so the string will be converted into a list where every element was
separated by whitespace in the original string.
For example, the string "abracadabra" can be split into a six-element
list by using "a" as the splitChars argument. The elements are
"", "br", "c", "d", "br", and "".
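A minimal sketch of this call, written as a small script (the variable name parts is only illustrative):

```tcl
# Split "abracadabra" at every "a".
set parts [split "abracadabra" "a"]
puts $parts            ;# → {} br c d br {}
puts [llength $parts]  ;# → 6
```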
Note how the first and last elements of the list are empty strings, because
the string we split started and ended with a.
Splitting on b instead yields a three-element list: "a", "racada",
and "ra". What if we specify both b and c as split chars?
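Both cases, splitting on b alone and on the set "bc", can be sketched as follows (the variable names are illustrative):

```tcl
# One split character: the string is cut at each "b".
set onB [split "abracadabra" "b"]
puts $onB    ;# → a racada ra

# Two split characters: the string is cut at each "b" and at each "c".
set onBC [split "abracadabra" "bc"]
puts $onBC   ;# → a ra ada ra
```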
The string is split wherever there is a b or a c character,
and not where the string "bc" appears in the string. In the
previous example the split points are the b in second position,
the c in fifth position, and the b in ninth position.
splitChars is "bc" in the example, so the string is split where
there is a b or a c character. Because there are 3 characters in the
string that are either b or c, there are 3 split points,
so the resulting list has four elements: "a", "ra", "ada", "ra".
As an example of using split, we can try to parse a line of the
/etc/passwd file present on Unix systems. Each line of this file,
including the first line on my system, is a sequence of fields
separated by ":" characters.
Assuming that we need to get the fifth field, how do we do it?
One of the simplest solutions is to convert the string into a list
of elements, using ":" as the element separator, and then to use lindex to
get the fifth element.
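A minimal sketch of this idiom follows. The passwd line used here is a hypothetical one made up for illustration, not the line from the author's system:

```tcl
# Hypothetical /etc/passwd line, for illustration only.
set line "root:x:0:0:root:/root:/bin/bash"

# lindex counts from 0, so the fifth field is at index 4.
set fifth [lindex [split $line ":"] 4]
puts $fifth   ;# → root
```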
The output of split is a list that is used as input for lindex.
This code is idiomatic in Tcl, being the simplest way to parse
many strings containing fields separated by single characters.
Note that there is a class of strings where the separator between
elements is just a space, like "this is a string".
You may wonder why it's useful to use split in such a case, because
this string is actually a valid Tcl list, so it is possible
to use lindex or other list manipulation commands on it directly.
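For instance, a short sketch of treating such a string directly as a list (the variable name s is illustrative):

```tcl
set s "this is a string"
puts [lindex $s 2]   ;# → a
puts [llength $s]    ;# → 4
```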
But that's a simple case: there are strings containing space-separated
fields that are not valid Tcl lists, such as a string with an
unbalanced open brace.
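A sketch of such a failure, assuming a string with a single unbalanced open brace (the exact error text depends on the Tcl version, so it is only printed, not shown here):

```tcl
# \{ inside double quotes is just a literal open brace.
set mystring "this is \{ a string"

# lindex must parse the string as a list, which fails here.
if {[catch {lindex $mystring 1} msg]} {
    puts "lindex failed: $msg"
}
```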
Because $mystring didn't contain a valid Tcl list (there is
an open brace but not the corresponding close brace), the
result of calling lindex on it is an error.
Also note that even with the closing brace the meaning
can still be different: if the string is a space-separated list of
fields, the { character should be just an element like any other.
To fix all these problems, just call split on space-separated
strings exactly as we did for the /etc/passwd line.
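A sketch of the fix, reusing the same kind of string with an unbalanced brace (the variable names are illustrative):

```tcl
# \{ inside double quotes is just a literal open brace.
set mystring "this is \{ a string"

# split never fails: every whitespace-separated field becomes an element.
set fields [split $mystring]
puts [llength $fields]    ;# → 5
puts [lindex $fields 2]   ;# → {
```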
Note that in the above example the string with the separator
characters was omitted, because it defaults to whitespace.
6.2 From strings to list of chars

When the split command is called with an empty string
as the splitChars argument, it converts the string into
a list where every character of the original string
is an element of the list.
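A minimal sketch of this conversion (the variable name chars is illustrative):

```tcl
# An empty splitChars argument yields one element per character.
set chars [split "hello" ""]
puts $chars             ;# → h e l l o
puts [llength $chars]   ;# → 5
```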
This allows the programmer to manipulate strings as lists,
as we will see later in this chapter. But before
covering this important topic, it's better to understand how
the reverse conversion is done: how to turn lists into strings.
6.3 Converting lists to strings

The join command is the complement of split.
It joins the elements of a list into a string, using
another string as the separator between elements.
The command structure is:

join list ?joinString?

The joinString argument can be omitted and defaults to a single space.
The command usage is very simple.
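A few illustrative calls, shown as a sketch (the lists and variable names are made up for the example):

```tcl
set withComma [join {a b c} ", "]
puts $withComma   ;# → a, b, c

# joinString omitted: elements are joined with a single space.
set withSpace [join {a b c}]
puts $withSpace   ;# → a b c

# Empty joinString: the elements are glued together.
set glued [join {h e l l o} ""]
puts $glued       ;# → hello
```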
Joining with an empty string is interesting because it's the reverse of
splitting a string using an empty separator: if every element
of a list is a character, calling join with an empty string
as the joinString argument converts the list back to a string.
Another interesting use of join is related to expr.
Frequently you may have a Tcl list containing numbers,
and you may want to compute the sum of all the numbers
in the list. The first solution you may think of is a loop
that adds each element to an accumulator variable.
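Such a loop might look like the following sketch (the list contents and the variable names numbers and sum are illustrative):

```tcl
set numbers {1 2 3 4 5}
set sum 0
foreach n $numbers {
    # Add the current element to the running total.
    set sum [expr {$sum + $n}]
}
puts $sum   ;# → 15
```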
This is, more or less, how you would do it in C, but in Tcl there
are other, more comfortable solutions. One involves the use
of join, with the "+" string as the element separator.
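As a sketch, joining an illustrative list of numbers with "+" gives:

```tcl
set numbers {1 2 3 4 5}
set expression [join $numbers "+"]
puts $expression   ;# → 1+2+3+4+5
```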
The resulting string is a valid expr expression,
so we can rewrite the summing code in a more concise way
by passing the joined string to expr.
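A sketch of the concise form. It assumes the list is known to contain only numbers, because expr evaluates whatever string join produces:

```tcl
set numbers {1 2 3 4 5}

# join builds "1+2+3+4+5", which expr then evaluates.
set sum [expr [join $numbers "+"]]
puts $sum   ;# → 15
```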
6.4 Manipulating strings as lists

Now we have enough knowledge to turn a string into a list,
and then back into a string. This simple fact is very powerful
because Tcl's list related commands are in some ways much more
powerful than the string commands: for instance, the foreach and
lsort commands have no equivalent for strings.
Let's start with a real example: the goal is to write a Tcl
script that, given a string, returns a string composed of
all the characters that appear in the string, each repeated only
once: that is, the alphabet needed in order to write this
word. For example, for the string "apple" the alphabet is "aelp",
while for the string "Tcl is cool" the alphabet, ignoring spaces
and letter case, is "cilost".
The operation takes only a short script built from split, lsort, and join.
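A sketch of such a script, wrapped in a helper procedure. The name alphabet is hypothetical and chosen only for this example:

```tcl
# Return the distinct characters of str, sorted, as a single string.
proc alphabet {str} {
    join [lsort -unique [split $str ""]] ""
}

puts [alphabet "apple"]   ;# → aelp

set letters [alphabet "supercalifragilistichespiralitoso"]
puts $letters                   ;# → acefghiloprstu
puts [string length $letters]   ;# → 14
```

As written, the sketch keeps spaces and treats upper and lower case letters as different characters.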
Run on the word "supercalifragilistichespiralitoso" (the Italian version
of the word), the script shows that it is composed of only 14 different characters.
Now that the effects of the script are known, we'll try to figure
out how it works. The split command converts the string into a list
of characters, which is processed by lsort with the -unique option:
the effect of this command is to remove the duplicated elements
and to sort the list. This new list is then converted back to a string
using the join command.
We can use a similar technique to write a procedure that tests whether
two strings are anagrams of each other, by sorting the characters of
both words and comparing the results.
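A sketch of such a procedure, using the names isanagram, word1, word2, ordered1 and ordered2 that the text refers to:

```tcl
proc isanagram {word1 word2} {
    # Sort the characters of each word (no -unique: repeats matter).
    set ordered1 [join [lsort [split $word1 ""]] ""]
    set ordered2 [join [lsort [split $word2 ""]] ""]
    # The result of the last command is the return value of the procedure.
    string equal $ordered1 $ordered2
}

puts [isanagram "more" "rome"]   ;# → 1
puts [isanagram "more" "roma"]   ;# → 0
```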
This time we don't use the -unique option for lsort, so
ordered1 and ordered2 will just contain strings
where the characters of word1 and word2 are sorted: if
the two words are composed of exactly the same letters,
the ordered versions of the two strings will be the same.
For example, an anagram of "more" is "rome". We can check
that this is true by sorting both words directly in tclsh.
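The check can be sketched as a short script (the variable names are illustrative):

```tcl
set sorted1 [join [lsort [split "more" ""]] ""]
set sorted2 [join [lsort [split "rome" ""]] ""]
puts $sorted1   ;# → emor
puts $sorted2   ;# → emor
```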
The sorted version of the two words is the same. The procedure
isanagram returns 1 if word1 is an anagram of word2,
otherwise it returns 0, because its last line uses
string equal to compare the two sorted words (remember
that, by default, a Tcl procedure returns the return value of the last
command executed if there isn't a return command
in the execution path).
Like lsort, the foreach command can be used
on lists of characters. The trivial case is
iterating over every character of a string.
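A minimal sketch of this iteration (the string and the variable name ch are illustrative):

```tcl
# Print every character of the string on its own line.
foreach ch [split "Tcl" ""] {
    puts $ch
}
```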
But there is no reason not to exploit the full power of foreach
if needed. For instance, a program can swap the positions of every two
adjacent characters in a string by iterating with two loop variables at a time.
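A sketch of such a program, with illustrative names s and result:

```tcl
set s "abcdef"
set result ""

# Take the characters two at a time and append them in swapped order.
foreach {a b} [split $s ""] {
    append result $b $a
}
puts $result   ;# → badcfe
```

If the string has an odd length, b is empty on the last iteration and the final character is kept as it is.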
Tcl strings are binary safe, so for instance you can use this
technique to translate a file composed of 16-bit numbers stored in
little endian into 16-bit numbers stored in big endian.
Another example involves the lreverse command we
wrote a few sections ago (it just returns a version of the input
list with the order of elements reversed). Splitting a string,
reversing the list, and joining it back does the job.
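A sketch of the idea follows. Tcl 8.5 and later ship a built-in lreverse, so the fallback definition below is only an assumption about what a hand-written version could look like, and it is used only when the command is missing:

```tcl
# Define lreverse only if the interpreter does not already provide it.
if {[llength [info commands lreverse]] == 0} {
    proc lreverse {list} {
        set result {}
        foreach item $list {
            # Insert each element at the front, reversing the order.
            set result [linsert $result 0 $item]
        }
        return $result
    }
}

set reversed [join [lreverse [split "hello" ""]] ""]
puts $reversed   ;# → olleh
```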
This is a convenient way to reverse a string.
The last example is a bit more complex, but may also be more interesting.
Assume you have two strings, and want to know whether they have
at least one character in common. It's possible to write such code in
a compact form using Tcl's list commands and the ability to convert
strings to lists.
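A sketch of a commonChars procedure along these lines. One detail is an assumption of this sketch and not necessarily part of the original listing: each list is passed through lsort -unique first, so that a character repeated inside a single string is not mistaken for a common character:

```tcl
proc commonChars {a b} {
    # Convert each string to a list of its distinct characters.
    set a [lsort -unique [split $a ""]]
    set b [lsort -unique [split $b ""]]
    # Concatenate the two lists.
    set union [concat $a $b]
    # A shorter deduplicated list means some character was in both.
    expr {[llength $union] != [llength [lsort -unique $union]]}
}
```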
The concat command joins strings together using a space as separator:
if the strings are valid lists, the resulting string will be a valid
list, so concat can be used to create a list that is the concatenation
of several lists. Now that you know this new command, it should not be too
hard to understand how the commonChars command works:
The first two lines just convert 'a' and 'b' to lists of characters.
The third line puts the concatenation of 'a' and 'b'
in the variable 'union'. This is just a list of characters containing
both the characters of 'a' and those of 'b'. If 'a' and 'b' have characters
in common, the command [lsort -unique $union] will make the list shorter
(because there are duplicated elements), so the last line of the procedure
compares the length of the original list in 'union' with the length
of the list after the duplicated elements were removed. If the two
lengths are the same, the two strings don't have common characters,
otherwise they do. Note that this reasoning assumes that neither string
repeats a character on its own: a repeated character inside one string
also shortens the list, so removing the duplicates from each list first
makes the test reliable.
And that's the usage:
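A usage sketch with illustrative inputs. The procedure is repeated here so that the snippet runs on its own, and it assumes the variant that removes duplicates from each string before concatenating:

```tcl
proc commonChars {a b} {
    set a [lsort -unique [split $a ""]]
    set b [lsort -unique [split $b ""]]
    set union [concat $a $b]
    expr {[llength $union] != [llength [lsort -unique $union]]}
}

puts [commonChars "abc" "xyz"]     ;# → 0 (no character in common)
puts [commonChars "apple" "pie"]   ;# → 1 (p and e are in both)
```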
That's all. I hope this chapter shows that strings and lists are
closely related in Tcl, and that it can be a good idea to convert strings into
lists of characters to exploit Tcl's powerful list commands.
Copyright © 2004 Salvatore Sanfilippo. All rights reserved. This online book is for personal use only. It cannot be copied to other web sites or further distributed in any form.