TCLWISE
An introduction to the Tcl programming language



5. STRINGS


This chapter shows useful Tcl commands that operate on strings: basic string manipulation, string matching, regular expressions, and conversion of strings to lists and vice versa. As you can guess, the set of string-related commands in Tcl is large, because the string is particularly important for the semantics of the language itself and is not just one data type among others. Fortunately this is one of the better organized parts of the language, so most commands are not hard to remember.

5.1 The append command


The append command is very similar to lappend, but instead of appending elements to a list, it appends strings to a string. The command's structure is:


append varName ?value value ...?


Every argument following varName is appended to the current content of the varName variable, and the new content of the variable is returned. Example:

    % set s "foo"
    foo
    % append s bar
    foobar
    % append s x y z [string length $s]
    foobarxyz6

The append command is very efficient: it's faster to write "append a $b" than "set a $a$b", although both solutions work. It's a good habit to consider speed issues when programming with very high level programming languages such as Tcl, because they are not as fast as lower level languages like C.
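
As a small sketch of the typical use of append, the following loop builds a comma separated string one piece at a time. The variable names out and n are only illustrative.

```tcl
# Build the string "1,2,3,4,5" incrementally with append.
set out {}
foreach n {1 2 3 4 5} {
    # Add a separator before every element except the first.
    if {$out ne ""} {
        append out ,
    }
    append out $n
}
puts $out ;# prints 1,2,3,4,5
```

Each append call modifies the variable in place, which is why the variable name (out) is passed without the dollar sign.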

5.2 The string command


Instead of having different commands to perform different string operations, Tcl uses a single string manipulation command called string, which takes the operation to perform as its first argument. The meaning of the remaining arguments depends on the operation. In Tcl slang the different operations are called subcommands.

For instance, to get the length of a string, the first argument passed to the string command is length, which is the name of the operation to perform, or the subcommand if you prefer. The other argument is the string itself.

    % string length "Tcl is a string processor"
    25

The number 25 is of course the number of characters in the string "Tcl is a string processor". It's important to know that Tcl strings are binary safe, so any kind of character can appear inside a string, including the byte with value zero:

    % string length "ab\000xy"
    5

It's better to understand this concept now, because in Tcl programming you will not use strings only when you need to read a text file, but also for general programming where binary data is involved.

The string command has many other subcommands. This chapter shows a subset that includes the most interesting ones.

5.3 string range


The range subcommand is used to extract parts of a string. It works very much like the lrange command: indexes start from zero, both ends of the range are included, and an index can also be written in the form end or end-<index>. The formal command structure is:


string range string start-index end-index


Example:

    % string range "Dante Alighieri is a Tcl user" 6 end-11
    Alighieri is
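
A few more calls may help to fix the idea. The string used here is just an illustrative value; note that the character at the end index is part of the result.

```tcl
set s "Hello, Tcl world"

puts [string range $s 0 4]        ;# Hello
puts [string range $s 7 9]        ;# Tcl
puts [string range $s end-4 end]  ;# world
```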

5.4 string index


The index subcommand just extracts a single character from the whole string.


string index string index


Example:

    % string index "foobar" 3
    b
    % string index "foobar" end
    r

A more interesting real-world application of the string index command is the following procedure, which reverses the order of the characters in a string, transforming for example "Tcl" into "lcT". Because the resulting string is reversed, the procedure is called stringReverse.

    proc stringReverse s {
        set res {}
        for {set i 0} {$i < [string length $s]} {incr i} {
            append res [string index $s end-$i]
        }
        return $res
    }

If you typed the procedure into a file, for example stringReverse.tcl, you can still experiment with it interactively in tclsh by using the source command.

    % source stringReverse.tcl
    % stringReverse "string to reverse"
    esrever ot gnirts

The source command tells Tcl to execute the content of the specified file as if it was typed in place of the command. So after the "source stringReverse.tcl" call, the procedure stringReverse is defined and can be called.

5.5 string equal


An operation that occurs very frequently is the comparison of two strings. String equal does it by searching for an exact match, that is, the strings must match character by character to be considered the same by the command. The return value is 1 if the two strings passed as arguments are the same, otherwise 0 is returned:

    % string equal foo bar
    0
    % string equal tcl tcl
    1
    % string equal tcl TCL
    0

"tcl" and "TCL" are not the same for string equal. If you want to compare in a case insensitive way, there is a -nocase option that changes the behaviour and considers characters of different case the same:

    % string equal -nocase tcl TCL
    1

Another interesting option is -length num, which limits the comparison to the first num characters:

    % string equal Petroleum Peter
    0
    % string equal -length 3 Petroleum Peter
    1

The two options -nocase and -length can be combined.
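
As a sketch of the two options used together, the following procedure checks whether a string starts with a given prefix, ignoring case. The name hasPrefix is hypothetical: it is not a Tcl built-in, just a helper written for this example.

```tcl
# hasPrefix is a hypothetical helper, not a built-in command.
# It returns 1 if s starts with prefix (ignoring case), otherwise 0.
proc hasPrefix {prefix s} {
    # Compare only the first [string length $prefix] characters.
    string equal -nocase -length [string length $prefix] $prefix $s
}

puts [hasPrefix http "HTTP://example.org"] ;# 1
puts [hasPrefix ftp "HTTP://example.org"]  ;# 0
```

The procedure returns the result of its last command, so no explicit return is needed.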

5.6 string compare


This subcommand is very similar to equal, but instead of returning true or false depending on whether the strings are the same, the command returns:

    -1 if the first string is lexicographically less than the second
     0 if the first string is the same as the second
     1 if the first string is lexicographically greater than the second

This gives more information than string equal, which may be useful for sorting or other tasks.
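
A short sketch of both uses follows: the three possible return values, and a comparison procedure passed to lsort through its -command option in order to sort without regard to case. The name byName is hypothetical.

```tcl
puts [string compare apple banana]  ;# -1
puts [string compare banana apple]  ;# 1
puts [string compare tcl tcl]       ;# 0

# byName is a hypothetical comparison procedure for lsort -command:
# it must return a negative, zero, or positive integer.
proc byName {a b} {
    string compare -nocase $a $b
}

puts [lsort -command byName {pear Apple banana}] ;# Apple banana pear
```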

5.7 string match


When more powerful string matching capabilities are needed, string match can be used in place of string equal: instead of comparing two strings, the command compares a string against a pattern.

String match supports patterns composed of normal characters and the following special sequences:

    *         Matches any sequence of characters, including the empty string.
    ?         Matches any single character.
    [chars]   Matches any single character in the set chars. It's possible to
              specify a sequence in the x-y form, like [a-z], which will match
              every character from a to z.
    \x        Matches exactly x, without interpreting it in a special way.
              This is used in order to match *, ?, [, ], \ as single characters.

Here are some examples of patterns and of what they match, to make it simpler to understand how they work:

    *xyz*          can match xyz, fooxyz, fooxyzbar, and so on.
    x?z            can match xaz, xxz, x8z, but can't match xz.
    [ab]c          can match ac, bc.
    [a-z]*[0-9]    can match alf4, biz5, but can't match 123, 2foo1.

The command structure for string match is:


string match ?-nocase? pattern string


The return value is 1 if string matches pattern, otherwise 0. The -nocase option can be used to ignore case when matching. Example:

    % string match {[0-9]} 5
    1
    % string match foo* foo
    1
    % string match foo* foobar
    1
    % string match foo* barfoo
    0
    % string match ?*@*.* antirez@invece.org
    1
    % string match ?*@?*.?* antirez@invece.org
    1
    % string match ?*@?*.?* antirez@org
    0

Note that a pattern containing the [x-y] form must be grouped using braces, or quoted using \, to prevent Tcl from trying to substitute it as a command.

The last patterns in the example show how to match any string that is at least N characters long: use N question marks followed by an asterisk. "?*" matches at least one character, "???*" matches at least three, and so on. Tcl supports more advanced pattern matching using regular expressions, but string match is still very useful: in most cases it's enough to express a pattern in a simpler way, and it is generally faster than the regular expression commands.
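
As a small sketch of string match in a program, the following loop selects from a list the names that end in .c or .h. The list of file names is invented for the example; note the braces around the pattern, needed because it contains square brackets.

```tcl
set files {main.c util.h README notes.txt test1.tcl}
set result {}
foreach f $files {
    # The braces prevent Tcl from treating [ch] as a command substitution.
    if {[string match {*.[ch]} $f]} {
        lappend result $f
    }
}
puts $result ;# main.c util.h
```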

5.8 string map


String map is a powerful tool that substitutes occurrences of strings with other strings. The substitution is driven by a list of key-value pairs. For example the list {foo bar x {} y yy} will replace every occurrence of "foo" with "bar", will remove every occurrence of "x", and will duplicate every occurrence of "y". The command structure is the following:


string map ?-nocase? mapping string


Substitutions are done in an ordered way: starting from the first character of the original string, every key in the key-value pairs list is tried in order. If no key matches, the character is appended to the result that will be returned, and the process continues from the next character. If instead there is a match, the value associated with the matching key is appended to the result, and the process continues from the character just after the matching key.

The above description may appear pedantic and complex, but it's actually not hard at all to understand how string map works: it turns every occurrence of a key in the key-value pairs list into the corresponding value. Once you get comfortable with string map you will probably want to know the details of the substitution process, so the above text will be more useful later, when you are a more experienced Tcl programmer.

Examples:

    % string map {x {}} exchange
    echange
    % string map {1 Tcl 2 great} "1 is 2"
    Tcl is great

Note how string map iterates just one time over the original string, so a key can't match as an effect of an earlier substitution:

    % string map {{ } xx x yyy} "Hello World"
    HelloxxWorld
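
Because the keys are tried in list order at every position, the order of the pairs matters when one key is a prefix of another. The following sketch, with keys and values chosen only for illustration, shows the same input mapped with the two possible orderings.

```tcl
# "ab" is listed before "a": at index 0 "ab" does not match, so "a" is
# used; at index 1 "ab" matches and is replaced as a whole.
puts [string map {ab X a Y} aab] ;# YX

# "a" is listed before "ab": "a" always wins, so "ab" never matches.
puts [string map {a Y ab X} aab] ;# YYb
```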

When the key-value pairs list is not constant it's better to use the list command to create it:

    % set a foo
    foo
    % set b bar
    bar
    % string map [list $a $b $b $a] foobar
    barfoo

Like many other string subcommands, map can take a -nocase option that makes the matching process case insensitive.
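
A common use of string map is escaping special characters. The following is a minimal sketch, not a complete HTML escaper: the procedure name htmlEscape is hypothetical and only three characters are handled. Since the original string is scanned just once, the & characters introduced by the substitutions are not escaped again.

```tcl
# htmlEscape is a hypothetical helper that handles only &, < and >.
proc htmlEscape s {
    string map {& &amp; < &lt; > &gt;} $s
}

puts [htmlEscape "a < b && b > c"] ;# a &lt; b &amp;&amp; b &gt; c

# With -nocase the keys match regardless of case.
puts [string map -nocase {tcl Tcl} "TCL and tcl"] ;# Tcl and Tcl
```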

5.9 string is


String is tests if a string is a member of a given class, like integers, alphanumeric characters, spaces, and so on. The structure of the command is:


string is class ?-strict? ?-failindex varname? string


By default the command returns 1 for empty strings, so the -strict option is used to invert the behaviour and return 0 on empty strings (i.e. to not consider the empty string a member of the given class).

The class can be one of the following:

    alnum      an alphabet or digit character
    alpha      an alphabet character
    ascii      any character in the 7-bit ASCII range
    boolean    any form allowed for Tcl booleans (0, 1, yes, no, ...)
    control    a control character
    digit      a digit character
    double     a valid Tcl double precision number
    false      any form allowed for a Tcl boolean with false value
    graph      a printing character, except space
    integer    any valid form of a 32-bit integer
    lower      a lowercase alphabet character
    print      a printing character, including space
    punct      a punctuation character
    space      any space character
    true       any form allowed for a Tcl boolean with true value
    upper      an uppercase alphabet character
    wordchar   any word character: alphanumeric characters and connector
               punctuation such as the underscore
    xdigit     a hexadecimal digit

As you can see, some classes are oriented to a single character (like alnum), and some are useful for whole strings (like integer). If a string composed of more than a single character is tested against a class oriented to characters, every character of the string must belong to the class for the command to return 1. Some examples:

    % string is integer 33902123
    1
    % string is integer foobar
    0
    % string is upper K
    1
    % string is lower K
    0
    % string is upper "KKK"
    1
    % string is upper "KKz"
    0

If the -failindex option followed by the name of a variable is used, the command will store in that variable the index of the first character that failed the test.
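
The following sketch shows -strict and -failindex at work in a simple input check. The variable names input and pos are only illustrative.

```tcl
set input "12a4"

# The test fails at the letter "a", so its index is stored in pos.
if {![string is digit -strict -failindex pos $input]} {
    puts "invalid character at index $pos" ;# invalid character at index 2
}

# With -strict the empty string is not a member of the class.
puts [string is integer -strict ""] ;# 0
```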

5.10 More string subcommands


There is a large number of string subcommands that we don't cover. The reader may like to look at the string man page to check what's available: it's very important to know what can be done with the built-in Tcl functionality, to avoid reimplementing a feature that is already available.

5.11 Advanced string matching


Tcl string matching capabilities include two powerful commands, [regexp] and [regsub], that provide egrep-like regular expression facilities. These commands will be explored in chapter FIXME of this book.


Copyright © 2004 Salvatore Sanfilippo. All rights reserved.
This online book is for personal use only.
It cannot be copied to other web sites or further distributed in any form.