TCLWISE
An introduction to the Tcl programming language
5. STRINGS
This chapter shows interesting Tcl commands that operate on strings:
basic string manipulation, string matching, regular expressions, and
conversion of strings to lists and vice versa. The set of string-related
commands in Tcl is large, as you can guess, because the string is particularly
important for the semantics of the language itself, and not just
one data type among others. Fortunately this is one of the
better organized parts of the language, so many commands are
not hard to remember.
5.1 The append command
The append command is very similar to lappend, but instead of
appending elements to a list, it appends strings to a string.
The command's structure is:
    append varName ?value value ...?
Every argument following varName is appended to the current content
of the varName variable, and the new content of the variable is
returned. Example:
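
A minimal sketch of append at work, next to the equivalent set form. The variable names a and b and the strings are arbitrary, chosen only for illustration:

```tcl
set a "Hello"
# Append two values at once; append returns the new content of the variable.
puts [append a " " "World"]   ;# prints: Hello World

# The same result obtained with set and interpolation.
set b "Hello"
set b "$b World"
puts $b                       ;# prints: Hello World
```
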
The append command is very efficient: it's faster to write "append a $b"
than "set a $a$b", but both solutions work. Still, it's a good habit to
consider speed issues when programming with very high level programming
languages such as Tcl, because they are not as fast as lower level languages
like C.
5.2 The string command
Instead of having different commands to perform different string operations,
Tcl uses a single string manipulation command called string,
that takes the operation to perform as its first argument. The meaning of
the remaining arguments depends on the operation.
In Tcl slang the different operations are called subcommands.
For instance, to get the length of a string, the first argument
passed to the string command is length, which is the name of
the operation to perform, or the subcommand if you prefer. The other
argument is the string itself.
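
As a short sketch, the call described above could be written like this:

```tcl
# "length" is the subcommand, the quoted text is the string to measure.
puts [string length "Tcl is a string processor"]   ;# prints: 25
```
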
The number 25 is of course the number of characters that are inside
the string "Tcl is a string processor". It's important to know that
Tcl strings are binary safe, so every kind of character can be
inside a string, including the byte with value zero:
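
A small sketch of a string holding a zero byte. The variable name data is only illustrative, and \000 is the octal escape for the byte with value zero:

```tcl
# The zero byte sits in the middle of the string and is counted like
# any other character.
set data "ab\000cd"
puts [string length $data]   ;# prints: 5
```
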
It's better to understand this concept now, because in Tcl programming
you will use strings not only when you need to read a text file, but
also for general programming where binary data is involved.
The string command has many other subcommands; this chapter shows a subset
that includes the most interesting ones.
5.3 string range
The range subcommand is used to extract parts of a string. The way
it works is very similar to the lrange command. Indexes can also be
in the form end-<index>. The formal command structure is:
    string range string start-index end-index
Example:
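
A possible illustration, using an arbitrary sample string (indexes start at 0, and end refers to the last character):

```tcl
set s "Hello World"
puts [string range $s 0 4]         ;# prints: Hello
puts [string range $s 6 end]       ;# prints: World
puts [string range $s end-4 end]   ;# prints: World
```
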
5.4 string index
The index subcommand just extracts a single character from the
whole string.
    string index string index
Example:
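
For instance, with a sample string picked only for illustration:

```tcl
puts [string index "Tcl" 0]       ;# prints: T
puts [string index "Tcl" end]     ;# prints: l
puts [string index "Tcl" end-1]   ;# prints: c
```
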
A more interesting real-world application of the string index
command is the following procedure, which reverses the
order of the characters in a string, transforming for example
"Tcl" into "lcT". Because the final string is reversed, the procedure
is called stringReverse.
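
One way such a procedure might be written is sketched below. This is an assumption about its shape, not necessarily the author's listing: it walks the string from the last index to the first and appends each character to a result variable.

```tcl
proc stringReverse {s} {
    set result {}
    # Start from the last character and move towards index 0.
    for {set i [expr {[string length $s] - 1}]} {$i >= 0} {incr i -1} {
        append result [string index $s $i]
    }
    return $result
}

puts [stringReverse "Tcl"]   ;# prints: lcT
```

Tcl 8.5 and later also offer a built-in string reverse subcommand, so a hand-written version like this one is mainly useful as an exercise.
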
Even if you typed the procedure into a file, for example stringReverse.tcl,
you can still test it with tclsh for some interactive experimenting
using the source command.
The source command tells Tcl to execute the content of the
specified file as if it was typed in place of the command. So after
the "source stringReverse.tcl" call, the procedure stringReverse
is defined and can be called.
5.5 string equal
An operation that occurs very frequently is comparing two strings.
String equal does this by looking for an exact match, that is, the
strings must match character by character to be considered the same
by the command. The return value is 1 if the two strings passed
as arguments are the same, otherwise 0 is returned:
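
A minimal sketch with two illustrative strings:

```tcl
puts [string equal "tcl" "tcl"]   ;# prints: 1
puts [string equal "tcl" "TCL"]   ;# prints: 0
```
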
"tcl" and "TCL" are not the same for string equal. If you want to
compare in a case insensitive way, there is a -nocase option
to change the behaviour and consider characters of different case
the same:
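
For example, reusing the same two illustrative strings:

```tcl
# With -nocase the difference in case is ignored.
puts [string equal -nocase "tcl" "TCL"]   ;# prints: 1
```
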
Another interesting option is -length num, that limits the comparison
to the first num characters:
The two options -nocase and -length can be combined.
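
Both the -length option and its combination with -nocase can be sketched as follows (the sample strings are arbitrary):

```tcl
# Only the first 3 characters ("Tcl" and "Tcl") are compared.
puts [string equal -length 3 "Tclwise" "Tcl/Tk"]           ;# prints: 1
# With 4 characters, "Tclw" and "Tcl/" differ.
puts [string equal -length 4 "Tclwise" "Tcl/Tk"]           ;# prints: 0
# Case is ignored and only the first 3 characters are compared.
puts [string equal -nocase -length 3 "TCLWISE" "tcl/tk"]   ;# prints: 1
```
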
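5.6 string compare
This subcommand is very similar to equal, but instead of returning
true or false depending on whether the strings are the same, the command
returns:
    -1  if the first string is lexicographically less than the second
     0  if the two strings are equal
     1  if the first string is lexicographically greater than the second
This gives more information than string equal, which may be useful
for sorting or other tasks.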
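
A short sketch of the three possible results, followed by one way string compare might be used for sorting through the -command option of lsort. The sample words are arbitrary:

```tcl
puts [string compare "apple" "pear"]    ;# prints: -1
puts [string compare "tcl" "tcl"]       ;# prints: 0
puts [string compare "pear" "apple"]    ;# prints: 1

# lsort calls "string compare a b" for each pair of elements it compares.
puts [lsort -command {string compare} {pear apple fig}]   ;# prints: apple fig pear
```
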
5.7 string match
When there is the need for more powerful string matching capabilities,
string match can be used in place of string equal, because
instead of comparing two strings, the command compares a string against
a pattern.
String match supports patterns composed of normal characters and
the following special sequences:
*        Matches any sequence of characters, even an empty string.
?        Matches any single character.
[chars]  Matches any single character in the specified set. It's possible
         to specify a range in the x-y form, like [a-z], which
         will match every character from a to z.
\x       Matches exactly x without interpreting it in a special way.
         This is used in order to match *, ?, [, ], \ as single
         characters.
Here are some examples of patterns, and what they may match, in order to
make it simpler to understand how they work:
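
A few illustrative patterns, each tried against a string it matches and one it does not. The patterns and strings are chosen here only as examples, and the calls use the string match command whose structure is given just below:

```tcl
puts [string match "*.tcl" "rev.tcl"]    ;# prints: 1 (* matches "rev")
puts [string match "*.tcl" "rev.txt"]    ;# prints: 0
puts [string match "?at" "cat"]          ;# prints: 1
puts [string match "?at" "at"]           ;# prints: 0 (? needs exactly one character)
puts [string match {[a-c]*} "banana"]    ;# prints: 1
puts [string match {[a-c]*} "zebra"]     ;# prints: 0
puts [string match {\*} "*"]             ;# prints: 1 (\* matches a literal asterisk)
puts [string match {\*} "x"]             ;# prints: 0
```
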
The command structure for string match is:
    string match ?-nocase? pattern string
The return value is 1 if string matches pattern, and 0 otherwise.
The -nocase option can be used to ignore case when matching. Example:
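
A sketch of some calls and their return values, with arbitrary sample strings:

```tcl
puts [string match "Tcl*" "Tcl is fun"]           ;# prints: 1
puts [string match "tcl*" "Tcl is fun"]           ;# prints: 0
puts [string match -nocase "tcl*" "Tcl is fun"]   ;# prints: 1
# The braces keep Tcl from treating [0-9] as a command substitution.
puts [string match {[0-9]*} "42 apples"]          ;# prints: 1
# Two question marks and an asterisk: at least two characters.
puts [string match "??*" "a"]                     ;# prints: 0
puts [string match "??*" "Tcl"]                   ;# prints: 1
```
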
Note that patterns containing the [x-y] form must be grouped using
braces, or quoted using \, to prevent Tcl from trying to substitute them
as a command.
The last pattern in the example shows how it's possible to
match everything that is at least N chars in length using N question marks
followed by an asterisk. "???*" will match strings of at least 3 chars, and so on.
Tcl supports more advanced pattern matching using
regular expressions; still, string match is very interesting because
in most cases it's enough to express a pattern in a simpler way,
and it works much faster than the regular expression commands.
5.8 string map
String map is a powerful tool able to substitute occurrences of
strings with other strings. The substitution is driven by a key-value pairs
list. For example the list {foo bar x {} y yy} will replace
every occurrence of "foo" with "bar", will remove every occurrence of
"x", and will duplicate every occurrence of "y".
The command structure is the following:
    string map ?-nocase? mapping string
Substitutions are done in an ordered way: starting from the first character
of the original string, every key in the key-value pairs list is searched.
If there is no match, the character is appended to the result that
will be returned, and the process continues from the next character.
If instead there is a match, the value relative to the matching key
is appended to the result, and the process continues from the character
just after the matching key.
The above description may appear pedantic and complex, but actually it's
not hard at all to understand how string map works. It turns
every occurrence of a key in the key-value pairs list into the
corresponding value. Once you get comfortable with
string map, you will probably want to know the details of the substitution
process, so the above text will be more useful later, when you are
a more experienced Tcl programmer.
Examples:
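
Two illustrative calls: the first uses the key-value list discussed above, the second is a hypothetical use that escapes angle brackets.

```tcl
# "foo" becomes "bar", "x" is removed, "y" is doubled.
puts [string map {foo bar x {} y yy} "foo-x-y"]    ;# prints: bar--yy

# Replace < and > with their HTML entities.
puts [string map {< &lt; > &gt;} "<b>bold</b>"]    ;# prints: &lt;b&gt;bold&lt;/b&gt;
```
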
Note how string map iterates just once over the original string,
so a key can't match as the effect of an earlier substitution:
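
A small sketch of this single-pass behaviour, with keys picked only to make the point:

```tcl
# "a" becomes "b", but that new "b" is not scanned again, so it does not
# become "c". Only the original "b" is turned into "c".
puts [string map {a b b c} "ab"]   ;# prints: bc
```
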
When the key-value pairs list is not constant it's better to use
the list command to create it:
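
For instance, assuming a hypothetical template with a %NAME% placeholder (the variable names user and template are illustrative):

```tcl
set user "Alice"
set template "Hello %NAME%, welcome!"
# list builds a well-formed key-value list even if $user contains
# spaces or other special characters; braces would not substitute $user.
puts [string map [list %NAME% $user] $template]   ;# prints: Hello Alice, welcome!
```
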
Similarly to many other string subcommands, map can take a
-nocase option to make the matching process case insensitive.
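
A brief sketch with an arbitrary sample string:

```tcl
# Every spelling of "tcl", whatever its case, is replaced by "Tcl".
puts [string map -nocase {tcl Tcl} "TCL, tcl and tCl"]   ;# prints: Tcl, Tcl and Tcl
```
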
5.9 string is
String is tests whether a string is a member of a given class, like
integers, alphanumeric characters, spaces, and so on.
The structure of the command is:
    string is class ?-strict? ?-failindex varname? string
By default the command returns 1 for empty strings, so
the -strict option is used to invert the behaviour and
return 0 on empty strings (i.e. to not consider the empty
string a member of the given class).
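The class can be one of the following (later Tcl versions add a few more):
    alnum, alpha, ascii, boolean, control, digit, double, false,
    graph, integer, lower, print, punct, space, true, upper,
    wordchar, xdigit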
As you can see some classes are oriented to a single character
(like alnum), and some are useful for whole strings (like integer).
If strings composed of more than a single character are
tested against classes oriented to characters, every element
of the string must belong to the class for the command to return 1.
Some examples:
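
A few illustrative tests follow. The result shown for the non-strict empty string is the Tcl 8.x behaviour described above:

```tcl
puts [string is integer "123"]        ;# prints: 1
puts [string is integer "12a"]        ;# prints: 0
puts [string is alnum "abc123"]       ;# prints: 1
puts [string is alnum "abc 123"]      ;# prints: 0 (the space is not alphanumeric)
puts [string is integer ""]           ;# prints: 1 in Tcl 8.x (empty string, no -strict)
puts [string is integer -strict ""]   ;# prints: 0
```
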
If the -failindex option followed by the name of a variable is used,
the command will store the index of the first character that failed
the test in the variable.
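
A minimal sketch; the variable names ok and idx are only illustrative:

```tcl
# The character at index 3 ("a") is the first one that is not a digit.
set ok [string is digit -failindex idx "123a5"]
puts $ok    ;# prints: 0
puts $idx   ;# prints: 3
```
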
5.10 More string subcommands
There are a large number of string subcommands that we don't cover.
The reader may like to look at the string man page to check what's
available: it's very important to know what can be done with the
built-in Tcl functionality, to avoid reimplementing a feature that is
already available.
5.11 Advanced string matching
Tcl string matching capabilities include two powerful commands,
[regexp] and [regsub], that provide egrep-like regular expression
facilities. These commands will be explored in chapter FIXME
of this book.
Copyright © 2004 Salvatore Sanfilippo. All rights reserved. This online book is for personal use only. It cannot be copied to other web sites or further distributed in any form.