initial import

author: Roland McGrath <roland@gnu.org> 1995-02-18 01:27:10 +0000
committer: Roland McGrath <roland@gnu.org> 1995-02-18 01:27:10 +0000
commit: 28f540f45bbacd939bfd07f213bcad2bf730b1bf (patch)
tree: 15f07c4c43d635959c6afee96bde71fb1b3614ee /manual/string.texi
1 files changed, 947 insertions, 0 deletions
diff --git a/manual/string.texi b/manual/string.texi
new file mode 100644
index 0000000000..c638912229
--- /dev/null
+++ b/manual/string.texi
@@ -0,0 +1,947 @@
+@node String and Array Utilities, Extended Characters, Character Handling, Top
+@chapter String and Array Utilities
+
+Operations on strings (or arrays of characters) are an important part of
+many programs.  The GNU C library provides an extensive set of string
+utility functions, including functions for copying, concatenating,
+comparing, and searching strings.  Many of these functions can also
+operate on arbitrary regions of storage; for example, the @code{memcpy}
+function can be used to copy the contents of any kind of array.  
+
+It's fairly common for beginning C programmers to ``reinvent the wheel''
+by duplicating this functionality in their own code, but it pays to
+become familiar with the library functions and to make use of them,
+since this offers benefits in maintenance, efficiency, and portability.
+
+For instance, you could easily compare one string to another in two
+lines of C code, but if you use the built-in @code{strcmp} function,
+you're less likely to make a mistake.  And, since these library
+functions are typically highly optimized, your program may run faster
+too.
+
+@menu
+* Representation of Strings::   Introduction to basic concepts.
+* String/Array Conventions::    Whether to use a string function or an
+				 arbitrary array function.
+* String Length::               Determining the length of a string.
+* Copying and Concatenation::   Functions to copy the contents of strings
+				 and arrays.
+* String/Array Comparison::     Functions for byte-wise and character-wise
+				 comparison.
+* Collation Functions::         Functions for collating strings.
+* Search Functions::            Searching for a specific element or substring.
+* Finding Tokens in a String::  Splitting a string into tokens by looking
+				 for delimiters.
+@end menu
+
+@node Representation of Strings, String/Array Conventions,  , String and Array Utilities
+@section Representation of Strings
+@cindex string, representation of
+
+This section is a quick summary of string concepts for beginning C
+programmers.  It describes how character strings are represented in C
+and some common pitfalls.  If you are already familiar with this
+material, you can skip this section.
+
+@cindex string
+@cindex null character
+A @dfn{string} is an array of @code{char} objects.  But string-valued
+variables are usually declared to be pointers of type @code{char *}.
+Such variables do not include space for the text of a string; that has
+to be stored somewhere else---in an array variable, a string constant,
+or dynamically allocated memory (@pxref{Memory Allocation}).  It's up to
+you to store the address of the chosen memory space into the pointer
+variable.  Alternatively you can store a @dfn{null pointer} in the
+pointer variable.  The null pointer does not point anywhere, so
+attempting to reference the string it points to gets an error.
+
+By convention, a @dfn{null character}, @code{'\0'}, marks the end of a
+string.  For example, in testing to see whether the @code{char *}
+variable @var{p} points to a null character marking the end of a string,
+you can write @code{!*@var{p}} or @code{*@var{p} == '\0'}.
+
+A null character is quite different conceptually from a null pointer,
+although both are represented by the integer @code{0}.
+
+@cindex string literal
+@dfn{String literals} appear in C program source as strings of
+characters between double-quote characters (@samp{"}).  In ANSI C,
+string literals can also be formed by @dfn{string concatenation}:
+@code{"a" "b"} is the same as @code{"ab"}.  Modification of string
+literals is not allowed by the GNU C compiler, because literals
+are placed in read-only storage.
+
+Character arrays that are declared @code{const} cannot be modified
+either.  It's generally good style to declare non-modifiable string
+pointers to be of type @code{const char *}, since this often allows the
+C compiler to detect accidental modifications as well as providing some
+amount of documentation about what your program intends to do with the
+string.
+
+The amount of memory allocated for the character array may extend past
+the null character that normally marks the end of the string.  In this
+document, the term @dfn{allocation size} is always used to refer to the
+total amount of memory allocated for the string, while the term
+@dfn{length} refers to the number of characters up to (but not
+including) the terminating null character.
+@cindex length of string
+@cindex allocation size of string
+@cindex size of string
+@cindex string length
+@cindex string allocation
+
+A notorious source of program bugs is trying to put more characters in a
+string than fit in its allocated size.  When writing code that extends
+strings or moves characters into a pre-allocated array, you should be
+very careful to keep track of the length of the text and make explicit
+checks for overflowing the array.  Many of the library functions
+@emph{do not} do this for you!  Remember also that you need to allocate
+an extra byte to hold the null character that marks the end of the
+string.
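+
+As a rough sketch of such an explicit check, the helper below refuses to
+copy text that does not fit.  The name @code{copy_checked} and its
+return convention are invented for this illustration; they are not part
+of the library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy SRC into DST only if the text and its
   terminating null character fit in DST_SIZE bytes.
   Returns 0 on success, -1 if the string would overflow DST.  */
int
copy_checked (char *dst, size_t dst_size, const char *src)
{
  size_t needed = strlen (src) + 1;   /* +1 for the null character */

  if (needed > dst_size)
    return -1;
  memcpy (dst, src, needed);
  return 0;
}
```

+A caller would typically pass @code{sizeof} of the destination array as
+the second argument and test the return value before using the result.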
+
+@node String/Array Conventions, String Length, Representation of Strings, String and Array Utilities
+@section String and Array Conventions
+
+This chapter describes both functions that work on arbitrary arrays or
+blocks of memory, and functions that are specific to null-terminated
+arrays of characters.
+
+Functions that operate on arbitrary blocks of memory have names
+beginning with @samp{mem} (such as @code{memcpy}) and invariably take an
+argument which specifies the size (in bytes) of the block of memory to
+operate on.  The array arguments and return values for these functions
+have type @code{void *}, and as a matter of style, the elements of these
+arrays are referred to as ``bytes''.  You can pass any kind of pointer
+to these functions, and the @code{sizeof} operator is useful in
+computing the value for the size argument.
+
+In contrast, functions that operate specifically on strings have names
+beginning with @samp{str} (such as @code{strcpy}) and look for a null
+character to terminate the string instead of requiring an explicit size
+argument to be passed.  (Some of these functions accept a specified
+maximum length, but they also check for premature termination with a
+null character.)  The array arguments and return values for these
+functions have type @code{char *}, and the array elements are referred
+to as ``characters''.
+
+In many cases, there are both @samp{mem} and @samp{str} versions of a
+function.  The one that is more appropriate to use depends on the exact
+situation.  When your program is manipulating arbitrary arrays or blocks of
+storage, then you should always use the @samp{mem} functions.  On the
+other hand, when you are manipulating null-terminated strings it is
+usually more convenient to use the @samp{str} functions, unless you
+already know the length of the string in advance.
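+
+The two conventions can be compared side by side in a small sketch.
+The wrapper names @code{copy_ints} and @code{copy_text} are made up for
+this illustration only.

```c
#include <stddef.h>
#include <string.h>

/* Copy COUNT ints; the size argument to memcpy is given in bytes.  */
void
copy_ints (int *to, const int *from, size_t count)
{
  memcpy (to, from, count * sizeof (int));
}

/* Copy a null-terminated string; no size argument is needed, but TO
   must have room for strlen (FROM) + 1 characters.  */
void
copy_text (char *to, const char *from)
{
  strcpy (to, from);
}
```

+Note that the @samp{mem} call needs the element count multiplied by the
+element size, while the @samp{str} call relies entirely on the null
+character in the source.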
+
+@node String Length, Copying and Concatenation, String/Array Conventions, String and Array Utilities
+@section String Length
+
+You can get the length of a string using the @code{strlen} function.
+This function is declared in the header file @file{string.h}.
+@pindex string.h
+
+@comment string.h
+@comment ANSI
+@deftypefun size_t strlen (const char *@var{s})
+The @code{strlen} function returns the length of the null-terminated
+string @var{s}.  (In other words, it returns the offset of the terminating
+null character within the array.)
+
+For example,
+@smallexample
+strlen ("hello, world")
+    @result{} 12
+@end smallexample
+
+When applied to a character array, the @code{strlen} function returns
+the length of the string stored there, not its allocation size.  You can
+get the allocation size of the character array that holds a string using
+the @code{sizeof} operator:
+
+@smallexample
+char string[32] = "hello, world"; 
+sizeof (string)
+    @result{} 32
+strlen (string)
+    @result{} 12
+@end smallexample
+@end deftypefun
+
+@node Copying and Concatenation, String/Array Comparison, String Length, String and Array Utilities
+@section Copying and Concatenation
+
+You can use the functions described in this section to copy the contents
+of strings and arrays, or to append the contents of one string to
+another.  These functions are declared in the header file
+@file{string.h}.
+@pindex string.h
+@cindex copying strings and arrays
+@cindex string copy functions
+@cindex array copy functions
+@cindex concatenating strings
+@cindex string concatenation functions
+
+A helpful way to remember the ordering of the arguments to the functions
+in this section is that it corresponds to an assignment expression, with
+the destination array specified to the left of the source array.  All
+of these functions return the address of the destination array.
+
+Most of these functions do not work properly if the source and
+destination arrays overlap.  For example, if the beginning of the
+destination array overlaps the end of the source array, the original
+contents of that part of the source array may get overwritten before it
+is copied.  Even worse, in the case of the string functions, the null
+character marking the end of the string may be lost, and the copy
+function might get stuck in a loop trashing all the memory allocated to
+your program.
+
+All functions that have problems copying between overlapping arrays are
+explicitly identified in this manual.  In addition to functions in this
+section, there are a few others like @code{sprintf} (@pxref{Formatted
+Output Functions}) and @code{scanf} (@pxref{Formatted Input
+Functions}).
+
+@comment string.h
+@comment ANSI
+@deftypefun {void *} memcpy (void *@var{to}, const void *@var{from}, size_t @var{size})
+The @code{memcpy} function copies @var{size} bytes from the object
+beginning at @var{from} into the object beginning at @var{to}.  The
+behavior of this function is undefined if the two arrays @var{to} and
+@var{from} overlap; use @code{memmove} instead if overlapping is possible.
+
+The value returned by @code{memcpy} is the value of @var{to}.
+
+Here is an example of how you might use @code{memcpy} to copy the
+contents of an array:
+
+@smallexample
+struct foo *oldarray, *newarray;
+int arraysize;
+@dots{}
+memcpy (newarray, oldarray, arraysize * sizeof (struct foo));
+@end smallexample
+@end deftypefun
+
+@comment string.h
+@comment ANSI
+@deftypefun {void *} memmove (void *@var{to}, const void *@var{from}, size_t @var{size})
+@code{memmove} copies the @var{size} bytes at @var{from} into the
+@var{size} bytes at @var{to}, even if those two blocks of space
+overlap.  In the case of overlap, @code{memmove} is careful to copy the
+original values of the bytes in the block at @var{from}, including those
+bytes which also belong to the block at @var{to}.
+@end deftypefun
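+
+One situation where the blocks necessarily overlap is removing an
+element from the middle of an array.  The following is only a sketch;
+@code{delete_at} is a hypothetical helper, not a library function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: remove the element at position POS from ARRAY,
   which holds COUNT ints, by sliding the tail down one place.  The
   source and destination blocks overlap, so memmove is required.
   Returns the new element count.  */
size_t
delete_at (int *array, size_t count, size_t pos)
{
  if (pos >= count)
    return count;
  memmove (array + pos, array + pos + 1,
           (count - pos - 1) * sizeof (int));
  return count - 1;
}
```

+Using @code{memcpy} here instead would have undefined behavior, because
+the tail of the array is both read and written.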
+
+@comment string.h
+@comment SVID
+@deftypefun {void *} memccpy (void *@var{to}, const void *@var{from}, int @var{c}, size_t @var{size})
+This function copies no more than @var{size} bytes from @var{from} to
+@var{to}, stopping if a byte matching @var{c} is found.  The return
+value is a pointer into @var{to} one byte past where @var{c} was copied,
+or a null pointer if no byte matching @var{c} appeared in the first
+@var{size} bytes of @var{from}.
+@end deftypefun
+
+@comment string.h
+@comment ANSI
+@deftypefun {void *} memset (void *@var{block}, int @var{c}, size_t @var{size})
+This function copies the value of @var{c} (converted to an
+@code{unsigned char}) into each of the first @var{size} bytes of the
+object beginning at @var{block}.  It returns the value of @var{block}.
+@end deftypefun
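+
+A common use of @code{memset} is clearing a structure before filling it
+in.  The sketch below assumes a made-up @code{struct record} type; only
+the @code{memset} call itself comes from the library.

```c
#include <string.h>

/* Hypothetical record type used only for this illustration.  */
struct record
{
  int id;
  char name[16];
};

/* Set every byte of *R to zero.  */
void
clear_record (struct record *r)
{
  memset (r, 0, sizeof *r);
}
```

+Writing @code{sizeof *r} rather than naming the type keeps the size
+argument correct if the type of @code{r} is later changed.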
+
+@comment string.h
+@comment ANSI
+@deftypefun {char *} strcpy (char *@var{to}, const char *@var{from})
+This copies characters from the string @var{from} (up to and including
+the terminating null character) into the string @var{to}.  Like
+@code{memcpy}, this function has undefined results if the strings
+overlap.  The return value is the value of @var{to}.
+@end deftypefun
+
+@comment string.h
+@comment ANSI
+@deftypefun {char *} strncpy (char *@var{to}, const char *@var{from}, size_t @var{size})
+This function is similar to @code{strcpy} but always copies exactly
+@var{size} characters into @var{to}.
+
+If the length of @var{from} is more than @var{size}, then @code{strncpy}
+copies just the first @var{size} characters.  Note that in this case
+there is no null terminator written into @var{to}.
+
+If the length of @var{from} is less than @var{size}, then @code{strncpy}
+copies all of @var{from}, followed by enough null characters to add up
+to @var{size} characters in all.  This behavior is rarely useful, but it
+is specified by the ANSI C standard.
+
+The behavior of @code{strncpy} is undefined if the strings overlap.
+
+Using @code{strncpy} as opposed to @code{strcpy} is a way to avoid bugs
+relating to writing past the end of the allocated space for @var{to}.
+However, it can also make your program much slower in one common case:
+copying a string which is probably small into a potentially large buffer.
+In this case, @var{size} may be large, and when it is, @code{strncpy} will
+waste a considerable amount of time copying null characters.
+@end deftypefun
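+
+Because @code{strncpy} may leave @var{to} without a null terminator,
+callers often store one explicitly.  The following is a minimal sketch
+of that idiom; @code{copy_truncated} is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy FROM into TO, which has room for SIZE
   bytes, truncating if necessary and always storing a terminator.  */
void
copy_truncated (char *to, size_t size, const char *from)
{
  if (size == 0)
    return;
  strncpy (to, from, size - 1);
  to[size - 1] = '\0';
}
```

+The call copies at most @code{size - 1} characters, so the final byte
+of the buffer is always available for the null character.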
+
+@comment string.h
+@comment SVID
+@deftypefun {char *} strdup (const char *@var{s})
+This function copies the null-terminated string @var{s} into a newly
+allocated string.  The string is allocated using @code{malloc}; see
+@ref{Unconstrained Allocation}.  If @code{malloc} cannot allocate space
+for the new string, @code{strdup} returns a null pointer.  Otherwise it
+returns a pointer to the new string.
+@end deftypefun
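+
+The behavior described above can be sketched in terms of @code{malloc}
+and @code{memcpy}.  This is an illustration under a different name,
+@code{duplicate_string}, and is not claimed to be how the library
+implements @code{strdup}.

```c
#include <stdlib.h>
#include <string.h>

/* Illustrative equivalent of strdup, under a different name so it
   does not clash with the library function.  Returns a null pointer
   if malloc fails; the caller frees the result.  */
char *
duplicate_string (const char *s)
{
  size_t size = strlen (s) + 1;   /* include the null character */
  char *copy = malloc (size);

  if (copy != NULL)
    memcpy (copy, s, size);
  return copy;
}
```

+As with @code{strdup}, the returned string should eventually be passed
+to @code{free}.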
+
+@comment string.h
+@comment Unknown origin
+@deftypefun {char *} stpcpy (char *@var{to}, const char *@var{from})
+This function is like @code{strcpy}, except that it returns a pointer to
+the end of the string @var{to} (that is, the address of the terminating
+null character) rather than the beginning.
+
+For example, this program uses @code{stpcpy} to concatenate @samp{foo}
+and @samp{bar} to produce @samp{foobar}, which it then prints.
+
+@smallexample
+@include stpcpy.c.texi
+@end smallexample
+
+This function is not part of the ANSI or POSIX standards, and is not
+customary on Unix systems, but we did not invent it either.  Perhaps it
+comes from MS-DOG.
+
+Its behavior is undefined if the strings overlap.
+@end deftypefun
+
+@comment string.h
+@comment ANSI
+@deftypefun {char *} strcat (char *@var{to}, const char *@var{from})
+The @code{strcat} function is similar to @code{strcpy}, except that the
+characters from @var{from} are concatenated or appended to the end of
+@var{to}, instead of overwriting it.  That is, the first character from
+@var{from} overwrites the null character marking the end of @var{to}.
+
+An equivalent definition for @code{strcat} would be:
+
+@smallexample
+char *
+strcat (char *to, const char *from)
+@{
+  strcpy (to + strlen (to), from);
+  return to;
+@}
+@end smallexample
+
+This function has undefined results if the strings overlap.
+@end deftypefun
+
+@comment string.h
+@comment ANSI
+@deftypefun {char *} strncat (char *@var{to}, const char *@var{from}, size_t @var{size})
+This function is like @code{strcat} except that not more than @var{size}
+characters from @var{from} are appended to the end of @var{to}.  A
+single null character is also always appended to @var{to}, so the total
+allocated size of @var{to} must be at least @code{@var{size} + 1} bytes
+longer than its initial length.
+
+The @code{strncat} function could be implemented like this:
+
+@smallexample
+@group
+char *
+strncat (char *to, const char *from, size_t size)
+@{
+  size_t length = strlen (to);
+
+  to[length + size] = '\0';
+  strncpy (to + length, from, size);
+  return to;
+@}
+@end group
+@end smallexample
+
+The behavior of @code{strncat} is undefined if the strings overlap.
+@end deftypefun
+
+Here is an example showing the use of @code{strncpy} and @code{strncat}.
+Notice how, in the call to @code{strncat}, the @var{size} parameter
+is computed to avoid overflowing the character array @code{buffer}.
+
+@smallexample
+@include strncat.c.texi
+@end smallexample
+
+@noindent
+The output produced by this program looks like:
+
+@smallexample
+hello
+hello, wo
+@end smallexample
+
+@comment string.h
+@comment BSD
+@deftypefun void bcopy (const void *@var{from}, void *@var{to}, size_t @var{size})
+This is a partially obsolete alternative for @code{memmove}, derived from
+BSD.  Note that it is not quite equivalent to @code{memmove}, because the
+arguments are not in the same order.
+@end deftypefun
+
+@comment string.h
+@comment BSD
+@deftypefun void bzero (void *@var{block}, size_t @var{size})
+This is a partially obsolete alternative for @code{memset}, derived from
+BSD.  Note that it is not as general as @code{memset}, because the only
+value it can store is zero.
+@end deftypefun
+
+@node String/Array Comparison, Collation Functions, Copying and Concatenation, String and Array Utilities
+@section String/Array Comparison
+@cindex comparing strings and arrays
+@cindex string comparison functions
+@cindex array comparison functions
+@cindex predicates on strings
+@cindex predicates on arrays
+
+You can use the functions in this section to perform comparisons on the
+contents of strings and arrays.  As well as checking for equality, these
+functions can also be used as the ordering functions for sorting
+operations.  @xref{Searching and Sorting}, for an example of this.
+
+Unlike most comparison operations in C, the string comparison functions
+return a nonzero value if the strings are @emph{not} equivalent rather
+than if they are.  The sign of the value indicates the relative ordering
+of the first characters in the strings that are not equivalent:  a
+negative value indicates that the first string is ``less'' than the
+second, while a positive value indicates that the first string is 
+``greater''.
+
+The most common use of these functions is to check only for equality.
+This is canonically done with an expression like @w{@samp{! strcmp (s1, s2)}}.
+
+All of these functions are declared in the header file @file{string.h}.
+@pindex string.h
+
+@comment string.h
+@comment ANSI
+@deftypefun int memcmp (const void *@var{a1}, const void *@var{a2}, size_t @var{size})
+The function @code{memcmp} compares the @var{size} bytes of memory
+beginning at @var{a1} against the @var{size} bytes of memory beginning
+at @var{a2}.  The value returned has the same sign as the difference
+between the first differing pair of bytes (interpreted as @code{unsigned
+char} objects, then promoted to @code{int}).
+
+If the contents of the two blocks are equal, @code{memcmp} returns
+@code{0}.
+@end deftypefun
+
+On arbitrary arrays, the @code{memcmp} function is mostly useful for
+testing equality.  It usually isn't meaningful to do byte-wise ordering
+comparisons on arrays of things other than bytes.  For example, a
+byte-wise comparison on the bytes that make up floating-point numbers
+isn't likely to tell you anything about the relationship between the
+values of the floating-point numbers.
+
+You should also be careful about using @code{memcmp} to compare objects
+that can contain ``holes'', such as the padding inserted into structure
+objects to enforce alignment requirements, extra space at the end of
+unions, and extra characters at the ends of strings whose length is less
+than their allocated size.  The contents of these ``holes'' are
+indeterminate and may cause strange behavior when performing byte-wise
+comparisons.  For more predictable results, perform an explicit
+component-wise comparison.
+
+For example, given a structure type definition like:
+
+@smallexample
+struct foo
+  @{
+    unsigned char tag;
+    union
+      @{
+        double f;
+        long i;
+        char *p;
+      @} value;
+  @};
+@end smallexample
+
+@noindent
+you are better off writing a specialized comparison function to compare
+@code{struct foo} objects instead of comparing them with @code{memcmp}.
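+
+One possible shape for such a comparison function is sketched below.
+The definition above does not say how @code{tag} selects a member of
+the union, so the @code{FOO_} constants and the name @code{foo_equal}
+are assumptions made for this illustration.

```c
#include <string.h>

struct foo
  {
    unsigned char tag;
    union
      {
        double f;
        long i;
        char *p;
      } value;
  };

/* Assumed tag values; the original definition does not say how TAG
   selects a union member.  */
enum { FOO_DOUBLE, FOO_LONG, FOO_STRING };

/* Return nonzero if A and B hold the same tag and the same value.
   Only the active member is examined, so padding is never read.  */
int
foo_equal (const struct foo *a, const struct foo *b)
{
  if (a->tag != b->tag)
    return 0;
  switch (a->tag)
    {
    case FOO_DOUBLE:
      return a->value.f == b->value.f;
    case FOO_LONG:
      return a->value.i == b->value.i;
    case FOO_STRING:
      return strcmp (a->value.p, b->value.p) == 0;
    default:
      return 0;
    }
}
```

+Unlike @code{memcmp}, this treats two objects as equal when their
+strings have the same contents, even if the pointers differ.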
+
+@comment string.h
+@comment ANSI
+@deftypefun int strcmp (const char *@var{s1}, const char *@var{s2})
+The @code{strcmp} function compares the string @var{s1} against
+@var{s2}, returning a value that has the same sign as the difference
+between the first differing pair of characters (interpreted as
+@code{unsigned char} objects, then promoted to @code{int}).
+
+If the two strings are equal, @code{strcmp} returns @code{0}.
+
+A consequence of the ordering used by @code{strcmp} is that if @var{s1}
+is an initial substring of @var{s2}, then @var{s1} is considered to be
+``less than'' @var{s2}.
+@end deftypefun
+
+@comment string.h
+@comment BSD
+@deftypefun int strcasecmp (const char *@var{s1}, const char *@var{s2})
+This function is like @code{strcmp}, except that differences in case
+are ignored.
+
+@code{strcasecmp} is derived from BSD.
+@end deftypefun
+
+@comment string.h
+@comment BSD
+@deftypefun int strncasecmp (const char *@var{s1}, const char *@var{s2}, size_t @var{n})
+This function is like @code{strncmp}, except that differences in case
+are ignored.
+
+@code{strncasecmp} is a GNU extension.
+@end deftypefun
+
+@comment string.h
+@comment ANSI
+@deftypefun int strncmp (const char *@var{s1}, const char *@var{s2}, size_t @var{size})
+This function is similar to @code{strcmp}, except that no more than
+@var{size} characters are compared.  In other words, if the two strings are
+the same in their first @var{size} characters, the return value is zero.
+@end deftypefun
+
+Here are some examples showing the use of @code{strcmp} and @code{strncmp}.
+These examples assume the use of the ASCII character set.  (If some
+other character set---say, EBCDIC---is used instead, then the glyphs
+are associated with different numeric codes, and the return values
+and ordering may differ.)
+
+@smallexample
+strcmp ("hello", "hello")
+    @result{} 0    /* @r{These two strings are the same.} */
+strcmp ("hello", "Hello")
+    @result{} 32   /* @r{Comparisons are case-sensitive.} */
+strcmp ("hello", "world")
+    @result{} -15  /* @r{The character @code{'h'} comes before @code{'w'}.} */
+strcmp ("hello", "hello, world")
+    @result{} -44  /* @r{Comparing a null character against a comma.} */
+strncmp ("hello", "hello, world", 5)
+    @result{} 0    /* @r{The initial 5 characters are the same.} */
+strncmp ("hello, world", "hello, stupid world!!!", 5)
+    @result{} 0    /* @r{The initial 5 characters are the same.} */
+@end smallexample
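+
+Only the sign of these results is specified, so code that needs a
+fixed value can normalize it.  The helper below is a sketch;
+@code{compare_sign} is a hypothetical name.

```c
#include <string.h>

/* Hypothetical helper: reduce the result of strcmp to -1, 0, or 1.  */
int
compare_sign (const char *s1, const char *s2)
{
  int result = strcmp (s1, s2);

  return (result > 0) - (result < 0);
}
```

+With this wrapper, a program can compare against @code{-1}, @code{0},
+and @code{1} without depending on the particular difference that an
+implementation happens to return.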
+
+@comment string.h
+@comment BSD
+@deftypefun int bcmp (const void *@var{a1}, const void *@var{a2}, size_t @var{size})
+This is an obsolete alias for @code{memcmp}, derived from BSD.
+@end deftypefun
+
+@node Collation Functions, Search Functions, String/Array Comparison, String and Array Utilities
+@section Collation Functions
+
+@cindex collating strings
+@cindex string collation functions
+
+In some locales, the conventions for lexicographic ordering differ from
+the strict numeric ordering of character codes.  For example, in Spanish
+most glyphs with diacritical marks such as accents are not considered
+distinct letters for the purposes of collation.  On the other hand, the
+two-character sequence @samp{ll} is treated as a single letter that is
+collated immediately after @samp{l}.
+
+You can use the functions @code{strcoll} and @code{strxfrm} (declared in
+the header file @file{string.h}) to compare strings using a collation
+ordering appropriate for the current locale.  The locale used by these
+functions in particular can be specified by setting the locale for the
+@code{LC_COLLATE} category; see @ref{Locales}.
+@pindex string.h
+
+In the standard C locale, the collation sequence for @code{strcoll} is
+the same as that for @code{strcmp}.
+
+Effectively, the way these functions work is by applying a mapping to
+transform the characters in a string to a byte sequence that represents
+the string's position in the collating sequence of the current locale.
+Comparing two such byte sequences in a simple fashion is equivalent to
+comparing the strings with the locale's collating sequence.
+
+The function @code{strcoll} performs this translation implicitly, in
+order to do one comparison.  By contrast, @code{strxfrm} performs the
+mapping explicitly.  If you are making multiple comparisons using the
+same string or set of strings, it is likely to be more efficient to use
+@code{strxfrm} to transform all the strings just once, and subsequently
+compare the transformed strings with @code{strcmp}.
+
+@comment string.h
+@comment ANSI
+@deftypefun int strcoll (const char *@var{s1}, const char *@var{s2})
+The @code{strcoll} function is similar to @code{strcmp} but uses the
+collating sequence of the current locale for collation (the
+@code{LC_COLLATE} locale).
+@end deftypefun
+
+Here is an example of sorting an array of strings, using @code{strcoll}
+to compare them.  The actual sort algorithm is not written here; it
+comes from @code{qsort} (@pxref{Array Sort Function}).  The job of the
+code shown here is to say how to compare the strings while sorting them.
+(Later on in this section, we will show a way to do this more
+efficiently using @code{strxfrm}.)
+
+@smallexample
+/* @r{This is the comparison function used with @code{qsort}.} */
+
+int
+compare_elements (const void *p1, const void *p2)
+@{
+  return strcoll (*(char *const *) p1, *(char *const *) p2);
+@}
+
+/* @r{This is the entry point---the function to sort}
+   @r{strings using the locale's collating sequence.} */
+
+void
+sort_strings (char **array, int nstrings)
+@{
+  /* @r{Sort @code{array} by comparing the strings.} */
+  qsort (array, nstrings,
+         sizeof (char *), compare_elements);
+@}
+@end smallexample
+
+@cindex converting string to collation order
+@comment string.h
+@comment ANSI
+@deftypefun size_t strxfrm (char *@var{to}, const char *@var{from}, size_t @var{size})
+The function @code{strxfrm} transforms @var{from} using the collation
+transformation determined by the locale currently selected for
+collation, and stores the transformed string in the array @var{to}.  Up
+to @var{size} characters (including a terminating null character) are
+stored.
+
+The behavior is undefined if the strings @var{to} and @var{from}
+overlap; see @ref{Copying and Concatenation}.
+
+The return value is the length of the entire transformed string.  This
+value is not affected by the value of @var{size}, but if it is greater
+than @var{size}, it means that the transformed string did not entirely
+fit in the array @var{to}.  In this case, only as much of the string as
+actually fits was stored.  To get the whole transformed string, call
+@code{strxfrm} again with a bigger output array.
+
+The transformed string may be longer than the original string, and it
+may also be shorter.
+
+If @var{size} is zero, no characters are stored in @var{to}.  In this
+case, @code{strxfrm} simply returns the number of characters that would
+be the length of the transformed string.  This is useful for determining
+what size string to allocate.  It does not matter what @var{to} is if
+@var{size} is zero; @var{to} may even be a null pointer.
+@end deftypefun
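+
+The zero-size call can be used to allocate a buffer of exactly the
+right size before transforming.  The following is a minimal sketch;
+@code{transform_string} is a hypothetical helper that uses plain
+@code{malloc}.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated transformed copy of S,
   or a null pointer if malloc fails.  The caller frees the result.  */
char *
transform_string (const char *s)
{
  /* With a size of zero nothing is stored; only the length of the
     transformed string comes back.  */
  size_t length = strxfrm (NULL, s, 0);
  char *result = malloc (length + 1);   /* +1 for the null character */

  if (result != NULL)
    strxfrm (result, s, length + 1);
  return result;
}
```

+Two strings transformed this way can then be compared with
+@code{strcmp} as many times as needed.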
+
+Here is an example of how you can use @code{strxfrm} when
+you plan to do many comparisons.  It does the same thing as the previous
+example, but much faster, because it has to transform each string only
+once, no matter how many times it is compared with other strings.  Even
+the time needed to allocate and free storage is much less than the time
+we save, when there are many strings.
+
+@smallexample
+struct sorter @{ char *input; char *transformed; @};
+
+/* @r{This is the comparison function used with @code{qsort}}
+   @r{to sort an array of @code{struct sorter}.} */
+
+int
+compare_elements (const void *p1, const void *p2)
+@{
+  const struct sorter *s1 = p1, *s2 = p2;
+  return strcmp (s1->transformed, s2->transformed);
+@}
+
+/* @r{This is the entry point---the function to sort}
+   @r{strings using the locale's collating sequence.} */
+
+void
+sort_strings_fast (char **array, int nstrings)
+@{
+  struct sorter temp_array[nstrings];
+  int i;
+
+  /* @r{Set up @code{temp_array}.  Each element contains}
+     @r{one input string and its transformed string.} */
+  for (i = 0; i < nstrings; i++)
+    @{
+      size_t length = strlen (array[i]) * 2 + 1;
+
+      temp_array[i].input = array[i];
+
+      /* @r{Transform @code{array[i]}.}
+         @r{First try a buffer probably big enough.} */
+      while (1)
+        @{
+          char *transformed = (char *) xmalloc (length);
+          if (strxfrm (transformed, array[i], length) < length)
+            @{
+              temp_array[i].transformed = transformed;
+              break;
+            @}
+          /* @r{Try again with a bigger buffer.} */
+          free (transformed);
+          length *= 2;
+        @}
+    @}
+
+  /* @r{Sort @code{temp_array} by comparing transformed strings.} */
+  qsort (temp_array, nstrings,
+         sizeof (struct sorter), compare_elements);
+
+  /* @r{Put the elements back in the permanent array}
+     @r{in their sorted order.} */
+  for (i = 0; i < nstrings; i++)
+    array[i] = temp_array[i].input;
+
+  /* @r{Free the strings we allocated.} */
+  for (i = 0; i < nstrings; i++)
+    free (temp_array[i].transformed);
+@}
+@end smallexample
+
+@strong{Compatibility Note:}  The string collation functions are a new
+feature of ANSI C.  Older C dialects have no equivalent feature.
+
+@node Search Functions, Finding Tokens in a String, Collation Functions, String and Array Utilities
+@section Search Functions
+
+This section describes library functions which perform various kinds
+of searching operations on strings and arrays.  These functions are
+declared in the header file @file{string.h}.
+@pindex string.h
+@cindex search functions (for strings)
+@cindex string search functions
+
+@comment string.h
+@comment ANSI
+@deftypefun {void *} memchr (const void *@var{block}, int @var{c}, size_t @var{size})
+This function finds the first occurrence of the byte @var{c} (converted
+to an @code{unsigned char}) in the initial @var{size} bytes of the
+object beginning at @var{block}.  The return value is a pointer to the
+located byte, or a null pointer if no match was found.
+@end deftypefun
+
+@comment string.h
+@comment ANSI
+@deftypefun {char *} strchr (const char *@var{string}, int @var{c})
+The @code{strchr} function finds the first occurrence of the character
+@var{c} (converted to a @code{char}) in the null-terminated string
+beginning at @var{string}.  The return value is a pointer to the located
+character, or a null pointer if no match was found.
+
+For example,
+@smallexample
+strchr ("hello, world", 'l')
+    @result{} "llo, world"
+strchr ("hello, world", '?')
+    @result{} NULL
+@end smallexample    
+
+The terminating null character is considered to be part of the string,
+so you can use this function to get a pointer to the end of a string by
+specifying a null character as the value of the @var{c} argument.
+@end deftypefun
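+
+The end-of-string idiom mentioned above might be wrapped as follows.
+This is a sketch, and @code{end_of_string} is a hypothetical name.

```c
#include <string.h>

/* Hypothetical helper: return a pointer to the terminating null
   character of S.  */
const char *
end_of_string (const char *s)
{
  return strchr (s, '\0');
}
```

+The result is the same address as @code{s + strlen (s)}.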
+
+@comment string.h
+@comment BSD
+@deftypefun {char *} index (const char *@var{string}, int @var{c})
+@code{index} is another name for @code{strchr}; they are exactly the same.
+@end deftypefun
+
+@comment string.h
+@comment ANSI
+@deftypefun {char *} strrchr (const char *@var{string}, int @var{c})
+The function @code{strrchr} is like @code{strchr}, except that it searches
+backwards from the end of the string @var{string} (instead of forwards
+from the front).
+
+For example,
+@smallexample
+strrchr ("hello, world", 'l')
+    @result{} "ld"
+@end smallexample
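+
+One common use is to pick out the last component of a file name.  The
+following is only a sketch; the helper name @code{last_component} is
+hypothetical, and it assumes that @samp{/} is the directory separator.
+
```c
#include <string.h>

/* Hypothetical helper (not a library function): return the part of
   PATH that follows the last '/', or PATH itself if there is none.  */
const char *
last_component (const char *path)
{
  const char *slash = strrchr (path, '/');
  return slash != NULL ? slash + 1 : path;
}
```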
+@end deftypefun
+
+@comment string.h
+@comment BSD
+@deftypefun {char *} rindex (const char *@var{string}, int @var{c})
+@code{rindex} is another name for @code{strrchr}; they are exactly the same.
+@end deftypefun
+
+@comment string.h
+@comment ANSI
+@deftypefun {char *} strstr (const char *@var{haystack}, const char *@var{needle})
+This is like @code{strchr}, except that it searches @var{haystack} for a
+substring @var{needle} rather than just a single character.  It
+returns a pointer into the string @var{haystack} that is the first
+character of the substring, or a null pointer if no match was found.  If
+@var{needle} is an empty string, the function returns @var{haystack}.
+
+For example,
+@smallexample
+strstr ("hello, world", "l")
+    @result{} "llo, world"
+strstr ("hello, world", "wo")
+    @result{} "world"
+@end smallexample
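+
+Because the return value points into @var{haystack}, you can resume the
+search just past a match.  The sketch below counts non-overlapping
+occurrences this way; the name @code{count_occurrences} is hypothetical,
+and treating an empty @var{needle} as zero matches is a choice made for
+this example only.
+
```c
#include <string.h>

/* Hypothetical helper (not a library function): count the
   non-overlapping occurrences of NEEDLE in HAYSTACK.  */
size_t
count_occurrences (const char *haystack, const char *needle)
{
  size_t needle_len = strlen (needle);
  size_t count = 0;
  const char *p;

  /* strstr matches an empty needle at every position, so treat it
     as "no occurrences" here to avoid looping forever.  */
  if (needle_len == 0)
    return 0;

  for (p = strstr (haystack, needle);
       p != NULL;
       p = strstr (p + needle_len, needle))
    count++;

  return count;
}
```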
+@end deftypefun
+
+
+@comment string.h
+@comment GNU
+@deftypefun {void *} memmem (const void *@var{haystack}, size_t @var{haystack-len},@*const void *@var{needle}, size_t @var{needle-len})
+This is like @code{strstr}, but @var{needle} and @var{haystack} are byte
+arrays rather than null-terminated strings.  @var{needle-len} is the
+length of @var{needle} and @var{haystack-len} is the length of
+@var{haystack}.  As with @code{strstr}, the array to be searched comes
+first in the argument list.@refill
+
+This function is a GNU extension.
+@end deftypefun
+
+@comment string.h
+@comment ANSI
+@deftypefun size_t strspn (const char *@var{string}, const char *@var{skipset})
+The @code{strspn} (``string span'') function returns the length of the
+initial substring of @var{string} that consists entirely of characters that
+are members of the set specified by the string @var{skipset}.  The order
+of the characters in @var{skipset} is not important.
+
+For example,
+@smallexample
+strspn ("hello, world", "abcdefghijklmnopqrstuvwxyz")
+    @result{} 5
+@end smallexample
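+
+A typical use is to skip over leading characters from a known set.  The
+following sketch skips spaces and tabs; the name @code{skip_blanks} is
+hypothetical.
+
```c
#include <string.h>

/* Hypothetical helper (not a library function): return a pointer to
   the first character of STRING that is neither a space nor a tab.  */
const char *
skip_blanks (const char *string)
{
  return string + strspn (string, " \t");
}
```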
+@end deftypefun
+
+@comment string.h
+@comment ANSI
+@deftypefun size_t strcspn (const char *@var{string}, const char *@var{stopset})
+The @code{strcspn} (``string complement span'') function returns the length
+of the initial substring of @var{string} that consists entirely of characters
+that are @emph{not} members of the set specified by the string @var{stopset}.
+(In other words, it returns the offset of the first character in @var{string}
+that is a member of the set @var{stopset}.)
+
+For example,
+@smallexample
+strcspn ("hello, world", " \t\n,.;!?")
+    @result{} 5
+@end smallexample
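+
+Since the return value is an offset, it can be used directly as an
+index.  The sketch below cuts a line off at its first line terminator;
+the name @code{chomp} is hypothetical.  If no terminator is present, the
+offset is that of the existing null character, so the string is left
+unchanged.
+
```c
#include <string.h>

/* Hypothetical helper (not a library function): truncate LINE at its
   first carriage return or newline and return the new length.  */
size_t
chomp (char *line)
{
  size_t length = strcspn (line, "\r\n");
  line[length] = '\0';
  return length;
}
```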
+@end deftypefun
+
+@comment string.h
+@comment ANSI
+@deftypefun {char *} strpbrk (const char *@var{string}, const char *@var{stopset})
+The @code{strpbrk} (``string pointer break'') function is similar to
+@code{strcspn}, except that it returns a pointer to the first character
+in @var{string} that is a member of the set @var{stopset} instead of the
+length of the initial substring.  It returns a null pointer if no
+character from @var{stopset} is found.
+
+@c @group  Invalid outside the example.
+For example,
+
+@smallexample
+strpbrk ("hello, world", " \t\n,.;!?")
+    @result{} ", world"
+@end smallexample
+@c @end group
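+
+Calling @code{strpbrk} repeatedly, each time starting one character past
+the previous match, visits every member of @var{stopset} in the string.
+The sketch below counts them; the name @code{count_members} is
+hypothetical.
+
```c
#include <string.h>

/* Hypothetical helper (not a library function): count the characters
   of STRING that are members of STOPSET.  */
size_t
count_members (const char *string, const char *stopset)
{
  size_t count = 0;
  const char *p = strpbrk (string, stopset);

  while (p != NULL)
    {
      count++;
      /* A match is never the terminating null, so P + 1 is still
         inside the string.  */
      p = strpbrk (p + 1, stopset);
    }

  return count;
}
```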
+@end deftypefun
+
+@node Finding Tokens in a String,  , Search Functions, String and Array Utilities
+@section Finding Tokens in a String
+
+@c !!! Document strsep, which is a better thing to use than strtok.
+
+@cindex tokenizing strings
+@cindex breaking a string into tokens
+@cindex parsing tokens from a string
+It's fairly common for programs to do some simple kinds of lexical
+analysis and parsing, such as splitting a command string up into
+tokens.  You can do this with the @code{strtok} function, declared
+in the header file @file{string.h}.
+@pindex string.h
+
+@comment string.h
+@comment ANSI
+@deftypefun {char *} strtok (char *@var{newstring}, const char *@var{delimiters})
+A string can be split into tokens by making a series of calls to the
+function @code{strtok}.
+
+The string to be split up is passed as the @var{newstring} argument on
+the first call only.  The @code{strtok} function uses this to set up
+some internal state information.  Subsequent calls to get additional
+tokens from the same string are indicated by passing a null pointer as
+the @var{newstring} argument.  Calling @code{strtok} with another
+non-null @var{newstring} argument reinitializes the state information.
+It is guaranteed that no other library function ever calls @code{strtok}
+behind your back (which would mess up this internal state information).
+
+The @var{delimiters} argument is a string that specifies a set of delimiters
+that may surround the token being extracted.  All the initial characters
+that are members of this set are discarded.  The first character that is
+@emph{not} a member of this set of delimiters marks the beginning of the
+next token.  The end of the token is found by looking for the next
+character that is a member of the delimiter set.  This character in the
+original string @var{newstring} is overwritten by a null character, and the
+pointer to the beginning of the token in @var{newstring} is returned.
+
+On the next call to @code{strtok}, the searching begins at the next
+character beyond the one that marked the end of the previous token.
+Note that the set of delimiters @var{delimiters} does not have to be the
+same on every call in a series of calls to @code{strtok}.
+
+If the end of the string @var{newstring} is reached, or if the remainder of
+the string consists only of delimiter characters, @code{strtok} returns
+a null pointer.
+@end deftypefun
+
+@strong{Warning:} Since @code{strtok} alters the string it is parsing,
+you should always copy the string to a temporary buffer before parsing
+it with @code{strtok}.  If you allow @code{strtok} to modify a string
+that came from another part of your program, you are asking for trouble;
+that string may be part of a data structure that could be used for other
+purposes during the parsing, when alteration by @code{strtok} makes the
+data structure temporarily inaccurate.
+
+The string that you are operating on might even be a constant.  Then
+when @code{strtok} tries to modify it, your program will get a fatal
+signal for writing in read-only memory.  @xref{Program Error Signals}.
+
+This is a special case of a general principle: if a part of a program
+does not have as its purpose the modification of a certain data
+structure, then it is error-prone to modify the data structure
+temporarily.
+
+The function @code{strtok} is not reentrant.  @xref{Nonreentrancy}, for
+a discussion of where and why reentrancy is important.
+
+Here is a simple example showing the use of @code{strtok}.
+
+@comment Yes, this example has been tested.
+@smallexample
+#include <string.h>
+#include <stddef.h>
+
+@dots{}
+
+char string[] = "words separated by spaces -- and, punctuation!";
+const char delimiters[] = " .,;:!-";
+char *token;
+
+@dots{}
+
+token = strtok (string, delimiters);  /* token => "words" */
+token = strtok (NULL, delimiters);    /* token => "separated" */
+token = strtok (NULL, delimiters);    /* token => "by" */
+token = strtok (NULL, delimiters);    /* token => "spaces" */
+token = strtok (NULL, delimiters);    /* token => "and" */
+token = strtok (NULL, delimiters);    /* token => "punctuation" */
+token = strtok (NULL, delimiters);    /* token => NULL */
+@end smallexample
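+
+In practice the calls are usually written as a loop that stops when
+@code{strtok} returns a null pointer.  The sketch below also follows the
+advice above about parsing a temporary copy rather than the caller's
+string.  The name @code{count_tokens} and the use of @code{-1} to report
+an allocation failure are assumptions made for this example.
+
```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not a library function): count the tokens in
   TEXT without modifying it.  Returns -1 if no memory is available.  */
int
count_tokens (const char *text, const char *delimiters)
{
  size_t size = strlen (text) + 1;
  char *copy = malloc (size);
  char *token;
  int count = 0;

  if (copy == NULL)
    return -1;

  /* strtok writes null characters into its argument, so work on a
     private copy and leave TEXT alone.  */
  memcpy (copy, text, size);

  for (token = strtok (copy, delimiters);
       token != NULL;
       token = strtok (NULL, delimiters))
    count++;

  free (copy);
  return count;
}
```
+
+Because @code{strtok} keeps its position in internal state, a helper
+like this must not be called in the middle of another series of
+@code{strtok} calls.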