Diffstat (limited to 'manual/string.texi')
-rw-r--r-- | manual/string.texi | 3000 |
1 files changed, 0 insertions, 3000 deletions
diff --git a/manual/string.texi b/manual/string.texi deleted file mode 100644 index b8810d66b7..0000000000 --- a/manual/string.texi +++ /dev/null @@ -1,3000 +0,0 @@ -@node String and Array Utilities, Character Set Handling, Character Handling, Top -@c %MENU% Utilities for copying and comparing strings and arrays -@chapter String and Array Utilities - -Operations on strings (null-terminated byte sequences) are an important part of -many programs. @Theglibc{} provides an extensive set of string -utility functions, including functions for copying, concatenating, -comparing, and searching strings. Many of these functions can also -operate on arbitrary regions of storage; for example, the @code{memcpy} -function can be used to copy the contents of any kind of array. - -It's fairly common for beginning C programmers to ``reinvent the wheel'' -by duplicating this functionality in their own code, but it pays to -become familiar with the library functions and to make use of them, -since this offers benefits in maintenance, efficiency, and portability. - -For instance, you could easily compare one string to another in two -lines of C code, but if you use the built-in @code{strcmp} function, -you're less likely to make a mistake. And, since these library -functions are typically highly optimized, your program may run faster -too. - -@menu -* Representation of Strings:: Introduction to basic concepts. -* String/Array Conventions:: Whether to use a string function or an - arbitrary array function. -* String Length:: Determining the length of a string. -* Copying Strings and Arrays:: Functions to copy strings and arrays. -* Concatenating Strings:: Functions to concatenate strings while copying. -* Truncating Strings:: Functions to truncate strings while copying. -* String/Array Comparison:: Functions for byte-wise and character-wise - comparison. -* Collation Functions:: Functions for collating strings. -* Search Functions:: Searching for a specific element or substring. 
-* Finding Tokens in a String:: Splitting a string into tokens by looking - for delimiters. -* Erasing Sensitive Data:: Clearing memory which contains sensitive - data, after it's no longer needed. -* strfry:: Function for flash-cooking a string. -* Trivial Encryption:: Obscuring data. -* Encode Binary Data:: Encoding and Decoding of Binary Data. -* Argz and Envz Vectors:: Null-separated string vectors. -@end menu - -@node Representation of Strings -@section Representation of Strings -@cindex string, representation of - -This section is a quick summary of string concepts for beginning C -programmers. It describes how strings are represented in C -and some common pitfalls. If you are already familiar with this -material, you can skip this section. - -@cindex string -A @dfn{string} is a null-terminated array of bytes of type @code{char}, -including the terminating null byte. String-valued -variables are usually declared to be pointers of type @code{char *}. -Such variables do not include space for the text of a string; that has -to be stored somewhere else---in an array variable, a string constant, -or dynamically allocated memory (@pxref{Memory Allocation}). It's up to -you to store the address of the chosen memory space into the pointer -variable. Alternatively you can store a @dfn{null pointer} in the -pointer variable. The null pointer does not point anywhere, so -attempting to reference the string it points to gets an error. - -@cindex multibyte character -@cindex multibyte string -@cindex wide string -A @dfn{multibyte character} is a sequence of one or more bytes that -represents a single character using the locale's encoding scheme; a -null byte always represents the null character. A @dfn{multibyte -string} is a string that consists entirely of multibyte -characters. In contrast, a @dfn{wide string} is a null-terminated -sequence of @code{wchar_t} objects. 
A wide-string variable is usually -declared to be a pointer of type @code{wchar_t *}, by analogy with -string variables and @code{char *}. @xref{Extended Char Intro}. - -@cindex null byte -@cindex null wide character -By convention, the @dfn{null byte}, @code{'\0'}, -marks the end of a string and the @dfn{null wide character}, -@code{L'\0'}, marks the end of a wide string. For example, in -testing to see whether the @code{char *} variable @var{p} points to a -null byte marking the end of a string, you can write -@code{!*@var{p}} or @code{*@var{p} == '\0'}. - -A null byte is quite different conceptually from a null pointer, -although both are represented by the integer constant @code{0}. - -@cindex string literal -A @dfn{string literal} appears in C program source as a multibyte -string between double-quote characters (@samp{"}). If the -initial double-quote character is immediately preceded by a capital -@samp{L} (ell) character (as in @code{L"foo"}), it is a wide string -literal. String literals can also contribute to @dfn{string -concatenation}: @code{"a" "b"} is the same as @code{"ab"}. -For wide strings one can use either -@code{L"a" L"b"} or @code{L"a" "b"}. Modification of string literals is -not allowed by the GNU C compiler, because literals are placed in -read-only storage. - -Arrays that are declared @code{const} cannot be modified -either. It's generally good style to declare non-modifiable string -pointers to be of type @code{const char *}, since this often allows the -C compiler to detect accidental modifications as well as providing some -amount of documentation about what your program intends to do with the -string. - -The amount of memory allocated for a byte array may extend past the null byte -that marks the end of the string that the array contains. 
In this -document, the term @dfn{allocated size} is always used to refer to the -total amount of memory allocated for an array, while the term -@dfn{length} refers to the number of bytes up to (but not including) -the terminating null byte. Wide strings are similar, except their -sizes and lengths count wide characters, not bytes. -@cindex length of string -@cindex allocation size of string -@cindex size of string -@cindex string length -@cindex string allocation - -A notorious source of program bugs is trying to put more bytes into a -string than fit in its allocated size. When writing code that extends -strings or moves bytes into a pre-allocated array, you should be -very careful to keep track of the length of the text and make explicit -checks for overflowing the array. Many of the library functions -@emph{do not} do this for you! Remember also that you need to allocate -an extra byte to hold the null byte that marks the end of the -string. - -@cindex single-byte string -@cindex multibyte string -Originally strings were sequences of bytes where each byte represented a -single character. This is still true today if the strings are encoded -using a single-byte character encoding. Things are different if the -strings are encoded using a multibyte encoding (for more information on -encodings see @ref{Extended Char Intro}). There is no difference in -the programming interface for these two kinds of strings; the programmer -has to be aware of this and interpret the byte sequences accordingly. - -But since there is no separate interface taking care of these -differences, the byte-based string functions are sometimes hard to use. -Since the count parameters of these functions specify bytes, a call to -@code{memcpy} could cut a multibyte character in the middle and put an -incomplete (and therefore unusable) byte sequence in the target buffer. 
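To make the truncation hazard concrete, here is a minimal sketch under a loud assumption: the text is valid UTF-8, which is only one of many possible multibyte encodings. The helper name @code{utf8_safe_prefix} is hypothetical and not part of the library. It returns the largest byte count not exceeding a limit that does not cut a character in the middle.

```c
#include <stddef.h>

/* Hypothetical helper, not a library function.  Assumes S is a
   null-terminated string of valid UTF-8.  Returns the largest byte
   count N <= MAX such that the first N bytes of S end on a character
   boundary.  */
size_t
utf8_safe_prefix (const char *s, size_t max)
{
  size_t n = 0;

  /* Advance up to MAX bytes, stopping early at the null byte.  */
  while (n < max && s[n] != '\0')
    n++;

  /* s[n] is the first byte that would be dropped.  In UTF-8 every
     continuation byte has the bit pattern 10xxxxxx, so if the first
     dropped byte is a continuation byte the cut is inside a
     character; back up to that character's lead byte.  */
  while (n > 0 && ((unsigned char) s[n] & 0xC0) == 0x80)
    n--;

  return n;
}
```

A caller would then copy @code{utf8_safe_prefix (s, limit)} bytes with @code{memcpy} and append the null byte itself. For other encodings the boundary test is different, and a locale-aware function such as @code{mbrlen} would be needed instead.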
- -@cindex wide string -To avoid these problems, later versions of the @w{ISO C} standard -introduce a second set of functions that operate on @dfn{wide -characters} (@pxref{Extended Char Intro}). These functions don't have -the problems the single-byte versions have since every wide character is -a legal, interpretable value. This does not mean that cutting wide -strings at arbitrary points is without problems. It normally -is for alphabet-based languages (except for non-normalized text) but -languages based on syllables still have the problem that more than one -wide character is necessary to complete a logical unit. This is a -higher level problem which the @w{C library} functions are not designed -to solve. But it is at least good that no invalid byte sequences can be -created. Also, higher level functions can operate much more easily -on wide characters than on multibyte characters, so a common strategy -is to use wide characters internally whenever text is more than simply -copied. - -The remainder of this chapter will discuss the functions for handling -wide strings in parallel with the discussion of -strings since there is almost always an exact equivalent -available. - -@node String/Array Conventions -@section String and Array Conventions - -This chapter describes both functions that work on arbitrary arrays or -blocks of memory, and functions that are specific to strings and wide -strings. - -Functions that operate on arbitrary blocks of memory have names -beginning with @samp{mem} and @samp{wmem} (such as @code{memcpy} and -@code{wmemcpy}) and invariably take an argument which specifies the size -(in bytes and wide characters respectively) of the block of memory to -operate on. The array arguments and return values for these functions -have type @code{void *} or @code{wchar_t *}. As a matter of style, the -elements of the arrays used with the @samp{mem} functions are referred -to as ``bytes''. 
You can pass any kind of pointer to these functions, -and the @code{sizeof} operator is useful in computing the value for the -size argument. Parameters to the @samp{wmem} functions must be of type -@code{wchar_t *}. These functions are not really usable with anything -but arrays of this type. - -In contrast, functions that operate specifically on strings and wide -strings have names beginning with @samp{str} and @samp{wcs} -respectively (such as @code{strcpy} and @code{wcscpy}) and look for a -terminating null byte or null wide character instead of requiring an explicit -size argument to be passed. (Some of these functions accept a specified -maximum length, but they also check for premature termination.) -The array arguments and return values for these -functions have type @code{char *} and @code{wchar_t *} respectively, and -the array elements are referred to as ``bytes'' and ``wide -characters''. - -In many cases, there are both @samp{mem} and @samp{str}/@samp{wcs} -versions of a function. The one that is more appropriate to use depends -on the exact situation. When your program is manipulating arbitrary -arrays or blocks of storage, then you should always use the @samp{mem} -functions. On the other hand, when you are manipulating -strings it is usually more convenient to use the @samp{str}/@samp{wcs} -functions, unless you already know the length of the string in advance. -The @samp{wmem} functions should be used for wide character arrays with -known size. - -@cindex wint_t -@cindex parameter promotion -Some of the memory and string functions take single characters as -arguments. Since a value of type @code{char} is automatically promoted -into a value of type @code{int} when used as a parameter, the functions -are declared with @code{int} as the type of the parameter in question. -In case of the wide character functions the situation is similar: the -parameter type for a single wide character is @code{wint_t} and not -@code{wchar_t}. 
This would for many implementations not be necessary -since @code{wchar_t} is large enough to not be automatically -promoted, but since the @w{ISO C} standard does not require such a -choice of types the @code{wint_t} type is used. - -@node String Length -@section String Length - -You can get the length of a string using the @code{strlen} function. -This function is declared in the header file @file{string.h}. -@pindex string.h - -@comment string.h -@comment ISO -@deftypefun size_t strlen (const char *@var{s}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -The @code{strlen} function returns the length of the -string @var{s} in bytes. (In other words, it returns the offset of the -terminating null byte within the array.) - -For example, -@smallexample -strlen ("hello, world") - @result{} 12 -@end smallexample - -When applied to an array, the @code{strlen} function returns -the length of the string stored there, not its allocated size. You can -get the allocated size of the array that holds a string using -the @code{sizeof} operator: - -@smallexample -char string[32] = "hello, world"; -sizeof (string) - @result{} 32 -strlen (string) - @result{} 12 -@end smallexample - -But beware, this will not work unless @var{string} is the -array itself, not a pointer to it. For example: - -@smallexample -char string[32] = "hello, world"; -char *ptr = string; -sizeof (string) - @result{} 32 -sizeof (ptr) - @result{} 4 /* @r{(on a machine with 4 byte pointers)} */ -@end smallexample - -This is an easy mistake to make when you are working with functions that -take string arguments; those arguments are always pointers, not arrays. - -It must also be noted that for multibyte encoded strings the return -value does not have to correspond to the number of characters in the -string. 
To get this value the string can be converted to wide -characters and @code{wcslen} can be used or something like the following -code can be used: - -@smallexample -/* @r{The input is in @code{string}.} - @r{The length is expected in @code{n}.} */ -@{ - mbstate_t t; - const char *scopy = string; - /* In initial state. */ - memset (&t, '\0', sizeof (t)); - /* Determine number of characters. */ - n = mbsrtowcs (NULL, &scopy, strlen (scopy), &t); -@} -@end smallexample - -This is cumbersome to do, so if the number of characters (as opposed to -bytes) is needed often, it is better to work with wide characters. -@end deftypefun - -The wide character equivalent is declared in @file{wchar.h}. - -@comment wchar.h -@comment ISO -@deftypefun size_t wcslen (const wchar_t *@var{ws}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -The @code{wcslen} function is the wide character equivalent to -@code{strlen}. The return value is the number of wide characters in the -wide string pointed to by @var{ws} (this is also the offset of -the terminating null wide character of @var{ws}). - -Since there are no multi wide character sequences making up one wide -character, the return value is not only the offset in the array, it is -also the number of wide characters. - -This function was introduced in @w{Amendment 1} to @w{ISO C90}. -@end deftypefun - -@comment string.h -@comment GNU -@deftypefun size_t strnlen (const char *@var{s}, size_t @var{maxlen}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -If the array @var{s} of size @var{maxlen} contains a null byte, -the @code{strnlen} function returns the length of the string @var{s} in -bytes. Otherwise it -returns @var{maxlen}. Therefore this function is equivalent to -@code{(strlen (@var{s}) < @var{maxlen} ? strlen (@var{s}) : @var{maxlen})} -but it -is more efficient and works even if @var{s} is not null-terminated so -long as @var{maxlen} does not exceed the size of @var{s}'s array. 
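The case where @var{s} is not null-terminated is the one in which @code{strnlen} matters most. The following is a minimal sketch, assuming a fixed-width record field that is padded with null bytes only when the text is shorter than the field. The helper name @code{field_to_string} is hypothetical and not part of the library.

```c
#define _POSIX_C_SOURCE 200809L   /* for strnlen on strict compilers */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not a library function.  Copy the text held in
   FIELD, a buffer of WIDTH bytes that may lack a terminating null
   byte, into DST as a proper string.  Returns 0 on success, or -1 if
   DST (of DSTSIZE bytes) is too small to hold the text plus the
   terminating null byte.  */
int
field_to_string (char *dst, size_t dstsize,
                 const char *field, size_t width)
{
  /* strnlen never reads past FIELD[WIDTH - 1], so this is safe even
     when the field is completely full.  */
  size_t len = strnlen (field, width);

  if (len >= dstsize)
    return -1;

  memcpy (dst, field, len);
  dst[len] = '\0';
  return 0;
}
```

Calling @code{strlen} on such a field would read beyond the array whenever all @var{width} bytes are in use, which is exactly the situation the bounded function avoids.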
- -@smallexample -char string[32] = "hello, world"; -strnlen (string, 32) - @result{} 12 -strnlen (string, 5) - @result{} 5 -@end smallexample - -This function is a GNU extension and is declared in @file{string.h}. -@end deftypefun - -@comment wchar.h -@comment GNU -@deftypefun size_t wcsnlen (const wchar_t *@var{ws}, size_t @var{maxlen}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -@code{wcsnlen} is the wide character equivalent to @code{strnlen}. The -@var{maxlen} parameter specifies the maximum number of wide characters. - -This function is a GNU extension and is declared in @file{wchar.h}. -@end deftypefun - -@node Copying Strings and Arrays -@section Copying Strings and Arrays - -You can use the functions described in this section to copy the contents -of strings, wide strings, and arrays. The @samp{str} and @samp{mem} -functions are declared in @file{string.h} while the @samp{w} functions -are declared in @file{wchar.h}. -@pindex string.h -@pindex wchar.h -@cindex copying strings and arrays -@cindex string copy functions -@cindex array copy functions -@cindex concatenating strings -@cindex string concatenation functions - -A helpful way to remember the ordering of the arguments to the functions -in this section is that it corresponds to an assignment expression, with -the destination array specified to the left of the source array. Most -of these functions return the address of the destination array; a few -return the address of the destination's terminating null, or of just -past the destination. - -Most of these functions do not work properly if the source and -destination arrays overlap. For example, if the beginning of the -destination array overlaps the end of the source array, the original -contents of that part of the source array may get overwritten before it -is copied. 
Even worse, in the case of the string functions, the null -byte marking the end of the string may be lost, and the copy -function might get stuck in a loop trashing all the memory allocated to -your program. - -All functions that have problems copying between overlapping arrays are -explicitly identified in this manual. In addition to functions in this -section, there are a few others like @code{sprintf} (@pxref{Formatted -Output Functions}) and @code{scanf} (@pxref{Formatted Input -Functions}). - -@comment string.h -@comment ISO -@deftypefun {void *} memcpy (void *restrict @var{to}, const void *restrict @var{from}, size_t @var{size}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -The @code{memcpy} function copies @var{size} bytes from the object -beginning at @var{from} into the object beginning at @var{to}. The -behavior of this function is undefined if the two arrays @var{to} and -@var{from} overlap; use @code{memmove} instead if overlapping is possible. - -The value returned by @code{memcpy} is the value of @var{to}. - -Here is an example of how you might use @code{memcpy} to copy the -contents of an array: - -@smallexample -struct foo *oldarray, *newarray; -int arraysize; -@dots{} -memcpy (newarray, oldarray, arraysize * sizeof (struct foo)); -@end smallexample -@end deftypefun - -@comment wchar.h -@comment ISO -@deftypefun {wchar_t *} wmemcpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}, size_t @var{size}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -The @code{wmemcpy} function copies @var{size} wide characters from the object -beginning at @var{wfrom} into the object beginning at @var{wto}. The -behavior of this function is undefined if the two arrays @var{wto} and -@var{wfrom} overlap; use @code{wmemmove} instead if overlapping is possible. - -The following is a possible implementation of @code{wmemcpy} but there -are more optimizations possible. 
- -@smallexample -wchar_t * -wmemcpy (wchar_t *restrict wto, const wchar_t *restrict wfrom, - size_t size) -@{ - return (wchar_t *) memcpy (wto, wfrom, size * sizeof (wchar_t)); -@} -@end smallexample - -The value returned by @code{wmemcpy} is the value of @var{wto}. - -This function was introduced in @w{Amendment 1} to @w{ISO C90}. -@end deftypefun - -@comment string.h -@comment GNU -@deftypefun {void *} mempcpy (void *restrict @var{to}, const void *restrict @var{from}, size_t @var{size}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -The @code{mempcpy} function is nearly identical to the @code{memcpy} -function. It copies @var{size} bytes from the object beginning at -@code{from} into the object pointed to by @var{to}. But instead of -returning the value of @var{to} it returns a pointer to the byte -following the last written byte in the object beginning at @var{to}. -I.e., the value is @code{((void *) ((char *) @var{to} + @var{size}))}. - -This function is useful in situations where a number of objects shall be -copied to consecutive memory positions. - -@smallexample -void * -combine (void *o1, size_t s1, void *o2, size_t s2) -@{ - void *result = malloc (s1 + s2); - if (result != NULL) - mempcpy (mempcpy (result, o1, s1), o2, s2); - return result; -@} -@end smallexample - -This function is a GNU extension. -@end deftypefun - -@comment wchar.h -@comment GNU -@deftypefun {wchar_t *} wmempcpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}, size_t @var{size}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -The @code{wmempcpy} function is nearly identical to the @code{wmemcpy} -function. It copies @var{size} wide characters from the object -beginning at @code{wfrom} into the object pointed to by @var{wto}. But -instead of returning the value of @var{wto} it returns a pointer to the -wide character following the last written wide character in the object -beginning at @var{wto}. I.e., the value is @code{@var{wto} + @var{size}}. 
- -This function is useful in situations where a number of objects are to be -copied to consecutive memory positions. - -The following is a possible implementation of @code{wmempcpy} but there -are more optimizations possible. - -@smallexample -wchar_t * -wmempcpy (wchar_t *restrict wto, const wchar_t *restrict wfrom, - size_t size) -@{ - return (wchar_t *) mempcpy (wto, wfrom, size * sizeof (wchar_t)); -@} -@end smallexample - -This function is a GNU extension. -@end deftypefun - -@comment string.h -@comment ISO -@deftypefun {void *} memmove (void *@var{to}, const void *@var{from}, size_t @var{size}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -@code{memmove} copies the @var{size} bytes at @var{from} into the -@var{size} bytes at @var{to}, even if those two blocks of space -overlap. In the case of overlap, @code{memmove} is careful to copy the -original values of the bytes in the block at @var{from}, including those -bytes which also belong to the block at @var{to}. - -The value returned by @code{memmove} is the value of @var{to}. -@end deftypefun - -@comment wchar.h -@comment ISO -@deftypefun {wchar_t *} wmemmove (wchar_t *@var{wto}, const wchar_t *@var{wfrom}, size_t @var{size}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -@code{wmemmove} copies the @var{size} wide characters at @var{wfrom} -into the @var{size} wide characters at @var{wto}, even if those two -blocks of space overlap. In the case of overlap, @code{wmemmove} is -careful to copy the original values of the wide characters in the block -at @var{wfrom}, including those wide characters which also belong to the -block at @var{wto}. - -The following is a possible implementation of @code{wmemmove} but there -are more optimizations possible. 
- -@smallexample -wchar_t * -wmemmove (wchar_t *wto, const wchar_t *wfrom, - size_t size) -@{ - return (wchar_t *) memmove (wto, wfrom, size * sizeof (wchar_t)); -@} -@end smallexample - -The value returned by @code{wmemmove} is the value of @var{wto}. - -This function was introduced in @w{Amendment 1} to @w{ISO C90}. -@end deftypefun - -@comment string.h -@comment SVID -@deftypefun {void *} memccpy (void *restrict @var{to}, const void *restrict @var{from}, int @var{c}, size_t @var{size}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -This function copies no more than @var{size} bytes from @var{from} to -@var{to}, stopping if a byte matching @var{c} is found. The return -value is a pointer into @var{to} one byte past where @var{c} was copied, -or a null pointer if no byte matching @var{c} appeared in the first -@var{size} bytes of @var{from}. -@end deftypefun - -@comment string.h -@comment ISO -@deftypefun {void *} memset (void *@var{block}, int @var{c}, size_t @var{size}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -This function copies the value of @var{c} (converted to an -@code{unsigned char}) into each of the first @var{size} bytes of the -object beginning at @var{block}. It returns the value of @var{block}. -@end deftypefun - -@comment wchar.h -@comment ISO -@deftypefun {wchar_t *} wmemset (wchar_t *@var{block}, wchar_t @var{wc}, size_t @var{size}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -This function copies the value of @var{wc} into each of the first -@var{size} wide characters of the object beginning at @var{block}. It -returns the value of @var{block}. -@end deftypefun - -@comment string.h -@comment ISO -@deftypefun {char *} strcpy (char *restrict @var{to}, const char *restrict @var{from}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -This copies bytes from the string @var{from} (up to and including -the terminating null byte) into the string @var{to}. Like -@code{memcpy}, this function has undefined results if the strings -overlap. 
The return value is the value of @var{to}. -@end deftypefun - -@comment wchar.h -@comment ISO -@deftypefun {wchar_t *} wcscpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -This copies wide characters from the wide string @var{wfrom} (up to and -including the terminating null wide character) into the string -@var{wto}. Like @code{wmemcpy}, this function has undefined results if -the strings overlap. The return value is the value of @var{wto}. -@end deftypefun - -@comment SVID -@deftypefun {char *} strdup (const char *@var{s}) -@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} -This function copies the string @var{s} into a newly -allocated string. The string is allocated using @code{malloc}; see -@ref{Unconstrained Allocation}. If @code{malloc} cannot allocate space -for the new string, @code{strdup} returns a null pointer. Otherwise it -returns a pointer to the new string. -@end deftypefun - -@comment wchar.h -@comment GNU -@deftypefun {wchar_t *} wcsdup (const wchar_t *@var{ws}) -@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} -This function copies the wide string @var{ws} -into a newly allocated string. The string is allocated using -@code{malloc}; see @ref{Unconstrained Allocation}. If @code{malloc} -cannot allocate space for the new string, @code{wcsdup} returns a null -pointer. Otherwise it returns a pointer to the new wide string. - -This function is a GNU extension. -@end deftypefun - -@comment string.h -@comment Unknown origin -@deftypefun {char *} stpcpy (char *restrict @var{to}, const char *restrict @var{from}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -This function is like @code{strcpy}, except that it returns a pointer to -the end of the string @var{to} (that is, the address of the terminating -null byte @code{to + strlen (from)}) rather than the beginning. 
- -For example, this program uses @code{stpcpy} to concatenate @samp{foo} -and @samp{bar} to produce @samp{foobar}, which it then prints. - -@smallexample -@include stpcpy.c.texi -@end smallexample - -This function is part of POSIX.1-2008 and later editions, but was -available in @theglibc{} and other systems as an extension long before -it was standardized. - -Its behavior is undefined if the strings overlap. The function is -declared in @file{string.h}. -@end deftypefun - -@comment wchar.h -@comment GNU -@deftypefun {wchar_t *} wcpcpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -This function is like @code{wcscpy}, except that it returns a pointer to -the end of the string @var{wto} (that is, the address of the terminating -null wide character @code{wto + wcslen (wfrom)}) rather than the beginning. - -This function is not part of ISO or POSIX but was found useful while -developing @theglibc{} itself. - -The behavior of @code{wcpcpy} is undefined if the strings overlap. - -@code{wcpcpy} is a GNU extension and is declared in @file{wchar.h}. -@end deftypefun - -@comment string.h -@comment GNU -@deftypefn {Macro} {char *} strdupa (const char *@var{s}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -This macro is similar to @code{strdup} but allocates the new string -using @code{alloca} instead of @code{malloc} (@pxref{Variable Size -Automatic}). This means of course the returned string has the same -limitations as any block of memory allocated using @code{alloca}. - -For obvious reasons @code{strdupa} is implemented only as a macro; -you cannot get the address of this function. Despite this limitation -it is a useful function. The following code shows a situation where -using @code{malloc} would be a lot more expensive. - -@smallexample -@include strdupa.c.texi -@end smallexample - -Please note that calling @code{strtok} using @var{path} directly is -invalid. 
It is also not allowed to call @code{strdupa} in the argument -list of @code{strtok} since @code{strdupa} uses @code{alloca} -(@pxref{Variable Size Automatic}), which can interfere with the parameter -passing. - -This function is only available if GNU CC is used. -@end deftypefn - -@comment string.h -@comment BSD -@deftypefun void bcopy (const void *@var{from}, void *@var{to}, size_t @var{size}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -This is a partially obsolete alternative to @code{memmove}, derived from -BSD. Note that it is not quite equivalent to @code{memmove}, because the -arguments are not in the same order and there is no return value. -@end deftypefun - -@comment string.h -@comment BSD -@deftypefun void bzero (void *@var{block}, size_t @var{size}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -This is a partially obsolete alternative to @code{memset}, derived from -BSD. Note that it is not as general as @code{memset}, because the only -value it can store is zero. -@end deftypefun - -@node Concatenating Strings -@section Concatenating Strings -@pindex string.h -@pindex wchar.h -@cindex concatenating strings -@cindex string concatenation functions - -The functions described in this section concatenate the contents of a -string or wide string to another. They follow the string-copying -functions in their conventions. @xref{Copying Strings and Arrays}. -@samp{strcat} is declared in the header file @file{string.h} while -@samp{wcscat} is declared in @file{wchar.h}. - -@comment string.h -@comment ISO -@deftypefun {char *} strcat (char *restrict @var{to}, const char *restrict @var{from}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -The @code{strcat} function is similar to @code{strcpy}, except that the -bytes from @var{from} are concatenated or appended to the end of -@var{to}, instead of overwriting it. That is, the first byte from -@var{from} overwrites the null byte marking the end of @var{to}. 
- -An equivalent definition for @code{strcat} would be: - -@smallexample -char * -strcat (char *restrict to, const char *restrict from) -@{ - strcpy (to + strlen (to), from); - return to; -@} -@end smallexample - -This function has undefined results if the strings overlap. - -As noted below, this function has significant performance issues. -@end deftypefun - -@comment wchar.h -@comment ISO -@deftypefun {wchar_t *} wcscat (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -The @code{wcscat} function is similar to @code{wcscpy}, except that the -wide characters from @var{wfrom} are concatenated or appended to the end of -@var{wto}, instead of overwriting it. That is, the first wide character from -@var{wfrom} overwrites the null wide character marking the end of @var{wto}. - -An equivalent definition for @code{wcscat} would be: - -@smallexample -wchar_t * -wcscat (wchar_t *wto, const wchar_t *wfrom) -@{ - wcscpy (wto + wcslen (wto), wfrom); - return wto; -@} -@end smallexample - -This function has undefined results if the strings overlap. - -As noted below, this function has significant performance issues. -@end deftypefun - -Programmers using the @code{strcat} or @code{wcscat} function (or the -@code{strncat} or @code{wcsncat} functions defined in -a later section, for that matter) -can easily be recognized as lazy and reckless. In almost all situations -the lengths of the participating strings are known (they had better be, -since how else can one ensure that the allocated size of the buffer is -sufficient?). Or at least, one could know them if one kept track of the -results of the various function calls. But then it is very inefficient -to use @code{strcat}/@code{wcscat}. A lot of time is wasted finding the -end of the destination string so that the actual copying can start. -This is a common example: - -@cindex va_copy -@smallexample -/* @r{This function concatenates arbitrarily many strings. 
The last} - @r{parameter must be @code{NULL}.} */ -char * -concat (const char *str, @dots{}) -@{ - va_list ap, ap2; - size_t total = 1; - const char *s; - char *result; - - va_start (ap, str); - va_copy (ap2, ap); - - /* @r{Determine how much space we need.} */ - for (s = str; s != NULL; s = va_arg (ap, const char *)) - total += strlen (s); - - va_end (ap); - - result = (char *) malloc (total); - if (result != NULL) - @{ - result[0] = '\0'; - - /* @r{Copy the strings.} */ - for (s = str; s != NULL; s = va_arg (ap2, const char *)) - strcat (result, s); - @} - - va_end (ap2); - - return result; -@} -@end smallexample - -This looks quite simple, especially the second loop where the strings -are actually copied. But these innocent lines hide a major performance -penalty. Just imagine that ten strings of 100 bytes each have to be -concatenated. For the second string we search the already stored 100 -bytes for the end of the string so that we can append the next string. -For all strings in total the comparisons necessary to find the end of -the intermediate results sums up to 5500! 
If we combine the copying -with the allocation of the result we can write this function more -efficiently: - -@smallexample -char * -concat (const char *str, @dots{}) -@{ - va_list ap; - size_t allocated = 100; - char *result = (char *) malloc (allocated); - - if (result != NULL) - @{ - char *newp; - char *wp; - const char *s; - - va_start (ap, str); - - wp = result; - for (s = str; s != NULL; s = va_arg (ap, const char *)) - @{ - size_t len = strlen (s); - - /* @r{Resize the allocated memory if necessary.} */ - if (wp + len + 1 > result + allocated) - @{ - allocated = (allocated + len) * 2; - newp = (char *) realloc (result, allocated); - if (newp == NULL) - @{ - va_end (ap); - free (result); - return NULL; - @} - wp = newp + (wp - result); - result = newp; - @} - - wp = mempcpy (wp, s, len); - @} - - /* @r{Terminate the result string.} */ - *wp++ = '\0'; - - /* @r{Resize memory to the optimal size.} */ - newp = realloc (result, wp - result); - if (newp != NULL) - result = newp; - - va_end (ap); - @} - - return result; -@} -@end smallexample - -With a bit more knowledge about the input strings one could fine-tune -the memory allocation. The difference we are pointing to here is that -we don't use @code{strcat} anymore. We always keep track of the length -of the current intermediate result so we can save ourselves the search for the -end of the string and use @code{mempcpy}. Please note that we also -don't use @code{stpcpy} which might seem more natural since we are handling -strings. But this is not necessary since we already know the -length of the string and therefore can use the faster memory copying -function. The example would work for wide characters the same way. - -Whenever a programmer feels the need to use @code{strcat} she or he -should think twice and look through the program to see whether the code cannot -be rewritten to take advantage of already calculated results. Again: it -is almost always unnecessary to use @code{strcat}. 
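As a smaller illustration of the same idea, the following sketch joins exactly two strings whose lengths are computed once and then reused for the copy. The helper name `join2` is hypothetical and not part of the library, and the sketch assumes that the sum of the two lengths does not overflow `size_t`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not a library function: return a newly
   allocated string holding A followed by B, or NULL if the
   allocation fails.  The caller frees the result.  */
char *
join2 (const char *a, const char *b)
{
  assert (a != NULL && b != NULL);

  size_t alen = strlen (a);
  size_t blen = strlen (b);
  char *result = malloc (alen + blen + 1);

  if (result != NULL)
    {
      /* Both lengths are already known, so there is no need to
         search for the end of the destination as strcat would.  */
      memcpy (result, a, alen);
      /* Copy B together with its terminating null byte.  */
      memcpy (result + alen, b, blen + 1);
    }

  return result;
}
```

Because `memcpy` is given the exact byte counts, each input byte is read once by `strlen` and once by the copy, no matter how long the first string is.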
- -@node Truncating Strings -@section Truncating Strings while Copying -@cindex truncating strings -@cindex string truncation - -The functions described in this section copy or concatenate the -possibly-truncated contents of a string or array to another, and -similarly for wide strings. They follow the string-copying functions -in their header conventions. @xref{Copying Strings and Arrays}. The -@samp{str} functions are declared in the header file @file{string.h} -and the @samp{wc} functions are declared in the file @file{wchar.h}. - -@comment string.h -@deftypefun {char *} strncpy (char *restrict @var{to}, const char *restrict @var{from}, size_t @var{size}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -This function is similar to @code{strcpy} but always copies exactly -@var{size} bytes into @var{to}. - -If @var{from} does not contain a null byte in its first @var{size} -bytes, @code{strncpy} copies just the first @var{size} bytes. In this -case no null terminator is written into @var{to}. - -Otherwise @var{from} must be a string with length less than -@var{size}. In this case @code{strncpy} copies all of @var{from}, -followed by enough null bytes to add up to @var{size} bytes in all. - -The behavior of @code{strncpy} is undefined if the strings overlap. - -This function was designed for now-rarely-used arrays consisting of -non-null bytes followed by zero or more null bytes. It needs to set -all @var{size} bytes of the destination, even when @var{size} is much -greater than the length of @var{from}. As noted below, this function -is generally a poor choice for processing text. -@end deftypefun - -@comment wchar.h -@comment ISO -@deftypefun {wchar_t *} wcsncpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}, size_t @var{size}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -This function is similar to @code{wcscpy} but always copies exactly -@var{size} wide characters into @var{wto}. 
- -If @var{wfrom} does not contain a null wide character in its first -@var{size} wide characters, then @code{wcsncpy} copies just the first -@var{size} wide characters. In this case no null terminator is -written into @var{wto}. - -Otherwise @var{wfrom} must be a wide string with length less than -@var{size}. In this case @code{wcsncpy} copies all of @var{wfrom}, -followed by enough null wide characters to add up to @var{size} wide -characters in all. - -The behavior of @code{wcsncpy} is undefined if the strings overlap. - -This function is the wide-character counterpart of @code{strncpy} and -suffers from most of the problems that @code{strncpy} does. For -example, as noted below, this function is generally a poor choice for -processing text. -@end deftypefun - -@comment string.h -@comment GNU -@deftypefun {char *} strndup (const char *@var{s}, size_t @var{size}) -@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} -This function is similar to @code{strdup} but always copies at most -@var{size} bytes into the newly allocated string. - -If the length of @var{s} is more than @var{size}, then @code{strndup} -copies just the first @var{size} bytes and adds a closing null byte. -Otherwise all bytes are copied and the string is terminated. - -This function differs from @code{strncpy} in that it always terminates -the destination string. - -As noted below, this function is generally a poor choice for -processing text. - -@code{strndup} is a GNU extension. -@end deftypefun - -@comment string.h -@comment GNU -@deftypefn {Macro} {char *} strndupa (const char *@var{s}, size_t @var{size}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -This function is similar to @code{strndup} but like @code{strdupa} it -allocates the new string using @code{alloca} (@pxref{Variable Size -Automatic}). The same advantages and limitations of @code{strdupa} are -valid for @code{strndupa}, too. - -This function is implemented only as a macro, just like @code{strdupa}. 
-Just like @code{strdupa}, this macro must not be used inside the -parameter list in a function call. - -As noted below, this function is generally a poor choice for -processing text. - -@code{strndupa} is only available if GNU CC is used. -@end deftypefn - -@comment string.h -@comment GNU -@deftypefun {char *} stpncpy (char *restrict @var{to}, const char *restrict @var{from}, size_t @var{size}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -This function is similar to @code{stpcpy} but always copies exactly -@var{size} bytes into @var{to}. - -If the length of @var{from} is more than @var{size}, then @code{stpncpy} -copies just the first @var{size} bytes and returns a pointer to the -byte directly following the one which was copied last. Note that in -this case there is no null terminator written into @var{to}. - -If the length of @var{from} is less than @var{size}, then @code{stpncpy} -copies all of @var{from}, followed by enough null bytes to add up -to @var{size} bytes in all. This behavior is rarely useful, but it -is provided for contexts where this behavior of -@code{strncpy} is relied upon. @code{stpncpy} returns a pointer to the -@emph{first} written null byte. - -This function is not part of ISO or POSIX but was found useful while -developing @theglibc{} itself. - -Its behavior is undefined if the strings overlap. The function is -declared in @file{string.h}. - -As noted below, this function is generally a poor choice for -processing text. -@end deftypefun - -@comment wchar.h -@comment GNU -@deftypefun {wchar_t *} wcpncpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}, size_t @var{size}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -This function is similar to @code{wcpcpy} but always copies exactly -@var{size} wide characters into @var{wto}. 
- -If the length of @var{wfrom} is more than @var{size}, then -@code{wcpncpy} copies just the first @var{size} wide characters and -returns a pointer to the wide character directly following the last -non-null wide character which was copied last. Note that in this case -there is no null terminator written into @var{wto}. - -If the length of @var{wfrom} is less than @var{size}, then @code{wcpncpy} -copies all of @var{wfrom}, followed by enough null wide characters to add up -to @var{size} wide characters in all. This behavior is rarely useful, but it -is implemented to be useful in contexts where this behavior of the -@code{wcsncpy} is used. @code{wcpncpy} returns a pointer to the -@emph{first} written null wide character. - -This function is not part of ISO or POSIX but was found useful while -developing @theglibc{} itself. - -Its behavior is undefined if the strings overlap. - -As noted below, this function is generally a poor choice for -processing text. - -@code{wcpncpy} is a GNU extension. -@end deftypefun - -@comment string.h -@comment ISO -@deftypefun {char *} strncat (char *restrict @var{to}, const char *restrict @var{from}, size_t @var{size}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -This function is like @code{strcat} except that not more than @var{size} -bytes from @var{from} are appended to the end of @var{to}, and -@var{from} need not be null-terminated. A single null byte is also -always appended to @var{to}, so the total -allocated size of @var{to} must be at least @code{@var{size} + 1} bytes -longer than its initial length. - -The @code{strncat} function could be implemented like this: - -@smallexample -@group -char * -strncat (char *to, const char *from, size_t size) -@{ - size_t len = strlen (to); - memcpy (to + len, from, strnlen (from, size)); - to[len + strnlen (from, size)] = '\0'; - return to; -@} -@end group -@end smallexample - -The behavior of @code{strncat} is undefined if the strings overlap. 
- -As a companion to @code{strncpy}, @code{strncat} was designed for -now-rarely-used arrays consisting of non-null bytes followed by zero -or more null bytes. As noted below, this function is generally a poor -choice for processing text. Also, this function has significant -performance issues. @xref{Concatenating Strings}. -@end deftypefun - -@comment wchar.h -@comment ISO -@deftypefun {wchar_t *} wcsncat (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}, size_t @var{size}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -This function is like @code{wcscat} except that not more than @var{size} -wide characters from @var{wfrom} are appended to the end of @var{wto}, -and @var{wfrom} need not be null-terminated. A single null wide -character is also always appended to @var{wto}, so the total allocated -size of @var{wto} must be at least @code{wcsnlen (@var{wfrom}, -@var{size}) + 1} wide characters longer than its initial length. - -The @code{wcsncat} function could be implemented like this: - -@smallexample -@group -wchar_t * -wcsncat (wchar_t *restrict wto, const wchar_t *restrict wfrom, - size_t size) -@{ - size_t len = wcslen (wto); - memcpy (wto + len, wfrom, wcsnlen (wfrom, size) * sizeof (wchar_t)); - wto[len + wcsnlen (wfrom, size)] = L'\0'; - return wto; -@} -@end group -@end smallexample - -The behavior of @code{wcsncat} is undefined if the strings overlap. - -As noted below, this function is generally a poor choice for -processing text. Also, this function has significant performance -issues. @xref{Concatenating Strings}. -@end deftypefun - -Because these functions can abruptly truncate strings or wide strings, -they are generally poor choices for processing text. When copying or -concatenating multibyte strings, they can truncate within a multibyte -character so that the result is not a valid multibyte string. 
When -combining or concatenating multibyte or wide strings, they may -truncate the output after a combining character, resulting in a -corrupted grapheme. They can cause bugs even when processing -single-byte strings: for example, when calculating an ASCII-only user -name, a truncated name can identify the wrong user. - -Although some buffer overruns can be prevented by manually replacing -calls to copying functions with calls to truncation functions, there -are often easier and safer automatic techniques that cause buffer -overruns to reliably terminate a program, such as GCC's -@option{-fcheck-pointer-bounds} and @option{-fsanitize=address} -options. @xref{Debugging Options,, Options for Debugging Your Program -or GCC, gcc.info, Using GCC}. Because truncation functions can mask -application bugs that would otherwise be caught by the automatic -techniques, these functions should be used only when the application's -underlying logic requires truncation. - -@strong{Note:} GNU programs should not truncate strings or wide -strings to fit arbitrary size limits. @xref{Semantics, , Writing -Robust Programs, standards, The GNU Coding Standards}. Instead of -string-truncation functions, it is usually better to use dynamic -memory allocation (@pxref{Unconstrained Allocation}) and functions -such as @code{strdup} or @code{asprintf} to construct strings. - -@node String/Array Comparison -@section String/Array Comparison -@cindex comparing strings and arrays -@cindex string comparison functions -@cindex array comparison functions -@cindex predicates on strings -@cindex predicates on arrays - -You can use the functions in this section to perform comparisons on the -contents of strings and arrays. As well as checking for equality, these -functions can also be used as the ordering functions for sorting -operations. @xref{Searching and Sorting}, for an example of this. 
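As a minimal sketch of using a comparison function as an ordering function, the code below sorts an array of string pointers with `qsort` and `strcmp`. The names `compare_strings` and `sort_strings_bytewise` are hypothetical helpers introduced only for this illustration; the resulting order is plain byte order, not a locale-aware one.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort passes pointers to the array elements, so each argument
   points to a `const char *'.  */
static int
compare_strings (const void *v1, const void *v2)
{
  const char *const *p1 = v1;
  const char *const *p2 = v2;

  return strcmp (*p1, *p2);
}

/* Hypothetical helper: sort NSTRINGS string pointers in ARRAY
   by byte-wise comparison.  */
void
sort_strings_bytewise (const char **array, size_t nstrings)
{
  assert (array != NULL || nstrings == 0);

  if (nstrings > 0)
    qsort (array, nstrings, sizeof *array, compare_strings);
}
```

Only the sign of the value returned by `strcmp` matters to `qsort`, so the comparison function can return it unchanged.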
- -Unlike most comparison operations in C, the string comparison functions -return a nonzero value if the strings are @emph{not} equivalent rather -than if they are. The sign of the value indicates the relative ordering -of the first part of the strings that are not equivalent: a -negative value indicates that the first string is ``less'' than the -second, while a positive value indicates that the first string is -``greater''. - -The most common use of these functions is to check only for equality. -This is canonically done with an expression like @w{@samp{! strcmp (s1, s2)}}. - -All of these functions are declared in the header file @file{string.h}. -@pindex string.h - -@comment string.h -@comment ISO -@deftypefun int memcmp (const void *@var{a1}, const void *@var{a2}, size_t @var{size}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -The function @code{memcmp} compares the @var{size} bytes of memory -beginning at @var{a1} against the @var{size} bytes of memory beginning -at @var{a2}. The value returned has the same sign as the difference -between the first differing pair of bytes (interpreted as @code{unsigned -char} objects, then promoted to @code{int}). - -If the contents of the two blocks are equal, @code{memcmp} returns -@code{0}. -@end deftypefun - -@comment wchar.h -@comment ISO -@deftypefun int wmemcmp (const wchar_t *@var{a1}, const wchar_t *@var{a2}, size_t @var{size}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -The function @code{wmemcmp} compares the @var{size} wide characters -beginning at @var{a1} against the @var{size} wide characters beginning -at @var{a2}. The value returned is smaller than or larger than zero -depending on whether the first differing wide character in @var{a1} is -smaller or larger than the corresponding wide character in @var{a2}. - -If the contents of the two blocks are equal, @code{wmemcmp} returns -@code{0}. -@end deftypefun - -On arbitrary arrays, the @code{memcmp} function is mostly useful for -testing equality. 
It usually isn't meaningful to do byte-wise ordering -comparisons on arrays of things other than bytes. For example, a -byte-wise comparison on the bytes that make up floating-point numbers -isn't likely to tell you anything about the relationship between the -values of the floating-point numbers. - -@code{wmemcmp} is really only useful to compare arrays of type -@code{wchar_t} since the function looks at @code{sizeof (wchar_t)} bytes -at a time and this number of bytes is system dependent. - -You should also be careful about using @code{memcmp} to compare objects -that can contain ``holes'', such as the padding inserted into structure -objects to enforce alignment requirements, extra space at the end of -unions, and extra bytes at the ends of strings whose length is less -than their allocated size. The contents of these ``holes'' are -indeterminate and may cause strange behavior when performing byte-wise -comparisons. For more predictable results, perform an explicit -component-wise comparison. - -For example, given a structure type definition like: - -@smallexample -struct foo - @{ - unsigned char tag; - union - @{ - double f; - long i; - char *p; - @} value; - @}; -@end smallexample - -@noindent -you are better off writing a specialized comparison function to compare -@code{struct foo} objects instead of comparing them with @code{memcmp}. - -@comment string.h -@comment ISO -@deftypefun int strcmp (const char *@var{s1}, const char *@var{s2}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -The @code{strcmp} function compares the string @var{s1} against -@var{s2}, returning a value that has the same sign as the difference -between the first differing pair of bytes (interpreted as -@code{unsigned char} objects, then promoted to @code{int}). - -If the two strings are equal, @code{strcmp} returns @code{0}. 
- -A consequence of the ordering used by @code{strcmp} is that if @var{s1} -is an initial substring of @var{s2}, then @var{s1} is considered to be -``less than'' @var{s2}. - -@code{strcmp} does not take sorting conventions of the language the -strings are written in into account. To get locale-aware ordering, use -@code{strcoll}. -@end deftypefun - -@comment wchar.h -@comment ISO -@deftypefun int wcscmp (const wchar_t *@var{ws1}, const wchar_t *@var{ws2}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} - -The @code{wcscmp} function compares the wide string @var{ws1} -against @var{ws2}. The value returned is smaller than or larger than zero -depending on whether the first differing wide character in @var{ws1} is -smaller or larger than the corresponding wide character in @var{ws2}. - -If the two strings are equal, @code{wcscmp} returns @code{0}. - -A consequence of the ordering used by @code{wcscmp} is that if @var{ws1} -is an initial substring of @var{ws2}, then @var{ws1} is considered to be -``less than'' @var{ws2}. - -@code{wcscmp} does not take sorting conventions of the language the -strings are written in into account. To get locale-aware ordering, use -@code{wcscoll}. -@end deftypefun - -@comment string.h -@comment BSD -@deftypefun int strcasecmp (const char *@var{s1}, const char *@var{s2}) -@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}} -@c Although this calls tolower multiple times, it's a macro, and -@c strcasecmp is optimized so that the locale pointer is read only once. -@c There are some asm implementations too, for which the single-read -@c from locale TLS pointers also applies. -This function is like @code{strcmp}, except that differences in case are -ignored, and its arguments must be multibyte strings. -How uppercase and lowercase characters are related is -determined by the currently selected locale. 
In the standard @code{"C"} -locale the characters @"A and @"a do not match but in a locale which -regards these characters as parts of the alphabet they do match. - -@noindent -@code{strcasecmp} is derived from BSD. -@end deftypefun - -@comment wchar.h -@comment GNU -@deftypefun int wcscasecmp (const wchar_t *@var{ws1}, const wchar_t *@var{ws2}) -@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}} -@c Since towlower is not a macro, the locale object may be read multiple -@c times. -This function is like @code{wcscmp}, except that differences in case are -ignored. How uppercase and lowercase characters are related is -determined by the currently selected locale. In the standard @code{"C"} -locale the characters @"A and @"a do not match but in a locale which -regards these characters as parts of the alphabet they do match. - -@noindent -@code{wcscasecmp} is a GNU extension. -@end deftypefun - -@comment string.h -@comment ISO -@deftypefun int strncmp (const char *@var{s1}, const char *@var{s2}, size_t @var{size}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -This function is similar to @code{strcmp}, except that no more than -@var{size} bytes are compared. In other words, if the two -strings are the same in their first @var{size} bytes, the -return value is zero. -@end deftypefun - -@comment wchar.h -@comment ISO -@deftypefun int wcsncmp (const wchar_t *@var{ws1}, const wchar_t *@var{ws2}, size_t @var{size}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -This function is similar to @code{wcscmp}, except that no more than -@var{size} wide characters are compared. In other words, if the two -strings are the same in their first @var{size} wide characters, the -return value is zero. 
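One common use of a bounded comparison is a prefix test. The following is only a sketch: `has_wide_prefix` is a hypothetical helper name, and both arguments are assumed to be valid null-terminated wide strings.

```c
#include <assert.h>
#include <stdbool.h>
#include <wchar.h>

/* Hypothetical helper: return true if the wide string WS begins
   with the wide string PREFIX.  */
bool
has_wide_prefix (const wchar_t *ws, const wchar_t *prefix)
{
  assert (ws != NULL && prefix != NULL);

  /* Compare at most as many wide characters as PREFIX contains.
     If WS is shorter than PREFIX, its null terminator differs
     from the corresponding character of PREFIX and the
     comparison stops there.  */
  return wcsncmp (ws, prefix, wcslen (prefix)) == 0;
}
```

The byte-string version has the same shape, with `strncmp` and `strlen` in place of `wcsncmp` and `wcslen`.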
-@end deftypefun - -@comment string.h -@comment BSD -@deftypefun int strncasecmp (const char *@var{s1}, const char *@var{s2}, size_t @var{n}) -@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}} -This function is like @code{strncmp}, except that differences in case -are ignored, and the compared parts of the arguments should consist of -valid multibyte characters. -Like @code{strcasecmp}, it is locale dependent how -uppercase and lowercase characters are related. - -@noindent -@code{strncasecmp} is a GNU extension. -@end deftypefun - -@comment wchar.h -@comment GNU -@deftypefun int wcsncasecmp (const wchar_t *@var{ws1}, const wchar_t *@var{s2}, size_t @var{n}) -@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}} -This function is like @code{wcsncmp}, except that differences in case -are ignored. Like @code{wcscasecmp}, it is locale dependent how -uppercase and lowercase characters are related. - -@noindent -@code{wcsncasecmp} is a GNU extension. -@end deftypefun - -Here are some examples showing the use of @code{strcmp} and -@code{strncmp} (equivalent examples can be constructed for the wide -character functions). These examples assume the use of the ASCII -character set. (If some other character set---say, EBCDIC---is used -instead, then the glyphs are associated with different numeric codes, -and the return values and ordering may differ.) 
- -@smallexample -strcmp ("hello", "hello") - @result{} 0 /* @r{These two strings are the same.} */ -strcmp ("hello", "Hello") - @result{} 32 /* @r{Comparisons are case-sensitive.} */ -strcmp ("hello", "world") - @result{} -15 /* @r{The byte @code{'h'} comes before @code{'w'}.} */ -strcmp ("hello", "hello, world") - @result{} -44 /* @r{Comparing a null byte against a comma.} */ -strncmp ("hello", "hello, world", 5) - @result{} 0 /* @r{The initial 5 bytes are the same.} */ -strncmp ("hello, world", "hello, stupid world!!!", 5) - @result{} 0 /* @r{The initial 5 bytes are the same.} */ -@end smallexample - -@comment string.h -@comment GNU -@deftypefun int strverscmp (const char *@var{s1}, const char *@var{s2}) -@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}} -@c Calls isdigit multiple times, locale may change in between. -The @code{strverscmp} function compares the string @var{s1} against -@var{s2}, considering them as holding indices/version numbers. The -return value follows the same conventions as found in the -@code{strcmp} function. In fact, if @var{s1} and @var{s2} contain no -digits, @code{strverscmp} behaves like @code{strcmp} -(in the sense that the sign of the result is the same). - -The comparison algorithm which the @code{strverscmp} function implements -differs slightly from other version-comparison algorithms. The -implementation is based on a finite-state machine, whose behavior is -approximated below. - -@itemize @bullet -@item -The input strings are each split into sequences of non-digits and -digits. These sequences can be empty at the beginning and end of the -string. Digits are determined by the @code{isdigit} function and are -thus subject to the current locale. - -@item -Comparison starts with a (possibly empty) non-digit sequence. The first -non-equal sequences of non-digits or digits determines the outcome of -the comparison. 
- -@item -Corresponding non-digit sequences in both strings are compared -lexicographically if their lengths are equal. If the lengths differ, -the shorter non-digit sequence is extended with the input string -character immediately following it (which may be the null terminator), -the other sequence is truncated to be of the same (extended) length, and -these two sequences are compared lexicographically. In the last case, -the sequence comparison determines the result of the function because -the extension character (or some character before it) is necessarily -different from the character at the same offset in the other input -string. - -@item -For two sequences of digits, the number of leading zeros is counted (which -can be zero). If the count differs, the string with more leading zeros -in the digit sequence is considered smaller than the other string. - -@item -If the two sequences of digits have no leading zeros, they are compared -as integers, that is, the string with the longer digit sequence is -deemed larger, and if both sequences are of equal length, they are -compared lexicographically. - -@item -If both digit sequences start with a zero and have an equal number of -leading zeros, they are compared lexicographically if their lengths are -the same. If the lengths differ, the shorter sequence is extended with -the following character in its input string, and the other sequence is -truncated to the same length, and both sequences are compared -lexicographically (similar to the non-digit sequence case above). -@end itemize - -The treatment of leading zeros and the tie-breaking extension characters -(which in effect propagate across non-digit/digit sequence boundaries) -differs from other version-comparison algorithms. 
- -@smallexample -strverscmp ("no digit", "no digit") - @result{} 0 /* @r{same behavior as strcmp.} */ -strverscmp ("item#99", "item#100") - @result{} <0 /* @r{same prefix, but 99 < 100.} */ -strverscmp ("alpha1", "alpha001") - @result{} >0 /* @r{different number of leading zeros (0 and 2).} */ -strverscmp ("part1_f012", "part1_f01") - @result{} >0 /* @r{lexicographical comparison with leading zeros.} */ -strverscmp ("foo.009", "foo.0") - @result{} <0 /* @r{different number of leading zeros (2 and 1).} */ -@end smallexample - -@code{strverscmp} is a GNU extension. -@end deftypefun - -@comment string.h -@comment BSD -@deftypefun int bcmp (const void *@var{a1}, const void *@var{a2}, size_t @var{size}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -This is an obsolete alias for @code{memcmp}, derived from BSD. -@end deftypefun - -@node Collation Functions -@section Collation Functions - -@cindex collating strings -@cindex string collation functions - -In some locales, the conventions for lexicographic ordering differ from -the strict numeric ordering of character codes. For example, in Spanish -most glyphs with diacritical marks such as accents are not considered -distinct letters for the purposes of collation. On the other hand, the -two-character sequence @samp{ll} is treated as a single letter that is -collated immediately after @samp{l}. - -You can use the functions @code{strcoll} and @code{strxfrm} (declared in -the header file @file{string.h}) and @code{wcscoll} and @code{wcsxfrm} -(declared in the header file @file{wchar.h}) to compare strings using a -collation ordering appropriate for the current locale. The locale used -by these functions in particular can be specified by setting the locale -for the @code{LC_COLLATE} category; see @ref{Locales}. -@pindex string.h -@pindex wchar.h - -In the standard C locale, the collation sequence for @code{strcoll} is -the same as that for @code{strcmp}. 
Similarly, @code{wcscoll} and -@code{wcscmp} are the same in this situation. - -Effectively, the way these functions work is by applying a mapping to -transform the characters in a multibyte string to a byte -sequence that represents -the string's position in the collating sequence of the current locale. -Comparing two such byte sequences in a simple fashion is equivalent to -comparing the strings with the locale's collating sequence. - -The functions @code{strcoll} and @code{wcscoll} perform this translation -implicitly, in order to do one comparison. By contrast, @code{strxfrm} -and @code{wcsxfrm} perform the mapping explicitly. If you are making -multiple comparisons using the same string or set of strings, it is -likely to be more efficient to use @code{strxfrm} or @code{wcsxfrm} to -transform all the strings just once, and subsequently compare the -transformed strings with @code{strcmp} or @code{wcscmp}. - -@comment string.h -@comment ISO -@deftypefun int strcoll (const char *@var{s1}, const char *@var{s2}) -@safety{@prelim{}@mtsafe{@mtslocale{}}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} -@c Calls strcoll_l with the current locale, which dereferences only the -@c LC_COLLATE data pointer. -The @code{strcoll} function is similar to @code{strcmp} but uses the -collating sequence of the current locale for collation (the -@code{LC_COLLATE} locale). The arguments are multibyte strings. -@end deftypefun - -@comment wchar.h -@comment ISO -@deftypefun int wcscoll (const wchar_t *@var{ws1}, const wchar_t *@var{ws2}) -@safety{@prelim{}@mtsafe{@mtslocale{}}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} -@c Same as strcoll, but calling wcscoll_l. -The @code{wcscoll} function is similar to @code{wcscmp} but uses the -collating sequence of the current locale for collation (the -@code{LC_COLLATE} locale). -@end deftypefun - -Here is an example of sorting an array of strings, using @code{strcoll} -to compare them. 
The actual sort algorithm is not written here; it -comes from @code{qsort} (@pxref{Array Sort Function}). The job of the -code shown here is to say how to compare the strings while sorting them. -(Later on in this section, we will show a way to do this more -efficiently using @code{strxfrm}.) - -@smallexample -/* @r{This is the comparison function used with @code{qsort}.} */ - -int -compare_elements (const void *v1, const void *v2) -@{ - char * const *p1 = v1; - char * const *p2 = v2; - - return strcoll (*p1, *p2); -@} - -/* @r{This is the entry point---the function to sort} - @r{strings using the locale's collating sequence.} */ - -void -sort_strings (char **array, int nstrings) -@{ - /* @r{Sort @code{array} by comparing the strings.} */ - qsort (array, nstrings, - sizeof (char *), compare_elements); -@} -@end smallexample - -@cindex converting string to collation order -@comment string.h -@comment ISO -@deftypefun size_t strxfrm (char *restrict @var{to}, const char *restrict @var{from}, size_t @var{size}) -@safety{@prelim{}@mtsafe{@mtslocale{}}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} -The function @code{strxfrm} transforms the multibyte string -@var{from} using the -collation transformation determined by the locale currently selected for -collation, and stores the transformed string in the array @var{to}. Up -to @var{size} bytes (including a terminating null byte) are -stored. - -The behavior is undefined if the strings @var{to} and @var{from} -overlap; see @ref{Copying Strings and Arrays}. - -The return value is the length of the entire transformed string. This -value is not affected by the value of @var{size}, but if it is greater -than or equal to @var{size}, it means that the transformed string did not -entirely fit in the array @var{to}. In this case, only as much of the -string as actually fits was stored. To get the whole transformed -string, call @code{strxfrm} again with a bigger output array. 
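The retry described above can be sketched as follows. The helper name `xfrm_dup` is hypothetical, and the initial buffer size (the length of the input plus one) is only a guess, not something the library recommends.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated copy of FROM
   transformed by strxfrm for the current locale, or NULL if
   memory runs out.  The caller frees the result.  */
char *
xfrm_dup (const char *from)
{
  assert (from != NULL);

  /* First guess at the buffer size; this is an assumption.  */
  size_t size = strlen (from) + 1;
  char *to = malloc (size);
  if (to == NULL)
    return NULL;

  size_t needed = strxfrm (to, from, size);
  if (needed >= size)
    {
      /* The result did not fit.  NEEDED is the full length
         without the terminating null byte, so allocate one
         byte more and transform again.  */
      char *bigger = realloc (to, needed + 1);
      if (bigger == NULL)
        {
          free (to);
          return NULL;
        }
      to = bigger;
      strxfrm (to, from, needed + 1);
    }

  return to;
}
```

Two strings transformed this way can then be compared with `strcmp`, which gives the same ordering that `strcoll` would give for the originals.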
- -The transformed string may be longer than the original string, and it -may also be shorter. - -If @var{size} is zero, no bytes are stored in @var{to}. In this -case, @code{strxfrm} simply returns the number of bytes that would -be the length of the transformed string. This is useful for determining -what size the allocated array should be. It does not matter what -@var{to} is if @var{size} is zero; @var{to} may even be a null pointer. -@end deftypefun - -@comment wchar.h -@comment ISO -@deftypefun size_t wcsxfrm (wchar_t *restrict @var{wto}, const wchar_t *@var{wfrom}, size_t @var{size}) -@safety{@prelim{}@mtsafe{@mtslocale{}}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} -The function @code{wcsxfrm} transforms the wide string @var{wfrom} -using the collation transformation determined by the locale currently -selected for collation, and stores the transformed string in the array -@var{wto}. Up to @var{size} wide characters (including a terminating null -wide character) are stored. - -The behavior is undefined if the strings @var{wto} and @var{wfrom} -overlap; see @ref{Copying Strings and Arrays}. - -The return value is the length of the entire transformed wide -string. This value is not affected by the value of @var{size}, but if -it is greater than or equal to @var{size}, it means that the transformed -wide string did not entirely fit in the array @var{wto}. In -this case, only as much of the wide string as actually fits -was stored. To get the whole transformed wide string, call -@code{wcsxfrm} again with a bigger output array. - -The transformed wide string may be longer than the original -wide string, and it may also be shorter. - -If @var{size} is zero, no wide characters are stored in @var{wto}. In this -case, @code{wcsxfrm} simply returns the number of wide characters that -would be the length of the transformed wide string. This is -useful for determining what size the allocated array should be (remember -to multiply by @code{sizeof (wchar_t)}).
It does not matter what -@var{wto} is if @var{size} is zero; @var{wto} may even be a null pointer. -@end deftypefun - -Here is an example of how you can use @code{strxfrm} when -you plan to do many comparisons. It does the same thing as the previous -example, but much faster, because it has to transform each string only -once, no matter how many times it is compared with other strings. Even -the time needed to allocate and free storage is much less than the time -we save, when there are many strings. - -@smallexample -struct sorter @{ char *input; char *transformed; @}; - -/* @r{This is the comparison function used with @code{qsort}} - @r{to sort an array of @code{struct sorter}.} */ - -int -compare_elements (const void *v1, const void *v2) -@{ - const struct sorter *p1 = v1; - const struct sorter *p2 = v2; - - return strcmp (p1->transformed, p2->transformed); -@} - -/* @r{This is the entry point---the function to sort} - @r{strings using the locale's collating sequence.} */ - -void -sort_strings_fast (char **array, int nstrings) -@{ - struct sorter temp_array[nstrings]; - int i; - - /* @r{Set up @code{temp_array}. Each element contains} - @r{one input string and its transformed string.} */ - for (i = 0; i < nstrings; i++) - @{ - size_t length = strlen (array[i]) * 2; - char *transformed; - size_t transformed_length; - - temp_array[i].input = array[i]; - - /* @r{First try a buffer perhaps big enough.} */ - transformed = (char *) xmalloc (length); - - /* @r{Transform @code{array[i]}.} */ - transformed_length = strxfrm (transformed, array[i], length); - - /* @r{If the buffer was not large enough, resize it} - @r{and try again.} */ - if (transformed_length >= length) - @{ - /* @r{Allocate the needed space. 
+1 for terminating} - @r{@code{'\0'} byte.} */ - transformed = (char *) xrealloc (transformed, - transformed_length + 1); - - /* @r{The return value is not interesting because we know} - @r{how long the transformed string is.} */ - (void) strxfrm (transformed, array[i], - transformed_length + 1); - @} - - temp_array[i].transformed = transformed; - @} - - /* @r{Sort @code{temp_array} by comparing transformed strings.} */ - qsort (temp_array, nstrings, - sizeof (struct sorter), compare_elements); - - /* @r{Put the elements back in the permanent array} - @r{in their sorted order.} */ - for (i = 0; i < nstrings; i++) - array[i] = temp_array[i].input; - - /* @r{Free the strings we allocated.} */ - for (i = 0; i < nstrings; i++) - free (temp_array[i].transformed); -@} -@end smallexample - -The interesting part of this code for the wide character version would -look like this: - -@smallexample -void -sort_strings_fast (wchar_t **array, int nstrings) -@{ - @dots{} - /* @r{Transform @code{array[i]}.} */ - transformed_length = wcsxfrm (transformed, array[i], length); - - /* @r{If the buffer was not large enough, resize it} - @r{and try again.} */ - if (transformed_length >= length) - @{ - /* @r{Allocate the needed space. +1 for terminating} - @r{@code{L'\0'} wide character.} */ - transformed = (wchar_t *) xrealloc (transformed, - (transformed_length + 1) - * sizeof (wchar_t)); - - /* @r{The return value is not interesting because we know} - @r{how long the transformed string is.} */ - (void) wcsxfrm (transformed, array[i], - transformed_length + 1); - @} - @dots{} -@end smallexample - -@noindent -Note the additional multiplication with @code{sizeof (wchar_t)} in the -@code{realloc} call. - -@strong{Compatibility Note:} The string collation functions are a new -feature of @w{ISO C90}. Older C dialects have no equivalent feature. -The wide character versions were introduced in @w{Amendment 1} to @w{ISO -C90}. 
- -@node Search Functions -@section Search Functions - -This section describes library functions which perform various kinds -of searching operations on strings and arrays. These functions are -declared in the header file @file{string.h}. -@pindex string.h -@cindex search functions (for strings) -@cindex string search functions - -@comment string.h -@comment ISO -@deftypefun {void *} memchr (const void *@var{block}, int @var{c}, size_t @var{size}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -This function finds the first occurrence of the byte @var{c} (converted -to an @code{unsigned char}) in the initial @var{size} bytes of the -object beginning at @var{block}. The return value is a pointer to the -located byte, or a null pointer if no match was found. -@end deftypefun - -@comment wchar.h -@comment ISO -@deftypefun {wchar_t *} wmemchr (const wchar_t *@var{block}, wchar_t @var{wc}, size_t @var{size}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -This function finds the first occurrence of the wide character @var{wc} -in the initial @var{size} wide characters of the object beginning at -@var{block}. The return value is a pointer to the located wide -character, or a null pointer if no match was found. -@end deftypefun - -@comment string.h -@comment GNU -@deftypefun {void *} rawmemchr (const void *@var{block}, int @var{c}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -Often the @code{memchr} function is used with the knowledge that the -byte @var{c} is available in the memory block specified by the -parameters. But this means that the @var{size} parameter is not really -needed and that the tests performed with it at runtime (to check whether -the end of the block is reached) are not needed. - -The @code{rawmemchr} function exists for just this situation which is -surprisingly frequent. The interface is similar to @code{memchr} except -that the @var{size} parameter is missing. 
The function will look beyond -the end of the block pointed to by @var{block} if the programmer -made an error in assuming that the byte @var{c} is present in the block. -In this case the result is unspecified. Otherwise the return value is a -pointer to the located byte. - -This function is of special interest when looking for the end of a -string. Since all strings are terminated by a null byte, a call like - -@smallexample - rawmemchr (str, '\0') -@end smallexample - -@noindent -will never go beyond the end of the string. - -This function is a GNU extension. -@end deftypefun - -@comment string.h -@comment GNU -@deftypefun {void *} memrchr (const void *@var{block}, int @var{c}, size_t @var{size}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -The function @code{memrchr} is like @code{memchr}, except that it searches -backwards from the end of the block defined by @var{block} and @var{size} -(instead of forwards from the front). - -This function is a GNU extension. -@end deftypefun - -@comment string.h -@comment ISO -@deftypefun {char *} strchr (const char *@var{string}, int @var{c}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -The @code{strchr} function finds the first occurrence of the byte -@var{c} (converted to a @code{char}) in the string -beginning at @var{string}. The return value is a pointer to the located -byte, or a null pointer if no match was found. - -For example, -@smallexample -strchr ("hello, world", 'l') - @result{} "llo, world" -strchr ("hello, world", '?') - @result{} NULL -@end smallexample - -The terminating null byte is considered to be part of the string, -so you can use this function to get a pointer to the end of a string by -specifying zero as the value of the @var{c} argument. - -When @code{strchr} returns a null pointer, it does not let you know -the position of the terminating null byte it has found. If you -need that information, it is better (but less portable) to use -@code{strchrnul} than to search for it a second time.
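To make the cost of that second search concrete, here is a minimal sketch of a portable fallback written with ISO C functions only. The name @code{find_or_end} is hypothetical; it is an illustration of the behavior described for @code{strchrnul}, not the way @theglibc{} implements it.

```c
#include <string.h>

/* Hypothetical helper: behaves as described for strchrnul, but uses
   only ISO C functions.  */
const char *
find_or_end (const char *string, int c)
{
  const char *p = strchr (string, c);

  /* On failure, strchr has already scanned up to the terminating
     null byte, but that position is lost.  strlen has to scan the
     string a second time to recover it.  */
  return p != NULL ? p : string + strlen (string);
}
```

With the string @code{"hello, world"}, searching for @code{'l'} yields a pointer to offset 2, while searching for @code{'?'} yields a pointer to the terminating null byte at offset 12.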
-@end deftypefun - -@comment wchar.h -@comment ISO -@deftypefun {wchar_t *} wcschr (const wchar_t *@var{wstring}, wchar_t @var{wc}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -The @code{wcschr} function finds the first occurrence of the wide -character @var{wc} in the wide string -beginning at @var{wstring}. The return value is a pointer to the -located wide character, or a null pointer if no match was found. - -The terminating null wide character is considered to be part of the wide -string, so you can use this function to get a pointer to the end -of a wide string by specifying a null wide character as the -value of the @var{wc} argument. It would be better (but less portable) -to use @code{wcschrnul} in this case, though. -@end deftypefun - -@comment string.h -@comment GNU -@deftypefun {char *} strchrnul (const char *@var{string}, int @var{c}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -@code{strchrnul} is the same as @code{strchr} except that if it does -not find the byte, it returns a pointer to the string's terminating -null byte rather than a null pointer. - -This function is a GNU extension. -@end deftypefun - -@comment wchar.h -@comment GNU -@deftypefun {wchar_t *} wcschrnul (const wchar_t *@var{wstring}, wchar_t @var{wc}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -@code{wcschrnul} is the same as @code{wcschr} except that if it does not -find the wide character, it returns a pointer to the wide string's -terminating null wide character rather than a null pointer. - -This function is a GNU extension. -@end deftypefun - -One useful, but unusual, use of the @code{strchr} -function is when one wants to have a pointer pointing to the null byte -terminating a string. This is often written in this way: - -@smallexample - s += strlen (s); -@end smallexample - -@noindent -This is almost optimal but the addition operation duplicates a bit of -the work already done in the @code{strlen} function.
A better solution -is this: - -@smallexample - s = strchr (s, '\0'); -@end smallexample - -There is no restriction on the second parameter of @code{strchr} so it -could very well also be zero. Those readers thinking very -hard about this might now point out that the @code{strchr} function is -more expensive than the @code{strlen} function since we have two abort -criteria. This is right. But in @theglibc{} the implementation of -@code{strchr} is optimized in a special way so that @code{strchr} -actually is faster. - -@comment string.h -@comment ISO -@deftypefun {char *} strrchr (const char *@var{string}, int @var{c}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -The function @code{strrchr} is like @code{strchr}, except that it searches -backwards from the end of the string @var{string} (instead of forwards -from the front). - -For example, -@smallexample -strrchr ("hello, world", 'l') - @result{} "ld" -@end smallexample -@end deftypefun - -@comment wchar.h -@comment ISO -@deftypefun {wchar_t *} wcsrchr (const wchar_t *@var{wstring}, wchar_t @var{c}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -The function @code{wcsrchr} is like @code{wcschr}, except that it searches -backwards from the end of the string @var{wstring} (instead of forwards -from the front). -@end deftypefun - -@comment string.h -@comment ISO -@deftypefun {char *} strstr (const char *@var{haystack}, const char *@var{needle}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -This is like @code{strchr}, except that it searches @var{haystack} for a -substring @var{needle} rather than just a single byte. It -returns a pointer into the string @var{haystack} that is the first -byte of the substring, or a null pointer if no match was found. If -@var{needle} is an empty string, the function returns @var{haystack}. 
- -For example, -@smallexample -strstr ("hello, world", "l") - @result{} "llo, world" -strstr ("hello, world", "wo") - @result{} "world" -@end smallexample -@end deftypefun - -@comment wchar.h -@comment ISO -@deftypefun {wchar_t *} wcsstr (const wchar_t *@var{haystack}, const wchar_t *@var{needle}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -This is like @code{wcschr}, except that it searches @var{haystack} for a -substring @var{needle} rather than just a single wide character. It -returns a pointer into the string @var{haystack} that is the first wide -character of the substring, or a null pointer if no match was found. If -@var{needle} is an empty string, the function returns @var{haystack}. -@end deftypefun - -@comment wchar.h -@comment XPG -@deftypefun {wchar_t *} wcswcs (const wchar_t *@var{haystack}, const wchar_t *@var{needle}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -@code{wcswcs} is a deprecated alias for @code{wcsstr}. This is the -name originally used in the X/Open Portability Guide before the -@w{Amendment 1} to @w{ISO C90} was published. -@end deftypefun - - -@comment string.h -@comment GNU -@deftypefun {char *} strcasestr (const char *@var{haystack}, const char *@var{needle}) -@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}} -@c There may be multiple calls of strncasecmp, each accessing the locale -@c object independently. -This is like @code{strstr}, except that it ignores case in searching for -the substring. Like @code{strcasecmp}, it is locale dependent how -uppercase and lowercase characters are related, and arguments are -multibyte strings. 
- - -For example, -@smallexample -strcasestr ("hello, world", "L") - @result{} "llo, world" -strcasestr ("hello, World", "wo") - @result{} "World" -@end smallexample -@end deftypefun - - -@comment string.h -@comment GNU -@deftypefun {void *} memmem (const void *@var{haystack}, size_t @var{haystack-len},@*const void *@var{needle}, size_t @var{needle-len}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -This is like @code{strstr}, but @var{needle} and @var{haystack} are byte -arrays rather than strings. @var{needle-len} is the -length of @var{needle} and @var{haystack-len} is the length of -@var{haystack}.@refill - -This function is a GNU extension. -@end deftypefun - -@comment string.h -@comment ISO -@deftypefun size_t strspn (const char *@var{string}, const char *@var{skipset}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -The @code{strspn} (``string span'') function returns the length of the -initial substring of @var{string} that consists entirely of bytes that -are members of the set specified by the string @var{skipset}. The order -of the bytes in @var{skipset} is not important. - -For example, -@smallexample -strspn ("hello, world", "abcdefghijklmnopqrstuvwxyz") - @result{} 5 -@end smallexample - -In a multibyte string, characters consisting of -more than one byte are not treated as single entities. Each byte is treated -separately. The function is not locale-dependent. -@end deftypefun - -@comment wchar.h -@comment ISO -@deftypefun size_t wcsspn (const wchar_t *@var{wstring}, const wchar_t *@var{skipset}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -The @code{wcsspn} (``wide character string span'') function returns the -length of the initial substring of @var{wstring} that consists entirely -of wide characters that are members of the set specified by the string -@var{skipset}. The order of the wide characters in @var{skipset} is not -important. 
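As with @code{strspn}, a typical use is skipping a run of leading characters. The following is a small sketch; the helper name @code{skip_blanks} and the choice of space and tab as the skip set are assumptions made for illustration.

```c
#include <wchar.h>

/* Hypothetical helper: return a pointer to the first wide character
   of WSTRING that is neither a space nor a tab.  If WSTRING consists
   only of such characters, the result points to its terminating null
   wide character.  */
const wchar_t *
skip_blanks (const wchar_t *wstring)
{
  /* wcsspn counts the leading wide characters that are members of
     the skip set; adding that count steps over them.  */
  return wstring + wcsspn (wstring, L" \t");
}
```

The return value of @code{wcsspn} is a count of wide characters, not of bytes, so it can be added directly to a @code{wchar_t} pointer.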
-@end deftypefun - -@comment string.h -@comment ISO -@deftypefun size_t strcspn (const char *@var{string}, const char *@var{stopset}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -The @code{strcspn} (``string complement span'') function returns the length -of the initial substring of @var{string} that consists entirely of bytes -that are @emph{not} members of the set specified by the string @var{stopset}. -(In other words, it returns the offset of the first byte in @var{string} -that is a member of the set @var{stopset}.) - -For example, -@smallexample -strcspn ("hello, world", " \t\n,.;!?") - @result{} 5 -@end smallexample - -In a multibyte string, characters consisting of -more than one byte are not treated as single entities. Each byte is treated -separately. The function is not locale-dependent. -@end deftypefun - -@comment wchar.h -@comment ISO -@deftypefun size_t wcscspn (const wchar_t *@var{wstring}, const wchar_t *@var{stopset}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -The @code{wcscspn} (``wide character string complement span'') function -returns the length of the initial substring of @var{wstring} that -consists entirely of wide characters that are @emph{not} members of the -set specified by the string @var{stopset}. (In other words, it returns -the offset of the first wide character in @var{wstring} that is a member of -the set @var{stopset}.) -@end deftypefun - -@comment string.h -@comment ISO -@deftypefun {char *} strpbrk (const char *@var{string}, const char *@var{stopset}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -The @code{strpbrk} (``string pointer break'') function is related to -@code{strcspn}, except that it returns a pointer to the first byte -in @var{string} that is a member of the set @var{stopset} instead of the -length of the initial substring. It returns a null pointer if no such -byte from @var{stopset} is found. - -@c @group Invalid outside the example.
-For example, - -@smallexample -strpbrk ("hello, world", " \t\n,.;!?") - @result{} ", world" -@end smallexample -@c @end group - -In a multibyte string, characters consisting of -more than one byte are not treated as single entities. Each byte is treated -separately. The function is not locale-dependent. -@end deftypefun - -@comment wchar.h -@comment ISO -@deftypefun {wchar_t *} wcspbrk (const wchar_t *@var{wstring}, const wchar_t *@var{stopset}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -The @code{wcspbrk} (``wide character string pointer break'') function is -related to @code{wcscspn}, except that it returns a pointer to the first -wide character in @var{wstring} that is a member of the set -@var{stopset} instead of the length of the initial substring. It -returns a null pointer if no such wide character from @var{stopset} is found. -@end deftypefun - - -@subsection Compatibility String Search Functions - -@comment string.h -@comment BSD -@deftypefun {char *} index (const char *@var{string}, int @var{c}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -@code{index} is another name for @code{strchr}; they are exactly the same. -New code should always use @code{strchr} since this name is defined in -@w{ISO C} while @code{index} is a BSD invention which never was available -on @w{System V} derived systems. -@end deftypefun - -@comment string.h -@comment BSD -@deftypefun {char *} rindex (const char *@var{string}, int @var{c}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -@code{rindex} is another name for @code{strrchr}; they are exactly the same. -New code should always use @code{strrchr} since this name is defined in -@w{ISO C} while @code{rindex} is a BSD invention which never was available -on @w{System V} derived systems. 
-@end deftypefun - -@node Finding Tokens in a String -@section Finding Tokens in a String - -@cindex tokenizing strings -@cindex breaking a string into tokens -@cindex parsing tokens from a string -It's fairly common for programs to have a need to do some simple kinds -of lexical analysis and parsing, such as splitting a command string up -into tokens. You can do this with the @code{strtok} function, declared -in the header file @file{string.h}. -@pindex string.h - -@comment string.h -@comment ISO -@deftypefun {char *} strtok (char *restrict @var{newstring}, const char *restrict @var{delimiters}) -@safety{@prelim{}@mtunsafe{@mtasurace{:strtok}}@asunsafe{}@acsafe{}} -A string can be split into tokens by making a series of calls to the -function @code{strtok}. - -The string to be split up is passed as the @var{newstring} argument on -the first call only. The @code{strtok} function uses this to set up -some internal state information. Subsequent calls to get additional -tokens from the same string are indicated by passing a null pointer as -the @var{newstring} argument. Calling @code{strtok} with another -non-null @var{newstring} argument reinitializes the state information. -It is guaranteed that no other library function ever calls @code{strtok} -behind your back (which would mess up this internal state information). - -The @var{delimiters} argument is a string that specifies a set of delimiters -that may surround the token being extracted. All the initial bytes -that are members of this set are discarded. The first byte that is -@emph{not} a member of this set of delimiters marks the beginning of the -next token. The end of the token is found by looking for the next -byte that is a member of the delimiter set. This byte in the -original string @var{newstring} is overwritten by a null byte, and the -pointer to the beginning of the token in @var{newstring} is returned. 
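The calling convention described so far, with the string passed on the first call and a null pointer on later calls, can be sketched as a loop. The helper name @code{count_tokens} is hypothetical. Note that the buffer must be writable, because each delimiter that ends a token is overwritten with a null byte.

```c
#include <string.h>

/* Hypothetical helper: count the tokens in BUFFER, which must be
   writable.  BUFFER is changed as a side effect, and the function is
   not reentrant because it relies on strtok's internal state.  */
size_t
count_tokens (char *buffer, const char *delimiters)
{
  size_t count = 0;
  char *token;

  /* Pass the buffer on the first call and a null pointer on every
     later call, until strtok reports that no tokens remain.  */
  for (token = strtok (buffer, delimiters);
       token != NULL;
       token = strtok (NULL, delimiters))
    count++;

  return count;
}
```

Given a writable copy of @code{"  one two,three "} and the delimiters @code{" ,"}, this returns 3, and afterwards the byte that followed @code{one} in the buffer is a null byte rather than a space.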
- -On the next call to @code{strtok}, the searching begins at the next -byte beyond the one that marked the end of the previous token. -Note that the set of delimiters @var{delimiters} does not have to be the -same on every call in a series of calls to @code{strtok}. - -If the end of the string @var{newstring} is reached, or if the remainder of -the string consists only of delimiter bytes, @code{strtok} returns -a null pointer. - -In a multibyte string, characters consisting of -more than one byte are not treated as single entities. Each byte is treated -separately. The function is not locale-dependent. -@end deftypefun - -@comment wchar.h -@comment ISO -@deftypefun {wchar_t *} wcstok (wchar_t *@var{newstring}, const wchar_t *@var{delimiters}, wchar_t **@var{save_ptr}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -A wide string can be split into tokens by making a series of calls to the -function @code{wcstok}. - -The wide string to be split up is passed as the @var{newstring} argument on -the first call only. The @code{wcstok} function uses this to set up -some internal state information. Subsequent calls to get additional -tokens from the same wide string are indicated by passing a -null pointer as the @var{newstring} argument, which causes the pointer -previously stored in @var{save_ptr} to be used instead. - -The @var{delimiters} argument is a wide string that specifies -a set of delimiters that may surround the token being extracted. All -the initial wide characters that are members of this set are discarded. -The first wide character that is @emph{not} a member of this set of -delimiters marks the beginning of the next token. The end of the token -is found by looking for the next wide character that is a member of the -delimiter set.
This wide character in the original wide -string @var{newstring} is overwritten by a null wide character, the -pointer past the overwritten wide character is saved in @var{save_ptr}, -and the pointer to the beginning of the token in @var{newstring} is -returned. - -On the next call to @code{wcstok}, the searching begins at the next -wide character beyond the one that marked the end of the previous token. -Note that the set of delimiters @var{delimiters} does not have to be the -same on every call in a series of calls to @code{wcstok}. - -If the end of the wide string @var{newstring} is reached, or -if the remainder of the string consists only of delimiter wide characters, -@code{wcstok} returns a null pointer. -@end deftypefun - -@strong{Warning:} Since @code{strtok} and @code{wcstok} alter the string -they are parsing, you should always copy the string to a temporary buffer -before parsing it with @code{strtok}/@code{wcstok} (@pxref{Copying Strings -and Arrays}). If you allow @code{strtok} or @code{wcstok} to modify -a string that came from another part of your program, you are asking for -trouble; that string might be used for other purposes after -@code{strtok} or @code{wcstok} has modified it, and it would not have -the expected value. - -The string that you are operating on might even be a constant. Then -when @code{strtok} or @code{wcstok} tries to modify it, your program -will get a fatal signal for writing in read-only memory. @xref{Program -Error Signals}. Even if the operation of @code{strtok} or @code{wcstok} -would not require a modification of the string (e.g., if there is -exactly one token) the string can (and in the @glibcadj{} case will) be -modified. - -This is a special case of a general principle: if a part of a program -does not have as its purpose the modification of a certain data -structure, then it is error-prone to modify the data structure -temporarily. - -The function @code{strtok} is not reentrant, whereas @code{wcstok} is.
-@xref{Nonreentrancy}, for a discussion of where and why reentrancy is -important. - -Here is a simple example showing the use of @code{strtok}. - -@comment Yes, this example has been tested. -@smallexample -#include <string.h> -#include <stddef.h> - -@dots{} - -const char string[] = "words separated by spaces -- and, punctuation!"; -const char delimiters[] = " .,;:!-"; -char *token, *cp; - -@dots{} - -cp = strdupa (string); /* Make writable copy. */ -token = strtok (cp, delimiters); /* token => "words" */ -token = strtok (NULL, delimiters); /* token => "separated" */ -token = strtok (NULL, delimiters); /* token => "by" */ -token = strtok (NULL, delimiters); /* token => "spaces" */ -token = strtok (NULL, delimiters); /* token => "and" */ -token = strtok (NULL, delimiters); /* token => "punctuation" */ -token = strtok (NULL, delimiters); /* token => NULL */ -@end smallexample - -@Theglibc{} contains two more functions for tokenizing a string -which overcome the limitation of non-reentrancy. They are not -available for wide strings. - -@comment string.h -@comment POSIX -@deftypefun {char *} strtok_r (char *@var{newstring}, const char *@var{delimiters}, char **@var{save_ptr}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -Just like @code{strtok}, this function splits the string into several -tokens which can be accessed by successive calls to @code{strtok_r}. -The difference is that, as in @code{wcstok}, the information about the -next token is stored in the space pointed to by the third argument, -@var{save_ptr}, which is a pointer to a string pointer. Calling -@code{strtok_r} with a null pointer for @var{newstring} and leaving -@var{save_ptr} between the calls unchanged does the job without -hindering reentrancy. - -This function is defined in POSIX.1 and can be found on many systems -which support multi-threading.
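Because each scan keeps its state in a caller-supplied pointer, two scans can be interleaved, which is not possible with @code{strtok}. The following is a minimal sketch of nested tokenizing; the helper name @code{count_fields} and the record format (records separated by @code{;}, fields separated by @code{,}) are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L  /* Ask for the POSIX declaration.  */
#include <string.h>

/* Hypothetical helper: RECORDS is a writable string such as
   "a,b;c,d,e".  Return the total number of comma-separated fields
   in all of the semicolon-separated records.  */
size_t
count_fields (char *records)
{
  size_t count = 0;
  char *outer_save, *inner_save;
  char *record, *field;

  for (record = strtok_r (records, ";", &outer_save);
       record != NULL;
       record = strtok_r (NULL, ";", &outer_save))
    /* The inner scan keeps its own state in INNER_SAVE, so it does
       not disturb the outer scan's position in OUTER_SAVE.  */
    for (field = strtok_r (record, ",", &inner_save);
         field != NULL;
         field = strtok_r (NULL, ",", &inner_save))
      count++;

  return count;
}
```

With a writable copy of @code{"a,b;c,d,e;;f"} this returns 6. As with @code{strtok}, the input is modified, so it should be a copy.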
-@end deftypefun - -@comment string.h -@comment BSD -@deftypefun {char *} strsep (char **@var{string_ptr}, const char *@var{delimiter}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -This function has functionality similar to @code{strtok_r} with the -@var{newstring} argument replaced by the @var{save_ptr} argument. The -initialization of the moving pointer has to be done by the user. -Successive calls to @code{strsep} move the pointer along the tokens -separated by @var{delimiter}, returning the address of the next token -and updating @var{string_ptr} to point to the beginning of the next -token. - -One difference between @code{strsep} and @code{strtok_r} is that if the -input string contains more than one byte from @var{delimiter} in a -row, @code{strsep} returns an empty string for each pair of adjacent bytes -from @var{delimiter}. This means that a program normally should test -for @code{strsep} returning an empty string before processing it. - -This function was introduced in 4.3BSD and therefore is widely available. -@end deftypefun - -Here is what the above example looks like when @code{strsep} is used. - -@comment Yes, this example has been tested.
-@smallexample -#include <string.h> -#include <stddef.h> - -@dots{} - -const char string[] = "words separated by spaces -- and, punctuation!"; -const char delimiters[] = " .,;:!-"; -char *running; -char *token; - -@dots{} - -running = strdupa (string); -token = strsep (&running, delimiters); /* token => "words" */ -token = strsep (&running, delimiters); /* token => "separated" */ -token = strsep (&running, delimiters); /* token => "by" */ -token = strsep (&running, delimiters); /* token => "spaces" */ -token = strsep (&running, delimiters); /* token => "" */ -token = strsep (&running, delimiters); /* token => "" */ -token = strsep (&running, delimiters); /* token => "" */ -token = strsep (&running, delimiters); /* token => "and" */ -token = strsep (&running, delimiters); /* token => "" */ -token = strsep (&running, delimiters); /* token => "punctuation" */ -token = strsep (&running, delimiters); /* token => "" */ -token = strsep (&running, delimiters); /* token => NULL */ -@end smallexample - -@comment string.h -@comment GNU -@deftypefun {char *} basename (const char *@var{filename}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -The GNU version of the @code{basename} function returns the last -component of the path in @var{filename}. This function is the preferred -usage, since it does not modify the argument, @var{filename}, and -respects trailing slashes. The prototype for @code{basename} can be -found in @file{string.h}. Note, this function is overridden by the XPG -version, if @file{libgen.h} is included. - -Example of using GNU @code{basename}: - -@smallexample -#include <string.h> - -int -main (int argc, char *argv[]) -@{ - char *prog = basename (argv[0]); - - if (argc < 2) - @{ - fprintf (stderr, "Usage %s <arg>\n", prog); - exit (1); - @} - - @dots{} -@} -@end smallexample - -@strong{Portability Note:} This function may produce different results -on different systems. 
- -@end deftypefun - -@comment libgen.h -@comment XPG -@deftypefun {char *} basename (char *@var{path}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -This is the standard XPG-defined @code{basename}. It is similar in -spirit to the GNU version, but may modify the @var{path} by removing -trailing '/' bytes. If the @var{path} is made up entirely of '/' -bytes, then "/" will be returned. Also, if @var{path} is -@code{NULL} or an empty string, then "." is returned. The prototype for -the XPG version can be found in @file{libgen.h}. - -Example of using XPG @code{basename}: - -@smallexample -#include <libgen.h> - -int -main (int argc, char *argv[]) -@{ - char *prog; - char *path = strdupa (argv[0]); - - prog = basename (path); - - if (argc < 2) - @{ - fprintf (stderr, "Usage %s <arg>\n", prog); - exit (1); - @} - - @dots{} - -@} -@end smallexample -@end deftypefun - -@comment libgen.h -@comment XPG -@deftypefun {char *} dirname (char *@var{path}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -The @code{dirname} function is the complement to the XPG version of -@code{basename}. It returns the parent directory of the file specified -by @var{path}. If @var{path} is @code{NULL}, an empty string, or -contains no '/' bytes, then "." is returned. The prototype for this -function can be found in @file{libgen.h}. -@end deftypefun - -@node Erasing Sensitive Data -@section Erasing Sensitive Data - -Sensitive data, such as cryptographic keys, should be erased from -memory after use, to reduce the risk that a bug will expose it to the -outside world. However, compiler optimizations may determine that an -erasure operation is ``unnecessary,'' and remove it from the generated -code, because no @emph{correct} program could access the variable or -heap object containing the sensitive data after it's deallocated. -Since erasure is a precaution against bugs, this optimization is -inappropriate.
- -The function @code{explicit_bzero} erases a block of memory, and -guarantees that the compiler will not remove the erasure as -``unnecessary.'' - -@smallexample -@group -#include <string.h> - -extern void encrypt (const char *key, const char *in, - char *out, size_t n); -extern void genkey (const char *phrase, char *key); - -void encrypt_with_phrase (const char *phrase, const char *in, - char *out, size_t n) -@{ - char key[16]; - genkey (phrase, key); - encrypt (key, in, out, n); - explicit_bzero (key, 16); -@} -@end group -@end smallexample - -@noindent -In this example, if @code{memset}, @code{bzero}, or a hand-written -loop had been used, the compiler might remove them as ``unnecessary.'' - -@strong{Warning:} @code{explicit_bzero} does not guarantee that -sensitive data is @emph{completely} erased from the computer's memory. -There may be copies in temporary storage areas, such as registers and -``scratch'' stack space; since these are invisible to the source code, -a library function cannot erase them. - -Also, @code{explicit_bzero} only operates on RAM. If a sensitive data -object never needs to have its address taken other than to call -@code{explicit_bzero}, it might be stored entirely in CPU registers -@emph{until} the call to @code{explicit_bzero}. Then it will be -copied into RAM, the copy will be erased, and the original will remain -intact. Data in RAM is more likely to be exposed by a bug than data -in registers, so this creates a brief window where the data is at -greater risk of exposure than it would have been if the program didn't -try to erase it at all. - -Declaring sensitive variables as @code{volatile} will make both the -above problems @emph{worse}; a @code{volatile} variable will be stored -in memory for its entire lifetime, and the compiler will make -@emph{more} copies of it than it would otherwise have. 
Attempting to -erase a normal variable ``by hand'' through a -@code{volatile}-qualified pointer doesn't work at all---because the -variable itself is not @code{volatile}, some compilers will ignore the -qualification on the pointer and remove the erasure anyway. - -Having said all that, in most situations, using @code{explicit_bzero} -is better than not using it. At present, the only way to do a more -thorough job is to write the entire sensitive operation in assembly -language. We anticipate that future compilers will recognize calls to -@code{explicit_bzero} and take appropriate steps to erase all the -copies of the affected data, wherever they may be. - -@comment string.h -@comment BSD -@deftypefun void explicit_bzero (void *@var{block}, size_t @var{len}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} - -@code{explicit_bzero} writes zero into @var{len} bytes of memory -beginning at @var{block}, just as @code{bzero} would. The zeroes are -always written, even if the compiler could determine that this is -``unnecessary'' because no correct program could read them back. - -@strong{Note:} The @emph{only} optimization that @code{explicit_bzero} -disables is removal of ``unnecessary'' writes to memory. The compiler -can perform all the other optimizations that it could for a call to -@code{memset}. For instance, it may replace the function call with -inline memory writes, and it may assume that @var{block} cannot be a -null pointer. - -@strong{Portability Note:} This function first appeared in OpenBSD 5.5 -and has not been standardized. Other systems may provide the same -functionality under a different name, such as @code{explicit_memset}, -@code{memset_s}, or @code{SecureZeroMemory}. - -@Theglibc{} declares this function in @file{string.h}, but on other -systems it may be in @file{strings.h} instead.
-@end deftypefun - -@node strfry -@section strfry - -The function below addresses the perennial programming quandary: ``How do -I take good data in string form and painlessly turn it into garbage?'' -This is actually a fairly simple task for C programmers who do not use -@theglibc{} string functions, but for programs based on @theglibc{}, -the @code{strfry} function is the preferred method for -destroying string data. - -The prototype for this function is in @file{string.h}. - -@comment string.h -@comment GNU -@deftypefun {char *} strfry (char *@var{string}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -@c Calls initstate_r, time, getpid, strlen, and random_r. - -@code{strfry} creates a pseudorandom anagram of a string, replacing the -input with the anagram in place. For each position in the string, -@code{strfry} swaps it with a position in the string selected at random -(from a uniform distribution). The two positions may be the same. - -The return value of @code{strfry} is always @var{string}. - -@strong{Portability Note:} This function is unique to @theglibc{}. - -@end deftypefun - - -@node Trivial Encryption -@section Trivial Encryption -@cindex encryption - - -The @code{memfrob} function converts an array of data to something -unrecognizable and back again. It is not encryption in its usual sense -since it is easy for someone to convert the encrypted data back to clear -text. The transformation is analogous to Usenet's ``Rot13'' encryption -method for obscuring offensive jokes from sensitive eyes and such. -Unlike Rot13, @code{memfrob} works on arbitrary binary data, not just -text. -@cindex Rot13 - -For true encryption, @xref{Cryptographic Functions}. - -This function is declared in @file{string.h}. 
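The transformation that @code{memfrob} applies is an exclusive-or of each byte with 42 (binary 00101010), which is why applying it twice restores the original data. The sketch below writes that out by hand so the round trip is visible. The name frob_bytes is made up for this illustration and is not part of the library; real programs would simply call @code{memfrob}.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for memfrob, written out so that the
   transformation is visible: XOR each of the LENGTH bytes at MEM
   with 42 (binary 00101010) in place, and return MEM.  */
void *
frob_bytes (void *mem, size_t length)
{
  unsigned char *p = mem;

  assert (mem != NULL || length == 0);
  for (size_t i = 0; i < length; i++)
    p[i] ^= 0x2a;
  return mem;
}
```

Calling frob_bytes a second time on the same buffer with the same length undoes the first call, because XOR with a fixed value is its own inverse.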
-@pindex string.h - -@comment string.h -@comment GNU -@deftypefun {void *} memfrob (void *@var{mem}, size_t @var{length}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} - -@code{memfrob} transforms (frobnicates) each byte of the data structure -at @var{mem}, which is @var{length} bytes long, by bitwise exclusive -oring it with binary 00101010. It does the transformation in place and -its return value is always @var{mem}. - -Note that @code{memfrob} a second time on the same data structure -returns it to its original state. - -This is a good function for hiding information from someone who doesn't -want to see it or doesn't want to see it very much. To really prevent -people from retrieving the information, use stronger encryption such as -that described in @xref{Cryptographic Functions}. - -@strong{Portability Note:} This function is unique to @theglibc{}. - -@end deftypefun - -@node Encode Binary Data -@section Encode Binary Data - -To store or transfer binary data in environments which only support text -one has to encode the binary data by mapping the input bytes to -bytes in the range allowed for storing or transferring. SVID -systems (and nowadays XPG compliant systems) provide minimal support for -this task. - -@comment stdlib.h -@comment XPG -@deftypefun {char *} l64a (long int @var{n}) -@safety{@prelim{}@mtunsafe{@mtasurace{:l64a}}@asunsafe{}@acsafe{}} -This function encodes a 32-bit input value using bytes from the -basic character set. It returns a pointer to a 7 byte buffer which -contains an encoded version of @var{n}. To encode a series of bytes the -user must copy the returned string to a destination buffer. It returns -the empty string if @var{n} is zero, which is somewhat bizarre but -mandated by the standard.@* -@strong{Warning:} Since a static buffer is used this function should not -be used in multi-threaded programs. 
There is no thread-safe alternative -to this function in the C library.@* -@strong{Compatibility Note:} The XPG standard states that the return -value of @code{l64a} is undefined if @var{n} is negative. In the GNU -implementation, @code{l64a} treats its argument as unsigned, so it will -return a sensible encoding for any nonzero @var{n}; however, portable -programs should not rely on this. - -To encode a large buffer @code{l64a} must be called in a loop, once for -each 32-bit word of the buffer. For example, one could do something -like this: - -@smallexample -char * -encode (const void *buf, size_t len) -@{ - /* @r{We know in advance how long the buffer has to be.} */ - unsigned char *in = (unsigned char *) buf; - char *out = malloc (6 + ((len + 3) / 4) * 6 + 1); - char *cp = out, *p; - - /* @r{Encode the length.} */ - /* @r{Using `htonl' is necessary so that the data can be} - @r{decoded even on machines with different byte order.} - @r{`l64a' can return a string shorter than 6 bytes, so } - @r{we pad it with encoding of 0 (}'.'@r{) at the end by } - @r{hand.} */ - - p = stpcpy (cp, l64a (htonl (len))); - cp = mempcpy (p, "......", 6 - (p - cp)); - - while (len > 3) - @{ - unsigned long int n = *in++; - n = (n << 8) | *in++; - n = (n << 8) | *in++; - n = (n << 8) | *in++; - len -= 4; - p = stpcpy (cp, l64a (htonl (n))); - cp = mempcpy (p, "......", 6 - (p - cp)); - @} - if (len > 0) - @{ - unsigned long int n = *in++; - if (--len > 0) - @{ - n = (n << 8) | *in++; - if (--len > 0) - n = (n << 8) | *in; - @} - cp = stpcpy (cp, l64a (htonl (n))); - @} - *cp = '\0'; - return out; -@} -@end smallexample - -It is strange that the library does not provide the complete -functionality needed but so be it. - -@end deftypefun - -To decode data produced with @code{l64a} the following function should be -used. 
- -@comment stdlib.h -@comment XPG -@deftypefun {long int} a64l (const char *@var{string}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -The parameter @var{string} should contain a string which was produced by -a call to @code{l64a}. The function processes at least 6 bytes of -this string, and decodes the bytes it finds according to the table -below. It stops decoding when it finds a byte not in the table, -rather like @code{atoi}; if you have a buffer which has been broken into -lines, you must be careful to skip over the end-of-line bytes. - -The decoded number is returned as a @code{long int} value. -@end deftypefun - -The @code{l64a} and @code{a64l} functions use a base 64 encoding, in -which each byte of an encoded string represents six bits of an -input word. These symbols are used for the base 64 digits: - -@multitable {xxxxx} {xxx} {xxx} {xxx} {xxx} {xxx} {xxx} {xxx} {xxx} -@item @tab 0 @tab 1 @tab 2 @tab 3 @tab 4 @tab 5 @tab 6 @tab 7 -@item 0 @tab @code{.} @tab @code{/} @tab @code{0} @tab @code{1} - @tab @code{2} @tab @code{3} @tab @code{4} @tab @code{5} -@item 8 @tab @code{6} @tab @code{7} @tab @code{8} @tab @code{9} - @tab @code{A} @tab @code{B} @tab @code{C} @tab @code{D} -@item 16 @tab @code{E} @tab @code{F} @tab @code{G} @tab @code{H} - @tab @code{I} @tab @code{J} @tab @code{K} @tab @code{L} -@item 24 @tab @code{M} @tab @code{N} @tab @code{O} @tab @code{P} - @tab @code{Q} @tab @code{R} @tab @code{S} @tab @code{T} -@item 32 @tab @code{U} @tab @code{V} @tab @code{W} @tab @code{X} - @tab @code{Y} @tab @code{Z} @tab @code{a} @tab @code{b} -@item 40 @tab @code{c} @tab @code{d} @tab @code{e} @tab @code{f} - @tab @code{g} @tab @code{h} @tab @code{i} @tab @code{j} -@item 48 @tab @code{k} @tab @code{l} @tab @code{m} @tab @code{n} - @tab @code{o} @tab @code{p} @tab @code{q} @tab @code{r} -@item 56 @tab @code{s} @tab @code{t} @tab @code{u} @tab @code{v} - @tab @code{w} @tab @code{x} @tab @code{y} @tab @code{z} -@end multitable - -This encoding scheme is not 
standard. There are some other encoding -methods which are much more widely used (UU encoding, MIME encoding). -Generally, it is better to use one of these encodings. - -@node Argz and Envz Vectors -@section Argz and Envz Vectors - -@cindex argz vectors (string vectors) -@cindex string vectors, null-byte separated -@cindex argument vectors, null-byte separated -@dfn{argz vectors} are vectors of strings in a contiguous block of -memory, each element separated from its neighbors by null bytes -(@code{'\0'}). - -@cindex envz vectors (environment vectors) -@cindex environment vectors, null-byte separated -@dfn{Envz vectors} are an extension of argz vectors where each element is a -name-value pair, separated by a @code{'='} byte (as in a Unix -environment). - -@menu -* Argz Functions:: Operations on argz vectors. -* Envz Functions:: Additional operations on environment vectors. -@end menu - -@node Argz Functions, Envz Functions, , Argz and Envz Vectors -@subsection Argz Functions - -Each argz vector is represented by a pointer to the first element, of -type @code{char *}, and a size, of type @code{size_t}, both of which can -be initialized to @code{0} to represent an empty argz vector. All argz -functions accept either a pointer and a size argument, or pointers to -them, if they will be modified. - -The argz functions use @code{malloc}/@code{realloc} to allocate/grow -argz vectors, and so any argz vector created using these functions may -be freed by using @code{free}; conversely, any argz function that may -grow a string expects that string to have been allocated using -@code{malloc} (those argz functions that only examine their arguments or -modify them in place will work on any sort of memory). -@xref{Unconstrained Allocation}. - -All argz functions that do memory allocation have a return type of -@code{error_t}, and return @code{0} for success, and @code{ENOMEM} if an -allocation error occurs. 
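As a rough illustration of these conventions, the following sketch starts from an empty argz vector (a null pointer and a size of zero), fills it by splitting a string, appends one more element, and walks the result. The helper name split_and_join and its parameters are invented for this example; the argz calls themselves are the ones declared in @file{argz.h} and described in this section.

```c
#include <argz.h>
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split PATH at ':' into an argz vector, append
   EXTRA, and return the number of elements.  On success *JOINED
   receives a malloc'd string with the elements joined by ','; the
   caller frees it.  Returns 0 and leaves *JOINED null on failure.  */
size_t
split_and_join (const char *path, const char *extra, char **joined)
{
  char *argz = NULL;       /* An empty argz vector is (NULL, 0).  */
  size_t argz_len = 0;
  size_t count = 0;
  const char *entry = NULL;

  assert (path != NULL && extra != NULL && joined != NULL);
  *joined = NULL;

  /* Both calls return 0 on success and ENOMEM on allocation failure.  */
  if (argz_create_sep (path, ':', &argz, &argz_len) != 0
      || argz_add (&argz, &argz_len, extra) != 0)
    {
      free (argz);
      return 0;
    }

  /* Iterate starting from a null ENTRY, which yields the first element.  */
  while ((entry = argz_next (argz, argz_len, entry)) != NULL)
    count++;

  /* Replace the separating null bytes with ',' so that ARGZ becomes
     an ordinary string, then hand ownership to the caller.  */
  argz_stringify (argz, argz_len, ',');
  *joined = argz;
  return count;
}
```

Because the vector is grown with @code{realloc}, the pointer in argz may change across the call to @code{argz_add}; the sketch therefore never keeps a pointer into the vector across a call that can grow it.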
- -@pindex argz.h -These functions are declared in the standard include file @file{argz.h}. - -@comment argz.h -@comment GNU -@deftypefun {error_t} argz_create (char *const @var{argv}[], char **@var{argz}, size_t *@var{argz_len}) -@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} -The @code{argz_create} function converts the Unix-style argument vector -@var{argv} (a vector of pointers to normal C strings, terminated by -@code{(char *)0}; @pxref{Program Arguments}) into an argz vector with -the same elements, which is returned in @var{argz} and @var{argz_len}. -@end deftypefun - -@comment argz.h -@comment GNU -@deftypefun {error_t} argz_create_sep (const char *@var{string}, int @var{sep}, char **@var{argz}, size_t *@var{argz_len}) -@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} -The @code{argz_create_sep} function converts the string -@var{string} into an argz vector (returned in @var{argz} and -@var{argz_len}) by splitting it into elements at every occurrence of the -byte @var{sep}. -@end deftypefun - -@comment argz.h -@comment GNU -@deftypefun {size_t} argz_count (const char *@var{argz}, size_t @var{argz_len}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -Returns the number of elements in the argz vector @var{argz} and -@var{argz_len}. -@end deftypefun - -@comment argz.h -@comment GNU -@deftypefun {void} argz_extract (const char *@var{argz}, size_t @var{argz_len}, char **@var{argv}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -The @code{argz_extract} function converts the argz vector @var{argz} and -@var{argz_len} into a Unix-style argument vector stored in @var{argv}, -by putting pointers to every element in @var{argz} into successive -positions in @var{argv}, followed by a terminator of @code{0}. 
-@var{argv} must be pre-allocated with enough space to hold all the -elements in @var{argz} plus the terminating @code{(char *)0} -(@code{(argz_count (@var{argz}, @var{argz_len}) + 1) * sizeof (char *)} -bytes should be enough). Note that the string pointers stored into -@var{argv} point into @var{argz}---they are not copies---and so -@var{argz} must be copied if it will be changed while @var{argv} is -still active. This function is useful for passing the elements in -@var{argz} to an exec function (@pxref{Executing a File}). -@end deftypefun - -@comment argz.h -@comment GNU -@deftypefun {void} argz_stringify (char *@var{argz}, size_t @var{len}, int @var{sep}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -The @code{argz_stringify} function converts @var{argz} into a normal string with -the elements separated by the byte @var{sep}, by replacing each -@code{'\0'} inside @var{argz} (except the last one, which terminates the -string) with @var{sep}. This is handy for printing @var{argz} in a -readable manner. -@end deftypefun - -@comment argz.h -@comment GNU -@deftypefun {error_t} argz_add (char **@var{argz}, size_t *@var{argz_len}, const char *@var{str}) -@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} -@c Calls strlen and argz_append. -The @code{argz_add} function adds the string @var{str} to the end of the -argz vector @code{*@var{argz}}, and updates @code{*@var{argz}} and -@code{*@var{argz_len}} accordingly. -@end deftypefun - -@comment argz.h -@comment GNU -@deftypefun {error_t} argz_add_sep (char **@var{argz}, size_t *@var{argz_len}, const char *@var{str}, int @var{delim}) -@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} -The @code{argz_add_sep} function is similar to @code{argz_add}, but -@var{str} is split into separate elements in the result at occurrences of -the byte @var{delim}.
This is useful, for instance, for -adding the components of a Unix search path to an argz vector, by using -a value of @code{':'} for @var{delim}. -@end deftypefun - -@comment argz.h -@comment GNU -@deftypefun {error_t} argz_append (char **@var{argz}, size_t *@var{argz_len}, const char *@var{buf}, size_t @var{buf_len}) -@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} -The @code{argz_append} function appends @var{buf_len} bytes starting at -@var{buf} to the argz vector @code{*@var{argz}}, reallocating -@code{*@var{argz}} to accommodate it, and adding @var{buf_len} to -@code{*@var{argz_len}}. -@end deftypefun - -@comment argz.h -@comment GNU -@deftypefun {void} argz_delete (char **@var{argz}, size_t *@var{argz_len}, char *@var{entry}) -@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} -@c Calls free if no argument is left. -If @var{entry} points to the beginning of one of the elements in the -argz vector @code{*@var{argz}}, the @code{argz_delete} function will -remove this entry and reallocate @code{*@var{argz}}, modifying -@code{*@var{argz}} and @code{*@var{argz_len}} accordingly. Note that as -destructive argz functions usually reallocate their argz argument, -pointers into argz vectors such as @var{entry} will then become invalid. -@end deftypefun - -@comment argz.h -@comment GNU -@deftypefun {error_t} argz_insert (char **@var{argz}, size_t *@var{argz_len}, char *@var{before}, const char *@var{entry}) -@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} -@c Calls argz_add or realloc and memmove. -The @code{argz_insert} function inserts the string @var{entry} into the -argz vector @code{*@var{argz}} at a point just before the existing -element pointed to by @var{before}, reallocating @code{*@var{argz}} and -updating @code{*@var{argz}} and @code{*@var{argz_len}}. If @var{before} -is @code{0}, @var{entry} is added to the end instead (as if by -@code{argz_add}). 
Since the first element is in fact the same as -@code{*@var{argz}}, passing in @code{*@var{argz}} as the value of -@var{before} will result in @var{entry} being inserted at the beginning. -@end deftypefun - -@comment argz.h -@comment GNU -@deftypefun {char *} argz_next (const char *@var{argz}, size_t @var{argz_len}, const char *@var{entry}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -The @code{argz_next} function provides a convenient way of iterating -over the elements in the argz vector @var{argz}. It returns a pointer -to the next element in @var{argz} after the element @var{entry}, or -@code{0} if there are no elements following @var{entry}. If @var{entry} -is @code{0}, the first element of @var{argz} is returned. - -This behavior suggests two styles of iteration: - -@smallexample - char *entry = 0; - while ((entry = argz_next (@var{argz}, @var{argz_len}, entry))) - @var{action}; -@end smallexample - -(the double parentheses are necessary to make some C compilers shut up -about what they consider a questionable @code{while}-test) and: - -@smallexample - char *entry; - for (entry = @var{argz}; - entry; - entry = argz_next (@var{argz}, @var{argz_len}, entry)) - @var{action}; -@end smallexample - -Note that the latter depends on @var{argz} having a value of @code{0} if -it is empty (rather than a pointer to an empty block of memory); this -invariant is maintained for argz vectors created by the functions here. -@end deftypefun - -@comment argz.h -@comment GNU -@deftypefun error_t argz_replace (@w{char **@var{argz}, size_t *@var{argz_len}}, @w{const char *@var{str}, const char *@var{with}}, @w{unsigned *@var{replace_count}}) -@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} -Replace any occurrences of the string @var{str} in @var{argz} with -@var{with}, reallocating @var{argz} as necessary. If -@var{replace_count} is non-zero, @code{*@var{replace_count}} will be -incremented by the number of replacements performed. 
-@end deftypefun - -@node Envz Functions, , Argz Functions, Argz and Envz Vectors -@subsection Envz Functions - -Envz vectors are just argz vectors with additional constraints on the form -of each element; as such, argz functions can also be used on them, where it -makes sense. - -Each element in an envz vector is a name-value pair, separated by a @code{'='} -byte; if multiple @code{'='} bytes are present in an element, those -after the first are considered part of the value, and treated like all other -non-@code{'\0'} bytes. - -If @emph{no} @code{'='} bytes are present in an element, that element is -considered the name of a ``null'' entry, as distinct from an entry with an -empty value: @code{envz_get} will return @code{0} if given the name of a null -entry, whereas an entry with an empty value would result in a value of -@code{""}; @code{envz_entry} will still find such entries, however. Null -entries can be removed with the @code{envz_strip} function. - -As with argz functions, envz functions that may allocate memory (and thus -fail) have a return type of @code{error_t}, and return either @code{0} or -@code{ENOMEM}. - -@pindex envz.h -These functions are declared in the standard include file @file{envz.h}. - -@comment envz.h -@comment GNU -@deftypefun {char *} envz_entry (const char *@var{envz}, size_t @var{envz_len}, const char *@var{name}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -The @code{envz_entry} function finds the entry in @var{envz} with the name -@var{name}, and returns a pointer to the whole entry---that is, the argz -element which begins with @var{name} followed by a @code{'='} byte. If -there is no entry with that name, @code{0} is returned.
-@end deftypefun - -@comment envz.h -@comment GNU -@deftypefun {char *} envz_get (const char *@var{envz}, size_t @var{envz_len}, const char *@var{name}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -The @code{envz_get} function finds the entry in @var{envz} with the name -@var{name} (like @code{envz_entry}), and returns a pointer to the value -portion of that entry (following the @code{'='}). If there is no entry with -that name (or only a null entry), @code{0} is returned. -@end deftypefun - -@comment envz.h -@comment GNU -@deftypefun {error_t} envz_add (char **@var{envz}, size_t *@var{envz_len}, const char *@var{name}, const char *@var{value}) -@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} -@c Calls envz_remove, which calls envz_entry and argz_delete, and then -@c argz_add or equivalent code that reallocs and appends name=value. -The @code{envz_add} function adds an entry to @code{*@var{envz}} -(updating @code{*@var{envz}} and @code{*@var{envz_len}}) with the name -@var{name}, and value @var{value}. If an entry with the same name -already exists in @var{envz}, it is removed first. If @var{value} is -@code{0}, then the new entry will be the special null type of entry -(mentioned above). -@end deftypefun - -@comment envz.h -@comment GNU -@deftypefun {error_t} envz_merge (char **@var{envz}, size_t *@var{envz_len}, const char *@var{envz2}, size_t @var{envz2_len}, int @var{override}) -@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} -The @code{envz_merge} function adds each entry in @var{envz2} to @var{envz}, -as if with @code{envz_add}, updating @code{*@var{envz}} and -@code{*@var{envz_len}}. If @var{override} is true, then values in @var{envz2} -will supersede those with the same name in @var{envz}, otherwise not.
- -Null entries are treated just like other entries in this respect, so a null -entry in @var{envz} can prevent an entry of the same name in @var{envz2} from -being added to @var{envz}, if @var{override} is false. -@end deftypefun - -@comment envz.h -@comment GNU -@deftypefun {void} envz_strip (char **@var{envz}, size_t *@var{envz_len}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -The @code{envz_strip} function removes any null entries from @var{envz}, -updating @code{*@var{envz}} and @code{*@var{envz_len}}. -@end deftypefun - -@comment envz.h -@comment GNU -@deftypefun {void} envz_remove (char **@var{envz}, size_t *@var{envz_len}, const char *@var{name}) -@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} -The @code{envz_remove} function removes an entry named @var{name} from -@var{envz}, updating @code{*@var{envz}} and @code{*@var{envz_len}}. -@end deftypefun - -@c FIXME these are undocumented: -@c strcasecmp_l @safety{@mtsafe{}@assafe{}@acsafe{}} see strcasecmp |
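The difference between a normal entry, an entry with an empty value, and a ``null'' entry is easiest to see in a small program. The sketch below is only an illustration: the helper names build_sample and envz_get_or, and the entry names HOME, EMPTY, and FLAG, are invented for it, while @code{envz_add} and @code{envz_get} are used as described in this section.

```c
#include <assert.h>
#include <envz.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build an envz vector holding a normal entry,
   an entry with an empty value, and a null entry.  Returns 0 on
   success, or -1 after freeing everything if an allocation fails.  */
int
build_sample (char **envz, size_t *envz_len)
{
  assert (envz != NULL && envz_len != NULL);
  *envz = NULL;            /* Start from an empty vector.  */
  *envz_len = 0;

  if (envz_add (envz, envz_len, "HOME", "/home/user") != 0
      || envz_add (envz, envz_len, "EMPTY", "") != 0
      || envz_add (envz, envz_len, "FLAG", NULL) != 0)  /* Null entry.  */
    {
      free (*envz);
      *envz = NULL;
      *envz_len = 0;
      return -1;
    }
  return 0;
}

/* Hypothetical helper: return the value of NAME, or FALLBACK when
   NAME is absent or is a null entry.  The result points into ENVZ
   (or is FALLBACK itself), so it must not be freed separately.  */
const char *
envz_get_or (const char *envz, size_t envz_len,
             const char *name, const char *fallback)
{
  const char *value = envz_get (envz, envz_len, name);
  return value != NULL ? value : fallback;
}
```

With the sample vector, looking up HOME yields its value, EMPTY yields the empty string, and FLAG yields the fallback, although @code{envz_entry} still finds FLAG until @code{envz_strip} removes it.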