Diffstat (limited to 'manual/pattern.texi')
-rw-r--r-- | manual/pattern.texi | 2311 |
1 files changed, 0 insertions, 2311 deletions
diff --git a/manual/pattern.texi b/manual/pattern.texi deleted file mode 100644 index 069a6a23ea..0000000000 --- a/manual/pattern.texi +++ /dev/null @@ -1,2311 +0,0 @@ -@node Pattern Matching, I/O Overview, Searching and Sorting, Top -@c %MENU% Matching shell ``globs'' and regular expressions -@chapter Pattern Matching - -@Theglibc{} provides pattern matching facilities for two kinds of -patterns: regular expressions and file-name wildcards. The library also -provides a facility for expanding variable and command references and -parsing text into words in the way the shell does. - -@menu -* Wildcard Matching:: Matching a wildcard pattern against a single string. -* Globbing:: Finding the files that match a wildcard pattern. -* Regular Expressions:: Matching regular expressions against strings. -* Word Expansion:: Expanding shell variables, nested commands, - arithmetic, and wildcards. - This is what the shell does with shell commands. -@end menu - -@node Wildcard Matching -@section Wildcard Matching - -@pindex fnmatch.h -This section describes how to match a wildcard pattern against a -particular string. The result is a yes or no answer: does the -string fit the pattern or not. The symbols described here are all -declared in @file{fnmatch.h}. 
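As a rough illustration of the yes-or-no answer described above, the sketch below wraps @code{fnmatch} in a small helper that returns a boolean. The name @code{name_matches} is hypothetical and not part of the library, and the helper simply assumes that any nonzero return value should count as ``no match''.

```c
#include <fnmatch.h>
#include <stdbool.h>

/* Hypothetical helper, not part of the library: report whether NAME
   matches the wildcard PATTERN under the given fnmatch FLAGS.
   Assumption: every nonzero result from fnmatch, including values
   other than FNM_NOMATCH, is treated as "no match".  */
bool
name_matches (const char *pattern, const char *name, int flags)
{
  return fnmatch (pattern, name, flags) == 0;
}
```

With @var{flags} set to zero, a call such as @code{name_matches ("*.c", "main.c", 0)} should report a match, while the same pattern against @code{"main.h"} should not.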
- -@comment fnmatch.h -@comment POSIX.2 -@deftypefun int fnmatch (const char *@var{pattern}, const char *@var{string}, int @var{flags}) -@safety{@prelim{}@mtsafe{@mtsenv{} @mtslocale{}}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} -@c fnmatch @mtsenv @mtslocale @ascuheap @acsmem -@c strnlen dup ok -@c mbsrtowcs -@c memset dup ok -@c malloc dup @ascuheap @acsmem -@c mbsinit dup ok -@c free dup @ascuheap @acsmem -@c FCT = internal_fnwmatch @mtsenv @mtslocale @ascuheap @acsmem -@c FOLD @mtslocale -@c towlower @mtslocale -@c EXT @mtsenv @mtslocale @ascuheap @acsmem -@c STRLEN = wcslen dup ok -@c getenv @mtsenv -@c malloc dup @ascuheap @acsmem -@c MEMPCPY = wmempcpy dup ok -@c FCT dup @mtsenv @mtslocale @ascuheap @acsmem -@c STRCAT = wcscat dup ok -@c free dup @ascuheap @acsmem -@c END @mtsenv -@c getenv @mtsenv -@c MEMCHR = wmemchr dup ok -@c getenv @mtsenv -@c IS_CHAR_CLASS = is_char_class @mtslocale -@c wctype @mtslocale -@c BTOWC ok -@c ISWCTYPE ok -@c auto findidx dup ok -@c elem_hash dup ok -@c memcmp dup ok -@c collseq_table_lookup dup ok -@c NO_LEADING_PERIOD ok -This function tests whether the string @var{string} matches the pattern -@var{pattern}. It returns @code{0} if they do match; otherwise, it -returns the nonzero value @code{FNM_NOMATCH}. The arguments -@var{pattern} and @var{string} are both strings. - -The argument @var{flags} is a combination of flag bits that alter the -details of matching. See below for a list of the defined flags. - -In @theglibc{}, @code{fnmatch} might sometimes report ``errors'' by -returning nonzero values that are not equal to @code{FNM_NOMATCH}. -@end deftypefun - -These are the available flags for the @var{flags} argument: - -@vtable @code -@comment fnmatch.h -@comment GNU -@item FNM_FILE_NAME -Treat the @samp{/} character specially, for matching file names. If -this flag is set, wildcard constructs in @var{pattern} cannot match -@samp{/} in @var{string}. 
Thus, the only way to match @samp{/} is with -an explicit @samp{/} in @var{pattern}. - -@comment fnmatch.h -@comment POSIX.2 -@item FNM_PATHNAME -This is an alias for @code{FNM_FILE_NAME}; it comes from POSIX.2. We -don't recommend this name because we don't use the term ``pathname'' for -file names. - -@comment fnmatch.h -@comment POSIX.2 -@item FNM_PERIOD -Treat the @samp{.} character specially if it appears at the beginning of -@var{string}. If this flag is set, wildcard constructs in @var{pattern} -cannot match @samp{.} as the first character of @var{string}. - -If you set both @code{FNM_PERIOD} and @code{FNM_FILE_NAME}, then the -special treatment applies to @samp{.} following @samp{/} as well as to -@samp{.} at the beginning of @var{string}. (The shell uses the -@code{FNM_PERIOD} and @code{FNM_FILE_NAME} flags together for matching -file names.) - -@comment fnmatch.h -@comment POSIX.2 -@item FNM_NOESCAPE -Don't treat the @samp{\} character specially in patterns. Normally, -@samp{\} quotes the following character, turning off its special meaning -(if any) so that it matches only itself. When quoting is enabled, the -pattern @samp{\?} matches only the string @samp{?}, because the question -mark in the pattern acts like an ordinary character. - -If you use @code{FNM_NOESCAPE}, then @samp{\} is an ordinary character. - -@comment fnmatch.h -@comment GNU -@item FNM_LEADING_DIR -Ignore a trailing sequence of characters starting with a @samp{/} in -@var{string}; that is to say, test whether @var{string} starts with a -directory name that @var{pattern} matches. - -If this flag is set, either @samp{foo*} or @samp{foobar} as a pattern -would match the string @samp{foobar/frobozz}. - -@comment fnmatch.h -@comment GNU -@item FNM_CASEFOLD -Ignore case in comparing @var{string} to @var{pattern}. 
- -@comment fnmatch.h -@comment GNU -@item FNM_EXTMATCH -@cindex Korn Shell -@pindex ksh -Besides the normal patterns, also recognize the extended patterns -introduced in @file{ksh}. The patterns are written in the form -explained in the following table where @var{pattern-list} is a @code{|} -separated list of patterns. - -@table @code -@item ?(@var{pattern-list}) -The pattern matches if zero or one occurrences of any of the patterns -in the @var{pattern-list} allow matching the input string. - -@item *(@var{pattern-list}) -The pattern matches if zero or more occurrences of any of the patterns -in the @var{pattern-list} allow matching the input string. - -@item +(@var{pattern-list}) -The pattern matches if one or more occurrences of any of the patterns -in the @var{pattern-list} allow matching the input string. - -@item @@(@var{pattern-list}) -The pattern matches if exactly one occurrence of any of the patterns in -the @var{pattern-list} allows matching the input string. - -@item !(@var{pattern-list}) -The pattern matches if the input string cannot be matched with any of -the patterns in the @var{pattern-list}. -@end table -@end vtable - -@node Globbing -@section Globbing - -@cindex globbing -The archetypal use of wildcards is for matching against the files in a -directory, and making a list of all the matches. This is called -@dfn{globbing}. - -You could do this using @code{fnmatch}, by reading the directory entries -one by one and testing each one with @code{fnmatch}. But that would be -slow (and complex, since you would have to handle subdirectories by -hand). - -The library provides a function @code{glob} to make this particular use -of wildcards convenient. @code{glob} and the other symbols in this -section are declared in @file{glob.h}. - -@menu -* Calling Glob:: Basic use of @code{glob}. -* Flags for Globbing:: Flags that enable various options in @code{glob}. -* More Flags for Globbing:: GNU specific extensions to @code{glob}. 
-@end menu - -@node Calling Glob -@subsection Calling @code{glob} - -The result of globbing is a vector of file names (strings). To return -this vector, @code{glob} uses a special data type, @code{glob_t}, which -is a structure. You pass @code{glob} the address of the structure, and -it fills in the structure's fields to tell you about the results. - -@comment glob.h -@comment POSIX.2 -@deftp {Data Type} glob_t -This data type holds a pointer to a word vector. More precisely, it -records both the address of the word vector and its size. The GNU -implementation contains some more fields which are non-standard -extensions. - -@table @code -@item gl_pathc -The number of elements in the vector, excluding the initial null entries -if the GLOB_DOOFFS flag is used (see gl_offs below). - -@item gl_pathv -The address of the vector. This field has type @w{@code{char **}}. - -@item gl_offs -The offset of the first real element of the vector, from its nominal -address in the @code{gl_pathv} field. Unlike the other fields, this -is always an input to @code{glob}, rather than an output from it. - -If you use a nonzero offset, then that many elements at the beginning of -the vector are left empty. (The @code{glob} function fills them with -null pointers.) - -The @code{gl_offs} field is meaningful only if you use the -@code{GLOB_DOOFFS} flag. Otherwise, the offset is always zero -regardless of what is in this field, and the first real element comes at -the beginning of the vector. - -@item gl_closedir -The address of an alternative implementation of the @code{closedir} -function. It is used if the @code{GLOB_ALTDIRFUNC} bit is set in -the flag parameter. The type of this field is -@w{@code{void (*) (void *)}}. - -This is a GNU extension. - -@item gl_readdir -The address of an alternative implementation of the @code{readdir} -function used to read the contents of a directory. It is used if the -@code{GLOB_ALTDIRFUNC} bit is set in the flag parameter. 
The type of -this field is @w{@code{struct dirent *(*) (void *)}}. - -An implementation of @code{gl_readdir} needs to initialize the following -members of the @code{struct dirent} object: - -@table @code -@item d_type -This member should be set to the file type of the entry if it is known. -Otherwise, the value @code{DT_UNKNOWN} can be used. The @code{glob} -function may use the specified file type to avoid callbacks in cases -where the file type indicates that the data is not required. - -@item d_ino -This member needs to be non-zero, otherwise @code{glob} may skip the -current entry and call the @code{gl_readdir} callback function again to -retrieve another entry. - -@item d_name -This member must be set to the name of the entry. It must be -null-terminated. -@end table - -The example below shows how to allocate a @code{struct dirent} object -containing a given name. - -@smallexample -@include mkdirent.c.texi -@end smallexample - -The @code{glob} function reads the @code{struct dirent} members listed -above and makes a copy of the file name in the @code{d_name} member -immediately after the @code{gl_readdir} callback function returns. -Future invocations of any of the callback functions may deallocate or -reuse the buffer. It is the responsibility of the caller of the -@code{glob} function to allocate and deallocate the buffer, around the -call to @code{glob} or using the callback functions. For example, an -application could allocate the buffer in the @code{gl_readdir} callback -function, and deallocate it in the @code{gl_closedir} callback function. - -The @code{gl_readdir} member is a GNU extension. - -@item gl_opendir -The address of an alternative implementation of the @code{opendir} -function. It is used if the @code{GLOB_ALTDIRFUNC} bit is set in -the flag parameter. The type of this field is -@w{@code{void *(*) (const char *)}}. - -This is a GNU extension. 
- -@item gl_stat -The address of an alternative implementation of the @code{stat} function -to get information about an object in the filesystem. It is used if the -@code{GLOB_ALTDIRFUNC} bit is set in the flag parameter. The type of -this field is @w{@code{int (*) (const char *, struct stat *)}}. - -This is a GNU extension. - -@item gl_lstat -The address of an alternative implementation of the @code{lstat} -function to get information about an object in the filesystems, not -following symbolic links. It is used if the @code{GLOB_ALTDIRFUNC} bit -is set in the flag parameter. The type of this field is @code{@w{int -(*) (const char *,} @w{struct stat *)}}. - -This is a GNU extension. - -@item gl_flags -The flags used when @code{glob} was called. In addition, @code{GLOB_MAGCHAR} -might be set. See @ref{Flags for Globbing} for more details. - -This is a GNU extension. -@end table -@end deftp - -For use in the @code{glob64} function @file{glob.h} contains another -definition for a very similar type. @code{glob64_t} differs from -@code{glob_t} only in the types of the members @code{gl_readdir}, -@code{gl_stat}, and @code{gl_lstat}. - -@comment glob.h -@comment GNU -@deftp {Data Type} glob64_t -This data type holds a pointer to a word vector. More precisely, it -records both the address of the word vector and its size. The GNU -implementation contains some more fields which are non-standard -extensions. - -@table @code -@item gl_pathc -The number of elements in the vector, excluding the initial null entries -if the GLOB_DOOFFS flag is used (see gl_offs below). - -@item gl_pathv -The address of the vector. This field has type @w{@code{char **}}. - -@item gl_offs -The offset of the first real element of the vector, from its nominal -address in the @code{gl_pathv} field. Unlike the other fields, this -is always an input to @code{glob}, rather than an output from it. - -If you use a nonzero offset, then that many elements at the beginning of -the vector are left empty. 
(The @code{glob} function fills them with -null pointers.) - -The @code{gl_offs} field is meaningful only if you use the -@code{GLOB_DOOFFS} flag. Otherwise, the offset is always zero -regardless of what is in this field, and the first real element comes at -the beginning of the vector. - -@item gl_closedir -The address of an alternative implementation of the @code{closedir} -function. It is used if the @code{GLOB_ALTDIRFUNC} bit is set in -the flag parameter. The type of this field is -@w{@code{void (*) (void *)}}. - -This is a GNU extension. - -@item gl_readdir -The address of an alternative implementation of the @code{readdir64} -function used to read the contents of a directory. It is used if the -@code{GLOB_ALTDIRFUNC} bit is set in the flag parameter. The type of -this field is @w{@code{struct dirent64 *(*) (void *)}}. - -This is a GNU extension. - -@item gl_opendir -The address of an alternative implementation of the @code{opendir} -function. It is used if the @code{GLOB_ALTDIRFUNC} bit is set in -the flag parameter. The type of this field is -@w{@code{void *(*) (const char *)}}. - -This is a GNU extension. - -@item gl_stat -The address of an alternative implementation of the @code{stat64} function -to get information about an object in the filesystem. It is used if the -@code{GLOB_ALTDIRFUNC} bit is set in the flag parameter. The type of -this field is @w{@code{int (*) (const char *, struct stat64 *)}}. - -This is a GNU extension. - -@item gl_lstat -The address of an alternative implementation of the @code{lstat64} -function to get information about an object in the filesystems, not -following symbolic links. It is used if the @code{GLOB_ALTDIRFUNC} bit -is set in the flag parameter. The type of this field is @code{@w{int -(*) (const char *,} @w{struct stat64 *)}}. - -This is a GNU extension. - -@item gl_flags -The flags used when @code{glob} was called. In addition, @code{GLOB_MAGCHAR} -might be set. See @ref{Flags for Globbing} for more details. 
- -This is a GNU extension. -@end table -@end deftp - -@comment glob.h -@comment POSIX.2 -@deftypefun int glob (const char *@var{pattern}, int @var{flags}, int (*@var{errfunc}) (const char *@var{filename}, int @var{error-code}), glob_t *@var{vector-ptr}) -@safety{@prelim{}@mtunsafe{@mtasurace{:utent} @mtsenv{} @mtascusig{:ALRM} @mtascutimer{} @mtslocale{}}@asunsafe{@ascudlopen{} @ascuplugin{} @asucorrupt{} @ascuheap{} @asulock{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}} -@c glob @mtasurace:utent @mtsenv @mtascusig:ALRM @mtascutimer @mtslocale @ascudlopen @ascuplugin @asucorrupt @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem -@c strlen dup ok -@c strchr dup ok -@c malloc dup @ascuheap @acsmem -@c mempcpy dup ok -@c next_brace_sub ok -@c free dup @ascuheap @acsmem -@c globfree dup @asucorrupt @ascuheap @acucorrupt @acsmem -@c glob_pattern_p ok -@c glob_pattern_type dup ok -@c getenv dup @mtsenv -@c GET_LOGIN_NAME_MAX ok -@c getlogin_r dup @mtasurace:utent @mtascusig:ALRM @mtascutimer @mtslocale @ascudlopen @ascuplugin @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem -@c GETPW_R_SIZE_MAX ok -@c getpwnam_r dup @mtslocale @ascudlopen @ascuplugin @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem -@c realloc dup @ascuheap @acsmem -@c memcpy dup ok -@c memchr dup ok -@c *pglob->gl_stat user-supplied -@c stat64 dup ok -@c S_ISDIR dup ok -@c strdup dup @ascuheap @acsmem -@c glob_pattern_type ok -@c glob_in_dir @mtsenv @mtslocale @asucorrupt @ascuheap @acucorrupt @acsfd @acsmem -@c strlen dup ok -@c glob_pattern_type dup ok -@c malloc dup @ascuheap @acsmem -@c mempcpy dup ok -@c *pglob->gl_stat user-supplied -@c stat64 dup ok -@c free dup @ascuheap @acsmem -@c *pglob->gl_opendir user-supplied -@c opendir dup @ascuheap @acsmem @acsfd -@c dirfd dup ok -@c *pglob->gl_readdir user-supplied -@c CONVERT_DIRENT_DIRENT64 ok -@c readdir64 ok [protected by exclusive use of the stream] -@c REAL_DIR_ENTRY ok -@c DIRENT_MIGHT_BE_DIR ok -@c fnmatch dup 
@mtsenv @mtslocale @ascuheap @acsmem -@c DIRENT_MIGHT_BE_SYMLINK ok -@c link_exists_p ok -@c link_exists2_p ok -@c strlen dup ok -@c mempcpy dup ok -@c *pglob->gl_stat user-supplied -@c fxstatat64 dup ok -@c realloc dup @ascuheap @acsmem -@c pglob->gl_closedir user-supplied -@c closedir @ascuheap @acsmem @acsfd -@c prefix_array dup @asucorrupt @ascuheap @acucorrupt @acsmem -@c strlen dup ok -@c malloc dup @ascuheap @acsmem -@c free dup @ascuheap @acsmem -@c mempcpy dup ok -@c strcpy dup ok -The function @code{glob} does globbing using the pattern @var{pattern} -in the current directory. It puts the result in a newly allocated -vector, and stores the size and address of this vector into -@code{*@var{vector-ptr}}. The argument @var{flags} is a combination of -bit flags; see @ref{Flags for Globbing}, for details of the flags. - -The result of globbing is a sequence of file names. The function -@code{glob} allocates a string for each resulting word, then -allocates a vector of type @code{char **} to store the addresses of -these strings. The last element of the vector is a null pointer. -This vector is called the @dfn{word vector}. - -To return this vector, @code{glob} stores both its address and its -length (number of elements, not counting the terminating null pointer) -into @code{*@var{vector-ptr}}. - -Normally, @code{glob} sorts the file names alphabetically before -returning them. You can turn this off with the flag @code{GLOB_NOSORT} -if you want to get the information as fast as possible. Usually it's -a good idea to let @code{glob} sort them---if you process the files in -alphabetical order, the users will have a feel for the rate of progress -that your application is making. - -If @code{glob} succeeds, it returns 0. 
Otherwise, it returns one -of these error codes: - -@vtable @code -@comment glob.h -@comment POSIX.2 -@item GLOB_ABORTED -There was an error opening a directory, and you used the flag -@code{GLOB_ERR} or your specified @var{errfunc} returned a nonzero -value. -@iftex -See below -@end iftex -@ifinfo -@xref{Flags for Globbing}, -@end ifinfo -for an explanation of the @code{GLOB_ERR} flag and @var{errfunc}. - -@comment glob.h -@comment POSIX.2 -@item GLOB_NOMATCH -The pattern didn't match any existing files. If you use the -@code{GLOB_NOCHECK} flag, then you never get this error code, because -that flag tells @code{glob} to @emph{pretend} that the pattern matched -at least one file. - -@comment glob.h -@comment POSIX.2 -@item GLOB_NOSPACE -It was impossible to allocate memory to hold the result. -@end vtable - -In the event of an error, @code{glob} stores information in -@code{*@var{vector-ptr}} about all the matches it has found so far. - -It is important to notice that the @code{glob} function will not fail if -it encounters directories or files which cannot be handled without the -LFS interfaces. The implementation of @code{glob} is supposed to use -these functions internally. This at least is the assumption made by -the Unix standard. The GNU extension of allowing the user to provide their -own directory handling and @code{stat} functions complicates things a -bit. If these callback functions are used and a large file or directory -is encountered @code{glob} @emph{can} fail. 
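The return codes above can be handled along the following lines. This is only a minimal sketch: the helper name @code{list_matches} and its convention of returning @code{-1} for hard errors are assumptions made for illustration, not something the library defines.

```c
#include <glob.h>
#include <stdio.h>

/* Hypothetical helper: print every file name matching PATTERN, one
   per line.  Returns the number of matches, 0 if nothing matched,
   or -1 if glob reported GLOB_ABORTED, GLOB_NOSPACE, or any other
   failure (the -1 convention is an assumption of this sketch).  */
int
list_matches (const char *pattern)
{
  glob_t results = { 0 };
  int count;

  switch (glob (pattern, 0, NULL, &results))
    {
    case 0:
      /* gl_pathc counts the entries before the terminating null.  */
      for (size_t i = 0; i < results.gl_pathc; i++)
        puts (results.gl_pathv[i]);
      count = (int) results.gl_pathc;
      break;

    case GLOB_NOMATCH:
      count = 0;
      break;

    default:
      count = -1;
      break;
    }

  /* glob may have stored partial results even on failure, so the
     structure is released on every path.  */
  globfree (&results);
  return count;
}
```

Passing a null pointer for @var{errfunc} and leaving @code{GLOB_ERR} unset means that unreadable directories are skipped silently, so a pattern naming a missing directory is expected to end up in the @code{GLOB_NOMATCH} branch.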
-@end deftypefun - -@comment glob.h -@comment GNU -@deftypefun int glob64 (const char *@var{pattern}, int @var{flags}, int (*@var{errfunc}) (const char *@var{filename}, int @var{error-code}), glob64_t *@var{vector-ptr}) -@safety{@prelim{}@mtunsafe{@mtasurace{:utent} @mtsenv{} @mtascusig{:ALRM} @mtascutimer{} @mtslocale{}}@asunsafe{@ascudlopen{} @asucorrupt{} @ascuheap{} @asulock{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}} -@c Same code as glob, but with glob64_t #defined as glob_t. -The @code{glob64} function was added as part of the Large File Summit -extensions but is not part of the original LFS proposal. The reason for -this is simple: it is not necessary. The necessity for a @code{glob64} -function is added by the extensions of the GNU @code{glob} -implementation which allows the user to provide their own directory handling -and @code{stat} functions. The @code{readdir} and @code{stat} functions -do depend on the choice of @code{_FILE_OFFSET_BITS} since the definition -of the types @code{struct dirent} and @code{struct stat} will change -depending on the choice. - -Besides this difference, @code{glob64} works just like @code{glob} in -all aspects. - -This function is a GNU extension. -@end deftypefun - -@node Flags for Globbing -@subsection Flags for Globbing - -This section describes the standard flags that you can specify in the -@var{flags} argument to @code{glob}. Choose the flags you want, -and combine them with the C bitwise OR operator @code{|}. - -Note that there are @ref{More Flags for Globbing} available as GNU extensions. - -@vtable @code -@comment glob.h -@comment POSIX.2 -@item GLOB_APPEND -Append the words from this expansion to the vector of words produced by -previous calls to @code{glob}. This way you can effectively expand -several words as if they were concatenated with spaces between them. - -In order for appending to work, you must not modify the contents of the -word vector structure between calls to @code{glob}. 
And, if you set -@code{GLOB_DOOFFS} in the first call to @code{glob}, you must also -set it when you append to the results. - -Note that the pointer stored in @code{gl_pathv} may no longer be valid -after you call @code{glob} the second time, because @code{glob} might -have relocated the vector. So always fetch @code{gl_pathv} from the -@code{glob_t} structure after each @code{glob} call; @strong{never} save -the pointer across calls. - -@comment glob.h -@comment POSIX.2 -@item GLOB_DOOFFS -Leave blank slots at the beginning of the vector of words. -The @code{gl_offs} field says how many slots to leave. -The blank slots contain null pointers. - -@comment glob.h -@comment POSIX.2 -@item GLOB_ERR -Give up right away and report an error if there is any difficulty -reading the directories that must be read in order to expand @var{pattern} -fully. Such difficulties might include a directory in which you don't -have the requisite access. Normally, @code{glob} tries its best to keep -on going despite any errors, reading whatever directories it can. - -You can exercise even more control than this by specifying an -error-handler function @var{errfunc} when you call @code{glob}. If -@var{errfunc} is not a null pointer, then @code{glob} doesn't give up -right away when it can't read a directory; instead, it calls -@var{errfunc} with two arguments, like this: - -@smallexample -(*@var{errfunc}) (@var{filename}, @var{error-code}) -@end smallexample - -@noindent -The argument @var{filename} is the name of the directory that -@code{glob} couldn't open or couldn't read, and @var{error-code} is the -@code{errno} value that was reported to @code{glob}. - -If the error handler function returns nonzero, then @code{glob} gives up -right away. Otherwise, it continues. - -@comment glob.h -@comment POSIX.2 -@item GLOB_MARK -If the pattern matches the name of a directory, append @samp{/} to the -directory's name when returning it. 
- -@comment glob.h -@comment POSIX.2 -@item GLOB_NOCHECK -If the pattern doesn't match any file names, return the pattern itself -as if it were a file name that had been matched. (Normally, when the -pattern doesn't match anything, @code{glob} returns that there were no -matches.) - -@comment glob.h -@comment POSIX.2 -@item GLOB_NOESCAPE -Don't treat the @samp{\} character specially in patterns. Normally, -@samp{\} quotes the following character, turning off its special meaning -(if any) so that it matches only itself. When quoting is enabled, the -pattern @samp{\?} matches only the string @samp{?}, because the question -mark in the pattern acts like an ordinary character. - -If you use @code{GLOB_NOESCAPE}, then @samp{\} is an ordinary character. - -@code{glob} does its work by calling the function @code{fnmatch} -repeatedly. It handles the flag @code{GLOB_NOESCAPE} by turning on the -@code{FNM_NOESCAPE} flag in calls to @code{fnmatch}. - -@comment glob.h -@comment POSIX.2 -@item GLOB_NOSORT -Don't sort the file names; return them in no particular order. -(In practice, the order will depend on the order of the entries in -the directory.) The only reason @emph{not} to sort is to save time. -@end vtable - -@node More Flags for Globbing -@subsection More Flags for Globbing - -Besides the flags described in the last section, the GNU implementation of -@code{glob} allows a few more flags which are also defined in the -@file{glob.h} file. Some of the extensions implement functionality -which is available in modern shell implementations. - -@vtable @code -@comment glob.h -@comment GNU -@item GLOB_PERIOD -The @code{.} character (period) is treated specially. It cannot be -matched by wildcards. @xref{Wildcard Matching}, @code{FNM_PERIOD}. - -@comment glob.h -@comment GNU -@item GLOB_MAGCHAR -The @code{GLOB_MAGCHAR} value is not to be given to @code{glob} in the -@var{flags} parameter. 
Instead, @code{glob} sets this bit in the -@code{gl_flags} element of the @code{glob_t} structure provided as the -result if the pattern used for matching contains any wildcard character. - -@comment glob.h -@comment GNU -@item GLOB_ALTDIRFUNC -Instead of using the normal functions for accessing the -filesystem the @code{glob} implementation uses the user-supplied -functions specified in the structure pointed to by the @var{pglob} -parameter. For more information about these functions, refer to the -sections about directory handling: see @ref{Accessing Directories}, and -@ref{Reading Attributes}. - -@comment glob.h -@comment GNU -@item GLOB_BRACE -If this flag is given, the handling of braces in the pattern is changed. -It is now required that braces appear correctly grouped. I.e., for each -opening brace there must be a closing one. Braces can be used -recursively. So it is possible to define one brace expression in -another one. It is important to note that the range of each brace -expression is completely contained in the outer brace expression (if -there is one). - -The string between the matching braces is separated into single -expressions by splitting at @code{,} (comma) characters. The commas -themselves are discarded. Please note what we said above about recursive -brace expressions. The commas used to separate the subexpressions must -be at the same level. Commas in brace subexpressions are not matched. -They are used during expansion of the brace expression of the deeper -level. The example below shows this. The call - -@smallexample -glob ("@{foo/@{,bar,biz@},baz@}", GLOB_BRACE, NULL, &result) -@end smallexample - -@noindent -is equivalent to the sequence - -@smallexample -glob ("foo/", GLOB_BRACE, NULL, &result) -glob ("foo/bar", GLOB_BRACE|GLOB_APPEND, NULL, &result) -glob ("foo/biz", GLOB_BRACE|GLOB_APPEND, NULL, &result) -glob ("baz", GLOB_BRACE|GLOB_APPEND, NULL, &result) -@end smallexample - -@noindent -if we leave aside error handling. 
- -@comment glob.h -@comment GNU -@item GLOB_NOMAGIC -If the pattern contains no wildcard constructs (it is a literal file name), -return it as the sole ``matching'' word, even if no file exists by that name. - -@comment glob.h -@comment GNU -@item GLOB_TILDE -If this flag is used the character @code{~} (tilde) is handled specially -if it appears at the beginning of the pattern. Instead of being taken -verbatim it is used to represent the home directory of a known user. - -If @code{~} is the only character in the pattern or it is followed by a -@code{/} (slash), the home directory of the process owner is -substituted. Using @code{getlogin} and @code{getpwnam} the information -is read from the system databases. As an example take user @code{bart} -with his home directory at @file{/home/bart}. For him a call like - -@smallexample -glob ("~/bin/*", GLOB_TILDE, NULL, &result) -@end smallexample - -@noindent -would return the contents of the directory @file{/home/bart/bin}. -Instead of referring to one's own home directory it is also possible to -name the home directory of other users. To do so one has to append the -user name after the tilde character. So the contents of user -@code{homer}'s @file{bin} directory can be retrieved by - -@smallexample -glob ("~homer/bin/*", GLOB_TILDE, NULL, &result) -@end smallexample - -If the user name is not valid or the home directory cannot be determined -for some reason the pattern is left untouched and itself used as the -result. I.e., if in the last example @code{homer} is not available, the -tilde expansion yields @code{"~homer/bin/*"} and @code{glob} is not -looking for a directory named @code{~homer}. - -This functionality is equivalent to what is available in C-shells if the -@code{nonomatch} flag is set. - -@comment glob.h -@comment GNU -@item GLOB_TILDE_CHECK -If this flag is used @code{glob} behaves as if @code{GLOB_TILDE} is -given. 
The only difference is that if the user name is not available or -the home directory cannot be determined for other reasons this leads to -an error. @code{glob} will return @code{GLOB_NOMATCH} instead of using -the pattern itself as the name. - -This functionality is equivalent to what is available in C-shells if -the @code{nonomatch} flag is not set. - -@comment glob.h -@comment GNU -@item GLOB_ONLYDIR -If this flag is used the globbing function takes this as a -@strong{hint} that the caller is only interested in directories -matching the pattern. If the information about the type of the file -is easily available non-directories will be rejected but no extra -work will be done to determine the information for each file. I.e., -the caller must still be able to filter directories out. - -This functionality is only available with the GNU @code{glob} -implementation. It is mainly used internally to increase the -performance but might be useful for a user as well and therefore is -documented here. -@end vtable - -Calling @code{glob} will in most cases allocate resources which are used -to represent the result of the function call. If the same object of -type @code{glob_t} is used in multiple calls to @code{glob} the resources -are freed or reused so that no leaks appear. But this does not include -the time when all @code{glob} calls are done. - -@comment glob.h -@comment POSIX.2 -@deftypefun void globfree (glob_t *@var{pglob}) -@safety{@prelim{}@mtsafe{}@asunsafe{@asucorrupt{} @ascuheap{}}@acunsafe{@acucorrupt{} @acsmem{}}} -@c globfree dup @asucorrupt @ascuheap @acucorrupt @acsmem -@c free dup @ascuheap @acsmem -The @code{globfree} function frees all resources allocated by previous -calls to @code{glob} associated with the object pointed to by -@var{pglob}. This function should be called when the -@code{glob_t} object is no longer needed. 
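One way to combine repeated @code{glob} calls with a single @code{globfree} is sketched below. The helper name @code{count_all_matches} and its @code{-1} error convention are hypothetical. The sketch also assumes that it is acceptable to pass @code{GLOB_APPEND} after an earlier call that returned @code{GLOB_NOMATCH}.

```c
#include <glob.h>
#include <stddef.h>

/* Hypothetical helper: expand COUNT patterns into one glob_t object
   and return the total number of matches, or -1 if any call fails
   with something other than GLOB_NOMATCH.  */
long
count_all_matches (const char *const *patterns, size_t count)
{
  glob_t results = { 0 };
  long total = -1;
  int flags = 0;

  for (size_t i = 0; i < count; i++)
    {
      int status = glob (patterns[i], flags, NULL, &results);
      if (status != 0 && status != GLOB_NOMATCH)
        goto out;
      /* Every call after the first appends to the same vector.  */
      flags = GLOB_APPEND;
    }

  /* gl_pathc is read only after the last call, because earlier
     calls may have relocated the vector.  */
  total = (long) results.gl_pathc;

 out:
  /* One globfree releases everything accumulated above.  */
  globfree (&results);
  return total;
}
```

Because the same object is reused with @code{GLOB_APPEND}, the resources only need to be released once, after the last pattern has been expanded.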
-@end deftypefun - -@comment glob.h -@comment GNU -@deftypefun void globfree64 (glob64_t *@var{pglob}) -@safety{@prelim{}@mtsafe{}@asunsafe{@asucorrupt{} @asulock{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}} -This function is equivalent to @code{globfree} but it frees records of -type @code{glob64_t} which were allocated by @code{glob64}. -@end deftypefun - - -@node Regular Expressions -@section Regular Expression Matching - -@Theglibc{} supports two interfaces for matching regular -expressions. One is the standard POSIX.2 interface, and the other is -what @theglibc{} has had for many years. - -Both interfaces are declared in the header file @file{regex.h}. -If you define @w{@code{_POSIX_C_SOURCE}}, then only the POSIX.2 -functions, structures, and constants are declared. -@c !!! we only document the POSIX.2 interface here!! - -@menu -* POSIX Regexp Compilation:: Using @code{regcomp} to prepare to match. -* Flags for POSIX Regexps:: Syntax variations for @code{regcomp}. -* Matching POSIX Regexps:: Using @code{regexec} to match the compiled - pattern that you get from @code{regcomp}. -* Regexp Subexpressions:: Finding which parts of the string were matched. -* Subexpression Complications:: Find points of which parts were matched. -* Regexp Cleanup:: Freeing storage; reporting errors. -@end menu - -@node POSIX Regexp Compilation -@subsection POSIX Regular Expression Compilation - -Before you can actually match a regular expression, you must -@dfn{compile} it. This is not true compilation---it produces a special -data structure, not machine instructions. But it is like ordinary -compilation in that its purpose is to enable you to ``execute'' the -pattern fast. (@xref{Matching POSIX Regexps}, for how to use the -compiled regular expression for matching.) - -There is a special data type for compiled regular expressions: - -@comment regex.h -@comment POSIX.2 -@deftp {Data Type} regex_t -This type of object holds a compiled regular expression. 
-It is actually a structure. It has just one field that your programs -should look at: - -@table @code -@item re_nsub -This field holds the number of parenthetical subexpressions in the -regular expression that was compiled. -@end table - -There are several other fields, but we don't describe them here, because -only the functions in the library should use them. -@end deftp - -After you create a @code{regex_t} object, you can compile a regular -expression into it by calling @code{regcomp}. - -@comment regex.h -@comment POSIX.2 -@deftypefun int regcomp (regex_t *restrict @var{compiled}, const char *restrict @var{pattern}, int @var{cflags}) -@safety{@prelim{}@mtsafe{@mtslocale{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsmem{} @acsfd{}}} -@c All of the issues have to do with memory allocation and multi-byte -@c character handling present in the input string, or implied by ranges -@c or inverted character classes. -@c (re_)malloc @ascuheap @acsmem -@c re_compile_internal @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd -@c (re_)realloc @ascuheap @acsmem [no @asucorrupt @acucorrupt for we zero the buffer] -@c init_dfa @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd -@c (re_)malloc @ascuheap @acsmem -@c calloc @ascuheap @acsmem -@c _NL_CURRENT ok -@c _NL_CURRENT_WORD ok -@c btowc @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd -@c libc_lock_init ok -@c re_string_construct @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd -@c re_string_construct_common ok -@c re_string_realloc_buffers @ascuheap @acsmem -@c (re_)realloc dup @ascuheap @acsmem -@c build_wcs_upper_buffer @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd -@c isascii ok -@c mbsinit ok -@c toupper ok -@c mbrtowc dup @asucorrupt @ascuheap @asulock @ascudlopen 
@acucorrupt @aculock @acsmem @acsfd -@c iswlower @mtslocale -@c towupper @mtslocale -@c wcrtomb dup @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd -@c (re_)malloc dup @ascuheap @acsmem -@c build_upper_buffer ok (@mtslocale but optimized) -@c islower ok -@c toupper ok -@c build_wcs_buffer @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd -@c mbrtowc dup @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd -@c re_string_translate_buffer ok -@c parse @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd -@c fetch_token @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd -@c peek_token @mtslocale -@c re_string_eoi ok -@c re_string_peek_byte ok -@c re_string_cur_idx ok -@c re_string_length ok -@c re_string_peek_byte_case @mtslocale -@c re_string_peek_byte dup ok -@c re_string_is_single_byte_char ok -@c isascii ok -@c re_string_peek_byte dup ok -@c re_string_wchar_at ok -@c re_string_skip_bytes ok -@c re_string_skip_bytes dup ok -@c parse_reg_exp @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd -@c parse_branch @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd -@c parse_expression @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd -@c create_token_tree dup @ascuheap @acsmem -@c re_string_eoi dup ok -@c re_string_first_byte ok -@c fetch_token dup @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd -@c create_tree dup @ascuheap @acsmem -@c parse_sub_exp @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd -@c fetch_token dup @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd -@c parse_reg_exp dup @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd 
-@c postorder() @ascuheap @acsmem -@c free_tree @ascuheap @acsmem -@c free_token dup @ascuheap @acsmem -@c create_tree dup @ascuheap @acsmem -@c parse_bracket_exp @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd -@c _NL_CURRENT dup ok -@c _NL_CURRENT_WORD dup ok -@c calloc dup @ascuheap @acsmem -@c (re_)free dup @ascuheap @acsmem -@c peek_token_bracket ok -@c re_string_eoi dup ok -@c re_string_peek_byte dup ok -@c re_string_first_byte dup ok -@c re_string_cur_idx dup ok -@c re_string_length dup ok -@c re_string_skip_bytes dup ok -@c bitset_set ok -@c re_string_skip_bytes ok -@c parse_bracket_element @mtslocale -@c re_string_char_size_at ok -@c re_string_wchar_at dup ok -@c re_string_skip_bytes dup ok -@c parse_bracket_symbol @mtslocale -@c re_string_eoi dup ok -@c re_string_fetch_byte_case @mtslocale -@c re_string_fetch_byte ok -@c re_string_first_byte dup ok -@c isascii ok -@c re_string_char_size_at dup ok -@c re_string_skip_bytes dup ok -@c re_string_fetch_byte dup ok -@c re_string_peek_byte dup ok -@c re_string_skip_bytes dup ok -@c peek_token_bracket dup ok -@c auto build_range_exp @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd -@c auto lookup_collation_sequence_value @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd -@c btowc dup @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd -@c collseq_table_lookup ok -@c auto seek_collating_symbol_entry dup ok -@c (re_)realloc dup @ascuheap @acsmem -@c collseq_table_lookup dup ok -@c bitset_set dup ok -@c (re_)realloc dup @ascuheap @acsmem -@c build_equiv_class @mtslocale @ascuheap @acsmem -@c _NL_CURRENT ok -@c auto findidx ok -@c bitset_set dup ok -@c (re_)realloc dup @ascuheap @acsmem -@c auto build_collating_symbol @ascuheap @acsmem -@c auto seek_collating_symbol_entry ok -@c bitset_set dup ok -@c (re_)realloc dup @ascuheap @acsmem -@c build_charclass @mtslocale @ascuheap @acsmem -@c 
(re_)realloc dup @ascuheap @acsmem -@c bitset_set dup ok -@c isalnum ok -@c iscntrl ok -@c isspace ok -@c isalpha ok -@c isdigit ok -@c isprint ok -@c isupper ok -@c isblank ok -@c isgraph ok -@c ispunct ok -@c isxdigit ok -@c bitset_not ok -@c bitset_mask ok -@c create_token_tree dup @ascuheap @acsmem -@c create_tree dup @ascuheap @acsmem -@c free_charset dup @ascuheap @acsmem -@c init_word_char @mtslocale -@c isalnum ok -@c build_charclass_op @mtslocale @ascuheap @acsmem -@c calloc dup @ascuheap @acsmem -@c build_charclass dup @mtslocale @ascuheap @acsmem -@c (re_)free dup @ascuheap @acsmem -@c free_charset dup @ascuheap @acsmem -@c bitset_set dup ok -@c bitset_not dup ok -@c bitset_mask dup ok -@c create_token_tree dup @ascuheap @acsmem -@c create_tree dup @ascuheap @acsmem -@c parse_dup_op @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd -@c re_string_cur_idx dup ok -@c fetch_number @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd -@c fetch_token dup @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd -@c re_string_set_index ok -@c postorder() @ascuheap @acsmem -@c free_tree dup @ascuheap @acsmem -@c mark_opt_subexp ok -@c duplicate_tree @ascuheap @acsmem -@c create_token_tree dup @ascuheap @acsmem -@c create_tree dup @ascuheap @acsmem -@c postorder() @ascuheap @acsmem -@c free_tree dup @ascuheap @acsmem -@c fetch_token dup @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd -@c parse_branch dup @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd -@c create_tree dup @ascuheap @acsmem -@c create_tree @ascuheap @acsmem -@c create_token_tree @ascuheap @acsmem -@c (re_)malloc dup @ascuheap @acsmem -@c analyze @ascuheap @acsmem -@c (re_)malloc dup @ascuheap @acsmem -@c preorder() @ascuheap @acsmem -@c optimize_subexps ok -@c calc_next ok -@c link_nfa_nodes @ascuheap @acsmem 
-@c re_node_set_init_1 @ascuheap @acsmem -@c (re_)malloc dup @ascuheap @acsmem -@c re_node_set_init_2 @ascuheap @acsmem -@c (re_)malloc dup @ascuheap @acsmem -@c postorder() @ascuheap @acsmem -@c lower_subexps @ascuheap @acsmem -@c lower_subexp @ascuheap @acsmem -@c create_tree dup @ascuheap @acsmem -@c calc_first @ascuheap @acsmem -@c re_dfa_add_node @ascuheap @acsmem -@c (re_)realloc dup @ascuheap @acsmem -@c re_node_set_init_empty ok -@c calc_eclosure @ascuheap @acsmem -@c calc_eclosure_iter @ascuheap @acsmem -@c re_node_set_alloc @ascuheap @acsmem -@c (re_)malloc dup @ascuheap @acsmem -@c duplicate_node_closure @ascuheap @acsmem -@c re_node_set_empty ok -@c duplicate_node @ascuheap @acsmem -@c re_dfa_add_node dup @ascuheap @acsmem -@c re_node_set_insert @ascuheap @acsmem -@c (re_)realloc dup @ascuheap @acsmem -@c search_duplicated_node ok -@c re_node_set_merge @ascuheap @acsmem -@c (re_)realloc dup @ascuheap @acsmem -@c re_node_set_free @ascuheap @acsmem -@c (re_)free dup @ascuheap @acsmem -@c re_node_set_insert dup @ascuheap @acsmem -@c re_node_set_free dup @ascuheap @acsmem -@c calc_inveclosure @ascuheap @acsmem -@c re_node_set_init_empty dup ok -@c re_node_set_insert_last @ascuheap @acsmem -@c (re_)realloc dup @ascuheap @acsmem -@c optimize_utf8 ok -@c create_initial_state @ascuheap @acsmem -@c re_node_set_init_copy @ascuheap @acsmem -@c (re_)malloc dup @ascuheap @acsmem -@c re_node_set_init_empty dup ok -@c re_node_set_contains ok -@c re_node_set_merge dup @ascuheap @acsmem -@c re_acquire_state_context @ascuheap @acsmem -@c calc_state_hash ok -@c re_node_set_compare ok -@c create_cd_newstate @ascuheap @acsmem -@c calloc dup @ascuheap @acsmem -@c re_node_set_init_copy dup @ascuheap @acsmem -@c (re_)free dup @ascuheap @acsmem -@c free_state @ascuheap @acsmem -@c re_node_set_free dup @ascuheap @acsmem -@c (re_)free dup @ascuheap @acsmem -@c NOT_SATISFY_PREV_CONSTRAINT ok -@c re_node_set_remove_at ok -@c register_state @ascuheap @acsmem -@c re_node_set_alloc 
dup @ascuheap @acsmem -@c re_node_set_insert_last dup @ascuheap @acsmem -@c (re_)realloc dup @ascuheap @acsmem -@c re_node_set_free dup @ascuheap @acsmem -@c free_workarea_compile @ascuheap @acsmem -@c (re_)free dup @ascuheap @acsmem -@c re_string_destruct @ascuheap @acsmem -@c (re_)free dup @ascuheap @acsmem -@c free_dfa_content @ascuheap @acsmem -@c free_token @ascuheap @acsmem -@c free_charset @ascuheap @acsmem -@c (re_)free dup @ascuheap @acsmem -@c (re_)free dup @ascuheap @acsmem -@c (re_)free dup @ascuheap @acsmem -@c re_node_set_free dup @ascuheap @acsmem -@c re_compile_fastmap @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd -@c re_compile_fastmap_iter @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd -@c re_set_fastmap ok -@c tolower ok -@c mbrtowc dup @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd -@c wcrtomb dup @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd -@c towlower @mtslocale -@c _NL_CURRENT ok -@c (re_)free @ascuheap @acsmem -The function @code{regcomp} ``compiles'' a regular expression into a -data structure that you can use with @code{regexec} to match against a -string. The compiled regular expression format is designed for -efficient matching. @code{regcomp} stores it into @code{*@var{compiled}}. - -It's up to you to allocate an object of type @code{regex_t} and pass its -address to @code{regcomp}. - -The argument @var{cflags} lets you specify various options that control -the syntax and semantics of regular expressions. @xref{Flags for POSIX -Regexps}. - -If you use the flag @code{REG_NOSUB}, then @code{regcomp} omits from -the compiled regular expression the information necessary to record -how subexpressions actually match. In this case, you might as well -pass @code{0} for the @var{matchptr} and @var{nmatch} arguments when -you call @code{regexec}. 
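The @code{REG_NOSUB} case described above can be sketched as a plain yes-or-no test. The helper name @code{pattern_matches} and its three-way return value are assumptions for illustration only; the pattern is compiled as a basic regular expression.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if the basic regular expression
   PATTERN matches somewhere in STRING, 0 if it does not, and -1 if
   the pattern fails to compile or matching fails for another reason.  */
int
pattern_matches (const char *pattern, const char *string)
{
  regex_t compiled;
  int status;

  /* REG_NOSUB: no subexpression information will be recorded.  */
  if (regcomp (&compiled, pattern, REG_NOSUB) != 0)
    return -1;

  /* With REG_NOSUB there is nothing to report, so pass 0 and NULL
     for nmatch and matchptr.  */
  status = regexec (&compiled, string, 0, NULL, 0);
  regfree (&compiled);

  if (status == 0)
    return 1;
  if (status == REG_NOMATCH)
    return 0;
  return -1;
}
```

Compiling on every call keeps the sketch short; a program that tests many strings against one pattern would normally compile once and reuse the @code{regex_t} object.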
- -If you don't use @code{REG_NOSUB}, then the compiled regular expression -does have the capacity to record how subexpressions match. Also, -@code{regcomp} tells you how many subexpressions @var{pattern} has, by -storing the number in @code{@var{compiled}->re_nsub}. You can use that -value to decide how long an array to allocate to hold information about -subexpression matches. - -@code{regcomp} returns @code{0} if it succeeds in compiling the regular -expression; otherwise, it returns a nonzero error code (see the table -below). You can use @code{regerror} to produce an error message string -describing the reason for a nonzero value; see @ref{Regexp Cleanup}. - -@end deftypefun - -Here are the possible nonzero values that @code{regcomp} can return: - -@vtable @code -@comment regex.h -@comment POSIX.2 -@item REG_BADBR -There was an invalid @samp{\@{@dots{}\@}} construct in the regular -expression. A valid @samp{\@{@dots{}\@}} construct must contain either -a single number, or two numbers in increasing order separated by a -comma. - -@comment regex.h -@comment POSIX.2 -@item REG_BADPAT -There was a syntax error in the regular expression. - -@comment regex.h -@comment POSIX.2 -@item REG_BADRPT -A repetition operator such as @samp{?} or @samp{*} appeared in a bad -position (with no preceding subexpression to act on). - -@comment regex.h -@comment POSIX.2 -@item REG_ECOLLATE -The regular expression referred to an invalid collating element (one not -defined in the current locale for string collation). @xref{Locale -Categories}. - -@comment regex.h -@comment POSIX.2 -@item REG_ECTYPE -The regular expression referred to an invalid character class name. - -@comment regex.h -@comment POSIX.2 -@item REG_EESCAPE -The regular expression ended with @samp{\}. - -@comment regex.h -@comment POSIX.2 -@item REG_ESUBREG -There was an invalid number in the @samp{\@var{digit}} construct. 
- -@comment regex.h -@comment POSIX.2 -@item REG_EBRACK -There were unbalanced square brackets in the regular expression. - -@comment regex.h -@comment POSIX.2 -@item REG_EPAREN -An extended regular expression had unbalanced parentheses, -or a basic regular expression had unbalanced @samp{\(} and @samp{\)}. - -@comment regex.h -@comment POSIX.2 -@item REG_EBRACE -The regular expression had unbalanced @samp{\@{} and @samp{\@}}. - -@comment regex.h -@comment POSIX.2 -@item REG_ERANGE -One of the endpoints in a range expression was invalid. - -@comment regex.h -@comment POSIX.2 -@item REG_ESPACE -@code{regcomp} ran out of memory. -@end vtable - -@node Flags for POSIX Regexps -@subsection Flags for POSIX Regular Expressions - -These are the bit flags that you can use in the @var{cflags} operand when -compiling a regular expression with @code{regcomp}. - -@vtable @code -@comment regex.h -@comment POSIX.2 -@item REG_EXTENDED -Treat the pattern as an extended regular expression, rather than as a -basic regular expression. - -@comment regex.h -@comment POSIX.2 -@item REG_ICASE -Ignore case when matching letters. - -@comment regex.h -@comment POSIX.2 -@item REG_NOSUB -Don't bother storing the contents of the @var{matchptr} array. - -@comment regex.h -@comment POSIX.2 -@item REG_NEWLINE -Treat a newline in @var{string} as dividing @var{string} into multiple -lines, so that @samp{$} can match before the newline and @samp{^} can -match after. Also, don't permit @samp{.} to match a newline, and don't -permit @samp{[^@dots{}]} to match a newline. - -Otherwise, newline acts like any other ordinary character. -@end vtable - -@node Matching POSIX Regexps -@subsection Matching a Compiled POSIX Regular Expression - -Once you have compiled a regular expression, as described in @ref{POSIX -Regexp Compilation}, you can match it against strings using -@code{regexec}. 
A match anywhere inside the string counts as success, -unless the regular expression contains anchor characters (@samp{^} or -@samp{$}). - -@comment regex.h -@comment POSIX.2 -@deftypefun int regexec (const regex_t *restrict @var{compiled}, const char *restrict @var{string}, size_t @var{nmatch}, regmatch_t @var{matchptr}[restrict], int @var{eflags}) -@safety{@prelim{}@mtsafe{@mtslocale{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsmem{} @acsfd{}}} -@c libc_lock_lock @asulock @aculock -@c re_search_internal @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd -@c re_string_allocate @ascuheap @acsmem -@c re_string_construct_common dup ok -@c re_string_realloc_buffers dup @ascuheap @acsmem -@c match_ctx_init @ascuheap @acsmem -@c (re_)malloc dup @ascuheap @acsmem -@c re_string_byte_at ok -@c re_string_first_byte dup ok -@c check_matching @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd -@c re_string_cur_idx dup ok -@c acquire_init_state_context dup @ascuheap @acsmem -@c re_string_context_at ok -@c re_string_byte_at dup ok -@c bitset_contain ok -@c re_acquire_state_context dup @ascuheap @acsmem -@c check_subexp_matching_top @ascuheap @acsmem -@c match_ctx_add_subtop @ascuheap @acsmem -@c (re_)realloc dup @ascuheap @acsmem -@c calloc dup @ascuheap @acsmem -@c transit_state_bkref @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd -@c re_string_cur_idx dup ok -@c re_string_context_at dup ok -@c NOT_SATISFY_NEXT_CONSTRAINT ok -@c get_subexp @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd -@c re_string_get_buffer ok -@c search_cur_bkref_entry ok -@c clean_state_log_if_needed @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd -@c extend_buffers @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem 
@acsfd -@c re_string_realloc_buffers dup @ascuheap @acsmem -@c (re_)realloc dup @ascuheap @acsmem -@c build_wcs_upper_buffer dup @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd -@c build_upper_buffer dup ok (@mtslocale but optimized) -@c build_wcs_buffer dup @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd -@c re_string_translate_buffer dup ok -@c get_subexp_sub @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd -@c check_arrival @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd -@c (re_)realloc dup @ascuheap @acsmem -@c re_string_context_at dup ok -@c re_node_set_init_1 dup @ascuheap @acsmem -@c check_arrival_expand_ecl @ascuheap @acsmem -@c re_node_set_alloc dup @ascuheap @acsmem -@c find_subexp_node ok -@c re_node_set_merge dup @ascuheap @acsmem -@c re_node_set_free dup @ascuheap @acsmem -@c check_arrival_expand_ecl_sub @ascuheap @acsmem -@c re_node_set_contains dup ok -@c re_node_set_insert dup @ascuheap @acsmem -@c re_node_set_free dup @ascuheap @acsmem -@c re_node_set_init_copy dup @ascuheap @acsmem -@c re_node_set_init_empty dup ok -@c expand_bkref_cache @ascuheap @acsmem -@c search_cur_bkref_entry dup ok -@c re_node_set_contains dup ok -@c re_node_set_init_1 dup @ascuheap @acsmem -@c check_arrival_expand_ecl dup @ascuheap @acsmem -@c re_node_set_merge dup @ascuheap @acsmem -@c re_node_set_init_copy dup @ascuheap @acsmem -@c re_node_set_insert dup @ascuheap @acsmem -@c re_node_set_free dup @ascuheap @acsmem -@c re_acquire_state @ascuheap @acsmem -@c calc_state_hash dup ok -@c re_node_set_compare dup ok -@c create_ci_newstate @ascuheap @acsmem -@c calloc dup @ascuheap @acsmem -@c re_node_set_init_copy dup @ascuheap @acsmem -@c (re_)free dup @ascuheap @acsmem -@c register_state dup @ascuheap @acsmem -@c free_state dup @ascuheap @acsmem -@c re_acquire_state_context dup @ascuheap @acsmem -@c re_node_set_merge dup 
@ascuheap @acsmem -@c check_arrival_add_next_nodes @mtslocale @ascuheap @acsmem -@c re_node_set_init_empty dup ok -@c check_node_accept_bytes @mtslocale @ascuheap @acsmem -@c re_string_byte_at dup ok -@c re_string_char_size_at dup ok -@c re_string_elem_size_at @mtslocale -@c _NL_CURRENT_WORD dup ok -@c _NL_CURRENT dup ok -@c auto findidx dup ok -@c _NL_CURRENT_WORD dup ok -@c _NL_CURRENT dup ok -@c collseq_table_lookup dup ok -@c find_collation_sequence_value @mtslocale -@c _NL_CURRENT_WORD dup ok -@c _NL_CURRENT dup ok -@c auto findidx dup ok -@c wcscoll @mtslocale @ascuheap @acsmem -@c re_node_set_empty dup ok -@c re_node_set_merge dup @ascuheap @acsmem -@c re_node_set_free dup @ascuheap @acsmem -@c re_node_set_insert dup @ascuheap @acsmem -@c re_acquire_state dup @ascuheap @acsmem -@c check_node_accept ok -@c re_string_byte_at dup ok -@c bitset_contain dup ok -@c re_string_context_at dup ok -@c NOT_SATISFY_NEXT_CONSTRAINT dup ok -@c match_ctx_add_entry @ascuheap @acsmem -@c (re_)realloc dup @ascuheap @acsmem -@c (re_)free dup @ascuheap @acsmem -@c clean_state_log_if_needed dup @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd -@c extend_buffers dup @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd -@c find_subexp_node dup ok -@c calloc dup @ascuheap @acsmem -@c check_arrival dup *** -@c match_ctx_add_sublast @ascuheap @acsmem -@c (re_)realloc dup @ascuheap @acsmem -@c re_acquire_state_context dup @ascuheap @acsmem -@c re_node_set_init_union @ascuheap @acsmem -@c (re_)malloc dup @ascuheap @acsmem -@c re_node_set_init_copy dup @ascuheap @acsmem -@c re_node_set_init_empty dup ok -@c re_node_set_free dup @ascuheap @acsmem -@c check_subexp_matching_top dup @ascuheap @acsmem -@c check_halt_state_context ok -@c re_string_context_at dup ok -@c check_halt_node_context ok -@c NOT_SATISFY_NEXT_CONSTRAINT dup ok -@c re_string_eoi dup ok -@c extend_buffers dup @mtslocale @asucorrupt @ascuheap 
@asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd -@c transit_state @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd -@c transit_state_mb @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd -@c re_string_context_at dup ok -@c NOT_SATISFY_NEXT_CONSTRAINT dup ok -@c check_node_accept_bytes dup @mtslocale @ascuheap @acsmem -@c re_string_cur_idx dup ok -@c clean_state_log_if_needed @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd -@c re_node_set_init_union dup @ascuheap @acsmem -@c re_acquire_state_context dup @ascuheap @acsmem -@c re_string_fetch_byte dup ok -@c re_string_context_at dup ok -@c build_trtable @ascuheap @acsmem -@c (re_)malloc dup @ascuheap @acsmem -@c group_nodes_into_DFAstates @ascuheap @acsmem -@c bitset_empty dup ok -@c bitset_set dup ok -@c bitset_merge dup ok -@c bitset_set_all ok -@c bitset_clear ok -@c bitset_contain dup ok -@c bitset_copy ok -@c re_node_set_init_copy dup @ascuheap @acsmem -@c re_node_set_insert dup @ascuheap @acsmem -@c re_node_set_init_1 dup @ascuheap @acsmem -@c re_node_set_free dup @ascuheap @acsmem -@c re_node_set_alloc dup @ascuheap @acsmem -@c malloc dup @ascuheap @acsmem -@c free dup @ascuheap @acsmem -@c re_node_set_free dup @ascuheap @acsmem -@c bitset_empty ok -@c re_node_set_empty dup ok -@c re_node_set_merge dup @ascuheap @acsmem -@c re_acquire_state_context dup @ascuheap @acsmem -@c bitset_merge ok -@c calloc dup @ascuheap @acsmem -@c bitset_contain dup ok -@c merge_state_with_log @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd -@c re_string_cur_idx dup ok -@c re_node_set_init_union dup @ascuheap @acsmem -@c re_string_context_at dup ok -@c re_node_set_free dup @ascuheap @acsmem -@c check_subexp_matching_top @ascuheap @acsmem -@c match_ctx_add_subtop dup @ascuheap @acsmem -@c transit_state_bkref dup @mtslocale @asucorrupt @ascuheap @asulock 
@ascudlopen @acucorrupt @aculock @acsmem @acsfd -@c find_recover_state -@c re_string_cur_idx dup ok -@c re_string_skip_bytes dup ok -@c merge_state_with_log dup @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd -@c check_halt_state_context dup ok -@c prune_impossible_nodes @mtslocale @ascuheap @acsmem -@c (re_)malloc dup @ascuheap @acsmem -@c sift_ctx_init ok -@c re_node_set_init_empty dup ok -@c sift_states_backward @mtslocale @ascuheap @acsmem -@c re_node_set_init_1 dup @ascuheap @acsmem -@c update_cur_sifted_state @mtslocale @ascuheap @acsmem -@c add_epsilon_src_nodes @ascuheap @acsmem -@c re_acquire_state dup @ascuheap @acsmem -@c re_node_set_alloc dup @ascuheap @acsmem -@c re_node_set_merge dup @ascuheap @acsmem -@c re_node_set_add_intersect @ascuheap @acsmem -@c (re_)realloc dup @ascuheap @acsmem -@c check_subexp_limits @ascuheap @acsmem -@c sub_epsilon_src_nodes @ascuheap @acsmem -@c re_node_set_init_empty dup ok -@c re_node_set_contains dup ok -@c re_node_set_add_intersect dup @ascuheap @acsmem -@c re_node_set_free dup @ascuheap @acsmem -@c re_node_set_remove_at dup ok -@c re_node_set_contains dup ok -@c re_acquire_state dup @ascuheap @acsmem -@c sift_states_bkref @mtslocale @ascuheap @acsmem -@c search_cur_bkref_entry dup ok -@c check_dst_limits ok -@c search_cur_bkref_entry dup ok -@c check_dst_limits_calc_pos ok -@c check_dst_limits_calc_pos_1 ok -@c re_node_set_init_copy dup @ascuheap @acsmem -@c re_node_set_insert dup @ascuheap @acsmem -@c sift_states_backward dup @mtslocale @ascuheap @acsmem -@c merge_state_array dup @ascuheap @acsmem -@c re_node_set_remove ok -@c re_node_set_contains dup ok -@c re_node_set_remove_at dup ok -@c re_node_set_free dup @ascuheap @acsmem -@c re_node_set_free dup @ascuheap @acsmem -@c re_node_set_empty dup ok -@c build_sifted_states @mtslocale @ascuheap @acsmem -@c sift_states_iter_mb @mtslocale @ascuheap @acsmem -@c check_node_accept_bytes dup @mtslocale @ascuheap @acsmem -@c 
check_node_accept dup ok -@c check_dst_limits dup ok -@c re_node_set_insert dup @ascuheap @acsmem -@c re_node_set_free dup @ascuheap @acsmem -@c check_halt_state_context dup ok -@c merge_state_array @ascuheap @acsmem -@c re_node_set_init_union dup @ascuheap @acsmem -@c re_acquire_state dup @ascuheap @acsmem -@c re_node_set_free dup @ascuheap @acsmem -@c (re_)free dup @ascuheap @acsmem -@c set_regs @ascuheap @acsmem -@c (re_)malloc dup @ascuheap @acsmem -@c re_node_set_init_empty dup ok -@c free_fail_stack_return @ascuheap @acsmem -@c re_node_set_free dup @ascuheap @acsmem -@c (re_)free dup @ascuheap @acsmem -@c update_regs ok -@c re_node_set_free dup @ascuheap @acsmem -@c pop_fail_stack @ascuheap @acsmem -@c re_node_set_free dup @ascuheap @acsmem -@c (re_)free dup @ascuheap @acsmem -@c (re_)free dup @ascuheap @acsmem -@c (re_)free dup @ascuheap @acsmem -@c match_ctx_free @ascuheap @acsmem -@c match_ctx_clean @ascuheap @acsmem -@c (re_)free dup @ascuheap @acsmem -@c (re_)free dup @ascuheap @acsmem -@c re_string_destruct dup @ascuheap @acsmem -@c libc_lock_unlock @aculock -This function tries to match the compiled regular expression -@code{*@var{compiled}} against @var{string}. - -@code{regexec} returns @code{0} if the regular expression matches; -otherwise, it returns a nonzero value. See the table below for -what nonzero values mean. You can use @code{regerror} to produce an -error message string describing the reason for a nonzero value; -see @ref{Regexp Cleanup}. - -The argument @var{eflags} is a word of bit flags that enable various -options. - -If you want to get information about what part of @var{string} actually -matched the regular expression or its subexpressions, use the arguments -@var{matchptr} and @var{nmatch}. Otherwise, pass @code{0} for -@var{nmatch}, and @code{NULL} for @var{matchptr}. @xref{Regexp -Subexpressions}. 
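To show how @var{matchptr} and @var{nmatch} might be used, here is a minimal sketch that asks only for the overall match (one array element) and copies that part of the string. The helper name @code{first_match} is an assumption for this example, and it compiles the pattern with @code{REG_EXTENDED} purely for convenience.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'd copy of the first part of
   STRING that matches the extended regular expression PATTERN, or
   NULL if there is no match, the pattern is invalid, or memory runs
   out.  The caller frees the result.  */
char *
first_match (const char *pattern, const char *string)
{
  regex_t compiled;
  regmatch_t whole[1];
  char *copy = NULL;

  if (regcomp (&compiled, pattern, REG_EXTENDED) != 0)
    return NULL;

  /* nmatch is 1, so only the offsets of the entire match are stored.  */
  if (regexec (&compiled, string, 1, whole, 0) == 0)
    {
      size_t len = (size_t) (whole[0].rm_eo - whole[0].rm_so);
      copy = malloc (len + 1);
      if (copy != NULL)
        {
          memcpy (copy, string + whole[0].rm_so, len);
          copy[len] = '\0';
        }
    }

  regfree (&compiled);
  return copy;
}
```

The offsets are relative to the start of @var{string}, which is why the copy begins at @code{string + whole[0].rm_so}.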
-@end deftypefun - -You must match the regular expression with the same set of current -locales that were in effect when you compiled the regular expression. - -The function @code{regexec} accepts the following flags in the -@var{eflags} argument: - -@vtable @code -@comment regex.h -@comment POSIX.2 -@item REG_NOTBOL -Do not regard the beginning of the specified string as the beginning of -a line; more generally, don't make any assumptions about what text might -precede it. - -@comment regex.h -@comment POSIX.2 -@item REG_NOTEOL -Do not regard the end of the specified string as the end of a line; more -generally, don't make any assumptions about what text might follow it. -@end vtable - -Here are the possible nonzero values that @code{regexec} can return: - -@vtable @code -@comment regex.h -@comment POSIX.2 -@item REG_NOMATCH -The pattern didn't match the string. This isn't really an error. - -@comment regex.h -@comment POSIX.2 -@item REG_ESPACE -@code{regexec} ran out of memory. -@end vtable - -@node Regexp Subexpressions -@subsection Match Results with Subexpressions - -When @code{regexec} matches parenthetical subexpressions of -@var{pattern}, it records which parts of @var{string} they match. It -returns that information by storing the offsets into an array whose -elements are structures of type @code{regmatch_t}. The first element of -the array (index @code{0}) records the part of the string that matched -the entire regular expression. Each other element of the array records -the beginning and end of the part that matched a single parenthetical -subexpression. - -@comment regex.h -@comment POSIX.2 -@deftp {Data Type} regmatch_t -This is the data type of the @var{matchptr} array that you pass to -@code{regexec}. It contains two structure fields, as follows: - -@table @code -@item rm_so -The offset in @var{string} of the beginning of a substring. Add this -value to @var{string} to get the address of that part. 
- -@item rm_eo -The offset in @var{string} of the end of the substring. -@end table -@end deftp - -@comment regex.h -@comment POSIX.2 -@deftp {Data Type} regoff_t -@code{regoff_t} is an alias for another signed integer type. -The fields of @code{regmatch_t} have type @code{regoff_t}. -@end deftp - -The @code{regmatch_t} elements correspond to subexpressions -positionally; the first element (index @code{1}) records where the first -subexpression matched, the second element records the second -subexpression, and so on. The order of the subexpressions is the order -in which they begin. - -When you call @code{regexec}, you specify how long the @var{matchptr} -array is, with the @var{nmatch} argument. This tells @code{regexec} how -many elements to store. If the actual regular expression has more than -@var{nmatch} subexpressions, then you won't get offset information about -the rest of them. But this doesn't alter whether the pattern matches a -particular string or not. - -If you don't want @code{regexec} to return any information about where -the subexpressions matched, you can either supply @code{0} for -@var{nmatch}, or use the flag @code{REG_NOSUB} when you compile the -pattern with @code{regcomp}. - -@node Subexpression Complications -@subsection Complications in Subexpression Matching - -Sometimes a subexpression matches a substring of no characters. This -happens when @samp{f\(o*\)} matches the string @samp{fum}. (It really -matches just the @samp{f}.) In this case, both of the offsets identify -the point in the string where the null substring was found. In this -example, the offsets are both @code{1}. - -Sometimes the entire regular expression can match without using some of -its subexpressions at all---for example, when @samp{ba\(na\)*} matches the -string @samp{ba}, the parenthetical subexpression is not used. When -this happens, @code{regexec} stores @code{-1} in both fields of the -element for that subexpression. 
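The two cases above (an empty match and an unused subexpression) can be checked with a small sketch like the following. The helper name @code{match_with_group} is hypothetical; it compiles a basic regular expression and requests two elements, one for the whole match and one for the first subexpression.

```c
#include <regex.h>

/* Hypothetical helper: match the basic regular expression PATTERN
   against STRING.  On success (return value 0), m[0] holds the
   offsets of the whole match and m[1] those of the first
   parenthetical subexpression.  A nonzero return value is the error
   code from regcomp or regexec.  */
int
match_with_group (const char *pattern, const char *string,
                  regmatch_t m[2])
{
  regex_t compiled;
  int status;

  status = regcomp (&compiled, pattern, 0);
  if (status != 0)
    return status;

  /* nmatch is 2: the overall match plus one subexpression.  */
  status = regexec (&compiled, string, 2, m, 0);
  regfree (&compiled);
  return status;
}
```

Called with @samp{f\(o*\)} and @samp{fum}, this should leave both fields of @code{m[1]} equal to @code{1}; called with @samp{ba\(na\)*} and @samp{ba}, both fields should be @code{-1}, so a caller needs to test for @code{-1} before using the offsets.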
- -Sometimes matching the entire regular expression can match a particular -subexpression more than once---for example, when @samp{ba\(na\)*} -matches the string @samp{bananana}, the parenthetical subexpression -matches three times. When this happens, @code{regexec} usually stores -the offsets of the last part of the string that matched the -subexpression. In the case of @samp{bananana}, these offsets are -@code{6} and @code{8}. - -But the last match is not always the one that is chosen. It's more -accurate to say that the last @emph{opportunity} to match is the one -that takes precedence. What this means is that when one subexpression -appears within another, then the results reported for the inner -subexpression reflect whatever happened on the last match of the outer -subexpression. For an example, consider @samp{\(ba\(na\)*s \)*} matching -the string @samp{bananas bas }. The last time the inner expression -actually matches is near the end of the first word. But it is -@emph{considered} again in the second word, and fails to match there. -@code{regexec} reports nonuse of the ``na'' subexpression. - -Another place where this rule applies is when the regular expression -@smallexample -\(ba\(na\)*s \|nefer\(ti\)* \)* -@end smallexample -@noindent -matches @samp{bananas nefertiti}. The ``na'' subexpression does match -in the first word, but it doesn't match in the second word because the -other alternative is used there. Once again, the second repetition of -the outer subexpression overrides the first, and within that second -repetition, the ``na'' subexpression is not used. So @code{regexec} -reports nonuse of the ``na'' subexpression. - -@node Regexp Cleanup -@subsection POSIX Regexp Matching Cleanup - -When you are finished using a compiled regular expression, you can -free the storage it uses by calling @code{regfree}. 
- -@comment regex.h -@comment POSIX.2 -@deftypefun void regfree (regex_t *@var{compiled}) -@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} -@c (re_)free dup @ascuheap @acsmem -@c free_dfa_content dup @ascuheap @acsmem -Calling @code{regfree} frees all the storage that @code{*@var{compiled}} -points to. This includes various internal fields of the @code{regex_t} -structure that aren't documented in this manual. - -@code{regfree} does not free the object @code{*@var{compiled}} itself. -@end deftypefun - -You should always free the space in a @code{regex_t} structure with -@code{regfree} before using the structure to compile another regular -expression. - -When @code{regcomp} or @code{regexec} reports an error, you can use -the function @code{regerror} to turn it into an error message string. - -@comment regex.h -@comment POSIX.2 -@deftypefun size_t regerror (int @var{errcode}, const regex_t *restrict @var{compiled}, char *restrict @var{buffer}, size_t @var{length}) -@safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}} -@c regerror calls gettext, strcmp and mempcpy or memcpy. -This function produces an error message string for the error code -@var{errcode}, and stores the string in @var{length} bytes of memory -starting at @var{buffer}. For the @var{compiled} argument, supply the -same compiled regular expression structure that @code{regcomp} or -@code{regexec} was working with when it got the error. Alternatively, -you can supply @code{NULL} for @var{compiled}; you will still get a -meaningful error message, but it might not be as detailed. - -If the error message can't fit in @var{length} bytes (including a -terminating null character), then @code{regerror} truncates it. -The string that @code{regerror} stores is always null-terminated -even if it has been truncated. 
- -The return value of @code{regerror} is the minimum length needed to -store the entire error message, including the terminating null -character. If this is no greater than @var{length}, then -the error message was not truncated, and you can use it. Otherwise, you -should call @code{regerror} again with a larger buffer. - -Here is a function that uses @code{regerror}, but always dynamically -allocates a buffer for the error message: - -@smallexample -char *get_regerror (int errcode, regex_t *compiled) -@{ - size_t length = regerror (errcode, compiled, NULL, 0); - char *buffer = xmalloc (length); - (void) regerror (errcode, compiled, buffer, length); - return buffer; -@} -@end smallexample -@end deftypefun - -@node Word Expansion -@section Shell-Style Word Expansion -@cindex word expansion -@cindex expansion of shell words - -@dfn{Word expansion} means the process of splitting a string into -@dfn{words} and substituting for variables, commands, and wildcards -just as the shell does. - -For example, when you write @samp{ls -l foo.c}, this string is split -into three separate words---@samp{ls}, @samp{-l} and @samp{foo.c}. -This is the most basic function of word expansion. - -When you write @samp{ls *.c}, this can become many words, because -the word @samp{*.c} can be replaced with any number of file names. -This is called @dfn{wildcard expansion}, and it is also a part of -word expansion. - -When you use @samp{echo $PATH} to print your path, you are taking -advantage of @dfn{variable substitution}, which is also part of word -expansion. - -Ordinary programs can perform word expansion just like the shell by -calling the library function @code{wordexp}. - -@menu -* Expansion Stages:: What word expansion does to a string. -* Calling Wordexp:: How to call @code{wordexp}. -* Flags for Wordexp:: Options you can enable in @code{wordexp}. -* Wordexp Example:: A sample program that does word expansion. -* Tilde Expansion:: Details of how tilde expansion works. 
-* Variable Substitution:: Different types of variable substitution. -@end menu - -@node Expansion Stages -@subsection The Stages of Word Expansion - -When word expansion is applied to a sequence of words, it performs the -following transformations in the order shown here: - -@enumerate -@item -@cindex tilde expansion -@dfn{Tilde expansion}: Replacement of @samp{~foo} with the name of -the home directory of @samp{foo}. - -@item -Next, three different transformations are applied in the same step, -from left to right: - -@itemize @bullet -@item -@cindex variable substitution -@cindex substitution of variables and commands -@dfn{Variable substitution}: Environment variables are substituted for -references such as @samp{$foo}. - -@item -@cindex command substitution -@dfn{Command substitution}: Constructs such as @w{@samp{`cat foo`}} and -the equivalent @w{@samp{$(cat foo)}} are replaced with the output from -the inner command. - -@item -@cindex arithmetic expansion -@dfn{Arithmetic expansion}: Constructs such as @samp{$(($x-1))} are -replaced with the result of the arithmetic computation. -@end itemize - -@item -@cindex field splitting -@dfn{Field splitting}: subdivision of the text into @dfn{words}. - -@item -@cindex wildcard expansion -@dfn{Wildcard expansion}: The replacement of a construct such as @samp{*.c} -with a list of @samp{.c} file names. Wildcard expansion applies to an -entire word at a time, and replaces that word with 0 or more file names -that are themselves words. - -@item -@cindex quote removal -@cindex removal of quotes -@dfn{Quote removal}: The deletion of string-quotes, now that they have -done their job by inhibiting the above transformations when appropriate. -@end enumerate - -For the details of these transformations, and how to write the constructs -that use them, see @w{@cite{The BASH Manual}} (to appear). 
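
The stages listed above can be observed with a small C sketch. The function name @code{demo_stages} and the environment variable @code{GR_DEMO_COUNT} are made up for this illustration. The input string is chosen to exercise variable substitution, arithmetic expansion, field splitting and quote removal in a single call to @code{wordexp}.

```c
#include <stdlib.h>
#include <wordexp.h>

/* Hypothetical demo, for illustration only.  GR_DEMO_COUNT is a
   made-up variable name.  Returns the result code of wordexp, or -1
   if the environment variable could not be set.  On a zero return the
   caller owns *RESULT and must release it with wordfree.  */
static int
demo_stages (wordexp_t *result)
{
  if (setenv ("GR_DEMO_COUNT", "5", 1) != 0)
    return -1;

  /* First word: variable substitution.
     Second word: arithmetic expansion over the same variable.
     Third word: the quotes keep the space from splitting the field,
     and quote removal then deletes the quotes themselves.  */
  return wordexp ("$GR_DEMO_COUNT $(($GR_DEMO_COUNT-1)) 'two words'",
                  result, 0);
}
```

After a successful call, @code{result->we_wordc} should be 3: one word for each substitution and a single word for the quoted text, with the quotes themselves gone.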
- -@node Calling Wordexp -@subsection Calling @code{wordexp} - -All the functions, constants and data types for word expansion are -declared in the header file @file{wordexp.h}. - -Word expansion produces a vector of words (strings). To return this -vector, @code{wordexp} uses a special data type, @code{wordexp_t}, which -is a structure. You pass @code{wordexp} the address of the structure, -and it fills in the structure's fields to tell you about the results. - -@comment wordexp.h -@comment POSIX.2 -@deftp {Data Type} {wordexp_t} -This data type holds a pointer to a word vector. More precisely, it -records both the address of the word vector and its size. - -@table @code -@item we_wordc -The number of elements in the vector. - -@item we_wordv -The address of the vector. This field has type @w{@code{char **}}. - -@item we_offs -The offset of the first real element of the vector, from its nominal -address in the @code{we_wordv} field. Unlike the other fields, this -is always an input to @code{wordexp}, rather than an output from it. - -If you use a nonzero offset, then that many elements at the beginning of -the vector are left empty. (The @code{wordexp} function fills them with -null pointers.) - -The @code{we_offs} field is meaningful only if you use the -@code{WRDE_DOOFFS} flag. Otherwise, the offset is always zero -regardless of what is in this field, and the first real element comes at -the beginning of the vector. 
-@end table -@end deftp - -@comment wordexp.h -@comment POSIX.2 -@deftypefun int wordexp (const char *@var{words}, wordexp_t *@var{word-vector-ptr}, int @var{flags}) -@safety{@prelim{}@mtunsafe{@mtasurace{:utent} @mtasuconst{:@mtsenv{}} @mtsenv{} @mtascusig{:ALRM} @mtascutimer{} @mtslocale{}}@asunsafe{@ascudlopen{} @ascuplugin{} @ascuintl{} @ascuheap{} @asucorrupt{} @asulock{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}} -@c wordexp @mtasurace:utent @mtasuconst:@mtsenv @mtsenv @mtascusig:ALRM @mtascutimer @mtslocale @ascudlopen @ascuplugin @ascuintl @ascuheap @asucorrupt @asulock @acucorrupt @aculock @acsfd @acsmem -@c w_newword ok -@c wordfree dup @asucorrupt @ascuheap @acucorrupt @acsmem -@c calloc dup @ascuheap @acsmem -@c getenv dup @mtsenv -@c strcpy dup ok -@c parse_backslash @ascuheap @acsmem -@c w_addchar dup @ascuheap @acsmem -@c parse_dollars @mtasuconst:@mtsenv @mtslocale @mtsenv @ascudlopen @ascuplugin @ascuintl @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem -@c w_addchar dup @ascuheap @acsmem -@c parse_arith @mtasuconst:@mtsenv @mtslocale @mtsenv @ascudlopen @ascuplugin @ascuintl @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem -@c w_newword dup ok -@c parse_dollars dup @mtasuconst:@mtsenv @mtslocale @mtsenv @ascudlopen @ascuplugin @ascuintl @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem -@c parse_backtick dup @ascuplugin @ascuheap @aculock @acsfd @acsmem -@c parse_qtd_backslash dup @ascuheap @acsmem -@c eval_expr @mtslocale -@c eval_expr_multidiv @mtslocale -@c eval_expr_val @mtslocale -@c isspace dup @mtslocale -@c eval_expr dup @mtslocale -@c isspace dup @mtslocale -@c isspace dup @mtslocale -@c free dup @ascuheap @acsmem -@c w_addchar dup @ascuheap @acsmem -@c w_addstr dup @ascuheap @acsmem -@c itoa_word dup ok -@c parse_comm @ascuplugin @ascuheap @aculock @acsfd @acsmem -@c w_newword dup ok -@c pthread_setcancelstate @ascuplugin @ascuheap @acsmem -@c (disable cancellation around exec_comm; it may do_cancel the -@c 
second time, if async cancel is enabled) -@c THREAD_ATOMIC_CMPXCHG_VAL dup ok -@c CANCEL_ENABLED_AND_CANCELED_AND_ASYNCHRONOUS dup ok -@c do_cancel @ascuplugin @ascuheap @acsmem -@c THREAD_ATOMIC_BIT_SET dup ok -@c pthread_unwind @ascuplugin @ascuheap @acsmem -@c Unwind_ForcedUnwind if available @ascuplugin @ascuheap @acsmem -@c libc_unwind_longjmp otherwise -@c cleanups -@c exec_comm @ascuplugin @ascuheap @aculock @acsfd @acsmem -@c pipe2 dup ok -@c pipe dup ok -@c fork dup @ascuplugin @aculock -@c close dup @acsfd -@c on child: exec_comm_child -> exec or abort -@c waitpid dup ok -@c read dup ok -@c w_addmem dup @ascuheap @acsmem -@c strchr dup ok -@c w_addword dup @ascuheap @acsmem -@c w_newword dup ok -@c w_addchar dup @ascuheap @acsmem -@c free dup @ascuheap @acsmem -@c kill dup ok -@c free dup @ascuheap @acsmem -@c parse_param @mtasuconst:@mtsenv @mtslocale @mtsenv @ascudlopen @ascuplugin @ascuintl @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem -@c reads from __libc_argc and __libc_argv without guards -@c w_newword dup ok -@c isalpha dup @mtslocale^^ -@c w_addchar dup @ascuheap @acsmem -@c isalnum dup @mtslocale^^ -@c isdigit dup @mtslocale^^ -@c strchr dup ok -@c itoa_word dup ok -@c atoi dup @mtslocale -@c getpid dup ok -@c w_addstr dup @ascuheap @acsmem -@c free dup @ascuheap @acsmem -@c strlen dup ok -@c malloc dup @ascuheap @acsmem -@c stpcpy dup ok -@c w_addword dup @ascuheap @acsmem -@c strdup dup @ascuheap @acsmem -@c getenv dup @mtsenv -@c parse_dollars dup @mtasuconst:@mtsenv @mtslocale @mtsenv @ascudlopen @ascuplugin @ascuintl @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem -@c parse_tilde dup @mtslocale @mtsenv @ascudlopen @ascuplugin @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem -@c fnmatch dup @mtsenv @mtslocale @ascuheap @acsmem -@c mempcpy dup ok -@c _ dup @ascuintl -@c fxprintf dup @aculock -@c setenv dup @mtasuconst:@mtsenv @ascuheap @asulock @acucorrupt @aculock @acsmem -@c strspn dup ok -@c strcspn dup ok -@c 
parse_backtick @ascuplugin @ascuheap @aculock @acsfd @acsmem -@c w_newword dup ok -@c exec_comm dup @ascuplugin @ascuheap @aculock @acsfd @acsmem -@c free dup @ascuheap @acsmem -@c parse_qtd_backslash dup @ascuheap @acsmem -@c parse_backslash dup @ascuheap @acsmem -@c w_addchar dup @ascuheap @acsmem -@c parse_dquote @mtasuconst:@mtsenv @mtslocale @mtsenv @ascudlopen @ascuplugin @ascuintl @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem -@c parse_dollars dup @mtasuconst:@mtsenv @mtslocale @mtsenv @ascudlopen @ascuplugin @ascuintl @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem -@c parse_backtick dup @ascuplugin @ascuheap @aculock @acsfd @acsmem -@c parse_qtd_backslash dup @ascuheap @acsmem -@c w_addchar dup @ascuheap @acsmem -@c w_addword dup @ascuheap @acsmem -@c strdup dup @ascuheap @acsmem -@c realloc dup @ascuheap @acsmem -@c free dup @ascuheap @acsmem -@c parse_squote dup @ascuheap @acsmem -@c w_addchar dup @ascuheap @acsmem -@c parse_tilde @mtslocale @mtsenv @ascudlopen @ascuplugin @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem -@c strchr dup ok -@c w_addchar dup @ascuheap @acsmem -@c getenv dup @mtsenv -@c w_addstr dup @ascuheap @acsmem -@c strlen dup ok -@c w_addmem dup @ascuheap @acsmem -@c realloc dup @ascuheap @acsmem -@c free dup @ascuheap @acsmem -@c mempcpy dup ok -@c getuid dup ok -@c getpwuid_r dup @mtslocale @ascudlopen @ascuplugin @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem -@c getpwnam_r dup @mtslocale @ascudlopen @ascuplugin @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem -@c parse_glob @mtasurace:utent @mtasuconst:@mtsenv @mtsenv @mtascusig:ALRM @mtascutimer @mtslocale @ascudlopen @ascuplugin @ascuintl @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem -@c strchr dup ok -@c parse_dollars dup @mtasuconst:@mtsenv @mtslocale @mtsenv @ascudlopen @ascuplugin @ascuintl @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem -@c parse_qtd_backslash @ascuheap @acsmem -@c w_addchar dup @ascuheap @acsmem -@c 
parse_backslash dup @ascuheap @acsmem -@c w_addchar dup @ascuheap @acsmem -@c w_addword dup @ascuheap @acsmem -@c w_newword dup ok -@c do_parse_glob @mtasurace:utent @mtsenv @mtascusig:ALRM @mtascutimer @mtslocale @ascudlopen @ascuplugin @ascuheap @asulock @aculock @acsfd @acsmem -@c glob dup @mtasurace:utent @mtsenv @mtascusig:ALRM @mtascutimer @mtslocale @ascudlopen @ascuplugin @ascuheap @asulock @aculock @acsfd @acsmem [auto glob_t avoids @asucorrupt @acucorrupt] -@c w_addstr dup @ascuheap @acsmem -@c w_addchar dup @ascuheap @acsmem -@c globfree dup @ascuheap @acsmem [auto glob_t avoids @asucorrupt @acucorrupt] -@c free dup @ascuheap @acsmem -@c w_newword dup ok -@c strdup dup @ascuheap @acsmem -@c w_addword dup @ascuheap @acsmem -@c wordfree dup @asucorrupt @ascuheap @acucorrupt @acsmem -@c strchr dup ok -@c w_addchar dup @ascuheap @acsmem -@c realloc dup @ascuheap @acsmem -@c free dup @ascuheap @acsmem -@c free dup @ascuheap @acsmem -Perform word expansion on the string @var{words}, putting the result in -a newly allocated vector, and store the size and address of this vector -into @code{*@var{word-vector-ptr}}. The argument @var{flags} is a -combination of bit flags; see @ref{Flags for Wordexp}, for details of -the flags. - -You shouldn't use any of the characters @samp{|&;<>} in the string -@var{words} unless they are quoted; likewise for newline. If you use -these characters unquoted, you will get the @code{WRDE_BADCHAR} error -code. Don't use parentheses or braces unless they are quoted or part of -a word expansion construct. If you use quotation characters @samp{'"`}, -they should come in pairs that balance. - -The results of word expansion are a sequence of words. The function -@code{wordexp} allocates a string for each resulting word, then -allocates a vector of type @code{char **} to store the addresses of -these strings. The last element of the vector is a null pointer. -This vector is called the @dfn{word vector}. 
- -To return this vector, @code{wordexp} stores both its address and its -length (number of elements, not counting the terminating null pointer) -into @code{*@var{word-vector-ptr}}. - -If @code{wordexp} succeeds, it returns 0. Otherwise, it returns one -of these error codes: - -@vtable @code -@comment wordexp.h -@comment POSIX.2 -@item WRDE_BADCHAR -The input string @var{words} contains an unquoted invalid character such -as @samp{|}. - -@comment wordexp.h -@comment POSIX.2 -@item WRDE_BADVAL -The input string refers to an undefined shell variable, and you used the flag -@code{WRDE_UNDEF} to forbid such references. - -@comment wordexp.h -@comment POSIX.2 -@item WRDE_CMDSUB -The input string uses command substitution, and you used the flag -@code{WRDE_NOCMD} to forbid command substitution. - -@comment wordexp.h -@comment POSIX.2 -@item WRDE_NOSPACE -It was impossible to allocate memory to hold the result. In this case, -@code{wordexp} can store part of the results---as much as it could -allocate room for. - -@comment wordexp.h -@comment POSIX.2 -@item WRDE_SYNTAX -There was a syntax error in the input string. For example, an unmatched -quoting character is a syntax error. This error code is also used to -signal division by zero and overflow in arithmetic expansion. -@end vtable -@end deftypefun - -@comment wordexp.h -@comment POSIX.2 -@deftypefun void wordfree (wordexp_t *@var{word-vector-ptr}) -@safety{@prelim{}@mtsafe{}@asunsafe{@asucorrupt{} @ascuheap{}}@acunsafe{@acucorrupt{} @acsmem{}}} -@c wordfree dup @asucorrupt @ascuheap @acucorrupt @acsmem -@c free dup @ascuheap @acsmem -Free the storage used for the word-strings and vector that -@code{*@var{word-vector-ptr}} points to. This does not free the -structure @code{*@var{word-vector-ptr}} itself---only the other -data it points to. 
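
As a rough sketch of how the error codes above pair with @code{wordfree}, the following C fragment expands a string, releases whatever was allocated, and hands back the result code. The names @code{try_expand} and @code{describe} are hypothetical helpers, not library functions.

```c
#include <wordexp.h>

/* Hypothetical helper, for illustration only.  Expand WORDS with the
   given FLAGS, discard the resulting words, and return the result
   code of wordexp.  */
static int
try_expand (const char *words, int flags)
{
  wordexp_t result = { 0 };
  int rc = wordexp (words, &result, flags);

  /* Storage exists after success, and may exist after WRDE_NOSPACE.  */
  if (rc == 0 || rc == WRDE_NOSPACE)
    wordfree (&result);
  return rc;
}

/* Hypothetical helper: map a wordexp result code to a short text.  */
static const char *
describe (int rc)
{
  switch (rc)
    {
    case 0:            return "success";
    case WRDE_BADCHAR: return "unquoted invalid character";
    case WRDE_BADVAL:  return "undefined variable with WRDE_UNDEF";
    case WRDE_CMDSUB:  return "command substitution with WRDE_NOCMD";
    case WRDE_NOSPACE: return "out of memory";
    case WRDE_SYNTAX:  return "syntax error";
    default:           return "unknown result code";
    }
}
```

For instance, @code{try_expand ("a|b", 0)} should report @code{WRDE_BADCHAR}, and @code{try_expand ("$(echo hi)", WRDE_NOCMD)} should report @code{WRDE_CMDSUB}.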
-@end deftypefun - -@node Flags for Wordexp -@subsection Flags for Word Expansion - -This section describes the flags that you can specify in the -@var{flags} argument to @code{wordexp}. Choose the flags you want, -and combine them with the C operator @code{|}. - -@vtable @code -@comment wordexp.h -@comment POSIX.2 -@item WRDE_APPEND -Append the words from this expansion to the vector of words produced by -previous calls to @code{wordexp}. This way you can effectively expand -several words as if they were concatenated with spaces between them. - -In order for appending to work, you must not modify the contents of the -word vector structure between calls to @code{wordexp}. And, if you set -@code{WRDE_DOOFFS} in the first call to @code{wordexp}, you must also -set it when you append to the results. - -@comment wordexp.h -@comment POSIX.2 -@item WRDE_DOOFFS -Leave blank slots at the beginning of the vector of words. -The @code{we_offs} field says how many slots to leave. -The blank slots contain null pointers. - -@comment wordexp.h -@comment POSIX.2 -@item WRDE_NOCMD -Don't do command substitution; if the input requests command substitution, -report an error. - -@comment wordexp.h -@comment POSIX.2 -@item WRDE_REUSE -Reuse a word vector made by a previous call to @code{wordexp}. -Instead of allocating a new vector of words, this call to @code{wordexp} -will use the vector that already exists (making it larger if necessary). - -Note that the vector may move, so it is not safe to save an old pointer -and use it again after calling @code{wordexp}. You must fetch -@code{we_wordv} anew after each call. - -@comment wordexp.h -@comment POSIX.2 -@item WRDE_SHOWERR -Do show any error messages printed by commands run by command substitution. -More precisely, allow these commands to inherit the standard error output -stream of the current process. By default, @code{wordexp} gives these -commands a standard error stream that discards all output. 
- -@comment wordexp.h -@comment POSIX.2 -@item WRDE_UNDEF -If the input refers to a shell variable that is not defined, report an -error. -@end vtable - -@node Wordexp Example -@subsection @code{wordexp} Example - -Here is an example of using @code{wordexp} to expand several strings -and use the results to run a shell command. It also shows the use of -@code{WRDE_APPEND} to concatenate the expansions and of @code{wordfree} -to free the space allocated by @code{wordexp}. - -@smallexample -int -expand_and_execute (const char *program, const char **options) -@{ - wordexp_t result; - pid_t pid; - int status, i; - - /* @r{Expand the string for the program to run.} */ - switch (wordexp (program, &result, 0)) - @{ - case 0: /* @r{Successful}. */ - break; - case WRDE_NOSPACE: - /* @r{If the error was @code{WRDE_NOSPACE},} - @r{then perhaps part of the result was allocated.} */ - wordfree (&result); - default: /* @r{Some other error.} */ - return -1; - @} - - /* @r{Expand the strings specified for the arguments.} */ - for (i = 0; options[i] != NULL; i++) - @{ - if (wordexp (options[i], &result, WRDE_APPEND)) - @{ - wordfree (&result); - return -1; - @} - @} - - pid = fork (); - if (pid == 0) - @{ - /* @r{This is the child process. Execute the command.} */ - execv (result.we_wordv[0], result.we_wordv); - exit (EXIT_FAILURE); - @} - else if (pid < 0) - /* @r{The fork failed. Report failure.} */ - status = -1; - else - /* @r{This is the parent process. Wait for the child to complete.} */ - if (waitpid (pid, &status, 0) != pid) - status = -1; - - wordfree (&result); - return status; -@} -@end smallexample - -@node Tilde Expansion -@subsection Details of Tilde Expansion - -It's a standard part of shell syntax that you can use @samp{~} at the -beginning of a file name to stand for your own home directory. You -can use @samp{~@var{user}} to stand for @var{user}'s home directory. 
- -@dfn{Tilde expansion} is the process of converting these abbreviations -to the directory names that they stand for. - -Tilde expansion applies to the @samp{~} plus all following characters up -to whitespace or a slash. It takes place only at the beginning of a -word, and only if none of the characters to be transformed is quoted in -any way. - -Plain @samp{~} uses the value of the environment variable @code{HOME} -as the proper home directory name. @samp{~} followed by a user name -uses @code{getpwnam} to look up that user in the user database, and -uses whatever directory is recorded there. Thus, @samp{~} followed -by your own name can give different results from plain @samp{~}, if -the value of @code{HOME} is not really your home directory. - -@node Variable Substitution -@subsection Details of Variable Substitution - -Part of ordinary shell syntax is the use of @samp{$@var{variable}} to -substitute the value of a shell variable into a command. This is called -@dfn{variable substitution}, and it is one part of doing word expansion. - -There are two basic ways you can write a variable reference for -substitution. The examples in this section assume that the variable -@code{foo} has the value @samp{tractor}. - -@table @code -@item $@{@var{variable}@} -If you write braces around the variable name, then it is completely -unambiguous where the variable name ends. You can concatenate -additional letters onto the end of the variable value by writing them -immediately after the close brace. For example, @samp{$@{foo@}s} -expands into @samp{tractors}. - -@item $@var{variable} -If you do not put braces around the variable name, then the variable -name consists of all the alphanumeric characters and underscores that -follow the @samp{$}. The next punctuation character ends the variable -name. Thus, @samp{$foo-bar} refers to the variable @code{foo} and expands -into @samp{tractor-bar}. 
-@end table - -When you use braces, you can also use various constructs to modify the -value that is substituted, or test it in various ways. 
- -@table @code -@item $@{@var{variable}:-@var{default}@} -Substitute the value of @var{variable}, but if that is empty or -undefined, use @var{default} instead. - -@item $@{@var{variable}:=@var{default}@} -Substitute the value of @var{variable}, but if that is empty or -undefined, use @var{default} instead and set the variable to -@var{default}. - -@item $@{@var{variable}:?@var{message}@} -If @var{variable} is defined and not empty, substitute its value. - -Otherwise, print @var{message} as an error message on the standard error -stream, and consider word expansion a failure. - -@c ??? How does wordexp report such an error? -@c WRDE_BADVAL is returned. - -@item $@{@var{variable}:+@var{replacement}@} -Substitute @var{replacement}, but only if @var{variable} is defined and -nonempty. Otherwise, substitute nothing for this construct. -@end table - -@table @code -@item $@{#@var{variable}@} -Substitute a numeral which expresses in base ten the number of -characters in the value of @var{variable}. @samp{$@{#foo@}} stands for -@samp{7}, because @samp{tractor} is seven characters. -@end table - -These variants of variable substitution let you remove part of the -variable's value before substituting it. The @var{prefix} and -@var{suffix} are not mere strings; they are wildcard patterns, just -like the patterns that you use to match multiple file names. But -in this context, they match against parts of the variable value -rather than against file names. - -@table @code -@item $@{@var{variable}%%@var{suffix}@} -Substitute the value of @var{variable}, but first discard from that -variable any portion at the end that matches the pattern @var{suffix}. - -If there is more than one alternative for how to match against -@var{suffix}, this construct uses the longest possible match. - -Thus, @samp{$@{foo%%r*@}} substitutes @samp{t}, because the largest -match for @samp{r*} at the end of @samp{tractor} is @samp{ractor}. 
- -@item $@{@var{variable}%@var{suffix}@} -Substitute the value of @var{variable}, but first discard from that -variable any portion at the end that matches the pattern @var{suffix}. - -If there is more than one alternative for how to match against -@var{suffix}, this construct uses the shortest possible alternative. - -Thus, @samp{$@{foo%r*@}} substitutes @samp{tracto}, because the shortest -match for @samp{r*} at the end of @samp{tractor} is just @samp{r}. - -@item $@{@var{variable}##@var{prefix}@} -Substitute the value of @var{variable}, but first discard from that -variable any portion at the beginning that matches the pattern @var{prefix}. - -If there is more than one alternative for how to match against -@var{prefix}, this construct uses the longest possible match. - -Thus, @samp{$@{foo##*t@}} substitutes @samp{or}, because the largest -match for @samp{*t} at the beginning of @samp{tractor} is @samp{tract}. - -@item $@{@var{variable}#@var{prefix}@} -Substitute the value of @var{variable}, but first discard from that -variable any portion at the beginning that matches the pattern @var{prefix}. - -If there is more than one alternative for how to match against -@var{prefix}, this construct uses the shortest possible alternative. - -Thus, @samp{$@{foo#*t@}} substitutes @samp{ractor}, because the shortest -match for @samp{*t} at the beginning of @samp{tractor} is just @samp{t}. - -@end table |
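
The substitution forms in this section can be tried out with a short C sketch. The helper name @code{expand_one} is hypothetical. It assumes the caller has already placed the variable in the environment, for example with @code{setenv ("foo", "tractor", 1)}, and it passes @code{WRDE_NOCMD} so that the experiment never runs a command.

```c
#include <stdio.h>
#include <wordexp.h>

/* Hypothetical helper, for illustration only.  Expand WORD and copy
   the first resulting word into OUT, which holds OUTLEN bytes.  If the
   expansion produces no words, OUT becomes the empty string.
   Returns 0 on success and -1 on any failure.  */
static int
expand_one (const char *word, char *out, size_t outlen)
{
  wordexp_t result = { 0 };
  int rc;

  if (outlen == 0)
    return -1;

  rc = wordexp (word, &result, WRDE_NOCMD);
  if (rc != 0)
    {
      /* Part of the result may have been allocated.  */
      if (rc == WRDE_NOSPACE)
        wordfree (&result);
      return -1;
    }

  /* snprintf truncates if needed and always null-terminates.  */
  snprintf (out, outlen, "%s",
            result.we_wordc > 0 ? result.we_wordv[0] : "");
  wordfree (&result);
  return 0;
}
```

With @code{foo} set to @samp{tractor}, passing @samp{$@{foo%%r*@}} (written in C source as @code{"$@{foo%%r*@}"} without the Texinfo escapes) should fill the buffer with the longest-suffix result described above, and the other forms can be checked the same way.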