diff options
Diffstat (limited to 'REORG.TODO/manual/pattern.texi')
-rw-r--r-- | REORG.TODO/manual/pattern.texi | 2311 |
1 files changed, 2311 insertions, 0 deletions
diff --git a/REORG.TODO/manual/pattern.texi b/REORG.TODO/manual/pattern.texi new file mode 100644 index 0000000000..069a6a23ea --- /dev/null +++ b/REORG.TODO/manual/pattern.texi @@ -0,0 +1,2311 @@ +@node Pattern Matching, I/O Overview, Searching and Sorting, Top +@c %MENU% Matching shell ``globs'' and regular expressions +@chapter Pattern Matching + +@Theglibc{} provides pattern matching facilities for two kinds of +patterns: regular expressions and file-name wildcards. The library also +provides a facility for expanding variable and command references and +parsing text into words in the way the shell does. + +@menu +* Wildcard Matching:: Matching a wildcard pattern against a single string. +* Globbing:: Finding the files that match a wildcard pattern. +* Regular Expressions:: Matching regular expressions against strings. +* Word Expansion:: Expanding shell variables, nested commands, + arithmetic, and wildcards. + This is what the shell does with shell commands. +@end menu + +@node Wildcard Matching +@section Wildcard Matching + +@pindex fnmatch.h +This section describes how to match a wildcard pattern against a +particular string. The result is a yes or no answer: does the +string fit the pattern or not. The symbols described here are all +declared in @file{fnmatch.h}. + +@comment fnmatch.h +@comment POSIX.2 +@deftypefun int fnmatch (const char *@var{pattern}, const char *@var{string}, int @var{flags}) +@safety{@prelim{}@mtsafe{@mtsenv{} @mtslocale{}}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} +@c fnmatch @mtsenv @mtslocale @ascuheap @acsmem +@c strnlen dup ok +@c mbsrtowcs +@c memset dup ok +@c malloc dup @ascuheap @acsmem +@c mbsinit dup ok +@c free dup @ascuheap @acsmem +@c FCT = internal_fnwmatch @mtsenv @mtslocale @ascuheap @acsmem +@c FOLD @mtslocale +@c towlower @mtslocale +@c EXT @mtsenv @mtslocale @ascuheap @acsmem +@c STRLEN = wcslen dup ok +@c getenv @mtsenv +@c malloc dup @ascuheap @acsmem +@c MEMPCPY = wmempcpy dup ok +@c FCT dup @mtsenv @mtslocale @ascuheap @acsmem +@c STRCAT = wcscat dup ok +@c free dup @ascuheap @acsmem +@c END @mtsenv +@c getenv @mtsenv +@c MEMCHR = wmemchr dup ok +@c getenv @mtsenv +@c IS_CHAR_CLASS = is_char_class @mtslocale +@c wctype @mtslocale +@c BTOWC ok +@c ISWCTYPE ok +@c auto findidx dup ok +@c elem_hash dup ok +@c memcmp dup ok +@c collseq_table_lookup dup ok +@c NO_LEADING_PERIOD ok +This function tests whether the string @var{string} matches the pattern +@var{pattern}. It returns @code{0} if they do match; otherwise, it +returns the nonzero value @code{FNM_NOMATCH}. The arguments +@var{pattern} and @var{string} are both strings. + +The argument @var{flags} is a combination of flag bits that alter the +details of matching. See below for a list of the defined flags. + +In @theglibc{}, @code{fnmatch} might sometimes report ``errors'' by +returning nonzero values that are not equal to @code{FNM_NOMATCH}. +@end deftypefun + +These are the available flags for the @var{flags} argument: + +@vtable @code +@comment fnmatch.h +@comment GNU +@item FNM_FILE_NAME +Treat the @samp{/} character specially, for matching file names. If +this flag is set, wildcard constructs in @var{pattern} cannot match +@samp{/} in @var{string}. Thus, the only way to match @samp{/} is with +an explicit @samp{/} in @var{pattern}. + +@comment fnmatch.h +@comment POSIX.2 +@item FNM_PATHNAME +This is an alias for @code{FNM_FILE_NAME}; it comes from POSIX.2. We +don't recommend this name because we don't use the term ``pathname'' for +file names. + +@comment fnmatch.h +@comment POSIX.2 +@item FNM_PERIOD +Treat the @samp{.} character specially if it appears at the beginning of +@var{string}. If this flag is set, wildcard constructs in @var{pattern} +cannot match @samp{.} as the first character of @var{string}. + +If you set both @code{FNM_PERIOD} and @code{FNM_FILE_NAME}, then the +special treatment applies to @samp{.} following @samp{/} as well as to +@samp{.} at the beginning of @var{string}. (The shell uses the +@code{FNM_PERIOD} and @code{FNM_FILE_NAME} flags together for matching +file names.) + +@comment fnmatch.h +@comment POSIX.2 +@item FNM_NOESCAPE +Don't treat the @samp{\} character specially in patterns. Normally, +@samp{\} quotes the following character, turning off its special meaning +(if any) so that it matches only itself. When quoting is enabled, the +pattern @samp{\?} matches only the string @samp{?}, because the question +mark in the pattern acts like an ordinary character. + +If you use @code{FNM_NOESCAPE}, then @samp{\} is an ordinary character. + +@comment fnmatch.h +@comment GNU +@item FNM_LEADING_DIR +Ignore a trailing sequence of characters starting with a @samp{/} in +@var{string}; that is to say, test whether @var{string} starts with a +directory name that @var{pattern} matches. + +If this flag is set, either @samp{foo*} or @samp{foobar} as a pattern +would match the string @samp{foobar/frobozz}. + +@comment fnmatch.h +@comment GNU +@item FNM_CASEFOLD +Ignore case in comparing @var{string} to @var{pattern}. + +@comment fnmatch.h +@comment GNU +@item FNM_EXTMATCH +@cindex Korn Shell +@pindex ksh +Besides the normal patterns, also recognize the extended patterns +introduced in @file{ksh}. The patterns are written in the form +explained in the following table where @var{pattern-list} is a @code{|} +separated list of patterns. + +@table @code +@item ?(@var{pattern-list}) +The pattern matches if zero or one occurrences of any of the patterns +in the @var{pattern-list} allow matching the input string. + +@item *(@var{pattern-list}) +The pattern matches if zero or more occurrences of any of the patterns +in the @var{pattern-list} allow matching the input string. + +@item +(@var{pattern-list}) +The pattern matches if one or more occurrences of any of the patterns +in the @var{pattern-list} allow matching the input string. + +@item @@(@var{pattern-list}) +The pattern matches if exactly one occurrence of any of the patterns in +the @var{pattern-list} allows matching the input string. + +@item !(@var{pattern-list}) +The pattern matches if the input string cannot be matched with any of +the patterns in the @var{pattern-list}. +@end table +@end vtable + +@node Globbing +@section Globbing + +@cindex globbing +The archetypal use of wildcards is for matching against the files in a +directory, and making a list of all the matches. This is called +@dfn{globbing}. + +You could do this using @code{fnmatch}, by reading the directory entries +one by one and testing each one with @code{fnmatch}. But that would be +slow (and complex, since you would have to handle subdirectories by +hand). + +The library provides a function @code{glob} to make this particular use +of wildcards convenient. @code{glob} and the other symbols in this +section are declared in @file{glob.h}. + +@menu +* Calling Glob:: Basic use of @code{glob}. +* Flags for Globbing:: Flags that enable various options in @code{glob}. +* More Flags for Globbing:: GNU specific extensions to @code{glob}. +@end menu + +@node Calling Glob +@subsection Calling @code{glob} + +The result of globbing is a vector of file names (strings). To return +this vector, @code{glob} uses a special data type, @code{glob_t}, which +is a structure. You pass @code{glob} the address of the structure, and +it fills in the structure's fields to tell you about the results. + +@comment glob.h +@comment POSIX.2 +@deftp {Data Type} glob_t +This data type holds a pointer to a word vector. More precisely, it +records both the address of the word vector and its size. The GNU +implementation contains some more fields which are non-standard +extensions. + +@table @code +@item gl_pathc +The number of elements in the vector, excluding the initial null entries +if the GLOB_DOOFFS flag is used (see gl_offs below). + +@item gl_pathv +The address of the vector. This field has type @w{@code{char **}}. + +@item gl_offs +The offset of the first real element of the vector, from its nominal +address in the @code{gl_pathv} field. Unlike the other fields, this +is always an input to @code{glob}, rather than an output from it. + +If you use a nonzero offset, then that many elements at the beginning of +the vector are left empty. (The @code{glob} function fills them with +null pointers.) + +The @code{gl_offs} field is meaningful only if you use the +@code{GLOB_DOOFFS} flag. Otherwise, the offset is always zero +regardless of what is in this field, and the first real element comes at +the beginning of the vector. + +@item gl_closedir +The address of an alternative implementation of the @code{closedir} +function. It is used if the @code{GLOB_ALTDIRFUNC} bit is set in +the flag parameter. The type of this field is +@w{@code{void (*) (void *)}}. + +This is a GNU extension. + +@item gl_readdir +The address of an alternative implementation of the @code{readdir} +function used to read the contents of a directory. It is used if the +@code{GLOB_ALTDIRFUNC} bit is set in the flag parameter. The type of +this field is @w{@code{struct dirent *(*) (void *)}}. + +An implementation of @code{gl_readdir} needs to initialize the following +members of the @code{struct dirent} object: + +@table @code +@item d_type +This member should be set to the file type of the entry if it is known. +Otherwise, the value @code{DT_UNKNOWN} can be used. The @code{glob} +function may use the specified file type to avoid callbacks in cases +where the file type indicates that the data is not required. + +@item d_ino +This member needs to be non-zero, otherwise @code{glob} may skip the +current entry and call the @code{gl_readdir} callback function again to +retrieve another entry. + +@item d_name +This member must be set to the name of the entry. It must be +null-terminated. +@end table + +The example below shows how to allocate a @code{struct dirent} object +containing a given name. + +@smallexample +@include mkdirent.c.texi +@end smallexample + +The @code{glob} function reads the @code{struct dirent} members listed +above and makes a copy of the file name in the @code{d_name} member +immediately after the @code{gl_readdir} callback function returns. +Future invocations of any of the callback functions may dealloacte or +reuse the buffer. It is the responsibility of the caller of the +@code{glob} function to allocate and deallocate the buffer, around the +call to @code{glob} or using the callback functions. For example, an +application could allocate the buffer in the @code{gl_readdir} callback +function, and deallocate it in the @code{gl_closedir} callback function. + +The @code{gl_readdir} member is a GNU extension. + +@item gl_opendir +The address of an alternative implementation of the @code{opendir} +function. It is used if the @code{GLOB_ALTDIRFUNC} bit is set in +the flag parameter. The type of this field is +@w{@code{void *(*) (const char *)}}. + +This is a GNU extension. + +@item gl_stat +The address of an alternative implementation of the @code{stat} function +to get information about an object in the filesystem. It is used if the +@code{GLOB_ALTDIRFUNC} bit is set in the flag parameter. The type of +this field is @w{@code{int (*) (const char *, struct stat *)}}. + +This is a GNU extension. + +@item gl_lstat +The address of an alternative implementation of the @code{lstat} +function to get information about an object in the filesystems, not +following symbolic links. It is used if the @code{GLOB_ALTDIRFUNC} bit +is set in the flag parameter. The type of this field is @code{@w{int +(*) (const char *,} @w{struct stat *)}}. + +This is a GNU extension. + +@item gl_flags +The flags used when @code{glob} was called. In addition, @code{GLOB_MAGCHAR} +might be set. See @ref{Flags for Globbing} for more details. + +This is a GNU extension. +@end table +@end deftp + +For use in the @code{glob64} function @file{glob.h} contains another +definition for a very similar type. @code{glob64_t} differs from +@code{glob_t} only in the types of the members @code{gl_readdir}, +@code{gl_stat}, and @code{gl_lstat}. + +@comment glob.h +@comment GNU +@deftp {Data Type} glob64_t +This data type holds a pointer to a word vector. More precisely, it +records both the address of the word vector and its size. The GNU +implementation contains some more fields which are non-standard +extensions. + +@table @code +@item gl_pathc +The number of elements in the vector, excluding the initial null entries +if the GLOB_DOOFFS flag is used (see gl_offs below). + +@item gl_pathv +The address of the vector. This field has type @w{@code{char **}}. + +@item gl_offs +The offset of the first real element of the vector, from its nominal +address in the @code{gl_pathv} field. Unlike the other fields, this +is always an input to @code{glob}, rather than an output from it. + +If you use a nonzero offset, then that many elements at the beginning of +the vector are left empty. (The @code{glob} function fills them with +null pointers.) + +The @code{gl_offs} field is meaningful only if you use the +@code{GLOB_DOOFFS} flag. Otherwise, the offset is always zero +regardless of what is in this field, and the first real element comes at +the beginning of the vector. + +@item gl_closedir +The address of an alternative implementation of the @code{closedir} +function. It is used if the @code{GLOB_ALTDIRFUNC} bit is set in +the flag parameter. The type of this field is +@w{@code{void (*) (void *)}}. + +This is a GNU extension. + +@item gl_readdir +The address of an alternative implementation of the @code{readdir64} +function used to read the contents of a directory. It is used if the +@code{GLOB_ALTDIRFUNC} bit is set in the flag parameter. The type of +this field is @w{@code{struct dirent64 *(*) (void *)}}. + +This is a GNU extension. + +@item gl_opendir +The address of an alternative implementation of the @code{opendir} +function. It is used if the @code{GLOB_ALTDIRFUNC} bit is set in +the flag parameter. The type of this field is +@w{@code{void *(*) (const char *)}}. + +This is a GNU extension. + +@item gl_stat +The address of an alternative implementation of the @code{stat64} function +to get information about an object in the filesystem. It is used if the +@code{GLOB_ALTDIRFUNC} bit is set in the flag parameter. The type of +this field is @w{@code{int (*) (const char *, struct stat64 *)}}. + +This is a GNU extension. + +@item gl_lstat +The address of an alternative implementation of the @code{lstat64} +function to get information about an object in the filesystems, not +following symbolic links. It is used if the @code{GLOB_ALTDIRFUNC} bit +is set in the flag parameter. The type of this field is @code{@w{int +(*) (const char *,} @w{struct stat64 *)}}. + +This is a GNU extension. + +@item gl_flags +The flags used when @code{glob} was called. In addition, @code{GLOB_MAGCHAR} +might be set. See @ref{Flags for Globbing} for more details. + +This is a GNU extension. +@end table +@end deftp + +@comment glob.h +@comment POSIX.2 +@deftypefun int glob (const char *@var{pattern}, int @var{flags}, int (*@var{errfunc}) (const char *@var{filename}, int @var{error-code}), glob_t *@var{vector-ptr}) +@safety{@prelim{}@mtunsafe{@mtasurace{:utent} @mtsenv{} @mtascusig{:ALRM} @mtascutimer{} @mtslocale{}}@asunsafe{@ascudlopen{} @ascuplugin{} @asucorrupt{} @ascuheap{} @asulock{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}} +@c glob @mtasurace:utent @mtsenv @mtascusig:ALRM @mtascutimer @mtslocale @ascudlopen @ascuplugin @asucorrupt @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem +@c strlen dup ok +@c strchr dup ok +@c malloc dup @ascuheap @acsmem +@c mempcpy dup ok +@c next_brace_sub ok +@c free dup @ascuheap @acsmem +@c globfree dup @asucorrupt @ascuheap @acucorrupt @acsmem +@c glob_pattern_p ok +@c glob_pattern_type dup ok +@c getenv dup @mtsenv +@c GET_LOGIN_NAME_MAX ok +@c getlogin_r dup @mtasurace:utent @mtascusig:ALRM @mtascutimer @mtslocale @ascudlopen @ascuplugin @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem +@c GETPW_R_SIZE_MAX ok +@c getpwnam_r dup @mtslocale @ascudlopen @ascuplugin @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem +@c realloc dup @ascuheap @acsmem +@c memcpy dup ok +@c memchr dup ok +@c *pglob->gl_stat user-supplied +@c stat64 dup ok +@c S_ISDIR dup ok +@c strdup dup @ascuheap @acsmem +@c glob_pattern_type ok +@c glob_in_dir @mtsenv @mtslocale @asucorrupt @ascuheap @acucorrupt @acsfd @acsmem +@c strlen dup ok +@c glob_pattern_type dup ok +@c malloc dup @ascuheap @acsmem +@c mempcpy dup ok +@c *pglob->gl_stat user-supplied +@c stat64 dup ok +@c free dup @ascuheap @acsmem +@c *pglob->gl_opendir user-supplied +@c opendir dup @ascuheap @acsmem @acsfd +@c dirfd dup ok +@c *pglob->gl_readdir user-supplied +@c CONVERT_DIRENT_DIRENT64 ok +@c readdir64 ok [protected by exclusive use of the stream] +@c REAL_DIR_ENTRY ok +@c DIRENT_MIGHT_BE_DIR ok +@c fnmatch dup @mtsenv @mtslocale @ascuheap @acsmem +@c DIRENT_MIGHT_BE_SYMLINK ok +@c link_exists_p ok +@c link_exists2_p ok +@c strlen dup ok +@c mempcpy dup ok +@c *pglob->gl_stat user-supplied +@c fxstatat64 dup ok +@c realloc dup @ascuheap @acsmem +@c pglob->gl_closedir user-supplied +@c closedir @ascuheap @acsmem @acsfd +@c prefix_array dup @asucorrupt @ascuheap @acucorrupt @acsmem +@c strlen dup ok +@c malloc dup @ascuheap @acsmem +@c free dup @ascuheap @acsmem +@c mempcpy dup ok +@c strcpy dup ok +The function @code{glob} does globbing using the pattern @var{pattern} +in the current directory. It puts the result in a newly allocated +vector, and stores the size and address of this vector into +@code{*@var{vector-ptr}}. The argument @var{flags} is a combination of +bit flags; see @ref{Flags for Globbing}, for details of the flags. + +The result of globbing is a sequence of file names. The function +@code{glob} allocates a string for each resulting word, then +allocates a vector of type @code{char **} to store the addresses of +these strings. The last element of the vector is a null pointer. +This vector is called the @dfn{word vector}. + +To return this vector, @code{glob} stores both its address and its +length (number of elements, not counting the terminating null pointer) +into @code{*@var{vector-ptr}}. + +Normally, @code{glob} sorts the file names alphabetically before +returning them. You can turn this off with the flag @code{GLOB_NOSORT} +if you want to get the information as fast as possible. Usually it's +a good idea to let @code{glob} sort them---if you process the files in +alphabetical order, the users will have a feel for the rate of progress +that your application is making. + +If @code{glob} succeeds, it returns 0. Otherwise, it returns one +of these error codes: + +@vtable @code +@comment glob.h +@comment POSIX.2 +@item GLOB_ABORTED +There was an error opening a directory, and you used the flag +@code{GLOB_ERR} or your specified @var{errfunc} returned a nonzero +value. +@iftex +See below +@end iftex +@ifinfo +@xref{Flags for Globbing}, +@end ifinfo +for an explanation of the @code{GLOB_ERR} flag and @var{errfunc}. + +@comment glob.h +@comment POSIX.2 +@item GLOB_NOMATCH +The pattern didn't match any existing files. If you use the +@code{GLOB_NOCHECK} flag, then you never get this error code, because +that flag tells @code{glob} to @emph{pretend} that the pattern matched +at least one file. + +@comment glob.h +@comment POSIX.2 +@item GLOB_NOSPACE +It was impossible to allocate memory to hold the result. +@end vtable + +In the event of an error, @code{glob} stores information in +@code{*@var{vector-ptr}} about all the matches it has found so far. + +It is important to notice that the @code{glob} function will not fail if +it encounters directories or files which cannot be handled without the +LFS interfaces. The implementation of @code{glob} is supposed to use +these functions internally. This at least is the assumption made by +the Unix standard. The GNU extension of allowing the user to provide their +own directory handling and @code{stat} functions complicates things a +bit. If these callback functions are used and a large file or directory +is encountered @code{glob} @emph{can} fail. +@end deftypefun + +@comment glob.h +@comment GNU +@deftypefun int glob64 (const char *@var{pattern}, int @var{flags}, int (*@var{errfunc}) (const char *@var{filename}, int @var{error-code}), glob64_t *@var{vector-ptr}) +@safety{@prelim{}@mtunsafe{@mtasurace{:utent} @mtsenv{} @mtascusig{:ALRM} @mtascutimer{} @mtslocale{}}@asunsafe{@ascudlopen{} @asucorrupt{} @ascuheap{} @asulock{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}} +@c Same code as glob, but with glob64_t #defined as glob_t. +The @code{glob64} function was added as part of the Large File Summit +extensions but is not part of the original LFS proposal. The reason for +this is simple: it is not necessary. The necessity for a @code{glob64} +function is added by the extensions of the GNU @code{glob} +implementation which allows the user to provide their own directory handling +and @code{stat} functions. The @code{readdir} and @code{stat} functions +do depend on the choice of @code{_FILE_OFFSET_BITS} since the definition +of the types @code{struct dirent} and @code{struct stat} will change +depending on the choice. + +Besides this difference, @code{glob64} works just like @code{glob} in +all aspects. + +This function is a GNU extension. +@end deftypefun + +@node Flags for Globbing +@subsection Flags for Globbing + +This section describes the standard flags that you can specify in the +@var{flags} argument to @code{glob}. Choose the flags you want, +and combine them with the C bitwise OR operator @code{|}. + +Note that there are @ref{More Flags for Globbing} available as GNU extensions. + +@vtable @code +@comment glob.h +@comment POSIX.2 +@item GLOB_APPEND +Append the words from this expansion to the vector of words produced by +previous calls to @code{glob}. This way you can effectively expand +several words as if they were concatenated with spaces between them. + +In order for appending to work, you must not modify the contents of the +word vector structure between calls to @code{glob}. And, if you set +@code{GLOB_DOOFFS} in the first call to @code{glob}, you must also +set it when you append to the results. + +Note that the pointer stored in @code{gl_pathv} may no longer be valid +after you call @code{glob} the second time, because @code{glob} might +have relocated the vector. So always fetch @code{gl_pathv} from the +@code{glob_t} structure after each @code{glob} call; @strong{never} save +the pointer across calls. + +@comment glob.h +@comment POSIX.2 +@item GLOB_DOOFFS +Leave blank slots at the beginning of the vector of words. +The @code{gl_offs} field says how many slots to leave. +The blank slots contain null pointers. + +@comment glob.h +@comment POSIX.2 +@item GLOB_ERR +Give up right away and report an error if there is any difficulty +reading the directories that must be read in order to expand @var{pattern} +fully. Such difficulties might include a directory in which you don't +have the requisite access. Normally, @code{glob} tries its best to keep +on going despite any errors, reading whatever directories it can. + +You can exercise even more control than this by specifying an +error-handler function @var{errfunc} when you call @code{glob}. If +@var{errfunc} is not a null pointer, then @code{glob} doesn't give up +right away when it can't read a directory; instead, it calls +@var{errfunc} with two arguments, like this: + +@smallexample +(*@var{errfunc}) (@var{filename}, @var{error-code}) +@end smallexample + +@noindent +The argument @var{filename} is the name of the directory that +@code{glob} couldn't open or couldn't read, and @var{error-code} is the +@code{errno} value that was reported to @code{glob}. + +If the error handler function returns nonzero, then @code{glob} gives up +right away. Otherwise, it continues. + +@comment glob.h +@comment POSIX.2 +@item GLOB_MARK +If the pattern matches the name of a directory, append @samp{/} to the +directory's name when returning it. + +@comment glob.h +@comment POSIX.2 +@item GLOB_NOCHECK +If the pattern doesn't match any file names, return the pattern itself +as if it were a file name that had been matched. (Normally, when the +pattern doesn't match anything, @code{glob} returns that there were no +matches.) + +@comment glob.h +@comment POSIX.2 +@item GLOB_NOESCAPE +Don't treat the @samp{\} character specially in patterns. Normally, +@samp{\} quotes the following character, turning off its special meaning +(if any) so that it matches only itself. When quoting is enabled, the +pattern @samp{\?} matches only the string @samp{?}, because the question +mark in the pattern acts like an ordinary character. + +If you use @code{GLOB_NOESCAPE}, then @samp{\} is an ordinary character. + +@code{glob} does its work by calling the function @code{fnmatch} +repeatedly. It handles the flag @code{GLOB_NOESCAPE} by turning on the +@code{FNM_NOESCAPE} flag in calls to @code{fnmatch}. + +@comment glob.h +@comment POSIX.2 +@item GLOB_NOSORT +Don't sort the file names; return them in no particular order. +(In practice, the order will depend on the order of the entries in +the directory.) The only reason @emph{not} to sort is to save time. +@end vtable + +@node More Flags for Globbing +@subsection More Flags for Globbing + +Beside the flags described in the last section, the GNU implementation of +@code{glob} allows a few more flags which are also defined in the +@file{glob.h} file. Some of the extensions implement functionality +which is available in modern shell implementations. + +@vtable @code +@comment glob.h +@comment GNU +@item GLOB_PERIOD +The @code{.} character (period) is treated special. It cannot be +matched by wildcards. @xref{Wildcard Matching}, @code{FNM_PERIOD}. + +@comment glob.h +@comment GNU +@item GLOB_MAGCHAR +The @code{GLOB_MAGCHAR} value is not to be given to @code{glob} in the +@var{flags} parameter. Instead, @code{glob} sets this bit in the +@var{gl_flags} element of the @var{glob_t} structure provided as the +result if the pattern used for matching contains any wildcard character. + +@comment glob.h +@comment GNU +@item GLOB_ALTDIRFUNC +Instead of using the normal functions for accessing the +filesystem the @code{glob} implementation uses the user-supplied +functions specified in the structure pointed to by @var{pglob} +parameter. For more information about the functions refer to the +sections about directory handling see @ref{Accessing Directories}, and +@ref{Reading Attributes}. + +@comment glob.h +@comment GNU +@item GLOB_BRACE +If this flag is given, the handling of braces in the pattern is changed. +It is now required that braces appear correctly grouped. I.e., for each +opening brace there must be a closing one. Braces can be used +recursively. So it is possible to define one brace expression in +another one. It is important to note that the range of each brace +expression is completely contained in the outer brace expression (if +there is one). + +The string between the matching braces is separated into single +expressions by splitting at @code{,} (comma) characters. The commas +themselves are discarded. Please note what we said above about recursive +brace expressions. The commas used to separate the subexpressions must +be at the same level. Commas in brace subexpressions are not matched. +They are used during expansion of the brace expression of the deeper +level. The example below shows this + +@smallexample +glob ("@{foo/@{,bar,biz@},baz@}", GLOB_BRACE, NULL, &result) +@end smallexample + +@noindent +is equivalent to the sequence + +@smallexample +glob ("foo/", GLOB_BRACE, NULL, &result) +glob ("foo/bar", GLOB_BRACE|GLOB_APPEND, NULL, &result) +glob ("foo/biz", GLOB_BRACE|GLOB_APPEND, NULL, &result) +glob ("baz", GLOB_BRACE|GLOB_APPEND, NULL, &result) +@end smallexample + +@noindent +if we leave aside error handling. + +@comment glob.h +@comment GNU +@item GLOB_NOMAGIC +If the pattern contains no wildcard constructs (it is a literal file name), +return it as the sole ``matching'' word, even if no file exists by that name. + +@comment glob.h +@comment GNU +@item GLOB_TILDE +If this flag is used the character @code{~} (tilde) is handled specially +if it appears at the beginning of the pattern. Instead of being taken +verbatim it is used to represent the home directory of a known user. + +If @code{~} is the only character in pattern or it is followed by a +@code{/} (slash), the home directory of the process owner is +substituted. Using @code{getlogin} and @code{getpwnam} the information +is read from the system databases. As an example take user @code{bart} +with his home directory at @file{/home/bart}. For him a call like + +@smallexample +glob ("~/bin/*", GLOB_TILDE, NULL, &result) +@end smallexample + +@noindent +would return the contents of the directory @file{/home/bart/bin}. +Instead of referring to the own home directory it is also possible to +name the home directory of other users. To do so one has to append the +user name after the tilde character. So the contents of user +@code{homer}'s @file{bin} directory can be retrieved by + +@smallexample +glob ("~homer/bin/*", GLOB_TILDE, NULL, &result) +@end smallexample + +If the user name is not valid or the home directory cannot be determined +for some reason the pattern is left untouched and itself used as the +result. I.e., if in the last example @code{home} is not available the +tilde expansion yields to @code{"~homer/bin/*"} and @code{glob} is not +looking for a directory named @code{~homer}. + +This functionality is equivalent to what is available in C-shells if the +@code{nonomatch} flag is set. + +@comment glob.h +@comment GNU +@item GLOB_TILDE_CHECK +If this flag is used @code{glob} behaves as if @code{GLOB_TILDE} is +given. The only difference is that if the user name is not available or +the home directory cannot be determined for other reasons this leads to +an error. @code{glob} will return @code{GLOB_NOMATCH} instead of using +the pattern itself as the name. + +This functionality is equivalent to what is available in C-shells if +the @code{nonomatch} flag is not set. + +@comment glob.h +@comment GNU +@item GLOB_ONLYDIR +If this flag is used the globbing function takes this as a +@strong{hint} that the caller is only interested in directories +matching the pattern. If the information about the type of the file +is easily available non-directories will be rejected but no extra +work will be done to determine the information for each file. I.e., +the caller must still be able to filter directories out. + +This functionality is only available with the GNU @code{glob} +implementation. It is mainly used internally to increase the +performance but might be useful for a user as well and therefore is +documented here. +@end vtable + +Calling @code{glob} will in most cases allocate resources which are used +to represent the result of the function call. If the same object of +type @code{glob_t} is used in multiple call to @code{glob} the resources +are freed or reused so that no leaks appear. But this does not include +the time when all @code{glob} calls are done. + +@comment glob.h +@comment POSIX.2 +@deftypefun void globfree (glob_t *@var{pglob}) +@safety{@prelim{}@mtsafe{}@asunsafe{@asucorrupt{} @ascuheap{}}@acunsafe{@acucorrupt{} @acsmem{}}} +@c globfree dup @asucorrupt @ascuheap @acucorrupt @acsmem +@c free dup @ascuheap @acsmem +The @code{globfree} function frees all resources allocated by previous +calls to @code{glob} associated with the object pointed to by +@var{pglob}. This function should be called whenever the currently used +@code{glob_t} typed object isn't used anymore. +@end deftypefun + +@comment glob.h +@comment GNU +@deftypefun void globfree64 (glob64_t *@var{pglob}) +@safety{@prelim{}@mtsafe{}@asunsafe{@asucorrupt{} @asulock{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}} +This function is equivalent to @code{globfree} but it frees records of +type @code{glob64_t} which were allocated by @code{glob64}. +@end deftypefun + + +@node Regular Expressions +@section Regular Expression Matching + +@Theglibc{} supports two interfaces for matching regular +expressions. One is the standard POSIX.2 interface, and the other is +what @theglibc{} has had for many years. + +Both interfaces are declared in the header file @file{regex.h}. +If you define @w{@code{_POSIX_C_SOURCE}}, then only the POSIX.2 +functions, structures, and constants are declared. +@c !!! we only document the POSIX.2 interface here!! + +@menu +* POSIX Regexp Compilation:: Using @code{regcomp} to prepare to match. +* Flags for POSIX Regexps:: Syntax variations for @code{regcomp}. +* Matching POSIX Regexps:: Using @code{regexec} to match the compiled + pattern that you get from @code{regcomp}. +* Regexp Subexpressions:: Finding which parts of the string were matched. +* Subexpression Complications:: Find points of which parts were matched. +* Regexp Cleanup:: Freeing storage; reporting errors. +@end menu + +@node POSIX Regexp Compilation +@subsection POSIX Regular Expression Compilation + +Before you can actually match a regular expression, you must +@dfn{compile} it. This is not true compilation---it produces a special +data structure, not machine instructions. But it is like ordinary +compilation in that its purpose is to enable you to ``execute'' the +pattern fast. (@xref{Matching POSIX Regexps}, for how to use the +compiled regular expression for matching.) + +There is a special data type for compiled regular expressions: + +@comment regex.h +@comment POSIX.2 +@deftp {Data Type} regex_t +This type of object holds a compiled regular expression. +It is actually a structure. It has just one field that your programs +should look at: + +@table @code +@item re_nsub +This field holds the number of parenthetical subexpressions in the +regular expression that was compiled. +@end table + +There are several other fields, but we don't describe them here, because +only the functions in the library should use them. +@end deftp + +After you create a @code{regex_t} object, you can compile a regular +expression into it by calling @code{regcomp}. + +@comment regex.h +@comment POSIX.2 +@deftypefun int regcomp (regex_t *restrict @var{compiled}, const char *restrict @var{pattern}, int @var{cflags}) +@safety{@prelim{}@mtsafe{@mtslocale{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsmem{} @acsfd{}}} +@c All of the issues have to do with memory allocation and multi-byte +@c character handling present in the input string, or implied by ranges +@c or inverted character classes. +@c (re_)malloc @ascuheap @acsmem +@c re_compile_internal @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd +@c (re_)realloc @ascuheap @acsmem [no @asucorrupt @acucorrupt for we zero the buffer] +@c init_dfa @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd +@c (re_)malloc @ascuheap @acsmem +@c calloc @ascuheap @acsmem +@c _NL_CURRENT ok +@c _NL_CURRENT_WORD ok +@c btowc @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd +@c libc_lock_init ok +@c re_string_construct @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd +@c re_string_construct_common ok +@c re_string_realloc_buffers @ascuheap @acsmem +@c (re_)realloc dup @ascuheap @acsmem +@c build_wcs_upper_buffer @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd +@c isascii ok +@c mbsinit ok +@c toupper ok +@c mbrtowc dup @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd +@c iswlower @mtslocale +@c towupper @mtslocale +@c wcrtomb dup @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd +@c (re_)malloc dup @ascuheap @acsmem +@c build_upper_buffer ok (@mtslocale but optimized) +@c islower ok +@c toupper ok +@c build_wcs_buffer @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd +@c mbrtowc dup @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd +@c re_string_translate_buffer ok +@c parse @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd +@c fetch_token @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd +@c peek_token @mtslocale +@c re_string_eoi ok +@c re_string_peek_byte ok +@c re_string_cur_idx ok +@c re_string_length ok +@c re_string_peek_byte_case @mtslocale +@c re_string_peek_byte dup ok +@c re_string_is_single_byte_char ok +@c isascii ok +@c re_string_peek_byte dup ok +@c re_string_wchar_at ok +@c re_string_skip_bytes ok +@c re_string_skip_bytes dup ok +@c parse_reg_exp @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd +@c parse_branch @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd +@c parse_expression @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd +@c create_token_tree dup @ascuheap @acsmem +@c re_string_eoi dup ok +@c re_string_first_byte ok +@c fetch_token dup @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd +@c create_tree dup @ascuheap @acsmem +@c parse_sub_exp @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd +@c fetch_token dup @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd +@c parse_reg_exp dup @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd +@c postorder() @ascuheap @acsmem +@c free_tree @ascuheap @acsmem +@c free_token dup @ascuheap @acsmem +@c create_tree dup @ascuheap @acsmem +@c parse_bracket_exp @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd +@c _NL_CURRENT dup ok +@c _NL_CURRENT_WORD dup ok +@c calloc dup @ascuheap @acsmem +@c (re_)free dup @ascuheap @acsmem +@c peek_token_bracket ok +@c re_string_eoi dup ok +@c re_string_peek_byte dup ok +@c re_string_first_byte dup ok +@c re_string_cur_idx dup ok +@c re_string_length dup ok +@c re_string_skip_bytes dup ok +@c bitset_set ok +@c re_string_skip_bytes ok +@c parse_bracket_element @mtslocale +@c re_string_char_size_at ok +@c re_string_wchar_at dup ok +@c re_string_skip_bytes dup ok +@c parse_bracket_symbol @mtslocale +@c re_string_eoi dup ok +@c re_string_fetch_byte_case @mtslocale +@c re_string_fetch_byte ok +@c re_string_first_byte dup ok +@c isascii ok +@c re_string_char_size_at dup ok +@c re_string_skip_bytes dup ok +@c re_string_fetch_byte dup ok +@c re_string_peek_byte dup ok +@c re_string_skip_bytes dup ok +@c peek_token_bracket dup ok +@c auto build_range_exp @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd +@c auto lookup_collation_sequence_value @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd +@c btowc dup @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd +@c collseq_table_lookup ok +@c auto seek_collating_symbol_entry dup ok +@c (re_)realloc dup @ascuheap @acsmem +@c collseq_table_lookup dup ok +@c bitset_set dup ok +@c (re_)realloc dup @ascuheap @acsmem +@c build_equiv_class @mtslocale @ascuheap @acsmem +@c _NL_CURRENT ok +@c auto findidx ok +@c bitset_set dup ok +@c (re_)realloc dup @ascuheap @acsmem +@c auto build_collating_symbol @ascuheap @acsmem +@c auto seek_collating_symbol_entry ok +@c bitset_set dup ok +@c (re_)realloc dup @ascuheap @acsmem +@c build_charclass @mtslocale @ascuheap @acsmem +@c (re_)realloc dup @ascuheap @acsmem +@c bitset_set dup ok +@c isalnum ok +@c iscntrl ok +@c isspace ok +@c isalpha ok +@c isdigit ok +@c isprint ok +@c isupper ok +@c isblank ok +@c isgraph ok +@c ispunct ok +@c isxdigit ok +@c bitset_not ok +@c bitset_mask ok +@c create_token_tree dup @ascuheap @acsmem +@c create_tree dup @ascuheap @acsmem +@c free_charset dup @ascuheap @acsmem +@c init_word_char @mtslocale +@c isalnum ok +@c build_charclass_op @mtslocale @ascuheap @acsmem +@c calloc dup @ascuheap @acsmem +@c build_charclass dup @mtslocale @ascuheap @acsmem +@c (re_)free dup @ascuheap @acsmem +@c free_charset dup @ascuheap @acsmem +@c bitset_set dup ok +@c bitset_not dup ok +@c bitset_mask dup ok +@c create_token_tree dup @ascuheap @acsmem +@c create_tree dup @ascuheap @acsmem +@c parse_dup_op @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd +@c re_string_cur_idx dup ok +@c fetch_number @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd +@c fetch_token dup @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd +@c re_string_set_index ok +@c postorder() @ascuheap @acsmem +@c free_tree dup @ascuheap @acsmem +@c mark_opt_subexp ok +@c duplicate_tree @ascuheap @acsmem +@c create_token_tree dup @ascuheap @acsmem +@c create_tree dup @ascuheap @acsmem +@c postorder() @ascuheap @acsmem +@c free_tree dup @ascuheap @acsmem +@c fetch_token dup @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd +@c parse_branch dup @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd +@c create_tree dup @ascuheap @acsmem +@c create_tree @ascuheap @acsmem +@c create_token_tree @ascuheap @acsmem +@c (re_)malloc dup @ascuheap @acsmem +@c analyze @ascuheap @acsmem +@c (re_)malloc dup @ascuheap @acsmem +@c preorder() @ascuheap @acsmem +@c optimize_subexps ok +@c calc_next ok +@c link_nfa_nodes @ascuheap @acsmem +@c re_node_set_init_1 @ascuheap @acsmem +@c (re_)malloc dup @ascuheap @acsmem +@c re_node_set_init_2 @ascuheap @acsmem +@c (re_)malloc dup @ascuheap @acsmem +@c postorder() @ascuheap @acsmem +@c lower_subexps @ascuheap @acsmem +@c lower_subexp @ascuheap @acsmem +@c create_tree dup @ascuheap @acsmem +@c calc_first @ascuheap @acsmem +@c re_dfa_add_node @ascuheap @acsmem +@c (re_)realloc dup @ascuheap @acsmem +@c re_node_set_init_empty ok +@c calc_eclosure @ascuheap @acsmem +@c calc_eclosure_iter @ascuheap @acsmem +@c re_node_set_alloc @ascuheap @acsmem +@c (re_)malloc dup @ascuheap @acsmem +@c duplicate_node_closure @ascuheap @acsmem +@c re_node_set_empty ok +@c duplicate_node @ascuheap @acsmem +@c re_dfa_add_node dup @ascuheap @acsmem +@c re_node_set_insert @ascuheap @acsmem +@c (re_)realloc dup @ascuheap @acsmem +@c search_duplicated_node ok +@c re_node_set_merge @ascuheap @acsmem +@c (re_)realloc dup @ascuheap @acsmem +@c re_node_set_free @ascuheap @acsmem +@c (re_)free dup @ascuheap @acsmem +@c re_node_set_insert dup @ascuheap @acsmem +@c re_node_set_free dup @ascuheap @acsmem +@c calc_inveclosure @ascuheap @acsmem +@c re_node_set_init_empty dup ok +@c re_node_set_insert_last @ascuheap @acsmem +@c (re_)realloc dup @ascuheap @acsmem +@c optimize_utf8 ok +@c create_initial_state @ascuheap @acsmem +@c re_node_set_init_copy @ascuheap @acsmem +@c (re_)malloc dup @ascuheap @acsmem +@c re_node_set_init_empty dup ok +@c re_node_set_contains ok +@c re_node_set_merge dup @ascuheap @acsmem +@c re_acquire_state_context @ascuheap @acsmem +@c calc_state_hash ok +@c re_node_set_compare ok +@c create_cd_newstate @ascuheap @acsmem +@c calloc dup @ascuheap @acsmem +@c re_node_set_init_copy dup @ascuheap @acsmem +@c (re_)free dup @ascuheap @acsmem +@c free_state @ascuheap @acsmem +@c re_node_set_free dup @ascuheap @acsmem +@c (re_)free dup @ascuheap @acsmem +@c NOT_SATISFY_PREV_CONSTRAINT ok +@c re_node_set_remove_at ok +@c register_state @ascuheap @acsmem +@c re_node_set_alloc dup @ascuheap @acsmem +@c re_node_set_insert_last dup @ascuheap @acsmem +@c (re_)realloc dup @ascuheap @acsmem +@c re_node_set_free dup @ascuheap @acsmem +@c free_workarea_compile @ascuheap @acsmem +@c (re_)free dup @ascuheap @acsmem +@c re_string_destruct @ascuheap @acsmem +@c (re_)free dup @ascuheap @acsmem +@c free_dfa_content @ascuheap @acsmem +@c free_token @ascuheap @acsmem +@c free_charset @ascuheap @acsmem +@c (re_)free dup @ascuheap @acsmem +@c (re_)free dup @ascuheap @acsmem +@c (re_)free dup @ascuheap @acsmem +@c re_node_set_free dup @ascuheap @acsmem +@c re_compile_fastmap @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd +@c re_compile_fastmap_iter @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd +@c re_set_fastmap ok +@c tolower ok +@c mbrtowc dup @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd +@c wcrtomb dup @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd +@c towlower @mtslocale +@c _NL_CURRENT ok +@c (re_)free @ascuheap @acsmem +The function @code{regcomp} ``compiles'' a regular expression into a +data structure that you can use with @code{regexec} to match against a +string. The compiled regular expression format is designed for +efficient matching. @code{regcomp} stores it into @code{*@var{compiled}}. + +It's up to you to allocate an object of type @code{regex_t} and pass its +address to @code{regcomp}. + +The argument @var{cflags} lets you specify various options that control +the syntax and semantics of regular expressions. @xref{Flags for POSIX +Regexps}. + +If you use the flag @code{REG_NOSUB}, then @code{regcomp} omits from +the compiled regular expression the information necessary to record +how subexpressions actually match. In this case, you might as well +pass @code{0} for the @var{matchptr} and @var{nmatch} arguments when +you call @code{regexec}. + +If you don't use @code{REG_NOSUB}, then the compiled regular expression +does have the capacity to record how subexpressions match. Also, +@code{regcomp} tells you how many subexpressions @var{pattern} has, by +storing the number in @code{@var{compiled}->re_nsub}. You can use that +value to decide how long an array to allocate to hold information about +subexpression matches. + +@code{regcomp} returns @code{0} if it succeeds in compiling the regular +expression; otherwise, it returns a nonzero error code (see the table +below). You can use @code{regerror} to produce an error message string +describing the reason for a nonzero value; see @ref{Regexp Cleanup}. + +@end deftypefun + +Here are the possible nonzero values that @code{regcomp} can return: + +@vtable @code +@comment regex.h +@comment POSIX.2 +@item REG_BADBR +There was an invalid @samp{\@{@dots{}\@}} construct in the regular +expression. A valid @samp{\@{@dots{}\@}} construct must contain either +a single number, or two numbers in increasing order separated by a +comma. + +@comment regex.h +@comment POSIX.2 +@item REG_BADPAT +There was a syntax error in the regular expression. + +@comment regex.h +@comment POSIX.2 +@item REG_BADRPT +A repetition operator such as @samp{?} or @samp{*} appeared in a bad +position (with no preceding subexpression to act on). + +@comment regex.h +@comment POSIX.2 +@item REG_ECOLLATE +The regular expression referred to an invalid collating element (one not +defined in the current locale for string collation). @xref{Locale +Categories}. + +@comment regex.h +@comment POSIX.2 +@item REG_ECTYPE +The regular expression referred to an invalid character class name. + +@comment regex.h +@comment POSIX.2 +@item REG_EESCAPE +The regular expression ended with @samp{\}. + +@comment regex.h +@comment POSIX.2 +@item REG_ESUBREG +There was an invalid number in the @samp{\@var{digit}} construct. + +@comment regex.h +@comment POSIX.2 +@item REG_EBRACK +There were unbalanced square brackets in the regular expression. + +@comment regex.h +@comment POSIX.2 +@item REG_EPAREN +An extended regular expression had unbalanced parentheses, +or a basic regular expression had unbalanced @samp{\(} and @samp{\)}. + +@comment regex.h +@comment POSIX.2 +@item REG_EBRACE +The regular expression had unbalanced @samp{\@{} and @samp{\@}}. + +@comment regex.h +@comment POSIX.2 +@item REG_ERANGE +One of the endpoints in a range expression was invalid. + +@comment regex.h +@comment POSIX.2 +@item REG_ESPACE +@code{regcomp} ran out of memory. +@end vtable + +@node Flags for POSIX Regexps +@subsection Flags for POSIX Regular Expressions + +These are the bit flags that you can use in the @var{cflags} operand when +compiling a regular expression with @code{regcomp}. + +@vtable @code +@comment regex.h +@comment POSIX.2 +@item REG_EXTENDED +Treat the pattern as an extended regular expression, rather than as a +basic regular expression. + +@comment regex.h +@comment POSIX.2 +@item REG_ICASE +Ignore case when matching letters. + +@comment regex.h +@comment POSIX.2 +@item REG_NOSUB +Don't bother storing the contents of the @var{matchptr} array. + +@comment regex.h +@comment POSIX.2 +@item REG_NEWLINE +Treat a newline in @var{string} as dividing @var{string} into multiple +lines, so that @samp{$} can match before the newline and @samp{^} can +match after. Also, don't permit @samp{.} to match a newline, and don't +permit @samp{[^@dots{}]} to match a newline. + +Otherwise, newline acts like any other ordinary character. +@end vtable + +@node Matching POSIX Regexps +@subsection Matching a Compiled POSIX Regular Expression + +Once you have compiled a regular expression, as described in @ref{POSIX +Regexp Compilation}, you can match it against strings using +@code{regexec}. A match anywhere inside the string counts as success, +unless the regular expression contains anchor characters (@samp{^} or +@samp{$}). + +@comment regex.h +@comment POSIX.2 +@deftypefun int regexec (const regex_t *restrict @var{compiled}, const char *restrict @var{string}, size_t @var{nmatch}, regmatch_t @var{matchptr}[restrict], int @var{eflags}) +@safety{@prelim{}@mtsafe{@mtslocale{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsmem{} @acsfd{}}} +@c libc_lock_lock @asulock @aculock +@c re_search_internal @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd +@c re_string_allocate @ascuheap @acsmem +@c re_string_construct_common dup ok +@c re_string_realloc_buffers dup @ascuheap @acsmem +@c match_ctx_init @ascuheap @acsmem +@c (re_)malloc dup @ascuheap @acsmem +@c re_string_byte_at ok +@c re_string_first_byte dup ok +@c check_matching @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd +@c re_string_cur_idx dup ok +@c acquire_init_state_context dup @ascuheap @acsmem +@c re_string_context_at ok +@c re_string_byte_at dup ok +@c bitset_contain ok +@c re_acquire_state_context dup @ascuheap @acsmem +@c check_subexp_matching_top @ascuheap @acsmem +@c match_ctx_add_subtop @ascuheap @acsmem +@c (re_)realloc dup @ascuheap @acsmem +@c calloc dup @ascuheap @acsmem +@c transit_state_bkref @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd +@c re_string_cur_idx dup ok +@c re_string_context_at dup ok +@c NOT_SATISFY_NEXT_CONSTRAINT ok +@c get_subexp @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd +@c re_string_get_buffer ok +@c search_cur_bkref_entry ok +@c clean_state_log_if_needed @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd +@c extend_buffers @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd +@c re_string_realloc_buffers dup @ascuheap @acsmem +@c (re_)realloc dup @ascuheap @acsmem +@c build_wcs_upper_buffer dup @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd +@c build_upper_buffer dup ok (@mtslocale but optimized) +@c build_wcs_buffer dup @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd +@c re_string_translate_buffer dup ok +@c get_subexp_sub @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd +@c check_arrival @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd +@c (re_)realloc dup @ascuheap @acsmem +@c re_string_context_at dup ok +@c re_node_set_init_1 dup @ascuheap @acsmem +@c check_arrival_expand_ecl @ascuheap @acsmem +@c re_node_set_alloc dup @ascuheap @acsmem +@c find_subexp_node ok +@c re_node_set_merge dup @ascuheap @acsmem +@c re_node_set_free dup @ascuheap @acsmem +@c check_arrival_expand_ecl_sub @ascuheap @acsmem +@c re_node_set_contains dup ok +@c re_node_set_insert dup @ascuheap @acsmem +@c re_node_set_free dup @ascuheap @acsmem +@c re_node_set_init_copy dup @ascuheap @acsmem +@c re_node_set_init_empty dup ok +@c expand_bkref_cache @ascuheap @acsmem +@c search_cur_bkref_entry dup ok +@c re_node_set_contains dup ok +@c re_node_set_init_1 dup @ascuheap @acsmem +@c check_arrival_expand_ecl dup @ascuheap @acsmem +@c re_node_set_merge dup @ascuheap @acsmem +@c re_node_set_init_copy dup @ascuheap @acsmem +@c re_node_set_insert dup @ascuheap @acsmem +@c re_node_set_free dup @ascuheap @acsmem +@c re_acquire_state @ascuheap @acsmem +@c calc_state_hash dup ok +@c re_node_set_compare dup ok +@c create_ci_newstate @ascuheap @acsmem +@c calloc dup @ascuheap @acsmem +@c re_node_set_init_copy dup @ascuheap @acsmem +@c (re_)free dup @ascuheap @acsmem +@c register_state dup @ascuheap @acsmem +@c free_state dup @ascuheap @acsmem +@c re_acquire_state_context dup @ascuheap @acsmem +@c re_node_set_merge dup @ascuheap @acsmem +@c check_arrival_add_next_nodes @mtslocale @ascuheap @acsmem +@c re_node_set_init_empty dup ok +@c check_node_accept_bytes @mtslocale @ascuheap @acsmem +@c re_string_byte_at dup ok +@c re_string_char_size_at dup ok +@c re_string_elem_size_at @mtslocale +@c _NL_CURRENT_WORD dup ok +@c _NL_CURRENT dup ok +@c auto findidx dup ok +@c _NL_CURRENT_WORD dup ok +@c _NL_CURRENT dup ok +@c collseq_table_lookup dup ok +@c find_collation_sequence_value @mtslocale +@c _NL_CURRENT_WORD dup ok +@c _NL_CURRENT dup ok +@c auto findidx dup ok +@c wcscoll @mtslocale @ascuheap @acsmem +@c re_node_set_empty dup ok +@c re_node_set_merge dup @ascuheap @acsmem +@c re_node_set_free dup @ascuheap @acsmem +@c re_node_set_insert dup @ascuheap @acsmem +@c re_acquire_state dup @ascuheap @acsmem +@c check_node_accept ok +@c re_string_byte_at dup ok +@c bitset_contain dup ok +@c re_string_context_at dup ok +@c NOT_SATISFY_NEXT_CONSTRAINT dup ok +@c match_ctx_add_entry @ascuheap @acsmem +@c (re_)realloc dup @ascuheap @acsmem +@c (re_)free dup @ascuheap @acsmem +@c clean_state_log_if_needed dup @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd +@c extend_buffers dup @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd +@c find_subexp_node dup ok +@c calloc dup @ascuheap @acsmem +@c check_arrival dup *** +@c match_ctx_add_sublast @ascuheap @acsmem +@c (re_)realloc dup @ascuheap @acsmem +@c re_acquire_state_context dup @ascuheap @acsmem +@c re_node_set_init_union @ascuheap @acsmem +@c (re_)malloc dup @ascuheap @acsmem +@c re_node_set_init_copy dup @ascuheap @acsmem +@c re_node_set_init_empty dup ok +@c re_node_set_free dup @ascuheap @acsmem +@c check_subexp_matching_top dup @ascuheap @acsmem +@c check_halt_state_context ok +@c re_string_context_at dup ok +@c check_halt_node_context ok +@c NOT_SATISFY_NEXT_CONSTRAINT dup ok +@c re_string_eoi dup ok +@c extend_buffers dup @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd +@c transit_state @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd +@c transit_state_mb @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd +@c re_string_context_at dup ok +@c NOT_SATISFY_NEXT_CONSTRAINT dup ok +@c check_node_accept_bytes dup @mtslocale @ascuheap @acsmem +@c re_string_cur_idx dup ok +@c clean_state_log_if_needed @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd +@c re_node_set_init_union dup @ascuheap @acsmem +@c re_acquire_state_context dup @ascuheap @acsmem +@c re_string_fetch_byte dup ok +@c re_string_context_at dup ok +@c build_trtable @ascuheap @acsmem +@c (re_)malloc dup @ascuheap @acsmem +@c group_nodes_into_DFAstates @ascuheap @acsmem +@c bitset_empty dup ok +@c bitset_set dup ok +@c bitset_merge dup ok +@c bitset_set_all ok +@c bitset_clear ok +@c bitset_contain dup ok +@c bitset_copy ok +@c re_node_set_init_copy dup @ascuheap @acsmem +@c re_node_set_insert dup @ascuheap @acsmem +@c re_node_set_init_1 dup @ascuheap @acsmem +@c re_node_set_free dup @ascuheap @acsmem +@c re_node_set_alloc dup @ascuheap @acsmem +@c malloc dup @ascuheap @acsmem +@c free dup @ascuheap @acsmem +@c re_node_set_free dup @ascuheap @acsmem +@c bitset_empty ok +@c re_node_set_empty dup ok +@c re_node_set_merge dup @ascuheap @acsmem +@c re_acquire_state_context dup @ascuheap @acsmem +@c bitset_merge ok +@c calloc dup @ascuheap @acsmem +@c bitset_contain dup ok +@c merge_state_with_log @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd +@c re_string_cur_idx dup ok +@c re_node_set_init_union dup @ascuheap @acsmem +@c re_string_context_at dup ok +@c re_node_set_free dup @ascuheap @acsmem +@c check_subexp_matching_top @ascuheap @acsmem +@c match_ctx_add_subtop dup @ascuheap @acsmem +@c transit_state_bkref dup @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd +@c find_recover_state +@c re_string_cur_idx dup ok +@c re_string_skip_bytes dup ok +@c merge_state_with_log dup @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd +@c check_halt_state_context dup ok +@c prune_impossible_nodes @mtslocale @ascuheap @acsmem +@c (re_)malloc dup @ascuheap @acsmem +@c sift_ctx_init ok +@c re_node_set_init_empty dup ok +@c sift_states_backward @mtslocale @ascuheap @acsmem +@c re_node_set_init_1 dup @ascuheap @acsmem +@c update_cur_sifted_state @mtslocale @ascuheap @acsmem +@c add_epsilon_src_nodes @ascuheap @acsmem +@c re_acquire_state dup @ascuheap @acsmem +@c re_node_set_alloc dup @ascuheap @acsmem +@c re_node_set_merge dup @ascuheap @acsmem +@c re_node_set_add_intersect @ascuheap @acsmem +@c (re_)realloc dup @ascuheap @acsmem +@c check_subexp_limits @ascuheap @acsmem +@c sub_epsilon_src_nodes @ascuheap @acsmem +@c re_node_set_init_empty dup ok +@c re_node_set_contains dup ok +@c re_node_set_add_intersect dup @ascuheap @acsmem +@c re_node_set_free dup @ascuheap @acsmem +@c re_node_set_remove_at dup ok +@c re_node_set_contains dup ok +@c re_acquire_state dup @ascuheap @acsmem +@c sift_states_bkref @mtslocale @ascuheap @acsmem +@c search_cur_bkref_entry dup ok +@c check_dst_limits ok +@c search_cur_bkref_entry dup ok +@c check_dst_limits_calc_pos ok +@c check_dst_limits_calc_pos_1 ok +@c re_node_set_init_copy dup @ascuheap @acsmem +@c re_node_set_insert dup @ascuheap @acsmem +@c sift_states_backward dup @mtslocale @ascuheap @acsmem +@c merge_state_array dup @ascuheap @acsmem +@c re_node_set_remove ok +@c re_node_set_contains dup ok +@c re_node_set_remove_at dup ok +@c re_node_set_free dup @ascuheap @acsmem +@c re_node_set_free dup @ascuheap @acsmem +@c re_node_set_empty dup ok +@c build_sifted_states @mtslocale @ascuheap @acsmem +@c sift_states_iter_mb @mtslocale @ascuheap @acsmem +@c check_node_accept_bytes dup @mtslocale @ascuheap @acsmem +@c check_node_accept dup ok +@c check_dst_limits dup ok +@c re_node_set_insert dup @ascuheap @acsmem +@c re_node_set_free dup @ascuheap @acsmem +@c check_halt_state_context dup ok +@c merge_state_array @ascuheap @acsmem +@c re_node_set_init_union dup @ascuheap @acsmem +@c re_acquire_state dup @ascuheap @acsmem +@c re_node_set_free dup @ascuheap @acsmem +@c (re_)free dup @ascuheap @acsmem +@c set_regs @ascuheap @acsmem +@c (re_)malloc dup @ascuheap @acsmem +@c re_node_set_init_empty dup ok +@c free_fail_stack_return @ascuheap @acsmem +@c re_node_set_free dup @ascuheap @acsmem +@c (re_)free dup @ascuheap @acsmem +@c update_regs ok +@c re_node_set_free dup @ascuheap @acsmem +@c pop_fail_stack @ascuheap @acsmem +@c re_node_set_free dup @ascuheap @acsmem +@c (re_)free dup @ascuheap @acsmem +@c (re_)free dup @ascuheap @acsmem +@c (re_)free dup @ascuheap @acsmem +@c match_ctx_free @ascuheap @acsmem +@c match_ctx_clean @ascuheap @acsmem +@c (re_)free dup @ascuheap @acsmem +@c (re_)free dup @ascuheap @acsmem +@c re_string_destruct dup @ascuheap @acsmem +@c libc_lock_unlock @aculock +This function tries to match the compiled regular expression +@code{*@var{compiled}} against @var{string}. + +@code{regexec} returns @code{0} if the regular expression matches; +otherwise, it returns a nonzero value. See the table below for +what nonzero values mean. You can use @code{regerror} to produce an +error message string describing the reason for a nonzero value; +see @ref{Regexp Cleanup}. + +The argument @var{eflags} is a word of bit flags that enable various +options. + +If you want to get information about what part of @var{string} actually +matched the regular expression or its subexpressions, use the arguments +@var{matchptr} and @var{nmatch}. Otherwise, pass @code{0} for +@var{nmatch}, and @code{NULL} for @var{matchptr}. @xref{Regexp +Subexpressions}. +@end deftypefun + +You must match the regular expression with the same set of current +locales that were in effect when you compiled the regular expression. + +The function @code{regexec} accepts the following flags in the +@var{eflags} argument: + +@vtable @code +@comment regex.h +@comment POSIX.2 +@item REG_NOTBOL +Do not regard the beginning of the specified string as the beginning of +a line; more generally, don't make any assumptions about what text might +precede it. + +@comment regex.h +@comment POSIX.2 +@item REG_NOTEOL +Do not regard the end of the specified string as the end of a line; more +generally, don't make any assumptions about what text might follow it. +@end vtable + +Here are the possible nonzero values that @code{regexec} can return: + +@vtable @code +@comment regex.h +@comment POSIX.2 +@item REG_NOMATCH +The pattern didn't match the string. This isn't really an error. + +@comment regex.h +@comment POSIX.2 +@item REG_ESPACE +@code{regexec} ran out of memory. +@end vtable + +@node Regexp Subexpressions +@subsection Match Results with Subexpressions + +When @code{regexec} matches parenthetical subexpressions of +@var{pattern}, it records which parts of @var{string} they match. It +returns that information by storing the offsets into an array whose +elements are structures of type @code{regmatch_t}. The first element of +the array (index @code{0}) records the part of the string that matched +the entire regular expression. Each other element of the array records +the beginning and end of the part that matched a single parenthetical +subexpression. + +@comment regex.h +@comment POSIX.2 +@deftp {Data Type} regmatch_t +This is the data type of the @var{matchptr} array that you pass to +@code{regexec}. It contains two structure fields, as follows: + +@table @code +@item rm_so +The offset in @var{string} of the beginning of a substring. Add this +value to @var{string} to get the address of that part. + +@item rm_eo +The offset in @var{string} of the end of the substring. +@end table +@end deftp + +@comment regex.h +@comment POSIX.2 +@deftp {Data Type} regoff_t +@code{regoff_t} is an alias for another signed integer type. +The fields of @code{regmatch_t} have type @code{regoff_t}. +@end deftp + +The @code{regmatch_t} elements correspond to subexpressions +positionally; the first element (index @code{1}) records where the first +subexpression matched, the second element records the second +subexpression, and so on. The order of the subexpressions is the order +in which they begin. + +When you call @code{regexec}, you specify how long the @var{matchptr} +array is, with the @var{nmatch} argument. This tells @code{regexec} how +many elements to store. If the actual regular expression has more than +@var{nmatch} subexpressions, then you won't get offset information about +the rest of them. But this doesn't alter whether the pattern matches a +particular string or not. + +If you don't want @code{regexec} to return any information about where +the subexpressions matched, you can either supply @code{0} for +@var{nmatch}, or use the flag @code{REG_NOSUB} when you compile the +pattern with @code{regcomp}. + +@node Subexpression Complications +@subsection Complications in Subexpression Matching + +Sometimes a subexpression matches a substring of no characters. This +happens when @samp{f\(o*\)} matches the string @samp{fum}. (It really +matches just the @samp{f}.) In this case, both of the offsets identify +the point in the string where the null substring was found. In this +example, the offsets are both @code{1}. + +Sometimes the entire regular expression can match without using some of +its subexpressions at all---for example, when @samp{ba\(na\)*} matches the +string @samp{ba}, the parenthetical subexpression is not used. When +this happens, @code{regexec} stores @code{-1} in both fields of the +element for that subexpression. + +Sometimes matching the entire regular expression can match a particular +subexpression more than once---for example, when @samp{ba\(na\)*} +matches the string @samp{bananana}, the parenthetical subexpression +matches three times. When this happens, @code{regexec} usually stores +the offsets of the last part of the string that matched the +subexpression. In the case of @samp{bananana}, these offsets are +@code{6} and @code{8}. + +But the last match is not always the one that is chosen. It's more +accurate to say that the last @emph{opportunity} to match is the one +that takes precedence. What this means is that when one subexpression +appears within another, then the results reported for the inner +subexpression reflect whatever happened on the last match of the outer +subexpression. For an example, consider @samp{\(ba\(na\)*s \)*} matching +the string @samp{bananas bas }. The last time the inner expression +actually matches is near the end of the first word. But it is +@emph{considered} again in the second word, and fails to match there. +@code{regexec} reports nonuse of the ``na'' subexpression. + +Another place where this rule applies is when the regular expression +@smallexample +\(ba\(na\)*s \|nefer\(ti\)* \)* +@end smallexample +@noindent +matches @samp{bananas nefertiti}. The ``na'' subexpression does match +in the first word, but it doesn't match in the second word because the +other alternative is used there. Once again, the second repetition of +the outer subexpression overrides the first, and within that second +repetition, the ``na'' subexpression is not used. So @code{regexec} +reports nonuse of the ``na'' subexpression. + +@node Regexp Cleanup +@subsection POSIX Regexp Matching Cleanup + +When you are finished using a compiled regular expression, you can +free the storage it uses by calling @code{regfree}. + +@comment regex.h +@comment POSIX.2 +@deftypefun void regfree (regex_t *@var{compiled}) +@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} +@c (re_)free dup @ascuheap @acsmem +@c free_dfa_content dup @ascuheap @acsmem +Calling @code{regfree} frees all the storage that @code{*@var{compiled}} +points to. This includes various internal fields of the @code{regex_t} +structure that aren't documented in this manual. + +@code{regfree} does not free the object @code{*@var{compiled}} itself. +@end deftypefun + +You should always free the space in a @code{regex_t} structure with +@code{regfree} before using the structure to compile another regular +expression. + +When @code{regcomp} or @code{regexec} reports an error, you can use +the function @code{regerror} to turn it into an error message string. + +@comment regex.h +@comment POSIX.2 +@deftypefun size_t regerror (int @var{errcode}, const regex_t *restrict @var{compiled}, char *restrict @var{buffer}, size_t @var{length}) +@safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}} +@c regerror calls gettext, strcmp and mempcpy or memcpy. +This function produces an error message string for the error code +@var{errcode}, and stores the string in @var{length} bytes of memory +starting at @var{buffer}. For the @var{compiled} argument, supply the +same compiled regular expression structure that @code{regcomp} or +@code{regexec} was working with when it got the error. Alternatively, +you can supply @code{NULL} for @var{compiled}; you will still get a +meaningful error message, but it might not be as detailed. + +If the error message can't fit in @var{length} bytes (including a +terminating null character), then @code{regerror} truncates it. +The string that @code{regerror} stores is always null-terminated +even if it has been truncated. + +The return value of @code{regerror} is the minimum length needed to +store the entire error message. If this is less than @var{length}, then +the error message was not truncated, and you can use it. Otherwise, you +should call @code{regerror} again with a larger buffer. + +Here is a function which uses @code{regerror}, but always dynamically +allocates a buffer for the error message: + +@smallexample +char *get_regerror (int errcode, regex_t *compiled) +@{ + size_t length = regerror (errcode, compiled, NULL, 0); + char *buffer = xmalloc (length); + (void) regerror (errcode, compiled, buffer, length); + return buffer; +@} +@end smallexample +@end deftypefun + +@node Word Expansion +@section Shell-Style Word Expansion +@cindex word expansion +@cindex expansion of shell words + +@dfn{Word expansion} means the process of splitting a string into +@dfn{words} and substituting for variables, commands, and wildcards +just as the shell does. + +For example, when you write @samp{ls -l foo.c}, this string is split +into three separate words---@samp{ls}, @samp{-l} and @samp{foo.c}. +This is the most basic function of word expansion. + +When you write @samp{ls *.c}, this can become many words, because +the word @samp{*.c} can be replaced with any number of file names. +This is called @dfn{wildcard expansion}, and it is also a part of +word expansion. + +When you use @samp{echo $PATH} to print your path, you are taking +advantage of @dfn{variable substitution}, which is also part of word +expansion. + +Ordinary programs can perform word expansion just like the shell by +calling the library function @code{wordexp}. + +@menu +* Expansion Stages:: What word expansion does to a string. +* Calling Wordexp:: How to call @code{wordexp}. +* Flags for Wordexp:: Options you can enable in @code{wordexp}. +* Wordexp Example:: A sample program that does word expansion. +* Tilde Expansion:: Details of how tilde expansion works. +* Variable Substitution:: Different types of variable substitution. +@end menu + +@node Expansion Stages +@subsection The Stages of Word Expansion + +When word expansion is applied to a sequence of words, it performs the +following transformations in the order shown here: + +@enumerate +@item +@cindex tilde expansion +@dfn{Tilde expansion}: Replacement of @samp{~foo} with the name of +the home directory of @samp{foo}. + +@item +Next, three different transformations are applied in the same step, +from left to right: + +@itemize @bullet +@item +@cindex variable substitution +@cindex substitution of variables and commands +@dfn{Variable substitution}: Environment variables are substituted for +references such as @samp{$foo}. + +@item +@cindex command substitution +@dfn{Command substitution}: Constructs such as @w{@samp{`cat foo`}} and +the equivalent @w{@samp{$(cat foo)}} are replaced with the output from +the inner command. + +@item +@cindex arithmetic expansion +@dfn{Arithmetic expansion}: Constructs such as @samp{$(($x-1))} are +replaced with the result of the arithmetic computation. +@end itemize + +@item +@cindex field splitting +@dfn{Field splitting}: subdivision of the text into @dfn{words}. + +@item +@cindex wildcard expansion +@dfn{Wildcard expansion}: The replacement of a construct such as @samp{*.c} +with a list of @samp{.c} file names. Wildcard expansion applies to an +entire word at a time, and replaces that word with 0 or more file names +that are themselves words. + +@item +@cindex quote removal +@cindex removal of quotes +@dfn{Quote removal}: The deletion of string-quotes, now that they have +done their job by inhibiting the above transformations when appropriate. +@end enumerate + +For the details of these transformations, and how to write the constructs +that use them, see @w{@cite{The BASH Manual}} (to appear). + +@node Calling Wordexp +@subsection Calling @code{wordexp} + +All the functions, constants and data types for word expansion are +declared in the header file @file{wordexp.h}. + +Word expansion produces a vector of words (strings). To return this +vector, @code{wordexp} uses a special data type, @code{wordexp_t}, which +is a structure. You pass @code{wordexp} the address of the structure, +and it fills in the structure's fields to tell you about the results. + +@comment wordexp.h +@comment POSIX.2 +@deftp {Data Type} {wordexp_t} +This data type holds a pointer to a word vector. More precisely, it +records both the address of the word vector and its size. + +@table @code +@item we_wordc +The number of elements in the vector. + +@item we_wordv +The address of the vector. This field has type @w{@code{char **}}. + +@item we_offs +The offset of the first real element of the vector, from its nominal +address in the @code{we_wordv} field. Unlike the other fields, this +is always an input to @code{wordexp}, rather than an output from it. + +If you use a nonzero offset, then that many elements at the beginning of +the vector are left empty. (The @code{wordexp} function fills them with +null pointers.) + +The @code{we_offs} field is meaningful only if you use the +@code{WRDE_DOOFFS} flag. Otherwise, the offset is always zero +regardless of what is in this field, and the first real element comes at +the beginning of the vector. +@end table +@end deftp + +@comment wordexp.h +@comment POSIX.2 +@deftypefun int wordexp (const char *@var{words}, wordexp_t *@var{word-vector-ptr}, int @var{flags}) +@safety{@prelim{}@mtunsafe{@mtasurace{:utent} @mtasuconst{:@mtsenv{}} @mtsenv{} @mtascusig{:ALRM} @mtascutimer{} @mtslocale{}}@asunsafe{@ascudlopen{} @ascuplugin{} @ascuintl{} @ascuheap{} @asucorrupt{} @asulock{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}} +@c wordexp @mtasurace:utent @mtasuconst:@mtsenv @mtsenv @mtascusig:ALRM @mtascutimer @mtslocale @ascudlopen @ascuplugin @ascuintl @ascuheap @asucorrupt @asulock @acucorrupt @aculock @acsfd @acsmem +@c w_newword ok +@c wordfree dup @asucorrupt @ascuheap @acucorrupt @acsmem +@c calloc dup @ascuheap @acsmem +@c getenv dup @mtsenv +@c strcpy dup ok +@c parse_backslash @ascuheap @acsmem +@c w_addchar dup @ascuheap @acsmem +@c parse_dollars @mtasuconst:@mtsenv @mtslocale @mtsenv @ascudlopen @ascuplugin @ascuintl @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem +@c w_addchar dup @ascuheap @acsmem +@c parse_arith @mtasuconst:@mtsenv @mtslocale @mtsenv @ascudlopen @ascuplugin @ascuintl @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem +@c w_newword dup ok +@c parse_dollars dup @mtasuconst:@mtsenv @mtslocale @mtsenv @ascudlopen @ascuplugin @ascuintl @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem +@c parse_backtick dup @ascuplugin @ascuheap @aculock @acsfd @acsmem +@c parse_qtd_backslash dup @ascuheap @acsmem +@c eval_expr @mtslocale +@c eval_expr_multidiv @mtslocale +@c eval_expr_val @mtslocale +@c isspace dup @mtslocale +@c eval_expr dup @mtslocale +@c isspace dup @mtslocale +@c isspace dup @mtslocale +@c free dup @ascuheap @acsmem +@c w_addchar dup @ascuheap @acsmem +@c w_addstr dup @ascuheap @acsmem +@c itoa_word dup ok +@c parse_comm @ascuplugin @ascuheap @aculock @acsfd @acsmem +@c w_newword dup ok +@c pthread_setcancelstate @ascuplugin @ascuheap @acsmem +@c (disable cancellation around exec_comm; it may do_cancel the +@c second time, if async cancel is enabled) +@c THREAD_ATOMIC_CMPXCHG_VAL dup ok +@c CANCEL_ENABLED_AND_CANCELED_AND_ASYNCHRONOUS dup ok +@c do_cancel @ascuplugin @ascuheap @acsmem +@c THREAD_ATOMIC_BIT_SET dup ok +@c pthread_unwind @ascuplugin @ascuheap @acsmem +@c Unwind_ForcedUnwind if available @ascuplugin @ascuheap @acsmem +@c libc_unwind_longjmp otherwise +@c cleanups +@c exec_comm @ascuplugin @ascuheap @aculock @acsfd @acsmem +@c pipe2 dup ok +@c pipe dup ok +@c fork dup @ascuplugin @aculock +@c close dup @acsfd +@c on child: exec_comm_child -> exec or abort +@c waitpid dup ok +@c read dup ok +@c w_addmem dup @ascuheap @acsmem +@c strchr dup ok +@c w_addword dup @ascuheap @acsmem +@c w_newword dup ok +@c w_addchar dup @ascuheap @acsmem +@c free dup @ascuheap @acsmem +@c kill dup ok +@c free dup @ascuheap @acsmem +@c parse_param @mtasuconst:@mtsenv @mtslocale @mtsenv @ascudlopen @ascuplugin @ascuintl @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem +@c reads from __libc_argc and __libc_argv without guards +@c w_newword dup ok +@c isalpha dup @mtslocale^^ +@c w_addchar dup @ascuheap @acsmem +@c isalnum dup @mtslocale^^ +@c isdigit dup @mtslocale^^ +@c strchr dup ok +@c itoa_word dup ok +@c atoi dup @mtslocale +@c getpid dup ok +@c w_addstr dup @ascuheap @acsmem +@c free dup @ascuheap @acsmem +@c strlen dup ok +@c malloc dup @ascuheap @acsmem +@c stpcpy dup ok +@c w_addword dup @ascuheap @acsmem +@c strdup dup @ascuheap @acsmem +@c getenv dup @mtsenv +@c parse_dollars dup @mtasuconst:@mtsenv @mtslocale @mtsenv @ascudlopen @ascuplugin @ascuintl @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem +@c parse_tilde dup @mtslocale @mtsenv @ascudlopen @ascuplugin @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem +@c fnmatch dup @mtsenv @mtslocale @ascuheap @acsmem +@c mempcpy dup ok +@c _ dup @ascuintl +@c fxprintf dup @aculock +@c setenv dup @mtasuconst:@mtsenv @ascuheap @asulock @acucorrupt @aculock @acsmem +@c strspn dup ok +@c strcspn dup ok +@c parse_backtick @ascuplugin @ascuheap @aculock @acsfd @acsmem +@c w_newword dup ok +@c exec_comm dup @ascuplugin @ascuheap @aculock @acsfd @acsmem +@c free dup @ascuheap @acsmem +@c parse_qtd_backslash dup @ascuheap @acsmem +@c parse_backslash dup @ascuheap @acsmem +@c w_addchar dup @ascuheap @acsmem +@c parse_dquote @mtasuconst:@mtsenv @mtslocale @mtsenv @ascudlopen @ascuplugin @ascuintl @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem +@c parse_dollars dup @mtasuconst:@mtsenv @mtslocale @mtsenv @ascudlopen @ascuplugin @ascuintl @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem +@c parse_backtick dup @ascuplugin @ascuheap @aculock @acsfd @acsmem +@c parse_qtd_backslash dup @ascuheap @acsmem +@c w_addchar dup @ascuheap @acsmem +@c w_addword dup @ascuheap @acsmem +@c strdup dup @ascuheap @acsmem +@c realloc dup @ascuheap @acsmem +@c free dup @ascuheap @acsmem +@c parse_squote dup @ascuheap @acsmem +@c w_addchar dup @ascuheap @acsmem +@c parse_tilde @mtslocale @mtsenv @ascudlopen @ascuplugin @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem +@c strchr dup ok +@c w_addchar dup @ascuheap @acsmem +@c getenv dup @mtsenv +@c w_addstr dup @ascuheap @acsmem +@c strlen dup ok +@c w_addmem dup @ascuheap @acsmem +@c realloc dup @ascuheap @acsmem +@c free dup @ascuheap @acsmem +@c mempcpy dup ok +@c getuid dup ok +@c getpwuid_r dup @mtslocale @ascudlopen @ascuplugin @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem +@c getpwnam_r dup @mtslocale @ascudlopen @ascuplugin @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem +@c parse_glob @mtasurace:utent @mtasuconst:@mtsenv @mtsenv @mtascusig:ALRM @mtascutimer @mtslocale @ascudlopen @ascuplugin @ascuintl @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem +@c strchr dup ok +@c parse_dollars dup @mtasuconst:@mtsenv @mtslocale @mtsenv @ascudlopen @ascuplugin @ascuintl @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem +@c parse_qtd_backslash @ascuheap @acsmem +@c w_addchar dup @ascuheap @acsmem +@c parse_backslash dup @ascuheap @acsmem +@c w_addchar dup @ascuheap @acsmem +@c w_addword dup @ascuheap @acsmem +@c w_newword dup ok +@c do_parse_glob @mtasurace:utent @mtsenv @mtascusig:ALRM @mtascutimer @mtslocale @ascudlopen @ascuplugin @ascuheap @asulock @aculock @acsfd @acsmem +@c glob dup @mtasurace:utent @mtsenv @mtascusig:ALRM @mtascutimer @mtslocale @ascudlopen @ascuplugin @ascuheap @asulock @aculock @acsfd @acsmem [auto glob_t avoids @asucorrupt @acucorrupt] +@c w_addstr dup @ascuheap @acsmem +@c w_addchar dup @ascuheap @acsmem +@c globfree dup @ascuheap @acsmem [auto glob_t avoids @asucorrupt @acucorrupt] +@c free dup @ascuheap @acsmem +@c w_newword dup ok +@c strdup dup @ascuheap @acsmem +@c w_addword dup @ascuheap @acsmem +@c wordfree dup @asucorrupt @ascuheap @acucorrupt @acsmem +@c strchr dup ok +@c w_addchar dup @ascuheap @acsmem +@c realloc dup @ascuheap @acsmem +@c free dup @ascuheap @acsmem +@c free dup @ascuheap @acsmem +Perform word expansion on the string @var{words}, putting the result in +a newly allocated vector, and store the size and address of this vector +into @code{*@var{word-vector-ptr}}. The argument @var{flags} is a +combination of bit flags; see @ref{Flags for Wordexp}, for details of +the flags. + +You shouldn't use any of the characters @samp{|&;<>} in the string +@var{words} unless they are quoted; likewise for newline. If you use +these characters unquoted, you will get the @code{WRDE_BADCHAR} error +code. Don't use parentheses or braces unless they are quoted or part of +a word expansion construct. If you use quotation characters @samp{'"`}, +they should come in pairs that balance. + +The results of word expansion are a sequence of words. The function +@code{wordexp} allocates a string for each resulting word, then +allocates a vector of type @code{char **} to store the addresses of +these strings. The last element of the vector is a null pointer. +This vector is called the @dfn{word vector}. + +To return this vector, @code{wordexp} stores both its address and its +length (number of elements, not counting the terminating null pointer) +into @code{*@var{word-vector-ptr}}. + +If @code{wordexp} succeeds, it returns 0. Otherwise, it returns one +of these error codes: + +@vtable @code +@comment wordexp.h +@comment POSIX.2 +@item WRDE_BADCHAR +The input string @var{words} contains an unquoted invalid character such +as @samp{|}. + +@comment wordexp.h +@comment POSIX.2 +@item WRDE_BADVAL +The input string refers to an undefined shell variable, and you used the flag +@code{WRDE_UNDEF} to forbid such references. + +@comment wordexp.h +@comment POSIX.2 +@item WRDE_CMDSUB +The input string uses command substitution, and you used the flag +@code{WRDE_NOCMD} to forbid command substitution. + +@comment wordexp.h +@comment POSIX.2 +@item WRDE_NOSPACE +It was impossible to allocate memory to hold the result. In this case, +@code{wordexp} can store part of the results---as much as it could +allocate room for. + +@comment wordexp.h +@comment POSIX.2 +@item WRDE_SYNTAX +There was a syntax error in the input string. For example, an unmatched +quoting character is a syntax error. This error code is also used to +signal division by zero and overflow in arithmetic expansion. +@end vtable +@end deftypefun + +@comment wordexp.h +@comment POSIX.2 +@deftypefun void wordfree (wordexp_t *@var{word-vector-ptr}) +@safety{@prelim{}@mtsafe{}@asunsafe{@asucorrupt{} @ascuheap{}}@acunsafe{@acucorrupt{} @acsmem{}}} +@c wordfree dup @asucorrupt @ascuheap @acucorrupt @acsmem +@c free dup @ascuheap @acsmem +Free the storage used for the word-strings and vector that +@code{*@var{word-vector-ptr}} points to. This does not free the +structure @code{*@var{word-vector-ptr}} itself---only the other +data it points to. +@end deftypefun + +@node Flags for Wordexp +@subsection Flags for Word Expansion + +This section describes the flags that you can specify in the +@var{flags} argument to @code{wordexp}. Choose the flags you want, +and combine them with the C operator @code{|}. + +@vtable @code +@comment wordexp.h +@comment POSIX.2 +@item WRDE_APPEND +Append the words from this expansion to the vector of words produced by +previous calls to @code{wordexp}. This way you can effectively expand +several words as if they were concatenated with spaces between them. + +In order for appending to work, you must not modify the contents of the +word vector structure between calls to @code{wordexp}. And, if you set +@code{WRDE_DOOFFS} in the first call to @code{wordexp}, you must also +set it when you append to the results. + +@comment wordexp.h +@comment POSIX.2 +@item WRDE_DOOFFS +Leave blank slots at the beginning of the vector of words. +The @code{we_offs} field says how many slots to leave. +The blank slots contain null pointers. + +@comment wordexp.h +@comment POSIX.2 +@item WRDE_NOCMD +Don't do command substitution; if the input requests command substitution, +report an error. + +@comment wordexp.h +@comment POSIX.2 +@item WRDE_REUSE +Reuse a word vector made by a previous call to @code{wordexp}. +Instead of allocating a new vector of words, this call to @code{wordexp} +will use the vector that already exists (making it larger if necessary). + +Note that the vector may move, so it is not safe to save an old pointer +and use it again after calling @code{wordexp}. You must fetch +@code{we_pathv} anew after each call. + +@comment wordexp.h +@comment POSIX.2 +@item WRDE_SHOWERR +Do show any error messages printed by commands run by command substitution. +More precisely, allow these commands to inherit the standard error output +stream of the current process. By default, @code{wordexp} gives these +commands a standard error stream that discards all output. + +@comment wordexp.h +@comment POSIX.2 +@item WRDE_UNDEF +If the input refers to a shell variable that is not defined, report an +error. +@end vtable + +@node Wordexp Example +@subsection @code{wordexp} Example + +Here is an example of using @code{wordexp} to expand several strings +and use the results to run a shell command. It also shows the use of +@code{WRDE_APPEND} to concatenate the expansions and of @code{wordfree} +to free the space allocated by @code{wordexp}. + +@smallexample +int +expand_and_execute (const char *program, const char **options) +@{ + wordexp_t result; + pid_t pid + int status, i; + + /* @r{Expand the string for the program to run.} */ + switch (wordexp (program, &result, 0)) + @{ + case 0: /* @r{Successful}. */ + break; + case WRDE_NOSPACE: + /* @r{If the error was @code{WRDE_NOSPACE},} + @r{then perhaps part of the result was allocated.} */ + wordfree (&result); + default: /* @r{Some other error.} */ + return -1; + @} + + /* @r{Expand the strings specified for the arguments.} */ + for (i = 0; options[i] != NULL; i++) + @{ + if (wordexp (options[i], &result, WRDE_APPEND)) + @{ + wordfree (&result); + return -1; + @} + @} + + pid = fork (); + if (pid == 0) + @{ + /* @r{This is the child process. Execute the command.} */ + execv (result.we_wordv[0], result.we_wordv); + exit (EXIT_FAILURE); + @} + else if (pid < 0) + /* @r{The fork failed. Report failure.} */ + status = -1; + else + /* @r{This is the parent process. Wait for the child to complete.} */ + if (waitpid (pid, &status, 0) != pid) + status = -1; + + wordfree (&result); + return status; +@} +@end smallexample + +@node Tilde Expansion +@subsection Details of Tilde Expansion + +It's a standard part of shell syntax that you can use @samp{~} at the +beginning of a file name to stand for your own home directory. You +can use @samp{~@var{user}} to stand for @var{user}'s home directory. + +@dfn{Tilde expansion} is the process of converting these abbreviations +to the directory names that they stand for. + +Tilde expansion applies to the @samp{~} plus all following characters up +to whitespace or a slash. It takes place only at the beginning of a +word, and only if none of the characters to be transformed is quoted in +any way. + +Plain @samp{~} uses the value of the environment variable @code{HOME} +as the proper home directory name. @samp{~} followed by a user name +uses @code{getpwname} to look up that user in the user database, and +uses whatever directory is recorded there. Thus, @samp{~} followed +by your own name can give different results from plain @samp{~}, if +the value of @code{HOME} is not really your home directory. + +@node Variable Substitution +@subsection Details of Variable Substitution + +Part of ordinary shell syntax is the use of @samp{$@var{variable}} to +substitute the value of a shell variable into a command. This is called +@dfn{variable substitution}, and it is one part of doing word expansion. + +There are two basic ways you can write a variable reference for +substitution: + +@table @code +@item $@{@var{variable}@} +If you write braces around the variable name, then it is completely +unambiguous where the variable name ends. You can concatenate +additional letters onto the end of the variable value by writing them +immediately after the close brace. For example, @samp{$@{foo@}s} +expands into @samp{tractors}. + +@item $@var{variable} +If you do not put braces around the variable name, then the variable +name consists of all the alphanumeric characters and underscores that +follow the @samp{$}. The next punctuation character ends the variable +name. Thus, @samp{$foo-bar} refers to the variable @code{foo} and expands +into @samp{tractor-bar}. +@end table + +When you use braces, you can also use various constructs to modify the +value that is substituted, or test it in various ways. + +@table @code +@item $@{@var{variable}:-@var{default}@} +Substitute the value of @var{variable}, but if that is empty or +undefined, use @var{default} instead. + +@item $@{@var{variable}:=@var{default}@} +Substitute the value of @var{variable}, but if that is empty or +undefined, use @var{default} instead and set the variable to +@var{default}. + +@item $@{@var{variable}:?@var{message}@} +If @var{variable} is defined and not empty, substitute its value. + +Otherwise, print @var{message} as an error message on the standard error +stream, and consider word expansion a failure. + +@c ??? How does wordexp report such an error? +@c WRDE_BADVAL is returned. + +@item $@{@var{variable}:+@var{replacement}@} +Substitute @var{replacement}, but only if @var{variable} is defined and +nonempty. Otherwise, substitute nothing for this construct. +@end table + +@table @code +@item $@{#@var{variable}@} +Substitute a numeral which expresses in base ten the number of +characters in the value of @var{variable}. @samp{$@{#foo@}} stands for +@samp{7}, because @samp{tractor} is seven characters. +@end table + +These variants of variable substitution let you remove part of the +variable's value before substituting it. The @var{prefix} and +@var{suffix} are not mere strings; they are wildcard patterns, just +like the patterns that you use to match multiple file names. But +in this context, they match against parts of the variable value +rather than against file names. + +@table @code +@item $@{@var{variable}%%@var{suffix}@} +Substitute the value of @var{variable}, but first discard from that +variable any portion at the end that matches the pattern @var{suffix}. + +If there is more than one alternative for how to match against +@var{suffix}, this construct uses the longest possible match. + +Thus, @samp{$@{foo%%r*@}} substitutes @samp{t}, because the largest +match for @samp{r*} at the end of @samp{tractor} is @samp{ractor}. + +@item $@{@var{variable}%@var{suffix}@} +Substitute the value of @var{variable}, but first discard from that +variable any portion at the end that matches the pattern @var{suffix}. + +If there is more than one alternative for how to match against +@var{suffix}, this construct uses the shortest possible alternative. + +Thus, @samp{$@{foo%r*@}} substitutes @samp{tracto}, because the shortest +match for @samp{r*} at the end of @samp{tractor} is just @samp{r}. + +@item $@{@var{variable}##@var{prefix}@} +Substitute the value of @var{variable}, but first discard from that +variable any portion at the beginning that matches the pattern @var{prefix}. + +If there is more than one alternative for how to match against +@var{prefix}, this construct uses the longest possible match. + +Thus, @samp{$@{foo##*t@}} substitutes @samp{or}, because the largest +match for @samp{*t} at the beginning of @samp{tractor} is @samp{tract}. + +@item $@{@var{variable}#@var{prefix}@} +Substitute the value of @var{variable}, but first discard from that +variable any portion at the beginning that matches the pattern @var{prefix}. + +If there is more than one alternative for how to match against +@var{prefix}, this construct uses the shortest possible alternative. + +Thus, @samp{$@{foo#*t@}} substitutes @samp{ractor}, because the shortest +match for @samp{*t} at the beginning of @samp{tractor} is just @samp{t}. + +@end table |