Diffstat (limited to 'manual/llio.texi')
-rw-r--r-- | manual/llio.texi | 4429 |
1 files changed, 0 insertions, 4429 deletions
diff --git a/manual/llio.texi b/manual/llio.texi deleted file mode 100644 index 8d18509d45..0000000000 --- a/manual/llio.texi +++ /dev/null @@ -1,4429 +0,0 @@ -@node Low-Level I/O, File System Interface, I/O on Streams, Top -@c %MENU% Low-level, less portable I/O -@chapter Low-Level Input/Output - -This chapter describes functions for performing low-level input/output -operations on file descriptors. These functions include the primitives -for the higher-level I/O functions described in @ref{I/O on Streams}, as -well as functions for performing low-level control operations for which -there are no equivalents on streams. - -Stream-level I/O is more flexible and usually more convenient; -therefore, programmers generally use the descriptor-level functions only -when necessary. These are some of the usual reasons: - -@itemize @bullet -@item -For reading binary files in large chunks. - -@item -For reading an entire file into core before parsing it. - -@item -To perform operations other than data transfer, which can only be done -with a descriptor. (You can use @code{fileno} to get the descriptor -corresponding to a stream.) - -@item -To pass descriptors to a child process. (The child can create its own -stream to use a descriptor that it inherits, but cannot inherit a stream -directly.) -@end itemize - -@menu -* Opening and Closing Files:: How to open and close file - descriptors. -* I/O Primitives:: Reading and writing data. -* File Position Primitive:: Setting a descriptor's file - position. -* Descriptors and Streams:: Converting descriptor to stream - or vice-versa. -* Stream/Descriptor Precautions:: Precautions needed if you use both - descriptors and streams. -* Scatter-Gather:: Fast I/O to discontinuous buffers. -* Memory-mapped I/O:: Using files like memory. -* Waiting for I/O:: How to check for input or output - on multiple file descriptors. -* Synchronizing I/O:: Making sure all I/O actions completed. -* Asynchronous I/O:: Perform I/O in parallel. 
-* Control Operations:: Various other operations on file - descriptors. -* Duplicating Descriptors:: Fcntl commands for duplicating - file descriptors. -* Descriptor Flags:: Fcntl commands for manipulating - flags associated with file - descriptors. -* File Status Flags:: Fcntl commands for manipulating - flags associated with open files. -* File Locks:: Fcntl commands for implementing - file locking. -* Open File Description Locks:: Fcntl commands for implementing - open file description locking. -* Open File Description Locks Example:: An example of open file description lock - usage -* Interrupt Input:: Getting an asynchronous signal when - input arrives. -* IOCTLs:: Generic I/O Control operations. -@end menu - - -@node Opening and Closing Files -@section Opening and Closing Files - -@cindex opening a file descriptor -@cindex closing a file descriptor -This section describes the primitives for opening and closing files -using file descriptors. The @code{open} and @code{creat} functions are -declared in the header file @file{fcntl.h}, while @code{close} is -declared in @file{unistd.h}. -@pindex unistd.h -@pindex fcntl.h - -@comment fcntl.h -@comment POSIX.1 -@deftypefun int open (const char *@var{filename}, int @var{flags}[, mode_t @var{mode}]) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{@acsfd{}}} -The @code{open} function creates and returns a new file descriptor for -the file named by @var{filename}. Initially, the file position -indicator for the file is at the beginning of the file. The argument -@var{mode} (@pxref{Permission Bits}) is used only when a file is -created, but it doesn't hurt to supply the argument in any case. - -The @var{flags} argument controls how the file is to be opened. This is -a bit mask; you create the value by the bitwise OR of the appropriate -parameters (using the @samp{|} operator in C). -@xref{File Status Flags}, for the parameters available. - -The normal return value from @code{open} is a non-negative integer file -descriptor. 
In the case of an error, a value of @math{-1} is returned -instead. In addition to the usual file name errors (@pxref{File -Name Errors}), the following @code{errno} error conditions are defined -for this function: - -@table @code -@item EACCES -The file exists but is not readable/writable as requested by the @var{flags} -argument, or the file does not exist and the directory is unwritable so -it cannot be created. - -@item EEXIST -Both @code{O_CREAT} and @code{O_EXCL} are set, and the named file already -exists. - -@item EINTR -The @code{open} operation was interrupted by a signal. -@xref{Interrupted Primitives}. - -@item EISDIR -The @var{flags} argument specified write access, and the file is a directory. - -@item EMFILE -The process has too many files open. -The maximum number of file descriptors is controlled by the -@code{RLIMIT_NOFILE} resource limit; @pxref{Limits on Resources}. - -@item ENFILE -The entire system, or perhaps the file system which contains the -directory, cannot support any additional open files at the moment. -(This problem cannot happen on @gnuhurdsystems{}.) - -@item ENOENT -The named file does not exist, and @code{O_CREAT} is not specified. - -@item ENOSPC -The directory or file system that would contain the new file cannot be -extended, because there is no disk space left. - -@item ENXIO -@code{O_NONBLOCK} and @code{O_WRONLY} are both set in the @var{flags} -argument, the file named by @var{filename} is a FIFO (@pxref{Pipes and -FIFOs}), and no process has the file open for reading. - -@item EROFS -The file resides on a read-only file system and any of @w{@code{O_WRONLY}}, -@code{O_RDWR}, and @code{O_TRUNC} are set in the @var{flags} argument, -or @code{O_CREAT} is set and the file does not already exist. -@end table - -@c !!! 
umask - -If on a 32 bit machine the sources are translated with -@code{_FILE_OFFSET_BITS == 64} the function @code{open} returns a file -descriptor opened in the large file mode which enables the file handling -functions to use files up to @twoexp{63} bytes in size and offset from -@minus{}@twoexp{63} to @twoexp{63}. This happens transparently for the user -since all of the low-level file handling functions are equally replaced. - -This function is a cancellation point in multi-threaded programs. This -is a problem if the thread allocates some resources (like memory, file -descriptors, semaphores or whatever) at the time @code{open} is -called. If the thread gets canceled these resources stay allocated -until the program ends. To avoid this calls to @code{open} should be -protected using cancellation handlers. -@c ref pthread_cleanup_push / pthread_cleanup_pop - -The @code{open} function is the underlying primitive for the @code{fopen} -and @code{freopen} functions, that create streams. -@end deftypefun - -@comment fcntl.h -@comment Unix98 -@deftypefun int open64 (const char *@var{filename}, int @var{flags}[, mode_t @var{mode}]) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{@acsfd{}}} -This function is similar to @code{open}. It returns a file descriptor -which can be used to access the file named by @var{filename}. The only -difference is that on 32 bit systems the file is opened in the -large file mode. I.e., file length and file offsets can exceed 31 bits. - -When the sources are translated with @code{_FILE_OFFSET_BITS == 64} this -function is actually available under the name @code{open}. I.e., the -new, extended API using 64 bit file sizes and offsets transparently -replaces the old API. -@end deftypefun - -@comment fcntl.h -@comment POSIX.1 -@deftypefn {Obsolete function} int creat (const char *@var{filename}, mode_t @var{mode}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{@acsfd{}}} -This function is obsolete. 
The call: - -@smallexample -creat (@var{filename}, @var{mode}) -@end smallexample - -@noindent -is equivalent to: - -@smallexample -open (@var{filename}, O_WRONLY | O_CREAT | O_TRUNC, @var{mode}) -@end smallexample - -If on a 32 bit machine the sources are translated with -@code{_FILE_OFFSET_BITS == 64} the function @code{creat} returns a file -descriptor opened in the large file mode which enables the file handling -functions to use files up to @twoexp{63} bytes in size and offset from -@minus{}@twoexp{63} to @twoexp{63}. This happens transparently for the user -since all of the low-level file handling functions are equally replaced. -@end deftypefn - -@comment fcntl.h -@comment Unix98 -@deftypefn {Obsolete function} int creat64 (const char *@var{filename}, mode_t @var{mode}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{@acsfd{}}} -This function is similar to @code{creat}. It returns a file descriptor -which can be used to access the file named by @var{filename}. The only -difference is that on 32 bit systems the file is opened in the -large file mode. I.e., file length and file offsets can exceed 31 bits. - -To use large offsets with this file descriptor, use the offset-taking -counterparts named @code{*64}, e.g., @code{lseek64} or @code{pread64}. -The @code{read} and @code{write} functions have no @code{*64} variants -and can be used as they are. - -When the sources are translated with @code{_FILE_OFFSET_BITS == 64} this -function is actually available under the name @code{creat}. I.e., the -new, extended API using 64 bit file sizes and offsets transparently -replaces the old API. -@end deftypefn - -@comment unistd.h -@comment POSIX.1 -@deftypefun int close (int @var{filedes}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{@acsfd{}}} -The function @code{close} closes the file descriptor @var{filedes}. -Closing a file has the following consequences: - -@itemize @bullet -@item -The file descriptor is deallocated. - -@item -Any record locks owned by the process on the file are unlocked. 
- -@item -When all file descriptors associated with a pipe or FIFO have been closed, -any unread data is discarded. -@end itemize - -This function is a cancellation point in multi-threaded programs. This -is a problem if the thread allocates some resources (like memory, file -descriptors, semaphores or whatever) at the time @code{close} is -called. If the thread gets canceled these resources stay allocated -until the program ends. To avoid this, calls to @code{close} should be -protected using cancellation handlers. -@c ref pthread_cleanup_push / pthread_cleanup_pop - -The normal return value from @code{close} is @math{0}; a value of @math{-1} -is returned in case of failure. The following @code{errno} error -conditions are defined for this function: - -@table @code -@item EBADF -The @var{filedes} argument is not a valid file descriptor. - -@item EINTR -The @code{close} call was interrupted by a signal. -@xref{Interrupted Primitives}. -Here is an example of how to handle @code{EINTR} properly: - -@smallexample -TEMP_FAILURE_RETRY (close (desc)); -@end smallexample - -@item ENOSPC -@itemx EIO -@itemx EDQUOT -When the file is accessed by NFS, these errors from @code{write} can sometimes -not be detected until @code{close}. @xref{I/O Primitives}, for details -on their meaning. -@end table - -Please note that there is @emph{no} separate @code{close64} function. -This is not necessary since this function does not determine nor depend -on the mode of the file. The kernel which performs the @code{close} -operation knows which mode the descriptor is used for and can handle -this situation. -@end deftypefun - -To close a stream, call @code{fclose} (@pxref{Closing Streams}) instead -of trying to close its underlying file descriptor with @code{close}. -This flushes any buffered output and updates the stream object to -indicate that it is closed. 
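As a minimal sketch of the calls described in this section, the following helper opens a file with the same flags that @code{creat} uses, then closes it and checks both results. The function name @code{create_empty_file} and the permission bits @code{0644} are assumptions made for illustration; they do not come from the manual.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: create FILENAME (or truncate it if it already
   exists) and close it again.  Returns 0 on success, -1 on failure.  */
int
create_empty_file (const char *filename)
{
  /* The same flags that creat (filename, mode) would pass to open.
     The mode 0644 is an arbitrary choice for this sketch.  */
  int fd = open (filename, O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd == -1)
    {
      perror ("open");
      return -1;
    }

  /* close can report errors that were deferred from earlier writes,
     so its return value is checked as well.  */
  if (close (fd) == -1)
    {
      perror ("close");
      return -1;
    }
  return 0;
}
```

A caller would typically test the result, for example @code{if (create_empty_file ("out.dat") == -1) ...}, and decide whether to retry or give up.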
- -@node I/O Primitives -@section Input and Output Primitives - -This section describes the functions for performing primitive input and -output operations on file descriptors: @code{read}, @code{write}, and -@code{lseek}. These functions are declared in the header file -@file{unistd.h}. -@pindex unistd.h - -@comment unistd.h -@comment POSIX.1 -@deftp {Data Type} ssize_t -This data type is used to represent the sizes of blocks that can be -read or written in a single operation. It is similar to @code{size_t}, -but must be a signed type. -@end deftp - -@cindex reading from a file descriptor -@comment unistd.h -@comment POSIX.1 -@deftypefun ssize_t read (int @var{filedes}, void *@var{buffer}, size_t @var{size}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -The @code{read} function reads up to @var{size} bytes from the file -with descriptor @var{filedes}, storing the results in the @var{buffer}. -(This is not necessarily a character string, and no terminating null -character is added.) - -@cindex end-of-file, on a file descriptor -The return value is the number of bytes actually read. This might be -less than @var{size}; for example, if there aren't that many bytes left -in the file or if there aren't that many bytes immediately available. -The exact behavior depends on what kind of file it is. Note that -reading less than @var{size} bytes is not an error. - -A value of zero indicates end-of-file (except if the value of the -@var{size} argument is also zero). This is not considered an error. -If you keep calling @code{read} while at end-of-file, it will keep -returning zero and doing nothing else. - -If @code{read} returns at least one character, there is no way you can -tell whether end-of-file was reached. But if you did reach the end, the -next read will return zero. - -In case of an error, @code{read} returns @math{-1}. 
The following -@code{errno} error conditions are defined for this function: - -@table @code -@item EAGAIN -Normally, when no input is immediately available, @code{read} waits for -some input. But if the @code{O_NONBLOCK} flag is set for the file -(@pxref{File Status Flags}), @code{read} returns immediately without -reading any data, and reports this error. - -@strong{Compatibility Note:} Most versions of BSD Unix use a different -error code for this: @code{EWOULDBLOCK}. In @theglibc{}, -@code{EWOULDBLOCK} is an alias for @code{EAGAIN}, so it doesn't matter -which name you use. - -On some systems, reading a large amount of data from a character special -file can also fail with @code{EAGAIN} if the kernel cannot find enough -physical memory to lock down the user's pages. This is limited to -devices that transfer with direct memory access into the user's memory, -which means it does not include terminals, since they always use -separate buffers inside the kernel. This problem never happens on -@gnuhurdsystems{}. - -Any condition that could result in @code{EAGAIN} can instead result in a -successful @code{read} which returns fewer bytes than requested. -Calling @code{read} again immediately would result in @code{EAGAIN}. - -@item EBADF -The @var{filedes} argument is not a valid file descriptor, -or is not open for reading. - -@item EINTR -@code{read} was interrupted by a signal while it was waiting for input. -@xref{Interrupted Primitives}. A signal will not necessarily cause -@code{read} to return @code{EINTR}; it may instead result in a -successful @code{read} which returns fewer bytes than requested. - -@item EIO -For many devices, and for disk files, this error code indicates -a hardware error. - -@code{EIO} also occurs when a background process tries to read from the -controlling terminal, and the normal action of stopping the process by -sending it a @code{SIGTTIN} signal isn't working. 
This might happen if -the signal is being blocked or ignored, or because the process group is -orphaned. @xref{Job Control}, for more information about job control, -and @ref{Signal Handling}, for information about signals. - -@item EINVAL -In some systems, when reading from a character or block device, position -and size offsets must be aligned to a particular block size. This error -indicates that the offsets were not properly aligned. -@end table - -Please note that there is no function named @code{read64}. This is not -necessary since this function does not directly modify or handle the -possibly wide file offset. Since the kernel handles this state -internally, the @code{read} function can be used for all cases. - -This function is a cancellation point in multi-threaded programs. This -is a problem if the thread allocates some resources (like memory, file -descriptors, semaphores or whatever) at the time @code{read} is -called. If the thread gets canceled these resources stay allocated -until the program ends. To avoid this, calls to @code{read} should be -protected using cancellation handlers. -@c ref pthread_cleanup_push / pthread_cleanup_pop - -The @code{read} function is the underlying primitive for all of the -functions that read from streams, such as @code{fgetc}. -@end deftypefun - -@comment unistd.h -@comment Unix98 -@deftypefun ssize_t pread (int @var{filedes}, void *@var{buffer}, size_t @var{size}, off_t @var{offset}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -@c This is usually a safe syscall. The sysdeps/posix fallback emulation -@c is not MT-Safe because it uses lseek, read and lseek back, but is it -@c used anywhere? -The @code{pread} function is similar to the @code{read} function. The -first three arguments are identical, and the return values and error -codes also correspond. - -The difference is the fourth argument and its handling. The data block -is not read from the current position of the file descriptor -@code{filedes}. 
Instead the data is read from the file starting at -position @var{offset}. The position of the file descriptor itself is -not affected by the operation. Its value is the same as before the call. - -When the source file is compiled with @code{_FILE_OFFSET_BITS == 64} the -@code{pread} function is in fact @code{pread64} and the type -@code{off_t} has 64 bits, which makes it possible to handle files up to -@twoexp{63} bytes in length. - -The return value of @code{pread} describes the number of bytes read. -In the error case it returns @math{-1} like @code{read} does and the -error codes are also the same, with these additions: - -@table @code -@item EINVAL -The value given for @var{offset} is negative and therefore invalid. - -@item ESPIPE -The file descriptor @var{filedes} is associated with a pipe or a FIFO and -this device does not allow positioning of the file pointer. -@end table - -The function is an extension defined in the Single Unix Specification -version 2. -@end deftypefun - -@comment unistd.h -@comment Unix98 -@deftypefun ssize_t pread64 (int @var{filedes}, void *@var{buffer}, size_t @var{size}, off64_t @var{offset}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -@c This is usually a safe syscall. The sysdeps/posix fallback emulation -@c is not MT-Safe because it uses lseek64, read and lseek64 back, but is -@c it used anywhere? -This function is similar to the @code{pread} function. The difference -is that the @var{offset} parameter is of type @code{off64_t} instead of -@code{off_t} which makes it possible on 32 bit machines to address -files larger than @twoexp{31} bytes and up to @twoexp{63} bytes. The -file descriptor @var{filedes} must be opened using @code{open64} since -otherwise the large offsets possible with @code{off64_t} will lead to -errors with a descriptor in small file mode. 
- -When the source file is compiled with @code{_FILE_OFFSET_BITS == 64} on a -32 bit machine this function is actually available under the name -@code{pread} and so transparently replaces the 32 bit interface. -@end deftypefun - -@cindex writing to a file descriptor -@comment unistd.h -@comment POSIX.1 -@deftypefun ssize_t write (int @var{filedes}, const void *@var{buffer}, size_t @var{size}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -@c Some say write is thread-unsafe on Linux without O_APPEND. In the VFS layer -@c the vfs_write() does no locking around the acquisition of a file offset and -@c therefore multiple threads / kernel tasks may race and get the same offset -@c resulting in data loss. -@c -@c See: -@c http://thread.gmane.org/gmane.linux.kernel/397980 -@c http://lwn.net/Articles/180387/ -@c -@c The counter argument is that POSIX only says that the write starts at the -@c file position and that the file position is updated *before* the function -@c returns. What that really means is that any expectation of atomic writes is -@c strictly an invention of the interpretation of the reader. Data loss could -@c happen if two threads start the write at the same time. Only writes that -@c come after the return of another write are guaranteed to follow the other -@c write. -@c -@c The other side of the coin is that POSIX goes on further to say in -@c "2.9.7 Thread Interactions with Regular File Operations" that threads -@c should never see interleaving sets of file operations, but it is insane -@c to do anything like that because it kills performance, so you don't get -@c those guarantees in Linux. -@c -@c So we mark it thread safe, it doesn't blow up, but you might loose -@c data, and we don't strictly meet the POSIX requirements. -@c -@c The fix for file offsets racing was merged in 3.14, the commits were: -@c 9c225f2655e36a470c4f58dbbc99244c5fc7f2d4, and -@c d7a15f8d0777955986a2ab00ab181795cab14b01. 
Therefore after Linux 3.14 you -@c should get mostly MT-safe writes. -The @code{write} function writes up to @var{size} bytes from -@var{buffer} to the file with descriptor @var{filedes}. The data in -@var{buffer} is not necessarily a character string and a null character is -output like any other character. - -The return value is the number of bytes actually written. This may be -@var{size}, but can always be smaller. Your program should always call -@code{write} in a loop, iterating until all the data is written. - -Once @code{write} returns, the data is enqueued to be written and can be -read back right away, but it is not necessarily written out to permanent -storage immediately. You can use @code{fsync} when you need to be sure -your data has been permanently stored before continuing. (It is more -efficient for the system to batch up consecutive writes and do them all -at once when convenient. Normally they will always be written to disk -within a minute or less.) Modern systems provide another function -@code{fdatasync} which guarantees integrity only for the file data and -is therefore faster. -@c !!! xref fsync, fdatasync -You can use the @code{O_FSYNC} open mode to make @code{write} always -store the data to disk before returning; @pxref{Operating Modes}. - -In the case of an error, @code{write} returns @math{-1}. The following -@code{errno} error conditions are defined for this function: - -@table @code -@item EAGAIN -Normally, @code{write} blocks until the write operation is complete. -But if the @code{O_NONBLOCK} flag is set for the file (@pxref{Control -Operations}), it returns immediately without writing any data and -reports this error. An example of a situation that might cause the -process to block on output is writing to a terminal device that supports -flow control, where output has been suspended by receipt of a STOP -character. - -@strong{Compatibility Note:} Most versions of BSD Unix use a different -error code for this: @code{EWOULDBLOCK}. 
In @theglibc{}, -@code{EWOULDBLOCK} is an alias for @code{EAGAIN}, so it doesn't matter -which name you use. - -On some systems, writing a large amount of data to a character special -file can also fail with @code{EAGAIN} if the kernel cannot find enough -physical memory to lock down the user's pages. This is limited to -devices that transfer with direct memory access from the user's memory, -which means it does not include terminals, since they always use -separate buffers inside the kernel. This problem does not arise on -@gnuhurdsystems{}. - -@item EBADF -The @var{filedes} argument is not a valid file descriptor, -or is not open for writing. - -@item EFBIG -The size of the file would become larger than the implementation can support. - -@item EINTR -The @code{write} operation was interrupted by a signal while it was -blocked waiting for completion. A signal will not necessarily cause -@code{write} to return @code{EINTR}; it may instead result in a -successful @code{write} which writes fewer bytes than requested. -@xref{Interrupted Primitives}. - -@item EIO -For many devices, and for disk files, this error code indicates -a hardware error. - -@item ENOSPC -The device containing the file is full. - -@item EPIPE -This error is returned when you try to write to a pipe or FIFO that -isn't open for reading by any process. When this happens, a @code{SIGPIPE} -signal is also sent to the process; see @ref{Signal Handling}. - -@item EINVAL -In some systems, when writing to a character or block device, position -and size offsets must be aligned to a particular block size. This error -indicates that the offsets were not properly aligned. -@end table - -Unless you have arranged to prevent @code{EINTR} failures, you should -check @code{errno} after each failing call to @code{write}, and if the -error was @code{EINTR}, you should simply repeat the call. -@xref{Interrupted Primitives}. 
The easy way to do this is with the -macro @code{TEMP_FAILURE_RETRY}, as follows: - -@smallexample -nbytes = TEMP_FAILURE_RETRY (write (desc, buffer, count)); -@end smallexample - -Please note that there is no function named @code{write64}. This is not -necessary since this function does not directly modify or handle the -possibly wide file offset. Since the kernel handles this state -internally the @code{write} function can be used for all cases. - -This function is a cancellation point in multi-threaded programs. This -is a problem if the thread allocates some resources (like memory, file -descriptors, semaphores or whatever) at the time @code{write} is -called. If the thread gets canceled these resources stay allocated -until the program ends. To avoid this, calls to @code{write} should be -protected using cancellation handlers. -@c ref pthread_cleanup_push / pthread_cleanup_pop - -The @code{write} function is the underlying primitive for all of the -functions that write to streams, such as @code{fputc}. -@end deftypefun - -@comment unistd.h -@comment Unix98 -@deftypefun ssize_t pwrite (int @var{filedes}, const void *@var{buffer}, size_t @var{size}, off_t @var{offset}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -@c This is usually a safe syscall. The sysdeps/posix fallback emulation -@c is not MT-Safe because it uses lseek, write and lseek back, but is it -@c used anywhere? -The @code{pwrite} function is similar to the @code{write} function. The -first three arguments are identical, and the return values and error codes -also correspond. - -The difference is the fourth argument and its handling. The data block -is not written to the current position of the file descriptor -@code{filedes}. Instead the data is written to the file starting at -position @var{offset}. The position of the file descriptor itself is -not affected by the operation. The value is the same as before the call. 
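The positioned write described above can be combined with the usual retry loop for short writes and @code{EINTR}. The following is a sketch only: the helper names @code{pwrite_all} and @code{patch_file} are hypothetical, and the check that the file position is still zero assumes a freshly opened descriptor without @code{O_APPEND}.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write all SIZE bytes of BUFFER to FD starting
   at OFFSET, retrying after short writes and EINTR.  Returns 0 on
   success, -1 on error with errno set by pwrite.  */
int
pwrite_all (int fd, const void *buffer, size_t size, off_t offset)
{
  const char *p = buffer;

  while (size > 0)
    {
      ssize_t n = pwrite (fd, p, size, offset);
      if (n == -1)
        {
          if (errno == EINTR)
            continue;           /* Interrupted before writing; retry.  */
          return -1;
        }
      /* A short write is not an error: advance and go around again.  */
      p += n;
      size -= (size_t) n;
      offset += n;
    }
  return 0;
}

/* Hypothetical helper: overwrite SIZE bytes at OFFSET in the existing
   file FILENAME.  Returns 0 on success, -1 on failure.  */
int
patch_file (const char *filename, const void *buffer, size_t size,
            off_t offset)
{
  int fd = open (filename, O_WRONLY);
  if (fd == -1)
    return -1;

  int result = pwrite_all (fd, buffer, size, offset);

  /* pwrite does not move the file position, so a descriptor that was
     just opened should still be at position 0.  */
  if (result == 0 && lseek (fd, 0, SEEK_CUR) != 0)
    result = -1;

  if (close (fd) == -1)
    result = -1;
  return result;
}
```

Because the offset is passed explicitly, several threads could in principle share one descriptor and call @code{pwrite_all} on disjoint regions without seeking first.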
- -However, on Linux, if a file is opened with @code{O_APPEND}, @code{pwrite} -appends data to the end of the file, regardless of the value of -@var{offset}. - -When the source file is compiled with @code{_FILE_OFFSET_BITS == 64} the -@code{pwrite} function is in fact @code{pwrite64} and the type -@code{off_t} has 64 bits, which makes it possible to handle files up to -@twoexp{63} bytes in length. - -The return value of @code{pwrite} describes the number of written bytes. -In the error case it returns @math{-1} like @code{write} does and the -error codes are also the same, with these additions: - -@table @code -@item EINVAL -The value given for @var{offset} is negative and therefore invalid. - -@item ESPIPE -The file descriptor @var{filedes} is associated with a pipe or a FIFO and -this device does not allow positioning of the file pointer. -@end table - -The function is an extension defined in the Single Unix Specification -version 2. -@end deftypefun - -@comment unistd.h -@comment Unix98 -@deftypefun ssize_t pwrite64 (int @var{filedes}, const void *@var{buffer}, size_t @var{size}, off64_t @var{offset}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -@c This is usually a safe syscall. The sysdeps/posix fallback emulation -@c is not MT-Safe because it uses lseek64, write and lseek64 back, but -@c is it used anywhere? -This function is similar to the @code{pwrite} function. The difference -is that the @var{offset} parameter is of type @code{off64_t} instead of -@code{off_t} which makes it possible on 32 bit machines to address -files larger than @twoexp{31} bytes and up to @twoexp{63} bytes. The -file descriptor @var{filedes} must be opened using @code{open64} since -otherwise the large offsets possible with @code{off64_t} will lead to -errors with a descriptor in small file mode. 
- -When the source file is compiled using @code{_FILE_OFFSET_BITS == 64} on a -32 bit machine this function is actually available under the name -@code{pwrite} and so transparently replaces the 32 bit interface. -@end deftypefun - -@comment sys/uio.h -@comment BSD -@deftypefun ssize_t preadv (int @var{fd}, const struct iovec *@var{iov}, int @var{iovcnt}, off_t @var{offset}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -@c This is a syscall for Linux 3.2 for all architectures but microblaze -@c (which was added on 3.15). The sysdeps/posix fallback emulation -@c is also MT-Safe since it calls pread, and it is now a syscall on all -@c targets. - -This function is similar to the @code{readv} function, with the difference -that it adds an extra @var{offset} parameter of type @code{off_t} similar to -@code{pread}. The data is read from the file starting at position -@var{offset}. The position of the file descriptor itself is not affected -by the operation. Its value is the same as before the call. - -When the source file is compiled with @code{_FILE_OFFSET_BITS == 64} the -@code{preadv} function is in fact @code{preadv64} and the type -@code{off_t} has 64 bits, which makes it possible to handle files up to -@twoexp{63} bytes in length. - -The return value is a count of bytes (@emph{not} buffers) read, @math{0} -indicating end-of-file, or @math{-1} indicating an error. The possible -errors are the same as in @code{readv} and @code{pread}. -@end deftypefun - -@comment unistd.h -@comment BSD -@deftypefun ssize_t preadv64 (int @var{fd}, const struct iovec *@var{iov}, int @var{iovcnt}, off64_t @var{offset}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -@c This is a syscall for Linux 3.2 for all architectures but microblaze -@c (which was added on 3.15). The sysdeps/posix fallback emulation -@c is also MT-Safe since it calls pread64, and it is now a syscall on all -@c targets. 
- -This function is similar to the @code{preadv} function, with the difference -that the @var{offset} parameter is of type @code{off64_t} instead of -@code{off_t}. It makes it possible on 32 bit machines to address -files larger than @twoexp{31} bytes and up to @twoexp{63} bytes. The -file descriptor @var{fd} must be opened using @code{open64} since -otherwise the large offsets possible with @code{off64_t} will lead to -errors with a descriptor in small file mode. - -When the source file is compiled using @code{_FILE_OFFSET_BITS == 64} on a -32 bit machine this function is actually available under the name -@code{preadv} and so transparently replaces the 32 bit interface. -@end deftypefun - -@comment sys/uio.h -@comment BSD -@deftypefun ssize_t pwritev (int @var{fd}, const struct iovec *@var{iov}, int @var{iovcnt}, off_t @var{offset}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -@c This is a syscall for Linux 3.2 for all architectures but microblaze -@c (which was added on 3.15). The sysdeps/posix fallback emulation -@c is also MT-Safe since it calls pwrite, and it is now a syscall on all -@c targets. - -This function is similar to the @code{writev} function, with the difference -that it adds an extra @var{offset} parameter of type @code{off_t} similar to -@code{pwrite}. The data is written to the file starting at position -@var{offset}. The position of the file descriptor itself is not affected -by the operation. Its value is the same as before the call. - -However, on Linux, if a file is opened with @code{O_APPEND}, @code{pwritev} -appends data to the end of the file, regardless of the value of -@var{offset}. - -When the source file is compiled with @code{_FILE_OFFSET_BITS == 64} the -@code{pwritev} function is in fact @code{pwritev64} and the type -@code{off_t} has 64 bits, which makes it possible to handle files up to -@twoexp{63} bytes in length. 
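To illustrate the gather write at an explicit offset, here is a small sketch that stores two separate strings back to back with a single @code{pwritev} call. The helper name @code{write_record_at}, the two-part record layout, and the mode @code{0644} are assumptions for this example only. A short count is returned to the caller unchanged rather than retried.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: store HEADER immediately followed by BODY at
   OFFSET in FILENAME, using one gather write.  Returns the number of
   bytes written, or -1 on failure.  A short count is possible and is
   left for the caller to handle.  */
ssize_t
write_record_at (const char *filename, const char *header,
                 const char *body, off_t offset)
{
  struct iovec iov[2];

  /* iov_base is not const-qualified, hence the casts; pwritev only
     reads from these buffers.  */
  iov[0].iov_base = (void *) header;
  iov[0].iov_len = strlen (header);
  iov[1].iov_base = (void *) body;
  iov[1].iov_len = strlen (body);

  int fd = open (filename, O_WRONLY | O_CREAT, 0644);
  if (fd == -1)
    return -1;

  /* Both buffers are written starting at OFFSET; the file position
     of FD is left alone.  */
  ssize_t written = pwritev (fd, iov, 2, offset);

  if (close (fd) == -1)
    return -1;
  return written;
}
```

Using one call for both buffers avoids copying them into a temporary contiguous buffer first, which is the main reason to prefer the vector form over two @code{pwrite} calls.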
- -The return value is a count of bytes (@emph{not} buffers) written, or @math{-1} -indicating an error. The possible -errors are the same as in @code{writev} and @code{pwrite}. -@end deftypefun - -@comment unistd.h -@comment BSD -@deftypefun ssize_t pwritev64 (int @var{fd}, const struct iovec *@var{iov}, int @var{iovcnt}, off64_t @var{offset}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -@c This is a syscall for Linux 3.2 for all architectures but microblaze -@c (which was added on 3.15). The sysdeps/posix fallback emulation -@c is also MT-Safe since it calls pwrite64, and it is now a syscall on all -@c targets. - -This function is similar to the @code{pwritev} function, with the difference -that the @var{offset} parameter is of type @code{off64_t} instead of -@code{off_t}. It makes it possible on 32 bit machines to address -files larger than @twoexp{31} bytes and up to @twoexp{63} bytes. The -file descriptor @var{fd} must be opened using @code{open64} since -otherwise the large offsets possible with @code{off64_t} will lead to -errors with a descriptor in small file mode. - -When the source file is compiled using @code{_FILE_OFFSET_BITS == 64} on a -32 bit machine this function is actually available under the name -@code{pwritev} and so transparently replaces the 32 bit interface. -@end deftypefun - -@comment sys/uio.h -@comment GNU -@deftypefun ssize_t preadv2 (int @var{fd}, const struct iovec *@var{iov}, int @var{iovcnt}, off_t @var{offset}, int @var{flags}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -@c This is a syscall for Linux v4.6. The sysdeps/posix fallback emulation -@c is also MT-Safe since it calls preadv. - -This function is similar to the @code{preadv} function, with the difference -that it adds an extra @var{flags} parameter of type @code{int}. The supported -@var{flags} depend on the underlying system. For Linux it supports: - -@vtable @code -@item RWF_HIPRI -High priority request.
This adds a flag that tells the file system that -this is a high priority request for which it is worth polling the hardware. -The flag is purely advisory and can be ignored if not supported. The -@var{fd} must be opened using @code{O_DIRECT}. - -@item RWF_DSYNC -Per-IO synchronization as if the file was opened with @code{O_DSYNC} flag. - -@item RWF_SYNC -Per-IO synchronization as if the file was opened with @code{O_SYNC} flag. -@end vtable - -When the source file is compiled with @code{_FILE_OFFSET_BITS == 64} the -@code{preadv2} function is in fact @code{preadv64v2} and the type -@code{off_t} has 64 bits, which makes it possible to handle files up to -@twoexp{63} bytes in length. - -The return value is a count of bytes (@emph{not} buffers) read, @math{0} -indicating end-of-file, or @math{-1} indicating an error. The possible -errors are the same as in @code{preadv} with the addition of: - -@table @code - -@item EOPNOTSUPP - -@c The default sysdeps/posix code will return it for any flags value -@c different than 0. -An unsupported @var{flags} was used. - -@end table - -@end deftypefun - -@comment unistd.h -@comment GNU -@deftypefun ssize_t preadv64v2 (int @var{fd}, const struct iovec *@var{iov}, int @var{iovcnt}, off64_t @var{offset}, int @var{flags}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -@c This is a syscall for Linux v4.6. The sysdeps/posix fallback emulation -@c is also MT-Safe since it calls preadv. - -This function is similar to the @code{preadv2} function, with the difference -that the @var{offset} parameter is of type @code{off64_t} instead of -@code{off_t}. It makes it possible on 32 bit machines to address -files larger than @twoexp{31} bytes and up to @twoexp{63} bytes. The -file descriptor @var{fd} must be opened using @code{open64} since -otherwise the large offsets possible with @code{off64_t} will lead to -errors with a descriptor in small file mode.
- -When the source file is compiled using @code{_FILE_OFFSET_BITS == 64} on a -32 bit machine this function is actually available under the name -@code{preadv2} and so transparently replaces the 32 bit interface. -@end deftypefun - - -@comment sys/uio.h -@comment GNU -@deftypefun ssize_t pwritev2 (int @var{fd}, const struct iovec *@var{iov}, int @var{iovcnt}, off_t @var{offset}, int @var{flags}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -@c This is a syscall for Linux v4.6. The sysdeps/posix fallback emulation -@c is also MT-Safe since it calls pwritev. - -This function is similar to the @code{pwritev} function, with the difference -that it adds an extra @var{flags} parameter of type @code{int}. The supported -@var{flags} depend on the underlying system; for Linux it supports -the same ones as for @code{preadv2}. - -When the source file is compiled with @code{_FILE_OFFSET_BITS == 64} the -@code{pwritev2} function is in fact @code{pwritev64v2} and the type -@code{off_t} has 64 bits, which makes it possible to handle files up to -@twoexp{63} bytes in length. - -The return value is a count of bytes (@emph{not} buffers) written, or @math{-1} -indicating an error. The possible -errors are the same as in @code{preadv2}. -@end deftypefun - -@comment unistd.h -@comment GNU -@deftypefun ssize_t pwritev64v2 (int @var{fd}, const struct iovec *@var{iov}, int @var{iovcnt}, off64_t @var{offset}, int @var{flags}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -@c This is a syscall for Linux v4.6. The sysdeps/posix fallback emulation -@c is also MT-Safe since it calls pwritev. - -This function is similar to the @code{pwritev2} function, with the difference -that the @var{offset} parameter is of type @code{off64_t} instead of -@code{off_t}. It makes it possible on 32 bit machines to address -files larger than @twoexp{31} bytes and up to @twoexp{63} bytes.
The -file descriptor @var{fd} must be opened using @code{open64} since -otherwise the large offsets possible with @code{off64_t} will lead to -errors with a descriptor in small file mode. - -When the source file is compiled using @code{_FILE_OFFSET_BITS == 64} on a -32 bit machine this function is actually available under the name -@code{pwritev2} and so transparently replaces the 32 bit interface. -@end deftypefun - - -@node File Position Primitive -@section Setting the File Position of a Descriptor - -Just as you can set the file position of a stream with @code{fseek}, you -can set the file position of a descriptor with @code{lseek}. This -specifies the position in the file for the next @code{read} or -@code{write} operation. @xref{File Positioning}, for more information -on the file position and what it means. - -To read the current file position value from a descriptor, use -@code{lseek (@var{desc}, 0, SEEK_CUR)}. - -@cindex file positioning on a file descriptor -@cindex positioning a file descriptor -@cindex seeking on a file descriptor -@comment unistd.h -@comment POSIX.1 -@deftypefun off_t lseek (int @var{filedes}, off_t @var{offset}, int @var{whence}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -The @code{lseek} function is used to change the file position of the -file with descriptor @var{filedes}. - -The @var{whence} argument specifies how the @var{offset} should be -interpreted, in the same way as for the @code{fseek} function, and it must -be one of the symbolic constants @code{SEEK_SET}, @code{SEEK_CUR}, or -@code{SEEK_END}. - -@vtable @code -@item SEEK_SET -Specifies that @var{offset} is a count of characters from the beginning -of the file. - -@item SEEK_CUR -Specifies that @var{offset} is a count of characters from the current -file position. This count may be positive or negative. - -@item SEEK_END -Specifies that @var{offset} is a count of characters from the end of -the file.
A negative count specifies a position within the current -extent of the file; a positive count specifies a position past the -current end. If you set the position past the current end, and -actually write data, you will extend the file with zeros up to that -position. -@end vtable - -The return value from @code{lseek} is normally the resulting file -position, measured in bytes from the beginning of the file. -You can use this feature together with @code{SEEK_CUR} to read the -current file position. - -If you want to append to the file, setting the file position to the -current end of file with @code{SEEK_END} is not sufficient. Another -process may write more data after you seek but before you write, -extending the file so that your write at the old end clobbers their data. -Instead, use the @code{O_APPEND} operating mode; @pxref{Operating Modes}. - -You can set the file position past the current end of the file. This -does not by itself make the file longer; @code{lseek} never changes the -file. But subsequent output at that position will extend the file. -Characters between the previous end of file and the new position are -filled with zeros. Extending the file in this way can create a -``hole'': the blocks of zeros are not actually allocated on disk, so the -file takes up less space than it appears to; it is then called a -``sparse file''. -@cindex sparse files -@cindex holes in files - -If the file position cannot be changed, or the operation is in some way -invalid, @code{lseek} returns a value of @math{-1}. The following -@code{errno} error conditions are defined for this function: - -@table @code -@item EBADF -The @var{filedes} is not a valid file descriptor. - -@item EINVAL -The @var{whence} argument value is not valid, or the resulting -file offset is not valid (for example, because it would be negative). - -@item ESPIPE -The @var{filedes} corresponds to an object that cannot be positioned, -such as a pipe, FIFO or terminal device.
(POSIX.1 specifies this error -only for pipes and FIFOs, but on @gnusystems{}, you always get -@code{ESPIPE} if the object is not seekable.) -@end table - -When the source file is compiled with @code{_FILE_OFFSET_BITS == 64} the -@code{lseek} function is in fact @code{lseek64} and the type -@code{off_t} has 64 bits, which makes it possible to handle files up to -@twoexp{63} bytes in length. - -This function is a cancellation point in multi-threaded programs. This -is a problem if the thread allocates some resources (such as memory, file -descriptors, or semaphores) at the time @code{lseek} is -called. If the thread gets canceled, these resources stay allocated -until the program ends. To avoid this, calls to @code{lseek} should be -protected using cancellation handlers. -@c ref pthread_cleanup_push / pthread_cleanup_pop - -The @code{lseek} function is the underlying primitive for the -@code{fseek}, @code{fseeko}, @code{ftell}, @code{ftello} and -@code{rewind} functions, which operate on streams instead of file -descriptors. -@end deftypefun - -@comment unistd.h -@comment Unix98 -@deftypefun off64_t lseek64 (int @var{filedes}, off64_t @var{offset}, int @var{whence}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -This function is similar to the @code{lseek} function. The difference -is that the @var{offset} parameter is of type @code{off64_t} instead of -@code{off_t}, which makes it possible on 32 bit machines to address -files larger than @twoexp{31} bytes and up to @twoexp{63} bytes. The -file descriptor @var{filedes} must be opened using @code{open64} since -otherwise the large offsets possible with @code{off64_t} will lead to -errors with a descriptor in small file mode. - -When the source file is compiled with @code{_FILE_OFFSET_BITS == 64} on a -32 bit machine this function is actually available under the name -@code{lseek} and so transparently replaces the 32 bit interface.
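
The two uses of @code{lseek} described earlier, reading the current position and seeking past the end of the file, can be sketched roughly as follows. This is only an illustration: the helper names @code{current_position} and @code{write_past_end} are hypothetical and not part of any library, and a real program would want more detailed error reporting.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: return the current file position of FD,
   or -1 on error.  A zero offset with SEEK_CUR moves nothing.  */
off_t
current_position (int fd)
{
  return lseek (fd, 0, SEEK_CUR);
}

/* Hypothetical helper: seek GAP bytes past the current end of FD and
   write LEN bytes from DATA there.  The skipped range reads back as
   zeros and may be stored as a hole.  Return the resulting file
   position, or -1 on error.  */
off_t
write_past_end (int fd, off_t gap, const void *data, size_t len)
{
  if (lseek (fd, gap, SEEK_END) == (off_t) -1)
    return -1;
  if (write (fd, data, len) != (ssize_t) len)
    return -1;
  return current_position (fd);
}
```

Whether the skipped range is really left unallocated depends on the file system; the sketch relies only on it reading back as zeros.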
-@end deftypefun - -You can have multiple descriptors for the same file if you open the file -more than once, or if you duplicate a descriptor with @code{dup}. -Descriptors that come from separate calls to @code{open} have independent -file positions; using @code{lseek} on one descriptor has no effect on the -other. For example, - -@smallexample -@group -@{ - int d1, d2; - char buf[4]; - d1 = open ("foo", O_RDONLY); - d2 = open ("foo", O_RDONLY); - lseek (d1, 1024, SEEK_SET); - read (d2, buf, 4); -@} -@end group -@end smallexample - -@noindent -will read the first four characters of the file @file{foo}. (The -error-checking code necessary for a real program has been omitted here -for brevity.) - -By contrast, descriptors made by duplication share a common file -position with the original descriptor that was duplicated. Anything -which alters the file position of one of the duplicates, including -reading or writing data, affects all of them alike. Thus, for example, - -@smallexample -@{ - int d1, d2, d3; - char buf1[4], buf2[4]; - d1 = open ("foo", O_RDONLY); - d2 = dup (d1); - d3 = dup (d2); - lseek (d3, 1024, SEEK_SET); - read (d1, buf1, 4); - read (d2, buf2, 4); -@} -@end smallexample - -@noindent -will read four characters starting with the 1024'th character of -@file{foo}, and then four more characters starting with the 1028'th -character. - -@comment sys/types.h -@comment POSIX.1 -@deftp {Data Type} off_t -This is a signed integer type used to represent file sizes. In -@theglibc{}, this type is no narrower than @code{int}. - -If the source is compiled with @code{_FILE_OFFSET_BITS == 64} this type -is transparently replaced by @code{off64_t}. -@end deftp - -@comment sys/types.h -@comment Unix98 -@deftp {Data Type} off64_t -This type is used similarly to @code{off_t}. The difference is that even -on 32 bit machines, where the @code{off_t} type would have 32 bits, -@code{off64_t} has 64 bits and so is able to address files up to -@twoexp{63} bytes in length.
- -When compiling with @code{_FILE_OFFSET_BITS == 64} this type is -available under the name @code{off_t}. -@end deftp - -These aliases for the @samp{SEEK_@dots{}} constants exist for the sake -of compatibility with older BSD systems. They are defined in two -different header files: @file{fcntl.h} and @file{sys/file.h}. - -@vtable @code -@item L_SET -An alias for @code{SEEK_SET}. - -@item L_INCR -An alias for @code{SEEK_CUR}. - -@item L_XTND -An alias for @code{SEEK_END}. -@end vtable - -@node Descriptors and Streams -@section Descriptors and Streams -@cindex streams, and file descriptors -@cindex converting file descriptor to stream -@cindex extracting file descriptor from stream - -Given an open file descriptor, you can create a stream for it with the -@code{fdopen} function. You can get the underlying file descriptor for -an existing stream with the @code{fileno} function. These functions are -declared in the header file @file{stdio.h}. -@pindex stdio.h - -@comment stdio.h -@comment POSIX.1 -@deftypefun {FILE *} fdopen (int @var{filedes}, const char *@var{opentype}) -@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{} @asulock{}}@acunsafe{@acsmem{} @aculock{}}} -The @code{fdopen} function returns a new stream for the file descriptor -@var{filedes}. - -The @var{opentype} argument is interpreted in the same way as for the -@code{fopen} function (@pxref{Opening Streams}), except that -the @samp{b} option is not permitted; this is because @gnusystems{} make no -distinction between text and binary files. Also, @code{"w"} and -@code{"w+"} do not cause truncation of the file; these have an effect only -when opening a file, and in this case the file has already been opened. -You must make sure that the @var{opentype} argument matches the actual -mode of the open file descriptor. - -The return value is the new stream. 
If the stream cannot be created -(for example, if the modes for the file indicated by the file descriptor -do not permit the access specified by the @var{opentype} argument), a -null pointer is returned instead. - -In some other systems, @code{fdopen} may fail to detect that the modes -for file descriptors do not permit the access specified by -@code{opentype}. @Theglibc{} always checks for this. -@end deftypefun - -For an example showing the use of the @code{fdopen} function, -see @ref{Creating a Pipe}. - -@comment stdio.h -@comment POSIX.1 -@deftypefun int fileno (FILE *@var{stream}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -This function returns the file descriptor associated with the stream -@var{stream}. If an error is detected (for example, if the @var{stream} -is not valid) or if @var{stream} does not do I/O to a file, -@code{fileno} returns @math{-1}. -@end deftypefun - -@comment stdio.h -@comment GNU -@deftypefun int fileno_unlocked (FILE *@var{stream}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -The @code{fileno_unlocked} function is equivalent to the @code{fileno} -function except that it does not implicitly lock the stream if the state -is @code{FSETLOCKING_INTERNAL}. - -This function is a GNU extension. -@end deftypefun - -@cindex standard file descriptors -@cindex file descriptors, standard -There are also symbolic constants defined in @file{unistd.h} for the -file descriptors belonging to the standard streams @code{stdin}, -@code{stdout}, and @code{stderr}; see @ref{Standard Streams}. -@pindex unistd.h - -@vtable @code -@comment unistd.h -@comment POSIX.1 -@item STDIN_FILENO -This macro has value @code{0}, which is the file descriptor for -standard input. -@cindex standard input file descriptor - -@comment unistd.h -@comment POSIX.1 -@item STDOUT_FILENO -This macro has value @code{1}, which is the file descriptor for -standard output. 
-@cindex standard output file descriptor - -@comment unistd.h -@comment POSIX.1 -@item STDERR_FILENO -This macro has value @code{2}, which is the file descriptor for -standard error output. -@end vtable -@cindex standard error file descriptor - -@node Stream/Descriptor Precautions -@section Dangers of Mixing Streams and Descriptors -@cindex channels -@cindex streams and descriptors -@cindex descriptors and streams -@cindex mixing descriptors and streams - -You can have multiple file descriptors and streams (let's call both -streams and descriptors ``channels'' for short) connected to the same -file, but you must take care to avoid confusion between channels. There -are two cases to consider: @dfn{linked} channels that share a single -file position value, and @dfn{independent} channels that have their own -file positions. - -It's best to use just one channel in your program for actual data -transfer to any given file, except when all the access is for input. -For example, if you open a pipe (something you can only do at the file -descriptor level), either do all I/O with the descriptor, or construct a -stream from the descriptor with @code{fdopen} and then do all I/O with -the stream. - -@menu -* Linked Channels:: Dealing with channels sharing a file position. -* Independent Channels:: Dealing with separately opened, unlinked channels. -* Cleaning Streams:: Cleaning a stream makes it safe to use - another channel. -@end menu - -@node Linked Channels -@subsection Linked Channels -@cindex linked channels - -Channels that come from a single opening share the same file position; -we call them @dfn{linked} channels. Linked channels result when you -make a stream from a descriptor using @code{fdopen}, when you get a -descriptor from a stream with @code{fileno}, when you copy a descriptor -with @code{dup} or @code{dup2}, and when descriptors are inherited -during @code{fork}. 
For files that don't support random access, such as -terminals and pipes, @emph{all} channels are effectively linked. On -random-access files, all append-type output streams are effectively -linked to each other. - -@cindex cleaning up a stream -If you have been using a stream for I/O (or have just opened the stream), -and you want to do I/O using -another channel (either a stream or a descriptor) that is linked to it, -you must first @dfn{clean up} the stream that you have been using. -@xref{Cleaning Streams}. - -Terminating a process, or executing a new program in the process, -destroys all the streams in the process. If descriptors linked to these -streams persist in other processes, their file positions become -undefined as a result. To prevent this, you must clean up the streams -before destroying them. - -@node Independent Channels -@subsection Independent Channels -@cindex independent channels - -When you open channels (streams or descriptors) separately on a seekable -file, each channel has its own file position. These are called -@dfn{independent channels}. - -The system handles each channel independently. Most of the time, this -is quite predictable and natural (especially for input): each channel -can read or write sequentially at its own place in the file. However, -if some of the channels are streams, you must take these precautions: - -@itemize @bullet -@item -You should clean an output stream after use, before doing anything else -that might read or write from the same part of the file. - -@item -You should clean an input stream before reading data that may have been -modified using an independent channel. Otherwise, you might read -obsolete data that had been in the stream's buffer. -@end itemize - -If you do output to one channel at the end of the file, this will -certainly leave the other independent channels positioned somewhere -before the new end. 
You cannot reliably set their file positions to the -new end of file before writing, because the file can always be extended -by another process between when you set the file position and when you -write the data. Instead, use an append-type descriptor or stream; they -always output at the current end of the file. In order to make the -end-of-file position accurate, you must clean the output channel you -were using, if it is a stream. - -It's impossible for two channels to have separate file pointers for a -file that doesn't support random access. Thus, channels for reading or -writing such files are always linked, never independent. Append-type -channels are also always linked. For these channels, follow the rules -for linked channels; see @ref{Linked Channels}. - -@node Cleaning Streams -@subsection Cleaning Streams - -You can use @code{fflush} to clean a stream in most -cases. - -You can skip the @code{fflush} if you know the stream -is already clean. A stream is clean whenever its buffer is empty. For -example, an unbuffered stream is always clean. An input stream that is -at end-of-file is clean. A line-buffered stream is clean when the last -character output was a newline. However, a just-opened input stream -might not be clean, as its input buffer might not be empty. - -There is one case in which cleaning a stream is impossible on most -systems. This is when the stream is doing input from a file that is not -random-access. Such streams typically read ahead, and when the file is -not random access, there is no way to give back the excess data already -read. When an input stream reads from a random-access file, -@code{fflush} does clean the stream, but leaves the file pointer at an -unpredictable place; you must set the file pointer before doing any -further I/O. - -Closing an output-only stream also does @code{fflush}, so this is a -valid way of cleaning an output stream. 
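
As a rough sketch of the rule above, the following writes part of a message through a stream and the rest through the descriptor linked to it, cleaning the stream with @code{fflush} in between. The function name @code{write_mixed} is hypothetical and chosen only for this illustration.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write FIRST through STREAM and SECOND through
   the descriptor underlying STREAM.  Return 0 on success, -1 on
   error.  */
int
write_mixed (FILE *stream, const char *first, const char *second)
{
  size_t len = strlen (second);

  if (fputs (first, stream) == EOF)
    return -1;

  /* Clean the stream before switching to the linked descriptor, so
     that FIRST reaches the file ahead of SECOND.  */
  if (fflush (stream) == EOF)
    return -1;

  if (write (fileno (stream), second, len) != (ssize_t) len)
    return -1;
  return 0;
}
```

Without the @code{fflush} call, the text in @var{first} could still be sitting in the stream's buffer when the descriptor write happens, and would then land in the file after @var{second}.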
- -You need not clean a stream before using its descriptor for control -operations such as setting terminal modes; these operations don't affect -the file position and are not affected by it. You can use any -descriptor for these operations, and all channels are affected -simultaneously. However, text already ``output'' to a stream but still -buffered by the stream will be subject to the new terminal modes when -subsequently flushed. To make sure ``past'' output is covered by the -terminal settings that were in effect at the time, flush the output -streams for that terminal before setting the modes. @xref{Terminal -Modes}. - -@node Scatter-Gather -@section Fast Scatter-Gather I/O -@cindex scatter-gather - -Some applications may need to read or write data to multiple buffers, -which are separated in memory. Although this can be done easily enough -with multiple calls to @code{read} and @code{write}, it is inefficient -because there is overhead associated with each kernel call. - -Instead, many platforms provide special high-speed primitives to perform -these @dfn{scatter-gather} operations in a single kernel call. @Theglibc{} -will provide an emulation on any system that lacks these -primitives, so they are not a portability threat. They are defined in -@code{sys/uio.h}. - -These functions are controlled with arrays of @code{iovec} structures, -which describe the location and size of each buffer. - -@comment sys/uio.h -@comment BSD -@deftp {Data Type} {struct iovec} - -The @code{iovec} structure describes a buffer. It contains two fields: - -@table @code - -@item void *iov_base -Contains the address of a buffer. - -@item size_t iov_len -Contains the length of the buffer. 
- -@end table -@end deftp - -@comment sys/uio.h -@comment BSD -@deftypefun ssize_t readv (int @var{filedes}, const struct iovec *@var{vector}, int @var{count}) -@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} -@c The fallback sysdeps/posix implementation, used even on GNU/Linux -@c with old kernels that lack a full readv/writev implementation, may -@c malloc the buffer into which data is read, if the total read size is -@c too large for alloca. - -The @code{readv} function reads data from @var{filedes} and scatters it -into the buffers described in @var{vector}, which is taken to be -@var{count} structures long. As each buffer is filled, data is sent to the -next. - -Note that @code{readv} is not guaranteed to fill all the buffers. -It may stop at any point, for the same reasons @code{read} would. - -The return value is a count of bytes (@emph{not} buffers) read, @math{0} -indicating end-of-file, or @math{-1} indicating an error. The possible -errors are the same as in @code{read}. - -@end deftypefun - -@comment sys/uio.h -@comment BSD -@deftypefun ssize_t writev (int @var{filedes}, const struct iovec *@var{vector}, int @var{count}) -@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} -@c The fallback sysdeps/posix implementation, used even on GNU/Linux -@c with old kernels that lack a full readv/writev implementation, may -@c malloc the buffer from which data is written, if the total write size -@c is too large for alloca. - -The @code{writev} function gathers data from the buffers described in -@var{vector}, which is taken to be @var{count} structures long, and writes -them to @code{filedes}. As each buffer is written, it moves on to the -next. - -Like @code{readv}, @code{writev} may stop midstream under the same -conditions @code{write} would. - -The return value is a count of bytes written, or @math{-1} indicating an -error. The possible errors are the same as in @code{write}. 
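
A minimal sketch of a gathered write, assuming two separately stored strings should go out in one call. The function name @code{write_record} is hypothetical; a robust caller would also handle a short count, since @code{writev} may stop midstream.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: write HEADER followed by BODY to FD with a
   single writev call.  Return the number of bytes written, or -1 on
   error.  */
ssize_t
write_record (int fd, const char *header, const char *body)
{
  struct iovec vec[2];

  /* iov_base is not const-qualified, so the casts are needed even
     though writev only reads from the buffers.  */
  vec[0].iov_base = (void *) header;
  vec[0].iov_len = strlen (header);
  vec[1].iov_base = (void *) body;
  vec[1].iov_len = strlen (body);

  return writev (fd, vec, 2);
}
```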
- -@end deftypefun - -@c Note - I haven't read this anywhere. I surmised it from my knowledge -@c of computer science. Thus, there could be subtleties I'm missing. - -Note that if the buffers are small (under about 1kB), high-level streams -may be easier to use than these functions. However, @code{readv} and -@code{writev} are more efficient when the individual buffers themselves -(as opposed to the total output) are large. In that case, a high-level -stream would not be able to cache the data efficiently. - -@node Memory-mapped I/O -@section Memory-mapped I/O - -On modern operating systems, it is possible to @dfn{mmap} (pronounced -``em-map'') a file to a region of memory. When this is done, the file can -be accessed just like an array in the program. - -This is more efficient than @code{read} or @code{write}, as only the regions -of the file that a program actually accesses are loaded. Accesses to -not-yet-loaded parts of the mmapped region are handled in the same way as -swapped out pages. - -Since mmapped pages can be stored back to their file when physical -memory is low, it is possible to mmap files orders of magnitude larger -than both the physical memory @emph{and} swap space. The only limit is -address space. The theoretical limit is 4GB on a 32-bit machine - -however, the actual limit will be smaller since some areas will be -reserved for other purposes. If the LFS interface is used the file size -on 32-bit systems is not limited to 2GB (offsets are signed which -reduces the addressable area of 4GB by half); the full 64 bits are -available. - -Memory mapping only works on entire pages of memory. Thus, addresses -for mapping must be page-aligned, and length values will be rounded up. -To determine the page size the machine uses, one should use - -@vindex _SC_PAGESIZE -@smallexample -size_t page_size = (size_t) sysconf (_SC_PAGESIZE); -@end smallexample - -@noindent -These functions are declared in @file{sys/mman.h}.
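
To illustrate the idea of treating a mapped file like an array, here is a small sketch that counts how often a byte value occurs in a file. It uses @code{mmap} and @code{munmap} with a private read-only mapping; the function name @code{count_byte_in_file} is hypothetical, and the empty-file shortcut is an assumption made because a zero-length mapping is not useful here.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: return how many bytes of the file PATH are
   equal to TARGET, or -1 on error.  */
long
count_byte_in_file (const char *path, unsigned char target)
{
  struct stat st;
  long count = 0;
  int fd = open (path, O_RDONLY);

  if (fd < 0)
    return -1;
  if (fstat (fd, &st) < 0)
    {
      close (fd);
      return -1;
    }
  if (st.st_size == 0)
    {
      /* Nothing to map in an empty file.  */
      close (fd);
      return 0;
    }

  unsigned char *data = mmap (NULL, st.st_size, PROT_READ, MAP_PRIVATE,
                              fd, 0);
  /* The mapping stays valid after the descriptor is closed.  */
  close (fd);
  if (data == MAP_FAILED)
    return -1;

  /* Access the file contents like an ordinary array.  */
  for (off_t i = 0; i < st.st_size; i++)
    if (data[i] == target)
      count++;

  munmap (data, st.st_size);
  return count;
}
```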
- -@comment sys/mman.h -@comment POSIX -@deftypefun {void *} mmap (void *@var{address}, size_t @var{length}, int @var{protect}, int @var{flags}, int @var{filedes}, off_t @var{offset}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} - -The @code{mmap} function creates a new mapping, connected to bytes -(@var{offset}) to (@var{offset} + @var{length} - 1) in the file open on -@var{filedes}. A new reference for the file specified by @var{filedes} -is created, which is not removed by closing the file. - -@var{address} gives a preferred starting address for the mapping. -@code{NULL} expresses no preference. Any previous mapping at that -address is automatically removed. The address you give may still be -changed, unless you use the @code{MAP_FIXED} flag. - -@vindex PROT_READ -@vindex PROT_WRITE -@vindex PROT_EXEC -@var{protect} contains flags that control what kind of access is -permitted. They include @code{PROT_READ}, @code{PROT_WRITE}, and -@code{PROT_EXEC}, which permit reading, writing, and execution, -respectively. Inappropriate access will cause a segfault (@pxref{Program -Error Signals}). - -Note that most hardware designs cannot support write permission without -read permission, and many do not distinguish read and execute permission. -Thus, you may receive wider permissions than you ask for, and mappings of -write-only files may be denied even if you do not use @code{PROT_READ}. - -@var{flags} contains flags that control the nature of the map. -One of @code{MAP_SHARED} or @code{MAP_PRIVATE} must be specified. - -They include: - -@vtable @code -@item MAP_PRIVATE -This specifies that writes to the region should never be written back -to the attached file. Instead, a copy is made for the process, and the -region will be swapped normally if memory runs low. No other process will -see the changes. 
- -Since private mappings effectively revert to ordinary memory -when written to, you must have enough virtual memory for a copy of -the entire mmapped region if you use this mode with @code{PROT_WRITE}. - -@item MAP_SHARED -This specifies that writes to the region will be written back to the -file. Changes made will be shared immediately with other processes -mmapping the same file. - -Note that actual writing may take place at any time. You need to use -@code{msync}, described below, if it is important that other processes -using conventional I/O get a consistent view of the file. - -@item MAP_FIXED -This forces the system to use the exact mapping address specified in -@var{address} and fail if it can't. - -@c One of these is official - the other is obviously an obsolete synonym -@c Which is which? -@item MAP_ANONYMOUS -@itemx MAP_ANON -This flag tells the system to create an anonymous mapping, not connected -to a file. @var{filedes} and @var{offset} are ignored, and the region is -initialized with zeros. - -Anonymous maps are used as the basic primitive to extend the heap on some -systems. They are also useful to share data between multiple tasks -without creating a file. - -On some systems using private anonymous mmaps is more efficient than using -@code{malloc} for large blocks. This is not an issue with @theglibc{}, -as the included @code{malloc} automatically uses @code{mmap} where appropriate. - -@c Linux has some other MAP_ options, which I have not discussed here. -@c MAP_DENYWRITE, MAP_EXECUTABLE and MAP_GROWSDOWN don't seem applicable to -@c user programs (and I don't understand the last two). MAP_LOCKED does -@c not appear to be implemented. - -@end vtable - -@code{mmap} returns the address of the new mapping, or -@code{MAP_FAILED} for an error. - -Possible errors include: - -@table @code - -@item EINVAL - -Either @var{address} was unusable, or inconsistent @var{flags} were -given.
- -@item EACCES - -@var{filedes} was not open for the type of access specified in @var{protect}. - -@item ENOMEM - -Either there is not enough memory for the operation, or the process is -out of address space. - -@item ENODEV - -This file is of a type that doesn't support mapping. - -@item ENOEXEC - -The file is on a filesystem that doesn't support mapping. - -@c On Linux, EAGAIN will appear if the file has a conflicting mandatory lock. -@c However mandatory locks are not discussed in this manual. -@c -@c Similarly, ETXTBSY will occur if the MAP_DENYWRITE flag (not documented -@c here) is used and the file is already open for writing. - -@end table - -@end deftypefun - -@comment sys/mman.h -@comment LFS -@deftypefun {void *} mmap64 (void *@var{address}, size_t @var{length}, int @var{protect}, int @var{flags}, int @var{filedes}, off64_t @var{offset}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -@c The page_shift auto detection when MMAP2_PAGE_SHIFT is -1 (it never -@c is) would be thread-unsafe. -The @code{mmap64} function is equivalent to the @code{mmap} function but -the @var{offset} parameter is of type @code{off64_t}. On 32-bit systems -this allows the file associated with the @var{filedes} descriptor to be -larger than 2GB. @var{filedes} must be a descriptor returned from a -call to @code{open64} or @code{fopen64} and @code{freopen64} where the -descriptor is retrieved with @code{fileno}. - -When the sources are translated with @code{_FILE_OFFSET_BITS == 64} this -function is actually available under the name @code{mmap}. I.e., the -new, extended API using 64 bit file sizes and offsets transparently -replaces the old API. -@end deftypefun - -@comment sys/mman.h -@comment POSIX -@deftypefun int munmap (void *@var{addr}, size_t @var{length}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} - -@code{munmap} removes any memory maps from (@var{addr}) to (@var{addr} + -@var{length}). @var{length} should be the length of the mapping. 
- -It is safe to unmap multiple mappings in one command, or include unmapped -space in the range. It is also possible to unmap only part of an existing -mapping. However, only entire pages can be removed. If @var{length} is not -an even number of pages, it will be rounded up. - -It returns @math{0} for success and @math{-1} for an error. - -One error is possible: - -@table @code - -@item EINVAL -The memory range given was outside the user mmap range or wasn't page -aligned. - -@end table - -@end deftypefun - -@comment sys/mman.h -@comment POSIX -@deftypefun int msync (void *@var{address}, size_t @var{length}, int @var{flags}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} - -When using shared mappings, the kernel can write the file at any time -before the mapping is removed. To be certain data has actually been -written to the file and will be accessible to non-memory-mapped I/O, it -is necessary to use this function. - -It operates on the region @var{address} to (@var{address} + @var{length}). -It may be used on part of a mapping or multiple mappings, however the -region given should not contain any unmapped space. - -@var{flags} can contain some options: - -@vtable @code - -@item MS_SYNC - -This flag makes sure the data is actually written @emph{to disk}. -Normally @code{msync} only makes sure that accesses to a file with -conventional I/O reflect the recent changes. - -@item MS_ASYNC - -This tells @code{msync} to begin the synchronization, but not to wait for -it to complete. - -@c Linux also has MS_INVALIDATE, which I don't understand. - -@end vtable - -@code{msync} returns @math{0} for success and @math{-1} for -error. Errors include: - -@table @code - -@item EINVAL -An invalid region was given, or the @var{flags} were invalid. - -@item EFAULT -There is no existing mapping in at least part of the given region. 
- -@end table - -@end deftypefun - -@comment sys/mman.h -@comment GNU -@deftypefun {void *} mremap (void *@var{address}, size_t @var{length}, size_t @var{new_length}, int @var{flag}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} - -This function can be used to change the size of an existing memory -area. @var{address} and @var{length} must cover a region entirely mapped -in the same @code{mmap} call. A new mapping with the same -characteristics will be returned with the length @var{new_length}. - -One option is possible, @code{MREMAP_MAYMOVE}. If it is given in -@var{flag}, the system may remove the existing mapping and create a new -one of the desired length in another location. - -The address of the resulting mapping is returned, or @math{-1} on error. Possible -error codes include: - -@table @code - -@item EFAULT -There is no existing mapping in at least part of the original region, or -the region covers two or more distinct mappings. - -@item EINVAL -The address given is misaligned or inappropriate. - -@item EAGAIN -The region has pages locked, and if extended it would exceed the -process's resource limit for locked pages. @xref{Limits on Resources}. - -@item ENOMEM -The region is private writable, and insufficient virtual memory is -available to extend it. Also, this error will occur if -@code{MREMAP_MAYMOVE} is not given and the extension would collide with -another mapped region. - -@end table -@end deftypefun - -The @code{mremap} function is only available on a few systems. Except for performing -optional optimizations, one should not rely on it. - -Not all file descriptors may be mapped. Sockets, pipes, and most devices -only allow sequential access and do not fit into the mapping abstraction. -In addition, some regular files may not be mmapable, and older kernels may -not support mapping at all. Thus, programs using @code{mmap} should -have a fallback method to use should it fail. @xref{Mmap,,,standards,GNU -Coding Standards}. 
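The fallback advice above can be sketched roughly as follows. This is a minimal illustration, not code from the manual: the helper name @code{load_file} and its @var{mapped} out-parameter are hypothetical, and the sketch assumes a regular, non-empty file whose size @code{fstat} can report. It first tries a private read-only mapping and, if @code{mmap} returns @code{MAP_FAILED}, reads the file into a heap buffer instead.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: make the whole file open on FD available in
   memory.  On success, return a pointer to the contents, store the
   size in *SIZE, and set *MAPPED to 1 if the memory came from mmap
   (release it with munmap) or to 0 if it came from malloc (release
   it with free).  Return NULL on failure.  */
void *
load_file (int fd, size_t *size, int *mapped)
{
  struct stat st;

  /* Only files with a known, nonzero size are handled here.  */
  if (fstat (fd, &st) < 0 || st.st_size <= 0)
    return NULL;
  *size = (size_t) st.st_size;

  /* First choice: a private, read-only mapping of the whole file.  */
  void *p = mmap (NULL, *size, PROT_READ, MAP_PRIVATE, fd, 0);
  if (p != MAP_FAILED)
    {
      *mapped = 1;
      return p;
    }

  /* Fallback: conventional I/O into a heap buffer.  */
  char *buf = malloc (*size);
  if (buf == NULL)
    return NULL;

  size_t done = 0;
  while (done < *size)
    {
      ssize_t n = pread (fd, buf + done, *size - done, (off_t) done);
      if (n < 0 && errno == EINTR)
        continue;               /* Interrupted by a signal; retry.  */
      if (n <= 0)
        {
          free (buf);           /* Error or unexpected end of file.  */
          return NULL;
        }
      done += (size_t) n;
    }

  *mapped = 0;
  return buf;
}
```

A caller has to remember which release function matches the returned buffer, which is why the sketch reports the origin through @var{mapped}. A real program might hide both cases behind a single release function.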
- -@comment sys/mman.h -@comment POSIX -@deftypefun int madvise (void *@var{addr}, size_t @var{length}, int @var{advice}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} - -This function can be used to provide the system with @var{advice} about -the intended usage patterns of the memory region starting at @var{addr} -and extending @var{length} bytes. - -The valid BSD values for @var{advice} are: - -@vtable @code - -@item MADV_NORMAL -The region should receive no further special treatment. - -@item MADV_RANDOM -The region will be accessed via random page references. The kernel -should page-in the minimal number of pages for each page fault. - -@item MADV_SEQUENTIAL -The region will be accessed via sequential page references. This -may cause the kernel to aggressively read-ahead, expecting further -sequential references after any page fault within this region. - -@item MADV_WILLNEED -The region will be needed. The pages within this region may -be pre-faulted in by the kernel. - -@item MADV_DONTNEED -The region is no longer needed. The kernel may free these pages, -causing any changes to the pages to be lost, as well as swapped -out pages to be discarded. - -@end vtable - -The POSIX names are slightly different, but with the same meanings: - -@vtable @code - -@item POSIX_MADV_NORMAL -This corresponds with BSD's @code{MADV_NORMAL}. - -@item POSIX_MADV_RANDOM -This corresponds with BSD's @code{MADV_RANDOM}. - -@item POSIX_MADV_SEQUENTIAL -This corresponds with BSD's @code{MADV_SEQUENTIAL}. - -@item POSIX_MADV_WILLNEED -This corresponds with BSD's @code{MADV_WILLNEED}. - -@item POSIX_MADV_DONTNEED -This corresponds with BSD's @code{MADV_DONTNEED}. - -@end vtable - -@code{madvise} returns @math{0} for success and @math{-1} for -error. Errors include: -@table @code - -@item EINVAL -An invalid region was given, or the @var{advice} was invalid. - -@item EFAULT -There is no existing mapping in at least part of the given region. 
- -@end table -@end deftypefun - -@comment sys/mman.h -@comment POSIX -@deftypefn Function int shm_open (const char *@var{name}, int @var{oflag}, mode_t @var{mode}) -@safety{@prelim{}@mtsafe{@mtslocale{}}@asunsafe{@asuinit{} @ascuheap{} @asulock{}}@acunsafe{@aculock{} @acsmem{} @acsfd{}}} -@c shm_open @mtslocale @asuinit @ascuheap @asulock @aculock @acsmem @acsfd -@c libc_once(where_is_shmfs) @mtslocale @asuinit @ascuheap @asulock @aculock @acsmem @acsfd -@c where_is_shmfs @mtslocale @ascuheap @asulock @aculock @acsmem @acsfd -@c statfs dup ok -@c setmntent dup @ascuheap @asulock @acsmem @acsfd @aculock -@c getmntent_r dup @mtslocale @ascuheap @aculock @acsmem [no @asucorrupt @acucorrupt; exclusive stream] -@c strcmp dup ok -@c strlen dup ok -@c malloc dup @ascuheap @acsmem -@c mempcpy dup ok -@c endmntent dup @ascuheap @asulock @aculock @acsmem @acsfd -@c strlen dup ok -@c strchr dup ok -@c mempcpy dup ok -@c open dup @acsfd -@c fcntl dup ok -@c close dup @acsfd - -This function returns a file descriptor that can be used to allocate shared -memory via @code{mmap}. Unrelated processes can use the same @var{name} to create or -open existing shared memory objects. - -The @var{name} argument specifies the shared memory object to be opened. -In @theglibc{} it must be a string shorter than @code{NAME_MAX} bytes starting -with an optional slash but containing no other slashes. - -The semantics of the @var{oflag} and @var{mode} arguments are the same as in @code{open}. - -@code{shm_open} returns the file descriptor on success or @math{-1} on error. -On failure @code{errno} is set. 
-@end deftypefn - -@deftypefn Function int shm_unlink (const char *@var{name}) -@safety{@prelim{}@mtsafe{@mtslocale{}}@asunsafe{@asuinit{} @ascuheap{} @asulock{}}@acunsafe{@aculock{} @acsmem{} @acsfd{}}} -@c shm_unlink @mtslocale @asuinit @ascuheap @asulock @aculock @acsmem @acsfd -@c libc_once(where_is_shmfs) dup @mtslocale @asuinit @ascuheap @asulock @aculock @acsmem @acsfd -@c strlen dup ok -@c strchr dup ok -@c mempcpy dup ok -@c unlink dup ok - -This function is the inverse of @code{shm_open} and removes the object with -the given @var{name} previously created by @code{shm_open}. - -@code{shm_unlink} returns @math{0} on success or @math{-1} on error. -On failure @code{errno} is set. -@end deftypefn - -@node Waiting for I/O -@section Waiting for Input or Output -@cindex waiting for input or output -@cindex multiplexing input -@cindex input from multiple files - -Sometimes a program needs to accept input on multiple input channels -whenever input arrives. For example, some workstations may have devices -such as a digitizing tablet, function button box, or dial box that are -connected via normal asynchronous serial interfaces; good user interface -style requires responding immediately to input on any device. Another -example is a program that acts as a server to several other processes -via pipes or sockets. - -You cannot normally use @code{read} for this purpose, because this -blocks the program until input is available on one particular file -descriptor; input on other channels won't wake it up. You could set -nonblocking mode and poll each file descriptor in turn, but this is very -inefficient. - -A better solution is to use the @code{select} function. This blocks the -program until input or output is ready on a specified set of file -descriptors, or until a timer expires, whichever comes first. This -facility is declared in the header file @file{sys/types.h}. 
-@pindex sys/types.h - -In the case of a server socket (@pxref{Listening}), we say that -``input'' is available when there are pending connections that could be -accepted (@pxref{Accepting Connections}). @code{accept} for server -sockets blocks and interacts with @code{select} just as @code{read} does -for normal input. - -@cindex file descriptor sets, for @code{select} -The file descriptor sets for the @code{select} function are specified -as @code{fd_set} objects. Here is the description of the data type -and some macros for manipulating these objects. - -@comment sys/types.h -@comment BSD -@deftp {Data Type} fd_set -The @code{fd_set} data type represents file descriptor sets for the -@code{select} function. It is actually a bit array. -@end deftp - -@comment sys/types.h -@comment BSD -@deftypevr Macro int FD_SETSIZE -The value of this macro is the maximum number of file descriptors that a -@code{fd_set} object can hold information about. On systems with a -fixed maximum number, @code{FD_SETSIZE} is at least that number. On -some systems, including GNU, there is no absolute limit on the number of -descriptors open, but this macro still has a constant value which -controls the number of bits in an @code{fd_set}; if you get a file -descriptor with a value as high as @code{FD_SETSIZE}, you cannot put -that descriptor into an @code{fd_set}. -@end deftypevr - -@comment sys/types.h -@comment BSD -@deftypefn Macro void FD_ZERO (fd_set *@var{set}) -@safety{@prelim{}@mtsafe{@mtsrace{:set}}@assafe{}@acsafe{}} -This macro initializes the file descriptor set @var{set} to be the -empty set. -@end deftypefn - -@comment sys/types.h -@comment BSD -@deftypefn Macro void FD_SET (int @var{filedes}, fd_set *@var{set}) -@safety{@prelim{}@mtsafe{@mtsrace{:set}}@assafe{}@acsafe{}} -@c Setting a bit isn't necessarily atomic, so there's a potential race -@c here if set is not used exclusively. -This macro adds @var{filedes} to the file descriptor set @var{set}. 
- -The @var{filedes} parameter must not have side effects since it is -evaluated more than once. -@end deftypefn - -@comment sys/types.h -@comment BSD -@deftypefn Macro void FD_CLR (int @var{filedes}, fd_set *@var{set}) -@safety{@prelim{}@mtsafe{@mtsrace{:set}}@assafe{}@acsafe{}} -@c Setting a bit isn't necessarily atomic, so there's a potential race -@c here if set is not used exclusively. -This macro removes @var{filedes} from the file descriptor set @var{set}. - -The @var{filedes} parameter must not have side effects since it is -evaluated more than once. -@end deftypefn - -@comment sys/types.h -@comment BSD -@deftypefn Macro int FD_ISSET (int @var{filedes}, const fd_set *@var{set}) -@safety{@prelim{}@mtsafe{@mtsrace{:set}}@assafe{}@acsafe{}} -This macro returns a nonzero value (true) if @var{filedes} is a member -of the file descriptor set @var{set}, and zero (false) otherwise. - -The @var{filedes} parameter must not have side effects since it is -evaluated more than once. -@end deftypefn - -Next, here is the description of the @code{select} function itself. - -@comment sys/types.h -@comment BSD -@deftypefun int select (int @var{nfds}, fd_set *@var{read-fds}, fd_set *@var{write-fds}, fd_set *@var{except-fds}, struct timeval *@var{timeout}) -@safety{@prelim{}@mtsafe{@mtsrace{:read-fds} @mtsrace{:write-fds} @mtsrace{:except-fds}}@assafe{}@acsafe{}} -@c The select syscall is preferred, but pselect6 may be used instead, -@c which requires converting timeout to a timespec and back. The -@c conversions are not atomic. -The @code{select} function blocks the calling process until there is -activity on any of the specified sets of file descriptors, or until the -timeout period has expired. 
- -The file descriptors specified by the @var{read-fds} argument are -checked to see if they are ready for reading; the @var{write-fds} file -descriptors are checked to see if they are ready for writing; and the -@var{except-fds} file descriptors are checked for exceptional -conditions. You can pass a null pointer for any of these arguments if -you are not interested in checking for that kind of condition. - -A file descriptor is considered ready for reading if a @code{read} -call will not block. This usually includes the read offset being at -the end of the file or there is an error to report. A server socket -is considered ready for reading if there is a pending connection which -can be accepted with @code{accept}; @pxref{Accepting Connections}. A -client socket is ready for writing when its connection is fully -established; @pxref{Connecting}. - -``Exceptional conditions'' does not mean errors---errors are reported -immediately when an erroneous system call is executed, and do not -constitute a state of the descriptor. Rather, they include conditions -such as the presence of an urgent message on a socket. (@xref{Sockets}, -for information on urgent messages.) - -The @code{select} function checks only the first @var{nfds} file -descriptors. The usual thing is to pass @code{FD_SETSIZE} as the value -of this argument. - -The @var{timeout} specifies the maximum time to wait. If you pass a -null pointer for this argument, it means to block indefinitely until one -of the file descriptors is ready. Otherwise, you should provide the -time in @code{struct timeval} format; see @ref{High-Resolution -Calendar}. Specify zero as the time (a @code{struct timeval} containing -all zeros) if you want to find out which descriptors are ready without -waiting if none are ready. - -The normal return value from @code{select} is the total number of ready file -descriptors in all of the sets. 
Each of the argument sets is overwritten -with information about the descriptors that are ready for the corresponding -operation. Thus, to see if a particular descriptor @var{desc} has input, -use @code{FD_ISSET (@var{desc}, @var{read-fds})} after @code{select} returns. - -If @code{select} returns because the timeout period expires, it returns -a value of zero. - -Any signal will cause @code{select} to return immediately. So if your -program uses signals, you can't rely on @code{select} to keep waiting -for the full time specified. If you want to be sure of waiting for a -particular amount of time, you must check for @code{EINTR} and repeat -the @code{select} with a newly calculated timeout based on the current -time. See the example below. See also @ref{Interrupted Primitives}. - -If an error occurs, @code{select} returns @code{-1} and does not modify -the argument file descriptor sets. The following @code{errno} error -conditions are defined for this function: - -@table @code -@item EBADF -One of the file descriptor sets specified an invalid file descriptor. - -@item EINTR -The operation was interrupted by a signal. @xref{Interrupted Primitives}. - -@item EINVAL -The @var{timeout} argument is invalid; one of the components is negative -or too large. -@end table -@end deftypefun - -@strong{Portability Note:} The @code{select} function is a BSD Unix -feature. - -Here is an example showing how you can use @code{select} to establish a -timeout period for reading from a file descriptor. The @code{input_timeout} -function blocks the calling process until input is available on the -file descriptor, or until the timeout period expires. - -@smallexample -@include select.c.texi -@end smallexample - -There is another example showing the use of @code{select} to multiplex -input from multiple sockets in @ref{Server Example}. 
- - -@node Synchronizing I/O -@section Synchronizing I/O operations - -@cindex synchronizing -In most modern operating systems, the normal I/O operations are not -executed synchronously. I.e., even if a @code{write} system call -returns, this does not mean the data is actually written to the media, -e.g., the disk. - -In situations where synchronization points are necessary, you can use -special functions which ensure that all operations finish before -they return. - -@comment unistd.h -@comment X/Open -@deftypefun void sync (void) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -A call to this function will not return as long as there is data which -has not been written to the device. All dirty buffers in the kernel will -be written and so an overall consistent system can be achieved (if no -other process in parallel writes data). - -A prototype for @code{sync} can be found in @file{unistd.h}. -@end deftypefun - -Programs more often want to ensure that data written to a given file is -committed, rather than all data in the system. For this, @code{sync} is overkill. - - -@comment unistd.h -@comment POSIX -@deftypefun int fsync (int @var{fildes}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -The @code{fsync} function can be used to make sure all data associated with -the open file @var{fildes} is written to the device associated with the -descriptor. The function call does not return unless all actions have -finished. - -A prototype for @code{fsync} can be found in @file{unistd.h}. - -This function is a cancellation point in multi-threaded programs. This -is a problem if the thread allocates some resources (like memory, file -descriptors, semaphores or whatever) at the time @code{fsync} is -called. If the thread gets canceled these resources stay allocated -until the program ends. To avoid this, calls to @code{fsync} should be -protected using cancellation handlers. 
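One way to protect such a call is with the POSIX cleanup-handler macros @code{pthread_cleanup_push} and @code{pthread_cleanup_pop}. The following is a minimal sketch, not code from the manual: the helper names @code{free_buffer} and @code{write_and_sync} are hypothetical, and the only resource held across the cancellation point is a heap buffer.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Cleanup handler: release the buffer if the thread is canceled
   while it is blocked in write or fsync.  */
static void
free_buffer (void *p)
{
  free (p);
}

/* Hypothetical helper: write LEN bytes of 'x' to FD and force them
   to the device.  Return 0 on success and -1 on error.  */
int
write_and_sync (int fd, size_t len)
{
  int result = -1;
  char *buf = malloc (len);

  if (buf == NULL)
    return -1;
  memset (buf, 'x', len);

  /* From here until the matching pop, cancellation frees BUF.  */
  pthread_cleanup_push (free_buffer, buf);

  if (write (fd, buf, len) == (ssize_t) len)
    result = fsync (fd);

  /* A nonzero argument runs the handler on the normal path too.  */
  pthread_cleanup_pop (1);

  return result;
}
```

The push and pop calls must appear as a pair in the same lexical scope, which is why the sketch returns only after @code{pthread_cleanup_pop}.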
-@c ref pthread_cleanup_push / pthread_cleanup_pop - -The return value of the function is zero if no error occurred. Otherwise -it is @math{-1} and the global variable @var{errno} is set to the -following values: -@table @code -@item EBADF -The descriptor @var{fildes} is not valid. - -@item EINVAL -No synchronization is possible since the system does not implement this. -@end table -@end deftypefun - -Sometimes it is not even necessary to write all data associated with a -file descriptor. E.g., in database files which do not change in size it -is enough to write all the file content data to the device. -Meta-information, like the modification time etc., are not that important -and leaving such information uncommitted does not prevent a successful -recovery of the file in case of a problem. - -@comment unistd.h -@comment POSIX -@deftypefun int fdatasync (int @var{fildes}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -When a call to the @code{fdatasync} function returns, it is ensured -that all of the file data is written to the device. For all pending I/O -operations, the parts guaranteeing data integrity finished. - -Not all systems implement the @code{fdatasync} operation. On systems -missing this functionality @code{fdatasync} is emulated by a call to -@code{fsync} since the performed actions are a superset of those -required by @code{fdatasync}. - -The prototype for @code{fdatasync} is in @file{unistd.h}. - -The return value of the function is zero if no error occurred. Otherwise -it is @math{-1} and the global variable @var{errno} is set to the -following values: -@table @code -@item EBADF -The descriptor @var{fildes} is not valid. - -@item EINVAL -No synchronization is possible since the system does not implement this. -@end table -@end deftypefun - - -@node Asynchronous I/O -@section Perform I/O Operations in Parallel - -The POSIX.1b standard defines a new set of I/O operations which can -significantly reduce the time an application spends waiting for I/O. 
The -new functions allow a program to initiate one or more I/O operations and -then immediately resume normal work while the I/O operations are -executed in parallel. This functionality is available if the -@file{unistd.h} file defines the symbol @code{_POSIX_ASYNCHRONOUS_IO}. - -These functions are part of the library with realtime functions named -@file{librt}. They are not actually part of the @file{libc} binary. -The implementation of these functions can be done using support in the -kernel (if available) or using an implementation based on threads at -userlevel. In the latter case it might be necessary to link applications -with the thread library @file{libpthread} in addition to @file{librt}. - -All AIO operations operate on files which were opened previously. There -might be arbitrarily many operations running for one file. The -asynchronous I/O operations are controlled using a data structure named -@code{struct aiocb} (@dfn{AIO control block}). It is defined in -@file{aio.h} as follows. - -@comment aio.h -@comment POSIX.1b -@deftp {Data Type} {struct aiocb} -The POSIX.1b standard mandates that the @code{struct aiocb} structure -contains at least the members described in the following table. There -might be more elements which are used by the implementation, but -depending upon these elements is not portable and is highly deprecated. - -@table @code -@item int aio_fildes -This element specifies the file descriptor to be used for the -operation. It must be a legal descriptor, otherwise the operation will -fail. - -The device on which the file is opened must allow the seek operation. -I.e., it is not possible to use any of the AIO operations on devices -like terminals where an @code{lseek} call would lead to an error. - -@item off_t aio_offset -This element specifies the offset in the file at which the operation (input -or output) is performed. 
Since the operations are carried out in arbitrary -order and more than one operation for one file descriptor can be -started, one cannot expect a current read/write position of the file -descriptor. - -@item volatile void *aio_buf -This is a pointer to the buffer with the data to be written or the place -where the read data is stored. - -@item size_t aio_nbytes -This element specifies the length of the buffer pointed to by @code{aio_buf}. - -@item int aio_reqprio -If the platform has defined @code{_POSIX_PRIORITIZED_IO} and -@code{_POSIX_PRIORITY_SCHEDULING}, the AIO requests are -processed based on the current scheduling priority. The -@code{aio_reqprio} element can then be used to lower the priority of the -AIO operation. - -@item struct sigevent aio_sigevent -This element specifies how the calling process is notified once the -operation terminates. If the @code{sigev_notify} element is -@code{SIGEV_NONE}, no notification is sent. If it is @code{SIGEV_SIGNAL}, -the signal determined by @code{sigev_signo} is sent. Otherwise, -@code{sigev_notify} must be @code{SIGEV_THREAD}. In this case, a thread -is created which starts executing the function pointed to by -@code{sigev_notify_function}. - -@item int aio_lio_opcode -This element is only used by the @code{lio_listio} and -@code{lio_listio64} functions. Since these functions allow an -arbitrary number of operations to start at once, and each operation can be -input or output (or nothing), the information must be stored in the -control block. The possible values are: - -@vtable @code -@item LIO_READ -Start a read operation. Read from the file at position -@code{aio_offset} and store the next @code{aio_nbytes} bytes in the -buffer pointed to by @code{aio_buf}. - -@item LIO_WRITE -Start a write operation. Write @code{aio_nbytes} bytes starting at -@code{aio_buf} into the file starting at position @code{aio_offset}. - -@item LIO_NOP -Do nothing for this control block. 
This value is sometimes useful when -an array of @code{struct aiocb} values contains holes, i.e., some of the -values must not be handled although the whole array is presented to the -@code{lio_listio} function. -@end vtable -@end table - -When the sources are compiled using @code{_FILE_OFFSET_BITS == 64} on a -32 bit machine, this type is in fact @code{struct aiocb64}, since the LFS -interface transparently replaces the @code{struct aiocb} definition. -@end deftp - -For use with the AIO functions defined in the LFS, there is a similar type -defined which replaces the types of the appropriate members with larger -types but otherwise is equivalent to @code{struct aiocb}. In particular, -all member names are the same. - -@comment aio.h -@comment POSIX.1b -@deftp {Data Type} {struct aiocb64} -@table @code -@item int aio_fildes -This element specifies the file descriptor to be used for the -operation. It must be a legal descriptor, otherwise the operation will -fail. - -The device on which the file is opened must allow the seek operation. -I.e., it is not possible to use any of the AIO operations on devices -like terminals where an @code{lseek} call would lead to an error. - -@item off64_t aio_offset -This element specifies the offset in the file at which the operation (input -or output) is performed. Since the operations are carried out in arbitrary -order and more than one operation for one file descriptor can be -started, one cannot expect a current read/write position of the file -descriptor. - -@item volatile void *aio_buf -This is a pointer to the buffer with the data to be written or the place -where the read data is stored. - -@item size_t aio_nbytes -This element specifies the length of the buffer pointed to by @code{aio_buf}. - -@item int aio_reqprio -If the platform has defined @code{_POSIX_PRIORITIZED_IO} and -@code{_POSIX_PRIORITY_SCHEDULING}, the AIO requests are -processed based on the current scheduling priority. 
The -@code{aio_reqprio} element can then be used to lower the priority of the -AIO operation. - -@item struct sigevent aio_sigevent -This element specifies how the calling process is notified once the -operation terminates. If the @code{sigev_notify} element is -@code{SIGEV_NONE} no notification is sent. If it is @code{SIGEV_SIGNAL}, -the signal determined by @code{sigev_signo} is sent. Otherwise, -@code{sigev_notify} must be @code{SIGEV_THREAD} in which case a thread -is created which starts executing the function pointed to by -@code{sigev_notify_function}. - -@item int aio_lio_opcode -This element is only used by the @code{lio_listio} and -@code{lio_listio64} functions. Since these functions allow an -arbitrary number of operations to start at once, and since each operation can be -input or output (or nothing), the information must be stored in the -control block. See the description of @code{struct aiocb} for a description -of the possible values. -@end table - -When the sources are compiled using @code{_FILE_OFFSET_BITS == 64} on a -32 bit machine, this type is available under the name @code{struct -aiocb64}, since the LFS transparently replaces the old interface. -@end deftp - -@menu -* Asynchronous Reads/Writes:: Asynchronous Read and Write Operations. -* Status of AIO Operations:: Getting the Status of AIO Operations. -* Synchronizing AIO Operations:: Getting into a consistent state. -* Cancel AIO Operations:: Cancellation of AIO Operations. -* Configuration of AIO:: How to optimize the AIO implementation. -@end menu - -@node Asynchronous Reads/Writes -@subsection Asynchronous Read and Write Operations - -@comment aio.h -@comment POSIX.1b -@deftypefun int aio_read (struct aiocb *@var{aiocbp}) -@safety{@prelim{}@mtsafe{}@asunsafe{@asulock{} @ascuheap{}}@acunsafe{@aculock{} @acsmem{}}} -@c Calls aio_enqueue_request. 
-@c aio_enqueue_request @asulock @ascuheap @aculock @acsmem -@c pthread_self ok -@c pthread_getschedparam @asulock @aculock -@c lll_lock (pthread descriptor's lock) @asulock @aculock -@c sched_getparam ok -@c sched_getscheduler ok -@c lll_unlock @aculock -@c pthread_mutex_lock (aio_requests_mutex) @asulock @aculock -@c get_elem @ascuheap @acsmem [@asucorrupt @acucorrupt] -@c realloc @ascuheap @acsmem -@c calloc @ascuheap @acsmem -@c aio_create_helper_thread @asulock @ascuheap @aculock @acsmem -@c pthread_attr_init ok -@c pthread_attr_setdetachstate ok -@c pthread_get_minstack ok -@c pthread_attr_setstacksize ok -@c sigfillset ok -@c memset ok -@c sigdelset ok -@c SYSCALL rt_sigprocmask ok -@c pthread_create @asulock @ascuheap @aculock @acsmem -@c lll_lock (default_pthread_attr_lock) @asulock @aculock -@c alloca/malloc @ascuheap @acsmem -@c lll_unlock @aculock -@c allocate_stack @asulock @ascuheap @aculock @acsmem -@c getpagesize dup -@c lll_lock (default_pthread_attr_lock) @asulock @aculock -@c lll_unlock @aculock -@c _dl_allocate_tls @ascuheap @acsmem -@c _dl_allocate_tls_storage @ascuheap @acsmem -@c memalign @ascuheap @acsmem -@c memset ok -@c allocate_dtv dup -@c free @ascuheap @acsmem -@c allocate_dtv @ascuheap @acsmem -@c calloc @ascuheap @acsmem -@c INSTALL_DTV ok -@c list_add dup -@c get_cached_stack -@c lll_lock (stack_cache_lock) @asulock @aculock -@c list_for_each ok -@c list_entry dup -@c FREE_P dup -@c stack_list_del dup -@c stack_list_add dup -@c lll_unlock @aculock -@c _dl_allocate_tls_init ok -@c GET_DTV ok -@c mmap ok -@c atomic_increment_val ok -@c munmap ok -@c change_stack_perm ok -@c mprotect ok -@c mprotect ok -@c stack_list_del dup -@c _dl_deallocate_tls dup -@c munmap ok -@c THREAD_COPY_STACK_GUARD ok -@c THREAD_COPY_POINTER_GUARD ok -@c atomic_exchange_acq ok -@c lll_futex_wake ok -@c deallocate_stack @asulock @ascuheap @aculock @acsmem -@c lll_lock (state_cache_lock) @asulock @aculock -@c stack_list_del ok -@c atomic_write_barrier ok -@c 
list_del ok -@c atomic_write_barrier ok -@c queue_stack @ascuheap @acsmem -@c stack_list_add ok -@c atomic_write_barrier ok -@c list_add ok -@c atomic_write_barrier ok -@c free_stacks @ascuheap @acsmem -@c list_for_each_prev_safe ok -@c list_entry ok -@c FREE_P ok -@c stack_list_del dup -@c _dl_deallocate_tls dup -@c munmap ok -@c _dl_deallocate_tls @ascuheap @acsmem -@c free @ascuheap @acsmem -@c lll_unlock @aculock -@c create_thread @asulock @ascuheap @aculock @acsmem -@c td_eventword -@c td_eventmask -@c do_clone @asulock @ascuheap @aculock @acsmem -@c PREPARE_CREATE ok -@c lll_lock (pd->lock) @asulock @aculock -@c atomic_increment ok -@c clone ok -@c atomic_decrement ok -@c atomic_exchange_acq ok -@c lll_futex_wake ok -@c deallocate_stack dup -@c sched_setaffinity ok -@c tgkill ok -@c sched_setscheduler ok -@c atomic_compare_and_exchange_bool_acq ok -@c nptl_create_event ok -@c lll_unlock (pd->lock) @aculock -@c free @ascuheap @acsmem -@c pthread_attr_destroy ok (cpuset won't be set, so free isn't called) -@c add_request_to_runlist ok -@c pthread_cond_signal ok -@c aio_free_request ok -@c pthread_mutex_unlock @aculock - -@c (in the new thread, initiated with clone) -@c start_thread ok -@c HP_TIMING_NOW ok -@c ctype_init @mtslocale -@c atomic_exchange_acq ok -@c lll_futex_wake ok -@c sigemptyset ok -@c sigaddset ok -@c setjmp ok -@c CANCEL_ASYNC -> pthread_enable_asynccancel ok -@c do_cancel ok -@c pthread_unwind ok -@c Unwind_ForcedUnwind or longjmp ok [@ascuheap @acsmem?] 
-@c lll_lock @asulock @aculock -@c lll_unlock @asulock @aculock -@c CANCEL_RESET -> pthread_disable_asynccancel ok -@c lll_futex_wait ok -@c ->start_routine ok ----- -@c call_tls_dtors @asulock @ascuheap @aculock @acsmem -@c user-supplied dtor -@c rtld_lock_lock_recursive (dl_load_lock) @asulock @aculock -@c rtld_lock_unlock_recursive @aculock -@c free @ascuheap @acsmem -@c nptl_deallocate_tsd @ascuheap @acsmem -@c tsd user-supplied dtors ok -@c free @ascuheap @acsmem -@c libc_thread_freeres -@c libc_thread_subfreeres ok -@c atomic_decrement_and_test ok -@c td_eventword ok -@c td_eventmask ok -@c atomic_compare_exchange_bool_acq ok -@c nptl_death_event ok -@c lll_robust_dead ok -@c getpagesize ok -@c madvise ok -@c free_tcb @asulock @ascuheap @aculock @acsmem -@c free @ascuheap @acsmem -@c deallocate_stack @asulock @ascuheap @aculock @acsmem -@c lll_futex_wait ok -@c exit_thread_inline ok -@c syscall(exit) ok - -This function initiates an asynchronous read operation. It -immediately returns after the operation was enqueued or when an -error was encountered. - -The first @code{aiocbp->aio_nbytes} bytes of the file for which -@code{aiocbp->aio_fildes} is a descriptor are written to the buffer -starting at @code{aiocbp->aio_buf}. Reading starts at the absolute -position @code{aiocbp->aio_offset} in the file. - -If prioritized I/O is supported by the platform the -@code{aiocbp->aio_reqprio} value is used to adjust the priority before -the request is actually enqueued. - -The calling process is notified about the termination of the read -request according to the @code{aiocbp->aio_sigevent} value. - -When @code{aio_read} returns, the return value is zero if no error -occurred that can be found before the process is enqueued. If such an -early error is found, the function returns @math{-1} and sets -@code{errno} to one of the following values: - -@table @code -@item EAGAIN -The request was not enqueued due to (temporarily) exceeded resource -limitations. 
-@item ENOSYS -The @code{aio_read} function is not implemented. -@item EBADF -The @code{aiocbp->aio_fildes} descriptor is not valid. This condition -need not be recognized before enqueueing the request and so this error -might also be signaled asynchronously. -@item EINVAL -The @code{aiocbp->aio_offset} or @code{aiocbp->aio_reqprio} value is -invalid. This condition need not be recognized before enqueueing the -request and so this error might also be signaled asynchronously. -@end table - -If @code{aio_read} returns zero, the current status of the request -can be queried using the @code{aio_error} and @code{aio_return} functions. -As long as the value returned by @code{aio_error} is @code{EINPROGRESS} -the operation has not yet completed. If @code{aio_error} returns zero, -the operation terminated successfully; otherwise the value is to be -interpreted as an error code. Once the operation has terminated, the result of -the operation can be obtained using a call to @code{aio_return}. The -returned value is the same as an equivalent call to @code{read} would -have returned. Possible error codes returned by @code{aio_error} are: - -@table @code -@item EBADF -The @code{aiocbp->aio_fildes} descriptor is not valid. -@item ECANCELED -The operation was canceled before it finished -(@pxref{Cancel AIO Operations}). -@item EINVAL -The @code{aiocbp->aio_offset} value is invalid. -@end table - -When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this -function is in fact @code{aio_read64} since the LFS interface transparently -replaces the normal implementation. -@end deftypefun - -@comment aio.h -@comment Unix98 -@deftypefun int aio_read64 (struct aiocb64 *@var{aiocbp}) -@safety{@prelim{}@mtsafe{}@asunsafe{@asulock{} @ascuheap{}}@acunsafe{@aculock{} @acsmem{}}} -This function is similar to the @code{aio_read} function. The only -difference is that on @w{32 bit} machines, the file descriptor should -be opened in the large file mode.
Internally, @code{aio_read64} uses -functionality equivalent to @code{lseek64} (@pxref{File Position -Primitive}) to position the file descriptor correctly for the reading, -as opposed to the @code{lseek} functionality used in @code{aio_read}. - -When the sources are compiled with @code{_FILE_OFFSET_BITS == 64}, this -function is available under the name @code{aio_read} and so transparently -replaces the interface for small files on 32 bit machines. -@end deftypefun - -To write data asynchronously to a file, there exists an equivalent pair -of functions with a very similar interface. - -@comment aio.h -@comment POSIX.1b -@deftypefun int aio_write (struct aiocb *@var{aiocbp}) -@safety{@prelim{}@mtsafe{}@asunsafe{@asulock{} @ascuheap{}}@acunsafe{@aculock{} @acsmem{}}} -This function initiates an asynchronous write operation. The function -call returns immediately after the operation is enqueued or when an -error is encountered before that. - -The first @code{aiocbp->aio_nbytes} bytes from the buffer starting at -@code{aiocbp->aio_buf} are written to the file for which -@code{aiocbp->aio_fildes} is a descriptor, starting at the absolute -position @code{aiocbp->aio_offset} in the file. - -If prioritized I/O is supported by the platform, the -@code{aiocbp->aio_reqprio} value is used to adjust the priority before -the request is actually enqueued. - -The calling process is notified about the termination of the write -request according to the @code{aiocbp->aio_sigevent} value. - -When @code{aio_write} returns, the return value is zero if no error -occurred that can be found before the request is enqueued. If such an -early error is found, the function returns @math{-1} and sets -@code{errno} to one of the following values: - -@table @code -@item EAGAIN -The request was not enqueued due to (temporarily) exceeded resource -limitations. -@item ENOSYS -The @code{aio_write} function is not implemented. -@item EBADF -The @code{aiocbp->aio_fildes} descriptor is not valid.
This condition -need not be recognized before enqueueing the request, and so this error -might also be signaled asynchronously. -@item EINVAL -The @code{aiocbp->aio_offset} or @code{aiocbp->aio_reqprio} value is -invalid. This condition need not be recognized before enqueueing the -request and so this error might also be signaled asynchronously. -@end table - -If @code{aio_write} returns zero, the current status of the -request can be queried using the @code{aio_error} and @code{aio_return} -functions. As long as the value returned by @code{aio_error} is -@code{EINPROGRESS} the operation has not yet completed. If -@code{aio_error} returns zero, the operation terminated successfully; -otherwise the value is to be interpreted as an error code. Once the -operation has terminated, its result can be obtained using a call -to @code{aio_return}. The returned value is the same as an equivalent -call to @code{write} would have returned. Possible error codes returned -by @code{aio_error} are: - -@table @code -@item EBADF -The @code{aiocbp->aio_fildes} descriptor is not valid. -@item ECANCELED -The operation was canceled before it finished -(@pxref{Cancel AIO Operations}). -@item EINVAL -The @code{aiocbp->aio_offset} value is invalid. -@end table - -When the sources are compiled with @code{_FILE_OFFSET_BITS == 64}, this -function is in fact @code{aio_write64} since the LFS interface transparently -replaces the normal implementation. -@end deftypefun - -@comment aio.h -@comment Unix98 -@deftypefun int aio_write64 (struct aiocb64 *@var{aiocbp}) -@safety{@prelim{}@mtsafe{}@asunsafe{@asulock{} @ascuheap{}}@acunsafe{@aculock{} @acsmem{}}} -This function is similar to the @code{aio_write} function. The only -difference is that on @w{32 bit} machines the file descriptor should -be opened in the large file mode.
Internally @code{aio_write64} uses -functionality equivalent to @code{lseek64} (@pxref{File Position -Primitive}) to position the file descriptor correctly for the writing, -as opposed to the @code{lseek} functionality used in @code{aio_write}. - -When the sources are compiled with @code{_FILE_OFFSET_BITS == 64}, this -function is available under the name @code{aio_write} and so transparently -replaces the interface for small files on 32 bit machines. -@end deftypefun - -Besides these functions with the more or less traditional interface, -POSIX.1b also defines a function which can initiate more than one -operation at a time, and which can handle freely mixed read and write -operations. It is therefore similar to a combination of @code{readv} and -@code{writev}. - -@comment aio.h -@comment POSIX.1b -@deftypefun int lio_listio (int @var{mode}, struct aiocb *const @var{list}[], int @var{nent}, struct sigevent *@var{sig}) -@safety{@prelim{}@mtsafe{}@asunsafe{@asulock{} @ascuheap{}}@acunsafe{@aculock{} @acsmem{}}} -@c Call lio_listio_internal, that takes the aio_requests_mutex lock and -@c enqueues each request. Then, it waits for notification or prepares -@c for it before releasing the lock. Even though it performs memory -@c allocation and locking of its own, it doesn't add any classes of -@c safety issues that aren't already covered by aio_enqueue_request. -The @code{lio_listio} function can be used to enqueue an arbitrary -number of read and write requests at one time. The requests can all be -meant for the same file, all for different files or every solution in -between. - -@code{lio_listio} gets the @var{nent} requests from the array pointed to -by @var{list}. The operation to be performed is determined by the -@code{aio_lio_opcode} member in each element of @var{list}. 
If this -field is @code{LIO_READ} a read operation is enqueued, similar to a call -of @code{aio_read} for this element of the array (except that the way -the termination is signalled is different, as we will see below). If -the @code{aio_lio_opcode} member is @code{LIO_WRITE} a write operation -is enqueued. Otherwise the @code{aio_lio_opcode} must be @code{LIO_NOP} -in which case this element of @var{list} is simply ignored. This -``operation'' is useful in situations where one has a fixed array of -@code{struct aiocb} elements from which only a few need to be handled at -a time. Another situation is where the @code{lio_listio} call was -canceled before all requests are processed (@pxref{Cancel AIO -Operations}) and the remaining requests have to be reissued. - -The other members of each element of the array pointed to by -@code{list} must have values suitable for the operation as described in -the documentation for @code{aio_read} and @code{aio_write} above. - -The @var{mode} argument determines how @code{lio_listio} behaves after -having enqueued all the requests. If @var{mode} is @code{LIO_WAIT} it -waits until all requests terminated. Otherwise @var{mode} must be -@code{LIO_NOWAIT} and in this case the function returns immediately after -having enqueued all the requests. In this case the caller gets a -notification of the termination of all requests according to the -@var{sig} parameter. If @var{sig} is @code{NULL} no notification is -sent. Otherwise a signal is sent or a thread is started, just as -described in the description for @code{aio_read} or @code{aio_write}. - -If @var{mode} is @code{LIO_WAIT}, the return value of @code{lio_listio} -is @math{0} when all requests completed successfully. Otherwise the -function returns @math{-1} and @code{errno} is set accordingly. To find -out which request or requests failed one has to use the @code{aio_error} -function on all the elements of the array @var{list}. 
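The @code{LIO_WAIT} mode described above can be sketched as follows. This is only an illustration, not part of the library: the helper name @code{transfer_pair_and_wait} and its parameters are hypothetical. It writes to one descriptor and reads from another so that the two requests do not depend on each other's ordering, and it takes the per-request status from @code{aio_error} as authoritative.

```c
#include <aio.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: write LEN bytes from SRC at offset 0 of OUT_FD
   and read LEN bytes from offset 0 of IN_FD into DST with a single
   lio_listio call.  Returns the number of failed requests, or -1 if
   nothing could be enqueued at all.  */
int
transfer_pair_and_wait (int out_fd, const void *src,
                        int in_fd, void *dst, size_t len)
{
  struct aiocb wr, rd;

  memset (&wr, 0, sizeof wr);
  wr.aio_fildes = out_fd;
  wr.aio_buf = (void *) src;     /* aio_buf is not const-qualified */
  wr.aio_nbytes = len;
  wr.aio_offset = 0;
  wr.aio_lio_opcode = LIO_WRITE;
  wr.aio_sigevent.sigev_notify = SIGEV_NONE;

  memset (&rd, 0, sizeof rd);
  rd.aio_fildes = in_fd;
  rd.aio_buf = dst;
  rd.aio_nbytes = len;
  rd.aio_offset = 0;
  rd.aio_lio_opcode = LIO_READ;
  rd.aio_sigevent.sigev_notify = SIGEV_NONE;

  struct aiocb *list[2] = { &wr, &rd };

  /* LIO_WAIT blocks until the requests have terminated; no list-wide
     notification is requested, so the last argument is NULL.  */
  if (lio_listio (LIO_WAIT, list, 2, NULL) == -1
      && errno != EIO && errno != EINTR && errno != EAGAIN)
    return -1;

  int failed = 0;
  for (int i = 0; i < 2; i++)
    {
      const struct aiocb *const one[1] = { list[i] };
      int status;

      /* After an interrupted wait a request may still be running.  */
      while ((status = aio_error (list[i])) == EINPROGRESS)
        aio_suspend (one, 1, NULL);

      /* aio_return is called at most once per request.  */
      if (status != 0 || aio_return (list[i]) != (ssize_t) len)
        failed++;
    }
  return failed;
}
```

The sketch treats a short transfer as a failure, which is a simplification; a real program would probably retry the remainder instead.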
- -In case @var{mode} is @code{LIO_NOWAIT}, the function returns @math{0} if -all requests were enqueued correctly. The current state of the requests -can be found using @code{aio_error} and @code{aio_return} as described -above. If @code{lio_listio} returns @math{-1} in this mode, the -global variable @code{errno} is set accordingly. If a request did not -yet terminate, a call to @code{aio_error} returns @code{EINPROGRESS}. If -the value is different, the request is finished and the error value (or -@math{0}) is returned and the result of the operation can be retrieved -using @code{aio_return}. - -Possible values for @code{errno} are: - -@table @code -@item EAGAIN -The resources necessary to queue all the requests are not available at -the moment. The error status for each element of @var{list} must be -checked to determine which request failed. - -Another reason could be that the system wide limit of AIO requests is -exceeded. This cannot be the case for the implementation on @gnusystems{} -since no arbitrary limits exist. -@item EINVAL -The @var{mode} parameter is invalid or @var{nent} is larger than -@code{AIO_LISTIO_MAX}. -@item EIO -One or more of the request's I/O operations failed. The error status of -each request should be checked to determine which one failed. -@item ENOSYS -The @code{lio_listio} function is not supported. -@end table - -If the @var{mode} parameter is @code{LIO_NOWAIT} and the caller cancels -a request, the error status for this request returned by -@code{aio_error} is @code{ECANCELED}. - -When the sources are compiled with @code{_FILE_OFFSET_BITS == 64}, this -function is in fact @code{lio_listio64} since the LFS interface -transparently replaces the normal implementation. 
-@end deftypefun - -@comment aio.h -@comment Unix98 -@deftypefun int lio_listio64 (int @var{mode}, struct aiocb64 *const @var{list}[], int @var{nent}, struct sigevent *@var{sig}) -@safety{@prelim{}@mtsafe{}@asunsafe{@asulock{} @ascuheap{}}@acunsafe{@aculock{} @acsmem{}}} -This function is similar to the @code{lio_listio} function. The only -difference is that on @w{32 bit} machines, the file descriptor should -be opened in the large file mode. Internally, @code{lio_listio64} uses -functionality equivalent to @code{lseek64} (@pxref{File Position -Primitive}) to position the file descriptor correctly for the reading or -writing, as opposed to the @code{lseek} functionality used in -@code{lio_listio}. - -When the sources are compiled with @code{_FILE_OFFSET_BITS == 64}, this -function is available under the name @code{lio_listio} and so -transparently replaces the interface for small files on 32 bit -machines. -@end deftypefun - -@node Status of AIO Operations -@subsection Getting the Status of AIO Operations - -As already described in the documentation of the functions in the last -section, it must be possible to get information about the status of an I/O -request. When the operation is performed truly asynchronously (as with -@code{aio_read} and @code{aio_write} and with @code{lio_listio} when the -mode is @code{LIO_NOWAIT}), one sometimes needs to know whether a -specific request already terminated and if so, what the result was. -The following two functions allow you to get this kind of information. - -@comment aio.h -@comment POSIX.1b -@deftypefun int aio_error (const struct aiocb *@var{aiocbp}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -This function determines the error state of the request described by the -@code{struct aiocb} variable pointed to by @var{aiocbp}. If the -request has not yet terminated the value returned is always -@code{EINPROGRESS}. 
Once the request has terminated, -@code{aio_error} returns either @math{0} if the request completed -successfully, or the value which would have been stored in the -@code{errno} variable if the request had been done using -@code{read}, @code{write}, or @code{fsync}. - -The function can return @code{ENOSYS} if it is not implemented. It -could also return @code{EINVAL} if the @var{aiocbp} parameter does not -refer to an asynchronous operation whose return status is not yet known. - -When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this -function is in fact @code{aio_error64} since the LFS interface -transparently replaces the normal implementation. -@end deftypefun - -@comment aio.h -@comment Unix98 -@deftypefun int aio_error64 (const struct aiocb64 *@var{aiocbp}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -This function is similar to @code{aio_error} with the only difference -that the argument is a reference to a variable of type @code{struct -aiocb64}. - -When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this -function is available under the name @code{aio_error} and so -transparently replaces the interface for small files on 32 bit -machines. -@end deftypefun - -@comment aio.h -@comment POSIX.1b -@deftypefun ssize_t aio_return (struct aiocb *@var{aiocbp}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -This function can be used to retrieve the return status of the operation -carried out by the request described in the variable pointed to by -@var{aiocbp}. As long as the error status of this request as returned -by @code{aio_error} is @code{EINPROGRESS} the return value of this function is -undefined. - -Once the request is finished this function can be used exactly once to -retrieve the return value. Subsequent calls might lead to undefined -behavior. The return value itself is the value which would have been -returned by the @code{read}, @code{write}, or @code{fsync} call.
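As an illustration of how @code{aio_error} and @code{aio_return} fit together, here is a minimal sketch that starts a read and polls for its completion. The helper name @code{read_async_polling} and the 1 ms polling interval are assumptions made for this example. Polling is the simplest approach, but usually not the most efficient one.

```c
#include <aio.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical helper: read up to LEN bytes from FD at OFFSET with
   aio_read, polling aio_error until the request is no longer in
   progress.  Returns the byte count, or -1 with errno set.  */
ssize_t
read_async_polling (int fd, void *buf, size_t len, off_t offset)
{
  struct aiocb cb;

  memset (&cb, 0, sizeof cb);
  cb.aio_fildes = fd;
  cb.aio_buf = buf;
  cb.aio_nbytes = len;
  cb.aio_offset = offset;
  cb.aio_sigevent.sigev_notify = SIGEV_NONE;  /* no notification */

  if (aio_read (&cb) == -1)
    return -1;                  /* early error, errno is already set */

  struct timespec pause = { 0, 1000000 };     /* 1 ms between polls */
  int status;
  while ((status = aio_error (&cb)) == EINPROGRESS)
    nanosleep (&pause, NULL);

  /* The request has terminated: fetch its result exactly once.  */
  ssize_t n = aio_return (&cb);
  if (status != 0)
    {
      errno = status;
      return -1;
    }
  return n;
}
```

The control block lives on the stack here, which is only safe because the function does not return before the request has terminated.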
- -The function can return @code{ENOSYS} if it is not implemented. It -could also return @code{EINVAL} if the @var{aiocbp} parameter does not -refer to an asynchronous operation whose return status is not yet known. - -When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this -function is in fact @code{aio_return64} since the LFS interface -transparently replaces the normal implementation. -@end deftypefun - -@comment aio.h -@comment Unix98 -@deftypefun ssize_t aio_return64 (struct aiocb64 *@var{aiocbp}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -This function is similar to @code{aio_return} with the only difference -that the argument is a reference to a variable of type @code{struct -aiocb64}. - -When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this -function is available under the name @code{aio_return} and so -transparently replaces the interface for small files on 32 bit -machines. -@end deftypefun - -@node Synchronizing AIO Operations -@subsection Getting into a Consistent State - -When dealing with asynchronous operations it is sometimes necessary to -get into a consistent state. This would mean for AIO that one wants to -know whether a certain request or a group of requests were processed. -This could be done by waiting for the notification sent by the system -after the operation terminated, but this sometimes would mean wasting -resources (mainly computation time). Instead POSIX.1b defines two -functions which will help with most kinds of consistency. - -The @code{aio_fsync} and @code{aio_fsync64} functions are only available -if the symbol @code{_POSIX_SYNCHRONIZED_IO} is defined in @file{unistd.h}. - -@cindex synchronizing -@comment aio.h -@comment POSIX.1b -@deftypefun int aio_fsync (int @var{op}, struct aiocb *@var{aiocbp}) -@safety{@prelim{}@mtsafe{}@asunsafe{@asulock{} @ascuheap{}}@acunsafe{@aculock{} @acsmem{}}} -@c After fcntl to check that the FD is open, it calls -@c aio_enqueue_request. 
-Calling this function forces all I/O operations queued at the -time of the function call operating on the file descriptor -@code{aiocbp->aio_fildes} into the synchronized I/O completion state -(@pxref{Synchronizing I/O}). The @code{aio_fsync} function returns -immediately but the notification through the method described in -@code{aiocbp->aio_sigevent} will happen only after all requests for this -file descriptor have terminated and the file is synchronized. This also -means that requests for this very same file descriptor which are queued -after the synchronization request are not affected. - -If @var{op} is @code{O_DSYNC} the synchronization happens as with a call -to @code{fdatasync}. Otherwise @var{op} should be @code{O_SYNC} and -the synchronization happens as with @code{fsync}. - -As long as the synchronization has not happened, a call to -@code{aio_error} with the reference to the object pointed to by -@var{aiocbp} returns @code{EINPROGRESS}. Once the synchronization is -done @code{aio_error} returns @math{0} if the synchronization was -successful. Otherwise the value returned is the value to which the -@code{fsync} or @code{fdatasync} function would have set the -@code{errno} variable. In this case nothing can be assumed about the -consistency of the data written to this file descriptor. - -The return value of this function is @math{0} if the request was -successfully enqueued. Otherwise the return value is @math{-1} and -@code{errno} is set to one of the following values: - -@table @code -@item EAGAIN -The request could not be enqueued due to temporary lack of resources. -@item EBADF -The file descriptor @code{@var{aiocbp}->aio_fildes} is not valid. -@item EINVAL -The implementation does not support I/O synchronization or the @var{op} -parameter is other than @code{O_DSYNC} and @code{O_SYNC}. -@item ENOSYS -This function is not implemented.
-@end table - -When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this -function is in fact @code{aio_fsync64} since the LFS interface -transparently replaces the normal implementation. -@end deftypefun - -@comment aio.h -@comment Unix98 -@deftypefun int aio_fsync64 (int @var{op}, struct aiocb64 *@var{aiocbp}) -@safety{@prelim{}@mtsafe{}@asunsafe{@asulock{} @ascuheap{}}@acunsafe{@aculock{} @acsmem{}}} -This function is similar to @code{aio_fsync} with the only difference -that the argument is a reference to a variable of type @code{struct -aiocb64}. - -When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this -function is available under the name @code{aio_fsync} and so -transparently replaces the interface for small files on 32 bit -machines. -@end deftypefun - -Another method of synchronization is to wait until one or more requests of a -specific set terminated. This could be achieved by the @code{aio_*} -functions to notify the initiating process about the termination but in -some situations this is not the ideal solution. In a program which -constantly updates clients somehow connected to the server it is not -always the best solution to go round robin since some connections might -be slow. On the other hand letting the @code{aio_*} functions notify the -caller might also be not the best solution since whenever the process -works on preparing data for a client it makes no sense to be -interrupted by a notification since the new client will not be handled -before the current client is served. For situations like this -@code{aio_suspend} should be used. - -@comment aio.h -@comment POSIX.1b -@deftypefun int aio_suspend (const struct aiocb *const @var{list}[], int @var{nent}, const struct timespec *@var{timeout}) -@safety{@prelim{}@mtsafe{}@asunsafe{@asulock{}}@acunsafe{@aculock{}}} -@c Take aio_requests_mutex, set up waitlist and requestlist, wait -@c for completion or timeout, and release the mutex. 
-When calling this function, the calling thread is suspended until at -least one of the requests pointed to by the @var{nent} elements of the -array @var{list} has completed. If any of the requests has already -completed at the time @code{aio_suspend} is called, the function returns -immediately. Whether a request has terminated or not is determined by -comparing the error status of the request with @code{EINPROGRESS}. If -an element of @var{list} is @code{NULL}, the entry is simply ignored. - -If no request has finished, the calling thread is suspended. If -@var{timeout} is @code{NULL}, the thread is not woken until a request -has finished. If @var{timeout} is not @code{NULL}, the thread remains -suspended at most as long as specified in @var{timeout}. If the timeout -expires before any request has finished, @code{aio_suspend} returns with an error. - -The return value of the function is @math{0} if one or more requests -from the @var{list} have terminated. Otherwise the function returns -@math{-1} and @code{errno} is set to one of the following values: - -@table @code -@item EAGAIN -None of the requests from the @var{list} completed in the time specified -by @var{timeout}. -@item EINTR -A signal interrupted the @code{aio_suspend} function. This signal might -also be sent by the AIO implementation while signalling the termination -of one of the requests. -@item ENOSYS -The @code{aio_suspend} function is not implemented. -@end table - -When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this -function is in fact @code{aio_suspend64} since the LFS interface -transparently replaces the normal implementation.
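A typical use of @code{aio_suspend} might look like the following sketch. The helper name @code{wait_for_request} is hypothetical. The sketch simply retries when a signal interrupts the wait, which restarts the full timeout; a more careful program would compute the remaining time instead.

```c
#include <aio.h>
#include <errno.h>
#include <time.h>

/* Hypothetical helper: block until the request described by CB has
   terminated or TIMEOUT_SEC seconds have passed.  Returns 0 once the
   request has terminated, or -1 with errno set (EAGAIN on timeout).  */
int
wait_for_request (const struct aiocb *cb, time_t timeout_sec)
{
  const struct aiocb *const list[1] = { cb };
  struct timespec timeout = { timeout_sec, 0 };
  int rc;

  /* EINTR only means that a signal arrived, so wait again.  */
  do
    rc = aio_suspend (list, 1, &timeout);
  while (rc == -1 && errno == EINTR);

  return rc;
}
```

After a successful return the caller would still use @code{aio_error} and @code{aio_return} on the control block to find out how the operation ended.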
-@end deftypefun - -@comment aio.h -@comment Unix98 -@deftypefun int aio_suspend64 (const struct aiocb64 *const @var{list}[], int @var{nent}, const struct timespec *@var{timeout}) -@safety{@prelim{}@mtsafe{}@asunsafe{@asulock{}}@acunsafe{@aculock{}}} -This function is similar to @code{aio_suspend} with the only difference -that the argument is a reference to a variable of type @code{struct -aiocb64}. - -When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this -function is available under the name @code{aio_suspend} and so -transparently replaces the interface for small files on 32 bit -machines. -@end deftypefun - -@node Cancel AIO Operations -@subsection Cancellation of AIO Operations - -When one or more requests are asynchronously processed, it might be -useful in some situations to cancel a selected operation, e.g., if it -becomes obvious that the written data is no longer accurate and would -have to be overwritten soon. As an example, assume an application, which -writes data in files in a situation where new incoming data would have -to be written in a file which will be updated by an enqueued request. -The POSIX AIO implementation provides such a function, but this function -is not capable of forcing the cancellation of the request. It is up to the -implementation to decide whether it is possible to cancel the operation -or not. Therefore using this function is merely a hint. - -@comment aio.h -@comment POSIX.1b -@deftypefun int aio_cancel (int @var{fildes}, struct aiocb *@var{aiocbp}) -@safety{@prelim{}@mtsafe{}@asunsafe{@asulock{} @ascuheap{}}@acunsafe{@aculock{} @acsmem{}}} -@c After fcntl to check the fd is open, hold aio_requests_mutex, call -@c aio_find_req_fd, aio_remove_request, then aio_notify and -@c aio_free_request each request before releasing the lock. -@c aio_notify calls aio_notify_only and free, besides cond signal or -@c similar. 
aio_notify_only calls pthread_attr_init, -@c pthread_attr_setdetachstate, malloc, pthread_create, -@c notify_func_wrapper, aio_sigqueue, getpid, raise. -@c notify_func_wraper calls aio_start_notify_thread, free and then the -@c notifier function. -The @code{aio_cancel} function can be used to cancel one or more -outstanding requests. If the @var{aiocbp} parameter is @code{NULL}, the -function tries to cancel all of the outstanding requests which would process -the file descriptor @var{fildes} (i.e., whose @code{aio_fildes} member -is @var{fildes}). If @var{aiocbp} is not @code{NULL}, @code{aio_cancel} -attempts to cancel the specific request pointed to by @var{aiocbp}. - -For requests which were successfully canceled, the normal notification -about the termination of the request should take place. I.e., depending -on the @code{struct sigevent} object which controls this, nothing -happens, a signal is sent or a thread is started. If the request cannot -be canceled, it terminates the usual way after performing the operation. - -After a request is successfully canceled, a call to @code{aio_error} with -a reference to this request as the parameter will return -@code{ECANCELED} and a call to @code{aio_return} will return @math{-1}. -If the request wasn't canceled and is still running the error status is -still @code{EINPROGRESS}. - -The return value of the function is @code{AIO_CANCELED} if there were -requests which haven't terminated and which were successfully canceled. -If there is one or more requests left which couldn't be canceled, the -return value is @code{AIO_NOTCANCELED}. In this case @code{aio_error} -must be used to find out which of the, perhaps multiple, requests (if -@var{aiocbp} is @code{NULL}) weren't successfully canceled. If all -requests already terminated at the time @code{aio_cancel} is called the -return value is @code{AIO_ALLDONE}. 
- -If an error occurred during the execution of @code{aio_cancel} the -function returns @math{-1} and sets @code{errno} to one of the following -values. - -@table @code -@item EBADF -The file descriptor @var{fildes} is not valid. -@item ENOSYS -@code{aio_cancel} is not implemented. -@end table - -When the sources are compiled with @code{_FILE_OFFSET_BITS == 64}, this -function is in fact @code{aio_cancel64} since the LFS interface -transparently replaces the normal implementation. -@end deftypefun - -@comment aio.h -@comment Unix98 -@deftypefun int aio_cancel64 (int @var{fildes}, struct aiocb64 *@var{aiocbp}) -@safety{@prelim{}@mtsafe{}@asunsafe{@asulock{} @ascuheap{}}@acunsafe{@aculock{} @acsmem{}}} -This function is similar to @code{aio_cancel} with the only difference -that the argument is a reference to a variable of type @code{struct -aiocb64}. - -When the sources are compiled with @code{_FILE_OFFSET_BITS == 64}, this -function is available under the name @code{aio_cancel} and so -transparently replaces the interface for small files on 32 bit -machines. -@end deftypefun - -@node Configuration of AIO -@subsection How to optimize the AIO implementation - -The POSIX standard does not specify how the AIO functions are -implemented. They could be system calls, but it is also possible to -emulate them at userlevel. - -At the time of writing, the available implementation is a user-level -implementation which uses threads for handling the enqueued requests. -While this implementation requires making some decisions about -limitations, hard limitations are something best avoided -in @theglibc{}. Therefore, @theglibc{} provides a means -for tuning the AIO implementation according to the individual use. - -@comment aio.h -@comment GNU -@deftp {Data Type} {struct aioinit} -This data type is used to pass the configuration or tunable parameters -to the implementation. 
The program has to initialize the members of -this struct and pass it to the implementation using the @code{aio_init} -function. - -@table @code -@item int aio_threads -This member specifies the maximal number of threads which may be used -at any one time. -@item int aio_num -This number provides an estimate on the maximal number of simultaneously -enqueued requests. -@item int aio_locks -Unused. -@item int aio_usedba -Unused. -@item int aio_debug -Unused. -@item int aio_numusers -Unused. -@item int aio_reserved[2] -Unused. -@end table -@end deftp - -@comment aio.h -@comment GNU -@deftypefun void aio_init (const struct aioinit *@var{init}) -@safety{@prelim{}@mtsafe{}@asunsafe{@asulock{}}@acunsafe{@aculock{}}} -@c All changes to global objects are guarded by aio_requests_mutex. -This function must be called before any other AIO function. Calling it -is completely voluntary, as it is only meant to help the AIO -implementation perform better. - -Before calling @code{aio_init}, the members of a variable of -type @code{struct aioinit} must be initialized. Then a reference to -this variable is passed as the parameter to @code{aio_init} which itself -may or may not pay attention to the hints. - -The function has no return value and no error cases are defined. It is -an extension which follows a proposal from the SGI implementation in -@w{Irix 6}. It is not covered by POSIX.1b or Unix98. -@end deftypefun - -@node Control Operations -@section Control Operations on Files - -@cindex control operations on files -@cindex @code{fcntl} function -This section describes how you can perform various other operations on -file descriptors, such as inquiring about or setting flags describing -the status of the file descriptor, manipulating record locks, and the -like. All of these operations are performed by the function @code{fcntl}. - -The second argument to the @code{fcntl} function is a command that -specifies which operation to perform. 
The function and macros that name -various flags that are used with it are declared in the header file -@file{fcntl.h}. Many of these flags are also used by the @code{open} -function; see @ref{Opening and Closing Files}. -@pindex fcntl.h - -@comment fcntl.h -@comment POSIX.1 -@deftypefun int fcntl (int @var{filedes}, int @var{command}, @dots{}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -The @code{fcntl} function performs the operation specified by -@var{command} on the file descriptor @var{filedes}. Some commands -require additional arguments to be supplied. These additional arguments -and the return value and error conditions are given in the detailed -descriptions of the individual commands. - -Briefly, here is a list of what the various commands are. - -@vtable @code -@item F_DUPFD -Duplicate the file descriptor (return another file descriptor pointing -to the same open file). @xref{Duplicating Descriptors}. - -@item F_GETFD -Get flags associated with the file descriptor. @xref{Descriptor Flags}. - -@item F_SETFD -Set flags associated with the file descriptor. @xref{Descriptor Flags}. - -@item F_GETFL -Get flags associated with the open file. @xref{File Status Flags}. - -@item F_SETFL -Set flags associated with the open file. @xref{File Status Flags}. - -@item F_GETLK -Test a file lock. @xref{File Locks}. - -@item F_SETLK -Set or clear a file lock. @xref{File Locks}. - -@item F_SETLKW -Like @code{F_SETLK}, but wait for completion. @xref{File Locks}. - -@item F_OFD_GETLK -Test an open file description lock. @xref{Open File Description Locks}. -Specific to Linux. - -@item F_OFD_SETLK -Set or clear an open file description lock. @xref{Open File Description Locks}. -Specific to Linux. - -@item F_OFD_SETLKW -Like @code{F_OFD_SETLK}, but block until lock is acquired. -@xref{Open File Description Locks}. Specific to Linux. - -@item F_GETOWN -Get process or process group ID to receive @code{SIGIO} signals. -@xref{Interrupt Input}. 
- -@item F_SETOWN -Set process or process group ID to receive @code{SIGIO} signals. -@xref{Interrupt Input}. -@end vtable - -This function is a cancellation point in multi-threaded programs. This -is a problem if the thread allocates some resources (like memory, file -descriptors, semaphores or whatever) at the time @code{fcntl} is -called. If the thread gets canceled these resources stay allocated -until the program ends. To avoid this calls to @code{fcntl} should be -protected using cancellation handlers. -@c ref pthread_cleanup_push / pthread_cleanup_pop -@end deftypefun - - -@node Duplicating Descriptors -@section Duplicating Descriptors - -@cindex duplicating file descriptors -@cindex redirecting input and output - -You can @dfn{duplicate} a file descriptor, or allocate another file -descriptor that refers to the same open file as the original. Duplicate -descriptors share one file position and one set of file status flags -(@pxref{File Status Flags}), but each has its own set of file descriptor -flags (@pxref{Descriptor Flags}). - -The major use of duplicating a file descriptor is to implement -@dfn{redirection} of input or output: that is, to change the -file or pipe that a particular file descriptor corresponds to. - -You can perform this operation using the @code{fcntl} function with the -@code{F_DUPFD} command, but there are also convenient functions -@code{dup} and @code{dup2} for duplicating descriptors. - -@pindex unistd.h -@pindex fcntl.h -The @code{fcntl} function and flags are declared in @file{fcntl.h}, -while prototypes for @code{dup} and @code{dup2} are in the header file -@file{unistd.h}. - -@comment unistd.h -@comment POSIX.1 -@deftypefun int dup (int @var{old}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -This function copies descriptor @var{old} to the first available -descriptor number (the first number not currently open). It is -equivalent to @code{fcntl (@var{old}, F_DUPFD, 0)}. 
-@end deftypefun - -@comment unistd.h -@comment POSIX.1 -@deftypefun int dup2 (int @var{old}, int @var{new}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} -This function copies the descriptor @var{old} to descriptor number -@var{new}. - -If @var{old} is an invalid descriptor, then @code{dup2} does nothing; it -does not close @var{new}. Otherwise, the new duplicate of @var{old} -replaces any previous meaning of descriptor @var{new}, as if @var{new} -were closed first. - -If @var{old} and @var{new} are different numbers, and @var{old} is a -valid descriptor number, then @code{dup2} is equivalent to: - -@smallexample -close (@var{new}); -fcntl (@var{old}, F_DUPFD, @var{new}) -@end smallexample - -However, @code{dup2} does this atomically; there is no instant in the -middle of calling @code{dup2} at which @var{new} is closed and not yet a -duplicate of @var{old}. -@end deftypefun - -@comment fcntl.h -@comment POSIX.1 -@deftypevr Macro int F_DUPFD -This macro is used as the @var{command} argument to @code{fcntl}, to -copy the file descriptor given as the first argument. - -The form of the call in this case is: - -@smallexample -fcntl (@var{old}, F_DUPFD, @var{next-filedes}) -@end smallexample - -The @var{next-filedes} argument is of type @code{int} and specifies that -the file descriptor returned should be the next available one greater -than or equal to this value. - -The return value from @code{fcntl} with this command is normally the value -of the new file descriptor. A return value of @math{-1} indicates an -error. The following @code{errno} error conditions are defined for -this command: - -@table @code -@item EBADF -The @var{old} argument is invalid. - -@item EINVAL -The @var{next-filedes} argument is invalid. - -@item EMFILE -There are no more file descriptors available---your program is already -using the maximum. 
In BSD and GNU, the maximum is controlled by a -resource limit that can be changed; @pxref{Limits on Resources}, for -more information about the @code{RLIMIT_NOFILE} limit. -@end table - -@code{ENFILE} is not a possible error code for @code{dup2} because -@code{dup2} does not create a new opening of a file; duplicate -descriptors do not count toward the limit which @code{ENFILE} -indicates. @code{EMFILE} is possible because it refers to the limit on -distinct descriptor numbers in use in one process. -@end deftypevr - -Here is an example showing how to use @code{dup2} to do redirection. -Typically, redirection of the standard streams (like @code{stdin}) is -done by a shell or shell-like program before calling one of the -@code{exec} functions (@pxref{Executing a File}) to execute a new -program in a child process. When the new program is executed, it -creates and initializes the standard streams to point to the -corresponding file descriptors, before its @code{main} function is -invoked. - -So, to redirect standard input to a file, the shell could do something -like: - -@smallexample -pid = fork (); -if (pid == 0) - @{ - char *filename; - char *program; - char **argv; - int file; - @dots{} - file = TEMP_FAILURE_RETRY (open (filename, O_RDONLY)); - dup2 (file, STDIN_FILENO); - TEMP_FAILURE_RETRY (close (file)); - execv (program, argv); - @} -@end smallexample - -There is also a more detailed example showing how to implement redirection -in the context of a pipeline of processes in @ref{Launching Jobs}. - - -@node Descriptor Flags -@section File Descriptor Flags -@cindex file descriptor flags - -@dfn{File descriptor flags} are miscellaneous attributes of a file -descriptor. These flags are associated with particular file -descriptors, so that if you have created duplicate file descriptors -from a single opening of a file, each descriptor has its own set of flags.
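The per-descriptor nature of these flags can be checked with a short experiment. The following is only a sketch, assuming a POSIX system; the function name @code{cloexec_is_per_descriptor} is hypothetical and not part of any library. It duplicates the read end of a pipe, sets @code{FD_CLOEXEC} on the duplicate only, and then compares the flags of both descriptors.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper, for illustration only.
   Return 1 if setting FD_CLOEXEC on a duplicate leaves the original
   descriptor's flags untouched, 0 if it does not, and -1 on error.  */
int
cloexec_is_per_descriptor (void)
{
  int fds[2];
  int result = -1;

  if (pipe (fds) < 0)
    return -1;

  int copy = dup (fds[0]);
  if (copy >= 0)
    {
      /* Read the current descriptor flags of the duplicate, then
         store them back with FD_CLOEXEC added.  */
      int flags = fcntl (copy, F_GETFD);
      if (flags != -1 && fcntl (copy, F_SETFD, flags | FD_CLOEXEC) != -1)
        {
          int orig_flags = fcntl (fds[0], F_GETFD);
          int copy_flags = fcntl (copy, F_GETFD);
          if (orig_flags != -1 && copy_flags != -1)
            result = ((copy_flags & FD_CLOEXEC) != 0
                      && (orig_flags & FD_CLOEXEC) == 0);
        }
      close (copy);
    }

  close (fds[0]);
  close (fds[1]);
  return result;
}
```

A pipe is used here only because it needs no file name; any open descriptor would serve equally well.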
- -Currently there is just one file descriptor flag: @code{FD_CLOEXEC}, -which causes the descriptor to be closed if you use any of the -@code{exec@dots{}} functions (@pxref{Executing a File}). - -The symbols in this section are defined in the header file -@file{fcntl.h}. -@pindex fcntl.h - -@comment fcntl.h -@comment POSIX.1 -@deftypevr Macro int F_GETFD -This macro is used as the @var{command} argument to @code{fcntl}, to -specify that it should return the file descriptor flags associated -with the @var{filedes} argument. - -The normal return value from @code{fcntl} with this command is a -nonnegative number which can be interpreted as the bitwise OR of the -individual flags (except that currently there is only one flag to use). - -In case of an error, @code{fcntl} returns @math{-1}. The following -@code{errno} error conditions are defined for this command: - -@table @code -@item EBADF -The @var{filedes} argument is invalid. -@end table -@end deftypevr - - -@comment fcntl.h -@comment POSIX.1 -@deftypevr Macro int F_SETFD -This macro is used as the @var{command} argument to @code{fcntl}, to -specify that it should set the file descriptor flags associated with the -@var{filedes} argument. This requires a third @code{int} argument to -specify the new flags, so the form of the call is: - -@smallexample -fcntl (@var{filedes}, F_SETFD, @var{new-flags}) -@end smallexample - -The normal return value from @code{fcntl} with this command is an -unspecified value other than @math{-1}, which indicates an error. -The flags and error conditions are the same as for the @code{F_GETFD} -command. -@end deftypevr - -The following macro is defined for use as a file descriptor flag with -the @code{fcntl} function. The value is an integer constant usable -as a bit mask value. 
- -@comment fcntl.h -@comment POSIX.1 -@deftypevr Macro int FD_CLOEXEC -@cindex close-on-exec (file descriptor flag) -This flag specifies that the file descriptor should be closed when -an @code{exec} function is invoked; see @ref{Executing a File}. When -a file descriptor is allocated (as with @code{open} or @code{dup}), -this bit is initially cleared on the new file descriptor, meaning that -descriptor will survive into the new program after @code{exec}. -@end deftypevr - -If you want to modify the file descriptor flags, you should get the -current flags with @code{F_GETFD} and modify the value. Don't assume -that the flags listed here are the only ones that are implemented; your -program may be run years from now and more flags may exist then. For -example, here is a function to set or clear the flag @code{FD_CLOEXEC} -without altering any other flags: - -@smallexample -/* @r{Set the @code{FD_CLOEXEC} flag of @var{desc} if @var{value} is nonzero,} - @r{or clear the flag if @var{value} is 0.} - @r{Return 0 on success, or -1 on error with @code{errno} set.} */ - -int -set_cloexec_flag (int desc, int value) -@{ - int oldflags = fcntl (desc, F_GETFD, 0); - /* @r{If reading the flags failed, return error indication now.} */ - if (oldflags < 0) - return oldflags; - /* @r{Set just the flag we want to set.} */ - if (value != 0) - oldflags |= FD_CLOEXEC; - else - oldflags &= ~FD_CLOEXEC; - /* @r{Store modified flag word in the descriptor.} */ - return fcntl (desc, F_SETFD, oldflags); -@} -@end smallexample - -@node File Status Flags -@section File Status Flags -@cindex file status flags - -@dfn{File status flags} are used to specify attributes of the opening of a -file. Unlike the file descriptor flags discussed in @ref{Descriptor -Flags}, the file status flags are shared by duplicated file descriptors -resulting from a single opening of the file. The file status flags are -specified with the @var{flags} argument to @code{open}; -@pxref{Opening and Closing Files}. 
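The sharing of file status flags between duplicates can be observed directly. The following is a minimal sketch, assuming a POSIX system; the function name @code{status_flags_are_shared} is hypothetical. It sets @code{O_NONBLOCK} through a duplicate descriptor and then reads the status flags back through the original descriptor.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper, for illustration only.
   Return 1 if O_NONBLOCK set through a duplicate descriptor is
   visible through the original descriptor, 0 if it is not, and
   -1 on error.  */
int
status_flags_are_shared (void)
{
  int fds[2];
  int result = -1;

  if (pipe (fds) < 0)
    return -1;

  int copy = dup (fds[0]);
  if (copy >= 0)
    {
      /* Modify the status flags using the duplicate only.  */
      int flags = fcntl (copy, F_GETFL);
      if (flags != -1 && fcntl (copy, F_SETFL, flags | O_NONBLOCK) != -1)
        {
          /* Inspect them using the original descriptor.  */
          int seen = fcntl (fds[0], F_GETFL);
          if (seen != -1)
            result = (seen & O_NONBLOCK) != 0;
        }
      close (copy);
    }

  close (fds[0]);
  close (fds[1]);
  return result;
}
```

Contrast this with the file descriptor flags, where a change made through one duplicate is not seen through the other.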
- -File status flags fall into three categories, which are described in the -following sections. - -@itemize @bullet -@item -@ref{Access Modes}, specify what type of access is allowed to the -file: reading, writing, or both. They are set by @code{open} and are -returned by @code{fcntl}, but cannot be changed. - -@item -@ref{Open-time Flags}, control details of what @code{open} will do. -These flags are not preserved after the @code{open} call. - -@item -@ref{Operating Modes}, affect how operations such as @code{read} and -@code{write} are done. They are set by @code{open}, and can be fetched or -changed with @code{fcntl}. -@end itemize - -The symbols in this section are defined in the header file -@file{fcntl.h}. -@pindex fcntl.h - -@menu -* Access Modes:: Whether the descriptor can read or write. -* Open-time Flags:: Details of @code{open}. -* Operating Modes:: Special modes to control I/O operations. -* Getting File Status Flags:: Fetching and changing these flags. -@end menu - -@node Access Modes -@subsection File Access Modes - -The file access modes allow a file descriptor to be used for reading, -writing, or both. (On @gnuhurdsystems{}, they can also allow none of these, -and allow execution of the file as a program.) The access modes are chosen -when the file is opened, and never change. - -@comment fcntl.h -@comment POSIX.1 -@deftypevr Macro int O_RDONLY -Open the file for read access. -@end deftypevr - -@comment fcntl.h -@comment POSIX.1 -@deftypevr Macro int O_WRONLY -Open the file for write access. -@end deftypevr - -@comment fcntl.h -@comment POSIX.1 -@deftypevr Macro int O_RDWR -Open the file for both reading and writing. -@end deftypevr - -On @gnuhurdsystems{} (and not on other systems), @code{O_RDONLY} and -@code{O_WRONLY} are independent bits that can be bitwise-ORed together, -and it is valid for either bit to be set or clear. This means that -@code{O_RDWR} is the same as @code{O_RDONLY|O_WRONLY}. 
A file access -mode of zero is permissible; it allows no operations that do input or -output to the file, but does allow other operations such as -@code{fchmod}. On @gnuhurdsystems{}, since ``read-only'' or ``write-only'' -is a misnomer, @file{fcntl.h} defines additional names for the file -access modes. These names are preferred when writing GNU-specific code. -But most programs will want to be portable to other POSIX.1 systems and -should use the POSIX.1 names above instead. - -@comment fcntl.h (optional) -@comment GNU -@deftypevr Macro int O_READ -Open the file for reading. Same as @code{O_RDONLY}; only defined on GNU. -@end deftypevr - -@comment fcntl.h (optional) -@comment GNU -@deftypevr Macro int O_WRITE -Open the file for writing. Same as @code{O_WRONLY}; only defined on GNU. -@end deftypevr - -@comment fcntl.h (optional) -@comment GNU -@deftypevr Macro int O_EXEC -Open the file for executing. Only defined on GNU. -@end deftypevr - -To determine the file access mode with @code{fcntl}, you must extract -the access mode bits from the retrieved file status flags. On -@gnuhurdsystems{}, -you can just test the @code{O_READ} and @code{O_WRITE} bits in -the flags word. But in other POSIX.1 systems, reading and writing -access modes are not stored as distinct bit flags. The portable way to -extract the file access mode bits is with @code{O_ACCMODE}. - -@comment fcntl.h -@comment POSIX.1 -@deftypevr Macro int O_ACCMODE -This macro stands for a mask that can be bitwise-ANDed with the file -status flag value to produce a value representing the file access mode. -The mode will be @code{O_RDONLY}, @code{O_WRONLY}, or @code{O_RDWR}. -(On @gnuhurdsystems{} it could also be zero, and it never includes the -@code{O_EXEC} bit.) -@end deftypevr - -@node Open-time Flags -@subsection Open-time Flags - -The open-time flags specify options affecting how @code{open} will behave. -These options are not preserved once the file is open. 
The exception to -this is @code{O_NONBLOCK}, which is also an I/O operating mode and so it -@emph{is} saved. @xref{Opening and Closing Files}, for how to call -@code{open}. - -There are two sorts of options specified by open-time flags. - -@itemize @bullet -@item -@dfn{File name translation flags} affect how @code{open} looks up the -file name to locate the file, and whether the file can be created. -@cindex file name translation flags -@cindex flags, file name translation - -@item -@dfn{Open-time action flags} specify extra operations that @code{open} will -perform on the file once it is open. -@cindex open-time action flags -@cindex flags, open-time action -@end itemize - -Here are the file name translation flags. - -@comment fcntl.h -@comment POSIX.1 -@deftypevr Macro int O_CREAT -If set, the file will be created if it doesn't already exist. -@c !!! mode arg, umask -@cindex create on open (file status flag) -@end deftypevr - -@comment fcntl.h -@comment POSIX.1 -@deftypevr Macro int O_EXCL -If both @code{O_CREAT} and @code{O_EXCL} are set, then @code{open} fails -if the specified file already exists. This is guaranteed to never -clobber an existing file. -@end deftypevr - -@comment fcntl.h -@comment POSIX.1 -@deftypevr Macro int O_NONBLOCK -@cindex non-blocking open -This prevents @code{open} from blocking for a ``long time'' to open the -file. This is only meaningful for some kinds of files, usually devices -such as serial ports; when it is not meaningful, it is harmless and -ignored. Often, opening a port to a modem blocks until the modem reports -carrier detection; if @code{O_NONBLOCK} is specified, @code{open} will -return immediately without a carrier. - -Note that the @code{O_NONBLOCK} flag is overloaded as both an I/O operating -mode and a file name translation flag. This means that specifying -@code{O_NONBLOCK} in @code{open} also sets nonblocking I/O mode; -@pxref{Operating Modes}. 
To open the file without blocking but do normal -I/O that blocks, you must call @code{open} with @code{O_NONBLOCK} set and -then call @code{fcntl} to turn the bit off. -@end deftypevr - -@comment fcntl.h -@comment POSIX.1 -@deftypevr Macro int O_NOCTTY -If the named file is a terminal device, don't make it the controlling -terminal for the process. @xref{Job Control}, for information about -what it means to be the controlling terminal. - -On @gnuhurdsystems{} and 4.4 BSD, opening a file never makes it the -controlling terminal and @code{O_NOCTTY} is zero. However, @gnulinuxsystems{} -and some other systems use a nonzero value for @code{O_NOCTTY} and set the -controlling terminal when you open a file that is a terminal device; so -to be portable, use @code{O_NOCTTY} when it is important to avoid this. -@cindex controlling terminal, setting -@end deftypevr - -The following three file name translation flags exist only on -@gnuhurdsystems{}. - -@comment fcntl.h (optional) -@comment GNU -@deftypevr Macro int O_IGNORE_CTTY -Do not recognize the named file as the controlling terminal, even if it -refers to the process's existing controlling terminal device. Operations -on the new file descriptor will never induce job control signals. -@xref{Job Control}. -@end deftypevr - -@comment fcntl.h (optional) -@comment GNU -@deftypevr Macro int O_NOLINK -If the named file is a symbolic link, open the link itself instead of -the file it refers to. (@code{fstat} on the new file descriptor will -return the information returned by @code{lstat} on the link's name.) -@cindex symbolic link, opening -@end deftypevr - -@comment fcntl.h (optional) -@comment GNU -@deftypevr Macro int O_NOTRANS -If the named file is specially translated, do not invoke the translator. -Open the bare file the translator itself sees. -@end deftypevr - - -The open-time action flags tell @code{open} to do additional operations -which are not really related to opening the file. 
The reason to do them -as part of @code{open} instead of in separate calls is that @code{open} -can do them @i{atomically}. - -@comment fcntl.h -@comment POSIX.1 -@deftypevr Macro int O_TRUNC -Truncate the file to zero length. This option is only useful for -regular files, not special files such as directories or FIFOs. POSIX.1 -requires that you open the file for writing to use @code{O_TRUNC}. In -BSD and GNU you must have permission to write the file to truncate it, -but you need not open for write access. - -This is the only open-time action flag specified by POSIX.1. There is -no good reason for truncation to be done by @code{open}, instead of by -calling @code{ftruncate} afterwards. The @code{O_TRUNC} flag existed in -Unix before @code{ftruncate} was invented, and is retained for backward -compatibility. -@end deftypevr - -The remaining open-time action flags are BSD extensions. They exist only -on some systems. On other systems, these macros are not defined. - -@comment fcntl.h (optional) -@comment BSD -@deftypevr Macro int O_SHLOCK -Acquire a shared lock on the file, as with @code{flock}. -@xref{File Locks}. - -If @code{O_CREAT} is specified, the locking is done atomically when -creating the file. You are guaranteed that no other process will get -the lock on the new file first. -@end deftypevr - -@comment fcntl.h (optional) -@comment BSD -@deftypevr Macro int O_EXLOCK -Acquire an exclusive lock on the file, as with @code{flock}. -@xref{File Locks}. This is atomic like @code{O_SHLOCK}. -@end deftypevr - -@node Operating Modes -@subsection I/O Operating Modes - -The operating modes affect how input and output operations using a file -descriptor work. These flags are set by @code{open} and can be fetched -and changed with @code{fcntl}. - -@comment fcntl.h -@comment POSIX.1 -@deftypevr Macro int O_APPEND -The bit that enables append mode for the file.
If set, then all -@code{write} operations write the data at the end of the file, extending -it, regardless of the current file position. This is the only reliable -way to append to a file. In append mode, you are guaranteed that the -data you write will always go to the current end of the file, regardless -of other processes writing to the file. Conversely, if you simply set -the file position to the end of file and write, then another process can -extend the file after you set the file position but before you write, -resulting in your data appearing someplace before the real end of file. -@end deftypevr - -@comment fcntl.h -@comment POSIX.1 -@deftypevr Macro int O_NONBLOCK -The bit that enables nonblocking mode for the file. If this bit is set, -@code{read} requests on the file can return immediately with a failure -status if there is no input immediately available, instead of blocking. -Likewise, @code{write} requests can also return immediately with a -failure status if the output can't be written immediately. - -Note that the @code{O_NONBLOCK} flag is overloaded as both an I/O -operating mode and a file name translation flag; @pxref{Open-time Flags}. -@end deftypevr - -@comment fcntl.h -@comment BSD -@deftypevr Macro int O_NDELAY -This is an obsolete name for @code{O_NONBLOCK}, provided for -compatibility with BSD. It is not defined by the POSIX.1 standard. -@end deftypevr - -The remaining operating modes are BSD and GNU extensions. They exist only -on some systems. On other systems, these macros are not defined. - -@comment fcntl.h -@comment BSD -@deftypevr Macro int O_ASYNC -The bit that enables asynchronous input mode. If set, then @code{SIGIO} -signals will be generated when input is available. @xref{Interrupt Input}. - -Asynchronous input mode is a BSD feature. -@end deftypevr - -@comment fcntl.h -@comment BSD -@deftypevr Macro int O_FSYNC -The bit that enables synchronous writing for the file. 
If set, each -@code{write} call will make sure the data is reliably stored on disk before -returning. @c !!! xref fsync - -Synchronous writing is a BSD feature. -@end deftypevr - -@comment fcntl.h -@comment BSD -@deftypevr Macro int O_SYNC -This is another name for @code{O_FSYNC}. They have the same value. -@end deftypevr - -@comment fcntl.h -@comment GNU -@deftypevr Macro int O_NOATIME -If this bit is set, @code{read} will not update the access time of the -file. @xref{File Times}. This is used by programs that do backups, so -that backing a file up does not count as reading it. -Only the owner of the file or the superuser may use this bit. - -This is a GNU extension. -@end deftypevr - -@node Getting File Status Flags -@subsection Getting and Setting File Status Flags - -The @code{fcntl} function can fetch or change file status flags. - -@comment fcntl.h -@comment POSIX.1 -@deftypevr Macro int F_GETFL -This macro is used as the @var{command} argument to @code{fcntl}, to -read the file status flags for the open file with descriptor -@var{filedes}. - -The normal return value from @code{fcntl} with this command is a -nonnegative number which can be interpreted as the bitwise OR of the -individual flags. Since the file access modes are not single-bit values, -you can mask off other bits in the returned flags with @code{O_ACCMODE} -to compare them. - -In case of an error, @code{fcntl} returns @math{-1}. The following -@code{errno} error conditions are defined for this command: - -@table @code -@item EBADF -The @var{filedes} argument is invalid. -@end table -@end deftypevr - -@comment fcntl.h -@comment POSIX.1 -@deftypevr Macro int F_SETFL -This macro is used as the @var{command} argument to @code{fcntl}, to set -the file status flags for the open file corresponding to the -@var{filedes} argument. 
This command requires a third @code{int} -argument to specify the new flags, so the call looks like this: - -@smallexample -fcntl (@var{filedes}, F_SETFL, @var{new-flags}) -@end smallexample - -You can't change the access mode for the file in this way; that is, -whether the file descriptor was opened for reading or writing. - -The normal return value from @code{fcntl} with this command is an -unspecified value other than @math{-1}, which indicates an error. The -error conditions are the same as for the @code{F_GETFL} command. -@end deftypevr - -If you want to modify the file status flags, you should get the current -flags with @code{F_GETFL} and modify the value. Don't assume that the -flags listed here are the only ones that are implemented; your program -may be run years from now and more flags may exist then. For example, -here is a function to set or clear the flag @code{O_NONBLOCK} without -altering any other flags: - -@smallexample -@group -/* @r{Set the @code{O_NONBLOCK} flag of @var{desc} if @var{value} is nonzero,} - @r{or clear the flag if @var{value} is 0.} - @r{Return 0 on success, or -1 on error with @code{errno} set.} */ - -int -set_nonblock_flag (int desc, int value) -@{ - int oldflags = fcntl (desc, F_GETFL, 0); - /* @r{If reading the flags failed, return error indication now.} */ - if (oldflags == -1) - return -1; - /* @r{Set just the flag we want to set.} */ - if (value != 0) - oldflags |= O_NONBLOCK; - else - oldflags &= ~O_NONBLOCK; - /* @r{Store modified flag word in the descriptor.} */ - return fcntl (desc, F_SETFL, oldflags); -@} -@end group -@end smallexample - -@node File Locks -@section File Locks - -@cindex file locks -@cindex record locking -This section describes record locks that are associated with the process. -There is also a different type of record lock that is associated with the -open file description instead of the process. @xref{Open File Description Locks}. 
- -The remaining @code{fcntl} commands are used to support @dfn{record -locking}, which permits multiple cooperating programs to prevent each -other from simultaneously accessing parts of a file in error-prone -ways. - -@cindex exclusive lock -@cindex write lock -An @dfn{exclusive} or @dfn{write} lock gives a process exclusive access -for writing to the specified part of the file. While a write lock is in -place, no other process can lock that part of the file. - -@cindex shared lock -@cindex read lock -A @dfn{shared} or @dfn{read} lock prohibits any other process from -requesting a write lock on the specified part of the file. However, -other processes can request read locks. - -The @code{read} and @code{write} functions do not actually check to see -whether there are any locks in place. If you want to implement a -locking protocol for a file shared by multiple processes, your application -must do explicit @code{fcntl} calls to request and clear locks at the -appropriate points. - -Locks are associated with processes. A process can only have one kind -of lock set for each byte of a given file. When any file descriptor for -that file is closed by the process, all of the locks that process holds -on that file are released, even if the locks were made using other -descriptors that remain open. Likewise, locks are released when a -process exits, and are not inherited by child processes created using -@code{fork} (@pxref{Creating a Process}). - -When making a lock, use a @code{struct flock} to specify what kind of -lock and where. This data type and the associated macros for the -@code{fcntl} function are declared in the header file @file{fcntl.h}. -@pindex fcntl.h - -@comment fcntl.h -@comment POSIX.1 -@deftp {Data Type} {struct flock} -This structure is used with the @code{fcntl} function to describe a file -lock. It has these members: - -@table @code -@item short int l_type -Specifies the type of the lock; one of @code{F_RDLCK}, @code{F_WRLCK}, or -@code{F_UNLCK}. 
- -@item short int l_whence -This corresponds to the @var{whence} argument to @code{fseek} or -@code{lseek}, and specifies what the offset is relative to. Its value -can be one of @code{SEEK_SET}, @code{SEEK_CUR}, or @code{SEEK_END}. - -@item off_t l_start -This specifies the offset of the start of the region to which the lock -applies, and is given in bytes relative to the point specified by the -@code{l_whence} member. - -@item off_t l_len -This specifies the length of the region to be locked. A value of -@code{0} is treated specially; it means the region extends to the end of -the file. - -@item pid_t l_pid -This field is the process ID (@pxref{Process Creation Concepts}) of the -process holding the lock. It is filled in by calling @code{fcntl} with -the @code{F_GETLK} command, but is ignored when making a lock. If the -conflicting lock is an open file description lock -(@pxref{Open File Description Locks}), then this field will be set to -@math{-1}. -@end table -@end deftp - -@comment fcntl.h -@comment POSIX.1 -@deftypevr Macro int F_GETLK -This macro is used as the @var{command} argument to @code{fcntl}, to -specify that it should get information about a lock. This command -requires a third argument of type @w{@code{struct flock *}} to be passed -to @code{fcntl}, so that the form of the call is: - -@smallexample -fcntl (@var{filedes}, F_GETLK, @var{lockp}) -@end smallexample - -If there is a lock already in place that would block the lock described -by the @var{lockp} argument, information about that lock overwrites -@code{*@var{lockp}}. Existing locks are not reported if they are -compatible with making a new lock as specified. Thus, you should -specify a lock type of @code{F_WRLCK} if you want to find out about both -read and write locks, or @code{F_RDLCK} if you want to find out about -write locks only. 
- -There might be more than one lock affecting the region specified by the -@var{lockp} argument, but @code{fcntl} only returns information about -one of them. The @code{l_whence} member of the @var{lockp} structure is -set to @code{SEEK_SET} and the @code{l_start} and @code{l_len} fields -set to identify the locked region. - -If no lock applies, the only change to the @var{lockp} structure is to -update the @code{l_type} to a value of @code{F_UNLCK}. - -The normal return value from @code{fcntl} with this command is an -unspecified value other than @math{-1}, which is reserved to indicate an -error. The following @code{errno} error conditions are defined for -this command: - -@table @code -@item EBADF -The @var{filedes} argument is invalid. - -@item EINVAL -Either the @var{lockp} argument doesn't specify valid lock information, -or the file associated with @var{filedes} doesn't support locks. -@end table -@end deftypevr - -@comment fcntl.h -@comment POSIX.1 -@deftypevr Macro int F_SETLK -This macro is used as the @var{command} argument to @code{fcntl}, to -specify that it should set or clear a lock. This command requires a -third argument of type @w{@code{struct flock *}} to be passed to -@code{fcntl}, so that the form of the call is: - -@smallexample -fcntl (@var{filedes}, F_SETLK, @var{lockp}) -@end smallexample - -If the process already has a lock on any part of the region, the old lock -on that part is replaced with the new lock. You can remove a lock -by specifying a lock type of @code{F_UNLCK}. - -If the lock cannot be set, @code{fcntl} returns immediately with a value -of @math{-1}. This function does not block while waiting for other processes -to release locks. If @code{fcntl} succeeds, it returns a value other -than @math{-1}. - -The following @code{errno} error conditions are defined for this -function: - -@table @code -@item EAGAIN -@itemx EACCES -The lock cannot be set because it is blocked by an existing lock on the -file. 
Some systems use @code{EAGAIN} in this case, and other systems -use @code{EACCES}; your program should treat them alike, after -@code{F_SETLK}. (@gnulinuxhurdsystems{} always use @code{EAGAIN}.) - -@item EBADF -Either: the @var{filedes} argument is invalid; you requested a read lock -but the @var{filedes} is not open for read access; or, you requested a -write lock but the @var{filedes} is not open for write access. - -@item EINVAL -Either the @var{lockp} argument doesn't specify valid lock information, -or the file associated with @var{filedes} doesn't support locks. - -@item ENOLCK -The system has run out of file lock resources; there are already too -many file locks in place. - -Well-designed file systems never report this error, because they have no -limitation on the number of locks. However, you must still take account -of the possibility of this error, as it could result from network access -to a file system on another machine. -@end table -@end deftypevr - -@comment fcntl.h -@comment POSIX.1 -@deftypevr Macro int F_SETLKW -This macro is used as the @var{command} argument to @code{fcntl}, to -specify that it should set or clear a lock. It is just like the -@code{F_SETLK} command, but causes the process to block (or wait) -until the request can be completed. - -This command requires a third argument of type @code{struct flock *}, as -for the @code{F_SETLK} command. - -The @code{fcntl} return values and errors are the same as for the -@code{F_SETLK} command, but these additional @code{errno} error conditions -are defined for this command: - -@table @code -@item EINTR -The function was interrupted by a signal while it was waiting. -@xref{Interrupted Primitives}. - -@item EDEADLK -The specified region is being locked by another process. But that -process is waiting to lock a region which the current process has -locked, so waiting for the lock would result in deadlock.
The system -does not guarantee that it will detect all such conditions, but it lets -you know if it notices one. -@end table -@end deftypevr - - -The following macros are defined for use as values for the @code{l_type} -member of the @code{flock} structure. The values are integer constants. - -@vtable @code -@comment fcntl.h -@comment POSIX.1 -@item F_RDLCK -This macro is used to specify a read (or shared) lock. - -@comment fcntl.h -@comment POSIX.1 -@item F_WRLCK -This macro is used to specify a write (or exclusive) lock. - -@comment fcntl.h -@comment POSIX.1 -@item F_UNLCK -This macro is used to specify that the region is unlocked. -@end vtable - -As an example of a situation where file locking is useful, consider a -program that can be run simultaneously by several different users, that -logs status information to a common file. One example of such a program -might be a game that uses a file to keep track of high scores. Another -example might be a program that records usage or accounting information -for billing purposes. - -Having multiple copies of the program simultaneously writing to the -file could cause the contents of the file to become mixed up. But -you can prevent this kind of problem by setting a write lock on the -file before actually writing to the file. - -If the program also needs to read the file and wants to make sure that -the contents of the file are in a consistent state, then it can also use -a read lock. While the read lock is set, no other process can lock -that part of the file for writing. - -@c ??? This section could use an example program. - -Remember that file locks are only an @emph{advisory} protocol for -controlling access to a file. There is still potential for access to -the file by programs that don't use the lock protocol. 
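The high-score or accounting log scenario above can be sketched as follows. This is a minimal illustration, not part of the original manual: the helper name append_locked is hypothetical, it locks the whole file rather than a region, and it treats a short write as a failure. It relies only on the F_SETLKW, F_SETLK, F_WRLCK and F_UNLCK values described in this section.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not from the manual): append LINE to the file
   open on FD while holding a write lock on the whole file.
   Returns 0 on success and -1 on failure.  */
int
append_locked (int fd, const char *line)
{
  struct flock lock;
  size_t len;
  int result = 0;

  assert (line != NULL);
  len = strlen (line);

  memset (&lock, 0, sizeof lock);
  lock.l_type = F_WRLCK;     /* Write (exclusive) lock.  */
  lock.l_whence = SEEK_SET;  /* l_start is relative to the file start.  */
  lock.l_start = 0;
  lock.l_len = 0;            /* 0 means: up to the end of the file.  */

  /* Block until no other process holds a conflicting lock.  */
  if (fcntl (fd, F_SETLKW, &lock) == -1)
    return -1;

  /* Seek to the end only after the lock is held, so that a cooperating
     writer cannot extend the file between the seek and the write.  */
  if (lseek (fd, 0, SEEK_END) == (off_t) -1
      || write (fd, line, len) != (ssize_t) len)
    result = -1;

  /* Release the lock even if the write failed.  */
  lock.l_type = F_UNLCK;
  if (fcntl (fd, F_SETLK, &lock) == -1)
    result = -1;

  return result;
}
```

Because the locks are advisory, this protects the file only against other programs that follow the same protocol before writing.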
- -@node Open File Description Locks -@section Open File Description Locks - -In contrast to process-associated record locks (@pxref{File Locks}), -open file description record locks are associated with an open file -description rather than a process. - -Using @code{fcntl} to apply an open file description lock on a region that -already has an existing open file description lock that was created via the -same file descriptor will never cause a lock conflict. - -Open file description locks are also inherited by child processes across -@code{fork}, or @code{clone} with @code{CLONE_FILES} set -(@pxref{Creating a Process}), along with the file descriptor. - -It is important to distinguish between the open file @emph{description} (an -instance of an open file, usually created by a call to @code{open}) and -an open file @emph{descriptor}, which is a numeric value that refers to the -open file description. The locks described here are associated with the -open file @emph{description} and not the open file @emph{descriptor}. - -Using @code{dup} (@pxref{Duplicating Descriptors}) to copy a file -descriptor does not give you a new open file description, but rather copies a -reference to an existing open file description and assigns it to a new -file descriptor. Thus, open file description locks set on a file -descriptor cloned by @code{dup} will never conflict with open file -description locks set on the original descriptor since they refer to the -same open file description. Depending on the range and type of lock -involved, the original lock may be modified by an @code{F_OFD_SETLK} or -@code{F_OFD_SETLKW} command in this situation, however. - -Open file description locks always conflict with process-associated locks, -even if acquired by the same process or on the same open file -descriptor.
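The difference between a descriptor copied with dup and a second, independent open of the same file can be sketched as follows. This is an illustration under assumptions, not text from the original manual: the helper names try_ofd_wrlock and ofd_dup_demo are hypothetical, and the code assumes a kernel and C library that provide F_OFD_SETLK when _GNU_SOURCE is defined.

```c
#define _GNU_SOURCE          /* Needed for the F_OFD_* commands.  */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: try, without blocking, to place a write lock on
   the whole file through FD.  Returns 0 if the lock was set, -1 if not.  */
int
try_ofd_wrlock (int fd)
{
  struct flock lock;

  memset (&lock, 0, sizeof lock);  /* Also leaves l_pid set to 0.  */
  lock.l_type = F_WRLCK;
  lock.l_whence = SEEK_SET;
  lock.l_start = 0;
  lock.l_len = 0;                  /* Whole file.  */
  return fcntl (fd, F_OFD_SETLK, &lock) == -1 ? -1 : 0;
}

/* Hypothetical demo: lock PATH through one descriptor, then retry the
   same lock through a dup'ed descriptor and through a second open.
   Stores 1 or 0 in *DUP_OK and *OTHER_OK to say whether each retry
   succeeded.  Returns 0, or -1 if the setup itself failed.  */
int
ofd_dup_demo (const char *path, int *dup_ok, int *other_ok)
{
  int fd, copy, other, result = -1;

  assert (dup_ok != NULL && other_ok != NULL);

  fd = open (path, O_RDWR | O_CREAT, 0600);
  if (fd == -1)
    return -1;
  copy = dup (fd);              /* Same open file description as fd.  */
  other = open (path, O_RDWR);  /* A new open file description.  */

  if (copy != -1 && other != -1 && try_ofd_wrlock (fd) == 0)
    {
      *dup_ok = try_ofd_wrlock (copy) == 0;
      *other_ok = try_ofd_wrlock (other) == 0;
      result = 0;
    }

  if (other != -1)
    close (other);
  if (copy != -1)
    close (copy);
  close (fd);
  return result;
}
```

On a system with open file description locks, the retry through the dup'ed descriptor is expected to succeed, while the retry through the independently opened descriptor is expected to fail because it refers to a different open file description.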
- -Open file description locks use the same @code{struct flock} as -process-associated locks as an argument (@pxref{File Locks}) and the -macros for the @var{command} values are also declared in the header file -@file{fcntl.h}. To use them, the macro @code{_GNU_SOURCE} must be -defined prior to including any header file. - -In contrast to process-associated locks, any @code{struct flock} used as -an argument to open file description lock commands must have the @code{l_pid} -value set to @math{0}. Also, when returning information about an -open file description lock in an @code{F_GETLK} or @code{F_OFD_GETLK} request, -the @code{l_pid} field in @code{struct flock} will be set to @math{-1} -to indicate that the lock is not associated with a process. - -When the same @code{struct flock} is reused as an argument to an -@code{F_OFD_SETLK} or @code{F_OFD_SETLKW} request after being used for an -@code{F_OFD_GETLK} request, it is necessary to inspect and reset the -@code{l_pid} field to @math{0}. - -@pindex fcntl.h - -@deftypevr Macro int F_OFD_GETLK -This macro is used as the @var{command} argument to @code{fcntl}, to -specify that it should get information about a lock. This command -requires a third argument of type @w{@code{struct flock *}} to be passed -to @code{fcntl}, so that the form of the call is: - -@smallexample -fcntl (@var{filedes}, F_OFD_GETLK, @var{lockp}) -@end smallexample - -If there is a lock already in place that would block the lock described -by the @var{lockp} argument, information about that lock is written to -@code{*@var{lockp}}. Existing locks are not reported if they are -compatible with making a new lock as specified. Thus, you should -specify a lock type of @code{F_WRLCK} if you want to find out about both -read and write locks, or @code{F_RDLCK} if you want to find out about -write locks only.
- -There might be more than one lock affecting the region specified by the -@var{lockp} argument, but @code{fcntl} only returns information about -one of them. Which lock is returned in this situation is undefined. - -The @code{l_whence} member of the @var{lockp} structure is set to -@code{SEEK_SET} and the @code{l_start} and @code{l_len} fields are set -to identify the locked region. - -If no conflicting lock exists, the only change to the @var{lockp} structure -is to update the @code{l_type} field to the value @code{F_UNLCK}. - -The normal return value from @code{fcntl} with this command is either @math{0} -on success or @math{-1}, which indicates an error. The following @code{errno} -error conditions are defined for this command: - -@table @code -@item EBADF -The @var{filedes} argument is invalid. - -@item EINVAL -Either the @var{lockp} argument doesn't specify valid lock information, -the operating system kernel doesn't support open file description locks, or the file -associated with @var{filedes} doesn't support locks. -@end table -@end deftypevr - -@comment fcntl.h -@comment POSIX.1 -@deftypevr Macro int F_OFD_SETLK -This macro is used as the @var{command} argument to @code{fcntl}, to -specify that it should set or clear a lock. This command requires a -third argument of type @w{@code{struct flock *}} to be passed to -@code{fcntl}, so that the form of the call is: - -@smallexample -fcntl (@var{filedes}, F_OFD_SETLK, @var{lockp}) -@end smallexample - -If the open file already has a lock on any part of the -region, the old lock on that part is replaced with the new lock. You -can remove a lock by specifying a lock type of @code{F_UNLCK}. - -If the lock cannot be set, @code{fcntl} returns immediately with a value -of @math{-1}. This command does not wait for other tasks -to release locks. If @code{fcntl} succeeds, it returns @math{0}.
- -The following @code{errno} error conditions are defined for this -command: - -@table @code -@item EAGAIN -The lock cannot be set because it is blocked by an existing lock on the -file. - -@item EBADF -Either: the @var{filedes} argument is invalid; you requested a read lock -but the @var{filedes} is not open for read access; or, you requested a -write lock but the @var{filedes} is not open for write access. - -@item EINVAL -Either the @var{lockp} argument doesn't specify valid lock information, -the operating system kernel doesn't support open file description locks, or the -file associated with @var{filedes} doesn't support locks. - -@item ENOLCK -The system has run out of file lock resources; there are already too -many file locks in place. - -Well-designed file systems never report this error, because they have no -limitation on the number of locks. However, you must still take account -of the possibility of this error, as it could result from network access -to a file system on another machine. -@end table -@end deftypevr - -@comment fcntl.h -@comment POSIX.1 -@deftypevr Macro int F_OFD_SETLKW -This macro is used as the @var{command} argument to @code{fcntl}, to -specify that it should set or clear a lock. It is just like the -@code{F_OFD_SETLK} command, but causes the process to wait until the request -can be completed. - -This command requires a third argument of type @code{struct flock *}, as -for the @code{F_OFD_SETLK} command. - -The @code{fcntl} return values and errors are the same as for the -@code{F_OFD_SETLK} command, but these additional @code{errno} error conditions -are defined for this command: - -@table @code -@item EINTR -The function was interrupted by a signal while it was waiting. -@xref{Interrupted Primitives}. - -@end table -@end deftypevr - -Open file description locks are useful in the same sorts of situations as -process-associated locks. 
They can also be used to synchronize file -access between threads within the same process by having each thread perform -its own @code{open} of the file, to obtain its own open file description. - -Because open file description locks are automatically freed only upon -closing the last file descriptor that refers to the open file -description, this locking mechanism avoids the possibility that locks -are inadvertently released due to a library routine opening and closing -a file without the application being aware. - -As with process-associated locks, open file description locks are advisory. - -@node Open File Description Locks Example -@section Open File Description Locks Example - -Here is an example of using open file description locks in a threaded -program. If this program used process-associated locks, then it would be -subject to data corruption because process-associated locks are shared -by the threads inside a process, and thus cannot be used by one thread -to lock out another thread in the same process. - -Proper error handling has been omitted in the following program for -brevity. - -@smallexample -@include ofdlocks.c.texi -@end smallexample - -This example creates three threads, each of which loops five times, -appending to the file. Access to the file is serialized via open file -description locks. If we compile and run the above program, we'll end up -with @file{/tmp/foo} that has 15 lines in it. - -If we, however, were to replace the @code{F_OFD_SETLK} and -@code{F_OFD_SETLKW} commands with their process-associated lock -equivalents, the locking essentially becomes a no-op since it is all done -within the context of the same process. That leads to data corruption -(typically manifested as missing lines) as some threads race in and -overwrite the data written by others.
- -@node Interrupt Input -@section Interrupt-Driven Input - -@cindex interrupt-driven input -If you set the @code{O_ASYNC} status flag on a file descriptor -(@pxref{File Status Flags}), a @code{SIGIO} signal is sent whenever -input or output becomes possible on that file descriptor. The process -or process group to receive the signal can be selected by using the -@code{F_SETOWN} command to the @code{fcntl} function. If the file -descriptor is a socket, this also selects the recipient of @code{SIGURG} -signals that are delivered when out-of-band data arrives on that socket; -see @ref{Out-of-Band Data}. (@code{SIGURG} is sent in any situation -where @code{select} would report the socket as having an ``exceptional -condition''. @xref{Waiting for I/O}.) - -If the file descriptor corresponds to a terminal device, then @code{SIGIO} -signals are sent to the foreground process group of the terminal. -@xref{Job Control}. - -@pindex fcntl.h -The symbols in this section are defined in the header file -@file{fcntl.h}. - -@comment fcntl.h -@comment BSD -@deftypevr Macro int F_GETOWN -This macro is used as the @var{command} argument to @code{fcntl}, to -specify that it should get information about the process or process -group to which @code{SIGIO} signals are sent. (For a terminal, this is -actually the foreground process group ID, which you can get using -@code{tcgetpgrp}; see @ref{Terminal Access Functions}.) - -The return value is interpreted as a process ID; if negative, its -absolute value is the process group ID. - -The following @code{errno} error condition is defined for this command: - -@table @code -@item EBADF -The @var{filedes} argument is invalid. -@end table -@end deftypevr - -@comment fcntl.h -@comment BSD -@deftypevr Macro int F_SETOWN -This macro is used as the @var{command} argument to @code{fcntl}, to -specify that it should set the process or process group to which -@code{SIGIO} signals are sent. 
This command requires a third argument -of type @code{pid_t} to be passed to @code{fcntl}, so that the form of -the call is: - -@smallexample -fcntl (@var{filedes}, F_SETOWN, @var{pid}) -@end smallexample - -The @var{pid} argument should be a process ID. You can also pass a -negative number whose absolute value is a process group ID. - -The return value from @code{fcntl} with this command is @math{-1} -in case of error and some other value if successful. The following -@code{errno} error conditions are defined for this command: - -@table @code -@item EBADF -The @var{filedes} argument is invalid. - -@item ESRCH -There is no process or process group corresponding to @var{pid}. -@end table -@end deftypevr - -@c ??? This section could use an example program. - -@node IOCTLs -@section Generic I/O Control operations -@cindex generic i/o control operations -@cindex IOCTLs - -@gnusystems{} can handle most input/output operations on many different -devices and objects in terms of a few file primitives: @code{read}, -@code{write} and @code{lseek}. However, most devices also have a few -peculiar operations which do not fit into this model, such as: - -@itemize @bullet - -@item -Changing the character font used on a terminal. - -@item -Telling a magnetic tape system to rewind or fast forward. (Since they -cannot move in byte increments, @code{lseek} is inapplicable.) - -@item -Ejecting a disk from a drive. - -@item -Playing an audio track from a CD-ROM drive. - -@item -Maintaining routing tables for a network. - -@end itemize - -Although some of these objects, such as sockets and terminals -@footnote{Actually, the terminal-specific functions are implemented with -IOCTLs on many platforms.} have special functions of their own, it would -not be practical to create functions for all these cases. - -Instead, these minor operations, known as @dfn{IOCTL}s, are assigned code -numbers and multiplexed through the @code{ioctl} function, defined in -@file{sys/ioctl.h}.
The code numbers themselves are defined in many -different headers. - -@comment sys/ioctl.h -@comment BSD -@deftypefun int ioctl (int @var{filedes}, int @var{command}, @dots{}) -@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} - -The @code{ioctl} function performs the generic I/O operation -@var{command} on @var{filedes}. - -A third argument is usually present, either a single number or a pointer -to a structure. The meaning of this argument, the returned value, and -any error codes depend upon the command used. Often @math{-1} is -returned for a failure. - -@end deftypefun - -On some systems, IOCTLs used by different devices share the same numbers. -Thus, although use of an inappropriate IOCTL @emph{usually} only produces -an error, you should not attempt to use device-specific IOCTLs on an -unknown device. - -Most IOCTLs are OS-specific and/or only used in special system utilities, -and are thus beyond the scope of this document. For an example of the use -of an IOCTL, see @ref{Out-of-Band Data}. - -@c FIXME this is undocumented: -@c dup3
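As a small illustration of the calling convention of ioctl, the sketch below asks how many bytes are waiting to be read on a descriptor. It is not taken from the original manual: the helper name bytes_pending is hypothetical, and the code assumes a system such as GNU/Linux where the FIONREAD request is available through sys/ioctl.h and stores its result in an int. Other systems may declare it elsewhere or not support it for every kind of file.

```c
#include <assert.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: return the number of bytes that can currently be
   read from FD, or -1 if the request fails (for example, because the
   object behind FD does not support FIONREAD).  */
int
bytes_pending (int fd)
{
  int count = 0;

  /* The third argument is a pointer that the request fills in; its type
     is dictated by the particular request code, here an int.  */
  if (ioctl (fd, FIONREAD, &count) == -1)
    return -1;

  assert (count >= 0);
  return count;
}
```

Called on the read end of a pipe, for instance, the helper reports how much data the writer has queued but the reader has not yet consumed.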