Diffstat (limited to 'manual/charset.texi')
-rw-r--r-- manual/charset.texi | 46
1 files changed, 23 insertions, 23 deletions
diff --git a/manual/charset.texi b/manual/charset.texi
index d7d82ad006..610db90858 100644
--- a/manual/charset.texi
+++ b/manual/charset.texi
@@ -361,7 +361,7 @@ the @code{LC_CTYPE} category of the current locale is used; see
The functions handling more than one character at a time require NUL
terminated strings as the argument (i.e., converting blocks of text
does not work unless one can add a NUL byte at an appropriate place).
-The GNU C library contains some extensions to the standard that allow
+@Theglibc{} contains some extensions to the standard that allow
specifying a size, but basically they also expect terminated strings.
@end itemize
@@ -418,7 +418,7 @@ a compile-time constant and is defined in @file{limits.h}.
maximum number of bytes in a multibyte character in the current locale.
The value is never greater than @code{MB_LEN_MAX}. Unlike
@code{MB_LEN_MAX} this macro need not be a compile-time constant, and in
-the GNU C library it is not.
+@theglibc{} it is not.
@pindex stdlib.h
@code{MB_CUR_MAX} is defined in @file{stdlib.h}.
@@ -793,7 +793,7 @@ character sequence but the one representing the NUL wide character.
Therefore, the @code{mbrlen} function will never read invalid memory.
Now that this function is available (just to make this clear, this
-function is @emph{not} part of the GNU C library) we can compute the
+function is @emph{not} part of @theglibc{}) we can compute the
number of wide character required to store the converted multibyte
character string @var{s} using
@@ -949,7 +949,7 @@ The functions described in the previous section only convert a single
character at a time. Most operations to be performed in real-world
programs include strings and therefore the @w{ISO C} standard also
defines conversions on entire strings. However, the defined set of
-functions is quite limited; therefore, the GNU C library contains a few
+functions is quite limited; therefore, @theglibc{} contains a few
extensions that can help in some important situations.
@comment wchar.h
@@ -1030,7 +1030,7 @@ therefore, should never be used in generally used code.
The generic conversion interface (@pxref{Generic Charset Conversion})
does not have this limitation (it simply works on buffers, not
-strings), and the GNU C library contains a set of functions that take
+strings), and @theglibc{} contains a set of functions that take
additional parameters specifying the maximal number of bytes that are
consumed from the input string. This way the problem of
@code{mbsrtowcs}'s example above could be solved by determining the line
@@ -1528,8 +1528,8 @@ The conversion functions mentioned so far in this chapter all had in
common that they operate on character sets that are not directly
specified by the functions. The multibyte encoding used is specified by
the currently selected locale for the @code{LC_CTYPE} category. The
-wide character set is fixed by the implementation (in the case of GNU C
-library it is always UCS-4 encoded @w{ISO 10646}.
+wide character set is fixed by the implementation (in the case of @theglibc{}
+it is always UCS-4 encoded @w{ISO 10646}.
This has of course several problems when it comes to general character
conversion:
@@ -1648,7 +1648,7 @@ An @code{iconv} descriptor is like a file descriptor as for every use a
new descriptor must be created. The descriptor does not stand for all
of the conversions from @var{fromset} to @var{toset}.
-The GNU C library implementation of @code{iconv_open} has one
+The @glibcadj{} implementation of @code{iconv_open} has one
significant extension to other implementations. To ease the extension
of the set of available conversions, the implementation allows storing
the necessary files with data and code in an arbitrary number of
@@ -1740,7 +1740,7 @@ from the initial state. It is important that the programmer never makes
any assumption as to whether the conversion has to deal with states.
Even if the input and output character sets are not stateful, the
implementation might still have to keep states. This is due to the
-implementation chosen for the GNU C library as it is described below.
+implementation chosen for @theglibc{} as it is described below.
Therefore an @code{iconv} call to reset the state should always be
performed if some protocol requires this for the output text.
@@ -1761,7 +1761,7 @@ Since the character sets selected in the @code{iconv_open} call can be
almost arbitrary, there can be situations where the input buffer contains
valid characters, which have no identical representation in the output
character set. The behavior in this situation is undefined. The
-@emph{current} behavior of the GNU C library in this situation is to
+@emph{current} behavior of @theglibc{} in this situation is to
return with an error immediately. This certainly is not the most
desirable solution; therefore, future versions will provide better ones,
but they are not yet finished.
@@ -1980,7 +1980,7 @@ the door open for extensions and improvements, but this design is also
limiting on some platforms since not many platforms support dynamic
loading in statically linked programs. On platforms without this
capability it is therefore not possible to use this interface in
-statically linked programs. The GNU C library has, on ELF platforms, no
+statically linked programs. @Theglibc{} has, on ELF platforms, no
problems with dynamic loading in these situations; therefore, this
point is moot. The danger is that one gets acquainted with this
situation and forgets about the restrictions on other systems.
@@ -2054,38 +2054,38 @@ such conversion, one could make sure this also is true for indirect
routes.
@node glibc iconv Implementation
-@subsection The @code{iconv} Implementation in the GNU C library
+@subsection The @code{iconv} Implementation in @theglibc{}
After reading about the problems of @code{iconv} implementations in the
last section it is certainly good to note that the implementation in
-the GNU C library has none of the problems mentioned above. What
+@theglibc{} has none of the problems mentioned above. What
follows is a step-by-step analysis of the points raised above. The
evaluation is based on the current state of the development (as of
January 1999). The development of the @code{iconv} functions is not
complete, but basic functionality has solidified.
-The GNU C library's @code{iconv} implementation uses shared loadable
+@Theglibc{}'s @code{iconv} implementation uses shared loadable
modules to implement the conversions. A very small number of
conversions are built into the library itself but these are only rather
trivial conversions.
-All the benefits of loadable modules are available in the GNU C library
+All the benefits of loadable modules are available in the @glibcadj{}
implementation. This is especially appealing since the interface is
well documented (see below), and it, therefore, is easy to write new
conversion modules. The drawback of using loadable objects is not a
-problem in the GNU C library, at least on ELF systems. Since the
+problem in @theglibc{}, at least on ELF systems. Since the
library is able to load shared objects even in statically linked
binaries, static linking need not be forbidden in case one wants to use
@code{iconv}.
The second mentioned problem is the number of supported conversions.
-Currently, the GNU C library supports more than 150 character sets. The
+Currently, @theglibc{} supports more than 150 character sets. The
way the implementation is designed the number of supported conversions
is greater than 22350 (@math{150} times @math{149}). If any conversion
from or to a character set is missing, it can be added easily.
Particularly impressive as it may be, this high number is due to the
-fact that the GNU C library implementation of @code{iconv} does not have
+fact that the @glibcadj{} implementation of @code{iconv} does not have
the third problem mentioned above (i.e., whenever there is a conversion
from a character set @math{@cal{A}} to @math{@cal{B}} and from
@math{@cal{B}} to @math{@cal{C}} it is always possible to convert from
@@ -2115,7 +2115,7 @@ the input to @w{ISO 10646} first. The two character sets of interest
are much more similar to each other than to @w{ISO 10646}.
In such a situation one easily can write a new conversion and provide it
-as a better alternative. The GNU C library @code{iconv} implementation
+as a better alternative. The @glibcadj{} @code{iconv} implementation
would automatically use the module implementing the conversion if it is
specified to be more efficient.
@@ -2207,7 +2207,7 @@ file, however, specifies that the new conversion modules can perform this
conversion with only the cost of @math{1}.
A mysterious item about the @file{gconv-modules} file above (and also
-the file coming with the GNU C library) are the names of the character
+the file coming with @theglibc{}) are the names of the character
sets specified in the @code{module} lines. Why do almost all the names
end in @code{//}? And this is not all: the names can actually be
regular expressions. At this point in time this mystery should not be
@@ -2423,7 +2423,7 @@ loads the objects with the conversions.
It is often the case that one conversion is used more than once (i.e.,
there are several @code{iconv_open} calls for the same set of character
sets during one program run). The @code{mbsrtowcs} et.al.@: functions in
-the GNU C library also use the @code{iconv} functionality, which
+@theglibc{} also use the @code{iconv} functionality, which
increases the number of uses of the same functions even more.
Because of this multiple use of conversions, the modules do not get
@@ -2888,8 +2888,8 @@ gconv (struct __gconv_step *step, struct __gconv_step_data *data,
@end deftypevr
This information should be sufficient to write new modules. Anybody
-doing so should also take a look at the available source code in the GNU
-C library sources. It contains many examples of working and optimized
+doing so should also take a look at the available source code in the
+@glibcadj{} sources. It contains many examples of working and optimized
modules.
@c File charset.texi edited October 2001 by Dennis Grace, IBM Corporation
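
For readers who want to see the @code{iconv} calls touched by this patch in context, the following is a minimal sketch, not code from the manual or from the library. The helper name @code{latin1_to_utf8} is hypothetical, and the charset names @code{"ISO-8859-1"} and @code{"UTF-8"} are assumptions chosen for illustration. It opens a descriptor, converts one buffer, and then makes the extra @code{iconv} call with a null input pointer that the text above recommends for resetting the conversion state.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper, not part of any library: convert INLEN bytes of
   ISO-8859-1 text at IN into UTF-8, storing at most OUTCAP bytes in OUT.
   Returns the number of bytes written, or (size_t) -1 on any error
   (unknown charset, output buffer too small, invalid input).  */
size_t
latin1_to_utf8 (const char *in, size_t inlen, char *out, size_t outcap)
{
  /* Note the argument order: target character set first, then source.  */
  iconv_t cd = iconv_open ("UTF-8", "ISO-8859-1");
  if (cd == (iconv_t) -1)
    return (size_t) -1;

  /* iconv takes a char ** for the input; the input bytes themselves are
     only read, so the cast away from const is safe here.  */
  char *inptr = (char *) in;
  char *outptr = out;
  size_t inleft = inlen;
  size_t outleft = outcap;
  size_t result;

  /* First call converts the buffer; the second call, with a null input,
     writes any sequence needed to return to the initial state.  */
  if (iconv (cd, &inptr, &inleft, &outptr, &outleft) == (size_t) -1
      || iconv (cd, NULL, NULL, &outptr, &outleft) == (size_t) -1)
    result = (size_t) -1;
  else
    result = outcap - outleft;

  iconv_close (cd);
  return result;
}
```

A caller would pass a buffer and its length, for example @code{latin1_to_utf8 (text, strlen (text), buf, sizeof buf)}, and check for @code{(size_t) -1} before using the result. A real program would also inspect @code{errno} to tell @code{E2BIG}, @code{EILSEQ}, and @code{EINVAL} apart; the sketch collapses them into one error value for brevity.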