From 815d8147a3418334ffa91e2384c6e159f0809d65 Mon Sep 17 00:00:00 2001 From: Paolo Bonzini Date: Tue, 17 Nov 2009 16:23:24 -0800 Subject: Fix ranges with multibyte characters as endpoints. This is another bug in computing the fastmap. It was reported by a user of sed because it usually does not happen with !_LIBC. However, it is there in that case too. The bug is that whenever we have a range at the beginning of the regex, the regex must be tested on any possible multibyte character. The reason why _LIBC masks it, is that in general there is a collation symbol for each possible multibyte-character lead byte, so all the lead bytes are in general already part of the fastmap. The tests use cyrillic characters as an example. With _LIBC, they pass without the patch too, but you can make them fail by removing collation symbols handling. --- ChangeLog | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'ChangeLog') diff --git a/ChangeLog b/ChangeLog index b92fd42c9d..0a6ae196a8 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,3 +1,11 @@ +2009-11-17 Paolo Bonzini + + * posix/bug-regex30.c: New file. + * posix/Makefile: Add rules to build and run bug-regex30. + * posix/regcomp.c (re_compile_fastmap_iter): Add all multibyte + character lead bytes when there is a range in a COMPLEX_BRACKET. + Reported by Oleg Bylatov. + 2009-11-17 Ulrich Drepper [BZ #10969] -- cgit v1.2.3