author | Ulrich Drepper <drepper@redhat.com> | 2003-11-29 06:13:09 +0000 |
---|---|---|
committer | Ulrich Drepper <drepper@redhat.com> | 2003-11-29 06:13:09 +0000 |
commit | bb3f4825c411e676c51479fea59643af540810b5 (patch) | |
tree | c16c5849da218c04f66b8ccc47b1e911d673493d /posix | |
parent | 46bf9de7b17d4539a19b69bd8407d5d2987b034b (diff) | |
Update.
2003-11-28 Ulrich Drepper <drepper@redhat.com>
* sysdeps/x86_64/fpu/libm-test-ulps: Add some more minor changes
to compensate other setup.
2003-11-27 Andreas Jaeger <aj@suse.de>
* sysdeps/x86_64/fpu/libm-test-ulps: Add ulps for new atan2 test.
* math/libm-test.inc (atan2_test): Add test that ran infinitely.
Reported by "Willus" <etc231etc231@willus.com>.
2003-11-27 Michael Matz <matz@suse.de>
* sysdeps/ieee754/dbl-64/mpsqrt.c (fastiroot): Fix 64-bit problem
with wrong types.
2003-11-28 Jakub Jelinek <jakub@redhat.com>
* posix/regexec.c (acquire_init_state_context): Make inline.
Add always_inline attribute.
(check_matching): Add BE macro. Move if (cur_state->has_backref)
into if (dfa->nbackref).
(sift_states_backward): Fix comment.
(transit_state): Add BE macro. Move if (next_state->has_backref)
into if (dfa->nbackref && next_state). Don't check for next_state
!= NULL twice.
* posix/regcomp.c (peek_token): Use opr.ctx_type instead of opr.idx
for ANCHOR.
(parse_expression): Only call init_word_char if word context will be
needed.
* posix/bug-regex11.c (tests): Add new tests.
* posix/tst-regex.c: Include getopt.h.
(timing): New variable.
(main): Set timing to 1 if --timing argument is present.
Add 2 new tests.
(run_test, run_test_backwards): Handle timing.
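
The timing mode described in this entry repeats each measurement several times and reports the shortest elapsed time. Below is a minimal sketch of that idea, not the test's actual code. The helper names `timespec_sub`, `timespec_less` and `min_elapsed` are hypothetical, and `CLOCK_MONOTONIC` is an assumption (the test uses its own clock id variable `cl`).

```c
#ifndef _POSIX_C_SOURCE
# define _POSIX_C_SOURCE 199309L   /* for clock_gettime */
#endif
#include <time.h>

/* Return FINISH - START, borrowing one second when the nanosecond
   difference would otherwise be negative.  */
struct timespec
timespec_sub (struct timespec finish, struct timespec start)
{
  struct timespec d;

  if (finish.tv_nsec < start.tv_nsec)
    {
      d.tv_nsec = finish.tv_nsec + 1000000000L - start.tv_nsec;
      d.tv_sec = finish.tv_sec - start.tv_sec - 1;
    }
  else
    {
      d.tv_nsec = finish.tv_nsec - start.tv_nsec;
      d.tv_sec = finish.tv_sec - start.tv_sec;
    }
  return d;
}

/* Nonzero if A is strictly shorter than B.  */
int
timespec_less (struct timespec a, struct timespec b)
{
  return (a.tv_sec < b.tv_sec
          || (a.tv_sec == b.tv_sec && a.tv_nsec < b.tv_nsec));
}

/* Call WORK up to RUNS times and return the shortest elapsed time.
   The initial value of one day mirrors the sentinel used in the test;
   it is returned unchanged if no run could be timed.  */
struct timespec
min_elapsed (void (*work) (void), int runs)
{
  struct timespec best = { .tv_sec = 24 * 60 * 60 };

  for (int i = 0; i < runs; ++i)
    {
      struct timespec start, finish, d;

      /* Hypothetical clock choice; skip the run if the clock fails.  */
      if (clock_gettime (CLOCK_MONOTONIC, &start) != 0)
        continue;
      work ();
      if (clock_gettime (CLOCK_MONOTONIC, &finish) != 0)
        continue;

      d = timespec_sub (finish, start);
      if (timespec_less (d, best))
        best = d;
    }
  return best;
}
```

A caller would pass the compile-and-match loop as `work`, call `min_elapsed (work, 10)`, and print the result with a format such as `"%ld.%09ld"` after casting both fields to `long`. Taking the minimum rather than the mean is a common way to reduce noise from other activity on the machine.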
2003-11-27 Jakub Jelinek <jakub@redhat.com>
* posix/regex_internal.h (re_string_t): Remove mbs_case field.
Add offsets, valid_raw_len, raw_len, raw_stop, mbs_allocated and
offsets_needed fields. Change icase, is_utf8 and map_notascii
type from int bitfield to unsigned char.
(MBS_ALLOCATED, MBS_CASE_ALLOCATED): Remove.
(build_wcs_upper_buffer): Change prototype to return int.
(re_string_peek_byte_case, re_string_fetch_byte_case): Remove
defines, add prototypes.
* posix/regex_internal.c (re_string_allocate): Don't initialize
stop here. Don't initialize mbs_case. Set valid_raw_len.
Use mbs_allocated instead of MBS_* macros.
(re_string_construct): Don't initialize stop and valid_len here.
Don't initialize mbs_case. Use mbs_allocated instead of MBS_*
macros. Reallocate buffers if build_wcs_upper_buffer converted
too few bytes. Set valid_len to bufs_len only for single byte
no translation and set in that case valid_raw_len as well.
(re_string_realloc_buffers): Reallocate offsets if not NULL.
Use mbs_allocated instead of MBS_ALLOCATED. Don't reallocate
mbs_case.
(re_string_construct_common): Initialize raw_len, mbs_allocated,
stop and raw_stop.
(build_wcs_buffer): Apply pstr->trans before mbrtowc instead of
after it. Set valid_raw_len. Don't set mbs_case.
(build_wcs_upper_buffer): Return REG_NOERROR or REG_ESPACE.
Only use the fast path if !pstr->offsets_needed. Apply pstr->trans
before mbrtowc instead of after it. If upper case character
uses a different number of bytes than lower case, go to the
slow path. Don't call towupper unnecessarily twice. Set
valid_raw_len as well. Handle in the slow path the case if
lower and upper case use different number of characters.
Don't set mbs_case.
(re_string_skip_chars): Use valid_raw_len instead of valid_len.
(build_upper_buffer): Don't set mbs_case. Add BE macro. Set
valid_raw_len.
(re_string_translate_buffer): Set mbs instead of mbs_case. Set
valid_raw_len.
(re_string_reconstruct): Use raw_len/raw_stop to initialize
len/stop. Clear valid_raw_len and offsets_needed when clearing
valid_len. Use mbs_allocated instead of MBS_* macros.
Check original offset against valid_raw_len instead of valid_len.
Remove mbs_case handling. Adjust valid_raw_len together with
valid_len. If is_utf8 and looking for tip context, apply
pstr->trans first. If buffers start with partial multi-byte
character, initialize mbs array as well if mbs_allocated.
Check return value of build_wcs_upper_buffer.
(re_string_peek_byte_case): New function.
(re_string_fetch_byte_case): New function.
(re_string_destruct): Use mbs_allocated instead of MBS_ALLOCATED.
Don't free mbs_case. Free offsets.
* posix/regcomp.c (init_dfa): Only check if charset name is UTF-8
if mb_cur_max == 6.
* posix/regexec.c (re_search_internal): Initialize input.raw_stop
as well. Use valid_raw_len instead of valid_len when looking
through fastmap. Adjust registers through input.offsets.
(extend_buffers): Allow build_wcs_upper_buffer to fail.
* posix/bug-regex18.c (tests): Enable #ifdefed out tests. Add new
tests.
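
The register adjustment mentioned for re_search_internal can be sketched as follows. When case conversion changes the byte length of a character, positions found in the converted buffer no longer equal positions in the raw input, so they are mapped back through an offsets table. This is an illustration under stated assumptions, not the library's code: `struct span`, `map_to_raw_offset` and `adjust_span` are hypothetical names, and the sketch assumes `offsets[i]` holds the raw-buffer position corresponding to byte `i` of the converted buffer.

```c
/* Hypothetical stand-in for a match register (start and end offsets).  */
struct span
{
  int so;
  int eo;
};

/* Map index IDX of the converted buffer to an index in the raw buffer.
   An index equal to VALID_LEN is one past the last converted byte and
   has no table entry, so it is shifted by the total length difference.  */
int
map_to_raw_offset (const int *offsets, int valid_len, int valid_raw_len,
                   int idx)
{
  if (idx == valid_len)
    return idx + (valid_raw_len - valid_len);
  return offsets[idx];
}

/* Rewrite both ends of M in place; unset registers (-1) are left alone.  */
void
adjust_span (struct span *m, const int *offsets, int valid_len,
             int valid_raw_len)
{
  if (m->so == -1)
    return;
  m->so = map_to_raw_offset (offsets, valid_len, valid_raw_len, m->so);
  m->eo = map_to_raw_offset (offsets, valid_len, valid_raw_len, m->eo);
}
```

For example, suppose the raw input is the four bytes `a`, `\xc4\xb1`, `b` and the converted buffer is the three bytes `A`, `I`, `B`. Under the assumption above, the table would be `{ 0, 1, 3 }` with a valid length of 3 and a valid raw length of 4, and a match covering `[1, 3)` in the converted buffer maps to `[1, 4)` in the raw input.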
Diffstat (limited to 'posix')
-rw-r--r-- | posix/bug-regex11.c | 8 |
-rw-r--r-- | posix/bug-regex18.c | 18 |
-rw-r--r-- | posix/regcomp.c | 22 |
-rw-r--r-- | posix/regex_internal.h | 37 |
-rw-r--r-- | posix/regexec.c | 75 |
-rw-r--r-- | posix/tst-regex.c | 141 |
6 files changed, 231 insertions, 70 deletions
diff --git a/posix/bug-regex11.c b/posix/bug-regex11.c index 44b4d927cd..29fa7def79 100644 --- a/posix/bug-regex11.c +++ b/posix/bug-regex11.c @@ -71,10 +71,18 @@ struct { "(bb())\\2\\1", "bbbb", REG_EXTENDED, 3, { { 0, 4 }, { 0, 2 }, { 2, 2 } } }, { "^(.?)(.?)(.?)(.?)(.?).?\\5\\4\\3\\2\\1$", "level", REG_NOSUB | REG_EXTENDED, 0, { { -1, -1 } } }, + { "^(.?)(.?)(.?)(.?)(.?)(.?)(.?)(.?)(.).?\\9\\8\\7\\6\\5\\4\\3\\2\\1$|^.?$", + "level", REG_NOSUB | REG_EXTENDED, 0, { { -1, -1 } } }, + { "^(.?)(.?)(.?)(.?)(.?)(.?)(.?)(.?)(.).?\\9\\8\\7\\6\\5\\4\\3\\2\\1$|^.?$", + "abcdedcba", REG_EXTENDED, 1, { { 0, 9 } } }, #if 0 /* XXX Not used since they fail so far. */ + { "^(.?)(.?)(.?)(.?)(.?)(.?)(.?)(.?)(.).?\\9\\8\\7\\6\\5\\4\\3\\2\\1$|^.?$", + "ababababa", REG_EXTENDED, 1, { { 0, 9 } } }, { "^(.?)(.?)(.?)(.?)(.?)(.?)(.?)(.?)(.?).?\\9\\8\\7\\6\\5\\4\\3\\2\\1$", "level", REG_NOSUB | REG_EXTENDED, 0, { { -1, -1 } } }, + { "^(.?)(.?)(.?)(.?)(.?)(.?)(.?)(.?)(.?).?\\9\\8\\7\\6\\5\\4\\3\\2\\1$", + "ababababa", REG_EXTENDED, 1, { { 0, 9 } } }, #endif }; diff --git a/posix/bug-regex18.c b/posix/bug-regex18.c index 503c36b39c..a193ed9215 100644 --- a/posix/bug-regex18.c +++ b/posix/bug-regex18.c @@ -33,17 +33,23 @@ struct int flags, nmatch; regmatch_t rm[5]; } tests[] = { - /* \xc4\xb0 LATIN CAPITAL LETTER I WITH DOT ABOVE - \xc4\xb1 LATIN SMALL LETTER DOTLESS I */ -#if 0 - /* XXX Not used since they fail so far. 
*/ + /* \xc4\xb0 LATIN CAPITAL LETTER I WITH DOT ABOVE + \xc4\xb1 LATIN SMALL LETTER DOTLESS I + \xe2\x80\x94 EM DASH */ { "\xc4\xb0I*\xc4\xb1$", "aBi\xc4\xb1\xc4\xb1I", REG_ICASE, 2, { { 2, 8 }, { -1, -1 } } }, { "[\xc4\xb0x]I*\xc4\xb1$", "aBi\xc4\xb1\xc4\xb1I", REG_ICASE, 2, { { 2, 8 }, { -1, -1 } } }, { "[^x]I*\xc4\xb1$", "aBi\xc4\xb1\xc4\xb1I", REG_ICASE, 2, - { { 2, 8 }, { -1, -1 } } } -#endif + { { 2, 8 }, { -1, -1 } } }, + { "([[:alpha:]]i[[:xdigit:]])(\xc4\xb1*)(\xc4\xb0{2})", + "\xe2\x80\x94\xc4\xb1\xc4\xb0""fIi\xc4\xb0ii", REG_ICASE | REG_EXTENDED, + 4, { { 3, 12 }, { 3, 8 }, { 8, 9 }, { 9, 12 } } }, + { "\xc4\xb1i(i)*()(\\s\xc4\xb0|\\SI)", "SIi\xc4\xb0\xc4\xb0 is", + REG_ICASE | REG_EXTENDED, 4, { { 1, 9 }, { 5, 7 }, { 7, 7 }, { 7, 9 } } }, + { "\xc4\xb1i(i)*()(\\s\xc4\xb0|\\SI)", "\xc4\xb1\xc4\xb0\xc4\xb0iJ\xc4\xb1", + REG_ICASE | REG_EXTENDED, 4, + { { 0, 10 }, { 6, 7 }, { 7, 7 }, { 7, 10 } } }, }; int diff --git a/posix/regcomp.c b/posix/regcomp.c index 1f1c85926e..bdcc59da1a 100644 --- a/posix/regcomp.c +++ b/posix/regcomp.c @@ -838,7 +838,7 @@ init_dfa (dfa, pat_len) dfa->mb_cur_max = MB_CUR_MAX; #ifdef _LIBC - if (dfa->mb_cur_max > 1 + if (dfa->mb_cur_max == 6 && strcmp (_NL_CURRENT (LC_CTYPE, _NL_CTYPE_CODESET_NAME), "UTF-8") == 0) dfa->is_utf8 = 1; dfa->map_notascii = (_NL_CURRENT_WORD (LC_CTYPE, _NL_CTYPE_MAP_TO_NONASCII) @@ -1711,28 +1711,28 @@ peek_token (token, input, syntax) if (!(syntax & RE_NO_GNU_OPS)) { token->type = ANCHOR; - token->opr.idx = WORD_FIRST; + token->opr.ctx_type = WORD_FIRST; } break; case '>': if (!(syntax & RE_NO_GNU_OPS)) { token->type = ANCHOR; - token->opr.idx = WORD_LAST; + token->opr.ctx_type = WORD_LAST; } break; case 'b': if (!(syntax & RE_NO_GNU_OPS)) { token->type = ANCHOR; - token->opr.idx = WORD_DELIM; + token->opr.ctx_type = WORD_DELIM; } break; case 'B': if (!(syntax & RE_NO_GNU_OPS)) { token->type = ANCHOR; - token->opr.idx = INSIDE_WORD; + token->opr.ctx_type = INSIDE_WORD; } break; case 'w': @@ -1755,14 
+1755,14 @@ peek_token (token, input, syntax) if (!(syntax & RE_NO_GNU_OPS)) { token->type = ANCHOR; - token->opr.idx = BUF_FIRST; + token->opr.ctx_type = BUF_FIRST; } break; case '\'': if (!(syntax & RE_NO_GNU_OPS)) { token->type = ANCHOR; - token->opr.idx = BUF_LAST; + token->opr.ctx_type = BUF_LAST; } break; case '(': @@ -1858,7 +1858,7 @@ peek_token (token, input, syntax) break; } token->type = ANCHOR; - token->opr.idx = LINE_FIRST; + token->opr.ctx_type = LINE_FIRST; break; case '$': if (!(syntax & RE_CONTEXT_INDEP_ANCHORS) && @@ -1872,7 +1872,7 @@ peek_token (token, input, syntax) break; } token->type = ANCHOR; - token->opr.idx = LINE_LAST; + token->opr.ctx_type = LINE_LAST; break; default: break; @@ -2217,7 +2217,9 @@ parse_expression (regexp, preg, token, syntax, nest, err) } break; case ANCHOR: - if (dfa->word_char == NULL) + if ((token->opr.ctx_type + & (WORD_DELIM | INSIDE_WORD | WORD_FIRST | WORD_LAST)) + && dfa->word_char == NULL) { *err = init_word_char (dfa); if (BE (*err != REG_NOERROR, 0)) diff --git a/posix/regex_internal.h b/posix/regex_internal.h index f8e99ee06a..214f7af6c0 100644 --- a/posix/regex_internal.h +++ b/posix/regex_internal.h @@ -302,13 +302,10 @@ struct re_string_t REG_ICASE, upper cases of the string are stored, otherwise MBS points the same address that RAW_MBS points. */ unsigned char *mbs; - /* Store the case sensitive multibyte string. In case of - "case insensitive mode", the original string are stored, - otherwise MBS_CASE points the same address that MBS points. */ - unsigned char *mbs_case; #ifdef RE_ENABLE_I18N /* Store the wide character string which is corresponding to MBS. */ wint_t *wcs; + int *offsets; mbstate_t cur_state; #endif /* Index in RAW_MBS. Each character mbs[i] corresponds to @@ -316,15 +313,21 @@ struct re_string_t int raw_mbs_idx; /* The length of the valid characters in the buffers. */ int valid_len; - /* The length of the buffers MBS, MBS_CASE, and WCS. 
*/ + /* The corresponding number of bytes in raw_mbs array. */ + int valid_raw_len; + /* The length of the buffers MBS and WCS. */ int bufs_len; /* The index in MBS, which is updated by re_string_fetch_byte. */ int cur_idx; - /* This is length_of_RAW_MBS - RAW_MBS_IDX. */ + /* length of RAW_MBS array. */ + int raw_len; + /* This is RAW_LEN - RAW_MBS_IDX + VALID_LEN - VALID_RAW_LEN. */ int len; /* End of the buffer may be shorter than its length in the cases such as re_match_2, re_search_2. Then, we use STOP for end of the buffer instead of LEN. */ + int raw_stop; + /* This is RAW_STOP - RAW_MBS_IDX adjusted through OFFSETS. */ int stop; /* The context of mbs[0]. We store the context independently, since @@ -334,17 +337,14 @@ struct re_string_t /* The translation passed as a part of an argument of re_compile_pattern. */ RE_TRANSLATE_TYPE trans; /* 1 if REG_ICASE. */ - unsigned int icase : 1; - unsigned int is_utf8 : 1; - unsigned int map_notascii : 1; + unsigned char icase; + unsigned char is_utf8; + unsigned char map_notascii; + unsigned char mbs_allocated; + unsigned char offsets_needed; int mb_cur_max; }; typedef struct re_string_t re_string_t; -/* In case of REG_ICASE, we allocate the buffer dynamically for mbs. */ -#define MBS_ALLOCATED(pstr) (pstr->icase) -/* In case that we need translation, we allocate the buffer dynamically - for mbs_case. Note that mbs == mbs_case if not REG_ICASE. 
*/ -#define MBS_CASE_ALLOCATED(pstr) (pstr->trans != NULL) struct re_dfa_t; @@ -363,7 +363,7 @@ static reg_errcode_t re_string_realloc_buffers (re_string_t *pstr, int new_buf_len); # ifdef RE_ENABLE_I18N static void build_wcs_buffer (re_string_t *pstr); -static void build_wcs_upper_buffer (re_string_t *pstr); +static int build_wcs_upper_buffer (re_string_t *pstr); # endif /* RE_ENABLE_I18N */ static void build_upper_buffer (re_string_t *pstr); static void re_string_translate_buffer (re_string_t *pstr); @@ -375,15 +375,14 @@ static inline wint_t re_string_wchar_at (const re_string_t *pstr, int idx); # endif /* RE_ENABLE_I18N */ static unsigned int re_string_context_at (const re_string_t *input, int idx, int eflags, int newline_anchor); +static unsigned char re_string_peek_byte_case (const re_string_t *pstr, + int idx); +static unsigned char re_string_fetch_byte_case (re_string_t *pstr); #endif #define re_string_peek_byte(pstr, offset) \ ((pstr)->mbs[(pstr)->cur_idx + offset]) -#define re_string_peek_byte_case(pstr, offset) \ - ((pstr)->mbs_case[(pstr)->cur_idx + offset]) #define re_string_fetch_byte(pstr) \ ((pstr)->mbs[(pstr)->cur_idx++]) -#define re_string_fetch_byte_case(pstr) \ - ((pstr)->mbs_case[(pstr)->cur_idx++]) #define re_string_first_byte(pstr, idx) \ ((idx) == (pstr)->valid_len || (pstr)->wcs[idx] != WEOF) #define re_string_is_single_byte_char(pstr, idx) \ diff --git a/posix/regexec.c b/posix/regexec.c index 53f49ea972..1942a7fee9 100644 --- a/posix/regexec.c +++ b/posix/regexec.c @@ -50,10 +50,9 @@ static int re_search_stub (struct re_pattern_buffer *bufp, int ret_len); static unsigned re_copy_regs (struct re_registers *regs, regmatch_t *pmatch, int nregs, int regs_allocated); -static re_dfastate_t *acquire_init_state_context (reg_errcode_t *err, - const regex_t *preg, - const re_match_context_t *mctx, - int idx); +static inline re_dfastate_t *acquire_init_state_context + (reg_errcode_t *err, const regex_t *preg, const re_match_context_t *mctx, + int 
idx) __attribute ((always_inline)); static reg_errcode_t prune_impossible_nodes (const regex_t *preg, re_match_context_t *mctx); static int check_matching (const regex_t *preg, re_match_context_t *mctx, @@ -609,6 +608,7 @@ re_search_internal (preg, string, length, start, range, stop, nmatch, pmatch, if (BE (err != REG_NOERROR, 0)) goto free_return; input.stop = stop; + input.raw_stop = stop; err = match_ctx_init (&mctx, eflags, &input, dfa->nbackref * 2); if (BE (err != REG_NOERROR, 0)) @@ -703,7 +703,7 @@ re_search_internal (preg, string, length, start, range, stop, nmatch, pmatch, instead. */ /* If MATCH_FIRST is out of the valid range, reconstruct the buffers. */ - if (input.raw_mbs_idx + input.valid_len <= match_first + if (input.raw_mbs_idx + input.valid_raw_len <= match_first || match_first < input.raw_mbs_idx) { err = re_string_reconstruct (&input, match_first, eflags, @@ -807,6 +807,17 @@ re_search_internal (preg, string, length, start, range, stop, nmatch, pmatch, for (reg_idx = 0; reg_idx < nmatch; ++reg_idx) if (pmatch[reg_idx].rm_so != -1) { + if (BE (input.offsets_needed != 0, 0)) + { + if (pmatch[reg_idx].rm_so == input.valid_len) + pmatch[reg_idx].rm_so += input.valid_raw_len - input.valid_len; + else + pmatch[reg_idx].rm_so = input.offsets[pmatch[reg_idx].rm_so]; + if (pmatch[reg_idx].rm_eo == input.valid_len) + pmatch[reg_idx].rm_eo += input.valid_raw_len - input.valid_len; + else + pmatch[reg_idx].rm_eo = input.offsets[pmatch[reg_idx].rm_eo]; + } pmatch[reg_idx].rm_so += match_first; pmatch[reg_idx].rm_eo += match_first; } @@ -909,7 +920,7 @@ prune_impossible_nodes (preg, mctx) We must select appropriate initial state depending on the context, since initial states may have constraints like "\<", "^", etc.. 
*/ -static re_dfastate_t * +static inline re_dfastate_t * acquire_init_state_context (err, preg, mctx, idx) reg_errcode_t *err; const regex_t *preg; @@ -976,22 +987,22 @@ check_matching (preg, mctx, fl_longest_match) /* Check OP_OPEN_SUBEXP in the initial state in case that we use them later. E.g. Processing back references. */ - if (dfa->nbackref) + if (BE (dfa->nbackref, 0)) { err = check_subexp_matching_top (dfa, mctx, &cur_state->nodes, 0); if (BE (err != REG_NOERROR, 0)) return err; - } - if (cur_state->has_backref) - { - err = transit_state_bkref (preg, &cur_state->nodes, mctx); - if (BE (err != REG_NOERROR, 0)) - return err; + if (cur_state->has_backref) + { + err = transit_state_bkref (preg, &cur_state->nodes, mctx); + if (BE (err != REG_NOERROR, 0)) + return err; + } } /* If the RE accepts NULL string. */ - if (cur_state->halt) + if (BE (cur_state->halt, 0)) { if (!cur_state->has_constraint || check_halt_state_context (preg, cur_state, mctx, cur_str_idx)) @@ -1372,11 +1383,11 @@ update_regs (dfa, pmatch, cur_node, cur_idx, nmatch) i. If 'b' isn't in the STATE_LOG[STR_IDX+strlen('s')], we throw away the node `a'. ii. If 'b' is in the STATE_LOG[STR_IDX+strlen('s')] but 'b' is - throwed away, we throw away the node `a'. + thrown away, we throw away the node `a'. 3. When 0 <= STR_IDX < MATCH_LAST and 'a' epsilon transit to 'b': i. If 'b' isn't in the STATE_LOG[STR_IDX], we throw away the node `a'. - ii. If 'b' is in the STATE_LOG[STR_IDX] but 'b' is throwed away, + ii. If 'b' is in the STATE_LOG[STR_IDX] but 'b' is thrown away, we throw away the node `a'. 
*/ #define STATE_NODE_CONTAINS(state,node) \ @@ -2041,7 +2052,7 @@ sift_states_iter_mb (preg, mctx, sctx, node_idx, str_idx, max_str_idx) !STATE_NODE_CONTAINS (sctx->sifted_states[str_idx + naccepted], dfa->nexts[node_idx])) /* The node can't accept the `multi byte', or the - destination was already throwed away, then the node + destination was already thrown away, then the node could't accept the current input `multi byte'. */ naccepted = 0; /* Otherwise, it is sure that the node could accept @@ -2188,24 +2199,24 @@ transit_state (err, preg, mctx, state) } } - /* Check OP_OPEN_SUBEXP in the current state in case that we use them - later. We must check them here, since the back references in the - next state might use them. */ - if (dfa->nbackref && next_state/* && fl_process_bkref */) + if (BE (dfa->nbackref, 0) && next_state != NULL) { + /* Check OP_OPEN_SUBEXP in the current state in case that we use them + later. We must check them here, since the back references in the + next state might use them. */ *err = check_subexp_matching_top (dfa, mctx, &next_state->nodes, cur_idx); if (BE (*err != REG_NOERROR, 0)) return NULL; - } - /* If the next state has back references. */ - if (next_state != NULL && next_state->has_backref) - { - *err = transit_state_bkref (preg, &next_state->nodes, mctx); - if (BE (*err != REG_NOERROR, 0)) - return NULL; - next_state = mctx->state_log[cur_idx]; + /* If the next state has back references. 
*/ + if (next_state->has_backref) + { + *err = transit_state_bkref (preg, &next_state->nodes, mctx); + if (BE (*err != REG_NOERROR, 0)) + return NULL; + next_state = mctx->state_log[cur_idx]; + } } return next_state; } @@ -3858,7 +3869,11 @@ extend_buffers (mctx) { #ifdef RE_ENABLE_I18N if (pstr->mb_cur_max > 1) - build_wcs_upper_buffer (pstr); + { + ret = build_wcs_upper_buffer (pstr); + if (BE (ret != REG_NOERROR, 0)) + return ret; + } else #endif /* RE_ENABLE_I18N */ build_upper_buffer (pstr); diff --git a/posix/tst-regex.c b/posix/tst-regex.c index adc2d8ab9a..53960a3d9a 100644 --- a/posix/tst-regex.c +++ b/posix/tst-regex.c @@ -23,6 +23,7 @@ #include <errno.h> #include <error.h> #include <fcntl.h> +#include <getopt.h> #include <iconv.h> #include <locale.h> #include <mcheck.h> @@ -45,6 +46,7 @@ static char *mem; static char *umem; static size_t memlen; static size_t umemlen; +static int timing; static int test_expr (const char *expr, int expected, int expectedicase); static int run_test (const char *expr, const char *mem, size_t memlen, @@ -54,7 +56,7 @@ static int run_test_backwards (const char *expr, const char *mem, int -main (void) +main (int argc, char *argv[]) { const char *file; int fd; @@ -64,9 +66,16 @@ main (void) char *outmem; size_t inlen; size_t outlen; + static const struct option options[] = + { + {"timing",no_argument, &timing, 1 }, + {NULL, 0, NULL, 0 } + }; mtrace (); + while (getopt_long (argc, argv, "", options, NULL) >= 0); + /* Make the content of the file available in memory. */ file = "../ChangeLog.8"; fd = open (file, O_RDONLY); @@ -125,6 +134,8 @@ main (void) result |= test_expr ("G.\\{1\\}ran", 2, 3); result |= test_expr ("G.*ran", 3, 44); result |= test_expr ("[הבאג]", 0, 0); + result |= test_expr ("Uddeborg", 2, 2); + result |= test_expr (".Uddeborg", 2, 2); /* Free the resources. 
*/ free (umem); @@ -201,7 +212,7 @@ run_test (const char *expr, const char *mem, size_t memlen, int icase, int cnt; #ifdef _POSIX_CPUTIME - if (use_clock) + if (use_clock && !timing) use_clock = clock_gettime (cl, &start) == 0; #endif @@ -250,7 +261,7 @@ run_test (const char *expr, const char *mem, size_t memlen, int icase, regfree (&re); #ifdef _POSIX_CPUTIME - if (use_clock) + if (use_clock && !timing) { use_clock = clock_gettime (cl, &finish) == 0; if (use_clock) @@ -270,6 +281,58 @@ run_test (const char *expr, const char *mem, size_t memlen, int icase, finish.tv_sec, finish.tv_nsec); } } + + if (use_clock && timing) + { + struct timespec mintime = { .tv_sec = 24 * 60 * 60 }; + + for (int i = 0; i < 10; ++i) + { + offset = 0; + use_clock = clock_gettime (cl, &start) == 0; + + if (!use_clock) + continue; + + err = regcomp (&re, expr, REG_NEWLINE | (icase ? REG_ICASE : 0)); + if (err != REG_NOERROR) + continue; + + while (offset < memlen) + { + regmatch_t ma[1]; + + err = regexec (&re, mem + offset, 1, ma, 0); + if (err != REG_NOERROR) + break; + + offset += ma[0].rm_eo; + } + + regfree (&re); + + use_clock = clock_gettime (cl, &finish) == 0; + if (use_clock) + { + if (finish.tv_nsec < start.tv_nsec) + { + finish.tv_nsec -= start.tv_nsec - 1000000000; + finish.tv_sec -= 1 + start.tv_sec; + } + else + { + finish.tv_nsec -= start.tv_nsec; + finish.tv_sec -= start.tv_sec; + } + if (finish.tv_sec < mintime.tv_sec + || (finish.tv_sec == mintime.tv_sec + && finish.tv_nsec < mintime.tv_nsec)) + mintime = finish; + } + } + printf ("elapsed time: %ld.%09ld sec\n", + mintime.tv_sec, mintime.tv_nsec); + } #endif /* Return an error if the number of matches found is not match we @@ -292,7 +355,7 @@ run_test_backwards (const char *expr, const char *mem, size_t memlen, int cnt; #ifdef _POSIX_CPUTIME - if (use_clock) + if (use_clock && !timing) use_clock = clock_gettime (cl, &start) == 0; #endif @@ -344,7 +407,7 @@ run_test_backwards (const char *expr, const char *mem, size_t 
memlen, regfree (&re); #ifdef _POSIX_CPUTIME - if (use_clock) + if (use_clock && !timing) { use_clock = clock_gettime (cl, &finish) == 0; if (use_clock) @@ -364,6 +427,74 @@ run_test_backwards (const char *expr, const char *mem, size_t memlen, finish.tv_sec, finish.tv_nsec); } } + + if (use_clock && timing) + { + struct timespec mintime = { .tv_sec = 24 * 60 * 60 }; + + for (int i = 0; i < 10; ++i) + { + offset = memlen; + use_clock = clock_gettime (cl, &start) == 0; + + if (!use_clock) + continue; + + memset (&re, 0, sizeof (re)); + re.fastmap = malloc (256); + if (re.fastmap == NULL) + continue; + + err = re_compile_pattern (expr, strlen (expr), &re); + if (err != NULL) + continue; + + if (re_compile_fastmap (&re)) + { + regfree (&re); + continue; + } + + while (offset <= memlen) + { + int start; + const char *sp; + + start = re_search (&re, mem, memlen, offset, -offset, NULL); + if (start < -1) + break; + + sp = mem + start; + while (sp > mem && sp[-1] != '\n') + --sp; + + offset = sp - 1 - mem; + } + + regfree (&re); + + use_clock = clock_gettime (cl, &finish) == 0; + if (use_clock) + { + if (finish.tv_nsec < start.tv_nsec) + { + finish.tv_nsec -= start.tv_nsec - 1000000000; + finish.tv_sec -= 1 + start.tv_sec; + } + else + { + finish.tv_nsec -= start.tv_nsec; + finish.tv_sec -= start.tv_sec; + } + if (finish.tv_sec < mintime.tv_sec + || (finish.tv_sec == mintime.tv_sec + && finish.tv_nsec < mintime.tv_nsec)) + mintime = finish; + } + } + printf ("elapsed time: %ld.%09ld sec\n", + mintime.tv_sec, mintime.tv_nsec); + } #endif /* Return an error if the number of matches found is not match we |