Age | Commit message | Author |
|
|
|
Filtering one pixel needs 6 pixels of input: 2 prior to the source, the
source, and 3 after the source.
Filtering 16 wide therefore needs 21 pixels. To accomplish this the SSE2
code reads [-2] to [5], [6] to [13], and [14] to [21], a total of 24 bytes
(reading in groups of 8 is easy).
The filter then shifts this last set to the top half of the register and
uses 'or' to combine it with the previous set.
Valgrind detected an issue reading pixels [19], [20] and [21]:
Address 0x7f581c2 is 434 bytes inside a block of size 441 alloc'd
Note: we only need pixels [16], [17], and [18] as context for [15].
To fix this, it now reads 8 bytes starting at [11], which re-loads [11]
through [13], but stops at [18] and does not over-read any values.
This is shifted by 5 bytes and 'or'd with xmm1. Although the lower bits are
not cleared, they overlap directly with [11] through [13], so 'or'
produces the correct results.
Change-Id: I0c89c03afa660fc9b0108ac055d7bd403e493320
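The overlapping-load trick described above can be sketched in portable C. This is an illustration, not the libvpx assembly: the helper name `load_6_to_18` is hypothetical, the 16-byte array stands in for an xmm register, and the lane layout ([6] in lane 0) is an assumption made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fills a 16-byte "register" with src[6]..src[18] in
 * lanes 0..12 using two 8-byte loads. The highest byte read is src[18], so
 * nothing past the needed context is touched. */
static void load_6_to_18(const uint8_t *src, uint8_t reg[16]) {
  uint8_t shifted[16] = { 0 };
  int i;

  assert(src != NULL && reg != NULL);

  /* First load: lanes 0..7 hold [6]..[13]; upper lanes are zero. */
  memset(reg, 0, 16);
  memcpy(reg, src + 6, 8);

  /* Second load starts at [11] and is placed 5 lanes up, so lanes 5..12
   * hold [11]..[18]. Lanes 5..7 repeat [11]..[13]. */
  memcpy(shifted + 5, src + 11, 8);

  /* OR the two halves together. The overlapping lanes carry identical
   * bytes, and x | x == x, so the overlap needs no masking. */
  for (i = 0; i < 16; ++i) reg[i] |= shifted[i];
}
```

With SSE2 intrinsics the same idea would presumably be two 8-byte loads, a byte shift, and an 'or'; the point here is only that re-reading [11] through [13] is harmless because the duplicated bytes are equal.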
|
|
The --enable-postproc-visualizer configure option remains as a no-op, as
do the control names and values, for compatibility.
+ Remove the corresponding debug flags from vpxdec: --pp-*
Change-Id: I4a001cd9962b59560d7d6bda6272d4ff32b8d37c
|
|
* changes:
vp8: convert some uses of unsigned long to size_t
vp8/encoder: quiet some -Wshorten-64-to-32 warnings
|
|
|
|
Similar to the changes made in vp9 for encoded frame size reporting.
Has the side effect of quieting a -Wshorten-64-to-32 warning.
Change-Id: I89f74cb617fc29334ee351dc8dfaa3b8cfd4e5af
|
|
* changes:
apply clang-format
.clang-format: update to 3.8.1
|
|
The code only has issues when xoffset == 0 and yoffset == 0, which
represents a simple copy. Presumably this case does not need to be
handled because the issue has existed since 2010.
BUG=webm:1287
Change-Id: Ic47e2653f3b729e99b40e53d8d2d8d1501edaaa9
|
|
|
|
This reverts commit d9dce2f48eed1368a44c368fa87a506bd89ffec5.
Appears to be failing the SixtapPredict tests in some configurations and possibly test vectors as well.
Change-Id: Ica6aa83ebac47d0a76e451846e7da67b1c17a7d7
|
|
|
|
This function was removed when clang started introducing alignment hints
which caused the 32 bit vld1_lane_u32/vst1_lane_u32 to fail:
https://llvm.org/bugs/show_bug.cgi?id=24421
The load has been rendered safe with an implementation that is nearly
indistinguishable performance-wise, uses _u8, and over-reads just a touch.
It is still ~5x faster than C in the unaligned case with both filters
applied.
BUG=webm:892
BUG=webm:1273
Change-Id: Icf7167189391b46202f47233bb585c24c42bcc36
|
|
|
|
postproc.c is overloaded and used for both postproc and internal stats.
If only --enable-internal-stats is specified there are issues with
non-existent struct members and unused functions.
Change-Id: I82367f1ffce659c3918c9f964dbce94a716fbb89
|
|
Change-Id: I501597b7c1e0f0c7ae2aea3ee8073f0a641b3487
|
|
This function was removed when clang started introducing alignment hints
which caused the 32 bit vld1_lane_u32/vst1_lane_u32 to fail:
https://llvm.org/bugs/show_bug.cgi?id=24421
The load has been rendered safe with an implementation that is nearly
indistinguishable performance-wise, uses _u8, and over-reads just a touch.
The store, when unaligned, has a version that is ~25% slower but safe
when xoffset = 0 (second pass filter only). When the first pass filter
(or both) are in play, the new version is almost identical in speed.
Worst case performance (both filters, unaligned stores) is roughly 3-4x
faster than C.
BUG=webm:817
BUG=webm:1273
Change-Id: I1e490e94453e0872151fe0dafb05557463f6247d
|
|
Change-Id: Idcf3b68f0e59bd74c9d332bbd4a7c1484ddb691a
|
|
sysconf returns a long; cast (unsigned) dwNumberOfProcessors to int for
good measure
Change-Id: I1f181d7bd9a060c0898db41f66a5065394afdc4e
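A minimal sketch of the POSIX half of this change, assuming the goal is an `int` core count. The helper names `core_count_from_long` and `get_cpu_count` are hypothetical; `sysconf` and `_SC_NPROCESSORS_ONLN` are the real POSIX interface, guarded here because the constant is not required on every platform. The Windows `dwNumberOfProcessors` cast is not shown.

```c
#include <assert.h>
#include <limits.h>
#include <unistd.h>

/* Hypothetical helper: narrows a long count to int explicitly, treating
 * errors (-1) and zero as a single core. */
static int core_count_from_long(long n) {
  if (n < 1) return 1;
  if (n > INT_MAX) return INT_MAX;
  return (int)n;
}

/* Hypothetical wrapper around sysconf(), which returns a long. */
static int get_cpu_count(void) {
  int count = 1;
#if defined(_SC_NPROCESSORS_ONLN)
  count = core_count_from_long(sysconf(_SC_NPROCESSORS_ONLN));
#endif
  assert(count >= 1);
  return count;
}
```

Keeping the narrowing in one place makes the long-to-int conversion explicit, which is what a warning such as -Wshorten-64-to-32 asks for.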
|
|
When 'NDEBUG' is set, assert() generates no code.
Change-Id: Icf61cfc1a8f6e5f0770b3626d8c73ae968df1108
|
|
_beginthreadex does not align the stack on 16-byte boundary as expected
by gcc.
On x86 targets, the force_align_arg_pointer attribute may be applied to
individual function definitions, generating an alternate prologue and
epilogue that realigns the run-time stack if necessary. This supports
mixing legacy codes that run with a 4-byte aligned stack with modern
codes that keep a 16-byte stack for SSE compatibility.
https://gcc.gnu.org/onlinedocs/gcc/x86-Function-Attributes.html
Change-Id: Ie4e4ab32948c238fa87054d5664189972ca6708e
Signed-off-by: Aleksey Vasenev <margtu-fivt@ya.ru>
|
|
Change-Id: I1fa81cc9cabf362a185fc3a53f1e58de533a41e5
|
|
The neon intrinsics are not able to load just the 4 values that are
used. In vpx_dsp/arm/intrapred_neon.c:dc_4x4 it loads 8 values for both
the 'above' and 'left' computations, but only uses the sum of the first
4 values.
BUG=webm:1268
Change-Id: I937113d7e3a21e25bebde3593de0446bf6b0115a
|
|
Change-Id: Id2a936301ec1e3d5648b4f8adbf4e6625002589d
|
|
float->int as reported by -Wfloat-conversion
Change-Id: I0089e8847b218c47526bcfbb0fffd9aad7c5adb3
|
|
Added back the header needed in threading.h
Change-Id: I2ce66ad4fe58004997623f6c3f3b8dd11640aa98
|
|
Reverted the patch because of possible performance issue.
Change-Id: I49944f827ccd38ed194c9f8d9cb9036fa9bf79e1
|
|
Change-Id: I84e1a293ee033865f82c244e8aaaadfb2fb27e63
|
|
applied against an x86_64 configure
clang-tidy-3.7.1 \
-checks='-*,google-readability-braces-around-statements' \
-header-filter='.*' -fix
+ clang-format afterward
Change-Id: I6694edeaee89b58b8b3082187e6756561136b459
|
|
Applied the following regex:
search for: (for.*\(.*;.*;) ([a-zA-Z_]*)\+\+\)
replace with: \1 ++\2)
This misses some for loops,
e.g.: for (mb_col = 0; mb_col < oci->mb_cols; mb_col++, mi++)
Change-Id: Icf5f6fb93cced0992e0bb71d2241780f7fb1f0a8
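To see what the search/replace pair above does and why the two-variable loop is missed, here is a small sketch using POSIX `regcomp`/`regexec` with the same pattern. The function name `rewrite_for_increment` is hypothetical, and the original change was presumably made with an editor or sed rather than C.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: applies "\1 ++\2)" to the first match in `line`.
 * Returns 1 if rewritten, 0 if the line did not match (copied unchanged),
 * -1 if the pattern failed to compile. */
static int rewrite_for_increment(const char *line, char *out, size_t out_size) {
  static const char kPattern[] = "(for.*\\(.*;.*;) ([a-zA-Z_]*)\\+\\+\\)";
  regex_t re;
  regmatch_t m[3];
  int rc;

  assert(line != NULL && out != NULL && out_size > 0);
  if (regcomp(&re, kPattern, REG_EXTENDED) != 0) return -1;
  rc = regexec(&re, line, 3, m, 0);
  regfree(&re);

  if (rc != 0) {
    snprintf(out, out_size, "%s", line);
    return 0;
  }
  /* prefix + group 1 + " ++" + group 2 + ")" + remainder of the line */
  snprintf(out, out_size, "%.*s%.*s ++%.*s)%s",
           (int)m[0].rm_so, line,
           (int)(m[1].rm_eo - m[1].rm_so), line + m[1].rm_so,
           (int)(m[2].rm_eo - m[2].rm_so), line + m[2].rm_so,
           line + m[0].rm_eo);
  return 1;
}
```

The pattern requires the identifier and its `++` to sit directly between the second `;` and the closing `)`, so a loop whose final clause is `mb_col++, mi++` does not match and is left untouched.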
|
|
Change-Id: I7605b6678014a5426ceb45c27b54885e0c4e06ed
|
|
Change-Id: I5d4343f2da9cd4b01dd37be7a048d159fec109d1
|
|
Change-Id: I582b6307f28bfc987dcf8910379a52c6f679173c
|
|
Change-Id: Ifdcb36b8e77b65faeeb10644256e175acb32275d
|
|
Change-Id: I63ba35dc0ae9286c9812367a531e01d79a4c1635
|
|
The deblocking filters used in vp8 have been moved to vpx_dsp for
use by both vp8 and vp9.
Change-Id: I5209d76edafc894b550f751fc76d3aa6799b392d
|
|
quiets -Wmissing-prototypes warning
BUG=b/29584271
Change-Id: I806e3475ebee579dce0073dd1784a7c2899e7de0
|
|
Add a trailing ':'. Though it's optional with the tools we support, it's
more common to use it to mark a label. This also quiets the
orphan-labels warning with nasm/yasm.
BUG=b/29583530
Change-Id: I46e95255e12026dd542d9838e2dd3fbddf7b56e2
|
|
When building without multithreading and for a non-arm, non-x86 system,
ctx is unused.
Cleans up -Wextra warning:
unused parameter ‘ctx’ [-Werror=unused-parameter]
Change-Id: Ifddff89d2ebd45f7d71e3d415a8f2415dd818957
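A common way to quiet this kind of warning is an explicit `(void)` cast on the path where the parameter is not used. The sketch below is illustrative only: `ExampleCtx`, `example_init`, and the `EXAMPLE_HAVE_SIMD` macro are hypothetical stand-ins, not the libvpx names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical context type standing in for the real one. */
typedef struct {
  int cpu_flags;
} ExampleCtx;

/* Hypothetical init function: ctx is only read on configurations that
 * define EXAMPLE_HAVE_SIMD, so the other branch marks it as used. */
static int example_init(ExampleCtx *ctx) {
#if defined(EXAMPLE_HAVE_SIMD)
  assert(ctx != NULL);
  return ctx->cpu_flags;
#else
  (void)ctx; /* silences -Wunused-parameter without changing behavior */
  return 0;
#endif
}
```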
|
|
left_above_mv and above_block_mv return as_int
as_int is defined as uint32_t in vp8/common/mv.h
Cleans up -Wextra warnings:
signed and unsigned type in conditional expression
this_mv->as_int = col ? d[-1].bmi.mv.as_int : left_block_mv(mic, i);
^
this_mv->as_int = row ? d[-4].bmi.mv.as_int : above_block_mv(mic, i, mis);
^
left_mv.as_int = col ? d[-1].bmi.mv.as_int :
^
Change-Id: Ia043764e4ce93d2152d2269b1c7b28b5d5f814cf
|
|
With a correction to the type of the thread function for the new
threading code.
Change-Id: Ic6dc9f530698800d1cfe2da327848e8f8b62e31f
|
|
These implementations rely on casting the pointers to load the data.
Clang implemented optimizations which automatically add alignment hints
to such loads. The 4x4 filters do not guarantee the necessary alignment
so the resulting assembly is broken.
https://llvm.org/bugs/show_bug.cgi?id=24421
BUG=webm:817
BUG=webm:892
Change-Id: I608885299f1f86ff83653b65e0e40d0ae87fb3fe
|
|
Change-Id: I12218d8331c0558c0587a66321e3ca46da7e5cc7
|
|
Change-Id: Icc34a00759c95b7b8ac356cdcc4adae848b61431
|
|
avoids -Wunused-function warnings when INLINE is set
Change-Id: I44d91eaa7efba7bc2427501fb9f63a93f32aaa7f
|
|
Made the definition of THREAD_FUNCTION consistent.
Change-Id: I1ac099484e201e359298ed16de0b81ec781075ce
|
|
This fixes the build errors with msvc.
Change-Id: Ie2716e4c15a1bacfb00a8d41ec3283d718af88fc
|
|
There are flaws in the current implementation of the VP8 multithreaded
encoder and decoder, as reported in the following issue:
https://code.google.com/p/chromium/issues/detail?id=158922
Although the data race warnings are harmless and wouldn't cause real
problems while encoding and decoding videos, it is better to fix the
warnings so that the VP8 code can pass the TSan test.
To synchronize access to thread-shared data while maintaining speed
(i.e. decoding speed), use multiple mutexes based on mb_rows to reduce
the number of synchronizations needed, protect the reads and writes of
the shared data, and reduce the number of mb_col writes by a factor of
nsync.
The decoder speed tests showed < 3% speed loss while using 2 ~ 4
threads.
Change-Id: Ie296defffcd86a693188b668270d811964227882
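The scheme described above (one mutex per macroblock row, with progress published only every nsync columns) might look roughly like the following. This is a sketch under stated assumptions, not the libvpx code: `RowSync` and the three functions are hypothetical names, and the rule of always publishing the final column is an assumption added so a waiting reader can see the row finish.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical per-row progress record; one instance per mb_row. */
typedef struct {
  pthread_mutex_t lock;
  int last_mb_col; /* last column published for this row, -1 if none */
} RowSync;

static void row_sync_init(RowSync *rs) {
  pthread_mutex_init(&rs->lock, NULL);
  rs->last_mb_col = -1;
}

/* Writer side: take the lock only on every nsync-th column and on the
 * final column, cutting the number of shared writes by about nsync. */
static void row_sync_publish(RowSync *rs, int mb_col, int mb_cols, int nsync) {
  assert(nsync > 0);
  if ((mb_col + 1) % nsync != 0 && mb_col != mb_cols - 1) return;
  pthread_mutex_lock(&rs->lock);
  rs->last_mb_col = mb_col;
  pthread_mutex_unlock(&rs->lock);
}

/* Reader side: a thread working on the next row reads the progress of
 * the row above under the same per-row lock. */
static int row_sync_read(RowSync *rs) {
  int col;
  pthread_mutex_lock(&rs->lock);
  col = rs->last_mb_col;
  pthread_mutex_unlock(&rs->lock);
  return col;
}
```

Using one mutex per row keeps readers of different rows from contending with each other, which is one way to limit the speed loss the message reports.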
|
|
The loop filter level is transmitted as 6 bits + sign, so it needs to be
clamped in the delta + absolute case.
BUG=https://bugzilla.mozilla.org/show_bug.cgi?id=1224363
Change-Id: Icbdca4fdbf043466429bd5c9d59dbe913bf153bc
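The clamping described above can be illustrated as follows. The function and parameter names are hypothetical; the only facts taken from the message are that the value carries a sign and that the valid magnitude fits in 6 bits, which gives a range of 0 to 63.

```c
#include <assert.h>

#define MAX_LOOP_FILTER 63 /* largest value representable in 6 bits */

static int clamp_int(int value, int low, int high) {
  assert(low <= high);
  return value < low ? low : (value > high ? high : value);
}

/* Hypothetical helper: seg_value is the signed per-segment value. In
 * absolute mode it replaces the frame level; in delta mode it is added.
 * Either path can leave the valid range, so both are clamped. */
static int segment_filter_level(int frame_level, int seg_value, int abs_mode) {
  const int level = abs_mode ? seg_value : frame_level + seg_value;
  return clamp_int(level, 0, MAX_LOOP_FILTER);
}
```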
|
|
This experiment allows using full above/right edges for all transform
sizes whenever available (for d45/d63), and adds bottom/left edges for
d207.
See issue 1043.
Change-Id: I5cf7f345e783e8539bb6b6d2c9972fb1d6d0a78b
|
|
The x86 SIMD code expects this. Identical alignment can be found in vp9
and vp10 also. Fixes crashes on 32-bit x86 systems.
Change-Id: I229c88d8f696acbef5337c8fa9503528df4e1c40
|