Age | Commit message | Author |
|
Change-Id: Ib8e429152c9a8b6032be22b5faac802aa8224caa
|
|
Calls to this set of functions are replaced by var16x16.
Change-Id: I5ff1effc6c1358ea06cda1517b88ec28ef551b0d
|
|
The encoder defined about 4 sets of similar functions to calculate sum,
variance, sse, or a combination of them. This commit removed one set
of these functions, get8x8var and get16x16var; calls to the latter
function are replaced with var16x16, using the fact that on a 16x16 MB:
variance == sse - sum*sum/256
Change-Id: I803eabd1fb3ab177780a40338cbd596dffaed267
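
The identity above can be checked with a short scalar sketch. The helper names block_sum_sse_16x16 and block_var_16x16 are hypothetical and are not the encoder's actual functions; the point is only that one pass producing sum and sse is enough to derive the variance term. Note that "variance" here is the sum of squared deviations over the block (sse minus the squared sum divided by the 256 pixels), as the formula implies.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not the encoder's API): accumulate the sum of
 * differences and the sum of squared differences over a 16x16 block. */
static void block_sum_sse_16x16(const uint8_t *src, int src_stride,
                                const uint8_t *ref, int ref_stride,
                                int *sum, unsigned int *sse) {
  assert(src_stride >= 16 && ref_stride >= 16);
  *sum = 0;
  *sse = 0;
  for (int r = 0; r < 16; ++r) {
    for (int c = 0; c < 16; ++c) {
      const int d = src[r * src_stride + c] - ref[r * ref_stride + c];
      *sum += d;
      *sse += (unsigned int)(d * d);
    }
  }
}

/* variance == sse - sum*sum/256 on a 16x16 block (256 pixels). */
static unsigned int block_var_16x16(const uint8_t *src, int src_stride,
                                    const uint8_t *ref, int ref_stride) {
  int sum;
  unsigned int sse;
  block_sum_sse_16x16(src, src_stride, ref, ref_stride, &sum, &sse);
  /* Widen before squaring: |sum| can reach 255 * 256. */
  return sse - (unsigned int)(((int64_t)sum * sum) / 256);
}
```

A block whose differences are all equal has zero variance under this definition, whatever the size of the common offset.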
|
|
Minor modification.
Change-Id: I09511d38fd1451d5c4106a48acdb3f766ce59cb7
|
|
|
|
|
|
In NEWMV mode, full search is currently used as the refining search
after the n-step search. Replacing it with an iterative diamond search
of radius 1 greatly reduced the computational complexity while
maintaining the same encoding quality, since the refining search is
now done for every macroblock instead of only the small percentage of
macroblocks refined when using full search.
Tests on the test set showed a 3.4% encoding speed increase with no
PSNR or SSIM loss.
Change-Id: Ife907d7eb9544d15c34f17dc6e4cfd97cb743d41
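
A minimal sketch of an iterative radius-1 diamond refinement, as described above, might look like the following. The types mv_t and cost_fn, the function refine_diamond_r1, and the toy cost are all hypothetical names introduced for illustration; the real encoder's cost would be a SAD plus motion vector cost, and it would also clamp candidates to the allowed search range, which this sketch omits.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int row, col; } mv_t;

/* Hypothetical cost callback: lower is better. */
typedef unsigned int (*cost_fn)(mv_t mv, void *ctx);

/* Starting from the n-step search result, repeatedly test the four
 * neighbours at distance 1 and move to the best one. Stops when the centre
 * is already the best point or after max_iters moves. */
static mv_t refine_diamond_r1(mv_t start, int max_iters, cost_fn cost,
                              void *ctx) {
  static const mv_t neighbors[4] = { { -1, 0 }, { 0, -1 }, { 0, 1 }, { 1, 0 } };
  mv_t best = start;
  unsigned int best_cost;
  assert(cost != NULL);
  best_cost = cost(best, ctx);
  for (int it = 0; it < max_iters; ++it) {
    int best_site = -1;
    for (int i = 0; i < 4; ++i) {
      const mv_t cand = { best.row + neighbors[i].row,
                          best.col + neighbors[i].col };
      const unsigned int c = cost(cand, ctx);
      if (c < best_cost) {
        best_cost = c;
        best_site = i;
      }
    }
    if (best_site < 0) break; /* centre is a local minimum */
    best.row += neighbors[best_site].row;
    best.col += neighbors[best_site].col;
  }
  return best;
}

/* Toy cost for demonstration only: squared distance to a target vector. */
static unsigned int toy_cost(mv_t mv, void *ctx) {
  const mv_t *t = (const mv_t *)ctx;
  const int dr = mv.row - t->row, dc = mv.col - t->col;
  return (unsigned int)(dr * dr + dc * dc);
}
```

Each iteration evaluates only four candidates, which is why the refinement stays cheap enough to run on every macroblock.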
|
|
Change-Id: I9467d7a50eac32d8e8f3a2f26db818e47c93c94b
|
|
Renamed configure option "enable-psnr" to "enable-internal-stats" to
better reflect the purpose of the option and eliminate the confusion
reported in http://code.google.com/p/webm/issues/detail?id=35
Change-Id: If72df6fdb9f1e33dab1329240ba4d8911d2f1f7a
|
|
|
|
The accumulator array is an integer array, so use paddd instead of paddw
to add values to it. Fixes overflows when using large --arnr-maxframes
(>8) values.
Change-Id: Iad83794caa02400a65f3ab5760f2517e082d66ae
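
The difference between the two instructions can be emulated in plain C. The sketch below models a single 32-bit accumulator element: a paddw-style add treats it as two independent 16-bit lanes and drops the carry out of the low lane, while a paddd-style add is an ordinary 32-bit addition. The function names and the per-frame value of 8000 used in the usage note are assumptions chosen purely for illustration, not values taken from the filter.

```c
#include <assert.h>
#include <stdint.h>

/* paddw-like: two independent 16-bit lanes, carries between lanes are lost. */
static uint32_t add_lanes16(uint32_t acc, uint32_t val) {
  const uint16_t lo = (uint16_t)((acc & 0xFFFF) + (val & 0xFFFF));
  const uint16_t hi = (uint16_t)((acc >> 16) + (val >> 16));
  return ((uint32_t)hi << 16) | lo;
}

/* paddd-like: one 32-bit lane. */
static uint32_t add_lane32(uint32_t acc, uint32_t val) {
  return acc + val;
}

/* Add the same hypothetical per-frame contribution `frames` times. */
static uint32_t accumulate(int frames, uint32_t per_frame, int wide) {
  uint32_t acc = 0;
  assert(frames >= 0);
  for (int i = 0; i < frames; ++i) {
    acc = wide ? add_lane32(acc, per_frame) : add_lanes16(acc, per_frame);
  }
  return acc;
}
```

With an illustrative contribution of 8000 per frame, both variants agree for 8 frames (64000), but at 9 frames the 16-bit lane version wraps past 65535 while the 32-bit version does not.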
|
|
add an sse4 quantizer so we can use pinsrw/pextrw and keep values in xmm
registers instead of proxying through the stack. and as long as we're
bumping up, use some ssse3 instructions in the EOB detection (see ssse3
fast quantizer)
pick up about a percent on 32bit and about two on 64bit.
Change-Id: If15abba0e8b037a1d231c0edf33501545c9d9363
|
|
the win64 abi requires saving and restoring xmm6:xmm15. currently
SAVE_XMM and RESTORE_XMM only allow for saving xmm6:xmm7. allow
specifying the highest register used and if the stack is unaligned.
Change-Id: Ica5699622ffe3346d3a486f48eef0206c51cf867
|
|
|
|
Went through the code and fixed it. Verified on Windows.
Where possible, remove dependencies on xmm[67]
Current code relies on pushing rbp to the stack to get 16 byte
alignment. This broke when rbp wasn't pushed
(vp8/encoder/x86/sad_sse3.asm). Work around this by using unaligned
memory accesses. Revisit this and the offsets in
vp8/encoder/x86/sad_sse3.asm in another change to SAVE_XMM.
Change-Id: I5f940994d3ebfd977c3d68446cef20fd78b07877
|
|
|
|
|
|
in encodeframe.c, quant_shift is set to 0 or 1 in vp8cx_invert_quant.
only use 8 bits to store this, instead of 16. this will allow saving an
xmm register in an updated version of the regular quantize
Change-Id: Ie88c47fe2aff5af0283dab1147fb2791e4b12f90
|
|
Remove encoding speed limitation in real-time mode.
Change-Id: Ib5e35d8bb522b2a25f3e4ad5cfe2788ebebb3617
|
|
This commit fixed an overflow in the SSIM calculation and added register
save and restore so that the assembly code works on x64 platforms.
It also changed the sampling points to every 4x4 instead of 8x8 and
adjusted the constants in the SSIM calculation to match the scale of
the previous VPXSSIM.
Change-Id: Ia4dbb8c69eac55812f4662c88ab4653b6720537b
|
|
on the same order as the sse2 fast quantize change: ~2%
except for 32bit. only a slight improvement there.
Change-Id: Iff80e5f1ce7e646eebfdc8871405458ff911986b
|
|
Changed rax to eax to avoid using uninitialized
memory.
Change-Id: Iedb953f104329ede2a786fc648a47f1be2f3798a
|
|
on the same order as the regular quantize change: ~2%
Change-Id: I5c9eec18e89ae7345dd96945cb740e6f349cee86
|
|
rather than look up rc in the zig zag table, embed it in the macro. this
also allows us to shuffle some values in the macro and keep *d in rsi
gains of about the same order as the obj_int_extract implementation: ~2%
Change-Id: Ib7252dd10eee66e0af8b0e567426122781dc053d
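
The idea of embedding rc in the macro can be sketched in C, even though the commit itself concerns assembly. In the table-driven version each iteration loads rc from a zig-zag table; in the unrolled version rc is a compile-time constant passed to the macro. Everything here is an assumption made for illustration: the function and macro names are hypothetical, the quantization step is reduced to a plain division, and the 4x4 zig-zag order shown is assumed rather than taken from the source.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 4x4 zig-zag scan order (position in scan -> raster index rc). */
static const int zigzag[16] = { 0, 1, 4, 8, 5, 2, 3, 6,
                                9, 12, 13, 10, 7, 11, 14, 15 };

/* Table-driven: rc is looked up on every iteration. Returns the
 * end-of-block position (index of last nonzero coefficient in scan, plus 1). */
static int quantize_table(const int16_t *coeff, int step, int16_t *q) {
  int eob = 0;
  assert(step > 0);
  for (int i = 0; i < 16; ++i) {
    const int rc = zigzag[i];
    q[rc] = (int16_t)(coeff[rc] / step); /* simplified quantization */
    if (q[rc]) eob = i + 1;
  }
  return eob;
}

/* Unrolled: rc is a literal in each macro expansion, so no table load. */
#define QUANT_STEP(i, rc)                      \
  do {                                         \
    q[rc] = (int16_t)(coeff[rc] / step);       \
    if (q[rc]) eob = (i) + 1;                  \
  } while (0)

static int quantize_unrolled(const int16_t *coeff, int step, int16_t *q) {
  int eob = 0;
  assert(step > 0);
  QUANT_STEP(0, 0);   QUANT_STEP(1, 1);   QUANT_STEP(2, 4);   QUANT_STEP(3, 8);
  QUANT_STEP(4, 5);   QUANT_STEP(5, 2);   QUANT_STEP(6, 3);   QUANT_STEP(7, 6);
  QUANT_STEP(8, 9);   QUANT_STEP(9, 12);  QUANT_STEP(10, 13); QUANT_STEP(11, 10);
  QUANT_STEP(12, 7);  QUANT_STEP(13, 11); QUANT_STEP(14, 14); QUANT_STEP(15, 15);
  return eob;
}
```

Both functions produce the same output; the unrolled form simply trades code size for the removal of one memory lookup per coefficient.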
|
|
|
|
Fixed a bug in SSSE3 sub-pixel filter functions.
Change-Id: I2e2126652970eb78307ffcefcace1efd5966fb0a
|
|
http://code.google.com/p/webm/issues/detail?id=309
Change-Id: I6fce9e2f74bc09a9f258df7f91ab599812324e8c
|
|
remove helper function and avoid shadowing all the arguments to the
stack on 64bit systems
when running with --good --cpu-used=0:
~2% on linux x86 and x86_64
~2% on win32 x86 msys and visual studio
more on darwin10 x86_64
significantly more on
x86_64-win64-vs9
Change-Id: Ib7be12edf511fbf2922f191afd5b33b19a0c4ae6
|
|
This declaration did not match the prototype_sad() prototype, but was
unused in this translation unit, so it is removed instead. Fixes
issue 290.
Change-Id: I168854f88a85f73ca9aaf61d1e5dc0f43fc3fdb3
|
|
A large number of functions were defined with external linkage, even
though they were only used from within one file. This patch changes
their linkage to static and removes the vp8_ prefix from their names,
which should make it more obvious to the reader that the function is
contained within the current translation unit. Functions that were
not referenced were removed.
These symbols were identified by:
$ nm -A libvpx.a | sort -k3 | uniq -c -f2 | grep ' [A-Z] ' \
| sort | grep '^ *1 '
Change-Id: I59609f58ab65312012c047036ae1e0634f795779
|
|
|
|
Change-Id: I91921b0a90dbaddc7010380b038955be347964b3
|
|
Use aligned store.
Change-Id: Icab4c0c53da811d0c52bb7e8134927f249ba2499
|
|
Added SSSE3 function
Change-Id: I8c304c92458618d93fda3a2f62bd09ccb63e75ad
|
|
Removed some unused functions
Change-Id: Ifdfc27453e53cfc75997b38492901d193a16b245
|
|
Rewrote these functions to process 16 pixels at a time instead of 8.
Change-Id: Ic67e80124467a446a3df4cfecfb76a4248602adb
|
|
Skip filter at zero offset.
Change-Id: I95fc7e211869bc0ab5bcfb7ab2e3259d1c0ccf38
|
|
1. Process 16 pixels at a time instead of 8.
2. Add a check for both xoffset = 0 and yoffset = 0, which happens
during motion search.
This change gave the encoder a 1%~3% performance gain.
Change-Id: Idaa39506b48f4f8b2fbbeb45aae8226fa32afb3e
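
The zero-offset check in item 2 can be illustrated with a scalar sketch of a 2-tap sub-pixel predictor. When both offsets are zero the filter taps reduce to (128, 0), so filtering returns the source pixel unchanged and a plain copy gives the same result. The function names and the 1/8-pel tap layout (128 - 16*off, 16*off) are assumptions made for this example and are not taken from the commit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 2-tap filter with 7-bit precision; off is assumed to be in 0..7. */
static uint8_t tap2(int a, int b, int off) {
  return (uint8_t)((a * (128 - 16 * off) + b * (16 * off) + 64) >> 7);
}

/* General path: horizontal pass, then vertical pass.
 * Reads one extra column and one extra row beyond the w x h block. */
static void predict_general(const uint8_t *src, int stride, int xoff,
                            int yoff, int w, int h, uint8_t *dst) {
  for (int r = 0; r < h; ++r) {
    for (int c = 0; c < w; ++c) {
      const uint8_t *p = src + r * stride + c;
      const int top = tap2(p[0], p[1], xoff);
      const int bot = tap2(p[stride], p[stride + 1], xoff);
      dst[r * w + c] = tap2(top, bot, yoff);
    }
  }
}

/* Fast path: skip filtering entirely when both offsets are zero. */
static void predict(const uint8_t *src, int stride, int xoff, int yoff,
                    int w, int h, uint8_t *dst) {
  assert(xoff >= 0 && xoff < 8 && yoff >= 0 && yoff < 8);
  if (xoff == 0 && yoff == 0) {
    for (int r = 0; r < h; ++r) {
      memcpy(dst + r * w, src + r * stride, (size_t)w);
    }
    return;
  }
  predict_general(src, stride, xoff, yoff, w, h, dst);
}
```

The fast path is bit-exact with the general path at zero offset, so the check changes only the cost, not the output.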
|
|
|
|
This improved encoding performance by 0.5% (good, speed 1) to
1.5% (good, speed 5).
Change-Id: I843d72a0d68a90b5f694adf770943e4a4618f50e
|
|
It is not used in realtime mode. Reduces memory footprint.
Change-Id: I7f163225762368df5457cfd413050161d3704a3f
|
|
This reverts commit f50f2fd2a73f2c5ee3f10ad077e780398df17cd7.
Change Ib7506e3e aligns the buffer
Change-Id: Ie0f8bd3e57cfdfef81d39638a1451458ebbae2e0
|
|
|
|
In real-time mode, the vp8_sad16x16 function is called heavily in
the motion search. Optimizing this function gives a 1.2%
encoding performance gain (real-time mode, tulip clip).
Change-Id: I23c401fc40c061f732a9767e8d383737a179bd58
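
For reference, a 16x16 sum of absolute differences reduces to the scalar loop below. This is only a plain C sketch of what such a function computes, with a hypothetical name and signature; it is not the optimized routine the commit refers to.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Scalar reference: sum of |src - ref| over a 16x16 block. */
static unsigned int sad16x16_ref(const uint8_t *src, int src_stride,
                                 const uint8_t *ref, int ref_stride) {
  unsigned int sad = 0;
  assert(src_stride >= 16 && ref_stride >= 16);
  for (int r = 0; r < 16; ++r) {
    for (int c = 0; c < 16; ++c) {
      sad += (unsigned int)abs(src[r * src_stride + c] -
                               ref[r * ref_stride + c]);
    }
  }
  return sad;
}
```

Motion search evaluates this cost for every candidate vector, which is why even small savings in the inner loop show up in overall encoding time.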
|
|
Allow compiling without adding vp8/{common,encoder,decoder} to the
include paths.
Change-Id: Ifeb5dac351cdfadcd659736f5158b315a0030b6c
|
|
|
|
In sub-pixel calculation, xoffset and yoffset mostly take some
specific values. Modified sub-pixel filter functions according to
these possible values to improve performance.
Change-Id: I83083570af8b00ff65093467914fbb97a4e9ea21
|
|
Remove allocation/deallocation of stats storage.
Remove full search functions in machine specific encoder inits.
Remove last pass validation in validate_config.
Change-Id: I7f29be69273981a4fef6e80ecdb6217c68cbad4e
|
|
~5% gain on 32bit. disabled for 64bit
unset executable bit on ssse3 version (cosmetic)
Change-Id: I1a5860839eb294ce4261f819caea2dcfa78e57ca
|
|
source buffer is not guaranteed to be aligned for odd size buffers
Change-Id: Id0b1fd40ba3bd6c994bcfada788feccd2b53c5a9
|