Age | Commit message | Author |
|
Conflicts:
vp8/common/alloccommon.c
vp8/encoder/rdopt.c
Change-Id: Ic34b33577423031e277235ffa6bcaff7b252e5cb
|
|
The decision to run the regular or simple loopfilter is made outside the
function and managed with function pointers.
Stop tracking the option in two places. Use filter_type exclusively.
Change-Id: I39d7b5d1352885efc632c0a94aaf56b72cc2fe15
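A minimal sketch of the idea in this commit, not the actual libvpx code: the filter is chosen once from a single filter_type field and then called through a pointer. Every name here other than filter_type (filter_kind, frame_config, select_loop_filter, filter_frame, and the two stand-in filters) is hypothetical.

```c
#include <stddef.h>

/* Hypothetical values; the commit names filter_type but not its values. */
typedef enum { NORMAL_FILTER = 0, SIMPLE_FILTER = 1 } filter_kind;

typedef void (*loop_filter_fn)(unsigned char *row, size_t n);

/* Stand-ins for the real filters: they only tag the data so the
 * selection is observable. */
static void normal_filter(unsigned char *row, size_t n)
{
    for (size_t i = 0; i < n; i++) row[i] = 1;
}

static void simple_filter(unsigned char *row, size_t n)
{
    for (size_t i = 0; i < n; i++) row[i] = 2;
}

/* The option lives in exactly one place. */
typedef struct { filter_kind filter_type; } frame_config;

/* The decision is made outside the per-row work. */
static loop_filter_fn select_loop_filter(const frame_config *cfg)
{
    return cfg->filter_type == SIMPLE_FILTER ? simple_filter : normal_filter;
}

static void filter_frame(const frame_config *cfg, unsigned char *rows,
                         size_t num_rows, size_t stride)
{
    loop_filter_fn fn = select_loop_filter(cfg); /* chosen once per frame */
    for (size_t r = 0; r < num_rows; r++)
        fn(rows + r * stride, stride);
}
```

The inner loop never inspects filter_type, so there is no second copy of the option to keep in sync.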
|
|
Conflicts:
vp8/decoder/onyxd_int.h
Change-Id: Icf445b589c2bc61d93d8c977379bbd84387d0488
|
|
The win64 ABI requires saving and restoring xmm6:xmm15. Currently
SAVE_XMM and RESTORE_XMM only allow for saving xmm6:xmm7. Allow
specifying the highest register used and whether the stack is unaligned.
Change-Id: Ica5699622ffe3346d3a486f48eef0206c51cf867
|
|
Went through the code and fixed it. Verified on Windows.
Where possible, remove dependencies on xmm[67]
Current code relies on pushing rbp to the stack to get 16-byte
alignment. This broke when rbp wasn't pushed
(vp8/encoder/x86/sad_sse3.asm). Work around this by using unaligned
memory accesses. Revisit this and the offsets in
vp8/encoder/x86/sad_sse3.asm in another change to SAVE_XMM.
Change-Id: I5f940994d3ebfd977c3d68446cef20fd78b07877
|
|
|
|
vp8_filter_block1d16_h4_ssse3 was never called.
Because UNSHADOW_ARGS restores the stack with 'mov rsp, rbp', the issue was
masked. However, if or when win64 uses those registers for persistent data,
issues could arise.
Change-Id: I56d6effca0aeba1f86082689771cb10145d39651
|
|
|
|
Change-Id: I36ca3f2f4620358033da34daf764f0b388dacd08
|
|
Conflicts:
vp8/decoder/decodemv.c
vp8/decoder/onyxd_if.c
vp8/encoder/ratectrl.c
vp8/encoder/rdopt.c
Change-Id: Ia1c1c5e589f4200822d12378c7749ba62bd17ae2
|
|
A large number of functions were defined with external linkage, even
though they were only used from within one file. This patch changes
their linkage to static and removes the vp8_ prefix from their names,
which should make it more obvious to the reader that the function is
contained within the current translation unit. Functions that were
not referenced were removed.
These symbols were identified by:
$ nm -A libvpx.a | sort -k3 | uniq -c -f2 | grep ' [A-Z] ' \
| sort | grep '^ *1 '
Change-Id: I59609f58ab65312012c047036ae1e0634f795779
|
|
|
|
Allow compiling without adding vp8/{common,encoder,decoder} to the
include paths.
Change-Id: Ifeb5dac351cdfadcd659736f5158b315a0030b6c
|
|
Conflicts:
vp8/encoder/encodeframe.c
vp8/encoder/ethreading.c
vp8/encoder/onyx_int.h
Change-Id: I1c562d2fe6e42c0d1d86f68c77c0e899066e02bd
|
|
Change-Id: I0d41415e3961c2c9492d342290c1999f9d02e6d8
|
|
|
|
This eliminates a large set of warnings exposed by the Mozilla build
system (use of C++ comments in ISO C90 source, commas at the end of
enum lists, a couple of incomplete initializers, and signed/unsigned
comparisons).
It also eliminates many (but not all) of the warnings exposed by newer
GCC versions and _FORTIFY_SOURCE (e.g., calling fread and fwrite
without checking the return values).
There are a few spurious warnings left on my system:
../vp8/encoder/encodemb.c:274:9: warning: 'sz' may be used
uninitialized in this function
gcc seems to be unable to figure out that the value shortcut doesn't
change between the two if blocks that test it here.
../vp8/encoder/onyx_if.c:5314:5: warning: comparison of unsigned
expression >= 0 is always true
../vp8/encoder/onyx_if.c:5319:5: warning: comparison of unsigned
expression >= 0 is always true
This is true, so far as it goes, but it's comparing against an enum, and the C
standard does not mandate that enums be unsigned, so the checks can't be
removed.
Change-Id: Iaf689ae3e3d0ddc5ade00faa474debe73b8d3395
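To illustrate the enum point above, here is a small sketch (not the code in onyx_if.c): the enum type coding_mode and the helper mode_is_valid are hypothetical. The compiler is free to pick a signed or unsigned underlying type for an enum, so a lower-bound check is only trivially true on some compilers. Casting to int before comparing keeps the check meaningful either way; converting an out-of-range unsigned value to int is implementation-defined, which this sketch assumes wraps in the usual two's-complement way.

```c
/* Hypothetical enum standing in for the one compared in onyx_if.c. */
typedef enum { MODE_A, MODE_B, MODE_C, MODE_COUNT } coding_mode;

/* Range check that does not depend on whether the compiler chose a
 * signed or an unsigned underlying type for the enum. */
static int mode_is_valid(coding_mode m)
{
    int v = (int)m; /* assumption: out-of-range values wrap as on common ABIs */
    return v >= 0 && v < (int)MODE_COUNT;
}
```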
|
|
Filed for nasm as:
https://sourceforge.net/tracker/?func=detail&atid=106208&aid=3081103&group_id=6208
nasm just does not accept any size parameter for movhps:
1.asm:2: error: mismatch in operand sizes
Some parts of libvpx already use MMWORD for movhps and MMWORD is
defined-out so it is compatible both with yasm and nasm.
Provide nasm compatibility. No binary change by this patch with yasm on
{x86_64,i686}-fedora13-linux-gnu.
Change-Id: I4008a317ca87ec07c9ada958fcdc10a0cb589bbc
|
|
nasm does not support `label wrt rip', it requires `rel label'. It is
still fully compatible with yasm.
Provide nasm compatibility. No binary change by this patch with yasm on
{x86_64,i686}-fedora13-linux-gnu. A few longer opcodes with nasm on
{x86_64,i686}-fedora13-linux-gnu have been checked as safe.
Change-Id: I488773a4e930a56e43b0cc72d867ee5291215f50
|
|
nasm requires the instruction width (movd/movq) to match its
operands. I find it clearer to really use 64-bit instructions when
we use 64-bit registers in the assembly.
Provide nasm compatibility. No binary change by this patch with yasm on
{x86_64,i686}-fedora13-linux-gnu. A few longer opcodes with nasm on
{x86_64,i686}-fedora13-linux-gnu have been checked as safe.
Change-Id: Id9b1a5cdfb1bc05697e523c317a296df43d42a91
|
|
- Scheduling for Atom processors
- Combining of macros to allow for better interleaving
- Change from multiplies to adds for main filter
- Use of movhps/movlps to fill xmm registers without
shifting and ORing
Change-Id: I0b3500a5f58abf7085253ec92d64c8a96723040b
|
|
movdqu is more expensive (throughput, uops) than movq. Minimal
impact for newer big cores, but ~2.25% gain on Atom.
Change-Id: I62c80bb1cc01d8a91c350c4c7719462809a4ef7f
|
|
Use pmaxub instead of a combination of psubusb/por to
determine if any comparisons go over the limit.
Change-Id: I3f0bd7d2aabe5fee9ba6620508e2b60605abcb82
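A scalar C model of why the two approaches agree, offered as a sketch rather than the actual assembly. subs_u8 models one byte lane of psubusb (unsigned saturating subtract) and max_u8 models one lane of pmaxub; the function names and the array-based interface are assumptions made for illustration.

```c
#include <stdint.h>

/* One byte lane of psubusb: unsigned subtract, saturating at 0. */
static uint8_t subs_u8(uint8_t a, uint8_t b)
{
    return a > b ? (uint8_t)(a - b) : 0;
}

/* One byte lane of pmaxub: unsigned maximum. */
static uint8_t max_u8(uint8_t a, uint8_t b)
{
    return a > b ? a : b;
}

/* Older pattern: subtract the limit from every value, OR the results. */
static int over_limit_por(const uint8_t *d, int n, uint8_t limit)
{
    uint8_t acc = 0;
    for (int i = 0; i < n; i++)
        acc |= subs_u8(d[i], limit);
    return acc != 0;
}

/* Newer pattern: keep a running maximum, subtract the limit once. */
static int over_limit_max(const uint8_t *d, int n, uint8_t limit)
{
    uint8_t m = 0;
    for (int i = 0; i < n; i++)
        m = max_u8(m, d[i]);
    return subs_u8(m, limit) != 0;
}
```

Both return nonzero exactly when some value exceeds the limit. The second form needs one operation per value plus a single subtract, instead of a subtract and an OR per value.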
|
|
There is no need to make sure that the lower byte of the
register is 0 because the downshift by 11 overwrites that byte.
Change-Id: I89cbf004b2ff532a2c68e0dc399c45a49cdad5a1
|
|
Sequentially accessing memory from a low address to a high
address should make it easier for the processor to predict
the cache.
Change-Id: I1921ce996bdd547144fe864fea6435f527f5842d
|
|
|
|
Changes 'The VP8 project' to 'The WebM project', for consistency
with other webmproject.org repositories.
Fixes issue #97.
Change-Id: I37c13ed5fbdb9d334ceef71c6350e9febed9bbba
|
|
Used pmaddubsw for multiply and add of two filter taps
at once for 16x16 and 8x8 blocks.
Change-Id: Idccf2d6e094561624407b109fa7e80ba799355ea
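As a rough illustration of what pmaddubsw does per 16-bit lane, here is a scalar C model: two unsigned pixels are multiplied by two signed taps and the products are added with signed saturation. The six-tap helper, its pairing of taps (0,5), (1,3), (2,4), and the 32-bit accumulation are simplifying assumptions for this sketch, not a description of the assembly in this commit.

```c
#include <stdint.h>

static int16_t sat16(int32_t v)
{
    if (v > 32767) return 32767;
    if (v < -32768) return -32768;
    return (int16_t)v;
}

/* One 16-bit lane of pmaddubsw: unsigned bytes times signed bytes,
 * adjacent products summed with signed saturation. */
static int16_t maddubs_lane(uint8_t p0, uint8_t p1, int8_t t0, int8_t t1)
{
    return sat16((int32_t)p0 * t0 + (int32_t)p1 * t1);
}

/* Six-tap filter built from three two-tap multiply-adds.
 * Taps are assumed to sum to 128, hence the +64 rounding and >> 7. */
static uint8_t sixtap_pixel(const uint8_t p[6], const int8_t k[6])
{
    int32_t sum = 0; /* accumulated in 32 bits here for clarity */
    sum += maddubs_lane(p[0], p[5], k[0], k[5]);
    sum += maddubs_lane(p[1], p[3], k[1], k[3]);
    sum += maddubs_lane(p[2], p[4], k[2], k[4]);
    sum += 64;
    if (sum < 0) return 0;
    sum >>= 7;
    return sum > 255 ? 255 : (uint8_t)sum;
}
```

Pairing a large positive tap with a negative one keeps each lane below the 16-bit saturation point; two large positive taps in one lane (for example 77 and 77 on pixels of 255) would saturate.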
|
|
Moving the eob structure allows for a non-struct based
function to handle decoding an entire mb of
idct/dequant/recon data. This allows for SIMD functions
to idct/dequant/recon multiple blocks at once.
SSE2 implementation gives 3% gain on Atom.
Change-Id: I8a8f3efd546ea4e0535f517d94f347cfb737c9c2
|
|
This reverts commit 6ea5bb85cd1547b846f4c794e8684de5abcf9f62.
|
|
Change-Id: I0f20fbb898ee31eb94a143471aa6f1ca17a229a4
|
|
Added vp8_filter_block1d4_h6_ssse3 and vp8_filter_block1d4_v6_ssse3
assembly routines. Also removed unused assembly.
Change-Id: I01c1021835f2edda9da706822345f217087ca0d0
|
|
Improved decoder performance by 9% for the clip used.
Change-Id: I8fc5609213b7bef10248372595dc85b29f9895b9
|
|
global label:data
^^
Provide nasm compatibility. No binary change by this patch with yasm
on {x86_64,i686}-fedora13-linux-gnu. A few longer opcodes with nasm on
{x86_64,i686}-fedora13-linux-gnu have been checked as safe.
Change-Id: I10f17eb1e4d4a718d4ebd1d0ccddc807c365e021
|
|
Change-Id: I896fe6f9664e6849c7cee2cc6bb4e045eb42540f
|
|
This moves the prediction step before the idct and combines the idct and
reconstruction steps into a single step. Combining them seems to give an
overall decoder performance improvement of about 1%.
Change-Id: I90d8b167ec70d79c7ba2ee484106a78b3d16e318
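The benefit of fusing the two steps can be sketched on the simplest case, a 4x4 block with only a DC coefficient. This is an illustrative model under stated assumptions, not the code from this commit: the function names, the flat 16-element layout, and the (dc + 4) / 8 rounding are chosen for the example.

```c
#include <stdint.h>

static uint8_t clamp_u8(int v)
{
    return v < 0 ? 0 : (v > 255 ? 255 : (uint8_t)v);
}

/* floor((dc + 4) / 8) without right-shifting a negative value. */
static int dc_residual(int dc)
{
    int v = dc + 4;
    return v >= 0 ? (v >> 3) : -((-v + 7) >> 3);
}

/* Two-step version: the inverse transform writes an intermediate
 * residual buffer, then reconstruction adds it to the prediction. */
static void dc_idct(int dc, int16_t resid[16])
{
    int a = dc_residual(dc);
    for (int i = 0; i < 16; i++) resid[i] = (int16_t)a;
}

static void recon(const int16_t resid[16], const uint8_t pred[16],
                  uint8_t dst[16])
{
    for (int i = 0; i < 16; i++) dst[i] = clamp_u8(pred[i] + resid[i]);
}

/* Fused version: prediction is already available, so the residual is
 * added as it is produced and no intermediate buffer is needed. */
static void dc_idct_add(int dc, const uint8_t pred[16], uint8_t dst[16])
{
    int a = dc_residual(dc);
    for (int i = 0; i < 16; i++) dst[i] = clamp_u8(pred[i] + a);
}
```

The fused form only works once prediction runs before the inverse transform, which is the reordering this commit describes.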
|
|
Restructured and rewrote SSE2 loopfilter functions. Combined u and
v into one function to take advantage of SSE2 128-bit registers.
Tests on test clips showed a 4% decoder performance improvement on
Linux desktop.
Change-Id: Iccc6669f09e17f2224da715f7547d6f93b0a4987
|
|
If the version script produced by the libvpx build system is not
used when linking a shared library on x86-64 Linux, the constant
data in the subpel filters produces R_X86_64_32 relocation errors
due to the use of wrt rip addressing instead of
wrt rip wrt ..gotpcrel.
Instead of adding a new macro for this addressing mode, this patch
sets the ELF visibility of these symbols to "hidden", which
allows wrt rip addressing to work without a text relocation.
This allows building a shared library without using the provided
build system or a separate version script.
Fixes http://code.google.com/p/webm/issues/detail?id=46
Change-Id: Ie108f9d9a4352e5af46938bf4750d2302c1b2dc2
|
|
When the license headers were updated, they accidentally contained
trailing whitespace, so unfortunately we have to touch all the files
again.
Change-Id: I236c05fade06589e417179c0444cb39b09e4200d
|
|
Add same fix in subpixel_sse2.asm.
Change-Id: Icfda6103cbf74ec43308e96961dd738aa823c14d
|
|
XMM6 to XMM15 are non-volatile on Windows x64 ABI. We have to save
these registers.
Change-Id: I4676309f1350af25c8a35f0c81b1f0499ab99076
|
|
Restructure vp8_sixtap_predict functions to eliminate the extra 5-line
calculation when doing the first pass only. Also, combine functions
to eliminate use of the intermediate buffer. This gives the decoder a 3%
performance gain on my test clips.
Change-Id: I13de49638884d1a57d0855c63aea719316d08c1b
|
|
Change-Id: Ieebea089095d9073b3a94932791099f614ce120c
|
|
|