summaryrefslogtreecommitdiff
path: root/vp8/common/x86
AgeCommit message (Collapse)Author
2011-04-26Merge remote branch 'internal/upstream' into HEADJohn Koleszar
Conflicts: vp8/common/alloccommon.c vp8/encoder/rdopt.c Change-Id: Ic34b33577423031e277235ffa6bcaff7b252e5cb
2011-04-25remove simpler_lpfJohann
the decision to run the regular or simple loopfilter is made outside the function and managed with pointers stop tracking the option in two places. use filter_type exclusively Change-Id: I39d7b5d1352885efc632c0a94aaf56b72cc2fe15
2011-04-25Merge remote branch 'internal/upstream-experimental' into HEADJohn Koleszar
Conflicts: vp8/decoder/onyxd_int.h Change-Id: Icf445b589c2bc61d93d8c977379bbd84387d0488
2011-04-19modify SAVE_XMM for potential 64bit useJohann
the win64 abi requires saving and restoring xmm6:xmm15. currently SAVE_XMM and RESTORE XMM only allow for saving xmm6:xmm7. allow specifying the highest register used and if the stack is unaligned. Change-Id: Ica5699622ffe3346d3a486f48eef0206c51cf867
2011-04-18Add save/restore xmm registers in x86 assembly codeJohann
Went through the code and fixed it. Verified on Windows. Where possible, remove dependencies on xmm[67] Current code relies on pushing rbp to the stack to get 16 byte alignment. This broke when rbp wasn't pushed (vp8/encoder/x86/sad_sse3.asm). Work around this by using unaligned memory accesses. Revisit this and the offsets in vp8/encoder/x86/sad_sse3.asm in another change to SAVE_XMM. Change-Id: I5f940994d3ebfd977c3d68446cef20fd78b07877
2011-04-16Merge remote branch 'internal/upstream' into HEADJohn Koleszar
2011-04-15remove dead code, add missing RESTORE_XMMJohann
vp8_filter_block1d16_h4_ssse3 was never called because UNSHADOW_ARGS moves the stack by 'mov rsp, rbp', the issue was masked. however, if/when win64 used those registers for persistant data, issues could/will arise. Change-Id: I56d6effca0aeba1f86082689771cb10145d39651
2011-04-12Merge remote branch 'internal/upstream' into HEADJohn Koleszar
2011-04-11Remove unused filesJohn Koleszar
Change-Id: I36ca3f2f4620358033da34daf764f0b388dacd08
2011-03-23Merge remote branch 'internal/upstream' into HEADJohn Koleszar
Conflicts: vp8/decoder/decodemv.c vp8/decoder/onyxd_if.c vp8/encoder/ratectrl.c vp8/encoder/rdopt.c Change-Id: Ia1c1c5e589f4200822d12378c7749ba62bd17ae2
2011-03-17Increase static linkage, remove unused functionsJohn Koleszar
A large number of functions were defined with external linkage, even though they were only used from within one file. This patch changes their linkage to static and removes the vp8_ prefix from their names, which should make it more obvious to the reader that the function is contained within the current translation unit. Functions that were not referenced were removed. These symbols were identified by: $ nm -A libvpx.a | sort -k3 | uniq -c -f2 | grep ' [A-Z] ' \ | sort | grep '^ *1 ' Change-Id: I59609f58ab65312012c047036ae1e0634f795779
2011-02-18Merge remote branch 'internal/upstream' into HEADJohn Koleszar
2011-02-10Fix relative include pathsJohn Koleszar
Allow compiling without adding vp8/{common,encoder,decoder} to the include paths. Change-Id: Ifeb5dac351cdfadcd659736f5158b315a0030b6c
2011-02-07Merge remote branch 'internal/upstream-experimental' into HEADJohn Koleszar
Conflicts: vp8/encoder/encodeframe.c vp8/encoder/ethreading.c vp8/encoder/onyx_int.h Change-Id: I1c562d2fe6e42c0d1d86f68c77c0e899066e02bd
2011-02-04Remove duplicate loopfilter parameters.Gaute Strokkenes
Change-Id: I0d41415e3961c2c9492d342290c1999f9d02e6d8
2010-11-16changes to start experimenting with color segmentation prediction modes.Jim Bankoski
2010-10-27Eliminate more warnings.Timothy B. Terriberry
This eliminates a large set of warnings exposed by the Mozilla build system (Use of C++ comments in ISO C90 source, commas at the end of enum lists, a couple incomplete initializers, and signed/unsigned comparisons). It also eliminates many (but not all) of the warnings expose by newer GCC versions and _FORTIFY_SOURCE (e.g., calling fread and fwrite without checking the return values). There are a few spurious warnings left on my system: ../vp8/encoder/encodemb.c:274:9: warning: 'sz' may be used uninitialized in this function gcc seems to be unable to figure out that the value shortcut doesn't change between the two if blocks that test it here. ../vp8/encoder/onyx_if.c:5314:5: warning: comparison of unsigned expression >= 0 is always true ../vp8/encoder/onyx_if.c:5319:5: warning: comparison of unsigned expression >= 0 is always true This is true, so far as it goes, but it's comparing against an enum, and the C standard does not mandate that enums be unsigned, so the checks can't be removed. Change-Id: Iaf689ae3e3d0ddc5ade00faa474debe73b8d3395
2010-10-04nasm: movhps compatibility QWORD->MMWORDJan Kratochvil
Filed for nasm as: https://sourceforge.net/tracker/?func=detail&atid=106208&aid=3081103&group_id=6208 nasm just does not accept any size parameter for movhps: 1.asm:2: error: mismatch in operand sizes Some parts of libvpx already use MMWORD for movhps and MMWORD is defined-out so it is compatible both with yasm and nasm. Provide nasm compatibility. No binary change by this patch with yasm on {x86_64,i686}-fedora13-linux-gnu. Change-Id: I4008a317ca87ec07c9ada958fcdc10a0cb589bbc
2010-10-04nasm: address labels 'rel label' vice 'wrt rip'Jan Kratochvil
nasm does not support `label wrt rip', it requires `rel label'. It is still fully compatible with yasm. Provide nasm compatibility. No binary change by this patch with yasm on {x86_64,i686}-fedora13-linux-gnu. Few longer opcodes with nasm on {x86_64,i686}-fedora13-linux-gnu have been checked as safe. Change-Id: I488773a4e930a56e43b0cc72d867ee5291215f50
2010-10-04nasm: match instruction length (movd/movq) to parametersJan Kratochvil
nasm requires the instruction length (movd/movq) to match to its parameters. I find it more clear to really use 64bit instructions when we use 64bit registers in the assembly. Provide nasm compatibility. No binary change by this patch with yasm on {x86_64,i686}-fedora13-linux-gnu. Few longer opcodes with nasm on {x86_64,i686}-fedora13-linux-gnu have been checked as safe. Change-Id: Id9b1a5cdfb1bc05697e523c317a296df43d42a91
2010-09-28Optimizations on the loopfilters.Fritz Koenig
- Scheduling for Atom processors - Combining of macros to allow for better interleaving - Change from multiplies to adds for main filter - Use of movhps/movlps to fill xmm registers without shifting and orring Change-Id: I0b3500a5f58abf7085253ec92d64c8a96723040b
2010-09-20Use movq instead of movdqu.Fritz Koenig
Movdqu is more expensive (throughput, uops) than movq. Minimal impact for newer big cores, but ~2.25% gain on Atom. Change-Id: I62c80bb1cc01d8a91c350c4c7719462809a4ef7f
2010-09-20Better choice of instruction filter mask comparision.Fritz Koenig
Use pmaxub instead of a combination of psubusb/por to determine if any comparisons go over the limit. Change-Id: I3f0bd7d2aabe5fee9ba6620508e2b60605abcb82
2010-09-13Removed unnecessary pxor.Fritz Koenig
There is no need to make sure that the lower byte of the register is 0 because the downshift by 11 overwrites that byte. Change-Id: I89cbf004b2ff532a2c68e0dc399c45a49cdad5a1
2010-09-10Make block access to frame buffer sequentialFritz Koenig
Sequentially accessing memory from a low address to a high address should make it easier for the processor to predict the cache. Change-Id: I1921ce996bdd547144fe864fea6435f527f5842d
2010-09-09Merge branch 'master' of git://review.webmproject.org/libvpxFritz Koenig
2010-09-09Use WebM in copyright notice for consistencyJohn Koleszar
Changes 'The VP8 project' to 'The WebM project', for consistency with other webmproject.org repositories. Fixes issue #97. Change-Id: I37c13ed5fbdb9d334ceef71c6350e9febed9bbba
2010-09-07Bilinear subpixel optimizations for ssse3.Fritz Koenig
Used pmaddubsw for multiply and add of two filter taps at once for 16x16 and 8x8 blocks. Change-Id: Idccf2d6e094561624407b109fa7e80ba799355ea
2010-08-23Rework idct calling structure.Fritz Koenig
Moving the eob structure allows for a non-struct based function to handle decoding an entire mb of idct/dequant/recon data. This allows for SIMD functions to idct/dequant/recon multiple blocks at once. SSE2 implementation gives 3% gain on Atom. Change-Id: I8a8f3efd546ea4e0535f517d94f347cfb737c9c2
2010-08-19Revert "Removed ssse3 sixtap code"Jim Bankoski
This reverts commit 6ea5bb85cd1547b846f4c794e8684de5abcf9f62.
2010-08-18Removed ssse3 sixtap codeScott LaVarnway
Change-Id: I0f20fbb898ee31eb94a143471aa6f1ca17a229a4
2010-08-11Finished vp8_sixtap_predict4x4_ssse3 functionScott LaVarnway
Added vp8_filter_block1d4_h6_ssse3 and vp8_filter_block1d4_v6_ssse3 assembly routines. Also removed unused assembly. Change-Id: I01c1021835f2edda9da706822345f217087ca0d0
2010-08-10Added ssse3 version of sixtap filtersScott LaVarnway
Improved decoder performance by 9% for the clip used. Change-Id: I8fc5609213b7bef10248372595dc85b29f9895b9
2010-08-02nasm: avoid space before the :data symbol type.Jan Kratochvil
global label:data ^^ Provide nasm compatibility. No binary change by this patch with yasm on {x86_64,i686}-fedora13-linux-gnu. Few longer opcodes with nasm on {x86_64,i686}-fedora13-linux-gnu have been checked as safe. Change-Id: I10f17eb1e4d4a718d4ebd1d0ccddc807c365e021
2010-07-23Change the x86 idct functions to do reconstruction at the same timeJeff Muizelaar
Change-Id: I896fe6f9664e6849c7cee2cc6bb4e045eb42540f
2010-07-23Combine idct and reconstruction stepsJeff Muizelaar
This moves the prediction step before the idct and combines the idct and reconstruction steps into a single step. Combining them seems to give an overall decoder performance improvement of about 1%. Change-Id: I90d8b167ec70d79c7ba2ee484106a78b3d16e318
2010-06-29Improve SSE2 loopfilter functionsYunqing Wang
Restructured and rewrote SSE2 loopfilter functions. Combined u and v into one function to take advantage of SSE2 128-bit registers. Tests on test clips showed a 4% decoder performance improvement on Linux desktop. Change-Id: Iccc6669f09e17f2224da715f7547d6f93b0a4987
2010-06-21Fix a linker error on x86-64 Linux when not using a version script.Timothy B. Terriberry
If the version script produced by the libvpx build system is not used when linking a shared library on x86-64 Linux, the constant data in the subpel filters produces R_X86_64_32 relocation errors due to the use of wrt rip addressing instead of wrt rip wrt ..gotpcrel. Instead of adding a new macro for this addressing mode, this patch sets the ELF visibility of these symbols to "hidden", which allows wrt rip addressing to work without a text relocation. This allows building a shared library without using the provided build system or a separate version script. Fixes http://code.google.com/p/webm/issues/detail?id=46 Change-Id: Ie108f9d9a4352e5af46938bf4750d2302c1b2dc2
2010-06-18cosmetics: trim trailing whitespaceJohn Koleszar
When the license headers were updated, they accidentally contained trailing whitespace, so unfortunately we have to touch all the files again. Change-Id: I236c05fade06589e417179c0444cb39b09e4200d
2010-06-15More on "some XMM registers are non-volatile on windows x64 ABI"Yunqing Wang
Add same fix in subpixel_sse2.asm. Change-Id: Icfda6103cbf74ec43308e96961dd738aa823c14d
2010-06-11some XMM registers are non-volatile on windows x64 ABIMakoto Kato
XMM6 to XMM15 are non-volatile on Windows x64 ABI. We have to save these registers. Change-Id: I4676309f1350af25c8a35f0c81b1f0499ab99076
2010-06-10Improve vp8_sixtap_predict functionsYunqing Wang
Restructure vp8_sixtap_predict functions to eliminate extra 5-line calculation while doing first-pass only. Also, combline functions to eliminate usage of intermediate buffer. This gives decoder a 3% performance gain on my test clips. Change-Id: I13de49638884d1a57d0855c63aea719316d08c1b
2010-06-04LICENSE: update with latest textJohn Koleszar
Change-Id: Ieebea089095d9073b3a94932791099f614ce120c
2010-05-18Initial WebM releaseJohn Koleszar