summaryrefslogtreecommitdiff
path: root/vp8/common/x86
AgeCommit message (Collapse)Author
2012-06-25Sign-extend input argument so it can be used in pointer arithmetic.Ronald S. Bultje
Change-Id: I6cbd4de96f9dcc783cef170bfd7652f6cbee36a2
2012-06-18x86inc: Move x86inc to the correct location.Daniel Kang
Change-Id: I6802731a4d15feef5ce62993dc505ded55c40f7e
2012-06-12Adds x86inc.asm and update idct/dequant mmxDaniel Kang
Updates idct/dequant mmx assembly to work with vpnext instead of vp8. Also adds x86inc.asm Change-Id: I6e147d5e89177ae449271e97e50d082eb11b078e
2012-05-15Adds new Directional Intra prediction modes.Deb Mukherjee
Adds 6 directional intra predictiom modes for 16x16 and 8x8 blocks. Change-Id: I25eccc0836f28d8d74922e4e9231568a648b47d1
2012-02-23Supporting high precision 1/8-pel motion vectorsDeb Mukherjee
This is the initial patch for supporting 1/8th pel motion. Currently if we configure with enable-high-precision-mv, all motion vectors would default to 1/8 pel. Encode and decode syncs fine with the current code. In the next phase the code will be refactored so that we can choose the 1/8 pel mode adaptively at a frame/segment/mb level. Derf results: http://www.corp.google.com/~debargha/vp8_results/enhinterp_hpmv.html (about 0.83% better than 8-tap interpoaltion) Patch 3: Rebased. Also adding 1/16th pel interpolation for U and V Patch 4: HD results. http://www.corp.google.com/~debargha/vp8_results/enhinterp_hd_hpmv.html Seems impressive (unless I am doing something wrong). Patch 5: Added mmx/sse for bilateral filtering, as well as enforced use of c-versions of subpel filters with 8-taps and 1/16th pel; Also redesigned the 8-tap filters to reduce the cut-off in order to introduce a denoising effect. There is a new configure option sixteenth-subpel-uv which will use 1/16 th pel interpolation for uv, if the motion vectors have 1/8 pel accuracy. With the fixes the results are promising on the derf set. The enhanced interpolation option with 8-taps alone gives 3% improvement over thei derf set: http://www.corp.google.com/~debargha/vp8_results/enhinterpn.html Results on high precision mv and on the hd set are to follow. Patch 6: Adding a missing condition for CONFIG_SIXTEENTH_SUBPEL_UV in vp8/common/x86/x86_systemdependent.c Patch 7: Cleaning up various debug messages. Patch 8: Merge conflict Change-Id: I5b1d844457aefd7414a9e4e0e06c6ed38fd8cc04
2011-08-25Merge remote branch 'internal/upstream' into HEADJohn Koleszar
Conflicts: vp8/common/defaultcoefcounts.h vp8/common/entropy.c vp8/encoder/bitstream.c Change-Id: Idd4990c80d5b5494ac036254694015fab449bc08
2011-08-24Fix naming of sse2 idct functions.Fritz Koenig
Prepend idct function names with vp8_ so that under profiling they show up associated with libvpx. Change-Id: I4fe357b50236cb7730a4cc00164c0a3487a1d8b4
2011-08-24Merge remote branch 'internal/upstream' into HEADJohn Koleszar
2011-08-23Fix data accesses for simple loopfiltersJohann
The data that the simple horizontal loopfilter reads is aligned, treat it accordingly. For the vertical, we only use the bottom 4 bytes, so don't read in 16 (and incur the penalty for unaligned access). This shows a small improvement on older processors which have a significant penalty for unaligned reads. postproc_mmx.c is unused Change-Id: I87b29bbc0c3b19ee1ca1de3c4f47332a53087b3d
2011-08-23Use local labels for jumps/loops in x86 assembly.Fritz Koenig
Prepend . to local labels in assembly code. This allows non unique labels within a file. Also makes profiling information more informative by keeping the function name with the loop name. Change-Id: I7a983cb3a5ba2413d5dafd0a37936b268fb9e37f
2011-07-14Merge remote branch 'internal/upstream' into HEADJohn Koleszar
2011-07-08update x86 asm for loopfilterJohann
Change-Id: I1ed739522db7c00c189851c7095c1b64ef6412ce
2011-07-02Merge remote branch 'internal/upstream' into HEADJohn Koleszar
2011-06-30Properly use GET_GOT/RESTORE_GOT when using GLOBAL().Ronald S. Bultje
This should fix binaries using PIC on x86-32. Also should fix issue 343. Change-Id: I591de3ad68c8a8bb16054bd8f987a75b4e2bad02
2011-05-20Merge remote branch 'internal/upstream' into HEADJohn Koleszar
2011-05-19Merge "Make hor UV predict ~2x faster (73 vs 132 cycles) using SSSE3."Scott LaVarnway
2011-05-11Merge remote branch 'internal/upstream' into HEADJohn Koleszar
2011-05-11Merge remote branch 'internal/upstream-experimental' into HEADJohn Koleszar
2011-05-10set up Global Offset Table in reconJohann
global values were being referenced, but the GOT was not being set up. as the GOT is only required for PIC, this issue wasn't caught in the default configuration. Change-Id: I8006e53776139362a76f2c80cf9d0f8458602b2f http://code.google.com/p/webm/issues/detail?id=328
2011-05-09clean up unused variable warningsJohann
Change-Id: I9467d7a50eac32d8e8f3a2f26db818e47c93c94b
2011-05-03Merge remote branch 'internal/upstream' into HEADJohn Koleszar
2011-04-30Fix documentation typosThijs Vermeir
Change-Id: I97124670926433bf1593c91660d8b8f8482ea9ce
2011-04-29Make hor UV predict ~2x faster (73 vs 132 cycles) using SSSE3.Ronald S. Bultje
Change-Id: I658a1df7d825f820573cb2d11ad402f9d2791035
2011-04-29Merge remote branch 'internal/upstream' into HEADJohn Koleszar
2011-04-28bug fix removed inline from recon_wrapper_sse2.cJames Berry
removed inline from recon_wrapper_sse2.c to build for visual stuido Change-Id: I74a3482950448e2cdb30e9cd7087145b440d8a22
2011-04-28Merge remote branch 'internal/upstream' into HEADJohn Koleszar
2011-04-27Use psadbw to get the sum of bytes in a line.Ronald S. Bultje
Thanks Jason for pointing that out on #vp8. ;-). Change-Id: I5330a753e752a8704b78a409597472628e0b26a5
2011-04-27SSE2/SSSE3 optimizations for build_predictors_mbuv{,_s}().Ronald S. Bultje
decoding before 10.425 10.432 10.423 =10.426 after: 10.405 10.416 10.398 =10.406, 0.2% faster encoding before 14.252 14.331 14.250 14.223 14.241 14.220 14.221 =14.248 after 14.095 14.090 14.085 14.095 14.064 14.081 14.089 =14.086, 1.1% faster Change-Id: I483d3d8f0deda8ad434cea76e16028380722aee2
2011-04-26Merge remote branch 'internal/upstream' into HEADJohn Koleszar
Conflicts: vp8/common/alloccommon.c vp8/encoder/rdopt.c Change-Id: Ic34b33577423031e277235ffa6bcaff7b252e5cb
2011-04-25remove simpler_lpfJohann
the decision to run the regular or simple loopfilter is made outside the function and managed with pointers stop tracking the option in two places. use filter_type exclusively Change-Id: I39d7b5d1352885efc632c0a94aaf56b72cc2fe15
2011-04-25Merge remote branch 'internal/upstream-experimental' into HEADJohn Koleszar
Conflicts: vp8/decoder/onyxd_int.h Change-Id: Icf445b589c2bc61d93d8c977379bbd84387d0488
2011-04-19modify SAVE_XMM for potential 64bit useJohann
the win64 abi requires saving and restoring xmm6:xmm15. currently SAVE_XMM and RESTORE XMM only allow for saving xmm6:xmm7. allow specifying the highest register used and if the stack is unaligned. Change-Id: Ica5699622ffe3346d3a486f48eef0206c51cf867
2011-04-18Add save/restore xmm registers in x86 assembly codeJohann
Went through the code and fixed it. Verified on Windows. Where possible, remove dependencies on xmm[67] Current code relies on pushing rbp to the stack to get 16 byte alignment. This broke when rbp wasn't pushed (vp8/encoder/x86/sad_sse3.asm). Work around this by using unaligned memory accesses. Revisit this and the offsets in vp8/encoder/x86/sad_sse3.asm in another change to SAVE_XMM. Change-Id: I5f940994d3ebfd977c3d68446cef20fd78b07877
2011-04-16Merge remote branch 'internal/upstream' into HEADJohn Koleszar
2011-04-15remove dead code, add missing RESTORE_XMMJohann
vp8_filter_block1d16_h4_ssse3 was never called because UNSHADOW_ARGS moves the stack by 'mov rsp, rbp', the issue was masked. however, if/when win64 used those registers for persistant data, issues could/will arise. Change-Id: I56d6effca0aeba1f86082689771cb10145d39651
2011-04-12Merge remote branch 'internal/upstream' into HEADJohn Koleszar
2011-04-11Remove unused filesJohn Koleszar
Change-Id: I36ca3f2f4620358033da34daf764f0b388dacd08
2011-03-23Merge remote branch 'internal/upstream' into HEADJohn Koleszar
Conflicts: vp8/decoder/decodemv.c vp8/decoder/onyxd_if.c vp8/encoder/ratectrl.c vp8/encoder/rdopt.c Change-Id: Ia1c1c5e589f4200822d12378c7749ba62bd17ae2
2011-03-17Increase static linkage, remove unused functionsJohn Koleszar
A large number of functions were defined with external linkage, even though they were only used from within one file. This patch changes their linkage to static and removes the vp8_ prefix from their names, which should make it more obvious to the reader that the function is contained within the current translation unit. Functions that were not referenced were removed. These symbols were identified by: $ nm -A libvpx.a | sort -k3 | uniq -c -f2 | grep ' [A-Z] ' \ | sort | grep '^ *1 ' Change-Id: I59609f58ab65312012c047036ae1e0634f795779
2011-02-18Merge remote branch 'internal/upstream' into HEADJohn Koleszar
2011-02-10Fix relative include pathsJohn Koleszar
Allow compiling without adding vp8/{common,encoder,decoder} to the include paths. Change-Id: Ifeb5dac351cdfadcd659736f5158b315a0030b6c
2011-02-07Merge remote branch 'internal/upstream-experimental' into HEADJohn Koleszar
Conflicts: vp8/encoder/encodeframe.c vp8/encoder/ethreading.c vp8/encoder/onyx_int.h Change-Id: I1c562d2fe6e42c0d1d86f68c77c0e899066e02bd
2011-02-04Remove duplicate loopfilter parameters.Gaute Strokkenes
Change-Id: I0d41415e3961c2c9492d342290c1999f9d02e6d8
2010-11-16changes to start experimenting with color segmentation prediction modes.Jim Bankoski
2010-10-27Eliminate more warnings.Timothy B. Terriberry
This eliminates a large set of warnings exposed by the Mozilla build system (Use of C++ comments in ISO C90 source, commas at the end of enum lists, a couple incomplete initializers, and signed/unsigned comparisons). It also eliminates many (but not all) of the warnings expose by newer GCC versions and _FORTIFY_SOURCE (e.g., calling fread and fwrite without checking the return values). There are a few spurious warnings left on my system: ../vp8/encoder/encodemb.c:274:9: warning: 'sz' may be used uninitialized in this function gcc seems to be unable to figure out that the value shortcut doesn't change between the two if blocks that test it here. ../vp8/encoder/onyx_if.c:5314:5: warning: comparison of unsigned expression >= 0 is always true ../vp8/encoder/onyx_if.c:5319:5: warning: comparison of unsigned expression >= 0 is always true This is true, so far as it goes, but it's comparing against an enum, and the C standard does not mandate that enums be unsigned, so the checks can't be removed. Change-Id: Iaf689ae3e3d0ddc5ade00faa474debe73b8d3395
2010-10-04nasm: movhps compatibility QWORD->MMWORDJan Kratochvil
Filed for nasm as: https://sourceforge.net/tracker/?func=detail&atid=106208&aid=3081103&group_id=6208 nasm just does not accept any size parameter for movhps: 1.asm:2: error: mismatch in operand sizes Some parts of libvpx already use MMWORD for movhps and MMWORD is defined-out so it is compatible both with yasm and nasm. Provide nasm compatibility. No binary change by this patch with yasm on {x86_64,i686}-fedora13-linux-gnu. Change-Id: I4008a317ca87ec07c9ada958fcdc10a0cb589bbc
2010-10-04nasm: address labels 'rel label' vice 'wrt rip'Jan Kratochvil
nasm does not support `label wrt rip', it requires `rel label'. It is still fully compatible with yasm. Provide nasm compatibility. No binary change by this patch with yasm on {x86_64,i686}-fedora13-linux-gnu. Few longer opcodes with nasm on {x86_64,i686}-fedora13-linux-gnu have been checked as safe. Change-Id: I488773a4e930a56e43b0cc72d867ee5291215f50
2010-10-04nasm: match instruction length (movd/movq) to parametersJan Kratochvil
nasm requires the instruction length (movd/movq) to match to its parameters. I find it more clear to really use 64bit instructions when we use 64bit registers in the assembly. Provide nasm compatibility. No binary change by this patch with yasm on {x86_64,i686}-fedora13-linux-gnu. Few longer opcodes with nasm on {x86_64,i686}-fedora13-linux-gnu have been checked as safe. Change-Id: Id9b1a5cdfb1bc05697e523c317a296df43d42a91
2010-09-28Optimizations on the loopfilters.Fritz Koenig
- Scheduling for Atom processors - Combining of macros to allow for better interleaving - Change from multiplies to adds for main filter - Use of movhps/movlps to fill xmm registers without shifting and orring Change-Id: I0b3500a5f58abf7085253ec92d64c8a96723040b
2010-09-20Use movq instead of movdqu.Fritz Koenig
Movdqu is more expensive (throughput, uops) than movq. Minimal impact for newer big cores, but ~2.25% gain on Atom. Change-Id: I62c80bb1cc01d8a91c350c4c7719462809a4ef7f