path: root/vp8/encoder/x86
Age | Commit message | Author
2011-03-11 | Merge "Align SAD output array to be 16-byte aligned" | Yunqing Wang
2011-03-11 | vp8cx- alternate ssim function with optimizations | Jim Bankoski
Change-Id: I91921b0a90dbaddc7010380b038955be347964b3
2011-03-11 | Align SAD output array to be 16-byte aligned | Yunqing Wang
Use aligned store. Change-Id: Icab4c0c53da811d0c52bb7e8134927f249ba2499
2011-03-09 | Add vp8_sub_pixel_variance16x8_ssse3 function | Yunqing Wang
Added SSSE3 function Change-Id: I8c304c92458618d93fda3a2f62bd09ccb63e75ad
2011-03-09 | Remove unused functions | Yunqing Wang
Removed some unused functions Change-Id: Ifdfc27453e53cfc75997b38492901d193a16b245
2011-03-08 | Improve SSE2 half-pixel filter functions | Yunqing Wang
Rewrote these functions to process 16 pixels at once instead of 8. Change-Id: Ic67e80124467a446a3df4cfecfb76a4248602adb
2011-03-08 | Add zero offset checking in SSE2 sub-pixel filter function | Yunqing Wang
Skip filter at zero offset. Change-Id: I95fc7e211869bc0ab5bcfb7ab2e3259d1c0ccf38
2011-03-08 | Write SSSE3 sub-pixel filter function | Yunqing Wang
1. Process 16 pixels at a time instead of 8.
2. Add a check for both xoffset=0 and yoffset=0, which happens during motion search.
This change gave the encoder a 1%~3% performance gain. Change-Id: Idaa39506b48f4f8b2fbbeb45aae8226fa32afb3e
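The zero-offset check described above can be sketched in scalar C. This is an illustrative model only, not the SSSE3 code from the commit: the function name `subpel_predict16x16`, the table `kTaps`, the two-pass bilinear structure, and the rounding constants are all assumptions made for the example.

```c
#include <stdint.h>
#include <string.h>

/* Assumed 2-tap bilinear weights for eighth-pel offsets 0..7; each pair sums to 128. */
static const int kTaps[8][2] = {
    {128, 0}, {112, 16}, {96, 32}, {80, 48},
    {64, 64}, {48, 80}, {32, 96}, {16, 112}
};

/* Hypothetical helper: predicts a 16x16 block at a sub-pixel offset.
 * src must provide 17 readable rows of 17 pixels; offsets are in 0..7. */
void subpel_predict16x16(const uint8_t *src, int src_stride,
                         int xoffset, int yoffset,
                         uint8_t *dst, int dst_stride)
{
    uint8_t tmp[17 * 16];
    int r, c;

    /* Fast path: with both offsets zero the filter is a plain copy. */
    if (xoffset == 0 && yoffset == 0) {
        for (r = 0; r < 16; r++)
            memcpy(dst + r * dst_stride, src + r * src_stride, 16);
        return;
    }

    /* Horizontal pass over 17 rows; the extra row feeds the vertical pass. */
    for (r = 0; r < 17; r++)
        for (c = 0; c < 16; c++)
            tmp[r * 16 + c] = (uint8_t)((src[r * src_stride + c] * kTaps[xoffset][0] +
                                         src[r * src_stride + c + 1] * kTaps[xoffset][1] +
                                         64) >> 7);

    /* Vertical pass between each row and the one below it. */
    for (r = 0; r < 16; r++)
        for (c = 0; c < 16; c++)
            dst[r * dst_stride + c] = (uint8_t)((tmp[r * 16 + c] * kTaps[yoffset][0] +
                                                 tmp[(r + 1) * 16 + c] * kTaps[yoffset][1] +
                                                 64) >> 7);
}
```

The early return is worthwhile only because, as the commit message notes, the (0, 0) case comes up often during motion search; the general path gives the same result for that input, just more slowly.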
2011-02-28 | Merge "Add prefetch before variance calculation" | Yunqing Wang
2011-02-28 | Add prefetch before variance calculation | Yunqing Wang
This improved encoding performance by 0.5% (good, speed 1) to 1.5% (good, speed 5). Change-Id: I843d72a0d68a90b5f694adf770943e4a4618f50e
2011-02-22 | Remove temporal alt ref from realtime only build | Attila Nagy
It is not used in realtime mode. Reduces memory footprint. Change-Id: I7f163225762368df5457cfd413050161d3704a3f
2011-02-18 | Revert "use unaligned load" | Johann
This reverts commit f50f2fd2a73f2c5ee3f10ad077e780398df17cd7. Change Ib7506e3e aligns the buffer. Change-Id: Ie0f8bd3e57cfdfef81d39638a1451458ebbae2e0
2011-02-17 | Merge "Fix relative include paths" | John Koleszar
2011-02-14 | Improve vp8_sad16x16_sse3 function | Yunqing Wang
In real-time mode, vp8_sad16x16 function is called heavily in motion search part. Improvement of this function gives 1.2% encoding performance gain (real-time mode, tulip clip). Change-Id: I23c401fc40c061f732a9767e8d383737a179bd58
2011-02-10 | Fix relative include paths | John Koleszar
Allow compiling without adding vp8/{common,encoder,decoder} to the include paths. Change-Id: Ifeb5dac351cdfadcd659736f5158b315a0030b6c
2011-01-25 | Merge "update sse2 regular quantizer" | Johann
2011-01-21 | Modify sub-pixel filters to eliminate unnecessary calculations | Yunqing Wang
In sub-pixel calculation, xoffset and yoffset mostly take some specific values. Modified sub-pixel filter functions according to these possible values to improve performance. Change-Id: I83083570af8b00ff65093467914fbb97a4e9ea21
2011-01-18 | Fix encoder real-time only configuration. | Attila Nagy
Remove allocation/deallocation of stats storage. Remove full search functions in machine specific encoder inits. Remove last pass validation in validate_config. Change-Id: I7f29be69273981a4fef6e80ecdb6217c68cbad4e
2011-01-14 | update sse2 regular quantizer | Johann
About 5% gain on 32-bit. Disabled for 64-bit. Unset executable bit on the SSSE3 version (cosmetic). Change-Id: I1a5860839eb294ce4261f819caea2dcfa78e57ca
2011-01-11 | use unaligned load | Johann
source buffer is not guaranteed to be aligned for odd size buffers Change-Id: Id0b1fd40ba3bd6c994bcfada788feccd2b53c5a9
2011-01-06 | x86 sse2 temporal_filter_apply | Johann
count can be reduced to short because the max number of filtered frames is set to 15. The max value for any frame is 32 (modifier = 16, filter_weight = 2), and 15*32 = 480, which requires 9 bits. This function goes from about 7000 us / 1000 iterations for the C code to < 275 us / 1000 iterations for SSE2 with block_size = 16, and from about 1800 us / 1000 iterations to < 100 us / 1000 iterations with block_size = 8. Change-Id: I64a32607f58a2d33c39286f468b04ccd457d9e6e
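The range argument above can be checked with a few lines of C. The constants come from the commit message; the names `MAX_FILTERED_FRAMES`, `MAX_PER_FRAME`, `max_count`, and `bits_needed` are hypothetical and exist only for this sketch.

```c
#include <stdint.h>

/* Values quoted in the commit message; the names are illustrative. */
enum {
    MAX_FILTERED_FRAMES = 15,
    MAX_PER_FRAME = 32      /* modifier (16) * filter_weight (2) */
};

/* Largest value the per-pixel count can accumulate. */
unsigned max_count(void)
{
    return (unsigned)MAX_FILTERED_FRAMES * (unsigned)MAX_PER_FRAME;
}

/* Smallest number of bits that can represent the unsigned value v. */
unsigned bits_needed(unsigned v)
{
    unsigned n = 0;
    while (v != 0) {
        n++;
        v >>= 1;
    }
    return n;
}
```

Since the worst case needs 9 bits, a 16-bit accumulator leaves ample headroom, which is what allows the SSE2 version to pack twice as many counts per register as it could with 32-bit values.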
2010-12-28 | Use the fast quantizer for inter mode selection | Scott LaVarnway
Use the fast quantizer for inter mode selection and the regular quantizer for the rest of the encode for good quality, speed 1. Both performance and quality were improved. The quality gains will make up for the quality loss mentioned in I9dc089007ca08129fb6c11fe7692777ebb8647b0. Change-Id: Ia90bc9cf326a7c65d60d31fa32f6465ab6984d21
2010-12-13 | remove unused temporal preproc code | John Koleszar
This code is unused, as the current preproc implementation uses the same spatial filter that postproc uses. Change-Id: Ia06d5664917d67283f279e2480016bebed602ea7
2010-12-09 | vp8 fast quantizer sse2 optimizations for eob. | Fritz Koenig
Changed the end of block computation to use pmaxw. Removed additional pushing and popping of registers that was not needed. Change-Id: I08cb9b424513cd8a2c7ad8cea53b4e2adc66ef98
2010-11-15 | Remove stack shadowing for x86-x64 for SAD functions. | Fritz Koenig
x86-64 passes arguments in registers. There is no need to push them to the stack before using them. This fixes 15acc84f10cefd98b2f8dbd2eac2cc92c5a3f851 where ebx was not getting preserved on x86. Change-Id: I1214b5f818a0201f75ab6ad7d5c6f448e09b16c2
2010-11-11 | Revert "Remove stack shadowing for x86-64" | Fritz Koenig
This reverts commit 15acc84f10cefd98b2f8dbd2eac2cc92c5a3f851. Change-Id: Ia640be8cbc134432914849c1750f62575ea084e6
2010-11-10 | Merge "Remove stack shadowing for x86-64" | Fritz Koenig
2010-11-10 | FDCT optimizations. | Fritz Koenig
Fixed up the fdct for mmx and 8x4 sse2 to match the most recent changes. Change-Id: Ibee2d6c536fe14dcf75cd6eb1c73f4848a56d719
2010-11-01 | SSSE3 version of fast quantizer | Scott LaVarnway
(test clip: tulip) For good quality mode with speed=1, this gave the encoder a small (2 - 3%) performance boost. Change-Id: I8a1d4269465944ac0819986c2f0be4b0a2ee0b35
2010-10-28 | Save XMM registers in asm functions | Yunqing Wang
XMM6/7 are used in these functions, and need to be saved. Change-Id: I3dfaddaf2a69cd4bf8e8735c7064b17bac5a14e5
2010-10-28 | Fix full-search SAD function crash in Visual Studio | Yunqing Wang
Unlike GCC, the Visual Studio compiler doesn't allocate the SAD output array 16-byte aligned, which causes a crash in Visual Studio. Change-Id: Ia755cf5a807f12929bda8db94032bb3c9d0c2362
2010-10-27 | Full search SAD function optimization in SSE4.1 | Yunqing Wang
Use mpsadbw and calculate 8 SADs at once. Function list: vp8_sad16x16x8_sse4, vp8_sad16x8x8_sse4, vp8_sad8x16x8_sse4, vp8_sad8x8x8_sse4, vp8_sad4x4x8_sse4. (Test clip: tulip.) For best quality mode, this gave the encoder a 5% performance boost. For good quality mode with speed=1, this gave the encoder a 3% performance boost. Change-Id: I083b5a39d39144f88dcbccbef95da6498e490134
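As a rough scalar reference for what "8 SADs at once" computes, the sketch below evaluates the sum of absolute differences of one 16x16 source block against eight candidate positions in the reference frame. It assumes the eight candidates are consecutive horizontal positions; `sad_block` and `sad16x16x8_ref` are hypothetical names, not the functions listed in the commit.

```c
#include <stdint.h>
#include <stdlib.h>

/* Sum of absolute differences between a w x h source block and a reference block. */
unsigned sad_block(const uint8_t *src, int src_stride,
                   const uint8_t *ref, int ref_stride, int w, int h)
{
    unsigned sad = 0;
    int r, c;
    for (r = 0; r < h; r++)
        for (c = 0; c < w; c++)
            sad += (unsigned)abs((int)src[r * src_stride + c] -
                                 (int)ref[r * ref_stride + c]);
    return sad;
}

/* Hypothetical reference: SADs against ref + 0 .. ref + 7 (assumed layout).
 * ref must provide 16 rows of at least 23 readable pixels. */
void sad16x16x8_ref(const uint8_t *src, int src_stride,
                    const uint8_t *ref, int ref_stride,
                    unsigned sad_out[8])
{
    int i;
    for (i = 0; i < 8; i++)
        sad_out[i] = sad_block(src, src_stride, ref + i, ref_stride, 16, 16);
}
```

A vectorized version produces the same eight numbers; the gain comes from sharing the loads of the reference row across all eight candidates instead of re-reading it per position.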
2010-10-27 | Fix half-pixel variance RTCD functions | John Koleszar
This patch fixes the system dependent entries for the half-pixel variance functions in both the RTCD and non-RTCD cases:
- The generic C versions of these functions are now correct. Before, all three cases called the hv code.
- Wire up the ARM functions in RTCD mode.
- Created stubs for x86 to call the optimized subpixel functions with the correct parameters, rather than falling back to C code.
Change-Id: I1d937d074d929e0eb93aacb1232cc5e0ad1c6184
2010-10-25 | add missing GET_GOT/RESTORE_GOT pairs | John Koleszar
These functions made global references but did not set up the GOT, causing compilation failures in PIC mode. Change-Id: Iac473bf46733f87eb2e001cd736af4acf73fa51d
2010-10-21 | Convert [4][4] matrices to [16] arrays. | Timothy B. Terriberry
Most of the code that actually uses these matrices indexes them as if they were a single contiguous array, and coverity produces reports about the resulting accesses that overflow the static bounds of the first row. This is perfectly legal in C, but converting them to actual [16] arrays should eliminate the report, and removes a good deal of extraneous indexing and address operators from the code. Change-Id: Ibda479e2232b3e51f9edf3b355b8640520fdbf23
2010-10-21 | Add MMWORD PTR/XMMWORD PTR in subtract_sse2.asm | Yunqing Wang
Change-Id: Ia649b500ef020225d8bbf611799d0f47658dc2ac
2010-10-21 | Remove stack shadowing for x86-64 | Fritz Koenig
x86-64 passes most arguments in registers. There is no need to push them to the stack before using them. Change-Id: I13c683f1358782682ecafaf1df3fb0af23b978ea
2010-10-21 | Rewrite vp8_short_walsh4x4_sse2() | Yunqing Wang
This rewrite reflects changes made in commit "Improve the accuracy of forward walsh-hadamard transform". Since this function is not called much, only a small encoder performance gain (~0.5%) is seen. Change-Id: Ie9df58a43028a11fd5b115c4bbe3141f7596578b
2010-10-18 | Add SSE2 subtract functions | Yunqing Wang
Instead of doing 8-bit data unpack and 16-bit subtraction, use psubb to do 16 8-bit subtractions and pcmpgtb to preserve the sign information. This does not bring a noticeable gain since these functions are not called frequently. Change-Id: I90a0dfaa3db9d422e4ada324076596ffb178548e
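The idea behind this commit can be modelled per byte in portable C. The sketch assumes the byte-wise difference supplies the low half of each 16-bit result and a comparison mask supplies the high half; it does not reproduce the actual register sequence in the assembly, and the names `diff_from_bytes` and `count_mismatches` are hypothetical.

```c
#include <stdint.h>

/* Scalar model of one lane: rebuild the signed 16-bit difference src - pred
 * from an 8-bit wrapping subtraction plus a one-byte sign mask. */
int16_t diff_from_bytes(uint8_t src, uint8_t pred)
{
    uint8_t low = (uint8_t)(src - pred);          /* like psubb: difference modulo 256 */
    uint8_t high = (pred > src) ? 0xFF : 0x00;    /* like a compare mask: all ones if negative */
    uint16_t bits = (uint16_t)(low | (high << 8));

    /* Interpret the 16 bits as two's complement without relying on
     * implementation-defined narrowing. */
    return (int16_t)((int)bits - ((bits & 0x8000u) ? 65536 : 0));
}

/* Checks the model against plain integer subtraction for every byte pair. */
int count_mismatches(void)
{
    int s, p, bad = 0;
    for (s = 0; s < 256; s++)
        for (p = 0; p < 256; p++)
            if (diff_from_bytes((uint8_t)s, (uint8_t)p) != s - p)
                bad++;
    return bad;
}
```

The model works because a negative difference of two bytes always lies in -255..-1, so its 16-bit two's complement form has a high byte of 0xFF and a low byte equal to the wrapped 8-bit difference.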
2010-10-13 | Fix compiler warning about vp8_fast_quantize_b_impl_ssse2. | Fritz Koenig
Typo had function defined as _ssse2 and prototyped as _sse2. Change-Id: If9f19da1a83cff40774a90cf936d601c0bf1b7fe
2010-10-13 | Correct QWORD usage in assembly files | Fritz Koenig
QWORD was being undefined because it was being used incorrectly. Change-Id: I3610cefa3d6f0da4054316760f78b9694cde3876
2010-10-12 | Merge "Add const qualifiers to variance/SAD functions." | John Koleszar
2010-10-12 | Add const qualifiers to variance/SAD functions. | Timothy B. Terriberry
These functions should never change their input, and there's no reason not to declare that. This allows them to be passed static const data. Change-Id: Ia49fe4b01e80e9afcb24b4844817694d4da5995c
2010-10-11 | Merge "Added vp8_fast_quantize_b_sse2" | Scott LaVarnway
2010-10-07 | Remove unused file in encoder | Yunqing Wang
Remove vp8/encoder/x86/csystemdependent.c Change-Id: I7c590dcd07b68704d463a1452f62f29ffb1402f4
2010-10-07 | Added vp8_fast_quantize_b_sse2 | Scott LaVarnway
Moved vp8_fast_quantize_b_sse from quantize_mmx.asm into quantize_sse2.asm and renamed. Updated the assembly code to match the C version. Change-Id: I1766d9e1ca60e173f65badc0ca0c160c2b51b200
2010-10-04 | nasm: address labels 'rel label' vice 'wrt rip' | Jan Kratochvil
nasm does not support `label wrt rip', it requires `rel label'. It is still fully compatible with yasm. Provide nasm compatibility. No binary change by this patch with yasm on {x86_64,i686}-fedora13-linux-gnu. Few longer opcodes with nasm on {x86_64,i686}-fedora13-linux-gnu have been checked as safe. Change-Id: I488773a4e930a56e43b0cc72d867ee5291215f50
2010-10-04 | nasm: match instruction length (movd/movq) to parameters | Jan Kratochvil
nasm requires the instruction length (movd/movq) to match to its parameters. I find it more clear to really use 64bit instructions when we use 64bit registers in the assembly. Provide nasm compatibility. No binary change by this patch with yasm on {x86_64,i686}-fedora13-linux-gnu. Few longer opcodes with nasm on {x86_64,i686}-fedora13-linux-gnu have been checked as safe. Change-Id: Id9b1a5cdfb1bc05697e523c317a296df43d42a91
2010-09-09 | Use WebM in copyright notice for consistency | John Koleszar
Changes 'The VP8 project' to 'The WebM project', for consistency with other webmproject.org repositories. Fixes issue #97. Change-Id: I37c13ed5fbdb9d334ceef71c6350e9febed9bbba
2010-08-02 | nasm: end labels with colon (':') | Jan Kratochvil
Labels should end by colon (':'), nasm requires it. Provide nasm compatibility. No binary change by this patch with yasm on {x86_64,i686}-fedora13-linux-gnu. Few longer opcodes with nasm on {x86_64,i686}-fedora13-linux-gnu have been checked as safe. Change-Id: I0b2ec6f01afb061d92841887affb5ca0084f936f