path: root/vp9/common/x86
Age | Commit message | Author
2013-02-13 | WIP: ssse3 version of convolve avg functions | Scott LaVarnway
Initial ssse3 convolve avg functions; this is one step closer to using x86inc.asm. The decoder performance improved by 8% for the test clip used. This should be revisited later to see if averaging outside the loop is better than having many similar filter functions. Change-Id: Ice3fafb423b02710b0448ffca18b296bcac649e9
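As a rough illustration of what an "avg" convolve variant does, the sketch below assumes the filtered prediction is combined with the pixels already in the destination by a rounded average. The function name `convolve_avg_row` and the `(a + b + 1) >> 1` rounding are assumptions made for illustration; they are not taken from the commit.

```python
def convolve_avg_row(dst, pred):
    """Average a filtered prediction into an existing destination row.

    Assumption: each output is the rounded mean (a + b + 1) >> 1 of the
    existing destination pixel and the new prediction pixel.
    """
    if len(dst) != len(pred):
        raise ValueError("rows must have the same length")
    return [(d + p + 1) >> 1 for d, p in zip(dst, pred)]


existing = [10, 200, 255, 0]      # pixels already written to the destination
prediction = [20, 100, 255, 1]    # output of the subpixel filter
averaged = convolve_avg_row(existing, prediction)
print(averaged)
```

Doing the average as a separate pass like this corresponds to the "averaging outside the loop" alternative that the commit message says should be revisited.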
2013-02-09 | Bug fix: ssse3 version of subpixel did not match C code | Scott LaVarnway
A 16-bit overflow condition occurs when using the EIGHTTAP_SMOOTH filters (vp9_sub_pel_filters_8lp). Changed the order of the adds to fix this problem. Also added ssse3 support for 4x4 subpixel filtering. Change-Id: I475eaadae920794c2de5e01e9735c059a856518e
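To see why the order of the adds can matter, here is a minimal model of signed saturating 16-bit addition, in the style of the x86 `paddsw` instruction. The kernel values and the flat white input row are illustrative assumptions, and the "balanced" ordering is only one ordering that avoids the problem; the commit does not state which ordering was actually adopted.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def sat_add16(a, b):
    """Model one lane of a signed saturating 16-bit add."""
    return max(INT16_MIN, min(INT16_MAX, a + b))


# Illustrative 8-tap low-pass kernel (taps sum to 128) and a flat white row.
taps = [-3, -1, 32, 64, 38, 1, -3, 0]
pixels = [255] * 8

# Adjacent products summed in pairs, as a pmaddubsw-style step would produce.
pairs = [taps[i] * pixels[i] + taps[i + 1] * pixels[i + 1] for i in range(0, 8, 2)]

exact = sum(pairs)  # fits in 16 bits: 128 * 255

# Adding the two large centre pairs first saturates the intermediate sum.
centre_first = sat_add16(sat_add16(sat_add16(pairs[1], pairs[2]), pairs[0]), pairs[3])

# Pairing each large term with a negative term keeps intermediates in range.
balanced = sat_add16(sat_add16(pairs[0], pairs[1]), sat_add16(pairs[3], pairs[2]))

print(exact, centre_first, balanced)
```

The final result fits in 16 bits either way; only the intermediate sum of the two positive centre pairs exceeds the range, which is why reordering alone is enough in this model.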
2013-02-08 | Merge changes Ife0d8147,I7d469716,Ic9a5615f into experimental | John Koleszar
* changes:
  Restore SSSE3 subpixel filters in new convolve framework
  Convert subpixel filters to use convolve framework
  Add 8-tap generic convolver
2013-02-08 | Restore SSSE3 subpixel filters in new convolve framework | John Koleszar
This commit adds the 8 tap SSSE3 subpixel filters back into the code underneath the convolve API. The C code is still called for 4x4 blocks, as well as compound prediction modes. This restores the encode performance to be within about 8% of the baseline. Change-Id: Ife0d81477075ae33c05b53c65003951efdc8b09c
2013-02-06 | Use configure checks for various inline keywords. | Ronald S. Bultje
Change-Id: I8508f1a3d3430f998bb9295f849e88e626a52a24
2013-02-05 | Convert subpixel filters to use convolve framework | John Koleszar
Update the code to call the new convolution functions to do subpixel prediction rather than the existing functions. Remove the old C and assembly code, since it is unused. This causes a 50% performance reduction on the decoder, but that will be resolved when the asm for the new functions is available.
There is no consensus on whether 6-tap or 2-tap predictors will be supported in the final codec, so these filters are implemented in terms of the 8-tap code, so that quality testing of these modes can continue. Implementing the lower complexity algorithms is a simple exercise, should it be necessary.
This code produces slightly better results in the EIGHTTAP_SMOOTH case, since the filter is now applied in only one direction when the subpel motion is only in one direction. Like the previous code, the filtering is skipped entirely on full-pel MVs. This combination seems to give the best quality gains, but this may be indicative of a bug in the encoder's filter selection, since the encoder could achieve the result of skipping the filtering on full-pel by selecting one of the other filters. This should be revisited.
Quality gains on derf are positive on almost all clips. The only clip that seemed to be hurt at all datarates was football (-0.115% PSNR average, -0.587% min). Overall averages: 0.375% PSNR, 0.347% SSIM.
Change-Id: I7d469716091b1d89b4b08adde5863999319d69ff
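The idea of implementing a shorter filter "in terms of the 8-tap code" can be sketched as zero-padding the short kernel to eight taps and running it through a generic 8-tap convolver. Everything named below (`convolve8_row`, `bilinear_as_8tap`, `predict_row`, the 7-bit tap scaling, the 1/16-pel steps, and the edge replication) is an assumption for illustration, not the codec's actual API, and the sketch covers a single 1-D direction only.

```python
FILTER_BITS = 7  # assumption: taps are scaled so that they sum to 128


def clip_pixel(value):
    """Clamp a filtered value to the 8-bit pixel range."""
    return max(0, min(255, value))


def convolve8_row(src, kernel):
    """Apply an 8-tap kernel along a row, replicating the edge pixels."""
    if len(kernel) != 8 or sum(kernel) != 1 << FILTER_BITS:
        raise ValueError("expected 8 taps summing to 128")
    n = len(src)
    out = []
    for x in range(n):
        acc = 0
        for k, tap in enumerate(kernel):
            idx = min(max(x + k - 3, 0), n - 1)  # taps cover x-3 .. x+4
            acc += tap * src[idx]
        out.append(clip_pixel((acc + (1 << (FILTER_BITS - 1))) >> FILTER_BITS))
    return out


def bilinear_as_8tap(frac, steps=16):
    """Embed a 2-tap bilinear filter in an 8-tap kernel by zero padding."""
    b = 128 * frac // steps
    return [0, 0, 0, 128 - b, b, 0, 0, 0]


def predict_row(src, frac):
    """Skip filtering for a full-pel offset, otherwise run the 8-tap path."""
    if frac == 0:
        return list(src)
    return convolve8_row(src, bilinear_as_8tap(frac))


row = [0, 0, 100, 100, 200, 200]
full_pel = predict_row(row, 0)   # copied unchanged
half_pel = predict_row(row, 8)   # bilinear half-pel through the 8-tap code
print(full_pel, half_pel)
```

A 2-D predictor built this way would call the row filter only for the direction whose fractional offset is non-zero, which is the single-direction behavior the commit message describes.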
2013-01-23 | Intrinsic version of loopfilter now matches C code | Scott LaVarnway
Updated the intrinsic code to match Yaowu's latest loopfilter change. (I584393906c4f5f948a581d6590959522572743bb) The decoder performance improved by ~30% for the test clip used. Change-Id: I026cfc75d5bcb7d8d58be6f0440ac9e126ef39d2
2013-01-14 | fix a number of issues that cause failures | Yaowu Xu
During master jenkins verification process. Change-Id: I3722b8753eaf39f99b45979ce407a8ea0bea0b89
2013-01-14 | Merge experiment "widerlpf" | Yaowu Xu
Change-Id: I0c94475075e66e13cfe4c20fab7db6474441ae86
2013-01-11 | Merge "WIP: Added sse2 version of vp9_mb_lpf_horizontal_edge_w" into experimental | Jim Bankoski
2013-01-11 | WIP: Added sse2 version of vp9_mb_lpf_horizontal_edge_w | Scott LaVarnway
and vp9_mb_lpf_vertical_edge_w_sse2. This was quickly done so we can run some tests over the weekend. Future commits will optimize/refactor these functions further. The decoder performance improved by ~17% for the clip used. Change-Id: I612687cd5a7670ee840a0cbc3c68dc2b84d4af76
2013-01-11 | Merge "Add loop filtering for UV plane" into experimental | Yaowu Xu
2013-01-11 | Add loop filtering for UV plane | Yaowu Xu
On block boundaries within a MB, when only the 8x8 block boundaries are filtered for Y. Change-Id: Ie1c804c877d199e78e2fecd8c2d3f1e114ce9ec1
2013-01-11 | Initial sse2 version of the wide loopfilters | Scott LaVarnway
Updated the rtcd_defs and used the sse2 uv version of the loopfilter. The performance improved by ~8% for the test clip used. Change-Id: I5a0bca3b6674198d40ca4a77b8cc722ddde79c36
2013-01-08 | Merge "vp9_sub_pixel_variance16x2 SSE2 optimization" into experimental | Yunqing Wang
2013-01-08 | vp9_sub_pixel_variance16x2 SSE2 optimization | Yunqing Wang
About 5% decoder speedup. Change-Id: Ib6687d337af758a536a0e7e289f400990f1f9794
2013-01-08 | Merge vp9-preview changes into experimental branch | John Koleszar
Incorporate vp9-preview changes by merging master branch into experimental.
Conflicts:
  test/test.mk
  vp9/common/vp9_filter.c
  vp9/common/vp9_idctllm.c
  vp9/common/vp9_invtrans.h
  vp9/common/vp9_mbpitch.c
  vp9/common/vp9_rtcd_defs.sh
  vp9/common/vp9_systemdependent.h
  vp9/common/vp9_type_aliases.h
  vp9/common/x86/vp9_asm_stubs.c
  vp9/common/x86/vp9_subpixel_mmx.asm
  vp9/decoder/vp9_decodframe.c
  vp9/decoder/vp9_dequantize.c
  vp9/decoder/vp9_dequantize.h
  vp9/decoder/vp9_onyxd_int.h
  vp9/encoder/vp9_bitstream.c
  vp9/encoder/vp9_encodeframe.c
  vp9/encoder/vp9_rdopt.c
Change-Id: I17f51c3666d1b59cf1a699f87607cbc5d30a87c5
2012-12-26 | Build fixes to merge vp9-preview into master | John Koleszar
Various fixups to resolve issues when building vp9-preview under the more stringent checks placed on the experimental branch. Change-Id: I21749de83552e1e75c799003f849e6a0f1a35b07
2012-12-21 | Removed mmx versions of vp9_bilinear_predict filters | Scott LaVarnway
These filters will not work with VP9. Change-Id: Ic26c77961084fcea6bfa97f4cd95afdea2282e85
2012-12-21 | Merge "add emmintrin_compat.h for builds with gcc < 4" into vp9-preview | John Koleszar
2012-12-21 | fixed sizes of global arrays | Jim Bankoski
Change-Id: Ibc077cf1c1da0c86063f88c6d3073c6876989119
2012-12-20 | add emmintrin_compat.h for builds with gcc < 4 | James Zern
Change-Id: If7822e6fcd0d3568b934032322b19ba3e401df26
2012-12-20 | add private to assembly files to ensure proper chromebuild | Jim Bankoski
Change-Id: I6e43ca73f35401a974ed8ee27738d4318f09fd37
2012-12-03 | Merge "fixes --disable-vp9-encoder" into vp9-preview | Jim Bankoski
2012-12-03 | fixes --disable-vp9-encoder | Jim Bankoski
Change-Id: I467bf0fdf3b35326bcce58d5459e6d2dbfd6c5e5
2012-11-30 | google style guide include guards | Jim Bankoski
Change-Id: I2c252f3ddcc99e96c1f5d3dab8bcb25a2a3637ea
2012-11-29 | Merge "Further improve macroblock loop filters" into experimental | Yunqing Wang
2012-11-29 | fix vp9_vp8 files renamed | Jim Bankoski
Change-Id: I20c426e91ee49666db42e20eb074095ab6b8ec5d
2012-11-28 | more rtcd cleanup | Jim Bankoski
Change-Id: Ieefd76e164ca4aa87597da0412977614ddfbacb7
2012-11-28 | remove postproc invokes | Jim Bankoski
and some miscellaneous invoke leftovers Change-Id: I63191b1bfd3bea4ce30cceaeb686ec850570fc43
2012-11-28 | Further improve macroblock loop filters | Yunqing Wang
This change included:
1. Aligned reads in vp9_mbloop_filter_vertical_edge function. Since we actually read 16 bytes, we can align the reads to read starting at (s - 8) instead of (s - 5).
2. Combined u, v loop filters.
3. Added 8x16 transpose.
This gave 2% decoder performance gain (tulip clip).
Change-Id: Ib14c2f1645c4a3436df17fe2f24789506bf0bb58
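The aligned-read point in item 1 can be checked with simple offset arithmetic. The sketch below assumes the edge offset `s` is a multiple of 8 and that the filter needs four pixels on each side of the edge; both are illustrative assumptions, not details confirmed by the commit.

```python
def window(start, length=16):
    """Byte offsets touched by a read of `length` bytes beginning at `start`."""
    return range(start, start + length)


def covers(win, needed):
    """True if every needed offset falls inside the read window."""
    return all(offset in win for offset in needed)


s = 64                          # hypothetical edge offset, a multiple of 8
needed = range(s - 4, s + 4)    # assumption: 4 pixels on each side of the edge

old_read = window(s - 5)        # 16-byte read starting at s - 5
new_read = window(s - 8)        # 16-byte read starting at s - 8

old_ok = covers(old_read, needed)
new_ok = covers(new_read, needed)
old_aligned = (s - 5) % 8 == 0
new_aligned = (s - 8) % 8 == 0
print(old_ok, new_ok, old_aligned, new_aligned)
```

Under these assumptions both windows contain every pixel the filter touches, but only the read starting at `s - 8` begins on an 8-byte boundary.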
2012-11-27 | Add vp9_ prefix to all vp9 files | John Koleszar
Support for gyp which doesn't support multiple objects in the same static library having the same basename. Change-Id: Ib947eefbaf68f8b177a796d23f875ccdfa6bc9dc
2012-11-26 | Improve sad3x16 SSE2 function | Yunqing Wang
vp9_sad3x16_sse2() is called heavily in the decoder, where its unaligned reads consume a lot of CPU cycles. When CONFIG_SUBPELREFMV is off, the unaligned offset is 1. In this situation, we can adjust the src_ptr to be 4-byte aligned, and then do the aligned reads. This reduced the reading time significantly. Tests on a 1080p clip showed over 2% decoder performance gain with CONFIG_SUBPELREFMV off. Change-Id: I953afe3ac5406107933ef49d0b695eafba9a6507
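The pointer adjustment described above can be modeled as follows: a plain 3x16 sum of absolute differences, and a variant that fetches each source row as one 4-byte aligned group and discards the byte outside the block. This is a sketch under stated assumptions (the stride is a multiple of 4 and the block starts at most one byte past a 4-byte boundary); the function names and buffer layout are hypothetical.

```python
def sad3x16_direct(src, src_stride, ref, ref_stride, src_off=0, ref_off=0):
    """Sum of absolute differences over a 3-wide, 16-high block."""
    total = 0
    for row in range(16):
        for col in range(3):
            a = src[src_off + row * src_stride + col]
            b = ref[ref_off + row * ref_stride + col]
            total += abs(a - b)
    return total


def sad3x16_aligned(src, src_stride, ref, ref_stride, src_off=0, ref_off=0):
    """Same sum, but each source row is fetched as one 4-byte aligned group.

    Assumptions: src_stride is a multiple of 4 and the block starts at most
    one byte past a 4-byte boundary, so the 3 pixels fit in the aligned group.
    """
    skip = src_off % 4
    if src_stride % 4 != 0 or skip > 1:
        raise ValueError("layout not supported by this sketch")
    base = src_off - skip
    total = 0
    for row in range(16):
        start = base + row * src_stride
        group = src[start:start + 4]     # models one aligned 4-byte load
        pixels = group[skip:skip + 3]    # drop the byte outside the block
        for col, a in enumerate(pixels):
            total += abs(a - ref[ref_off + row * ref_stride + col])
    return total


stride = 8
src = [(i * 7) % 256 for i in range(stride * 17)]
ref = [(i * 13 + 5) % 256 for i in range(stride * 17)]
direct = sad3x16_direct(src, stride, ref, stride, src_off=1)
aligned = sad3x16_aligned(src, stride, ref, stride, src_off=1)
print(direct == aligned)
```

In real SIMD code the benefit comes from the load instruction itself; this model only shows that the realigned fetch yields the same pixels and therefore the same sum.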
2012-11-24 | removed the idct rtcd idct calls | Jim Bankoski
More cleanup to do after this, but this is a good chunk of removing rtcd. Change-Id: I551db75e341a0a85c3ad650df1e9a60dc305681a
2012-11-21 | remove subpixel invoke functions | Jim Bankoski
Removed the rtcd subpixel invoke functions. Change-Id: I8b7618bd5813333fac66b2817bdf807616e0fb33
2012-11-20 | Fix ref_stride in sad function | Yunqing Wang
Used ref_stride. Change-Id: I31f0a3bb935520f54d11a1d87315627f162ae845
2012-11-15 | support building vp8 and vp9 into a single lib | John Koleszar
Change-Id: Ib8f8a66c9fd31e508cdc9caa662192f38433aa3d
2012-11-07 | merge full pixel refmv experiment | Yaowu Xu
Change-Id: Ib39ad47a7d188f3b45416937b7eeb28c3e79b74c
2012-11-03 | loopfilter: prevent signed integer overflow | James Zern
use unsigned ints to extend filter values in vp9_mbloop_filter_horizontal_edge_c_sse2 Change-Id: I55ec3ac2bcb9baf55626b0384d151b07fc8e087d
2012-11-01 | Rename vp8/ codec directory to vp9/. | Ronald S. Bultje
Change-Id: Ic084c475844b24092a433ab88138cf58af3abbe4