summaryrefslogtreecommitdiff
path: root/vp9/common/x86
AgeCommit message (Collapse)Author
2013-06-12Quick modifications to mb loopfilter intrinsic functionsScott LaVarnway
Modified to work with 8x8 blocks of memory. Will revisit later for further optimizations. For the HD clip used, the decoder improved by almost 20%. Change-Id: Iaa4785be293a32a42e8db07141bd699f504b8c67
2013-06-12Merge "Quick modifications to wide loopfilter intrinsic functions"Yaowu Xu
2013-06-12Quick modifications to wide loopfilter intrinsic functionsScott LaVarnway
Modified to work with 8x8 blocks of memory. Will revisit later for further optimizations. For the HD clip used, the decoder improved my 20%. Change-Id: Ia0057f55d66d1445882351ea6c43b595a5a980e5
2013-06-12Remove some unused loopfilter codeJohn Koleszar
This code is unreachable, and not useful for later reference. Change-Id: I4c9a9e0fbf859c1081bbcfbcda9710afb4b4741f
2013-05-21Removed unused idct functionsScott LaVarnway
No longer used. Change-Id: Id28c9247cebba183c6fa786dff96824ae100132c
2013-05-21Removed vp9_recon functionsScott LaVarnway
No longer used. Change-Id: Ica5166f7117f4693dffdf7633dcfc1b263103d0d
2013-05-20WIP: 4x4 idct/recon mergeScott LaVarnway
This patch eliminates the intermediate diff buffer usage by combining the short idct and the add residual into one function. The encoder can use the same code as well. Change-Id: I296604bf73579c45105de0dd1adbcc91bcc53c22
2013-05-16WIP: 8x8 idct/recon mergeScott LaVarnway
This patch eliminates the intermediate diff buffer usage by combining the short idct and the add residual into one function. The encoder can use the same code as well. Change-Id: Iacfd57324fbe2b7beca5d7f3dcae25c976e67f45
2013-05-15WIP: 16x16 idct/recon mergeScott LaVarnway
This patch eliminates the intermediate diff buffer usage by combining the short idct and the add residual into one function. The encoder can use the same code as well. Change-Id: Iea7976b22b1927d24b8004d2a3fddae7ecca3ba1
2013-05-14WIP: 32x32 idct/recon mergeScott LaVarnway
This patch eliminates the intermediate diff buffer usage by combining the short idct and the add residual into one function. The encoder can use the same code as well. Change-Id: I4ea09df0e162591e420d869b7431c2e7f89a8c1a
2013-05-10Removing unused simple loopfilter code.Dmitry Kovalev
Change-Id: Ic11dc052fb641687c015e1bbc37181b9babcd43e
2013-04-26Merge branch 'master' into experimentalJohann
Conflicts: vp9/common/vp9_findnearmv.c vp9/common/vp9_rtcd_defs.sh vp9/decoder/vp9_decodframe.c vp9/decoder/x86/vp9_dequantize_sse2.c vp9/encoder/vp9_rdopt.c vp9/vp9_common.mk Resolve file name changes in favor of master. Resolve rdopt changes in favor of experimental, preserving the newer experiments. Change-Id: If51ed8f457470281c7b20a5c1a2f4ce2cf76c20f
2013-04-25Rename vp9_idct_x86.cJohann
Remove similarly named header file. It is obsolete. Move file to match naming style. Adjust make file to include the file correctly and remove extra unnecessary #if guard. Change-Id: Ifba07ba9938a5df08a9f4eda54a3ac4d6983f7bf
2013-04-19Move dst to per-plane MACROBLOCKD dataJohn Koleszar
First in a series of commits moving the framebuffers pointers to per-plane data, so that they can be indexed numerically rather than by name. Change-Id: I6e0d60fd4d51e6375c384eb7321776564df21775
2013-04-19Use SSSE3 for 2d filters larger than 16John Koleszar
The C code was being used as a fallback for the >16 case, but only for 2D. Change-Id: I1e2e6da9e4b28bd88bde9ba4dd32724ce466cf6f
2013-04-18Merge "Make the use of pred buffers consistent in MB/SB" into experimentalJingning Han
2013-04-18Make the use of pred buffers consistent in MB/SBJingning Han
Use in-place buffers (dst of MACROBLOCKD) for macroblock prediction. This makes the macroblock buffer handling consistent with those of superblock. Remove predictor buffer MACROBLOCKD. Change-Id: Id1bcd898961097b1e6230c10f0130753a59fc6df
2013-04-18convolve: support larger blocks, fix asm saturation bugJohn Koleszar
Updates the common convoloution code to support blocks larger than 16x16, and rectangular blocks. This uncovered a bug in the SSSE3 filtering routines due to the order of application of saturation. This commit fixes that bug, adjusts the unit test to bias its random values towards the extremes, and adds a test to ensure that all filters conform to the expected pairwise addition structure. Change-Id: I81f69668b1de0de5a8ed43f0643845641525c8f0
2013-04-17clean out experimentsYaowu Xu
that are related to using reconstructed pixel for selecting reference motion vectors. Change-Id: I048dfae39ca7385e344b57d46347ecc6e753e1bb
2013-04-16Merge branch 'experimental' into masterJohn Koleszar
VP9 preview bitstream 2, commit '868ecb55a1528ca3f19286e7d1551572bf89b642' Conflicts: vp9/vp9_common.mk Change-Id: I3f0f6e692c987ff24f98ceafbb86cb9cf64ad8d3
2013-04-02Merge branch 'master' into experimentalJohn Koleszar
Conflicts: vp9/vp9_common.mk Change-Id: I2cd5ab47dc31c4210cefc23a282102123d5e2221
2013-04-02Demux vp9_loopfilter_x86.cJohann
Allow more careful targeting of compiler flags. Change-Id: I963ab4a6479dedb165419310dfca52a58a9877b8
2013-04-02vp9_sadmxn_x86 only contains SSE2 functionsJohann
Rename the file and clean up includes. In the future we would like to pattern match the files which need additional compiler flags. Change-Id: I2c76256467f392a78dd4ccc71e6e0a580e158e56
2013-03-27Modify idct code to use macroYunqing Wang
Small modification of idct code. Change-Id: I5c4e3223944c68e4ccf762f6cf07c990250e4290
2013-03-27Optimize 32x32 idct functionYunqing Wang
Wrote sse2 version of vp9_short_idct_32x32 function. Compared to c version, the sse2 version is 5X faster. Change-Id: I071ab7378358346ab4d9c6e2980f713c3c209864
2013-03-21Optimize 16x16 idct10 functionYunqing Wang
Wrote sse2 version of vp9_short_idct10_16x16 function. Compared to c version, the sse2 version is 2.3X faster. Change-Id: I314c4f09369648721798321eeed6f58e38857f26
2013-03-21Merge "Optimize 16x16 idct function" into experimentalYunqing Wang
2013-03-21Optimize 16x16 idct functionYunqing Wang
Wrote sse2 version of vp9_short_idct16x16 function. Compared to c version, the sse2 version is over 2.5X faster. Change-Id: I38536e2b846427a2cc5c5423aaf305fd0e605d61
2013-03-20Code cleanup: lower case variable names.Dmitry Kovalev
Renaming Width to width, Height to height and Version to version in several structs and function signatures. Change-Id: I084c3f7e747cb2ce3345aff27a3dff9b13a87543
2013-03-18Optimize 8x8 idct functionYunqing Wang
Wrote sse2 functions of vp9_short_idct8x8 and vp9_short_idct10_8x8. Compared to c version, the sse2 version is 2X faster. The decoder test didn't show noticeable gain since 8x8 idct doesn't take much of decoding time (less than 1% in my test). Change-Id: I56313e18cd481700b3b52c4eda5ca204ca6365f3
2013-03-13removed reference to "LLM" and "x8"Yaowu Xu
The commit changed the name of files and function to remove obselete reference to LLM and x8. Change-Id: I973b20fc1a55149ed68b5408b3874768e6f88516
2013-03-08Add vp9_idct4_1d_sse2Yunqing Wang
Added SSE2 idct4_1d which is called by vp9_short_iht4x4. Also, modified the parameter type passed to vp9_short_iht functions to make it work with rtcd prototype. Change-Id: I81ba7cb4db6738f1923383b52a06deb760923ffe
2013-03-04Merge "Optimize vp9_short_idct4x4llm function" into experimentalYunqing Wang
2013-03-04Optimize vp9_short_idct4x4llm functionYunqing Wang
Wrote a SSE2 vp9_short_idct4x4llm to improve the decoder performance. Change-Id: I90b9d48c4bf37aaf47995bffe7e584e6d4a2c000
2013-03-01Merge master branch into experimentalJohn Koleszar
Picks up some build system changes, compiler warning fixes, etc. Change-Id: I2712f99e653502818a101a72696ad54018152d4e
2013-02-27Merge "Remove unused file" into experimentalYunqing Wang
2013-02-27Remove unused fileYunqing Wang
Removed vp9_idctllm_mmx.asm Change-Id: I7152756f23a5a09ed69e8fb40edb2ab3237290fe
2013-02-27Fix --as=nasm compatibility for new asm code.Jan Kratochvil
s/movd/movq/ Change-Id: Id1a56de91551f8dc796f14f1056c565dfc1ba626
2013-02-26Optimize vp9_dc_only_idct_add_c functionYunqing Wang
Wrote SSE2 version of vp9_dc_only_idct_add_c function. In order to improve performance, clipped the absolute diff values to [0, 255]. This allowed us to keep the additions/subtractions in 8 bits. Test showed an over 2% decoder performance increase. Change-Id: Ie1a236d23d207e4ffcd1fc9f3d77462a9c7fe09d
2013-02-13WIP: ssse3 version of convolve avg functionsScott LaVarnway
Initial ssse3 convolve avg functions and is one step closer to using x86inc.asm. The decoder performance improved by 8% for the test clip used. This should be revisited later to see if averaging outside the loop is better than having many similar filter functions. Change-Id: Ice3fafb423b02710b0448ffca18b296bcac649e9
2013-02-09Bug fix: ssse3 version of subpixel did not match C codeScott LaVarnway
A 16 bit overflow condition occurs when using the EIGHTTAP_SMOOTH filters. (vp9_sub_pel_filters_8lp) Changed the order of the adds to fix this problem. Also added ssse3 support for 4x4 subpixel filtering. Change-Id: I475eaadae920794c2de5e01e9735c059a856518e
2013-02-08Merge changes Ife0d8147,I7d469716,Ic9a5615f into experimentalJohn Koleszar
* changes: Restore SSSE3 subpixel filters in new convolve framework Convert subpixel filters to use convolve framework Add 8-tap generic convolver
2013-02-08Restore SSSE3 subpixel filters in new convolve frameworkJohn Koleszar
This commit adds the 8 tap SSSE3 subpixel filters back into the code underneath the convolve API. The C code is still called for 4x4 blocks, as well as compound prediction modes. This restores the encode performance to be within about 8% of the baseline. Change-Id: Ife0d81477075ae33c05b53c65003951efdc8b09c
2013-02-06Use configure checks for various inline keywords.Ronald S. Bultje
Change-Id: I8508f1a3d3430f998bb9295f849e88e626a52a24
2013-02-05Convert subpixel filters to use convolve frameworkJohn Koleszar
Update the code to call the new convolution functions to do subpixel prediction rather than the existing functions. Remove the old C and assembly code, since it is unused. This causes a 50% performance reduction on the decoder, but that will be resolved when the asm for the new functions is available. There is no consensus for whether 6-tap or 2-tap predictors will be supported in the final codec, so these filters are implemented in terms of the 8-tap code, so that quality testing of these modes can continue. Implementing the lower complexity algorithms is a simple exercise, should it be necessary. This code produces slightly better results in the EIGHTTAP_SMOOTH case, since the filter is now applied in only one direction when the subpel motion is only in one direction. Like the previous code, the filtering is skipped entirely on full-pel MVs. This combination seems to give the best quality gains, but this may be indicative of a bug in the encoder's filter selection, since the encoder could achieve the result of skipping the filtering on full-pel by selecting one of the other filters. This should be revisited. Quality gains on derf positive on almost all clips. The only clip that seemed to be hurt at all datarates was football (-0.115% PSNR average, -0.587% min). Overall averages 0.375% PSNR, 0.347% SSIM. Change-Id: I7d469716091b1d89b4b08adde5863999319d69ff
2013-01-23Intrinsic version of loopfilter now matches C codeScott LaVarnway
Updated the instrinsic code to match Yaowu's latest loopfilter change. (I584393906c4f5f948a581d6590959522572743bb) The decoder performance improved by ~30% for the test clip used. Change-Id: I026cfc75d5bcb7d8d58be6f0440ac9e126ef39d2
2013-01-14fix a number issues that cause failuresYaowu Xu
During master jenkins verification proces Change-Id: I3722b8753eaf39f99b45979ce407a8ea0bea0b89
2013-01-14Merge experiment "widerlpf"Yaowu Xu
Change-Id: I0c94475075e66e13cfe4c20fab7db6474441ae86
2013-01-11Merge "WIP: Added sse2 version of vp9_mb_lpf_horizontal_edge_w" into ↵Jim Bankoski
experimental
2013-01-11WIP: Added sse2 version of vp9_mb_lpf_horizontal_edge_wScott LaVarnway
and vp9_mb_lpf_vertical_edge_w_sse2. This was quickly done so we can run some tests over the weekend. Future commits will optimize/refactor these functions further. The decoder performance improved by ~17% for the clip used. Change-Id: I612687cd5a7670ee840a0cbc3c68dc2b84d4af76