path: root/vp9/vp9_common.mk
AgeCommit messageAuthor
2014-03-05Removing vp9_onyx.h and moving its content to the encoder.Dmitry Kovalev
Change-Id: I03451c88536bc498edddbe0cd9773ff79da085c2
2014-03-03build: convert rtcd.sh to perlJames Zern
significantly speeds up file generation. the goal of this change is to convert rtcd.sh to perl as directly as possible to allow for simple comparison. future changes can make it more perl-like.
---
Linux
[CREATE] vpx_scale_rtcd.h real 0m0.485s -> 0m0.022s
[CREATE] vp8_rtcd.h real 0m4.619s -> 0m0.060s
[CREATE] vp9_rtcd.h real 0m10.102s -> 0m0.087s
Windows
[CREATE] vpx_scale_rtcd.h real 0m8.360s -> 0m0.080s
[CREATE] vp8_rtcd.h real 1m8.083s -> 0m0.160s
[CREATE] vp9_rtcd.h real 2m6.489s -> 0m0.233s
Change-Id: Idfb71188206c91237d6a3c3a81dfe00d103f11ee
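The timings quoted in this message can be turned into speedup factors with a little arithmetic. The following is a minimal sketch and not part of the commit: the helper names `to_seconds` and `speedup` are hypothetical, and the only inputs are the `real` timings listed in the message.

```python
import re

def to_seconds(stamp: str) -> float:
    """Convert a `time`-style stamp such as '2m6.489s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time stamp: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)

def speedup(before: str, after: str) -> float:
    """Ratio of the old (shell) time to the new (perl) time."""
    return to_seconds(before) / to_seconds(after)

# Timings copied from the commit message: (shell, perl).
timings = {
    ("Linux", "vpx_scale_rtcd.h"): ("0m0.485s", "0m0.022s"),
    ("Linux", "vp8_rtcd.h"): ("0m4.619s", "0m0.060s"),
    ("Linux", "vp9_rtcd.h"): ("0m10.102s", "0m0.087s"),
    ("Windows", "vpx_scale_rtcd.h"): ("0m8.360s", "0m0.080s"),
    ("Windows", "vp8_rtcd.h"): ("1m8.083s", "0m0.160s"),
    ("Windows", "vp9_rtcd.h"): ("2m6.489s", "0m0.233s"),
}

for (platform, header), (before, after) in timings.items():
    print(f"{platform:8s} {header:18s} {speedup(before, after):8.1f}x")
```

Running the loop prints one speedup factor per generated header, which makes the difference between the Linux and Windows results easier to compare than the raw times.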
2014-02-26Removing vp9_systemdependent.c.Dmitry Kovalev
Change-Id: I7b9738a7113c0c4687e5d320581ff69d98a8b271
2014-02-14SSSE3 convolution optimizationlevytamar82
Optimizing all SSSE3 assembly for convolution:
1. vp9_filter_block1d4_h8_sse2
2. vp9_filter_block1d8_h8_sse2
3. vp9_filter_block1d16_h8_sse2
4. vp9_filter_block1d4_v8_sse2
5. vp9_filter_block1d8_v8_sse2
6. vp9_filter_block1d16_v8_sse2
The optimizations include:
- processing 2x8 elements in one 128-bit register instead of processing 8 elements in one 128-bit register.
- removing unnecessary loads.
This optimization gives a 2.4% user-level gain for 480p input and a 1.6% user-level gain for 720p. This optimization is done only for 64-bit.
Change-Id: Ic07fce2f9360329b4f2d956efda1480ae958766b
2014-02-12AVX2 Convolve Optimizationlevytamar82
Two convolve functions were optimized for AVX2:
1. vp9_filter_block1d16_h8
2. vp9_filter_block1d16_v8
vp9_filter_block1d16_v8 was optimized for AVX2 by reducing the number of loop strides by half; two strides were processed in parallel. vp9_filter_block1d16_v8 was also optimized in the same way; in addition, some of the loads are now done outside of the loop, which prevents redundant loads. This optimization gives a 43% function-level gain and a 1.3% user-level gain. It can now be compiled on Windows.
Change-Id: I2714124cfb0c14a77d7a0ce126a20db92ffbf92c
2014-02-11Merge "Add get release decoder frame buffer functions."Frank Galligan
2014-02-10Merge "*.mk: s/\bUSE_X86INC/CONFIG_USE_X86INC/"James Zern
2014-02-10Add get release decoder frame buffer functions.Frank Galligan
This CL changes libvpx to call a function when a frame buffer is needed for decode. Libvpx will call a release callback when no other frames reference the frame buffer. This CL adds a default implementation of the frame buffer callbacks. Currently only VP9 is supported. A future CL will add support for applications to supply their own frame buffer callbacks. Change-Id: I1405a320118f1cdd95f80c670d52b085a62cb10d
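The get/release pattern described in this message can be illustrated with a small sketch. This is not libvpx code: the class and method names (`FrameBuffer`, `DefaultFrameBufferPool`, `get_frame_buffer`, `release_frame_buffer`) are hypothetical, and the reuse policy is an assumption chosen to keep the example short. It only shows the shape of the contract: the decoder asks for a buffer when it needs one and hands it back once no frame references it.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class FrameBuffer:
    data: bytearray
    in_use: bool = False

@dataclass
class DefaultFrameBufferPool:
    """Hypothetical default implementation of the two callbacks."""
    buffers: List[FrameBuffer] = field(default_factory=list)

    def get_frame_buffer(self, min_size: int) -> FrameBuffer:
        # Reuse a free buffer that is large enough, if there is one.
        for fb in self.buffers:
            if not fb.in_use and len(fb.data) >= min_size:
                fb.in_use = True
                return fb
        # Otherwise allocate a new buffer and remember it.
        fb = FrameBuffer(bytearray(min_size), in_use=True)
        self.buffers.append(fb)
        return fb

    def release_frame_buffer(self, fb: FrameBuffer) -> None:
        # Called once no decoded frame references the buffer any more.
        fb.in_use = False

pool = DefaultFrameBufferPool()
first = pool.get_frame_buffer(1024)   # decoder needs a buffer
pool.release_frame_buffer(first)      # last reference dropped
second = pool.get_frame_buffer(512)   # the freed buffer can be reused
print(second is first)
```

Keeping allocation behind two callbacks is what would later let an application substitute its own memory management, as the message anticipates.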
2014-02-04*.mk: s/\bUSE_X86INC/CONFIG_USE_X86INC/James Zern
CONFIG_USE_X86INC is available to every makefile, so there's no need to duplicate its value with USE_X86INC. Change-Id: Id12bd5f09cba78abba56ab5a8f56351562e5b8b6
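The substitution named in the commit title, `s/\bUSE_X86INC/CONFIG_USE_X86INC/`, relies on the `\b` word boundary so that names already carrying the `CONFIG_` prefix are left alone. A minimal sketch of the same rewrite follows; the function name `rename_variable` and the sample makefile line are hypothetical and are not taken from the repository.

```python
import re

# Same pattern as in the commit title. "_" counts as a word character,
# so \b does not match between "CONFIG_" and "USE_X86INC".
PATTERN = re.compile(r"\bUSE_X86INC")

def rename_variable(makefile_text: str) -> str:
    """Apply the rename line by line, first match only (sed without /g)."""
    return "\n".join(
        PATTERN.sub("CONFIG_USE_X86INC", line, count=1)
        for line in makefile_text.split("\n")
    )

sample = "ifeq ($(USE_X86INC),yes)"    # hypothetical makefile line
print(rename_variable(sample))
print(rename_variable("ifeq ($(CONFIG_USE_X86INC),yes)"))
```

Because of the word boundary, running the rewrite a second time leaves already converted lines unchanged.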
2014-02-04Optimize bilinear sub-pixel filters in ssse3Yunqing Wang
This patch added an ssse3 optimization of the bilinear sub-pixel filters. The real-time encoder was sped up by ~1%. Change-Id: Ie82e98976f411183cb8c61ab8d2ba0276e55a338
2014-02-03Merge "Removing "_short" suffix from arm transform file names."Dmitry Kovalev
2014-02-03Optimize bilinear sub-pixel filters in sse2Yunqing Wang
Using bilinear filters could speed up the codec in real-time mode. This patch added sse2 optimizations of bilinear filters that operate on different-sized blocks. Tests showed that the real-time encoder was sped up by 3%. Change-Id: If99a7ee4385fcc225c3ee7445d962d5752e57c3f
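For readers unfamiliar with the term, a bilinear sub-pixel filter blends each pixel with its neighbour using two weights that depend on the fractional offset. The scalar sketch below illustrates the idea for one row only; it is not the code from this patch. The function name `bilinear_row`, the 7-bit weight precision, and the default of 8 sub-pixel steps are all assumptions made for the example.

```python
from typing import List

FILTER_BITS = 7  # assumed precision: the two weights sum to 128

def bilinear_row(pixels: List[int], frac: int, steps: int = 8) -> List[int]:
    """Interpolate between horizontal neighbours at offset frac/steps."""
    if not 0 <= frac < steps:
        raise ValueError("frac must satisfy 0 <= frac < steps")
    total = 1 << FILTER_BITS
    w1 = frac * total // steps   # weight of the right-hand neighbour
    w0 = total - w1              # weight of the current pixel
    rounding = total >> 1        # round to nearest before shifting
    return [
        (a * w0 + b * w1 + rounding) >> FILTER_BITS
        for a, b in zip(pixels, pixels[1:])
    ]

print(bilinear_row([10, 20, 30, 40], frac=2))
```

A two-tap filter like this needs far fewer multiplies per output pixel than an 8-tap filter, which is consistent with the message's point that bilinear filtering can help in real-time mode. A SIMD version would compute many such outputs per instruction.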
2014-01-31static function convert to inline or global vp9_blockd.hJim Bankoski
Change-Id: Ifdd951f24932839f06d1c700371662511dde6ebe
2014-01-31Removing "_short" suffix from arm transform file names.Dmitry Kovalev
Change-Id: Iefe118f61a335e88821a21a9f50fb919212c1507
2014-01-16Revert "Revert "Revert "SSSE3 convolution optimization"""Yunqing Wang
This reverts commit f9404f240642222775a371acde8fc0721b3812df. This patch caused an ASAN error. Change-Id: If15b7e581310e19061d111c69f2931809662ed19
2014-01-13Revert "Revert "SSSE3 convolution optimization""Yunqing Wang
This reverts commit b645257121da20b422dbbebf02aae0fc6dff95d4. Change-Id: I60d1bf57ae8e9eb6127f42f2d5a780124ac51b45
2014-01-10Revert "SSSE3 convolution optimization"Paul Wilkins
This reverts commit 511d218c60b9b6c1ab9383db746815e907af0359. In their current form, the intrinsics break the borg build. Change-Id: Ied37936af841250ecff449802e69a3d3761c91b9
2014-01-09Merge "SSSE3 convolution optimization"Yunqing Wang
2014-01-09SSSE3 convolution optimizationlevytamar82
Optimizing all SSSE3 assembly for convolution:
1. vp9_filter_block1d4_h8_sse2
2. vp9_filter_block1d8_h8_sse2
3. vp9_filter_block1d16_h8_sse2
4. vp9_filter_block1d4_v8_sse2
5. vp9_filter_block1d8_v8_sse2
6. vp9_filter_block1d16_v8_sse2
The optimizations include:
- processing 2x8 elements in one 128-bit register instead of processing 8 elements in one 128-bit register.
- removing unnecessary loads.
This optimization gives a 2.4% user-level gain for 480p input and a 1.6% user-level gain for 720p. This optimization is done only for 64-bit.
Change-Id: Icb586dc0c938b56699864fcee6c52fd43b36b969
2014-01-08Merge "Add initial intra frame neon optimization. 1~2% gain."hkuang
2014-01-08Add initial intra frame neon optimization. 1~2% gain.hkuang
More intra optimizations will be added. Change-Id: I33ae8d93f6002bf7b64cc2669602d9e6bfa5a6e8
2013-12-19Removing vp9_findnearmv.{h, c} files.Dmitry Kovalev
Moving all code from those files to vp9_mvref_common.{h, c}. Change-Id: Ibc4afcb8cea6847166ff411130e93611ebe63b20
2013-12-16Converting vp9_treecoder.h to vp9_prob.{h, c}Dmitry Kovalev
Moving vp9_norm probability table from vp9_entropy.c to vp9_prob.c Change-Id: Ie757b73860c6f43130790c332b292e2a1a81b788
2013-12-05Moving vp9_tree_probs_from_distribution() to encoder.Dmitry Kovalev
Writing custom coeff branch count calculation (which is much clearer) in adapt_coef_probs() function. Removing vp9_treecoder.c file. Change-Id: I8880fb7a39996c8bcf6cd0acf9898a8c712ba91f
2013-12-04Removing vp9_default_coef_probs.h file.Dmitry Kovalev
Moving all probability tables from removed file to vp9_entropy.c. Change-Id: I12846f1da778c3016d96b82e53384d4634883430
2013-11-26Fix 16 wide neon horz loopfilter.Frank Galligan
Multiply by 3 was on 8bit vectors when it should have been on 16bit vectors. Change-Id: I248c1429b3134dfd171dfab0ebb109fd2437e1fc
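The bug described here is a classic lane-width problem: tripling a value that fits in 8 bits can produce a result that does not. The sketch below simulates it with plain integers. It assumes wrapping two's-complement lanes and uses hypothetical helper names (`wrap_signed`, `times3`); the actual NEON instructions changed by the fix are not shown in this log.

```python
from typing import List

def wrap_signed(value: int, bits: int) -> int:
    """Reduce value to a signed two's-complement integer of `bits` bits."""
    value &= (1 << bits) - 1
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    return value

def times3(lanes: List[int], bits: int) -> List[int]:
    """Multiply every lane by 3, keeping results in `bits`-wide lanes."""
    return [wrap_signed(3 * x, bits) for x in lanes]

lanes = [50, -60, 10]            # each value fits in a signed 8-bit lane
print(times3(lanes, bits=8))     # products that exceed 8 bits wrap around
print(times3(lanes, bits=16))    # 16-bit lanes hold the true products
```

Only the lanes whose product leaves the signed 8-bit range are affected, which fits the earlier revert's report of mismatches on some test vectors.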
2013-11-21Revert "Add 16 wide neon horz loopfilter."Frank Galligan
The change caused mismatches with some test vectors on neon. Original CL: https://gerrit.chromium.org/gerrit/#/c/67863/ Change-Id: I913891636d53783e93cb1865ca78ded1821dc4b0
2013-11-21Merge "Add 16 wide neon horz loopfilter."Frank Galligan
2013-11-21Add 16 wide neon horz loopfilter.Frank Galligan
Add support to do 16 pixel horizontal filtering in Neon. Nexus devices saw about 0.5% decode speed increase. Change-Id: I2993f6c2d49f31fa74976879eeaa289fd3f4e15d
2013-11-19Move vp9_sadmxn.h from common to encoderYaowu Xu
Change-Id: I6f6ba91b1b8b280902b171472314d665aa0baf0b
2013-11-18Merge "Move vp9_extend.{h,c} from common to encoder"Yaowu Xu
2013-11-18Move vp9_extend.{h,c} from common to encoderYaowu Xu
They are used in the encoder only. This commit also reorders the includes in the files that include vp9_extend.h. Change-Id: I929fc113f2135d3198cd1fc6a17434e5a2f8a459
2013-11-15Do horizontal loopfiltering in parallelYunqing Wang
This patch followed "Rewrite filter_selectively_horiz for parallel loopfiltering" commit, and added x86 SSE2 optimization to do 16-pixel filtering in parallel. Also, corrected the declaration of aligned arrays. For 8-pixel-in-parallel case, improved the calculation of the masks and filters. Updated the threshold loading since the thresholds were already duplicated. Updated neon C functions to call neon loopfilters twice. Using tulip clip, tests showed it gave a ~1.5% decoder speed gain. Change-Id: Id02638626ac27a4b0e0b09d71792a24c0499bd35
2013-11-13Merge "mips dsp-ase r2 vp9 decoder intra module optimizations (rebase)"Johann
2013-11-13mips dsp-ase r2 vp9 decoder intra module optimizations (rebase)Parag Salasakar
Change-Id: Ib27fc4f3dbe01fe8adfa04a61aaba21b3480e75c
2013-11-13mips dsp-ase r2 vp9 decoder loopfilter module optimizations (rebase)Parag Salasakar
Change-Id: Ia7f640ca395e8deaac5986f19d11ab18d85eec2d
2013-11-07Merge "Add back vp9_short_idct32x32_1_add_neon which is deleted in cleanup I63df79a13cf62aa2c9360a7a26933c100f9ebda3."hkuang
2013-11-05Add back vp9_short_idct32x32_1_add_neon which is deleted inhkuang
cleanup I63df79a13cf62aa2c9360a7a26933c100f9ebda3. Change-Id: I034848cf05031618818f7df2e7f9c35102686948
2013-10-31mb_lpf_horizontal_edge AVX2 optimizationTamar Levy
This CL contains two AVX2 optimized loop filter functions, mb_lpf_horizontal_edge_w_avx2_8 and mb_lpf_horizontal_edge_w_avx2_16. Change-Id: I604e4fe6e99752b7800c2ea98721d97f7e0b931b
2013-10-24mips dsp-ase r2 vp9 decoder idct module optimizations (rebase)Parag Salasakar
Change-Id: Iedcdb8867084f328f4fce2fadb968e0984217308
2013-10-11Merge "SSE2 8-tap sub-pixel filter optimization"Yunqing Wang
2013-10-10SSE2 8-tap sub-pixel filter optimizationYunqing Wang
To ensure fast encoding/decoding on devices without ssse3 support, an SSE2 optimization of the sub-pixel filters was done. A test using a 1080p clip showed decoder speeds of ~70fps with ssse3 filters, ~60fps with sse2 filters, and ~15fps with C filters. Change-Id: Ie2088f87d83a889fba80a613e4d0e287aadd785c
2013-10-10Merge "Moving all scan/iscan code into separate vp9_scan.{h, c} files."Dmitry Kovalev
2013-10-09mips dsp-ase r2 vp9 decoder bilinear convolve optimizationsParag Salasakar
Change-Id: Ic31b4ef85e65070b4f8b9f26e068ccfaae00c4f0
2013-10-07Moving all scan/iscan code into separate vp9_scan.{h, c} files.Dmitry Kovalev
Now we have the entropy code separate from the scan/iscan code. A future step is to move the iscan code from the common part to the encoder. Change-Id: Id9732f7d80aec00af35c1d58d1137c4c96c91451
2013-10-02mips dsp-ase r2 vp9 decoder convolve module optimizationsParag Salasakar
Change-Id: I401536778e3c68ba2b3ae3955c689d005e1f1d59
2013-09-29Merge "Removing vp9_subpelvar.h from common."Dmitry Kovalev
2013-09-27Properly save neon registers.Christian Duvivier
Replace the current code, which corrupts the stack, with a duplicate of the vp8 code to save and restore neon registers. Change-Id: Ibb0220b9aa985d10533befa0a455ebce57a2891a
2013-09-25Fix a bunch of TODO from vp9_short_idct32x32_add_neon.Christian Duvivier
- full ASM version, no more C gateway file.
- integrate combine-add with last step of 2nd pass.
- remove a few push/pop pairs.
- some instruction reordering to hide latency.
Change-Id: Ic9d9933c908b65d1bf7ba8fd47b524cda808c9c6
2013-09-25Removing vp9_subpelvar.h from common.Dmitry Kovalev
Moving all code from that file to vp9_variace_c.c in the encoder. Change-Id: Ic803d5b4c78d5191e4d25541b3df97337878fc3e