Age  Commit message  Author
2013-07-01  Quantize (64-bit only, for now) SSSE3 SIMD.  Ronald S. Bultje
Total encoding time for the first 50 frames of bus (speed 0) @ 1500kbps goes from 2min34.8 to 2min14.4, i.e. a 10.4% overall speedup. The code is x86-64 only; it needs some minor modifications to be 32-bit compatible, because it uses 15 xmm registers, whereas 32-bit mode only has 8. Change-Id: I2df53770c2e850813ffa713e1a91b45b0082b904
2013-07-01  fix a mismatch in cpuused 2  Yaowu Xu
Change-Id: I921c9faba6386535aaf717a54301dd346a9b8540
2013-06-29  Merge "Enable SSE2 4x4 ADST/DCT transform"  Jingning Han
2013-06-29  SSE2 version of vp9_short_fdct32x32_rd.  Christian Duvivier
43,000 -> 5,750 cycles, about 7.5x faster. Change-Id: Ibfd92821b9603f4ed9c256e0ececec14fa4565d0
2013-06-29  Merge "fixed a bug where sse is not populated"  Ronald S. Bultje
2013-06-28  Merge "add Neon optimized add constant residual functions"  Johann
2013-06-28  Merge "fix test compile error"  James Zern
2013-06-28  Merge "Inline vp9_get_coef_context() (and remove vp9_ prefix)."  Ronald S. Bultje
2013-06-28  Merge "Minor change to prevent one level of dereference in cost_coeffs()."  Ronald S. Bultje
2013-06-28  add Neon optimized add constant residual functions  chm
- Add add_constant_residual_8x8, 16x16, and 32x32 functions.
- Tested under the RealView debugger environment.
Change-Id: I5c3a432f651b49bf375de6496353706a33e3e68e
2013-06-28  Merge "Cosmetic reordering of FRAME_CONTEXT members."  Dmitry Kovalev
2013-06-28  fix test compile error  James Zern
since: 92479d9 Make update_partition_context faster
fixes:
vp9/common/vp9_blockd.h:408:22: error: non-constant-expression cannot be narrowed from type 'int' to 'char' in initializer list [-Wc++11-narrowing]
  char pcvalue[2] = {~(0xe << boffset), ~(0xf <<boffset)};
                     ^~~~~~~~~~~~~~~~~
Change-Id: Id5b00b9a72d00a2b314081a23879bd1fa3ce983b
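The diagnostic arises because `~(0xe << boffset)` has type `int`, and C++11 rejects an implicit, non-constant `int`-to-`char` narrowing inside a braced initializer (the header is C, but the unit tests include it from C++). A minimal sketch of one common remedy, an explicit cast, is shown here. The helper name `fill_pcvalue` is hypothetical, and this is not necessarily the exact change the commit made.

```c
#include <assert.h>

/* Hypothetical helper illustrating the fix: the explicit (char) casts make
 * the int -> char conversion intentional, so a C++11 compiler no longer
 * treats the braced initializer as an illegal narrowing. */
static void fill_pcvalue(int boffset, char out[2]) {
  assert(boffset >= 0 && boffset < 8);
  const char pcvalue[2] = { (char)~(0xe << boffset),
                            (char)~(0xf << boffset) };
  out[0] = pcvalue[0];
  out[1] = pcvalue[1];
}
```

The values stored are unchanged by the cast; only the conversion is now spelled out.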
2013-06-28  Enable SSE2 4x4 ADST/DCT transform  Jingning Han
This commit enables the SSE2 4x4 forward hybrid transform. The runtime goes from 249 cycles down to 74 cycles. Overall around 2% speed-up with no change in compression performance. Change-Id: Iad4d526346e05c7be896466c05500711bb763660
2013-06-28  fixed a bug where sse is not populated  Yaowu Xu
Change-Id: I692d800af1f976c84a76f8bd66864c4b39540abc
2013-06-28  Merge "Fix switch statement in 8x8 transform"  Jingning Han
2013-06-28  Cosmetic reordering of FRAME_CONTEXT members.  Dmitry Kovalev
Change-Id: Id641e5188adf55e53e606e5813ae45feaf7abbd2
2013-06-28  Merge "Removing CONFIG_DEBUG checks on assertions."  Dmitry Kovalev
2013-06-28  Fix switch statement in 8x8 transform  Jingning Han
Change-Id: I7c46354c4983feb5f6202c3ab4a1d9534da7e30f
2013-06-28  Merge "Some minor optimizations for cost_coeffs()."  Ronald S. Bultje
2013-06-28  Merge "Make coefficient skip condition an explicit RD choice."  Ronald S. Bultje
2013-06-28  Inline vp9_get_coef_context() (and remove vp9_ prefix).  Ronald S. Bultje
Makes cost_coeffs() a lot faster:
4x4: 236 -> 181 cycles
8x8: 888 -> 588 cycles
16x16: 3550 -> 2483 cycles
32x32: 17392 -> 12010 cycles
Total encode time of first 50 frames of bus (speed 0) @ 1500kbps goes from 2min51.6 to 2min43.9, i.e. 4.7% overall speedup.
Change-Id: I16b8d595946393c8dc661599550b3f37f5718896
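The overall speedup quoted above appears to be measured relative to the new (faster) time: 171.6 s down to 163.9 s gives (171.6 - 163.9) / 163.9, or about 4.7%. A small sketch of that arithmetic follows. The function name `speedup_percent` is hypothetical and not part of the codebase.

```c
#include <assert.h>

/* Speedup in percent, measured relative to the new (faster) time.
 * Hypothetical helper, used only to check the figures in the log. */
static double speedup_percent(double old_seconds, double new_seconds) {
  assert(new_seconds > 0.0);
  return 100.0 * (old_seconds - new_seconds) / new_seconds;
}
```

For example, `speedup_percent(171.6, 163.9)` evaluates to roughly 4.7.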
2013-06-28  Merge "Decoder's code cleanup."  Dmitry Kovalev
2013-06-28  Removing CONFIG_DEBUG checks on assertions.  Dmitry Kovalev
Adding CHECK_MEM_ERROR macro to vp9_common.h and removing two duplicated ones from vp9_onyx_int.h and vp9_onyxd_int.h. Change-Id: I916afec61b3019f18193135dac7c35ed0f89b8b6
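For readers unfamiliar with the pattern, a macro of this kind typically assigns the result of an allocation expression and reports an error if it is NULL. The sketch below is a simplified stand-in under that assumption. The names `CHECK_MEM_ERROR_SKETCH` and `alloc_buffer` are hypothetical, and the real macro's error reporting is not shown in this log.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an allocation-check macro: assign the result of
 * expr to lval and set *err_flag if the allocation returned NULL. */
#define CHECK_MEM_ERROR_SKETCH(err_flag, lval, expr) \
  do {                                               \
    (lval) = (expr);                                 \
    if (!(lval)) *(err_flag) = 1;                    \
  } while (0)

/* Allocates n zeroed bytes into *out; returns 0 on success, -1 on failure. */
static int alloc_buffer(size_t n, unsigned char **out) {
  int failed = 0;
  assert(out != NULL);
  CHECK_MEM_ERROR_SKETCH(&failed, *out, (unsigned char *)calloc(n, 1));
  return failed ? -1 : 0;
}
```

Keeping a single definition in a shared header means encoder and decoder code paths report allocation failures the same way.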
2013-06-28  Minor change to prevent one level of dereference in cost_coeffs().  Ronald S. Bultje
4x4: 234 -> 236 cycles
8x8: 878 -> 888 cycles
16x16: 3664 -> 3550 cycles
32x32: 18134 -> 17392 cycles
Change-Id: I37a51bfbb0060a3a54f09c6045c14a989811ed78
2013-06-28  Some minor optimizations for cost_coeffs().  Ronald S. Bultje
Cycle timings for first 3 frames of bus (speed 0) at 1500kbps:
4x4: 298 -> 234 cycles
8x8: 1227 -> 878 cycles
16x16: 4906 -> 3664 cycles
32x32: 23426 -> 18134 cycles
Total encode time of first 50 frames of bus @ 1500kbps (speed 0) goes from 3min0.7 to 2min51.6, i.e. 5.3% faster.
Change-Id: I68a0e1b530b0563b84a67342cca4b45146077e95
2013-06-28  Make coefficient skip condition an explicit RD choice.  Ronald S. Bultje
This commit replaces zrun_zbin_boost, a method of biasing non-zero coefficients following runs of zero-coefficients to be rounded towards zero, with an explicit skip-block choice in the RD loop. The logic is basically that if individual coefficients should be rounded towards zero (from a RD point of view), the trellis/optimize loop should take care of it. If whole blocks should be zero (from a RD point of view), a single RD check is much more efficient than a complete serialization of the quantization loop. Quality change: derf +0.5% psnr, +1.6% ssim; yt +0.6% psnr, +1.1% ssim. SIMD for quantize will follow in a separate patch. Results for other test sets pending. Change-Id: Ife5fa641163ac5150ac428011e87188f1937c1f4
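The "single RD check" described above can be sketched as a comparison of two rate-distortion costs: one for coding the quantized coefficients, one for signalling the block as skipped. This is a simplified illustration only. The cost model (distortion + lambda * rate) and the names `rd_cost` and `choose_skip` are assumptions, not the encoder's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified, hypothetical RD cost: distortion + lambda * rate. */
static int64_t rd_cost(int64_t rate, int64_t distortion, int64_t lambda) {
  return distortion + lambda * rate;
}

/* Returns 1 if zeroing the whole block is the cheaper RD choice.
 * rate_coded/dist_coded: cost of coding the quantized coefficients.
 * rate_skip/dist_skip:   cost of signalling skip; the distortion is then
 *                        the full prediction error, since nothing is sent. */
static int choose_skip(int64_t rate_coded, int64_t dist_coded,
                       int64_t rate_skip, int64_t dist_skip,
                       int64_t lambda) {
  assert(lambda >= 0);
  return rd_cost(rate_skip, dist_skip, lambda) <
         rd_cost(rate_coded, dist_coded, lambda);
}
```

Because the decision is made once per block, the quantization loop itself no longer needs a serial zero-run dependency between coefficients.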
2013-06-28  Merge "Minor cleanups"  Yaowu Xu
2013-06-28  Merge "Optimize partition search order"  Yaowu Xu
2013-06-28  Minor cleanups  Yaowu Xu
Change-Id: I379617c1c731a686b3f7e032b8805860c1055b12
2013-06-28  Optimize partition search order  Yaowu Xu
This commit changes the partition search order so that rectangular partitions are checked after square partitions. It also adds a speed feature that skips the rectangular partition check when NONE is better than SPLIT in the RD sense. This feature speeds up the encoder by roughly 1.5X, with a compression change of -0.91% on the cif set and -0.56% on the stdhd set. Change-Id: I0d2d06993041aa9ea9073fcc39c54f73a127dfa4
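The search order described above can be sketched as follows: evaluate the square candidates (NONE and SPLIT) first, then decide whether the rectangular ones are worth checking. The enum, the function `pick_partition`, and the precomputed cost array are hypothetical simplifications; a real encoder would compute each cost lazily rather than read it from a table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical partition types for illustration only. */
typedef enum { PART_NONE, PART_HORZ, PART_VERT, PART_SPLIT } part_type;

/* rd[] holds one RD cost per partition type. Square partitions are compared
 * first; if the speed feature is on and NONE beats SPLIT, the rectangular
 * candidates are never examined. *rect_checked reports whether they were. */
static part_type pick_partition(const int64_t rd[4],
                                int skip_rect_when_none_wins,
                                int *rect_checked) {
  assert(rd != NULL && rect_checked != NULL);
  const int none_wins = rd[PART_NONE] < rd[PART_SPLIT];
  part_type best = none_wins ? PART_NONE : PART_SPLIT;

  *rect_checked = 0;
  if (skip_rect_when_none_wins && none_wins) return best;

  *rect_checked = 1;
  if (rd[PART_HORZ] < rd[best]) best = PART_HORZ;
  if (rd[PART_VERT] < rd[best]) best = PART_VERT;
  return best;
}
```

The early return is where both the speedup and the small compression loss come from: a rectangular partition that would have won is occasionally never tried.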
2013-06-28  Merge "Fix tile independence with both column tiling and static_thresh set."  Ronald S. Bultje
2013-06-27  Merge "variance_test: add missing ClearSystemState..."  James Zern
2013-06-27  Fix tile independence with both column tiling and static_thresh set.  Ronald S. Bultje
Change-Id: I0b2be0ec2c410a527f88b95a44f24ac967b2dac1
2013-06-27  Decoder's code cleanup.  Dmitry Kovalev
Using vp9_set_pred_flag function instead of custom code, adding decode_tokens function which is now called from decode_atom, decode_sb_intra, and decode_sb. Change-Id: Ie163a7106c0241099da9c5fe03069bd71f9d9ff8
2013-06-27  Add Neon optimized loop filter functions.  Frank Galligan
- Added vp9_loop_filter_horizontal_edge_neon and vp9_loop_filter_vertical_edge_neon.
- The functions are based off the vp8 loopfilter functions.
- Matches x86 md5 checksum.
Change-Id: Id1c4dddb03584227e5ecd29f574a6ac27738fdd0
2013-06-27  Merge "General cleanup in segmentation-related code."  Dmitry Kovalev
2013-06-27  Merge "Moving subexp encoding functions in separate vp9_dsubexp.c file."  Dmitry Kovalev
2013-06-27  Inline quantize so idiv instruction gets removed from inner loop.  Ronald S. Bultje
Encoding time of first 50 frames of bus @ 1500kbps (speed 0) goes from 3min15.0 to 3min10.9, i.e. 2.1% faster overall. Change-Id: If592ee99be09bcd34a7c8498347f44e7305e982c
2013-06-27  Merge "Auto adapt step size feature."  Paul Wilkins
2013-06-27  Merge "Start adaptive threshold for each mode at max."  Paul Wilkins
2013-06-27  Merge "Change meaning of cpi->sf.first_step and rename."  Paul Wilkins
2013-06-26  Merge "Make intra predictor reference buffer configurable"  Jingning Han
2013-06-26  Merge "Make update_partition_context faster"  Jingning Han
2013-06-26  variance_test: add missing ClearSystemState...  James Zern
...to recently added SubpelVarianceTest Change-Id: I8775e39fd5dbfba81ad42b79b47bf6dd6ca8cc0e
2013-06-26  Merge "Change to use LUT for mode-to-txfm conversion"  Yaowu Xu
2013-06-26  Make intra predictor reference buffer configurable  Jingning Han
This commit enables configurable reference buffer pointer for intra predictor. This allows later removal of spatial dependency between blocks inside a 64x64 superblock in the rate-distortion optimization loop. Change-Id: I02418c2077efe19adc86e046a6b49364a980f5b1
2013-06-26  Merge "Remove empty function vp9_build_block_offsets"  Jingning Han
2013-06-26  Make update_partition_context faster  Jingning Han
Use vpx_memset for updating the partition contexts. Thanks to Noah for pointing out the need of refactoring in this part. Change-Id: I67fb78429d632298f1cd8a0be346cc76f79392a6
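As a rough illustration of the idea, a run of partition-context entries that all receive the same value can be filled with one memset call instead of a per-entry loop. In this sketch the standard `memset` stands in for `vpx_memset`, and the function name `set_partition_context` and its parameters are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch: write the same context value into n consecutive
 * entries with a single call. Standard memset stands in for vpx_memset. */
static void set_partition_context(char *ctx, int n, char value) {
  assert(ctx != NULL && n >= 0);
  memset(ctx, value, (size_t)n);
}
```

A call such as `set_partition_context(above_ctx + offset, 4, 0x0f)` would update four entries at once; `above_ctx` and `offset` are placeholder names here.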
2013-06-26  Remove unused macro RDTRUNC_8x8 from encodemb.c.  Ronald S. Bultje
Change-Id: I0c097567adab24215d807963ccb34810a2afe007
2013-06-26  Remove empty function vp9_build_block_offsets  Jingning Han
This function is empty, hence is removed. Change-Id: Ia9d01710806bffe0398a6dc9405f8a5a81b27d74