Age  Commit message  Author
2013-08-01  Merge "Optimize 32x32 2D inverse DCT for speed-up"  Jingning Han
2013-08-01  Merge "Adding missing const to vp9_extra_bits array."  Dmitry Kovalev
2013-07-31  Adding missing const to vp9_extra_bits array.  Dmitry Kovalev
Change-Id: Icd128ab58719e0b9066bdfa66a5d0d427a84d6df
2013-07-31  Optimize 32x32 2D inverse DCT for speed-up  Jingning Han
This commit exploits the sparsity of the quantized coefficient matrix. It checks each 32x8 array and skips the corresponding inverse transformation if all entries are zero. For ped1080p at 8000 kbps, this on average reduces the runtime of the 32x32 inverse 2D-DCT SSE2 function from 6256 cycles to 5200 cycles. It makes the overall encoding process about 2% faster at speed 0. The speed-up is more pronounced for the decoding process. Change-Id: If20056c3566bd117642a76f8884c83e8bc8efbcf
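The zero-group check described in this message can be sketched as follows. This is a minimal illustration and not the SSE2 code from the commit: `group_is_zero`, `sparse_row_pass`, and `toy_row_transform` are hypothetical names, the 32x8 array is assumed to mean 8 consecutive rows of a row-major 32x32 block, and the toy transform only stands in for a real 32-point inverse DCT.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define N 32
#define GROUP_ROWS 8

/* Hypothetical signature for a 1-D inverse transform applied to one row. */
typedef void (*row_transform_fn)(const int16_t *in, int16_t *out);

/* Toy stand-in for a 32-point inverse DCT: it just doubles each value.
 * Like any linear transform, it maps an all-zero row to an all-zero row. */
static void toy_row_transform(const int16_t *in, int16_t *out) {
  int i;
  for (i = 0; i < N; ++i) out[i] = (int16_t)(in[i] * 2);
}

/* Returns 1 if every coefficient in rows [first_row, first_row + 8) is 0. */
static int group_is_zero(const int16_t *coeffs, int first_row) {
  const int16_t *p = coeffs + first_row * N;
  int i;
  for (i = 0; i < GROUP_ROWS * N; ++i)
    if (p[i] != 0) return 0;
  return 1;
}

/* Row pass: transform only the non-zero groups and zero-fill the rest.
 * Returns how many groups were actually transformed. */
static int sparse_row_pass(const int16_t *coeffs, int16_t *out,
                           row_transform_fn transform) {
  int g, r, transformed = 0;
  assert(coeffs != NULL && out != NULL && transform != NULL);
  for (g = 0; g < N; g += GROUP_ROWS) {
    if (group_is_zero(coeffs, g)) {
      memset(out + g * N, 0, GROUP_ROWS * N * sizeof(*out));
      continue; /* skip the transform for this all-zero group */
    }
    for (r = g; r < g + GROUP_ROWS; ++r)
      transform(coeffs + r * N, out + r * N);
    ++transformed;
  }
  return transformed;
}
```

The check costs one scan over the coefficients, which pays off when most high-frequency groups are zero after quantization.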
2013-07-31  Remove unnecessary arguments in rd_pick_ref_frame  Jingning Han
This commit removes redundant arguments passed to rd_pick_reference_frame. This resolves the clang warnings about potential use of uninitialized values. Change-Id: Ic68f949a9f8fcd0a583786b0c75321104ea44739
2013-07-31  vp9_decodemv.c cleanup.  Dmitry Kovalev
Inlining VP9_NMV_UPDATE_PROB constant, consistent local variable names. Change-Id: I01692501982568fa535882d6b320e3c692f88abb
2013-07-31  Removing get_mi_{row, col} functions.  Dmitry Kovalev
Passing mi_row and mi_col parameters to functions explicitly. Removing unused xd argument from scale_mv function. Change-Id: Icb4c495ec72d26fb066c14470d3ae0b741fbf18a
2013-07-31  Merge "Removing unused "ishp" arguments."  Dmitry Kovalev
2013-07-31  Merge "Consistent update for inter_mode probabilities."  Dmitry Kovalev
2013-07-31  Removing unused "ishp" arguments.  Dmitry Kovalev
Using different variable names "allow_hp" and "use_hp" instead of "usehp". Change-Id: I0cd5996ddeb46bd754473b680a993c0aaf8eb879
2013-07-31  Merge "Make the use of ref_frame index consistent"  Jingning Han
2013-07-30  Make the use of ref_frame index consistent  Jingning Han
Refactor the frame buffer referencing in choose_partition and make it consistent with other places. This is meant to prevent potential issues when we extend the reference frame buffer. Change-Id: I5ff33ed5f671e1f4cc7049622212769a9b4578d9
2013-07-30  Consistent update for inter_mode probabilities.  Dmitry Kovalev
Using inter-mode counts instead of inter-mode-tree branch counts inside FRAME_COUNTS structure. Change-Id: I60dde13af37d06146d7d15543311c1b5044e9e04
2013-07-30  Merge "Cleanup: remove two stray '+', fix typos."  Adrian Grange
2013-07-30  Merge "Cleanup typos, remove unnecessary lines, replace switch"  Adrian Grange
2013-07-30  Cleanup: remove two stray '+', fix typos.  Adrian Grange
Change-Id: I9c30e3dbedabe4942439a0ee2f691fb9a04cd03b
2013-07-30  Cleanup typos, remove unnecessary lines, replace switch  Adrian Grange
Removed unnecessary code lines, replaced switch with an if, fixed spelling errors and formatting. Change-Id: Ie48aa4604aa0ed48362ca359d792fb21b2ec1dc6
2013-07-30  Merge "removed duplication"  Yaowu Xu
2013-07-30  removed duplication  Yaowu Xu
Change-Id: Ica23b66f6664e5a5b168499584f0afffbc54794f
2013-07-29  Remove a redundant branching in tokenize_b  Jingning Han
The tokenize_b function is only called when the output flag is on, so this commit removes the conditional branch on that flag inside it. Change-Id: Ib709f47f23f39ca05a695faf86fa3377f11f2dd0
2013-07-29  Tune tokenization/detokenization flow for speed-up  Jingning Han
This commit optimizes the tokenization and detokenization operational flow for speed-up. It makes the coding process about 0.3% faster at speed 0. Change-Id: I28008df7482874e4b5f237f2d418ff82a249dd56
2013-07-29  Skip redundant tokenization in rd loop  Jingning Han
This commit makes the encoder skip the redundant tokenization process in the rate-distortion optimization search loop, while updating the entropy contexts accordingly. It makes the speed 0 encoding process about 0.5% faster with no performance change. Change-Id: I34a4155a0b5332afeb45c93a51c7f35a294d685c
2013-07-29  Merge "16x16 inverse 2D-DCT with DC only"  Jingning Han
2013-07-29  Merge "Remove unnecessary 64 byte alignment"  John Koleszar
2013-07-29  16x16 inverse 2D-DCT with DC only  Jingning Han
This commit adds special handling for the 16x16 inverse 2D-DCT when only the DC coefficient is quantized to a non-zero value. Change-Id: I7bf71be7fa13384fab453dc8742b5b50e77a277c
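When only the DC coefficient is non-zero, the inverse 2D-DCT output is a constant block, so the full transform can be replaced by computing one value and adding it to every pixel. The sketch below illustrates that idea under stated assumptions: the function names are hypothetical, the Q14 constant for cos(pi/4) and the final 6-bit rounding shift are assumed scalings rather than values taken from this commit, and right shifts of negative values are assumed to be arithmetic.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK 16
#define COS_PI_4_Q14 11585 /* round(cos(pi/4) * 2^14); assumed scaling */

/* Rounding right shift; assumes arithmetic shift for negative x. */
static int round_shift(int x, int bits) {
  return (x + (1 << (bits - 1))) >> bits;
}

static uint8_t clip_pixel(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Adds the reconstructed DC-only residual to a 16x16 prediction block. */
static void idct16x16_dc_only_add(int16_t dc, uint8_t *dest, int stride) {
  int r, c;
  int v;
  assert(dest != 0 && stride >= BLOCK);
  v = round_shift(dc * COS_PI_4_Q14, 14); /* row pass on the DC term */
  v = round_shift(v * COS_PI_4_Q14, 14);  /* column pass on the DC term */
  v = round_shift(v, 6);                  /* output scaling (assumed) */
  for (r = 0; r < BLOCK; ++r)
    for (c = 0; c < BLOCK; ++c)
      dest[r * stride + c] = clip_pixel(dest[r * stride + c] + v);
}
```

The per-block cost drops from two full passes of 16-point transforms to three multiplies or shifts plus one add per pixel.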
2013-07-29  Renaming txfm to tx for consistency in some places.  Dmitry Kovalev
Change-Id: I2a6a646570e2af66315e7c658d00d99f80c4b127
2013-07-29  Remove unnecessary 64 byte alignment  John Koleszar
Fixes a warning on MSVS 2012 where the alignment of vp9_default_iscan_8x8 didn't match between its declaration and definition. Change-Id: I1466a15635f4b22594d705d570b7e399bfb6cf21
2013-07-29  Renaming NB_TXFM_MODES constant to TX_MODES.  Dmitry Kovalev
Change-Id: I10bf06e3a3d5271221ae6a42a36074d01d493039
2013-07-29  Renaming TX_SIZE_MAX_SB to TX_SIZES.  Dmitry Kovalev
Change-Id: I6aa4191935aa93461a07c41b59fdae1eb5f5f107
2013-07-29  Merge "Shortcut 8x8/16x16 inverse 2D-DCT"  Jingning Han
2013-07-26  Cleanup: replacing xd->mode_info_context with temp variable.  Dmitry Kovalev
Change-Id: I5a3e83102784cabb918a5404405fcab99c5bb9b6
2013-07-26  Inverse dimension order in token_cost array.  Ronald S. Bultje
This allows us to increment the position at the band level only as we go from one band to the next. More importantly, it allows us to use an add instead of a multiply instruction, and to omit the instruction altogether if the band doesn't change from one coef to the next. The result is slightly faster (probably more noticeable on systems where a multiply is expensive, such as ARM). Change-Id: I4343fe35b9f9a47fa00b217bdcbf5f91ff96c381
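The effect of putting the band dimension outermost can be sketched like this. It is an illustration only: the table sizes, the flat layout, the `band_len` array, and `block_cost` are all hypothetical and are not the actual token_cost definition. The point is that the band pointer advances by a fixed stride (an add) only when the band changes, instead of being recomputed from a band index (a multiply) for every coefficient.

```c
#include <assert.h>

/* Hypothetical table dimensions, laid out flat as [band][ctx][token]. */
#define BANDS 6
#define CTXS 6
#define TOKENS 11
#define BAND_STRIDE (CTXS * TOKENS)

/* Sums token costs for n coefficients in scan order.
 * band_len[b] is the number of consecutive coefficients in band b; every
 * entry that gets used must be positive and the lengths must cover n. */
static unsigned int block_cost(const unsigned int *costs, const int *band_len,
                               const int *ctx, const int *token, int n) {
  const unsigned int *band = costs; /* cost table of the current band */
  int left = band_len[0];
  unsigned int total = 0;
  int i;
  assert(n >= 0);
  for (i = 0; i < n; ++i) {
    while (left == 0) {    /* band exhausted: step to the next one */
      band += BAND_STRIDE; /* a single add, no multiply by band index */
      left = *++band_len;
    }
    total += band[ctx[i] * TOKENS + token[i]];
    --left;
  }
  return total;
}
```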
2013-07-26  Merge "vp9_decodemv.c cleanup."  Dmitry Kovalev
2013-07-26  Merge "d45 intra prediction SSSE3 optimizations."  Ronald S. Bultje
2013-07-26  Merge "Save pixels instead of coefficients in intra4x4 RD loop."  Ronald S. Bultje
2013-07-26  Merge "Add best_rd breakout in intra4x4 RD loop."  Ronald S. Bultje
2013-07-26  Shortcut 8x8/16x16 inverse 2D-DCT  Jingning Han
This commit brings back the shortcut implementation of the 8x8/16x16 inverse 2D-DCT. When eob <= 10, it skips the inverse transform operations on rows 4:7/4:15 in the first round. For bus_cif at 1000 kbps, this provides about 2% speed-up at speed 0. Change-Id: I453e2d72956467d75be4ad8c04b4482ab889d572
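The reason an eob threshold of 10 makes rows 4 and beyond skippable can be checked with a small sketch. It assumes a classic zig-zag scan for the 8x8 case, which may differ from the scan tables the codec actually uses; `zigzag_prefix`, `rows_touched`, and `first_pass_rows` are hypothetical names introduced for illustration.

```c
#include <assert.h>

#define SIZE 8

/* First 16 raster positions of the classic 8x8 zig-zag scan (assumed). */
static const int zigzag_prefix[16] = {0,  1,  8,  16, 9,  2,  3, 10,
                                      17, 24, 32, 25, 18, 11, 4, 5};

/* Number of leading rows that can hold a non-zero coefficient when only the
 * first eob scan positions are coded. */
static int rows_touched(const int *scan, int eob) {
  int i, max_row = -1;
  for (i = 0; i < eob; ++i) {
    const int row = scan[i] / SIZE;
    if (row > max_row) max_row = row;
  }
  return max_row + 1;
}

/* Shortcut rule from the commit message: with eob <= 10, only rows 0..3
 * need the first-round transform; otherwise all rows are processed. */
static int first_pass_rows(int eob) {
  assert(eob >= 0 && eob <= SIZE * SIZE);
  return eob <= 10 ? 4 : SIZE;
}
```

Under this assumed scan, the first ten positions all fall in rows 0 to 3, and the eleventh is the first to reach row 4, which matches the threshold in the message.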
2013-07-26  vp9_decodemv.c cleanup.  Dmitry Kovalev
Renaming:
read_intra_mode_info -> read_intra_frame_mode_info
read_inter_mode_info -> read_inter_frame_mode_info
read_intra_block_part -> read_intra_block_mode_info
read_inter_block_part -> read_inter_block_mode_info
read_ref_frame -> read_ref_frames
read_reference_frame -> read_is_inter_block
Using num_4x4_blocks_{wide, high}_lookup instead of bit shifts.
Change-Id: I83c81573b4ef6f53f2f8d24683895014bebfba61
2013-07-26  Merge "Special handle on DC only inverse 8x8 2D-DCT"  Jingning Han
2013-07-26  Merge "Making read_inter_mode_info function more clear."  Dmitry Kovalev
2013-07-26  Merge "Fix some format error and code error in neon code."  hkuang
2013-07-26  Special handle on DC only inverse 8x8 2D-DCT  Jingning Han
This commit enables special handling for the 8x8 inverse 2D-DCT when only the DC coefficient is quantized to a non-zero value. For bus_cif at 2000 kbps, it provides about 1% speed-up at speed 0. Change-Id: I2523222359eec26b144cf8fd4c63a4ad63b1b011
2013-07-26  Fix some format error and code error in neon code.  hkuang
Change-Id: I748dee8938dfb19f417f24eed005f3d216f83a82
2013-07-26  Merge "General cleanups."  Dmitry Kovalev
2013-07-26  d45 intra prediction SSSE3 optimizations.  Ronald S. Bultje
Change-Id: Ie48035ff4f93c41f8a9b3023e6444fd10432d8fb
2013-07-26  Merge "Auto min and max partition size experiment."  Yaowu Xu
2013-07-26  Auto min and max partition size experiment.  Paul Wilkins
Speed feature experiment to set an upper and lower partition size limit based on what has been seen in spatial neighbors. This seems to give quite reasonable speed gains locally (10-15%), and when used with speed 0 the losses are small (0.25% derf, 0.35% stdhd). However, for now I am only enabling it on speed 1 as there may be clashes with the existing temporal partition selection in speed 2. Using a tighter min / max around the range derived from the neighbors increases speed further but at the cost of a bigger quality loss. However, I think this spatial method could be combined with data from either the last frame or a variance method (or both) to refine the range of minimum and maximum partition size, that is, consider the min and max from spatial and temporal neighbors and the variance recommendation. Change-Id: I1b96bf8b84368d6aad0c7aa600fe141b4f07435f
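A minimal sketch of deriving the search range from spatial neighbors follows. The `blk_size` enum, the `size_range` struct, and `range_from_neighbors` are hypothetical and simplified to square sizes only; the actual experiment may choose neighbors and widen or tighten the range differently.

```c
#include <assert.h>

/* Hypothetical, simplified set of square partition sizes, smallest first. */
typedef enum { BLK_8X8, BLK_16X16, BLK_32X32, BLK_64X64 } blk_size;

typedef struct {
  blk_size min;
  blk_size max;
} size_range;

/* Limits the partition search to the span of sizes already chosen by
 * spatial neighbors (for example, the blocks above and to the left).
 * With no neighbors available, the full range is searched. */
static size_range range_from_neighbors(const blk_size *neighbors, int count) {
  size_range r = {BLK_8X8, BLK_64X64};
  int i;
  assert(count >= 0);
  if (count == 0) return r;
  r.min = r.max = neighbors[0];
  for (i = 1; i < count; ++i) {
    if (neighbors[i] < r.min) r.min = neighbors[i];
    if (neighbors[i] > r.max) r.max = neighbors[i];
  }
  return r;
}
```

The encoder would then skip partition sizes outside `[min, max]`, trading a small quality loss for fewer rate-distortion evaluations.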
2013-07-25  Modify static threshold calculation  Yunqing Wang
Used 3 * standard_deviation in internal threshold calculation instead of fit curve. This actually approached the algorithm better.

For comparison, similar tests were done. The overall psnr loss is less than before.
1. derf set:
   when static-thresh = 1, psnr loss is 0.329%;
   when static-thresh = 500, psnr loss is 0.970%;
2. stdhd set:
   when static-thresh = 1, psnr loss is 0.922%;
   when static-thresh = 500, psnr loss is 1.307%;

Similar speedup is achieved. For example,

clip              bitrate  static-thresh  psnr    time
akiyo(cif)        500      0              48.952  5.077s(50f)
akiyo             500      500            48.866  4.169s(50f)
parkjoy(1080p)    4000     0              30.388  78.20s(30f)
parkjoy           4000     500            30.367  70.85s(30f)
sunflower(1080p)  4000     0              44.402  74.55s(30f)
sunflower         4000     500            44.414  68.69s(30f)

Change-Id: Ic78833642ce1911dbbd1cb6c899a2d7e2dfcc1f3
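The timing figures in this message can be turned into a relative saving with simple arithmetic. The helper below is a hypothetical convenience, not part of the commit; it only restates the reported times as a percentage.

```c
#include <assert.h>

/* Percentage of encode time saved going from base_seconds to new_seconds. */
static double time_saved_percent(double base_seconds, double new_seconds) {
  assert(base_seconds > 0.0);
  return 100.0 * (base_seconds - new_seconds) / base_seconds;
}
```

Applied to the reported times, static-thresh = 500 saves roughly 17.9% for akiyo (5.077s to 4.169s), 9.4% for parkjoy (78.20s to 70.85s), and 7.9% for sunflower (74.55s to 68.69s).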
2013-07-25  Making read_inter_mode_info function more clear.  Dmitry Kovalev
Now read_inter_mode_info calls read_intra_block_part (renamed from read_intra_block_modes) or read_inter_block_part (just added). Change-Id: I541badea6b663e0ae692ec158665efb90ed20c03
2013-07-25  Merge "Add const to vp9_accum_mv_refs parameter"  Johann