Age | Commit message | Author |
|
|
|
|
|
Change-Id: Icd128ab58719e0b9066bdfa66a5d0d427a84d6df
|
|
This commit exploits the sparsity of the quantized coefficient matrix.
It checks each 32x8 array and skips the corresponding inverse
transformation if all entries are zero.
For ped1080p at 8000 kbps, this reduces the average runtime of the
32x32 inverse 2D-DCT SSE2 function from 6256 cycles to 5200
cycles. It makes the overall encoding process about 2% faster at
speed 0. The speed-up is more pronounced for the decoding process.
Change-Id: If20056c3566bd117642a76f8884c83e8bc8efbcf
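The zero-detection idea can be sketched in scalar C. This is a minimal illustration, not the SSE2 code from the commit: the function names are hypothetical, and it assumes a row-major 32x32 block in which each "32x8 array" is a strip of 8 consecutive rows.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if all 32x8 coefficients in the strip
 * starting at `first_row` are zero. Assumes a row-major 32x32 block. */
static int strip_32x8_is_zero(const int16_t *coeffs, int first_row) {
  assert(first_row >= 0 && first_row + 8 <= 32);
  int acc = 0;
  /* OR all entries together: the result is zero only if every entry is. */
  for (int i = first_row * 32; i < (first_row + 8) * 32; ++i) acc |= coeffs[i];
  return acc == 0;
}

/* Hypothetical driver: counts how many of the four strips still need an
 * inverse transform. A real implementation would run the transform on
 * those strips and skip the all-zero ones. */
int count_nonzero_strips(const int16_t *coeffs) {
  int needed = 0;
  for (int row = 0; row < 32; row += 8) {
    if (!strip_32x8_is_zero(coeffs, row)) ++needed;
  }
  return needed;
}
```

For a block whose only non-zero coefficients sit in the first 8 rows, `count_nonzero_strips` returns 1, so three of the four strip transforms could be skipped.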
|
|
This commit removes redundant argument passing in the
rd_pick_reference_frame function. This resolves the clang warnings about
potential use of uninitialized values.
Change-Id: Ic68f949a9f8fcd0a583786b0c75321104ea44739
|
|
Inlining the VP9_NMV_UPDATE_PROB constant and using consistent local variable names.
Change-Id: I01692501982568fa535882d6b320e3c692f88abb
|
|
Passing mi_row and mi_col parameters to functions explicitly. Removing
unused xd argument from scale_mv function.
Change-Id: Icb4c495ec72d26fb066c14470d3ae0b741fbf18a
|
|
|
|
|
|
Using different variable names "allow_hp" and "use_hp" instead of "usehp".
Change-Id: I0cd5996ddeb46bd754473b680a993c0aaf8eb879
|
|
|
|
Refactor the frame buffer referencing in choose_partition and make
it consistent with other places. This is meant to prevent potential
issues when we extend the reference frame buffer.
Change-Id: I5ff33ed5f671e1f4cc7049622212769a9b4578d9
|
|
Using inter-mode counts instead of inter-mode-tree branch counts inside
FRAME_COUNTS structure.
Change-Id: I60dde13af37d06146d7d15543311c1b5044e9e04
|
|
|
|
|
|
Change-Id: I9c30e3dbedabe4942439a0ee2f691fb9a04cd03b
|
|
Removed unnecessary code lines, replaced switch with an if,
fixed spelling errors and formatting.
Change-Id: Ie48aa4604aa0ed48362ca359d792fb21b2ec1dc6
|
|
|
|
Change-Id: Ica23b66f6664e5a5b168499584f0afffbc54794f
|
|
The tokenize_b function is only called when output flag is on. Hence
removing the conditional branch on it therein.
Change-Id: Ib709f47f23f39ca05a695faf86fa3377f11f2dd0
|
|
This commit optimizes the tokenization and detokenization operational
flow for speed-up. It makes the coding process about 0.3% faster at
speed 0.
Change-Id: I28008df7482874e4b5f237f2d418ff82a249dd56
|
|
This commit makes the encoder skip the redundant tokenization process
in the rate-distortion optimization search loop, while updating the
entropy contexts accordingly. It makes the speed 0 encoding process
about 0.5% faster with no change in compression performance.
Change-Id: I34a4155a0b5332afeb45c93a51c7f35a294d685c
|
|
|
|
|
|
This commit provides special handling for the 16x16 inverse 2D-DCT when
only the DC coefficient is quantized to a non-zero value.
Change-Id: I7bf71be7fa13384fab453dc8742b5b50e77a277c
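When only the DC coefficient is non-zero, the 2D inverse DCT output is one constant added to every pixel, so the full transform can be replaced by a single scaled value. The sketch below illustrates this under assumptions: the function name, the 14-bit fixed-point constants, and the final 6-bit rounding are modeled on a common fixed-point convention and are not taken from the commit itself.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed fixed-point convention: cos(pi/4) scaled by 2^14. */
enum { DCT_CONST_BITS = 14, COSPI_16_64 = 11585 };

/* Rounding right shift. Right-shifting a negative int is
 * implementation-defined in C; this sketch assumes arithmetic shift. */
static int round_shift(int x, int bits) {
  return (x + (1 << (bits - 1))) >> bits;
}

static uint8_t clip_pixel(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Hypothetical DC-only path: adds the same residual to all 16x16 pixels. */
void idct16x16_dc_only_add(int16_t dc, uint8_t *dest, int stride) {
  assert(stride >= 16);
  /* One scaling per transform pass (rows, then columns). */
  int out = round_shift(dc * COSPI_16_64, DCT_CONST_BITS);
  out = round_shift(out * COSPI_16_64, DCT_CONST_BITS);
  const int delta = round_shift(out, 6); /* assumed final rounding */
  for (int r = 0; r < 16; ++r) {
    for (int c = 0; c < 16; ++c) {
      dest[r * stride + c] = clip_pixel(dest[r * stride + c] + delta);
    }
  }
}
```

The same structure applies to other block sizes; only the loop bounds and the final rounding would change.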
|
|
Change-Id: I2a6a646570e2af66315e7c658d00d99f80c4b127
|
|
Fixes a warning on MSVS 2012 where the alignment of vp9_default_iscan_8x8
didn't match between its declaration and definition.
Change-Id: I1466a15635f4b22594d705d570b7e399bfb6cf21
|
|
Change-Id: I10bf06e3a3d5271221ae6a42a36074d01d493039
|
|
Change-Id: I6aa4191935aa93461a07c41b59fdae1eb5f5f107
|
|
|
|
Change-Id: I5a3e83102784cabb918a5404405fcab99c5bb9b6
|
|
This allows us to increment the position at the band level only as
we go from one band to the next. More importantly, it allows us to
use an add instead of a multiply instruction, and to omit the instruction
altogether if the band doesn't change from one coef to the next. This
should be slightly faster (probably more noticeable on systems where a
multiply is expensive, like ARM).
Change-Id: I4343fe35b9f9a47fa00b217bdcbf5f91ff96c381
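The add-instead-of-multiply idea can be illustrated with a small table lookup. This is a sketch under assumptions: the table layout ([band][context], flattened), the names, and the requirement that band indices never decrease along the scan are all hypothetical and not taken from the commit.

```c
#include <assert.h>
#include <stdint.h>

enum { CTX_PER_BAND = 6 }; /* hypothetical row width of the table */

/* Baseline: recomputes band * CTX_PER_BAND for every coefficient. */
int lookup_sum_multiply(const int *table, const uint8_t *band,
                        const uint8_t *ctx, int n) {
  int sum = 0;
  for (int i = 0; i < n; ++i) sum += table[band[i] * CTX_PER_BAND + ctx[i]];
  return sum;
}

/* Incremental: keeps a pointer to the current band's row and advances it
 * with an add only when the band changes. Assumes band[] is non-decreasing. */
int lookup_sum_incremental(const int *table, const uint8_t *band,
                           const uint8_t *ctx, int n) {
  const int *band_row = table;
  int cur_band = 0;
  int sum = 0;
  for (int i = 0; i < n; ++i) {
    assert(band[i] >= cur_band);
    while (cur_band < band[i]) { /* no work at all if the band is unchanged */
      band_row += CTX_PER_BAND;
      ++cur_band;
    }
    sum += band_row[ctx[i]];
  }
  return sum;
}
```

Both functions read the same table entries; the second just replaces the per-coefficient multiply with an occasional pointer bump.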
|
|
|
|
|
|
|
|
|
|
This commit brings back the shortcut implementation of the 8x8/16x16
inverse 2D-DCT. When eob <= 10, it skips the inverse transform
operations on rows 4:7/4:15 in the first round. For bus_cif at 1000
kbps, this provides about 2% speed-up at speed 0.
Change-Id: I453e2d72956467d75be4ad8c04b4482ab889d572
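One way to see why eob <= 10 confines the non-zero coefficients to the first four rows is to walk a scan order and record the largest row visited. The sketch below uses the classic zigzag order as a stand-in; the codec's actual scan tables may differ, and the function name is hypothetical.

```c
#include <assert.h>

/* Returns the largest row index among the first `eob` positions of a
 * classic zigzag scan over an n x n block, or -1 if eob is 0. */
int max_row_in_zigzag_prefix(int n, int eob) {
  assert(n > 0 && eob >= 0 && eob <= n * n);
  int count = 0, max_row = -1;
  /* s indexes anti-diagonals, where row + col == s. */
  for (int s = 0; s <= 2 * (n - 1) && count < eob; ++s) {
    const int r_lo = s < n ? 0 : s - n + 1;
    const int r_hi = s < n ? s : n - 1;
    for (int k = 0; k <= r_hi - r_lo && count < eob; ++k) {
      /* Odd diagonals run top to bottom, even ones bottom to top. */
      const int r = (s & 1) ? r_lo + k : r_hi - k;
      if (r > max_row) max_row = r;
      ++count;
    }
  }
  return max_row;
}
```

With this scan, the first 10 positions are exactly those with row + col <= 3, so `max_row_in_zigzag_prefix(8, 10)` and `max_row_in_zigzag_prefix(16, 10)` both return 3, which is why rows 4 and above can be skipped in the first pass.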
|
|
Renaming:
read_intra_mode_info -> read_intra_frame_mode_info
read_inter_mode_info -> read_inter_frame_mode_info
read_intra_block_part -> read_intra_block_mode_info
read_inter_block_part -> read_inter_block_mode_info
read_ref_frame -> read_ref_frames
read_reference_frame -> read_is_inter_block
Using num_4x4_blocks_{wide, high}_lookup instead of bit shifts.
Change-Id: I83c81573b4ef6f53f2f8d24683895014bebfba61
|
|
|
|
|
|
|
|
This commit enables special handling for the 8x8 inverse 2D-DCT when
only the DC coefficient is quantized to a non-zero value. For bus_cif
at 2000 kbps, it provides about 1% speed-up at speed 0.
Change-Id: I2523222359eec26b144cf8fd4c63a4ad63b1b011
|
|
Change-Id: I748dee8938dfb19f417f24eed005f3d216f83a82
|
|
|
|
Change-Id: Ie48035ff4f93c41f8a9b3023e6444fd10432d8fb
|
|
|
|
Speed feature experiment to set an upper and lower
partition size limit based on what has been seen
in spatial neighbors.
This seems to give quite reasonable speed gains in local tests
(10-15%), and when used with speed 0 the losses are small
(0.25% derf, 0.35% stdhd). However, for now I am only
enabling it on speed 1 as there may be clashes with the existing
temporal partition selection in speed 2.
Using a tighter min / max around the range derived from the
neighbors increases speed further but at the cost of a
bigger quality loss. However, I think this spatial method could
be combined with data from either the last frame or a variance
method (or both) to refine the range of minimum and maximum
partition size. I.e. consider the min and max from spatial and
temporal neighbors and the variance recommendation.
Change-Id: I1b96bf8b84368d6aad0c7aa600fe141b4f07435f
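The neighbor-based limit can be sketched as a min/max over the partition sizes already chosen nearby. Everything named here is hypothetical: the size enum, the function, and the `slack` parameter (which widens the range by a number of size steps, so slack 0 corresponds to the "tighter" variant discussed above) are illustrative assumptions, not the encoder's actual interface.

```c
#include <assert.h>

/* Hypothetical partition sizes, ordered from smallest to largest. */
enum { SIZE_8X8 = 0, SIZE_16X16, SIZE_32X32, SIZE_64X64 };

/* Derives the allowed [min, max] partition size from the sizes used by
 * spatial neighbors. With no neighbors, the full range is allowed. */
void partition_range_from_neighbors(const int *sizes, int n, int slack,
                                    int *min_size, int *max_size) {
  assert(n >= 0 && slack >= 0);
  int lo = SIZE_8X8, hi = SIZE_64X64;
  if (n > 0) {
    lo = hi = sizes[0];
    for (int i = 1; i < n; ++i) {
      if (sizes[i] < lo) lo = sizes[i];
      if (sizes[i] > hi) hi = sizes[i];
    }
    /* Widen the range, then clamp to the valid sizes. */
    lo -= slack;
    hi += slack;
    if (lo < SIZE_8X8) lo = SIZE_8X8;
    if (hi > SIZE_64X64) hi = SIZE_64X64;
  }
  *min_size = lo;
  *max_size = hi;
}
```

Combining this with temporal neighbors or a variance recommendation, as the message suggests, would amount to feeding more candidate sizes into the same min/max computation.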
|
|
Used 3 * standard_deviation in the internal threshold calculation
instead of a fitted curve. This actually matches the algorithm
better.
For comparison, similar tests were done:
The overall psnr loss is less than before.
1. derf set:
when static-thresh = 1, psnr loss is 0.329%;
when static-thresh = 500, psnr loss is 0.970%;
2. stdhd set:
when static-thresh = 1, psnr loss is 0.922%;
when static-thresh = 500, psnr loss is 1.307%;
Similar speedup is achieved. For example,
clip              bitrate  static-thresh  psnr    time
akiyo(cif)        500      0              48.952  5.077s(50f)
akiyo             500      500            48.866  4.169s(50f)
parkjoy(1080p)    4000     0              30.388  78.20s(30f)
parkjoy           4000     500            30.367  70.85s(30f)
sunflower(1080p)  4000     0              44.402  74.55s(30f)
sunflower         4000     500            44.414  68.69s(30f)
Change-Id: Ic78833642ce1911dbbd1cb6c899a2d7e2dfcc1f3
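The message does not say which quantity the standard deviation is taken over, so the following is only a generic illustration of a 3 * standard_deviation threshold test. The function name and the choice of integer samples are assumptions. It compares squared quantities so that no square root is needed.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if |x - mean| <= 3 * stddev for the given samples, using the
 * population variance. Both sides are scaled by n^2 to stay in integers:
 *   (n*x - sum)^2 <= 9 * (n*sumsq - sum^2)                              */
int within_three_sigma(const int *samples, int n, int x) {
  assert(n > 0);
  int64_t sum = 0, sumsq = 0;
  for (int i = 0; i < n; ++i) {
    sum += samples[i];
    sumsq += (int64_t)samples[i] * samples[i];
  }
  const int64_t dev = (int64_t)n * x - sum;         /* n * (x - mean) */
  const int64_t var_n2 = (int64_t)n * sumsq - sum * sum; /* n^2 * variance */
  return dev * dev <= 9 * var_n2;
}
```

For samples {8, 12, 8, 12} the mean is 10 and the standard deviation is 2, so values from 4 to 16 pass the test and anything outside that range does not.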
|
|
Now read_inter_mode_info calls read_intra_block_part (renamed from
read_intra_block_modes) or read_inter_block_part (just added).
Change-Id: I541badea6b663e0ae692ec158665efb90ed20c03
|
|
|