Age | Commit message | Author |
|
Change-Id: I9fcb9f4cc5c565794229593fadde87286fcf0ffd
|
|
BUG=webm:1689
Change-Id: Id920816315c6586cd652ba6cd1b3a76dfc1f12b7
|
|
Under CONFIG_REALTIME_ONLY flag, map speed < 5 to speed 5.
Bug: webm:1684
This reverts commit 85cb983682fe9ca14fd302b50d27d762da05d665.
Change-Id: I67b7ed37e8b74417db310ea0c817d3c5a5de9e44
|
|
This reverts commit da24d35132e80422dc2c33e7c92462f4db7cd83d.
BUG=webm:1684
Change-Id: I552c37c7bdc844610879a65cc02038d76a5d32b1
|
|
Change-Id: I6dff1bda4bea760a32c2f8e38773e5913c830204
|
|
Change-Id: Id5c8b2d69a36d218ec04cd504868ce0efebf6b69
|
|
this prevents redefinition warnings if a toolchain sets one
BUG=b/117240165
Change-Id: Ib5d8c303cd05b4dbcc8d42c71ecfcba8f6d7b90c
|
|
Move vp9_nb_mvs_inconsistency to vp9_non_greedy_mv.c
This is to facilitate following SIMD optimizations.
Change-Id: I8eb8f820368928e0c4fb287e557cddf0bd2c763e
|
|
This reduces vp9 only binary size by ~5.7%.
Change-Id: I57e46baf591d68b0a0cecbc9319a1190df8b0457
|
|
Change-Id: If146bbf24f446f71be9147402e6d30533eee99d1
|
|
Change-Id: I3ca7442b10cbea4dd5dbabe147687d1cb3cce4d8
|
|
The SSE4_1 version of temporal filter does not distinguish between bd 10
and bd 12.
Speed up:
Function Level:
      | !SS_X | SS_X
!SS_Y | 6.44X | 6.37X
SS_Y  | 6.56X | 6.63X
Video Level:
2.5% speed up on basketballpass_240p over 150 frames on speed 1,
bitdepth 10, auto-alt-ref=1
BUG=webm:1591
Change-Id: I49aa2ed4acfe80a8d627038322de66cbe691296e
|
|
|
|
This adds a preliminary version of vp9_apply_temporal_filter in SSE4.1.
This patch merely adds the function and does not enable it yet.
Speed Up:
       | ss_x=1 | ss_x=0 |
ss_y=1 | 19.80X | 19.04X |
ss_y=0 | 21.09X | 20.21X |
BUG=webm:1591
Change-Id: If590f1ccf1d0c6c3b47410541d54f2ce37d8305b
|
|
Exclude low bit depth optimizations from high bit depth builds.
BUG=webm:1584
Change-Id: I86a7ebafa557d262257358e1e055a06d52659977
|
|
BUG=webm:1584
Change-Id: I719c64734f4eae07def2d700006834a2420891a7
|
|
Clean up vp9_encodeframe.c.
Change-Id: I4035fee94da746c74d72f71ca8334f91c5d10116
|
|
Low bit depth version only. Passes the VP9QuantizeTest test suite.
VP9QuantizeTest Speed Test (POWER8 Model 2.1)
4x4 C time = 86.3 ms (±0.7 ms), VSX time = 18.2 ms (±0.0 ms) [ 4.7x]
8x8 C time = 57.7 ms (±0.3 ms), VSX time = 7.6 ms (±0.0 ms) [ 7.6x]
16x16 C time = 50.7 ms (±0.1 ms), VSX time = 4.9 ms (±0.0 ms) [10.3x]
Change-Id: Ic09bc786c57cc89bba14624064216b52996075eb
|
|
Started from vp9_quantize_fp_sse2 and tweaked to use avx2.
Change-Id: Ic2da50cc9d73896c7ef2f3cd3db5b1c5d7795b8b
|
|
BUG=webm:1419
Change-Id: I99c954ffa50a62ccff2c4ab54162916141826d9b
|
|
when CONFIG_REALTIME_ONLY is enabled.
BUG=webm:1446
Change-Id: Id547783ec75383966c40ab5cf6abb4a0f7984f52
|
|
when CONFIG_REALTIME_ONLY is enabled.
BUG=webm:1446
Change-Id: I4b1b8e9a456830ba1b1bd3a8882e038d37ee7903
|
|
BUG=webm:1446
Change-Id: I6e0ea9342c715d354c641109737172afa649b85b
|
|
|
|
Denoiser on Neon is 5x faster than C code.
BUG=webm:1420
Change-Id: I805ab64f809ff2137354116be6213e7ec29c1dcb
|
|
There is only one avx2 implementation. Drop '_intrin'
Change-Id: I887a0d27d58567eaad49f749f127eca61313f312
|
|
Approximates division using multiply and shift.
Speeds up both sizes (8x8 and 16x16) by 30 times.
Fix the call sites to use the RTCD function.
Delete sse2 and mips implementation. They were based on a previous
implementation of the filter. It was changed in Dec 2015:
ece4fd5d2247c9512b31a93dd593de567beaf928
BUG=webm:1378
Change-Id: I0818e767a802966520b5c6e7999584ad13159276
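
For illustration only, a minimal sketch of the general "multiply and shift" idea: a division by d is replaced with a multiplication by a precomputed fixed-point reciprocal followed by a right shift. The names recip_q16 and approx_div and the 16-bit shift are assumptions made for this example; they are not taken from the patch, whose actual constants and table layout are not shown here.

```c
#include <assert.h>
#include <stdint.h>

#define RECIP_SHIFT 16 /* assumed fixed-point precision for this sketch */

/* Hypothetical helper: fixed-point reciprocal, ceil(2^16 / d). */
static uint32_t recip_q16(uint32_t d) {
  assert(d > 0 && d <= 256);
  return ((1u << RECIP_SHIFT) + d - 1) / d;
}

/* Approximates n / d as (n * recip) >> 16, with recip = recip_q16(d).
 * The 64-bit intermediate keeps the product from overflowing. */
static uint32_t approx_div(uint32_t n, uint32_t recip) {
  return (uint32_t)(((uint64_t)n * recip) >> RECIP_SHIFT);
}
```

With these choices the result matches integer division when both n and d are at most 256; for larger inputs it is only an approximation. In a filter, the reciprocals would typically be computed once per block size so that the inner loop contains no division.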
|
|
this is similar to the x86 configuration and helps mitigate an issue
with a circular dependency between this function and the ssse3 variant
causing an outsized increase in binary size (~300K for chrome)
chrome.dll:
.text 255B000 -> 252B000
.data 7B000 -> 75000
-221184 bytes
BUG=chromium:697956
Change-Id: Ic95b142ecd62dd4f1795788aa27dd8fab59b708c
|
|
|
|
vp9_highbd_block_error_8bit_c was a very simple wrapper around
vp9_block_error_c. The SSE2 implementation was practically identical to
the non-HBD one. It was missing some minor improvements which only
went into the original version.
In quick speed tests, the AVX implementation showed minimal
improvement over SSE2 when it does not detect overflow. However, when
overflow is detected the function is run a second time. The
OperationCheck test seems to trigger this case and reverses any
speed benefits by running ~60% slower. AVX2 on the other hand is
always 30-40% faster.
Change-Id: I9fcb9afbcb560f234c7ae1b13ddb69eca3988ba1
|
|
Only works for bitdepth = 8 when compiled with high bitdepth flag.
4x speed ups for handling 1:2 down/upsampling.
Validated manually for:
1) Dynamic resize for a single layer encoding
2) SVC encoding with 3 spatial layers
Results are bitexact with the patch and the speed gain (~4x) in the
scaling was verified.
BUG=webm:1371
Change-Id: I1bdb5f4d4bd0df67763fc271b6aa355e60f34712
|
|
Change-Id: Ia8f822bd6e70b3911bc433a5a750bfb6f9a3a75c
|
|
BUG=webm:1365
Change-Id: Id2ed3ebaaaa6a4b68628c23e08b64ea5f1341761
|
|
(yunqingwang)
1. Rebased the patch. Incorporated recent first pass changes.
2. Turned on the first pass unit test.
Change-Id: Ia2f7ba8152d0b6dd6bf8efb9dfaf505ba7d8edee
|
|
+ add a temporary dummy element to ALT_REF_AQ to avoid a warning about
an empty struct
Change-Id: Ib6e5c39ff62ad96eb4e3686d4882228a42b3843f
|
|
Current commit is just an API template for the rest of the code, and
I will add inner logic later.
Altref frames generate a lot of bitrate and at the same time
other frames refer to them a lot, so it makes sense to apply a
special compensation-based adaptive quantization scheme for altref
frames. E.g., for blocks that are good predictors for the future,
apply the rate-control-chosen quantizer, while for bad predictors
apply a worse one.
Change-Id: Iba3f8ec349470673b7249f6a125f6859336a47c8
|
|
BUG=b:29583530
Change-Id: I952da3fc0d4716dec897be0d2e9806af6612722b
|
|
This reverts commit be12fefa4b7d224e9f39275a6bb4fab01b8bae3b
and commit 057c1c4034ba5b9bf360c5c1f600ebc6d0718c3a.
Also, the mismatch between the avx version and the
c version has been fixed.
BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1168
For a rt encode using 1080p@60fps material, up to 11% performance
improvement overall was seen.
Change-Id: Icd1f216209ebc6fc0b8da885f32f356fa4355ed0
|
|
|
|
Function level timing test shows about 27% time saving on
a Xeon E5-2680 v2 desktop.
Rename vp9_dct_sse2.c to vp9_dct_intrin_sse2.c for vp9 and
rename dct_sse2.c to dct_intrin_sse2.c for vp10 to avoid
duplicate basenames.
Actually vp9_fwht4x4_mmx/sse2() and vp10_fwht4x4_mmx/sse2()
are identical. TODO: They should be unified later if there is
no intention to keep a duplicate.
Change-Id: I3e537b7bbd9ba417c606cd7c68c4dbbfa583f77d
|
|
vp9_diamond_search_sad_avx was disabled in:
057c1c4 disable vp9_diamond_search_sad_avx
this removes a missing prototype warning as the prototype is no longer
included in vp9_rtcd.h. the file can be restored if someone gets around
to fixing the issue.
BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1168
Change-Id: Ia9fda4b81c53dc5fba7c31d780d761f886940b52
|
|
See https://bugs.chromium.org/p/webm/issues/detail?id=1145
Change-Id: I778ee07dc39a524e3f729bef47a7abeed51e0cee
|
|
Change-Id: I9cc9780f506e025aea57485a9e21f0835faf173c
|
|
Different quality levels are used for different regions in
the frame depending on how far they are vertically from the
center. Specifically, three segments are used based on the
mi_row index relative to the number of mi_rows in
the frame.
Change-Id: Ifc8b777bc58ea8521dffc4640360c67d99f8d381
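
As a rough sketch of the idea described above, the function below maps a row index to one of three segments by its vertical distance from the frame center. The function name segment_for_row and the band boundaries (outer eighths, next eighths, central half) are assumptions chosen for the example; the commit message does not state the thresholds the patch uses.

```c
#include <assert.h>

/* Hypothetical mapping from mi_row to a segment id:
 *   0 = rows near the vertical center,
 *   1 = rows in an intermediate band,
 *   2 = rows near the top or bottom edge.
 * The 1/8 and 1/4 boundaries are assumed, not taken from the patch. */
static int segment_for_row(int mi_row, int mi_rows) {
  assert(mi_rows > 0 && mi_row >= 0 && mi_row < mi_rows);
  /* Distance to the nearest horizontal frame edge, in mi units. */
  int top = mi_row;
  int bottom = mi_rows - 1 - mi_row;
  int edge = top < bottom ? top : bottom;
  if (edge * 8 < mi_rows) return 2; /* top or bottom eighth */
  if (edge * 4 < mi_rows) return 1; /* next eighth on each side */
  return 0;                         /* central half of the frame */
}
```

Each segment id would then be associated with its own quality level, so that rows farther from the center can be coded differently from rows near it.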
|
|
Change-Id: I7bc991abea383db1f86c1bb0f2e849837b54d90f
|
|
This function now has an AVX intrinsics version which is about 80%
faster compared to the C implementation. This provides a 2-4% total
speed-up for encode, depending on encoding parameters. The function
utilizes 3 properties of the cost function lookup table, constructed
in 'cal_nmvjointsadcost' and 'cal_nmvsadcosts'.
For the joint cost:
- mvjointsadcost[1] == mvjointsadcost[2] == mvjointsadcost[3]
For the component costs:
- For all i: mvsadcost[0][i] == mvsadcost[1][i]
(equal per component cost)
- For all i: mvsadcost[0][i] == mvsadcost[0][-i]
(Cost function is even)
These must hold, otherwise the AVX version of the function cannot be used.
Change-Id: I6c2791d43022822a9e6ab43cd124a773946d0bdc
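
The three table properties listed above can be expressed as a runtime check. The sketch below is not the code from the patch: the function name mv_sad_costs_allow_avx, the max_mv parameter, and the example tables are all hypothetical, and the component tables are assumed to be passed as pointers to their center entry so that negative indices are valid.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true when the three properties hold.
 * joint: the four joint costs.
 * comp0, comp1: pointers to the center (index 0) of each component
 * cost table, valid for indices -max_mv..max_mv (assumed layout). */
static bool mv_sad_costs_allow_avx(const int joint[4], const int *comp0,
                                   const int *comp1, int max_mv) {
  assert(max_mv >= 0);
  /* Joint cost: entries 1, 2 and 3 must be equal. */
  if (joint[1] != joint[2] || joint[2] != joint[3]) return false;
  for (int i = -max_mv; i <= max_mv; ++i) {
    /* Equal per-component cost. */
    if (comp0[i] != comp1[i]) return false;
    /* Cost function is even. */
    if (comp0[i] != comp0[-i]) return false;
  }
  return true;
}

/* Made-up example tables covering offsets -2..2. */
static const int kJoint[4] = { 10, 30, 30, 30 };
static const int kJointBad[4] = { 10, 30, 31, 30 };
static const int kCompEven[5] = { 8, 4, 0, 4, 8 };
static const int kCompSkewed[5] = { 9, 4, 0, 4, 8 };
```

A caller could run such a check once after the tables are built and fall back to the C version whenever it returns false.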
|
|
This reverts commit f1342a7b070ef61b9fbdf03e899ac2107cfcb6bd.
This breaks 32-bit builds:
runtime error: load of misaligned address 0xf72fdd48 for type 'const
__m128i' (vector of 2 'long long' values), which requires 16 byte
alignment
+ _mm_set1_epi64x is incompatible with some versions of visual studio
Change-Id: I6f6fc3c11403344cef78d1c432cdc9147e5c1673
|
|
|
|
This function now has an AVX intrinsics version which is about 80%
faster compared to the C implementation. This provides a 2-4% total
speed-up for encode, depending on encoding parameters. The function
utilizes 3 properties of the cost function lookup table, constructed
in 'cal_nmvjointsadcost' and 'cal_nmvsadcosts'.
For the joint cost:
- mvjointsadcost[1] == mvjointsadcost[2] == mvjointsadcost[3]
For the component costs:
- For all i: mvsadcost[0][i] == mvsadcost[1][i]
(equal per component cost)
- For all i: mvsadcost[0][i] == mvsadcost[0][-i]
(Cost function is even)
These must hold, otherwise the AVX version of the function cannot be used.
Change-Id: I184055b864c5a2dc37b2d8c5c9012eb801e9daf6
|
|
Source noise level estimate is also useful for
setting variance encoder parameters (variance thresholds,
qp-delta, mode selection, etc), so allow it to be used also
if denoising is not on.
Change-Id: I4fe23d47607b4e17a35287057f489c29114beed1
|