Age | Commit message | Author |
|
Change-Id: I0cd91187e1efc1441086772e5683fbf72d9371cf
|
|
|
|
|
|
BUG=webm:1534
Change-Id: I535ac48e3dd2454cc7088c4f9a1e08ea74107da6
|
|
|
|
* changes:
Prepare motion estimation process for temporal dependency model
Construct temporal dependency building system
|
|
Moves the check into a function, check_gcc_avx512_compiles,
that behaves somewhat similarly to check_gcc_machine_options.
Change-Id: I2bef3ddd98e636eef12d9d5e548c43282fac7826
|
|
Set up needed stack for the motion estimation process to build up
the temporal dependency model.
Change-Id: I3436302c916a686e8c82572ffc106bf8023404b6
|
|
Schedule the frame processing to construct temporal dependency
statistics within a group of pictures. Align the corresponding
reference frames.
Change-Id: I8969f5c335a4a5c2614f4530b636fe13a25a8a98
|
|
|
|
|
|
|
|
Change-Id: Ic8e07b07790e067c014677cf33c3b016fcf4cb39
|
|
This CL separates the defining of the GF group structure from the
handling of its bitrate allocation. The encoder performance should stay
unchanged.
Change-Id: Ib77967757702bb4b284034e429d4c41ae86d0838
|
|
Allocate buffers to support gathering temporal dependency statistics
at the encoder.
Change-Id: I97d4594913a2423e8a916f20caf82ab0f5836961
|
|
The model construction incurs a 15% slowdown at speed 2. The
speed change at speed 0 is unnoticeable.
The current speed feature setup DISABLES the temporal dependency
model for all speed settings.
Change-Id: Ic45dd962f3a54a8f5f0452502dc05e352dc09ca1
|
|
Add block and frame level data structures to support frame
dependent mode decision.
Change-Id: I996fc84155fcba8e2ec2a114bb0799d6aa5539dd
|
|
|
|
* changes:
[VSX] Optimize PROCESS16 macro
VSX Version of SAD8xN
|
|
|
|
The PROCESS16 macro now uses 8-bit lanes instead of 16-bit lanes.
SADTest Speed Test (POWER8 Model 2.1)
16x8 Old VSX time = 16.7 ms, new VSX time = 9.1 ms [1.8x]
16x16 Old VSX time = 15.7 ms, new VSX time = 7.9 ms [2.0x]
16x32 Old VSX time = 14.4 ms, new VSX time = 7.2 ms [2.0x]
32x16 Old VSX time = 14.0 ms, new VSX time = 7.4 ms [1.9x]
32x32 Old VSX time = 13.4 ms, new VSX time = 6.5 ms [2.0x]
32x64 Old VSX time = 12.7 ms, new VSX time = 6.3 ms [2.0x]
64x32 Old VSX time = 12.6 ms, new VSX time = 6.3 ms [2.0x]
64x64 Old VSX time = 12.7 ms, new VSX time = 6.2 ms [2.0x]
Change-Id: I51776f0e428162e78edde8eac47f30ffd2379873
|
|
The following are completed in defining the GF group structure in firstpass:
1. Remove the redundant alt_frame_index;
2. Replace hard-coded index values with the frame_index variable.
Change-Id: I7b56e454559bbf704afc7410ea9832b20ffcd57e
|
|
Cast the counter to uint64_t in case it overflows.
The assert was there to prevent c[0] * Pfac from overflowing unsigned int,
since Pfac can be as large as 2^8. Thus c[0] needs to be smaller than 2^24.
In VP9, the assert was removed and c[0] was cast to uint64_t.
Bug: 805277
Change-Id: Ic46a3c5b4af2f267de4e32c1518b64e8d6e9d856
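The overflow described in the commit above can be illustrated with a minimal sketch. The function names are hypothetical and are not taken from the codebase; the only facts used are that the factor can reach 2^8 and that a 32-bit product wraps once the counter reaches 2^24.

```cpp
#include <cstdint>

// Hypothetical helper (not from the source): widen the counter to 64 bits
// before multiplying, so the product stays exact for any 32-bit count.
uint64_t scale_count_wide(uint32_t count, uint32_t factor) {
  return static_cast<uint64_t>(count) * factor;
}

// The same multiply kept in 32 bits, shown for contrast. The result is
// reduced modulo 2^32, so count = 2^24 with factor = 2^8 yields 0.
uint32_t scale_count_narrow(uint32_t count, uint32_t factor) {
  return count * factor;
}
```

With the widening cast, no range assert on the counter is needed, which matches the approach the message attributes to VP9.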
|
|
VSX versions of the SAD functions of width 8.
SADTest Speed Test (POWER8 Model 2.1)
8x4 C time = 68.7 ms (±0.3 ms), VSX time = 31.8 ms (±0.1 ms) [2.2x]
8x8 C time = 55.6 ms (±0.3 ms), VSX time = 18.3 ms (±0.1 ms) [3.0x]
8x16 C time = 46.5 ms (±0.1 ms), VSX time = 15.6 ms (±0.1 ms) [3.0x]
Change-Id: I843f3b34e103b72deeade4a939193d8b53cee460
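For readers unfamiliar with the operation being timed in the commit above, here is a scalar sketch of a sum of absolute differences over a block of width 8. The function name and signature are assumptions made for illustration; this is not the C or VSX implementation that was benchmarked.

```cpp
#include <cstdint>
#include <cstdlib>

// Hypothetical scalar reference: SAD of an 8-wide block with `rows` rows.
// src and ref point at the top-left pixel; strides are in bytes.
unsigned int sad8xn_ref(const uint8_t* src, int src_stride,
                        const uint8_t* ref, int ref_stride, int rows) {
  unsigned int sad = 0;
  for (int r = 0; r < rows; ++r) {
    for (int c = 0; c < 8; ++c) {
      // uint8_t operands promote to int, so the difference can be negative.
      sad += static_cast<unsigned int>(std::abs(src[c] - ref[c]));
    }
    src += src_stride;
    ref += ref_stride;
  }
  return sad;
}
```

A vector version processes many of the inner-loop differences per instruction, which is where speedups of the kind reported above would come from.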
|
|
Speed tests are added for the SADTest test suite. These tests use
AbstractBench and print the median run time of SAD operations. Speed
tests are disabled by default.
Change-Id: I5d0957248f9b5b307ae2d757d5f8d4761a1dd712
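As an illustration of the median reporting mentioned in the commit above, the following sketch computes the median of a set of run times. It is a hypothetical helper and makes no claim about how AbstractBench itself is implemented.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: median of run times in milliseconds.
// Takes the vector by value so the caller's samples are not reordered.
double median_ms(std::vector<double> times) {
  if (times.empty()) return 0.0;  // assumption: report 0 for no samples
  std::sort(times.begin(), times.end());
  const std::size_t mid = times.size() / 2;
  if (times.size() % 2 == 1) return times[mid];
  return (times[mid - 1] + times[mid]) / 2.0;
}
```

The median is less sensitive than the mean to an occasional slow run, which is a common reason to prefer it for timing results.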
|
|
|
|
|
|
Check GCC specific AVX512 flags only when GCC is enabled.
Change-Id: I15dc2a0dbf8bce37f4364fedfd34a0a34882104b
|
|
Change-Id: I370f37c85a02c032a8ba266b9b9445ee38eb0756
|
|
|
|
When golden was the inter-layer reference, a block that selected the golden ref
would not be denoised.
But when golden is used as a second temporal reference then we should denoise
blocks that select the golden reference.
This change allows for that.
Change-Id: Ifdea2ac88f6a74f73520fedcd7fec2f32c559ec9
|
|
Low bit depth version only. Passes the VP9QuantizeTest test suite.
VP9QuantizeTest Speed Test (POWER8 Model 2.1)
32x32 C time = 93.1 ms (±0.4 ms), VSX time = 6.5 ms (±0.2 ms) [14.4x]
Change-Id: I7f1fd0fc987af86baf2b74147a25aee811289112
|
|
Low bit depth version only. Passes the VP9QuantizeTest test suite.
VP9QuantizeTest Speed Test (POWER8 Model 2.1)
4x4 C time = 86.3 ms (±0.7 ms), VSX time = 18.2 ms (±0.0 ms) [ 4.7x]
8x8 C time = 57.7 ms (±0.3 ms), VSX time = 7.6 ms (±0.0 ms) [ 7.6x]
16x16 C time = 50.7 ms (±0.1 ms), VSX time = 4.9 ms (±0.0 ms) [10.3x]
Change-Id: Ic09bc786c57cc89bba14624064216b52996075eb
|
|
|
|
|
|
|
|
When the second (gf) temporal reference is used in SVC:
the reference is refreshed on base TL superframes, and so
the rc->frames_since_golden counter was also only updated on
base TL frames. But this was disabling the golden reference
from being used as a temporal reference for TL > 0 frames
(since frames_since_golden was 0/not updated on TL > 0 frames).
Fix is to copy the update of rc->frames_since_golden to all
upper temporal layers. This allows TL > 0 frames to test the
golden inter mode.
Gain on RTC set: ~2%, ~8% on desktop_vga clip.
Encode time increase ~5-8% on linux, 3SL-3TL run with 1 thread.
For now keep this off for TL > 0 frames in speed features, so
this change does not change current behavior for speed >= 7.
Change-Id: I405708f3f80039ae47bd64ec53e66f92160acd9e
|
|
Change-Id: I3c9aefd3ea5028797b9105d7e49b1cb2f762a9fc
|
|
Terminate early and skip neural net model when linear score is already
high enough, which indicates that we should not skip split and
rectangular partitions.
No changes on compression; encoding speed improves slightly.
Change-Id: I4e0995090200eb4889344da905d2f7048673af5f
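The early termination described in the commit above can be sketched as follows. Every name, both thresholds, and the convention that a low model score means skipping is safe are assumptions made for illustration; only the ordering (cheap linear score first, neural net only when needed) comes from the message.

```cpp
#include <functional>

// Hypothetical sketch. Returns true when split and rectangular
// partitions may be skipped.
bool may_skip_split_and_rect(float linear_score, float linear_thresh,
                             const std::function<float()>& nn_score,
                             float nn_thresh) {
  // A high linear score already says "do not skip", so return before
  // paying for the neural net evaluation.
  if (linear_score >= linear_thresh) return false;
  // Otherwise run the more expensive model. Assumed convention: a score
  // below nn_thresh means skipping is safe.
  return nn_score() < nn_thresh;
}
```

Passing the model as a callable keeps its cost off the early-exit path, which is the source of the small speed gain the message describes.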
|
|
|
|
|
|
|
|
+ remove obsolete FIXME
Change-Id: I97ceb94b0e7860167e9c8cc6900bec8d155f0e8f
|
|
* changes:
Implement subtract_block for VSX
Cast bsize as int to print meaningful debug info
Speed test for subtract_block
|
|
For the feature of using the second temporal reference (when
inter-layer is off): move the buffer_idx assignment and
refresh flag settings further down to vp9_rc_get_svc_params(),
since is_key_frame is set there for every frame/layer.
Otherwise it was using the setting from the previous frame/layer.
This makes the refresh more consistent for both layers in the
2 spatial layers case.
Small/negligible change in metrics.
Change-Id: I88279243bc27898448e8891dba38143d936cf6d5
|
|
~2x speedup or better.
[ RUN ] C/VP9SubtractBlockTest.Speed/0
[ BENCH ] 4x4 365.1 ms ( ±2.2 ms )
[ BENCH ] 8x4 258.5 ms ( ±0.3 ms )
[ BENCH ] 4x8 202.7 ms ( ±0.2 ms )
[ BENCH ] 8x8 162.2 ms ( ±0.5 ms )
[ BENCH ] 16x8 138.8 ms ( ±0.3 ms )
[ BENCH ] 8x16 121.5 ms ( ±0.4 ms )
[ BENCH ] 16x16 110.2 ms ( ±0.5 ms )
[ BENCH ] 32x16 104.8 ms ( ±0.1 ms )
[ BENCH ] 16x32 32.7 ms ( ±0.1 ms )
[ BENCH ] 32x32 30.0 ms ( ±0.0 ms )
[ BENCH ] 64x32 28.7 ms ( ±0.0 ms )
[ BENCH ] 32x64 20.1 ms ( ±0.0 ms )
[ BENCH ] 64x64 19.3 ms ( ±0.0 ms )
[ RUN ] VSX/VP9SubtractBlockTest.Speed/0
[ BENCH ] 4x4 155.3 ms ( ±0.9 ms )
[ BENCH ] 8x4 99.3 ms ( ±0.4 ms )
[ BENCH ] 4x8 77.2 ms ( ±0.1 ms )
[ BENCH ] 8x8 45.7 ms ( ±0.0 ms )
[ BENCH ] 16x8 34.1 ms ( ±0.0 ms )
[ BENCH ] 8x16 29.5 ms ( ±0.0 ms )
[ BENCH ] 16x16 19.9 ms ( ±0.0 ms )
[ BENCH ] 32x16 15.1 ms ( ±0.0 ms )
[ BENCH ] 16x32 16.7 ms ( ±0.0 ms )
[ BENCH ] 32x32 14.1 ms ( ±0.0 ms )
[ BENCH ] 64x32 12.6 ms ( ±0.0 ms )
[ BENCH ] 32x64 12.0 ms ( ±0.0 ms )
[ BENCH ] 64x64 11.2 ms ( ±0.0 ms )
Change-Id: I89ce12b6475871dc9e8fde84d0b6fe5c420c28c7
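For context on what the benchmark above measures, here is a scalar sketch of a block subtraction: each output value is the source pixel minus the predicted pixel. The function name is hypothetical and the code is not the C or VSX implementation that was timed.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical scalar reference: diff = src - pred over a rows x cols block.
// Strides are in elements of the respective buffer.
void subtract_block_ref(int rows, int cols,
                        int16_t* diff, std::ptrdiff_t diff_stride,
                        const uint8_t* src, std::ptrdiff_t src_stride,
                        const uint8_t* pred, std::ptrdiff_t pred_stride) {
  for (int r = 0; r < rows; ++r) {
    for (int c = 0; c < cols; ++c) {
      // The difference of two 8-bit pixels lies in [-255, 255],
      // which fits in int16_t.
      diff[c] = static_cast<int16_t>(src[c] - pred[c]);
    }
    diff += diff_stride;
    src += src_stride;
    pred += pred_stride;
  }
}
```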
|
|
cout helpfully decides to print the bsize value as a non-printable char
otherwise.
Change-Id: Id91b52d6475ae9f869365468d1d56d94b2e10ecb
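The behavior mentioned in the commit above can be reproduced with a small sketch. It assumes the block size is stored in a one-byte type; the alias and function names are hypothetical. Stream insertion treats such a value as a character unless it is cast to int first.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical stand-in for a block-size type stored in one byte.
using BlockSizeByte = uint8_t;

// Inserts the value directly: the stream writes it as a single character,
// which is a control character for small values.
std::string describe_raw(BlockSizeByte bsize) {
  std::ostringstream os;
  os << "bsize: " << bsize;
  return os.str();
}

// Casts to int first, so the stream writes the decimal number.
std::string describe_cast(BlockSizeByte bsize) {
  std::ostringstream os;
  os << "bsize: " << static_cast<int>(bsize);
  return os.str();
}
```

The same applies to std::cout, since it shares the operator<< overloads used by std::ostringstream.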
|
|
Change-Id: Icd7d4453f0ee699635a2a1d484d24cba71d748de
|
|
Some compiler releases allow the -mavx512f arg without actually
implementing support. Test for this situation, and disable avx512
when it is detected by configure.
BUG=webm:1536
Change-Id: I63952153bb4b24aa9f25267ed47a0fe845d61f8b
|
|
Bump up ABI version.
Change-Id: I4498d7ea4ed72994c5f847aa98e75b0150dd7f82
|