Roughly 2x speedup. Since the only change for HBD is to store(), the
improvement appears to hold there as well.
BUG=webm:1424
Change-Id: I15b813d50deb2e47b49a6b0705945de748e83c19

* changes:
  buffer.h: add num_elements_
  buffer.h: zero-init all values
  buffer.h: use size_t

Enable simple_block_yrd for temporal enhancement layers (TL > 0),
and remove the block size condition for SVC mode.
Only affects speed >= 7 SVC.
Speedup: ~3-4%.
avgPSNR regression on RTC for (3 spatial, 3 temporal) layers: ~1%.
Change-Id: Iff4fc191623b71c69cd373e7c0823385e7ac67ed

raw_size_ was being incorrectly computed and used.
Change-Id: Iade45f69964c567ffb258880f26006a96ae5a30d

Change-Id: I18d90658bcd4365d49adcadd6954090b3b399aa8

Change-Id: Ieca3f1ef23cd1d7b844ea3ecb054007ed280b04f

Change-Id: I4b51043cb3f5955efe947fe4685aed4a21adb8bd

Keep the 1/4 subpel for all frames; use SUBPEL_TREE_PRUNED_EVENMORE
for all temporal enhancement layer frames.
Change-Id: Ibc681acbb6fc75b7b3c57fc483fcb11d591dfc9a

Change-Id: Idfbd2e01714ca9d00525c5aeba78678b43fb0287

Change-Id: I2da4110e843b6e361028b921c24b6ca2ea9077d9

It is initialized to be { INT_MAX, 0, ... } in ffe0f9b.
No effect on encoders.
Make it consistent with other initializations.
BUG=webm:1440
Change-Id: Ie2a180d93626b55914c8c4255e466a1986d2b922

Visual Studio will warn if a 32-bit shift is implicitly converted to 64.
In this case integer storage is enough for the result.
The warning has been present since:
f3a9ae5ba Fix ubsan failure in vp9_mcomp.c.
Change-Id: I7e0e199ef8d3c64e07b780c8905da8c53c1d09fc

BUG=webm:1440
Change-Id: I7074e42bdfa8dd25f11bbb3f2ab1b41d6f4c12e4

Change-Id: Iff1dea1fe9d4ea1d3fc95ea736ddf12f30e6f48d

For SVC 1-pass non-rd mode:
Force subpel search off for SVC for non-reference frames
under the motion threshold.
Add a flag to the svc context to indicate if the frame is not used
as a reference.
Little/no quality loss, ~2% speedup.
Change-Id: Ic433c44b514d19d08b28f80ff05231dc943b28e9

Speed >= 8: for resolutions above CIF, and for low motion content,
set subpel_search_method to SUBPEL_TREE_PRUNED_EVENMORE.
Small speed gain (~2%) on VGA clips,
RTC metrics up by ~2-3% on average.
Change-Id: Ie26ba0264589652f92dfe74308740debf94cf0cc

Change-Id: I510b755550ebbfa2aaf9b974920d7f1c6454a845

For both vp8 and vp9.
BUG=webm:1437
Change-Id: Ifd06f68a876ade91cc2cc27c574c4641b77cce28

Use only the average of the center 2x2 pixels in vp8.
Change-Id: I2b23ff19a90827226273e0fca49e90c734eda59b

BUG=webm:1423
Change-Id: I33de537f238f58f89b7a6c1c2d6e8110de4b8804

Change-Id: Ica1b4e918aa759cd0ce65920f9d88452bbf9e3b4

BUG=webm:1412
Change-Id: I26e4b34ae9bc1ae80c24f56d740d737a95f1ab84

x86 requires 16-byte alignment for some vector loads/stores;
arm does not have the same requirement.
The asserts are still in avg_pred_sse2.c. This just removes them from
the common code.
Change-Id: Ic5175c607a94d2abf0b80d431c4e30c8a6f731b6

Unlike x86, neon only requires type alignment when loading into vectors.
Change-Id: I7bbbe4d51f78776e499ce137578d8c0effdbc02f

Split vp8/vp9 implementations of yv12_copy_frame_c.
Remove high-bitdepth code from vp8_yv12_extend_frame_borders_c.
Clean up vp8 code usage in vp9.
BUG=webm:1435
Change-Id: Ic68e79e9d71e1b20ddfc451fb8dcf2447861236d

Fix the condition on usage of source_sad for temporal layers.
Fix allows it to be used for the case of 1 temporal layer.
Change-Id: I02b1b0ade67a7889d1b93cee66d27c0951131fc3

For 1-pass CBR SVC mode.
Change-Id: Ic026740f9d0ec5eee7c5845be9c5b15884fec48d

Change-Id: If760f28cbbf22beac1cc9bd1546f13831e9dd3f0

Adjust the max_copied_frame setting for temporal layers.
Keep the same setting for non-SVC at speed 8.
This change also enables copy_partition for non-SVC at speed 7,
but with a smaller value of max_copied_frame (=2).
~2% speedup for SVC speed 7, 3 layers, with little/no quality loss.
Change-Id: Ic65ac9aad764ec65a35770d263424b2393ec6780

Unlike x86, arm does not impose additional alignment restrictions on
vector loads. For incoming values to the first pass, it uses vld1_u32(),
which typically does impose a 4-byte alignment. However, as the first
pass operates on user-supplied values, we must prepare for unaligned
values anyway (and have; see mem_neon.h).
But for the local temporary values there is no stride, and the load will
use vld1_u8, which does not require 4-byte alignment.
There are 3 temporary structures. In the C, one is uint16_t. The arm
version saturates between passes but still passes tests. If this becomes
an issue, new functions will be needed.
Change-Id: I3c9d4701bfeb14b77c783d0164608e621bfecfb1

Change-Id: Idb6248c1429b55176bb3e9f4e8365ea0ed2be62a

* changes:
  sub pel avg variance neon: 4x block sizes
  sub pel variance neon: 4x block sizes

* changes:
  sub pel avg variance neon: add neon optimizations
  sub pel variance neon: normalize variable names