Age | Commit message | Author |
|
The NEON code made several assumptions that were broken by a recent
change: https://review.webmproject.org/2676
Update the code with the new assumptions and guard them with a
compile-time assert.
Change-Id: I32a8378030759966068f34618d7b4b1b02e101a0
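A compile-time assert turns a silently broken layout assumption into a build error. The sketch below assumes the assumption is about byte offsets of struct fields that assembly hard-codes. The struct `example_block`, the `EXAMPLE_*` offsets and the `EXAMPLE_CT_ASSERT` macro are all hypothetical, since this log does not name the actual structure the NEON code depends on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical block layout; the struct the NEON code really depends on
 * is not named in this log. */
typedef struct {
    int16_t coeff[16];
    int16_t dqcoeff[16];
    int32_t eob;
} example_block;

/* Byte offsets an assembly file might hard-code (assumed values). */
#define EXAMPLE_DQCOEFF_OFFSET 32
#define EXAMPLE_EOB_OFFSET 64

/* Pre-C11 compile-time assert: a false condition yields a negative array
 * size, which stops the build instead of failing silently at run time. */
#define EXAMPLE_CT_ASSERT(name, cond) \
    typedef char example_ct_assert_##name[(cond) ? 1 : -1]

EXAMPLE_CT_ASSERT(dqcoeff_offset,
                  offsetof(example_block, dqcoeff) == EXAMPLE_DQCOEFF_OFFSET);
EXAMPLE_CT_ASSERT(eob_offset,
                  offsetof(example_block, eob) == EXAMPLE_EOB_OFFSET);
```

With a C11 compiler, `_Static_assert(cond, "message")` does the same job with a readable error message.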
|
|
Change-Id: Ia1ad66066a24c01915cd9e3ff75c7e070cc984c8
|
|
Change-Id: I7e6bc28e7974a376da747300744e0dd5dc1d21e9
|
|
Change-Id: Ibcf5b4b14153f65ce1b53c3bfba87ad2feb17bbd
|
|
Make sure to update last_sharpness_level from the current
sharpness_level whenever it changes.
Change-Id: I0258d2f5b11a407abf6176a8d4c4994d925943f0
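The fix amounts to a "recompute on change, then record" pattern. A minimal sketch, assuming a hypothetical `example_loopfilter_state` whose field names echo the log but whose layout and rebuild step are invented for illustration:

```c
#include <assert.h>

/* Hypothetical state; field names echo the log, layout is assumed. */
typedef struct {
    int sharpness_level;       /* requested for this frame */
    int last_sharpness_level;  /* level the limits were last built for */
    int limit_rebuilds;        /* counts recomputations, for illustration */
} example_loopfilter_state;

/* Placeholder for the real per-sharpness limit calculation. */
static void example_rebuild_limits(example_loopfilter_state *s) {
    s->limit_rebuilds++;
}

/* Rebuild only when sharpness changes, and record the new value so the
 * next frame compares against the current setting, not a stale one. */
static void example_frame_init(example_loopfilter_state *s) {
    if (s->sharpness_level != s->last_sharpness_level) {
        example_rebuild_limits(s);
        s->last_sharpness_level = s->sharpness_level;
    }
}
```

Forgetting the assignment to `last_sharpness_level` would make every later frame rebuild the limits (or compare against the wrong value), which is the class of bug the commit describes.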
|
|
Since this is the only ABI-incompatible change since the last release,
convert it to use the control interface instead. The member of the
configuration struct is replaced with the VP8E_SET_MAX_INTRA_BITRATE_PCT
control.
More significant API changes were expected to be forthcoming when this
control was first introduced, and while they continue to be expected,
it's not worth breaking compatibility for only this change.
Change-Id: I799d8dbe24c8bc9c241e0b7743b2b64f81327d59
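Why a control keeps the ABI stable: adding a field to a public configuration struct changes its size and layout, so binaries built against the old header break, whereas a new control ID only adds an enum value. On the caller side libvpx uses `vpx_codec_control(&codec, VP8E_SET_MAX_INTRA_BITRATE_PCT, pct)`. The dispatcher below is a hypothetical stand-in (`example_encoder`, `example_control` and the ID are invented) that shows the varargs control pattern, not libvpx's internal code.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical encoder context standing in for internal encoder state. */
typedef struct {
    unsigned int max_intra_bitrate_pct;
} example_encoder;

/* Hypothetical control IDs. New IDs can be appended without touching the
 * layout of any public struct, so existing binaries keep working. */
enum example_ctrl_id {
    EXAMPLE_SET_MAX_INTRA_BITRATE_PCT = 1
};

/* Returns 0 on success, -1 for an unknown control ID. */
static int example_control(example_encoder *enc, int id, ...) {
    va_list args;
    int ret = 0;
    va_start(args, id);
    switch (id) {
    case EXAMPLE_SET_MAX_INTRA_BITRATE_PCT:
        enc->max_intra_bitrate_pct = va_arg(args, unsigned int);
        break;
    default:
        ret = -1;  /* unknown IDs are rejected rather than misread */
        break;
    }
    va_end(args);
    return ret;
}
```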
|
|
This change implements the same idea as the change "Preload reference
area to an intermediate buffer in sub-pixel motion search." The changes
were made to the vp8_find_best_sub_pixel_step() and
vp8_find_best_half_pixel_step() functions, which are called when
speed >= 5. Test results (using the tulip clip):
1. On a Core2 Quad machine (Linux):
rt mode, speed (-5 ~ -8), encoding speed gain: 2% ~ 3%
rt mode, speed (-9 ~ -11), encoding speed gain: 1% ~ 2%
rt mode, speed (-12 ~ -14), no noticeable encoding speed gain
2. On a Xeon machine (Linux):
Tests at speed (-5 ~ -14) showed no noticeable speed change.
Change-Id: I21bec2d6e7fbe541fcc0f4c0366bbdf3e2076aa2
|
|
There were some situations in which the start motion vectors were
out of range. This fix adjusts the range checks to make sure the
start vectors are checked and clamped.
Change-Id: Ife83b7fed0882bba6d1fa559b6e63c054fd5065d
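Clamping a start vector means forcing each component into the allowed search window before the search begins. A minimal sketch, assuming hypothetical `example_mv` and `example_mv_limits` types with limits expressed in the same units as the vector; libvpx's own MV types and limit fields are not reproduced here.

```c
#include <assert.h>

/* Hypothetical motion vector and search limits (same units). */
typedef struct { short row, col; } example_mv;
typedef struct { int row_min, row_max, col_min, col_max; } example_mv_limits;

static int example_clamp(int v, int lo, int hi) {
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Force both components of the start vector into the search window. */
static void example_clamp_start_mv(example_mv *mv,
                                   const example_mv_limits *lim) {
    mv->row = (short)example_clamp(mv->row, lim->row_min, lim->row_max);
    mv->col = (short)example_clamp(mv->col, lim->col_min, lim->col_max);
}
```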
|
|
Removes mixed usage of (unsigned) long long and INT64.
Fixes Issue #208.
Change-Id: I220d3ed5ce4bb1280cd38bb3715f208ce23cf83a
|
|
Fixed.
Change-Id: I3348e8dbcaee6ace263af413701101d77636e5df
|
|
Noticed small performance gains, depending on material.
Change-Id: I334369f6312bc19aa73481fc3f790ab181e11867
|
|
This change fixes a build error on Win64.
Change-Id: I63d25b26220c4da8a98ca2e36530cbb802468e6b
|
|
CONFIG_FAST_UNALIGNED is enabled by default. Disable it if it is
not supported by the hardware.
Change-Id: I7d6905ed79fed918bca074bd62820b0c929d81ab
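A switch like this typically selects between a word-at-a-time path that relies on cheap unaligned access and a byte-at-a-time fallback. The sketch below is a hypothetical illustration of that pattern (`example_copy_row` is invented, not a libvpx function); the fast path uses fixed-size `memcpy`, which compilers commonly lower to a single unaligned load/store on targets that allow it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#ifndef CONFIG_FAST_UNALIGNED
#define CONFIG_FAST_UNALIGNED 1  /* enabled by default, as described above */
#endif

/* Copy n bytes from a possibly unaligned source. */
static void example_copy_row(uint8_t *dst, const uint8_t *src, size_t n) {
    size_t i = 0;
#if CONFIG_FAST_UNALIGNED
    /* Word-at-a-time: 4-byte memcpy can become one (possibly unaligned)
     * 32-bit load and store where the hardware supports it. */
    for (; i + 4 <= n; i += 4) {
        uint32_t w;
        memcpy(&w, src + i, 4);
        memcpy(dst + i, &w, 4);
    }
#endif
    /* Byte-at-a-time fallback; also handles the tail of the fast path. */
    for (; i < n; i++)
        dst[i] = src[i];
}
```

Building with `-DCONFIG_FAST_UNALIGNED=0` compiles only the byte loop; both paths produce identical output.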
|
|
sharpness was not recalculated in vp8cx_pick_filter_level_fast
remove last_filter_type. all values are calculated, don't need to update
the lfi data when it changes.
always use cm->sharpness_level. the extra indirection was annoying.
don't track last frame_type or sharpness_level manually. frame type
only matters for motion search and sharpness_level is taken care of in
frame_init
move function declarations to their proper header
Change-Id: I7ef037bd4bf8cf5e37d2d36bd03b5e22a2ad91db
|
|
search"
|
|
In sub-pixel motion search, the search range is small (+/- 3 pixels).
Preload the whole search area from the reference buffer into a 32-byte
aligned buffer, then load reference data from this buffer during the
search. This keeps the data in cache and reduces the cache-line
crossing penalty. For the tulip clip, tests on an Intel Core2 Quad
machine (Linux) showed encoder speed improvements of:
3.4% at --rt --cpu-used=-4
2.8% at --rt --cpu-used=-3
2.3% at --rt --cpu-used=-2
2.2% at --rt --cpu-used=-1
A test on an Atom notebook showed only a 1.1% speed improvement (speed=-4).
A test on a Xeon machine also showed less improvement, since unaligned
data access latency is greatly reduced in newer cores.
Next, I will apply a similar idea to the other two sub-pixel search
functions used for encoding speed > 4.
This change is made exclusively for x86 platforms.
Change-Id: Ia7bb9f56169eac0f01009fe2b2f2ab5b61d2eb2f
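The preload step can be sketched as copying the (16 + 2*3) x (16 + 2*3) window around the macroblock into a 32-byte aligned scratch buffer, then addressing candidates inside that buffer. All names here (`example_search_area`, `example_preload`, `example_candidate`) and the 32-byte row pitch are assumptions for illustration; a real sub-pixel filter also needs a few extra pixels of margin for its taps, which this sketch omits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define EXAMPLE_BLOCK 16        /* macroblock size */
#define EXAMPLE_RANGE 3         /* +/- 3 pixel search range */
#define EXAMPLE_AREA (EXAMPLE_BLOCK + 2 * EXAMPLE_RANGE)  /* 22 */
#define EXAMPLE_AREA_STRIDE 32  /* row pitch rounded up to 32 bytes */

/* 32-byte aligned scratch buffer holding the whole search area. */
typedef struct {
    _Alignas(32) uint8_t buf[EXAMPLE_AREA * EXAMPLE_AREA_STRIDE];
} example_search_area;

/* Copy the window centred on the block at ref; ref must have at least
 * EXAMPLE_RANGE valid pixels on every side. */
static void example_preload(example_search_area *area, const uint8_t *ref,
                            int ref_stride) {
    const uint8_t *src = ref - EXAMPLE_RANGE * ref_stride - EXAMPLE_RANGE;
    for (int r = 0; r < EXAMPLE_AREA; r++)
        memcpy(area->buf + r * EXAMPLE_AREA_STRIDE, src + r * ref_stride,
               EXAMPLE_AREA);
}

/* Top-left of the candidate block at offset (dy, dx) in the scratch copy. */
static const uint8_t *example_candidate(const example_search_area *area,
                                        int dy, int dx) {
    return area->buf + (EXAMPLE_RANGE + dy) * EXAMPLE_AREA_STRIDE +
           (EXAMPLE_RANGE + dx);
}
```

After the preload, every candidate row is read from a small, cache-resident buffer whose rows start on 32-byte boundaries, instead of from the full-size reference frame.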
|
|
This is done by expanding luma row to 32-byte alignment, since
there is currently a bunch of code that assumes that
uv_stride == y_stride/2 (see, for example, vp8/common/postproc.c,
common/reconinter.c, common/arm/neon/recon16x16mb_neon.asm,
encoder/temporal_filter.c, and possibly others; I haven't done a
full audit).
It also replaces the hardcoded border of 16 in a number of
encoder buffers with VP8BORDERINPIXELS (currently 32), as the
chroma rows start at an offset of border/2.
Together, these two changes have the nice advantage that simply
dumping the frame memory as a contiguous blob produces a valid,
if padded, image.
Change-Id: Iaf5ea722ae5c82d5daa50f6e2dade9de753f1003
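The stride arithmetic behind "expanding luma rows to 32-byte alignment" can be checked directly: if the luma stride is a multiple of 32, half of it is an exact multiple of 16 and still wide enough for a chroma row plus its border/2 padding on each side. A sketch under these assumptions (the `example_*` names are hypothetical; the border of 32 is VP8BORDERINPIXELS as stated above):

```c
#include <assert.h>

#define EXAMPLE_BORDER 32  /* VP8BORDERINPIXELS, per the commit message */

static int example_align32(int x) { return (x + 31) & ~31; }

typedef struct {
    int y_stride, uv_stride;
    int y_size, uv_size;  /* bytes per plane, borders included */
} example_frame_layout;

static example_frame_layout example_layout(int width, int height) {
    example_frame_layout l;
    l.y_stride = example_align32(width + 2 * EXAMPLE_BORDER);
    /* Exact halving: y_stride is a multiple of 32, so uv_stride is a
     * multiple of 16 and uv_stride == y_stride / 2 always holds. */
    l.uv_stride = l.y_stride / 2;
    l.y_size = l.y_stride * (height + 2 * EXAMPLE_BORDER);
    /* Chroma has border/2 rows of padding above and below. */
    l.uv_size = l.uv_stride * ((height + 1) / 2 + EXAMPLE_BORDER);
    return l;
}
```

For a 176x144 frame this gives a luma stride of 256 and a chroma stride of 128, which covers the 88-pixel chroma row plus 16 pixels of border on each side.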
|
|
Change-Id: Icb4e4f0d7c3074a8507852178be87541a1cb5bac
|
|
armv5 dequantizer is not referenced
Change-Id: Id1cc617dcee35ebd6a406816ec6aaa26e8bbc8ad
|
|
allowing the compiler to inline this function. For real-time
encodes, this gave a boost of 1% to 2.5%, depending on the
speed setting.
Change-Id: I3929d176cca086b4261267b848419d5bcff21c02
|
|
This patch attempts to improve the handling of CBR streams with
respect to the short term buffering requirements. The "buffer level"
is changed to be an average over the rc buffer, rather than a long
running average. Overshoot is also tracked over the same interval
and the golden frame targets suppressed accordingly to correct for
overly aggressive boosting.
Testing shows that this is fairly consistently positive in one
metric or another: some clips that show significant decreases
in quality have better buffering characteristics, while others show
improvements in both.
Change-Id: I924c89aa9bdb210271f2e03311e63de3f1f8f920
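The difference between a long running average and an average over the rc buffer window is that the windowed version forgets old frames. A minimal ring-buffer sketch, assuming a hypothetical 4-frame window and a per-frame "bit surplus" value (target minus actual bits); the real rate-control fields and window length are not shown in this log.

```c
#include <assert.h>
#include <stdint.h>

#define EXAMPLE_WINDOW 4  /* frames in the rc buffer window (assumed) */

typedef struct {
    int64_t vals[EXAMPLE_WINDOW];  /* per-frame bit surplus (assumed) */
    int count, next;
    int64_t sum;
} example_window;

/* Add the latest frame, evicting the oldest once the window is full. */
static void example_push(example_window *w, int64_t v) {
    if (w->count == EXAMPLE_WINDOW)
        w->sum -= w->vals[w->next];
    else
        w->count++;
    w->vals[w->next] = v;
    w->sum += v;
    w->next = (w->next + 1) % EXAMPLE_WINDOW;
}

/* Average over the frames currently in the window. */
static int64_t example_average(const example_window *w) {
    return w->count ? w->sum / w->count : 0;
}
```

After four frames at +100 and one at -300, the windowed average is 0, while a running average over all five frames would still report +20, hiding the recent underflow risk.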
|
|
Optimized C-code of the following functions:
- vp8_tokenize_mb
- tokenize1st_order_b
- tokenize2nd_order_b
Gives ~1-5% speed-up for RT encoding on Cortex-A8/A9
depending on encoding parameters.
Change-Id: I6be86104a589a06dcbc9ed3318e8bf264ef4176c
|
|
vpx_copy_and_extend_frame could incorrectly
resize uv frames which could result in a crash.
Change-Id: Ie96f7078b1e328b3907a06eebeee44ca39a2e898
|
|
min_fs_radius, max_fs_radius, full_freq were set but never read.
Change-Id: I82657f4e7f2ba2acc3cbc3faa5ec0de5b9c6ec74
|
|
Minor fix.
Change-Id: Iaf93f6e47e882a33c479e57c7a0d0bf321e291c0
|
|
Several improvements made in good-quality mode can be added to
real-time mode to speed up encoding at speeds 1, 2, and 3 with a
small quality loss. Tests using the tulip clip showed:
--rt --cpu-used=-1
(before change)
PSNR: 38.028
time: 1m33.195s
(after change)
PSNR: 38.014
time: 1m20.851s
--rt --cpu-used=-2
(before change)
PSNR: 37.773
time: 0m57.650s
(after change)
PSNR: 37.759
time: 0m54.594s
--rt --cpu-used=-3
(before change)
PSNR: 37.392
time: 0m42.865s
(after change)
PSNR: 37.375
time: 0m41.949s
Change-Id: I76ab2a38d72bc5efc91f6fe20d332c472f6510c9
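The before/after timings above are easier to compare as a percentage reduction in encoding time. A small helper (hypothetical name) computing 100 * (before - after) / before:

```c
#include <assert.h>

/* Percentage reduction in encoding time from before_s to after_s. */
static double example_time_reduction_pct(double before_s, double after_s) {
    return 100.0 * (before_s - after_s) / before_s;
}
```

Applied to the reported times (1m33.195s = 93.195 s, and so on), this gives roughly 13.2% at --cpu-used=-1, 5.3% at -2 and 2.1% at -3, for PSNR drops of 0.014, 0.014 and 0.017 dB respectively.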
|
|
Change-Id: I5fe581d797571a7a9432fbd17fc557591d0c1afa
|
|
Change-Id: I65105a9c63832669237e6a6a7fcb4ea3ea683346
|
|
Clamp mv search to accommodate subpixel filtering
of UV mv.
Change-Id: Iab3ed405993ef6bf779ad7cf60863153068fb7d1
|
|
Scott suggested moving vp8_mv_pred() under "case NEWMV" to avoid
extra checks.
Change-Id: I09e69892f34a08dd425a4d81cfcc83674e344a20
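Moving the predictor call into the NEWMV case means it only runs for the one mode that uses its result. A hypothetical sketch of that control flow; `example_mv_pred` stands in for vp8_mv_pred() and the mode enum is invented, not libvpx's.

```c
#include <assert.h>

enum example_mode { EXAMPLE_ZEROMV, EXAMPLE_NEARMV, EXAMPLE_NEWMV };

static int example_pred_calls;  /* counts how often the predictor runs */

/* Hypothetical stand-in for vp8_mv_pred(): yields a search start point. */
static int example_mv_pred(void) {
    example_pred_calls++;
    return 0;
}

static void example_evaluate_mode(enum example_mode mode) {
    switch (mode) {
    case EXAMPLE_NEWMV: {
        int start = example_mv_pred();  /* only NEWMV needs the prediction */
        (void)start;                    /* motion search would start here */
        break;
    }
    default:
        break;  /* other modes reuse existing vectors; no prediction */
    }
}
```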
|
|
Do mvp clamping in full-pixel precision instead of 1/8-pixel
precision to avoid error caused by right shifting operation.
Also, further fixed the motion vector limit calculation in change:
b7480454706a6b15bf091e659cd6227ab373c1a6
Change-Id: Ied88a4f7ddfb0476eb9f7afc6ceeddbf209fffd7
|