Age | Commit message | Author |
|
The obj_int_extract code is no longer worth maintaining. It creates
significant issues when adapting for different build systems and no
longer offers as significant a performance benefit due to
improvements in intrinsics.
Source files will remain until the various third-party builds are updated.
The neon fast quantizer has been moved to intrinsics. The armv6 version
has been removed because so few remaining targets require it.
Compilers and processors have improved significantly since the
pack_tokens code was written. The assembly is no longer faster than the
C code.
The pack_tokens routines were the only optimizations for the armv5te targets, so
those targets will be removed after the test infrastructure has been updated.
BUG=710
Change-Id: Ic785b167cd9f95eeff31c7c76b7b736c07fb30eb
|
|
The intrinsics version of the pair quantizer is slower than quantizing
each block individually.
Change-Id: I7b4ea8599d4aab04be0a5a0c59b8b29a7fc283f4
|
|
Change-Id: Ie4686e1b15af6bcc8d59d585bbeb996f38224522
|
|
Change-Id: Ic9956ddf1c2ddffcf7be7fdfc23ad9a2426fc47a
WIP: Fixing unsafe threading in VP8 encoder.
|
|
Fixing unsafe threading in VP8 encoder.
Change-Id: Ibf4c89a2043654834747811bc11eb283de0bb830
|
|
Change-Id: I76fe20ade099573997404b8733cf7f79e82fb21e
WIP: Fixing unsafe threading in VP8 encoder.
|
|
Multi-threaded code was not updated to disable background
refresh for non base-layer frames at the time it was
disabled in the main C-code.
Change-Id: Id6cc376130b7def046942121cfd0526b4f0a71d4
|
|
Change-Id: Ifa78c0a953fab3e5dd7af0446924846c7022cd09
|
|
|
|
Change-Id: I44e4e3869f231ae270cca98c9565f23c512e3ddf
|
|
Change-Id: I650a593162280ab40e71e527ec6518303e2d5723
|
|
Change-Id: I28ac1519d1594801fef9a623cb64598d3d751eb0
|
|
Change-Id: Ie22841d096f3c86694b95bd06fc3a8ce1f032a10
|
|
Change-Id: Ib73c7b2bee4cb2eb2528fa6b381fffe9503079a0
|
|
Change-Id: Ie9a26be7c9baa54a0e43a63ed6c77f2746477a9c
|
|
Change-Id: I289564a5a27f0d03ddc6f19c7838542ff22719be
|
|
Code cleanup
Change-Id: I82f9d787a2f511d39895fd8dfd5347a1676d9dbc
|
|
|
|
The current way of counting inter_zz_count doesn't work correctly
in multi-threaded encoding. Calculating it after the frame is
encoded fixed the problem.
Change-Id: Ifcb1972cde950b8cc194f75c6d7b6af09e8b0e65
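
A minimal sketch of the idea, assuming inter_zz_count tallies inter-coded macroblocks whose motion vector is zero. The `mb_info` type and `count_inter_zero_mv()` are hypothetical names for illustration, not the encoder's own. The point is that a single pass over the finished frame needs no shared counter that several threads would otherwise increment concurrently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified per-macroblock record. */
typedef struct {
  int is_inter; /* nonzero if the macroblock was coded with an inter mode */
  int mv_row;   /* vertical motion vector component */
  int mv_col;   /* horizontal motion vector component */
} mb_info;

/* Counts inter macroblocks with a zero motion vector. Intended to be
 * called once, by a single thread, after every encoding thread has
 * finished the frame, so no locking is required. */
static int count_inter_zero_mv(const mb_info *mbs, size_t n) {
  size_t i;
  int count = 0;
  assert(mbs != NULL || n == 0);
  for (i = 0; i < n; ++i) {
    if (mbs[i].is_inter && mbs[i].mv_row == 0 && mbs[i].mv_col == 0) ++count;
  }
  return count;
}
```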
|
|
Added checks for pthread_create() errors.
Change-Id: Ie198ef5c14314fe252d2e02f7fe5bfacc7e16377
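
For illustration, a check of this kind might look like the sketch below. `worker_main()`, `start_workers()` and `join_workers()` are hypothetical names, and the fallback policy (stop at the first failure and report how many threads started) is an assumption rather than what the encoder does.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical worker; stands in for an encoder thread entry point. */
static void *worker_main(void *arg) {
  (void)arg;
  return NULL;
}

/* Tries to start `count` workers and returns the number actually
 * started. It stops at the first failure so the caller can continue
 * with fewer threads or report an error. */
static int start_workers(pthread_t *threads, int count) {
  int i;
  assert(threads != NULL && count >= 0);
  for (i = 0; i < count; ++i) {
    /* pthread_create() returns 0 on success and an error number on
     * failure; it does not report the error through errno. */
    int rc = pthread_create(&threads[i], NULL, worker_main, NULL);
    if (rc != 0) {
      fprintf(stderr, "pthread_create failed: %s\n", strerror(rc));
      break;
    }
  }
  return i;
}

/* Joins only the workers that were successfully started. */
static void join_workers(pthread_t *threads, int started) {
  int i;
  for (i = 0; i < started; ++i) pthread_join(threads[i], NULL);
}
```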
|
|
The sync interval for the multithreaded encoder was assumed to stay constant
during encoding. This is not true if the picture size changes.
The encoder could deadlock because the main thread and the other threads were
using different sync intervals.
Change-Id: I75232bbdbc6c02d77f830d870fd8b4e96697c64e
|
|
Precalculated block ptrs do not need updates during encoding.
Set these at init stage.
Moved the allocation of 'mt_current_mb_col' (last encoded MB on each
row) to vp8_alloc_compressor_data(), so that it is correctly
reallocated when the frame size changes.
Change-Id: Idcdaa2d0cf3a7f782b7d888626b7cf22a4ffb5c1
|
|
Allows building the library with the gcc -pedantic option, for improved
portability. In particular, this commit removes usage of C99/C++ style
single-line comments and dynamic struct initializers. This is a
continuation of the work done in commit 97b766a46, which removed most
of these warnings for decode only builds.
Change-Id: Id453d9c1d9f44cc0381b10c3869fabb0184d5966
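
As a sketch of the two patterns involved, the code below uses only block comments and avoids initializing an aggregate from run-time values, which ISO C90 does not permit in an initializer list. `frame_size` and `make_frame_size()` are hypothetical names, not taken from the library.

```c
#include <assert.h>

/* Hypothetical struct used only for this illustration. */
typedef struct {
  int width;
  int height;
} frame_size;

/* Block comments only: C90 has no single-line comment syntax. */
static frame_size make_frame_size(int width, int height) {
  /* C90 requires constant expressions in an aggregate initializer
   * list, so "frame_size fs = { width, height };" draws a -pedantic
   * warning. Assigning the members one by one avoids it. */
  frame_size fs;
  assert(width > 0 && height > 0);
  fs.width = width;
  fs.height = height;
  return fs;
}
```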
|
|
Conflicts:
vp8/common/entropymode.c
vp8/common/entropymode.h
vp8/encoder/encodeframe.c
vp8/vp8_cx_iface.c
Change-Id: I708b0f30449b9502b382e47b745d56f5ed2ce265
|
|
Change If4321cc5 fixed a bug caused by forward declarations not being
kept in sync across C files, resulting in a function call with the
wrong arguments. This commit moves the affected function declarations
into a header file, along with the other symbols from encodeframe.c
that were being sloppily shared.
Change-Id: I76a7b4c66d4fe175f9cbef7e52148655e4bb9ba1
|
|
mb_row and mb_col were not passed to vp8cx_encode_inter_macroblock in
threaded encoding.
Change-Id: If4321cc59bf91e991aa31e772f882ed5f2bbb201
|
|
RD costs were local to MACROBLOCK data and had to be copied all the
time to each thread's MACROBLOCK data. The tables have moved to a common
place, and only pointers are set up for each encoding thread.
vp8_cost_tokens() generates 'int' costs so changed all types to be
int (i.e. removed unsigned).
NOTE: Could do some more cleaning in vp8cx_init_mbrthread_data().
Change-Id: Ifa4de4c6286dffaca7ed3082041fe5af1345ddc0
|
|
Produce the token partitions on-the-fly, while processing each MB.
Context is updated at the beginning of each frame based on the
previous frame's counters. Optimally, the encoder outputs partitions in
separate buffers. For frame-based output, partitions are concatenated
internally.
Limitations:
- enabled only in combination with realtime-only mode
- number of encoding threads has to be less than or equal to the
number of token partitions. For this reason, by default the encoder
will do 8 token partitions.
- vpxenc supports partition output (-P) only in combination with
IVF output format (--ivf)
Performance:
- Realtime encoder can be up to 13% faster (ARM) depending on the number
of threads and bitrate settings. Constant gain over the 5-16 speed
range.
- Token buffer reduced from one frame to 8 MBs
Quality:
- quality is affected by the delayed context updates. This again
depends on input material, speed and bitrate settings. For VC
style input the loss seen is up to 0.2dB. If error-resilient=2
mode is used then the effect of this change is negligible.
Example:
./configure --enable-realtime-only --enable-onthefly-bitpacking
./vpxenc --rt --end-usage=1 --fps=30000/1000 -w 640 -h 480
--target-bitrate=1000 --token-parts=3 --static-thresh=2000
--ivf -P -t 4 -o strm.ivf tanya_640x480.yuv
Change-Id: I127295cb85b835fc287e1c0201a67e378d025d76
|
|
Second shot at this...
Sync with loopfilter thread as late as possible, usually just at the
beginning of next frame encoding. This returns control to application
faster and allows a better multicore scaling.
When PSNR packets are generated the final filtered frame is needed
immediately, so we cannot delay the sync. The same has to be done when
the internal frame is previewed.
Change-Id: I64e110c8b224dd967faefffd9c93dd8dbad4a5b5
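
A minimal sketch of the late-sync pattern, under stated assumptions: it creates and joins a thread per frame for brevity, whereas a real encoder would more likely keep a persistent loopfilter thread and signal it. All names (`lf_state`, `lf_worker()`, `finish_frame()`, `begin_frame()`) are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical state for one outstanding loopfilter job. */
typedef struct {
  pthread_t thread;
  int pending;  /* nonzero while a started job has not been joined */
  int filtered; /* set by the worker; stands in for the filtered frame */
} lf_state;

/* Stand-in for the loop filter: just marks the frame as filtered. */
static void *lf_worker(void *arg) {
  lf_state *s = (lf_state *)arg;
  s->filtered = 1;
  return NULL;
}

/* Waits for the outstanding job, if there is one. */
static void lf_sync(lf_state *s) {
  assert(s != NULL);
  if (s->pending) {
    pthread_join(s->thread, NULL);
    s->pending = 0;
  }
}

/* End of frame: start filtering and return to the caller at once,
 * unless the filtered frame is needed now (for example for PSNR or
 * a preview). Returns 0 on success, -1 if the thread did not start. */
static int finish_frame(lf_state *s, int need_filtered_now) {
  assert(s != NULL);
  s->filtered = 0;
  if (pthread_create(&s->thread, NULL, lf_worker, s) != 0) return -1;
  s->pending = 1;
  if (need_filtered_now) lf_sync(s);
  return 0;
}

/* Start of the next frame: the latest point at which to sync. */
static void begin_frame(lf_state *s) { lf_sync(s); }
```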
|
|
Change-Id: Ieb05270ac332a4cc38ec4b7b995fc0150e0fffdf
|
|
Change-Id: I10efa441d663fceb6bc97a3bfad518cd3d9a5128
|
|
This commit continues the process of converting to the new RTCD
system. It removes the last of the VP8_ENCODER_RTCD struct references.
Change-Id: I2a44f52d7cccf5177e1ca98a028ead570d045395
|
|
This commit continues the process of converting to the new RTCD
system.
Change-Id: I3f9c07db65eb206f6363d21bdb80e871570da767
|
|
This commit continues the process of converting to the new RTCD
system.
Change-Id: I9bfcf9bef65c3d4ba0fb9a3e1532bad1463a10d6
|
|
This patch removes the local copies of the dequantize
constants and implements John's idea as described
in "Make a local copy of the dequantized data" commit.
Change-Id: Ic6b7d681f00bf63263f71ff1e39ab2f80729e8b2
|
|
Change-Id: Ie2dc0d72363ff38e0f71b59f6e2d1a2d70c5266b
|
|
Change-Id: I72ed49ce14ca0124dd0d31bfcf4c7630a4681587
|
|
Remove BOOL, INTn, UINTn, etc, in favor of C99-style fixed width
types.
Change-Id: I396636212fb5edd6b347d43cc940186d8cd1e7b5
|
|
This value needs to be copied to each thread's data structure.
This fixes an artifact problem in the multi-threaded encoder.
Change-Id: Iab6d9745a1d44846aa503184705376f63a505597
|
|
vp8cx_mb_init_quantizer() needs to be called at least once to get
all values calculated. This change added one check to decide if
we could skip initialization or not.
Change-Id: I3f65eb548be57580a61444328336bc18c25c085b
|
|
Change-Id: I4fcd6e4656d9823aead941616cd63501aecbd6e2
|
|
caused by the "Removed bmi copy to/from BLOCKD" commit.
Change-Id: I9fae71bdc34c8ecc07bb81cd3ccf498b91ce3ec7
|
|
I got this idea from Pascal (Thanks). Before encoding a macroblock,
copy it to a 16x16 buffer, and then read source data from there
instead. This will help keep the source data in cache, and help
with the performance.
Change-Id: Id05f4cb601299150511d59dcba0ae62c49b5b757
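
The copy step can be sketched as below. `copy_mb_16x16()` is a hypothetical helper, not the function the commit adds. It gathers the 16 rows of a macroblock, which sit one frame stride apart in the source, into 256 contiguous bytes.

```c
#include <assert.h>
#include <string.h>

#define MB_SIZE 16

/* Copies one 16x16 luma macroblock from a strided frame buffer into a
 * contiguous 256-byte buffer. `src` points at the top-left pixel of
 * the macroblock and `src_stride` is the frame's row pitch in bytes. */
static void copy_mb_16x16(const unsigned char *src, int src_stride,
                          unsigned char *dst) {
  int row;
  assert(src != NULL && dst != NULL && src_stride >= MB_SIZE);
  for (row = 0; row < MB_SIZE; ++row) {
    memcpy(dst + row * MB_SIZE, src + row * src_stride, MB_SIZE);
  }
}
```

Later reads of the source then touch only the compact buffer rather than 16 widely spaced rows of the frame.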
|
|
Some further re-structuring of activity masking code.
Still has various experimental switches.
Supports a metric based on intra encode.
Experimental comparison against a fixed activity target rather
than a frame average, for altering rd and zbin.
Overall the SSIM performance is similar to TT's original
code but there is a much smaller PSNR hit of circa
0.5% instead of 3.2%
Change-Id: I0fd53b2dfb60620b3f74d7415e0b81c1ac58c39a
|
|
|
|
Declared the bmi in BLOCKD as a union instead of B_MODE_INFO.
Then removed B_MODE_INFO completely.
Change-Id: Ieb7469899e265892c66f7aeac87b7f2bf38e7a67
|
|
vp8_fast_quantize_b_pair_neon function added to quantize
two adjacent blocks at the same time to improve performance.
- Additional 3-6% speedup compared to neon optimized fast
quantizer (Tanya VGA@30fps, 1Mbps stream, cpu-used=-5..-16)
Change-Id: I3fcbf141e5d05e9118c38ca37310458afbabaa4e
|
|
Change-Id: I6e5e921f03dc15a72da89a457848d519647677a3
|
|
Declared the bmi in MODE_INFO as a union instead of B_MODE_INFO.
This reduced the memory footprint by 518,400 bytes for 1080
resolutions. The decoder performance improved by ~4% for the
clip used and the encoder showed very small improvements. (0.5%)
This reduction was first mentioned to me by John K. and in a
later discussion by Yaowu.
This is WIP.
Change-Id: I8e175fdbc46d28c35277302a04bee4540efc8d29
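
One reading that reproduces the 518,400-byte figure, offered as an assumption rather than the author's own calculation: if each of the 16 per-block entries shrinks by 4 bytes, a 1920x1080 frame (8100 macroblocks) saves 8100 * 16 * 4 bytes. The types below are hypothetical stand-ins for the old struct and the new union.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCKS_PER_MB 16

/* Hypothetical motion vector: two 16-bit components, 4 bytes total. */
typedef struct {
  int16_t row;
  int16_t col;
} mv_t;

/* Struct layout: mode and motion vector stored side by side. */
typedef struct {
  int32_t mode;
  mv_t mv;
} block_info_struct;

/* Union layout: a block holds either a mode or a motion vector,
 * so both can share the same storage. */
typedef union {
  int32_t mode;
  mv_t mv;
} block_info_union;

/* Bytes saved per frame when every per-block entry switches from the
 * struct layout to the union layout. */
static size_t bmi_bytes_saved(size_t width, size_t height) {
  size_t macroblocks = (width * height) / (16 * 16);
  size_t per_block = sizeof(block_info_struct) - sizeof(block_info_union);
  assert(sizeof(block_info_union) <= sizeof(block_info_struct));
  return macroblocks * BLOCKS_PER_MB * per_block;
}
```

Calling `bmi_bytes_saved(1920, 1080)` multiplies 8100 macroblocks by 16 blocks and by the per-block difference in size between the two layouts.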
|