Age | Commit message (Collapse) | Author |
|
make them symmetrical with the generated output and their vp9
counterparts
Change-Id: I72cc97c4d33d713dff620a6d7cc25955266216fc
|
|
Change-Id: Ib8f8a66c9fd31e508cdc9caa662192f38433aa3d
|
|
Creates a merge between the master and experimental branches. Fixes a
number of conflicts in the build system to allow *either* VP8 or VP9
to be built. Specifically either:
$ configure --disable-vp9 $ configure --disable-vp8
--disable-unit-tests
VP9 still exports its symbols and files as VP8, so that will be
resolved in the next commit.
Unit tests are broken in VP9, but this isn't a new issue. They are
fixed upstream on origin/experimental as of this writing, but rebasing
this merge proved difficult, so will tackle that in a second merge
commit.
Change-Id: I2b7d852c18efd58d1ebc621b8041fe0260442c21
|
|
Change-Id: Ic084c475844b24092a433ab88138cf58af3abbe4
|
|
Most of these were picked up by jenkins in the commit that changed
the vp8 namespace to vp9 in common/.
Change-Id: I5cbd56ffc753b92ef805133cda6acc1713a13878
|
|
This way, the code is not compiled in by default, thus decreasing
overall binary size.
Change-Id: I85cac8f5a22a51a7d99c820ef6d6ed179d4106a0
|
|
Quickly modified the ssse3 sixtap filters to support eight taps. For the test
clip used, a 23+% boost in decoder performance was seen. We can
revisit later and improve further.
Change-Id: I5f59860459e80d6fa23e6cc0fd91296a969f5240
|
|
3.7% boost in decoder performance for the clip used.
Change-Id: I74f28486a9352b472b36e21b5eaf30eff35e9199
|
|
Change-Id: I5bca7b7a4b230082d36ac6fb84db84137ad177d7
|
|
First sse2 version of vp8_mbloop_filter_vertical_edge(). For now,
intrinsics are being used until the bitstream is finalized. This function
will be revisited later for further performance improvements.
For the test clip used, a 34+% decoder performance improvement
was seen. This will vary depending on material.
Change-Id: I455b438bc8d8af76cf7533ac42eda5f689b21f7c
|
|
First sse2 version of vp8_mbloop_filter_horizontal_edge(). For now,
intrinsics are being used until the bitstream is finalized. This function
will be revisited later for further performance improvements.
For the test clip used, a 31+% decoder performance improvement
was seen. This will vary depending on material.
Change-Id: I03ed3a7182478bdd1f094644ff3e0442625600e7
|
|
this commit fixes the build on windows with visual studio 2008.
Change-Id: I0baa4044e9e54237da29f2e17332ea6f766dbbec
|
|
Alternative strategy for finding a list of candidate motion
vectors to use as reference values in mv coding and as
nearest and near.
Sort by sad in vp8_find_best_ref_mvs() rather than just
pick the best. Allow 0,0 as a best ref option but not a
nearest or near unless there are no alternatives.
Encode/Decode verified on at least some clips.
Some commented out experimental and stats code still in place.
Gain over existing code averages about 1% on derf (alll metrics)
with improvement on all clips. Other test results pending.
The entropy coding of the mode (nearest/near etc) still
depends upon and requires the old "findnear" code so
this needs looking at and may provide room for further gains.
Change-Id: I871d7cba1d1c379c4bad9bcccce1fb19c46b8247
|
|
|
|
About 20% overall encoder speedup (vs. about 30% for sse4 version).
Change-Id: Ibf608a6a1bc94b14ec47e8046d3206b275b5a8bd
|
|
This is being reimplemented more generically in terms of affine
transforms.
Change-Id: I9300bfde5f8b93c708c64f59427087720f8ed782
|
|
About 3.5x faster, 30% overall encoder speedup. Rest of optimizations
will come soon (see TODO section in filter_sse4.c).
Change-Id: If18108048bfd5345fc942e8574e4c7f58e0e86e0
|
|
Latest version of all scripts/makefile but rtcd_defs.sh is empty, all
existing functions are still selected using the old/current way.
Change-Id: Ib92946a48a31d6c8d1d7359eca524bc1d3e66174
|
|
Change-Id: I52a3b0a4a42e5af91b987e19523df07c8f467847
|
|
predict_d has become canonical. Remove previous helper function.
Disable ARM assembly pending update.
Change-Id: Idd84ac8a28f9b0221ea97904a77de1e705d06a7d
|
|
Signed-off-by: Raghu Gandham <raghu@mips.com>
Change-Id: I3a8bca425cd3dab746a6328c8fc8843c8e87aea6
|
|
The commit changed how baseline 8x8 coefficient probabilities are
initialized, to be consistent with the initialization of baseline
4x4 coefficient probabilities.
The commit does not have any effect on compression.
Change-Id: Ifb3902b5dc0b0c2e6dc3aa5d4a6589d528e58355
|
|
Consolodate the unit tests under vp8/ to the test/ directory
Change-Id: I6d6a0fb60f5e3874a4d6710e9e121dd3e81a93db
|
|
Rework unit tests to have a single executable rather than many, which
should avoid pollution of the visual studio project namespace, improve
build times, and make it easier to use the gtest test sharding system
when we get these going on the continuous build cluster.
Change-Id: If4c3e5d4b3515522869de6c89455c2a64697cca6
|
|
(see Change I9b2ccc88: Makes all mode token tables const)
Further remove runtime table initialization and use
precalculated const data. Data footprint reduced
by 4112 bytes.
Change-Id: Ia3ae9fc19f77316b045cabff01f6e5f0876a86ab
|
|
In a previous commit, the duplicate of headerfile defaultcoefcounts.h
was identified. This commit updates the .mk file to ensure configure
and make works properly for all platforms.
Change-Id: I31a39c809a734ba438ee53db700f252e9a03eddd
|
|
Break MFQE code into it's own file.
It is currently only valid for 16x16 and 8x8 Y blocks. It also filters
4x4 U/V blocks.
Refactor filtering and add associated assembly. Limited test cases show
--mfqe introduces a penalty of ~20% with HD content. The assembly
reduces the penalty to ~15%
Change-Id: I4b8de6b5cdff5413037de5b6c42f437033ee55bf
|
|
The MFQE function of the postprocessor depends on these
Change-Id: I256a37c6de079fe92ce744b1f11e16526d06b50a
|
|
add unit tests for vp8_short_idct4x4llm_c
Change-Id: I472b7c0baa365ba25dc99a3f6efccc816d27c941
|
|
On Android NDK, rand() is inlined function. But, on our SSE optimization,
we need symbol for rand()
Change-Id: I42ab00e3255208ba95d7f9b9a8a3605ff58da8e1
|
|
For the experimental branch we are trying to slim the codebase
down removing features such as threading for now which complicate
the process of development and testing.
Change-Id: I657c0246aef4d1fa8c8ffc6a1adfeee45bce8e24
|
|
This function adds the common prediction modules, some data structures
and a config option but does not use them.
It also corrects a bug in clearing down the MODE_INFO border and introduces
a new element that indicates if an entry corresponds to an "in image" macro block
or is part of the border.
Change-Id: Ib69eec0876173ebe9d1de9df9537d0b2447702e0
|
|
This commit continues the process of converting to the new RTCD
system.
Change-Id: I6c519ab61e4f4e0ebcc796f2df061f945c48cefe
|
|
This commit continues the process of converting to the new RTCD
system.
Change-Id: If54eb5cb5d1b0cac6c4c0633a9e99c93ca860ba2
|
|
This commit continues the process of converting to the new RTCD
system.
Change-Id: I9bfcf9bef65c3d4ba0fb9a3e1532bad1463a10d6
|
|
This commit continues the process of converting to the new RTCD
system.
Change-Id: I03c4dbf30dfd3558b0e256ff9d3ff4c012aadc80
|
|
This commit continues the process of converting to the new RTCD
system.
Change-Id: Ic8a4047d72ff3a54ec98977dd90e70c13213db71
|
|
This is a proof of concept RTCD implementation to replace the current
system of nested includes, prototypes, INVOKE macros, etc. Currently
only the decoder specific functions are implemented in the new system.
Additional functions will be added in subsequent commits.
Overview:
RTCD "functions" are implemented as either a global function pointer
or a macro (when only one eligible specialization available).
Functions which have RTCD specializations are listed using a simple
DSL identifying the function's base name, its prototype, and the
architecture extensions that specializations are available for.
Advantages over the old system:
- No INVOKE macros. A call to an RTCD function looks like an ordinary
function call.
- No need to pass vtables around.
- If there is only one eligible function to call, the function is
called directly, rather than indirecting through a function pointer.
- Supports the notion of "required" extensions, so in combination with
the above, on x86_64 if the best function available is sse2 or lower
it will be called directly, since all x86_64 platforms implement
sse2.
- Elides all references to functions which will never be called, which
could reduce binary size. For example if sse2 is required and there
are both mmx and sse2 implementations of a certain function, the
code will have no link time references to the mmx code.
- Significantly easier to add a new function, just one file to edit.
Disadvantages:
- Requires global writable data (though this is not a new requirement)
- 1 new generated source file.
Change-Id: Iae6edab65315f79c168485c96872641c5aa09d55
|
|
Easier to filter out all NEON asm.
Change-Id: I0022dae8321a9608e864b09d4181414c5fff4610
|
|
This introduces base functions for introducing implicit segmentation.
The code that actually stores the results to the segment map isn't
here yet. This just prints out the segmentation map results
if you call it.
Uses connected component labeling technique on mbmi info so that only
if 2 mbs are horizontally or vertically touching do they get the same
segment.
vp8next - plumbing for rotation
code to produce taps for rotation ( tapify. py ), code
for predicting using rotation ( predict_rotated.c ) , code
for finding the best rotation find_rotation.c.
didn't checkin code that uses this in the codec. still work
in progress.
Fixed copyright notice
Change-Id: I450c13cfa41ab2fcb699f3897760370b4935fdf8
|
|
A processor with ARMv7 instructions does not
necessarily have NEON dsp extensions. This CL
has the added side effect of allowing the ability
to enable/disable the dsp extensions cleanly.
Change-Id: Ie1e879b8fe131885bc3d4138a0acc9ffe73a36df
|
|
|
|
Remove BOOL, INTn, UINTn, etc, in favor of C99-style fixed width
types.
Change-Id: I396636212fb5edd6b347d43cc940186d8cd1e7b5
|
|
This file declared a bunch of nonexistent, unreferenced global
function pointers.
Change-Id: Ic26bb8c7712deba754c49fc01f383b53afc9e728
|
|
Make bilinearfilter_arm.c compiled only when HAVE_ARMV6, as its definitions
are v6 only. This is normally not a problem for static builds as the file
is elided at link time, but this was not being done properly for the
--enable-shared --enable-pic build.
Change-Id: Ic800a7cde751f74f22555c5b247f99f9df5e550d
|
|
These functions are now used by the encoder.
This is WIP with the goal of creating a common idct/add for
the encoder and decoder. A boost of 1.8% was seen for
the HD rt test clip used.
[Tero] Added needed changes to ARM side.
Change-Id: Ibbb8000be09034203d7adffc457d3c3f8b06a5bf
|
|
Storing vp8_bilinear_filters_mmx in an mmx file and using it in an sse2
file is bad
Moving towards allowing --disable-mmx
Change-Id: I20493b35bdedcdcfc0915e6f05fdbe6c81a4a742
|
|
Added ARM optimized intra 4x4 prediction
- 2x faster on Profiler compared to C-code compiled with -O3
- Function interface changed a little to improve BLOCKD structure
access
Change-Id: I9bc2b723155943fe0cf03dd9ca5f1760f7a81f54
|
|
This quite large check in includes the following:
Merge in some code from Ronald (mbgraph.c) that scans a Gf/arf group.
This is used as a basis for a simple segmentation for the normal frames
in a gf/arf group. This code also uses satd functions from Yaowu.
Adds functionality for coding the latest possible position of an EOB for
blocks in the segment. (Currently 0-15 only, hence just for 4x4 dct).
Where the EOB position is 0 this acts like "skip" and the normal coding
of skip at the per mb level is disabled.
Added functions (seg_common.c) for setting and reading segment feature
elements. These may want to be optimized away at some point but while the
mecahnism is in a state of flux they provide a single location for making
changes and keep things a bit cleaner.
This is still proof of concept code. Currently the tested feature set:-
Quantizer,
Loop Filter level,
Reference frame,
Prediction Mode,
EOB end stop.
TBD:-
Add functions for setting and reading the feature data with range
and validity checking.
Handling of signed and unsigned feature data. At the moment all is assumed
to be signed and a sign bit is coded but many cannot be negative.
Correct handling of EOB feature with intra coded blocks.
Testing/trapping of legal/illegal ref frame and mode combinations.
Transform size switch plus merge and test with 8c8 DCT work
Merge and test with Sumans Segmenation coding optimizations
Change-Id: Iee12e83661c7abbd1e0ce6810915eb4ec35e2d8e
|
|
Instead of using the predict buffer, the decoder now writes
the predictor into the recon buffer. For blocks with eob=0,
unnecessary idcts can be eliminated. This gave a performance
boost of ~1.8% for the HD clips used.
Tero: Added needed changes to ARM side and scheduled some
assembly code to prevent interlocks.
Patch Set 6: Merged (I1bcdca7a95aacc3a181b9faa6b10e3a71ee24df3)
into this commit because of similarities in the idct
functions.
Patch Set 7: EC bug fix.
Change-Id: Ie31d90b5d3522e1108163f2ac491e455e3f955e6
|