Age | Commit message (Collapse) | Author |
|
Change-Id: I96ed73e109c4f89dd06f3583cf7ecf9277401fae
|
|
|
|
This reverts commit 06e6d56fa138d84759e8bdfd4c721ead000051b4
Change-Id: If95598385b693945d6b144d03b6da8f6a57dac98
|
|
Some version of clang refuse 'asm volatile'
Change-Id: I79d909ac8ae3c04b608f20c6f10fa79b7f9fc8e0
|
|
|
|
This reverts commit e516a42527098a26798dbb3663a5bcdd38793839
Change-Id: I7c78712acc737ad5f580181cdab3aa76b23f3ca5
|
|
|
|
This eliminates the asm_offsets dependency for future
all-assembly versions of this function.
Change-Id: I3227073ecfcb8ee6e593934fab941e9081abdda0
|
|
Replace it with some intrinsic code and inline assembly.
Change-Id: I81b4df146db3d01039059be7dae31083e2943b97
|
|
The only reason for the _intrinsics part of the file name was for the
interim period where only one of the functions was redone and the base
file name was the same.
Change-Id: I7851154f1633d48821bee885b1cadb2148e65a23
|
|
Pick up VP8 encryption, quantization changes, and some fixes to vpxenc
Conflicts:
test/decode_test_driver.cc
test/decode_test_driver.h
test/encode_test_driver.cc
vp8/vp8cx.mk
vpxdec.c
vpxenc.c
Change-Id: I9fbcc64808ead47e22f1f22501965cc7f0c4791c
|
|
Use unique names and ditch the local label declaration. Visual Studio
does not support it.
https://code.google.com/p/webm/issues/detail?id=561
Change-Id: Ica643cf5abb56ee6156371f5bf73fdeb58014422
|
|
Change-Id: I5637d491eb6a9b7633f72e03fd9df72131eeb121
|
|
|
|
Remove dependency of this function on asm_offsets. ssse3/sse4 next.
Change quant_shift calculation so it be done using SIMD. Pre-calculate
as much as possible to simplify EOB selection.
Take advantage of qcoeff being zero'd by tying the if statements
together.
Speed parity with previous implementation with gcc x86_64 linux
Change-Id: Ife97556a1eca3a74b09def1a3d04084974dff1fb
|
|
|
|
s/movd/movq/
Change-Id: Id1a56de91551f8dc796f14f1056c565dfc1ba626
|
|
Reduce dependency on offsets file by using intrinsics. Disassembly shows
improvements over previous assembly specifically in register management,
preloading, and {pro,epi}log. Speed change is within margin of error.
Change-Id: I8131b4b4d62bc092407fe847bfaa8f2c0e1384ff
|
|
Some projects must define only win64 for Windows 64bit builds using
yasm.
Change-Id: I1d09590d66a7bfc8b4412e1cc8685978ac60b748
|
|
Change-Id: If7822e6fcd0d3568b934032322b19ba3e401df26
|
|
Change-Id: Ib8f8a66c9fd31e508cdc9caa662192f38433aa3d
|
|
Creates a merge between the master and experimental branches. Fixes a
number of conflicts in the build system to allow *either* VP8 or VP9
to be built. Specifically either:
$ configure --disable-vp9 $ configure --disable-vp8
--disable-unit-tests
VP9 still exports its symbols and files as VP8, so that will be
resolved in the next commit.
Unit tests are broken in VP9, but this isn't a new issue. They are
fixed upstream on origin/experimental as of this writing, but rebasing
this merge proved difficult, so will tackle that in a second merge
commit.
Change-Id: I2b7d852c18efd58d1ebc621b8041fe0260442c21
|
|
Change-Id: Ic084c475844b24092a433ab88138cf58af3abbe4
|
|
For non-static functions, change the prefix to vp9_. For static functions,
remove the prefix. Also fix some comments, remove unused code or unused
function prototypes.
Change-Id: I1f8be05362f66060fe421c3d4c9a906fdf835de5
|
|
This change encompasses VP8_PTR, VP8_COMP, VP8D_COMP, VP8_COMMON,
VP8Decompressor and VP8Common.
Change-Id: I514ef4ad4e682370f36d656af1c09ee20da216ad
|
|
For local symbols, make them static instead.
Change-Id: I13d60947a46f711bc8991e16100cea2a13e3a22e
|
|
Change-Id: Ie2e3652591b010ded10c216501ce24fd95d0aec5
|
|
Remove the fdct invoke macro calls
Change-Id: Ica2431c655819fa012133ee7abc75a16761e5fd6
|
|
Change-Id: I321280abcf48f3dc16e194d29bde2bd3baec6006
|
|
Change-Id: Idd2722a538423b451e1e3495f89a7141480493d6
|
|
This reinstates reverted commit 2113a831575d81faeadd9966e256d58b6b2b1633
Change-Id: I9a9af13497d1e58d4f467e3e083fddf06b1b786c
|
|
The denoiser function was modified to reduce the computational
complexity.
1. The denoiser c function modification:
The original implementation calculated pixel's filter_coefficient
based on the pixel value difference between current raw frame and last
denoised raw frame, and stored them in lookup tables. For each pixel c,
find its coefficient using
filter_coefficient[c] = LUT[abs_diff[c]];
and then apply filtering operation for the pixel.
The denoising filter costed about 12% of encoding time when it was
turned on, and half of the time was spent on finding coefficients in
lookup tables. In order to simplify the process, a short cut was taken.
The pixel adjustments vs. pixel diff value were calculated ahead of time.
adjustment = filtered_value - current_raw
= (filter_coefficient * diff + 128) >> 8
The adjustment vs. diff curve becomes flat very quick when diff increases.
This allowed us to use only several levels to get a close approximation
of the curve. Following the denoiser algorithm, the adjustments are
further modified according to how big the motion magnitude is.
2. The sse2 function was rewritten.
This change made denoiser filter function 3x faster, and improved the
encoder performance by 7% ~ 10% with the denoiser on.
Change-Id: I93a4308963b8e80c7307f96ffa8b8c667425bf50
|
|
Merges this experiment in to make it easier to run tests on
filter precision, vectorized implementation etc.
Also removes an experimental filter.
Change-Id: I1e8706bb6d4fc469815123939e9c6e0b5ae945cd
|
|
Approximate the Google style guide[1] so that that there's a written
document to follow and tools to check compliance[2].
[1]: http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml
[2]: http://google-styleguide.googlecode.com/svn/trunk/cpplint/cpplint.py
Change-Id: Idf40e3d8dddcc72150f6af127b13e5dab838685f
|
|
Allows building the library with the gcc -pedantic option, for improved
portabilty. In particular, this commit removes usage of C99/C++ style
single-line comments and dynamic struct initializers. This is a
continuation of the work done in commit 97b766a46, which removed most
of these warnings for decode only builds.
Change-Id: Id453d9c1d9f44cc0381b10c3869fabb0184d5966
|
|
Change-Id: I912384f526865089aa03ca8875591324e5c1c449
|
|
Change-Id: I1d2db53129dc6ec068093ad1e5fc0d94110473b3
|
|
Compares the sum of differences between the input block and the averaged
block. If they differ too much the block will not be filtered. Negligible
perfomance hit.
Change-Id: Ib1c31a265efd4d100b3abc4a1ea6675038c8ddde
|
|
Add PRIVATE macro for adding private_extern directive for yasm
to hide global symbols. This is only enabled if -DCHROMIUM is used
with YASM.
Also fixed a small problem with rtcd_defs.sh to guard TEMPORAL_DENOISING.
Change-Id: I9027fce3ebddcf20078293e4b86b396f21da7857
|
|
Faster version of denoiser, cut cost by 1.7x for C path, by 3.3x for
SSE2 path.
Change-Id: I154786308550763bc0e3497e5fa5bfd1ce651beb
|
|
Removes all runtime initialization of global data in entropy.c.
Precalculated values are used for initializing all entropy related
tabels.
First patch in a series to make sure code is reentrant.
Change-Id: I9aac91a2a26f96d73c6470d772a343df63bfe633
|
|
Deprecate fast quant and strict_quant code.
Small effect on quality as fast was used in first pass but the
effect is basically neutral across the derf set.
The rationale here is to reduce the number of code paths for
now to make experimentation easier. Optimized and fast code
options can be re-introduced later along with other encode
speed options.
Change-Id: Ia30c5daf3dbc52e72c83b277a1d281e3c934cdad
|
|
The MFQE function of the postprocessor depends on these
Change-Id: I256a37c6de079fe92ce744b1f11e16526d06b50a
|
|
Fixes a bug that was introduced in the high precision mv patch.
Change-Id: Ieadb433ebe4c3ef3e0e63944dab11528bf8bd73a
|
|
This is the initial patch for supporting 1/8th pel
motion. Currently if we configure with enable-high-precision-mv,
all motion vectors would default to 1/8 pel. Encode and
decode syncs fine with the current code. In the next phase
the code will be refactored so that we can choose the 1/8
pel mode adaptively at a frame/segment/mb level.
Derf results:
http://www.corp.google.com/~debargha/vp8_results/enhinterp_hpmv.html
(about 0.83% better than 8-tap interpoaltion)
Patch 3: Rebased. Also adding 1/16th pel interpolation for U and V
Patch 4: HD results.
http://www.corp.google.com/~debargha/vp8_results/enhinterp_hd_hpmv.html
Seems impressive (unless I am doing something wrong).
Patch 5: Added mmx/sse for bilateral filtering, as well as enforced
use of c-versions of subpel filters with 8-taps and 1/16th pel;
Also redesigned the 8-tap filters to reduce the cut-off in order to
introduce a denoising effect. There is a new configure option
sixteenth-subpel-uv which will use 1/16 th pel interpolation for
uv, if the motion vectors have 1/8 pel accuracy.
With the fixes the results are promising on the derf set. The enhanced
interpolation option with 8-taps alone gives 3% improvement over thei
derf set:
http://www.corp.google.com/~debargha/vp8_results/enhinterpn.html
Results on high precision mv and on the hd set are to follow.
Patch 6: Adding a missing condition for CONFIG_SIXTEENTH_SUBPEL_UV in
vp8/common/x86/x86_systemdependent.c
Patch 7: Cleaning up various debug messages.
Patch 8: Merge conflict
Change-Id: I5b1d844457aefd7414a9e4e0e06c6ed38fd8cc04
|
|
Depending on implementation the optimized SAD functions may return early
when the calculated SAD exceeds max_sad.
Change-Id: I05ce5b2d34e6d45fb3ec2a450aa99c4f3343bf3a
|
|
Removal of the pickinter.c and .h files and calls to this
code.
Removal of some code relating to real time and one pass
settings though there is more to be done in this regard.
However, vp8_set_speed_features() now
only supports modes 0 and 1 and speeds up to 3
so rd should always be set.
Change-Id: I62c0c1b6154ab499785baef310536080e87bc4d8
|
|
Removed ~CONFIG_REALTIME_ONLY code.
Change-Id: I5fafff29a08acd8928699f9ddce8744787024d8c
|
|
Change-Id: I9fb510f9421fb3c317a8e32e3058cee977ddf9fa
|
|
In the variance calculations the difference is summed and later squared.
When the sum exceeds sqrt(2^31) the value is treated as a negative when
it is shifted which gives incorrect results.
To fix this we cast the result of the multiplication as unsigned.
The alternative fix is to shift sum down by 4 before multiplying.
However that will reduce precision.
For 16x16 blocks the maximum sum is 65280 and sqrt(2^31) is 46340 (and
change).
PPC change is untested.
Change-Id: I1bad27ea0720067def6d71a6da5f789508cec265
|