Age | Commit message (Collapse) | Author |
|
These files don't contain generic arm code, so should
only be compiled by neon.
Change-Id: Ie712823aa04d4235e7cfe7a3b725e73ee4c3e564
|
|
|
|
- Removed fast_fdct4x4_neon and fast_fdct8x4_neon
- Uses now short_fdct4x4 and short_fdct8x4
- Gives ~1-2% speed-up on Cortex-A8/A9
Change-Id: Ib62f2cb2080ae719f8fa1d518a3a5e71278a41ec
|
|
- Updated walsh transform to match C
(based on Change Id24f3392)
- Changed fast_fdct4x4 and 8x4 to short_fdct4x4 and 8x4
correspondingly
Change-Id: I704e862f40e315b0a79997633c7bd9c347166a8e
|
|
Change-Id: Ibcf5b4b14153f65ce1b53c3bfba87ad2feb17bbd
|
|
Useful for leaving out any version specific asm files.
Change-Id: I233514410eb9d7ca88d2d2c839673122c507fa99
|
|
vp8_fast_quantize_b_pair_neon function added to quantize
two adjacent blocks at the same time to improve performance.
- Additional 3-6% speedup compared to neon optimized fast
quantizer (Tanya VGA@30fps, 1Mbps stream, cpu-used=-5..-16)
Change-Id: I3fcbf141e5d05e9118c38ca37310458afbabaa4e
|
|
vp8_fast_quantize_b_neon function updated and further optimized.
- match current C implementation of fast quantizer
- updated to use asm_enc_offsets for structure members
- updated ads2gas scripts to handle alignment issues
Change-Id: I5cbad9c460ad8ddb35d2970a8684cc620711c56d
|
|
|
|
Address calculations moved from encodemb_arm.c file to neon
optimized assembly function to save cycles in function calls.
- vp8_subtract_b_neon_func replaced with vp8_subtract_b_neon
that contains all needed address calculations
- unnecessary file encodemb_arm.c removed
- consistent with ARMv6 optimized version
Change-Id: I6cbc1a2670b56c2077f59995fcf8f70786b4990b
|
|
|
|
|
|
Adds following ARMv6 optimized functions to encoder:
- vp8_subtract_b_armv6
- vp8_subtract_mby_armv6
- vp8_subtract_mbuv_armv6
Gives 1-5% speed-up depending on input sequence and encoding
parameters. Functions have one stall cycle inside the loop body
on Cortex pipeline.
Change-Id: I19cca5408b9861b96f378e818eefeb3855238639
|
|
now that we need asm_enc_offsets.c for x86 and arm and it is
harmless to build it for other targets, add it unconditionally
Change-Id: I320c5220afd94fee2b98bda9ff4e5e34c67062f3
|
|
Half pixel interpolations optimized in variance calculations. Separate
function calls to vp8_filter_block2d_bil_x_pass_armv6 are avoided.On
average, performance improvement is 6-7% for VGA@30fps sequences.
Change-Id: Idb5f118a9d51548e824719d2cfe5be0fa6996628
|
|
Optimized fdct4x4 (8x4) for ARMv6 instruction set.
- No interlocks in Cortex-A8 pipeline
- One interlock cycle in ARM11 pipeline
- About 2.16 times faster than current C-code compiled with -O3
Change-Id: I60484ecd144365da45bb68a960d30196b59952b8
|
|
Change-Id: I08edaffc62514907fa5e90e1689269e467c857f5
|
|
|
|
Change-Id: I77e9f2f521a71089228f96e2db72524189364ffb
|
|
Adds new ARMv6 optimized function vp8_fast_quantize_b_armv6
to the encoder.
Change-Id: I40277ec8f82e8a6cbc453cf295a0cc9b2504b21e
|
|
Adds a new ARMv6 optimized function vp8_sad16x16_armv6 to encoder.
Change-Id: Ibbd7edb8b25cb7a5b522d391b1e9a690fe150e57
|
|
Adds vp8_sub_pixel_variance16x16_armv6 function to encoder. Integrates
ARMv6 optimized bilinear interpolations from vp8/common/arm/armv6
and adds new assembly file for variance16x16 calculation.
- vp8_filter_block2d_bil_first_pass_armv6 (integrated)
- vp8_filter_block2d_bil_second_pass_armv6 (integrated)
- vp8_variance16x16_armv6 (new)
- bilinearfilter_arm.h (new)
Change-Id: I18a8331ce7d031ceedd6cd415ecacb0c8f3392db
|
|
it's difficult to mux the *_offsets.c files because of header conflicts.
make three instead, name them consistently and partititon the contents
to allow building them as required.
Change-Id: I8f9768c09279f934f44b6c5b0ec363f7943bb796
|
|
Change-Id: Ibd6e3bc82471839904b1086b499efc55f7c5cbaf
|
|
previously wasn't guarded with ifdef ARMV7, causing a link error with
ARMV6
Change-Id: I0526858be0b5f49b2bf11e9090180b2a6c48926d
|
|
NEON has optimized 16x16 half-pixel variance functions, but they
were not part of the RTCD framework. Add these functions to RTCD,
so that other platforms can make use of this optimization in the
future and special-case ARM code can be removed.
A number of functions were taking two variance functions as
parameters. These functions were changed to take a single
parameter, a pointer to a struct containing all the variance
functions for that block size. This provides additional flexibility
for calling additional variance functions (the half-pixel special
case, for example) and by initializing the table for all block sizes,
we don't have to construct this function pointer table for each
macroblock.
Change-Id: I78289ff36b2715f9a7aa04d5f6fbe3d23acdc29c
|
|
The primary goal is to allow a binary to be built which supports
NEON, but can fall back to non-NEON routines, since some Android
devices do not have NEON, even if they are otherwise ARMv7 (e.g.,
Tegra).
The configure-generated flags HAVE_ARMV7, etc., are used to decide
which versions of each function to build, and when
CONFIG_RUNTIME_CPU_DETECT is enabled, the correct version is chosen
at run time.
In order for this to work, the CFLAGS must be set to something
appropriate (e.g., without -mfpu=neon for ARMv7, and with
appropriate -march and -mcpu for even earlier configurations), or
the native C code will not be able to run.
The ASFLAGS must remain set for the most advanced instruction set
required at build time, since the ARM assembler will refuse to emit
them otherwise.
I have not attempted to make any changes to configure to do this
automatically.
Doing so will probably require the addition of new configure options.
Many of the hooks for RTCD on ARM were already there, but a lot of
the code had bit-rotted, and a good deal of the ARM-specific code
is not integrated into the RTCD structs at all.
I did not try to resolve the latter, merely to add the minimal amount
of protection around them to allow RTCD to work.
Those functions that were called based on an ifdef at the calling
site were expanded to check the RTCD flags at that site, but they
should be added to an RTCD struct somewhere in the future.
The functions invoked with global function pointers still are, but
these should be moved into an RTCD struct for thread safety (I
believe every platform currently supported has atomic pointer
stores, but this is not guaranteed).
The encoder's boolhuff functions did not even have _c and armv7
suffixes, and the correct version was resolved at link time.
The token packing functions did have appropriate suffixes, but the
version was selected with a define, with no associated RTCD struct.
However, for both of these, the only armv7 instruction they actually
used was rbit, and this was completely superfluous, so I reworked
them to avoid it.
The only non-ARMv4 instruction remaining in them is clz, which is
ARMv5 (not even ARMv5TE is required).
Considering that there are no ARM-specific configs which are not at
least ARMv5TE, I did not try to detect these at runtime, and simply
enable them for ARMv5 and above.
Finally, the NEON register saving code was completely non-reentrant,
since it saved the registers to a global, static variable.
I moved the storage for this onto the stack.
A single binary built with this code was tested on an ARM11 (ARMv6)
and a Cortex A8 (ARMv7 w/NEON), for both the encoder and decoder,
and produced identical output, while using the correct accelerated
functions on each.
I did not test on any earlier processors.
Change-Id: I45cbd63a614f4554c3b325c45d46c0806f009eaa
|
|
Changes 'The VP8 project' to 'The WebM project', for consistency
with other webmproject.org repositories.
Fixes issue #97.
Change-Id: I37c13ed5fbdb9d334ceef71c6350e9febed9bbba
|
|
When the license headers were updated, they accidentally contained
trailing whitespace, so unfortunately we have to touch all the files
again.
Change-Id: I236c05fade06589e417179c0444cb39b09e4200d
|
|
Change-Id: Ieebea089095d9073b3a94932791099f614ce120c
|
|
|