Age | Commit message (Collapse) | Author |
|
Change-Id: I5fe581d797571a7a9432fbd17fc557591d0c1afa
|
|
|
|
The current code stores pointers to coefficient tables and loads them to
access the tables contents. As these pointers are stored in the code
sections, it means we end up with text relocations. eu-findtextrel will
thus complain about code not compiled with -fpic/-fPIC.
Since the pointers are stored in the code sections, we can actually cheat
and let the assembler generate relative addressing when accessing the
coefficient tables, and just load their location with adr.
Change-Id: Ib74ae2d3f2bab80b29991355f2dbe6955f38f6ae
|
|
About 9~10% decoding perf improvement on non-Neon ARM cpus
Change-Id: I7dc2a026764e84e9c2faf282b4ae113090326837
|
|
make reference version of bilinear_filters short.
use reference versions of bilinear_filters and sub_pel_filters when
possible.
recognize that Width was being passed into
filter_block2d_bil_first_pass multiple times. ARM version had already
fixed this. propegate to C.
change references to src_pixels_per_line to src_pitch and standardize on
src/dst (instead of input/output).
recognize that first_pass is only run in the verticle and second_pass
only horizontal. ARM version had already fixed this. propegate to C
Change-Id: I292d376d239a9a7ca37ec2bf03cc0720606983e2
|
|
Adds following targets to configure script to support RVCT compilation
without operating system support (for Profiler or bare metal images).
- armv5te-none-rvct
- armv6-none-rvct
- armv7-none-rvct
To strip OS specific parts from the code "os_support"-config was added
to script and CONFIG_OS_SUPPORT flag is used in the code to exclude OS
specific parts such as OS specific includes and function calls for
timers and threads etc. This was done to enable RVCT compilation for
profiling purposes or running the image on bare metal target with
Lauterbach.
Removed separate AREA directives for READONLY data in armv6 and neon
assembly files to fix the RVCT compilation. Otherwise
"ldr <reg>, =label" syntax would have been needed to prevent linker
errors. This syntax is not supported by older gnu assemblers.
Change-Id: I14f4c68529e8c27397502fbc3010a54e505ddb43
|
|
The existing code applied a 6-tap filter with 0's on either end.
We're already paying the branch penalty to avoid computing the two
extra columns needed as input to this filter.
We might as well save time computing the filter as well.
This reduces the inner loop from 21 instructions to 16, the number
of loads per iteration from 4 to 1, and the number of multiplies
from 7 to 4.
The gain in overall decoding performance, however, is small (less
than 1%).
This change also means we now valgrind clean on ARMv6, which is
its real purpose.
The errors reported here were valgrind's fault (it does not detect
that 0 times an uninitialized value is initialized), but Julian
Seward says it would slow down valgrind considerably to make such
checks.
Speeding up libvpx rather, even by a small amount, seems a much
better idea if only to enable proper valgrind checking of the
rest of the codec.
Change-Id: Ifb376ea195e086b60f61daf1097d8910c4d8ff16
|
|
This function was accessing values below the stack pointer, which
can be corrupted by signal delivery at any time.
Change-Id: I92945b30817562eb0340f289e74c108da72aeaca
|
|
Changes 'The VP8 project' to 'The WebM project', for consistency
with other webmproject.org repositories.
Fixes issue #97.
Change-Id: I37c13ed5fbdb9d334ceef71c6350e9febed9bbba
|
|
move some things around, reorder some instructions
constant 0 is used several times. load it once per call in horiz,
once per loop in vert.
separate saturating instructions to avoid stalls.
just use one usub8 call to set GE flags, rather than uqsub8 followed by
usub8 w/ 0
document some stalls for further consideration
Change-Id: Ic3877e0ddbe314bb8a17fd5db73501a7d64570ec
|
|
test cases were causing a crash because the count was being read
incorrectly. after fixing that, noticed that the output was not
matching. fixed that.
Change-Id: Idb0edb887736bd566a3cf6d4aa1a03ea8d20eb27
|
|
only saved r4-11+lr, but were storing r4-r12+lr
Change-Id: If77df1998af50e9badee7d99ef53543046434675
|
|
Jeff Muizelaar posted some changes to the idct/reconstruction c code.
This is the equivalent update for the arm assembly.
This shows a good boost on v6, and a minor boost on neon.
Here are some numbers for highway in qcif, 2641 frames:
HEAD neon: ~161 fps
new neon: ~162 fps
HEAD v6: ~102 fps
new v6: ~106 fps
The following functions have been updated for armv6 and neon:
vp8_dc_only_idct_add
vp8_dequant_idct_add
vp8_dequant_dc_idct_add
Conflicts:
vp8/decoder/arm/armv6/dequantdcidct_v6.asm
vp8/decoder/arm/armv6/dequantidct_v6.asm
Resolved by removing these files. When I rewrote the functions, I also
moved the files to dequant_dc_idct_v6.asm/dequant_idct_v6.asm
Change-Id: Ie3300df824d52474eca1a5134cf22d8b7809a5d4
|
|
When the license headers were updated, they accidentally contained
trailing whitespace, so unfortunately we have to touch all the files
again.
Change-Id: I236c05fade06589e417179c0444cb39b09e4200d
|
|
Change-Id: Ieebea089095d9073b3a94932791099f614ce120c
|
|
|