Age | Commit message (Collapse) | Author |
|
Define LIBVPX_{ELF,MACHO} to simplify blocks.
Create new globalsym macro and include logic for PRIVATE.
BUG=webm:1679
Change-Id: I303ba1492a2813f685de51155ccef7e4831e1881
|
|
Chromium needs :function hidden and the space between the symbol and the
colon removed, at least for nasm. This matches x86inc.asm.
BUG=webm:1679
Change-Id: Ie47bb75d44d3130791639cbf4e2ebe019e2d686e
|
|
Change-Id: Ia244bfd4b4eb9d703653792bc4f64c6f5358ae19
|
|
BUG=webm:1403
Change-Id: Id9833e985fb70958cf4bde38f8e6303ed83c12f9
|
|
BUG=webm:1413
Change-Id: I8d7eeae1bd219eb848c1a86071046a477f7a91af
|
|
Change-Id: I4f3052d8748e16b06e9155f8daf22f867dfaa7a3
|
|
BUG=webm:1413
Change-Id: Id9038226902b2d793fc6c17ac81bb104c1a18988
|
|
BUG=webm:1413
Change-Id: I14930d0af24370a44ab359de5bba5512eef4e29f
|
|
nasm should infer .text but does not for windows:
https://bugzilla.nasm.us/show_bug.cgi?id=3392451
Change-Id: Ib195465e5f33405f5ff61c4cf88aa2a72640cacb
|
|
Change-Id: I51106e90344035452621c49a6e1be7d5276b6c70
|
|
Split to load_input_data4() and load_input_data8().
Use pack with signed saturation instruction for high bitdepth.
Change-Id: Icda3e0129a6fdb4a51d1cafbdc652ae3a65f4e06
|
|
Change-Id: Id59865fd6c453a24121ce7160048d67875fc67ce
|
|
It's a bit faster to call idct4_sse2() in vpx_idct4x4_16_add_sse2()
Change-Id: I1513be7a895cd2fc190f4a8297c240b17de0f876
|
|
Change-Id: Ie0f150fdcfcbf7c4db52d3a08bc8238ed1c72e3b
|
|
The deblocking filters used in vp8 have been moved to vpx_dsp for
use by both vp8 and vp9.
Change-Id: I5209d76edafc894b550f751fc76d3aa6799b392d
|
|
add a trailing ':', though it's optional with the tools we support, it's
more common to use it to mark a label. this also quiets the
orphan-labels warning with nasm/yasm.
BUG=b/29583530
Change-Id: I46e95255e12026dd542d9838e2dd3fbddf7b56e2
|
|
Change-Id: I39a67ffea7b0a55b45cdf935986439537b65601f
|
|
Change-Id: I12218d8331c0558c0587a66321e3ca46da7e5cc7
|
|
When configured with high bitdepth enabled, the 8bit transform
stopped using optimised code. This made 8bit content decode slowly.
Change-Id: I67d91f9b212921d5320f949fc0a0d3f32f90c0ea
|
|
This commit moves the module inverse transform functions from vp9
to vpx_dsp folder. The hybrid transform wrapper functions stay in
the vp9 folder, since it involves codec-specific data structures.
Change-Id: Ib066367c953d3d024c73ba65157bbd70a95c9ef8
|
|
It in essence refactors the code for both the interpolation
filtering and the convolution. This change includes the moving
of all the files as well as the changing of the code from vp9_
prefix to vpx_ prefix accordingly, for underneath architectures:
(1) x86;
(2) arm/neon; and
(3) mips/msa.
The work on mips/drsp2 will be done in a separate change list.
Change-Id: Ic3ce7fb7f81210db7628b373c73553db68793c46
|
|
|
|
Change-Id: I64edc26cf4aab050c83f2d393df6250628ad43b8
|
|
Separate the common coefficient constant into vpx_dsp/txfm_common.h.
Move the SSE2 macro definitions to vpx_dsp/x86/txfm_common_sse2.h.
This clears the use case of vp9_idct.h in vpx_dsp folder.
Change-Id: I319735a2abf42888e5080ac14cfbcde34be7b121
|
|
The various tap loop filter operations are common functions across
codec. This commit moves them along with SIMD optimizations to
vpx_dsp folder.
Change-Id: Ia5fa0b2e5289cdb98467502a549c380b9c60e92c
|
|
Roughly half as many cycles as plain C.
Change-Id: I8c16c29940b76d54ee7e4fb874c328ce90bff5d4
|
|
This reverts commit f8d35016408f3957c67945160d65be467ca97fdc.
Change-Id: If8c7af403c091b7fb447a6f0c73fecdbccbc51b3
|
|
80% fewer cycles than C
Change-Id: I841bde1e268ddd33ae2ee75eee94737a400e2cde
|
|
The vp9_lpf_vertical_16_dual function optimized for x86 32bit target. The hot code in that function was caused by the call to the transpose8x16.
The gcc generated assembly created uneeded fills and spills to the stack. By interleaving 2 loads and unpack instructions, in addition to hoisting the consumer
instruction closer to the producer instructions, we eliminated most of the fills and spills and improve the function-level performance by 17%.
credit for writing the function as well as finding the root cause goes to Erik Niemeyer (erik.a.niemeyer@intel.com)
Change-Id: I6173cf53956d52918a047d1c53d9a673f952ec46
|
|
Change-Id: Ia0ff859ff1c813dbe100e2f27b1ef78167483f4e
|
|
+ synchronize filter function signatures
this makes any intrinsics filters available for inlining and has the
side-effect of making those filters static, quieting missing-prototype
warnings.
Change-Id: I1908875caffa585bd4fc65aaf10d17a5e20cfb46
|
|
+ synchronize filter function signatures
this makes any intrinsics filters available for inlining and has the
side-effect of making those filters static, quieting missing-prototype
warnings.
Change-Id: I1cd55c9d52547793ad65aa90c7620f0e426edaa2
|
|
collect the vp9_convolve function definition macros there; this will
allow some relocation of functions from vp9_asm_stubs.c
Change-Id: Idadd117fa256dd48748379856973fd985b8204e8
|
|
reorder includes to avoid:
warning C4985: 'ceil': attributes not present on previous declaration.
this is the same workaround used in vp9/common/vp9_systemdependent.h
Change-Id: Ia10dd63de24f96fa1507a6179220e9d6ec774db6
|
|
silences a missing declaration warning
Change-Id: I59a34e1a1377cf3529b678d7ec0122bd43ab1bf1
|
|
With the sad functions, and hopefully the variance functions soon,
moving to the vpx_dsp location, place the defines used in the
reference C code in a common location.
Change-Id: I4c8ce7778eb38a0a3ee674d2f1c488eda01cfeca
|
|
|
|
this macro was used inconsistently and only differs in behavior from
DECLARE_ALIGNED when an alignment attribute is unavailable. this macro
is used with calls to assembly, while generic c-code doesn't rely on it,
so in a c-only build without an alignment attribute the code will
function as expected.
Change-Id: Ie9d06d4028c0de17c63b3a27e6c1b0491cc4ea79
|
|
vp9_dc_left_predictor_16x16
vp9_dc_top_predictor_32x32
vp9_dc_left_predictor_32x32
vp9_dc_128_predictor_32x32
Change-Id: Ib9861deefd01c3527235b92ff6b3d571ef6b4bc6
|
|
widen the loads and stores to 128-bit.
this was added, but not enabled in:
493a857 Add some sse2 code for intra prediction.
Change-Id: I277d7db608a7db7d75cc0bde86f48fa66ad487e4
|
|
|
|
+ fix some whitespace
Change-Id: Id61b739282014288a7e5d3c17a9d6448d9d4cda2
|
|
offsetting by a variable stride prevents instruction reordering,
resulting in poor assembly
Change-Id: Id62d6b3299cdd23f8c44f97b630abf4fea241446
|
|
offsetting by a variable stride prevents instruction reordering,
resulting in poor assembly.
additionally reroll 16x16/32x32 loops to reduce register spill with this
new format
Change-Id: I0635b8ba21ecdb88116e927dbdab53acdf256e11
|
|
Change-Id: I16c0a62e52dab62837c547345df31e7518620ed4
|
|
The rotation computation using 2X of cos(pi/16) has a potential to
overflow 32 bit, this commit disable the function to allow further
investigation and optimization.
Change-Id: I4a9803bc71303d459cb1ec5bbd7c4aaf8968e5cf
|
|
|
|
|
|
|
|
Change-Id: Idb14b9a285f8098126f967c5e2750221d6a58f69
|