Age | Commit message | Author |
|
1. vpx_convolve_avg_mmi
2. vpx_convolve8_avg_horiz_mmi
Change-Id: Ie544aac45b4b1c0a0e51b44b650189ae5e88aee1
|
|
|
|
googletest imports tuple into testing to allow for compatibility across
C++ versions where tuple may be in std::tr1 or std. Fixes deprecation
warnings under Visual Studio 2017.
Change-Id: Id78b372d5478b12d8c8f63fd3f2166fec25aa8be
|
|
1. vpx_convolve8_vert_mmi
2. vpx_convolve8_horiz_mmi
3. vpx_convolve8_mmi
4. vpx_convolve8_avg_mmi
5. vpx_convolve8_avg_vert_mmi
Change-Id: I41a6b3b4f327d6b67d282e0163cfa0aee8648abe
|
|
Compiler -- gcc (Debian 7.3.0-5) 7.3.0
Change-Id: If2dcc6e215a2990cde575f0e744ce0c7a44a15f1
|
|
Change-Id: I638507b360c71489ab0e87bd558d2719ad995333
|
|
Changed the intrinsics to perform summation similarly to the way the assembly does.
The new code diverges from the assembly by preferring unsaturated additions.
Results for Haswell
SSSE3
Horiz/Vert  Size  Speedup
Horiz       x4    ~32%
Horiz       x8    ~6%
Vert        x8    ~4%
AVX2
Horiz/Vert  Size  Speedup
Horiz       x16   ~16%
Vert        x16   ~14%
BUG=webm:1471
Change-Id: I7ad98ea688c904b1ba324adf8eb977873c8b8668
|
|
|
|
Let it test extreme inputs and all filter types.
In the future ConvolveTest should test regular 8-bit functions in
high bitdepth mode.
Change-Id: I1042564d1d390589ca203070fe332c6da3315d75
|
|
Also adds vpx_convolve8_avg_horiz_avx2.
Change-Id: I38783d972ac26bec77610e9e15a0a058ed498cbf
|
|
vpx_convolve8_avg works by first running a normal horizontal filter, then a
vertical filter that averages at the end.
The added vpx_convolve8_avg_avx2 calls pre-existing AVX2 code for the
horizontal step.
vpx_convolve8_avg_vert_avx2 is also added, but only uses ssse3 code.
Change-Id: If5160c0c8e778e10de61ee9bf42ee4be5975c983
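The two-pass structure described above can be sketched in plain scalar C. This is only a model of the data flow, not the AVX2 code from the commit: the name sketch_convolve8_avg, the 7-bit filter precision (FILTER_BITS) and the 64x64 block limit (MAX_BLOCK) are assumptions made for illustration, and src is assumed to point into a buffer with at least 3 pixels of padding before and 4 after in each direction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FILTER_BITS 7 /* assumption: the 8 taps sum to 1 << 7 = 128 */
#define MAX_BLOCK 64  /* assumption: largest supported block is 64x64 */
#define TAPS 8

static uint8_t clip_pixel(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* 8-tap dot product over src[-3*step .. 4*step], rounded and clipped. */
static uint8_t filter8(const uint8_t *src, ptrdiff_t step, const int16_t *k) {
  int sum = 0;
  for (int t = 0; t < TAPS; ++t) sum += src[(t - 3) * step] * k[t];
  return clip_pixel((sum + (1 << (FILTER_BITS - 1))) >> FILTER_BITS);
}

/* Hypothetical scalar model: horizontal pass into temp, then a vertical
 * pass whose result is averaged with the pixels already in dst. */
void sketch_convolve8_avg(const uint8_t *src, ptrdiff_t src_stride,
                          uint8_t *dst, ptrdiff_t dst_stride,
                          const int16_t *kx, const int16_t *ky, int w, int h) {
  uint8_t temp[(MAX_BLOCK + TAPS - 1) * MAX_BLOCK];
  assert(w > 0 && w <= MAX_BLOCK && h > 0 && h <= MAX_BLOCK);

  /* Pass 1: filter h + 7 rows horizontally, starting 3 rows above src. */
  for (int y = 0; y < h + TAPS - 1; ++y)
    for (int x = 0; x < w; ++x)
      temp[y * MAX_BLOCK + x] = filter8(src + (y - 3) * src_stride + x, 1, kx);

  /* Pass 2: filter vertically, then average with dst (round to nearest). */
  for (int y = 0; y < h; ++y)
    for (int x = 0; x < w; ++x) {
      const int v = filter8(temp + (y + 3) * MAX_BLOCK + x, MAX_BLOCK, ky);
      uint8_t *d = &dst[y * dst_stride + x];
      *d = (uint8_t)((*d + v + 1) >> 1);
    }
}
```

With an identity kernel {0, 0, 0, 128, 0, 0, 0, 0} in both directions, both filter passes return the source pixel, so the output reduces to the rounded average of src and the previous dst contents.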
|
|
Change-Id: Id6a8c549709a3c516ed5d7b719b05117c5ef8bac
|
|
BUG=webm:1419
Change-Id: I39c8033734562efc0ac0e28e7f06fa05130f9b96
|
|
so that the convolve functions are independent of table alignment.
Change-Id: Ieab132a30d72c6e75bbe9473544fbe2cf51541ee
|
|
User-level speed improvement on i7-6700 (x86_64 Linux, cpu-used=1),
with bitrates of 8 Mbps for 1080p and 16 Mbps for 4K:
- Decoder:
1080p: ~4%
4K: ~5%
- Encoder:
1080p: ~1%
4K: ~3%
Change-Id: I51b48f9c5de0d62487d5a11aa579c97bd03dd640
|
|
Change-Id: Ia5293d948003a7fff5a7cbad6e83d8a72717c857
|
|
Only the generic one again, speedups for 8x8 and larger blocks to
come later.
Change-Id: I90d481d3a602d1e277ead8f3934eca126b86b72d
|
|
Only the generic one again, speedups for 8x8 and larger blocks
to come later.
Change-Id: Ia509d6225984b4930ec03928c9bcbf51486da99f
|
|
The 8x8 and larger blocks cases can be sped up further.
Change-Id: I54549b03ac6c7a4e3f485738b100c3cac7ac2e15
|
|
The 8x8 and larger blocks cases can be sped up further.
Change-Id: I89b635d6b01c59f523f2d54b1284ed32916c5046
|
|
Change-Id: Ib203c444c708f42072e38301ee3db97b5b53d014
|
|
Change-Id: Ie26d6dbe090e711d84bac01ba7da270db983f405
|
|
BUG=webm:1388
Change-Id: I6912de2639895d817ce850da8ea9f6c8fe21da42
|
|
Replace with CAST_TO_BYTEPTR/SHORTPTR.
The rule is: if a short ptr is cast to a byte ptr, any offset
operation on the byte ptr must be doubled. We do this by casting to
short ptr first, adding the offset, then casting back to byte ptr.
BUG=webm:1388
Change-Id: I9e18a73ba45ddae58fc9dae470c0ff34951fe248
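The offset rule above can be illustrated with a small sketch. The two macro definitions below are assumed to be plain reinterpreting casts (check the tree for the real definitions), and advance_samples is a hypothetical helper that does not exist in the source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed definitions: plain casts that leave the address unchanged. */
#define CAST_TO_SHORTPTR(x) ((uint16_t *)((void *)(x)))
#define CAST_TO_BYTEPTR(x) ((uint8_t *)((void *)(x)))

/* Hypothetical helper: advance a byte pointer that really addresses 16-bit
 * samples by `offset` samples. Casting to a short pointer first makes the
 * pointer arithmetic scale by sizeof(uint16_t), which is the "doubling". */
uint8_t *advance_samples(uint8_t *disguised, ptrdiff_t offset) {
  assert(disguised != NULL);
  return CAST_TO_BYTEPTR(CAST_TO_SHORTPTR(disguised) + offset);
}
```

Advancing by 3 samples this way moves the byte pointer by 6 bytes, whereas adding 3 directly to the byte pointer would land in the middle of the second sample.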
|
|
Change-Id: Ibcef70e4fead74e2c2909330a7044a29381a8074
|
|
BUG=webm:1299
Change-Id: I236bfa0441e357b6ff05add8269a2cfb543924d1
|
|
BUG=webm:1299
Change-Id: Ib87ac466ada63251eb06ae2abd1e13e61e0d1538
|
|
BUG=webm:1290
Change-Id: Ia27e58521eba5a4852b50381c56746fa5767f6d6
|
|
Combine test MatchesReferenceSubpixelFilter and
MatchesReferenceAveragingSubpixelFilter.
Change-Id: I75f96befbbb118cdc6b8c6001b4cdda8d88fbbd3
|
|
applied against an x86_64 configure with and without
--enable-vp9-highbitdepth
clang-tidy-3.7.1 \
-checks='-*,google-readability-braces-around-statements' \
-header-filter='.*' -fix
+ clang-format afterward
Change-Id: Ia2993ec64cf1eb3505d3bfb39068d9e44cfbce8d
|
|
Change-Id: I0d9ab85855eb723f653a7bb09b3d0d31dd6cfd2f
|
|
* changes:
configure: remove x86inc.asm distinction
test: remove x86inc.asm distinction
vpx_dsp: remove x86inc.asm distinction
|
|
BUG=b:29583530
Change-Id: I296a0b81755e3086bc0a40cb126d0200ff03c095
|
|
CONVERT_TO_BYTEPTR(x) was corrected in:
003a9d2 Port metric computation changes from nextgenv2
to use the more common (x) within the expansion. Offsets should be
applied after converting the pointer to the desired type.
+ factored out some common expressions
Change-Id: I171c3faaa5606d098e984baa9aa74bb36042f57f
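Why the (x) form matters can be shown with a generic sketch of the macro hazard. Both macros below are hypothetical and exist only for this illustration; they are not the definitions from the tree. They return the halved address as an integer so that nothing is dereferenced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical macros, for illustration only. */
#define HALVE_ADDR_UNSAFE(x) (((uintptr_t)x) >> 1)  /* argument not wrapped */
#define HALVE_ADDR_SAFE(x) (((uintptr_t)(x)) >> 1)  /* argument wrapped */

/* Distance between the converted values of p + n and p.
 * Unsafe form: the cast binds only to `p`, so `n` is added as raw bytes. */
uintptr_t halved_step_unsafe(const uint16_t *p, ptrdiff_t n) {
  assert(p != NULL);
  return HALVE_ADDR_UNSAFE(p + n) - HALVE_ADDR_UNSAFE(p);
}

/* Safe form: `p + n` is evaluated as pointer arithmetic before the cast. */
uintptr_t halved_step_safe(const uint16_t *p, ptrdiff_t n) {
  assert(p != NULL);
  return HALVE_ADDR_SAFE(p + n) - HALVE_ADDR_SAFE(p);
}
```

For a uint16_t pointer and n = 4, the safe form reports a step of 4 while the unsafe form reports 2, so code that passes an offset expression into the macro silently changes meaning depending on the expansion. Applying offsets only after the conversion avoids relying on either behavior.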
|
|
Add a cast.
BUG=webm:1225
Change-Id: I34ea18ee816569485c1f1046a81fd2a0ce527ac8
|
|
Add a cast.
BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1216
Change-Id: I40627de387bc9cfba37860e7a0a4f2d4524f3431
|
|
Brings f03e238f to master.
Change-Id: I7f7754e7d1288b103a4510303d10afc68a7d8ca8
|
|
Change-Id: Iff8b0d77234f78bf407676891bccad92825bfcc6
|
|
single-threaded:
swanky (silvermont): ~1% faster overall
peppy (celeron,haswell): ~1.5% faster overall
Change-Id: Ib74f014374c63c9eaf2d38191cbd8e2edcc52073
|
|
Change-Id: Iccb4cdc23c1845cf9cb7d69101c9f4f43675d368
|
|
and FUN_CONV_2D macros. The predict lut now handles
this case. The encoder now calls vpx_scaled_2d() instead
of vpx_convolve8() for scaling.
Change-Id: Ia1c8af8a31e4cb4887a587143108cb45835f7df7
|
|
In essence, it refactors the code for both the interpolation
filtering and the convolution. This change includes moving
all the files as well as changing the code from the vp9_
prefix to the vpx_ prefix accordingly, for the following architectures:
(1) x86;
(2) arm/neon; and
(3) mips/msa.
The work on mips/dspr2 will be done in a separate change list.
Change-Id: Ic3ce7fb7f81210db7628b373c73553db68793c46
|
|
Change-Id: I9582a8d74990125b71e8fe620f7f3f2585a30798
|
|
This test places 128 in positions that would not be found
in the VP9 filter tables. The ssse3 code packs this table
into chars and uses the pmaddubsw instruction, which treats
the value as signed. The ssse3 code checks for 128 in
position 3, skipping the ssse3 code if found, and calls
vp9_convolve8_c(). vp9_convolve8_c() is also used for scaling.
ChangeFilterWorks breaks the ssse3 scaling code found in other
commits.
Change-Id: I1f5a76834bc35180b9094c48f9421bdb19d3d1cb
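The signedness problem described above can be modelled without intrinsics. The sketch below is a scalar model of one 16-bit output lane of pmaddubsw (unsigned byte times signed byte, adjacent products summed with signed saturation). The names maddubs_lane, pack_tap and needs_c_fallback are hypothetical, and the guard only mirrors the "128 in position 3" check as the message describes it.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one pmaddubsw output lane: a0/a1 are unsigned pixels,
 * b0/b1 are signed taps; the pair sum saturates to the int16_t range. */
int16_t maddubs_lane(uint8_t a0, int8_t b0, uint8_t a1, int8_t b1) {
  int32_t sum = (int32_t)a0 * b0 + (int32_t)a1 * b1;
  if (sum > INT16_MAX) sum = INT16_MAX;
  if (sum < INT16_MIN) sum = INT16_MIN;
  return (int16_t)sum;
}

/* Packs a 16-bit tap into a byte, as a char table would store it.
 * On two's complement targets 128 becomes 0x80, which reads back as -128. */
int8_t pack_tap(int16_t tap) {
  assert(tap >= -128 && tap <= 255);
  return (int8_t)(uint8_t)tap;
}

/* Hypothetical guard: a tap of 128 in position 3 cannot be represented
 * as a signed byte, so the caller should take the C path instead. */
int needs_c_fallback(const int16_t *filter) {
  return filter[3] == 128;
}
```

A pixel value of 200 multiplied by a packed tap of 128 yields -25600 instead of +25600, which is why a filter containing 128 has to bypass the byte-packed path.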
|
|
expose filter_kernels[] and do the table lookup directly
Change-Id: I0b10bff0327c3e01a723736141a9ffd377cd3d20
|
|
Change-Id: I374fcd8fb45a6893dcdeac6896671be142a99f06
|
|
average improvement ~4x-6x
Change-Id: I7c8b4f2334491be8a859592606e568bc95d019aa
|
|
average improvement ~5x-8x
Change-Id: I179a69ec620fbd69979bd128f05d18113618aab4
|
|
average improvement ~4x-6x
Change-Id: Ia2e6f770da46416ebec31fdcea5cc7878879a9d9
|
|
Updated sources according to improved version of common MSA macros.
Enabled respective convolve MSA hooks and tests.
Overall, this is just upgrading the code with styling changes.
Change-Id: If5ad6ef8ea7ca47feed6d2fc9f34f0f0e8b6694d
|