path: root/vpx_dsp
Age | Commit message | Author
2017-05-15 | move neon load/stores to a new file | Johann
Move the tran_low_t helper functions to a new file. Additional load/store functions will be added here. Change-Id: I52bf652c344c585ea2f3e1230886be93f5caefc3
2017-05-12 | Add visibility="protected" attribute for global variables referenced in asm files. | Rahul Chaudhry
During aosp builds with binutils-2.27, we're seeing linker error messages of this form:
libvpx.a(subpixel_mmx.o): relocation R_386_GOTOFF against preemptible symbol vp8_bilinear_filters_x86_8 cannot be used when making a shared object
subpixel_mmx.o is assembled from "vp8/common/x86/subpixel_mmx.asm". Other messages refer to symbol references from deblock_sse2.o and subpixel_sse2.o, also assembled from asm files. This change marks such symbols as having "protected" visibility. This satisfies the linker as the symbols are not preemptible from outside the shared library now, which I think is the original intent anyway. Change-Id: I2817f7a5f43041533d65ebf41aefd63f8581a452
2017-05-12 | Merge changes I1b54a7a5,I3028bdad,I59788cd9 | James Zern
* changes:
  ppc: Add get_mb_ss_vsx
  ppc: Add get4x4sse_cs_vsx
  ppc: Add comp_avg_pred_vsx
2017-05-12 | ppc: Add get_mb_ss_vsx | Luca Barbato
Change-Id: I1b54a7a5bb642e4b836d786ea1ae506eed025e3f
2017-05-12 | ppc: Add get4x4sse_cs_vsx | Luca Barbato
Change-Id: I3028bdadf653665d18e781d28e9625f62804b3d8
2017-05-12 | ppc: Add comp_avg_pred_vsx | Luca Barbato
Change-Id: I59788cd98231e707239c2ad95ae54f67cfe24e10
2017-05-12 | ppc: Add vpx_sad64x32/64_vsx | Alexandra Hájková
Change-Id: I84e3705fa52f75cb91b2bab4abf5cc77585ee3e2
2017-05-12 | ppc Add vpx_sad32x16/32/64_vsx | Alexandra Hájková
Change-Id: I3c4f9d595275669580413a71b3c3c810e7ddcacd
2017-05-12 | Merge "ppc: Add vpx_sad16x8/16/32_vsx" | James Zern
2017-05-10 | ppc: Add vpx_sad16x8/16/32_vsx | Alexandra Hájková
Change-Id: I60619d28fffd9809f93b1af510a50e1aa02519a9
2017-05-10 | Update specializations of idct functions | Linfeng Zhang
Introduced append situation in Commit 0178d97 which could be confusing. Clean a little bit and add some comments. Change-Id: I69ad336f805aca7ce9d45515b8cd237423fadbb2
2017-05-10 | Merge changes I92eb4312,Ibb2afe4e | Johann Koenig
* changes:
  subpel variance neon: add mixed sizes
  sub pixel variance neon: use generic variance
2017-05-09 | Clean 32x32 idct C code | Linfeng Zhang
Change-Id: I73b8104a9e7a70ffe827c1b7ff43618f24f5d7bd
2017-05-08 | Update 4x4 idct sse2 functions | Linfeng Zhang
It's a bit faster to call idct4_sse2() in vpx_idct4x4_16_add_sse2() Change-Id: I1513be7a895cd2fc190f4a8297c240b17de0f876
2017-05-08 | neon variance: process 16 values at a time | Johann
Read in a Q register. Works on blocks of 16 and larger. Improvement of about 20% for 64x64. The smaller blocks are faster, but don't have quite the same level of improvement. 16x32 is only about 5% BUG=webm:1422 Change-Id: Ie11a877c7b839e66690a48117a46657b2ac82d4b
2017-05-08 | Merge changes Id602909a,Ib0e85608 | Johann Koenig
* changes:
  neon variance: process two rows of 8 at a time
  neon variance: add small missing sizes
2017-05-08 | Merge changes I0cfe4117,I3581d80d,Ida62c941 | Linfeng Zhang
* changes:
  Split dsp/x86/inv_txfm_sse2.c
  Update highbd idct functions arguments to use uint16_t dst
  Clean CONVERT_TO_BYTEPTR/SHORTPTR in idct
2017-05-04 | subpel variance neon: add mixed sizes | Johann
Add support for everything except block sizes of 4. Performance is better but numbers will improve again when the variance optimizations land. BUG=webm:1423 Change-Id: I92eb4312b20be423fa2fe6fdb18167a604ff4d80
2017-05-04 | sub pixel variance neon: use generic variance | Johann
When a neon version is available it will be called. This allows decoupling the variance implementations and has no real downside. For most configurations, the call will be #define'd to the neon implementation. Change-Id: Ibb2afe4e156c5610e89488504d366b3e6d1ba712
2017-05-04 | fdct 8x8 neon: minor comment cleanup | Johann
Simplify HBD/non distinction in test. Document why transpose_neon.h is not used Change-Id: I17659414206ddbb8c2f1ef0d9f4a17f1745d5a52
2017-05-04 | neon variance: process two rows of 8 at a time | Johann
When the width is equal to 8, process two rows at a time. This doubles the speed of 8x4 and improves 8x8 by about 20%. 8x16 was using this technique already, but still improved a little bit with the rewrite. Also use this for vpx_get8x8var_neon BUG=webm:1422 Change-Id: Id602909afcec683665536d11298b7387ac0a1207
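The two-rows-per-iteration idea described above can be illustrated with a scalar sketch. This is not the NEON code from the commit: the function name `variance_w8_two_rows` and its signature are hypothetical, the height is assumed to be even, and the return value uses the usual `sse - sum^2 / N` form of block variance.

```c
#include <stdint.h>

/* Hypothetical scalar sketch, not the NEON implementation from the commit.
 * Accumulates the sum of differences and the sum of squared differences
 * for an 8-wide block, consuming two rows per loop iteration.
 * Assumption: h is even. */
static uint32_t variance_w8_two_rows(const uint8_t *a, int a_stride,
                                     const uint8_t *b, int b_stride,
                                     int h, uint32_t *sse) {
  int32_t sum = 0;
  uint32_t sq = 0;
  for (int y = 0; y < h; y += 2) {
    for (int x = 0; x < 8; ++x) {
      const int d0 = a[x] - b[x];                       /* row y     */
      const int d1 = a[a_stride + x] - b[b_stride + x]; /* row y + 1 */
      sum += d0 + d1;
      sq += (uint32_t)(d0 * d0) + (uint32_t)(d1 * d1);
    }
    a += 2 * a_stride;
    b += 2 * b_stride;
  }
  *sse = sq;
  /* variance = sse - sum^2 / (number of pixels) */
  return sq - (uint32_t)(((int64_t)sum * sum) / (8 * h));
}
```

In a vector version, the two 8-byte rows would presumably fill one 16-byte register, which is where the reported speedup for narrow blocks would come from.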
2017-05-04 | neon variance: add small missing sizes | Johann
Some of the mixed sizes were missing. They can be implemented trivially using the existing helper function. When comparing the previous 16x8 and 8x16 implementations, the helper function is about 10% faster than the 16x8 version. The 8x16 is very close, but the existing version appears to be faster. BUG=webm:1422 Change-Id: Ib0e856083c1893e1bd399373c5fbcd6271a7f004
2017-05-03 | Split dsp/x86/inv_txfm_sse2.c | Linfeng Zhang
Spin out highbd idct functions. BUG=webm:1412 Change-Id: I0cfe4117c00039b6778c59c022eee79ad089a2af
2017-05-03 | Update highbd idct functions arguments to use uint16_t dst | Linfeng Zhang
BUG=webm:1388 Change-Id: I3581d80d0389b99166e70987d38aba2db6c469d5
2017-05-03 | Clean CONVERT_TO_BYTEPTR/SHORTPTR in idct | Linfeng Zhang
BUG=webm:1388 Change-Id: Ida62c941f2b836d6c9e27b427a7d5008ab6dc112
2017-05-03 | High bit depth inter prediction horizontal/vertical filters AVX2 | Yi Luo
User level speed improvement on i7-6700, cpu-used=1, x86_64 Linux, bitrate, 1080p, 8Mbps, 4K, 16Mbps:
- Decoder: 1080p: ~4% 4K: ~5%
- Encoder: 1080p: ~1% 4K: ~3%
Change-Id: I51b48f9c5de0d62487d5a11aa579c97bd03dd640
2017-05-03 | Merge changes I8bb660de,Ica51d780,I6037525d | Linfeng Zhang
* changes:
  Clean specializes of idct functions
  Clean add_protos of highbd idct functions
  Clean add_protos of idct functions
2017-05-02 | ppc: Add convolve8_vsx and convolve8_avg_vsx | Luca Barbato
Change-Id: Ia5293d948003a7fff5a7cbad6e83d8a72717c857
2017-05-02 | ppc: Add convolve8_avg_vert_vsx | Luca Barbato
Only the generic one again, speedups for 8x8 and larger blocks to come later. Change-Id: I90d481d3a602d1e277ead8f3934eca126b86b72d
2017-05-02 | ppc: Add convolve8_vert | Luca Barbato
Only the generic one again, speedups for 8x8 and larger blocks to come later. Change-Id: Ia509d6225984b4930ec03928c9bcbf51486da99f
2017-05-02 | ppc: Add convolve8_horiz_avg | Luca Barbato
The 8x8 and larger blocks cases can be sped up further. Change-Id: I54549b03ac6c7a4e3f485738b100c3cac7ac2e15
2017-05-02 | ppc: Add convolve8_horiz | Luca Barbato
The 8x8 and larger blocks cases can be sped up further. Change-Id: I89b635d6b01c59f523f2d54b1284ed32916c5046
2017-05-02 | Clean specializes of idct functions | Linfeng Zhang
Change-Id: I8bb660de47b5f97263ec381dc428db96e9c9a4b2
2017-05-02 | Clean add_protos of highbd idct functions | Linfeng Zhang
Change-Id: Ica51d780b92b316ce9112740c56cdf7670816371
2017-05-02 | Clean add_protos of idct functions | Linfeng Zhang
Change-Id: I6037525d92ec172810edab720389eb1865ed3b1a
2017-04-29 | ppc: Add convolve_avg | Luca Barbato
Change-Id: Ib203c444c708f42072e38301ee3db97b5b53d014
2017-04-29 | ppc: Add convolve_copy | Luca Barbato
Change-Id: Ie26d6dbe090e711d84bac01ba7da270db983f405
2017-04-25 | Update highbd convolve functions arguments to use uint16_t src/dst | Linfeng Zhang
BUG=webm:1388 Change-Id: I6912de2639895d817ce850da8ea9f6c8fe21da42
2017-04-19 | ppc: h predictor 8x8 | Luca Barbato
Slightly faster with the current compiler. Change-Id: Iae225fac08395eb430c97a2abec69c60f5cf5c47
2017-04-19 | ppc: d63 predictor 8x8 | Luca Barbato
10x faster. Change-Id: I7cedbf4df2ce7df5b6f1108b11815d088fdb9ba8
2017-04-19 | ppc: tm predictor 4x4 | Luca Barbato
Slightly faster. Change-Id: I0ca43f309b3d9b50435d69bd5be64b53a99bd191
2017-04-19 | ppc: h predictor 4x4 | Luca Barbato
2x faster. Change-Id: I0583dec353299c6797401b646099f18db4e0420d
2017-04-19 | ppc: dc predictor 8x8 | Luca Barbato
Slightly faster, the other dc predictors cannot be faster since the computation speedup is overwhelmed by the time spent reading dst to write just the 8x8 part. Change-Id: I94a0b50500adf8b7b6bb919dbf5c7adf5b9fba66
2017-04-19 | ppc: d45 predictor 8x8 | Luca Barbato
11x faster. Change-Id: I5b8f39213ee1f5260724fc254e3fb5c462435798
2017-04-19 | ppc: d63 predictor 32x32 | Luca Barbato
About 10x faster. Change-Id: If7d0645f75c5d7deb9751edd0bf47e2f9068e9e7
2017-04-19 | ppc: d63 predictor 16x16 | Luca Barbato
About 18x faster. Change-Id: Id043bf76c011e03e992085bb5e20f330d3e98cd4
2017-04-19 | ppc: d45 predictor 32x32 | Luca Barbato
About 12x faster. Change-Id: I22c150256aefb4941861ab1f6c17d554fb694bed
2017-04-19 | ppc: d45 predictor 16x16 | Luca Barbato
About 16x faster. Change-Id: Ie5469fb32d5fd11bb6cb06318cea475d8a5b00b9
2017-04-19 | ppc: dc predictor 32x32 | Luca Barbato
10x and 5x faster. Change-Id: I7913c58c768334d818f541a5e219f1035791eeaf
2017-04-19 | ppc: dc top and left predictor 32x32 | Luca Barbato
6x faster. Change-Id: I717995b4056e5579c68191d11b495372971fe1ae