path: root/vpx_dsp
Age | Commit message | Author
2017-04-07 | Merge changes from topic 'Wshorten' | James Zern
  * changes:
    configure: enable -Wshorten-64-to-32 for hbd
    vp9_encodeframe: resolve -Wshorten-64-to-32 in hbd
    Resolve -Wshorten-64-to-32 in highbd variance.
2017-04-05 | Resolve -Wshorten-64-to-32 in highbd variance. | James Zern
  For 8-bit the subtrahend is small enough to fit into uint32_t. This is the same that was done for:
  c0241664a Resolve -Wshorten-64-to-32 in variance.
  For 10/12-bit apply:
  63a37d16f Prevent negative variance
  Change-Id: Iab35e3f3f269035e17c711bd6cc01272c3137e1d
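The two cases in the commit message above can be sketched as follows. This is an illustration only, not the code from the commit: the function names and the `sse`/`sum`/`n` parameters are hypothetical, and the usual `sse - sum*sum/n` variance form is assumed.

```c
#include <stdint.h>

/* Hypothetical 8-bit case: sum*sum/n is assumed small enough to fit in
 * uint32_t, so the 64-bit quotient is narrowed with an explicit cast,
 * which is what silences -Wshorten-64-to-32. */
static uint32_t variance_8bit_sketch(uint32_t sse, int sum, int n) {
  return sse - (uint32_t)(((int64_t)sum * sum) / n);
}

/* Hypothetical 10/12-bit case: the difference is computed in 64 bits and
 * clamped at zero, so a negative intermediate never wraps around to a
 * huge unsigned value. */
static uint32_t variance_highbd_sketch(uint32_t sse, int sum, int n) {
  const int64_t var = (int64_t)sse - (((int64_t)sum * sum) / n);
  return (var >= 0) ? (uint32_t)var : 0;
}
```

With `sum = 40` and `n = 16` the subtrahend is 100, so an `sse` of 50 would go negative; the clamped variant returns 0 in that case.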
2017-04-05 | Update 32x32 high bitdepth idct NEON optimization | Linfeng Zhang
  Preparation of CONVERT_TO_BYTEPTR/SHORTPTR clean up.
  BUG=webm:1388
  Change-Id: I928d30a5698023bb90888d783cf81c51ec183760
2017-03-24 | intrapred: sync highbd_d135_predictor w/d135_ | James Zern
  previously:
  05437805f intrapred/d135: flatten border results before storing
  BUG=webm:1316
  Change-Id: I3b8bd89117ad7f2f4560b57f7c148da781e86f85
2017-03-24 | intrapred: specialize highbd 4x4 predictors | James Zern
  d207/d63/d45/d117/d135/d153
  ~9-45% better depending on the predictor on 32-bit ARM, similar range on x86-64
  this matches the non-highbitdepth implementation
  BUG=webm:1316
  Change-Id: Iddebdf7c58c6f31c47cae04da95c6e5318200e4c
2017-03-24 | intrapred: rename d63f to d63e | James Zern
  this is consistent with he/ve/d45e
  Change-Id: I75641ae5667430b0ecd370db86fff6e666cb577d
2017-03-24 | remove CONFIG_MISC_FIXES | James Zern
  this belonged to vp10 with the changes now migrated to av1.
  Change-Id: Ie30ead3e7b71f465bc14136e1b6f156ea978c43f
2017-03-23 | Merge "Fix mips msa fwd xform mismatch" | Kaustubh Raste
2017-03-23 | Merge "vp9_rdopt: correct size to vpx_sum_squares_2d_i16" | James Zern
2017-03-22 | Merge "idct_neon: prefix non-static functions w/'vpx_'" | James Zern
2017-03-22 | vp9_rdopt: correct size to vpx_sum_squares_2d_i16 | James Zern
  the current implementations expect pixel size, not the block type
  BUG=webm:1392
  Change-Id: Ib91e9f30a1f56e13566b1fb76f089dae9bb50cdc
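To illustrate why the argument has to be a pixel size, here is a minimal sketch of a square-block sum of squares. The function name is a hypothetical stand-in and the `(src, stride, size)` parameter order is an assumption. `size` is the edge length in pixels, so passing a block-type enum value instead would sum the wrong region.

```c
#include <stdint.h>

/* Hypothetical stand-in: sums v*v over a size x size block of 16-bit
 * samples. `size` is the block edge length in pixels, `stride` is the
 * distance in samples between the starts of consecutive rows. */
static uint64_t sum_squares_2d_i16_sketch(const int16_t *src, int stride,
                                          int size) {
  uint64_t ss = 0;
  for (int r = 0; r < size; ++r) {
    for (int c = 0; c < size; ++c) {
      const int32_t v = src[r * stride + c];
      ss += (uint64_t)(v * v); /* v*v is at most 2^30, fits in int32_t */
    }
  }
  return ss;
}
```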
2017-03-22 | idct_neon: prefix non-static functions w/'vpx_' | James Zern
  Change-Id: I94fcdeae18468e6ef0cb7119b8142d982a048031
2017-03-22 | Fix mips msa fwd xform mismatch | Kaustubh Raste
  Change-Id: I32a6df11463144aa1a562256ee7d57a41fd678d6
2017-03-21 | Merge "Make butterfly_self() signature consistent with butterfly()" | Yi Luo
2017-03-21 | Make butterfly_self() signature consistent with butterfly() | Yi Luo
  - Refer to patch: 48fca113d inv_txfm_ssse3,butterfly: fix win32 abi compatibility.
  - Change four butterfly() calls to butterfly_self(), to simplify the operations.
  Change-Id: Ib2a8cfe6cddcaf0a59e6e6270d8380055ea42ef3
2017-03-21 | Merge "Add vpx_highbd_idct32x32_1024_add_neon()" | James Zern
2017-03-21 | Merge "Add vpx_highbd_idct32x32_34_add_neon()" | James Zern
2017-03-17 | inv_txfm_sse2: clear conversion warning in hbd build | James Zern
  tran_high -> tran_low in return from dct_const_round_shift()
  Change-Id: I2fe06c4b604823b1d1fe40a487017c3c2819a440
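A rough sketch of the kind of narrowing this message describes, under stated assumptions: the typedef widths (64-bit `tran_high_t`, 32-bit `tran_low_t` in a high-bitdepth build) and the 14-bit rounding constant are assumptions here, and `round_shift_to_low_sketch` is a hypothetical wrapper, not a function from the tree.

```c
#include <stdint.h>

/* Assumed widths for a high-bitdepth build. */
typedef int64_t tran_high_t;
typedef int32_t tran_low_t;

#define DCT_CONST_BITS_SKETCH 14 /* assumed rounding precision */

/* Round-to-nearest right shift, kept in the wide type. */
static tran_high_t dct_const_round_shift_sketch(tran_high_t input) {
  return (input + ((tran_high_t)1 << (DCT_CONST_BITS_SKETCH - 1))) >>
         DCT_CONST_BITS_SKETCH;
}

/* Hypothetical wrapper: the explicit cast documents that the rounded
 * value is expected to fit the narrow type, which avoids an implicit
 * 64-to-32 conversion warning at the call site. */
static tran_low_t round_shift_to_low_sketch(tran_high_t input) {
  return (tran_low_t)dct_const_round_shift_sketch(input);
}
```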
2017-03-17 | Add vpx_highbd_idct32x32_1024_add_neon() | Linfeng Zhang
  BUG=webm:1301
  Change-Id: Ib90af0c1712e56b301d0e981dbe9a641e15e36ca
2017-03-17 | Add vpx_highbd_idct32x32_34_add_neon() | Linfeng Zhang
  BUG=webm:1301
  Change-Id: I74dd16c6c64e7bb71aa991cedccddf0663ef5e06
2017-03-17 | Merge "Add vpx_highbd_idct32x32_135_add_neon()" | James Zern
2017-03-16 | Add vpx_highbd_idct32x32_135_add_neon() | Linfeng Zhang
  BUG=webm:1301
  Change-Id: I58c2d65d385080711c3666d6d8f9d241dac7b21a
2017-03-17 | Merge "Clean vpx_idct32x32_1024_add_neon()" | James Zern
2017-03-15 | Add Hadamard for Power8 | Rafael de Lucena Valle
  Change-Id: I3b4b043c1402b4100653ace4869847e030861b18
  Signed-off-by: Rafael de Lucena Valle <rafaeldelucena@gmail.com>
2017-03-15 | Clean vpx_idct32x32_1024_add_neon() | Linfeng Zhang
  Change-Id: I05921e16d6a3e4e7e5b00a90624735050a186636
2017-03-15 | Merge "Improve idct32x32_1024_add SSSE3 intrinsics performance" | Yi Luo
2017-03-14 | Fix overflow issue in 32x32 idct NEON intrinsics | Linfeng Zhang
  Similar issue as Change bc1c18e.
  The PartialIDctTest.ResultsMatch test on vpx_idct32x32_135_add_neon() in high bit-depth mode exposes 16-bit overflow in final stage of pass 2, when changing the test number from 1,000 to 1,000,000.
  Change to use saturating add/sub for vpx_idct32x32_34_add_neon(), vpx_idct32x32_135_add_neon and vpx_idct32x32_1024_add_neon() in high bit-depth mode.
  Change-Id: Iaec0e9aeab41a3fdb4e170d7e9b3ad1fda922f6f
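The difference between wrapping and saturating 16-bit arithmetic can be shown with a scalar sketch. This is portable C written for illustration, with hypothetical helper names. It models, for a single lane, the behaviour that NEON saturating intrinsics such as `vqaddq_s16` and `vqsubq_s16` provide. It is not the code from the commit.

```c
#include <stdint.h>

/* Hypothetical scalar model of a saturating 16-bit add: the sum is formed
 * in 32 bits and clamped to the int16_t range instead of wrapping. */
static int16_t sat_add_s16_sketch(int16_t a, int16_t b) {
  const int32_t sum = (int32_t)a + (int32_t)b;
  if (sum > INT16_MAX) return INT16_MAX;
  if (sum < INT16_MIN) return INT16_MIN;
  return (int16_t)sum;
}

/* Same idea for subtraction. */
static int16_t sat_sub_s16_sketch(int16_t a, int16_t b) {
  const int32_t diff = (int32_t)a - (int32_t)b;
  if (diff > INT16_MAX) return INT16_MAX;
  if (diff < INT16_MIN) return INT16_MIN;
  return (int16_t)diff;
}
```

For example, 30000 + 10000 exceeds the 16-bit range. The saturating form returns 32767, whereas a wrapping add would yield a negative value.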
2017-03-14 | Improve idct32x32_1024_add SSSE3 intrinsics performance | Yi Luo
  - Function level speed improves ~12%.
  Change-Id: I9b7dbddabf08c7d0f6b25264e6074d5ccbe39290
2017-03-13 | Merge "Add vpx_highbd_idct32x32_135_add_c()" | Linfeng Zhang
2017-03-10 | inv_txfm_ssse3,butterfly: fix win32 abi compatibility | James Zern
  only the first 3 parameters can be aligned to 16 as required by __m128i, make them all pointers for consistency.
  since:
  07c48ccfe Improve idct32x32_34_add SSSE3 intrinsics performance
  BUG=webm:1384
  Change-Id: I0324f701e723a27cb470036a180693ba8829d01d
2017-03-09 | Improve idct32x32_135_add SSSE3 intrinsics performance | Yi Luo
  - Split the inv txfm into three parts to avoid stack spillover.
  - Function level speed improves ~12%.
  - Use function and macro to remove some repeated code.
  Change-Id: I14f5f072334fd766808cb52bf648df792e7379ee
2017-03-08 | Update vpx_idct32x32_1024_add_neon() | Linfeng Zhang
  Most are cosmetics changes. Speed has no change with clang 3.8, and about 5% faster with gcc 4.8.4
  Tried the strategy used in 8x8 and 16x16 (which operations' orders are similar to the C code), though speed gets better with gcc, it's worse with clang.
  Tried to remove store_in_output(), but speed gets worse.
  Change-Id: I93c8d284e90836f98962bb23d63a454cd40f776e
2017-03-08 | Add vpx_highbd_idct32x32_135_add_c() | Linfeng Zhang
  When eob is less than or equal to 135 for high-bitdepth 32x32 idct, call this function.
  BUG=webm:1301
  Change-Id: I8a5864f5c076e449c984e602946547a7b09c9fe6
2017-03-08 | cosmetics,dsp/arm/: vpx_idct32x32_{34,135}_add_neon() | Linfeng Zhang
  No speed changes and disassembly is almost identical.
  Change-Id: Id07996237d2607ca6004da5906b7d288b8307e1f
2017-03-08 | cosmetics,dsp/arm/: rename a variable | Linfeng Zhang
  Rename cospi_6_26_14_18N to cospi_6_26N_14_18N for consistency.
  Change-Id: I00498b43bb612b368219a489b3adaa41729bf31a
2017-03-01 | Improve idct32x32_34_add SSSE3 intrinsics performance | Yi Luo
  - Split the transform into first half and second half.
  - Reschedule the instructions to avoid stack spillover.
  - Function level speed improves ~16%.
  Change-Id: I166889840d23aa8a273eca00f6fbdae8b4566f35
2017-02-24 | get_prob(): rationalize int types | James Zern
  promote the unsigned int calculation to uint64_t rather than int64_t for type consistency
  Change-Id: Ic34dee1dc707d9faf6a3ae250bfe39b60bef3438
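A minimal sketch of the type choice described above, under assumptions: the rounding formula (`num * 256`, rounded, divided by `den`, clamped to 1..255) and the name `get_prob_sketch` are illustrative, not taken from the commit. Both operands are `unsigned int`, so widening to `uint64_t` keeps the whole expression unsigned and cannot overflow for 32-bit inputs.

```c
#include <stdint.h>

/* Hypothetical probability helper. The caller must ensure den != 0.
 * The multiply is done in uint64_t, matching the unsigned operands. */
static uint8_t get_prob_sketch(unsigned int num, unsigned int den) {
  const uint64_t p = ((uint64_t)num * 256 + (den >> 1)) / den;
  if (p > 255) return 255; /* clamp to the top of the 8-bit range */
  if (p < 1) return 1;     /* a probability of 0 is not representable */
  return (uint8_t)p;
}
```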
2017-02-22 | Merge "Fix segmentation fault caused by denoiser working with spatial SVC." | Jerome Jiang
2017-02-21 | Following SSSE3 intrinsics functions also work for HBD | Yi Luo
  - vpx_idct8x8_12_add_ssse3
    vpx_idct8x8_64_add_ssse3
    vpx_idct32x32_34_add_ssse3
    vpx_idct32x32_135_add_ssse3
    vpx_idct32x32_1024_add_ssse3
  - turn on unit tests.
  Change-Id: I788b2b3b2074a6f3ab6a0e6f469c1327a123eff7
2017-02-21 | Fix segmentation fault caused by denoiser working with spatial SVC. | Jerome Jiang
  Re-enable the affected test.
  BUG=webm:1374
  Change-Id: I98cd49403927123546d1d0056660b98c9cb8babb
2017-02-17 | Fix idct8x8 SSSE3 SingleExtremeCoeff unit tests | Yi Luo
  - In SSSE3 optimization, 16-bit addition and subtraction would overflow when input coefficient is 16-bit signed extreme values.
  - Function-level speed becomes slower (unit ms):
    idct8x8_64: 284 -> 294
    idct8x8_12: 145 -> 158.
  BUG=webm:1332
  Change-Id: I1e4bf9d30a6d4112b8cac5823729565bf145e40b
2017-02-17 | Merge "Add vpx_highbd_idct16x16_10_add_neon()" | James Zern
2017-02-16 | Replace idct32x32_1024_add_ssse3 assembly with intrinsics | Yi Luo
  - Encoding/decoding test, BQTerrace_1920x1080_60.y4m, on i7-6700, no obvious user-level speed performance downgrade.
  - Passed unit tests.
  Change-Id: I20688e0dd3731021ec8fb4404734336f1a426bfc
2017-02-16 | Merge "block error avx2: use tran_low_t" | Johann Koenig
2017-02-16 | Add vpx_highbd_idct16x16_10_add_neon() | Linfeng Zhang
  BUG=webm:1301
  Change-Id: If686c8144764c4162458f0bc4bb1bbf6555c48ab
2017-02-16 | Merge "Fix mips vpx_post_proc_down_and_across_mb_row_msa function" | James Zern
2017-02-16 | Merge "correct bitdepth_conversion_sse2.h header guard" | Johann Koenig
2017-02-16 | correct bitdepth_conversion_sse2.h header guard | Johann
  Change-Id: Ic4ffd861608e67fe59bcb3a86010ce3ef11a5519
2017-02-16 | Merge "Add idct32x32_135_add SSSE3 intrinsics" | Yi Luo
2017-02-16 | block error avx2: use tran_low_t | Johann
  Change-Id: Ic5f3a1f569d6f82afeaf4fcd7235374bb460db3c