Age | Commit message | Author |
|
- Split the transform into first half and second half.
- Reschedule the instructions to avoid stack spillover.
- Function-level speed improves by ~16%.
Change-Id: I166889840d23aa8a273eca00f6fbdae8b4566f35
|
|
Promote the unsigned int calculation to uint64_t rather than int64_t for
type consistency.
Change-Id: Ic34dee1dc707d9faf6a3ae250bfe39b60bef3438
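A minimal sketch of the kind of promotion the message above describes, assuming the calculation is a product of two unsigned int values. The helper name `mul_u32_wide` is hypothetical and does not come from the commit.

```c
#include <stdint.h>

/* Hypothetical helper: multiply two unsigned values without overflow by
 * widening one operand to uint64_t before the multiply. Keeping the whole
 * expression unsigned avoids mixing signed and unsigned types, which is
 * what promoting to int64_t would do. */
static uint64_t mul_u32_wide(unsigned int a, unsigned int b) {
  return (uint64_t)a * b;
}
```

The cast must be applied to an operand, not to the finished product: `(uint64_t)(a * b)` would still multiply in unsigned int and wrap before widening.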
|
|
|
|
- vpx_idct8x8_12_add_ssse3
  vpx_idct8x8_64_add_ssse3
  vpx_idct32x32_34_add_ssse3
  vpx_idct32x32_135_add_ssse3
  vpx_idct32x32_1024_add_ssse3
- Turn on unit tests.
Change-Id: I788b2b3b2074a6f3ab6a0e6f469c1327a123eff7
|
|
Re-enable the affected test.
BUG=webm:1374
Change-Id: I98cd49403927123546d1d0056660b98c9cb8babb
|
|
- In the SSSE3 optimization, 16-bit addition and subtraction overflow
when the input coefficients are at the extremes of the signed 16-bit
range.
- Function-level speed regresses (times in ms):
idct8x8_64: 284 -> 294
idct8x8_12: 145 -> 158.
BUG=webm:1332
Change-Id: I1e4bf9d30a6d4112b8cac5823729565bf145e40b
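The failure mode named above can be sketched in portable scalar C. This only models one lane of a wrapping 16-bit add next to the same sum computed in 32 bits. The helper names are hypothetical, and the listing does not show how the commit itself avoids the overflow.

```c
#include <stdint.h>

/* Models one lane of a wrapping 16-bit SIMD add: the sum is reduced
 * modulo 2^16 and reinterpreted as a signed value. */
static int16_t add16_wrap(int16_t a, int16_t b) {
  uint16_t u = (uint16_t)((uint16_t)a + (uint16_t)b);
  /* Map the unsigned bit pattern back to two's-complement explicitly. */
  return (int16_t)(u >= 0x8000 ? (int32_t)u - 0x10000 : (int32_t)u);
}

/* The same sum carried out in 32 bits, where it cannot overflow. */
static int32_t add16_wide(int16_t a, int16_t b) {
  return (int32_t)a + (int32_t)b;
}
```

With the extreme inputs 32767 and 1, the wrapped lane yields -32768 while the 32-bit sum is 32768, so the sign of the result flips.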
|
|
|
|
- Encoding/decoding test with BQTerrace_1920x1080_60.y4m on an
i7-6700 shows no obvious user-level performance regression.
- Unit tests pass.
Change-Id: I20688e0dd3731021ec8fb4404734336f1a426bfc
|
|
|
|
BUG=webm:1301
Change-Id: If686c8144764c4162458f0bc4bb1bbf6555c48ab
|
|
|
|
|
|
Change-Id: Ic4ffd861608e67fe59bcb3a86010ce3ef11a5519
|
|
|
|
Change-Id: Ic5f3a1f569d6f82afeaf4fcd7235374bb460db3c
|
|
- Replace the corresponding assembly code.
- No user-level performance regression.
- Unit tests passed.
Change-Id: Idd0c5a4bad4976f1617c34100cb46e75e3b961e5
|
|
Change-Id: Iebade0efc0efbb0a80a0f3adbef4962e3a2f25e8
|
|
The previous implementation confused bits, bytes, and elements. It used
'32' as the multiplier, a value mistakenly adopted because a 32x32
transform embedded the stride.
Change-Id: Ieeb867a332416b9a40580b5e7c9b20088e9e691a
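As an illustration of the distinction the message draws, here is a minimal sketch of row addressing with an explicit stride counted in elements. The helper `row_ptr` and the choice of int16_t are assumptions for the example; the commit's actual code is not shown in this listing.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: address of row `r` in a buffer of int16_t values.
 * `stride` counts elements, not bytes or bits. Pointer arithmetic on
 * int16_t* already scales by sizeof(int16_t), so no extra multiplier is
 * needed, and nothing here depends on the block being 32 wide. */
static const int16_t *row_ptr(const int16_t *buf, size_t r, size_t stride) {
  return buf + r * stride;
}
```

A hard-coded 32 in place of `stride` gives the same address only when the stride happens to be 32 elements, which is how such a bug can hide behind a 32x32 transform.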
|
|
Add a fix to handle the case where cols is not a multiple of 16, for size 16.
Change-Id: If3a6d772d112077c5e0a9be9e612e1148f04338c
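One common way to handle a column count that is not a multiple of 16 is a blocked main loop followed by a scalar tail. The sketch below assumes that pattern. The routine `add_to_row` and its clamped add are hypothetical stand-ins, not the function this commit changed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical routine: add `delta` to each pixel of a row, clamped at 255. */
static void add_to_row(uint8_t *row, int cols, uint8_t delta) {
  int c = 0;
  assert(cols >= 0);
  /* Main loop: whole 16-column blocks, standing in for one 128-bit
   * vector operation per block. */
  for (; c + 16 <= cols; c += 16) {
    for (int i = 0; i < 16; ++i) {
      const int v = row[c + i] + delta;
      row[c + i] = (uint8_t)(v > 255 ? 255 : v);
    }
  }
  /* Tail: the remaining cols % 16 columns, handled one at a time so the
   * routine never reads or writes past the end of the row. */
  for (; c < cols; ++c) {
    const int v = row[c] + delta;
    row[c] = (uint8_t)(v > 255 ? 255 : v);
  }
}
```

With cols equal to 20, the main loop covers columns 0 to 15 and the tail covers 16 to 19.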
|
|
|
|
Change-Id: Ie0f7689ebe230c68eadb22a32b14838c1a7543a6
|
|
BUG=webm:1301
Change-Id: Ic6cd8c1e63e1b7a997cbed221e20fff4c599e0fe
|
|
When eob is less than or equal to 38 for high-bitdepth 16x16 idct,
call this function.
BUG=webm:1301
Change-Id: I09167f89d29c401f9c36710b0fd2d02644052060
|
|
This matches bitdepth_conversion_sse2.asm and produces substantially
better assembly. The old way had lots of 'movzwl' and 'shl' and storing
back to memory before loading into an xmm register.
Change-Id: Ib33e35354dfd691a4f8b1e39f4dbcbb14cd5302b
|
|
Change-Id: I2a39a3bb87516b04d273bc1c0f4a634e3fb6f0f6
|
|
Change-Id: I75e4a9e0b37bd4586f26c8d6c1fa27f3f6ff1bce
|
|
|
|
- No user-level performance change.
- Unit tests pass.
Change-Id: Idfc598e00f354265e41f6b3219f4734216c115c6
|
|
|
|
BUG=webm:1301
Change-Id: I6bb755552a39bdd26eef3f449601f6a9766c65ec
|
|
Change-Id: I100c4a1955d80bec4d28e82796b3e7f57e84d0ba
|
|
and update vpx_highbd_idct8x8_1_add_neon()
BUG=webm:1301
Change-Id: I18d1a0cbe98ba822d5194c1b4e13a4c29c5c75f4
|
|
|
|
The RunQuantCheck() test on it exposes 16-bit overflow in stage 7 of
pass 2. Change to use saturating add/sub for both
vpx_idct16x16_38_add_neon() and vpx_idct16x16_256_add_neon() for high
bitdepth.
Change-Id: Ibf4c107a887553a52852cc582e28d38a5a5a2712
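A scalar sketch of what a saturating 16-bit add and subtract compute per lane, as a model of the behavior the message switches to. The helper names are hypothetical, and the NEON intrinsics used in the actual commit are not shown in this listing.

```c
#include <stdint.h>

/* Clamp a 32-bit intermediate to the signed 16-bit range. */
static int16_t clamp_s16(int32_t v) {
  if (v > INT16_MAX) return INT16_MAX;
  if (v < INT16_MIN) return INT16_MIN;
  return (int16_t)v;
}

/* Saturating add: results outside [-32768, 32767] stick at the nearest
 * bound instead of wrapping around. */
static int16_t sat_add_s16(int16_t a, int16_t b) {
  return clamp_s16((int32_t)a + (int32_t)b);
}

/* Saturating subtract, with the same clamping. */
static int16_t sat_sub_s16(int16_t a, int16_t b) {
  return clamp_s16((int32_t)a - (int32_t)b);
}
```

Saturation does not recover the exact sum, but it keeps the sign and magnitude close to the true value, whereas a wrapped result can flip from large positive to large negative.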
|
|
- Performance matches the assembly version.
- Unit tests pass.
Change-Id: I6eacfbbd826b3946c724d78fbef7948af6406ccd
|
|
When eob is less than or equal to 38 for 16x16 idct, call this function.
Change-Id: Ief6f3fb16a49ace3c92cebf4e220bf5bf52a6087
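The selection rule above can be sketched as a small dispatcher. Here eob is the end-of-block position, that is, the count of coefficients up to and including the last nonzero one in scan order. The enum and function names are hypothetical, only the threshold 38 comes from the message, and any further thresholds the real dispatcher may use are left out.

```c
/* Hypothetical labels for the two inverse-transform variants. */
typedef enum {
  IDCT16X16_38,  /* reduced transform for sparse blocks */
  IDCT16X16_256  /* full transform over all 256 coefficients */
} Idct16Variant;

/* Pick the reduced variant when at most 38 coefficients can be nonzero;
 * otherwise fall back to the full 16x16 transform. */
static Idct16Variant pick_idct16x16(int eob) {
  return eob <= 38 ? IDCT16X16_38 : IDCT16X16_256;
}
```

The reduced variant can skip arithmetic on coefficients known to be zero, which is why the choice is keyed on eob rather than on block size.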
|
|
|
|
Create new helper files specifically for converting tran_low_t types.
Change-Id: I7c4c458ef910f3b3d10a3cfbf9df4de7682fd905
|
|
|
|
|
|
|
|
* changes:
satd highbd neon: use tran_low_t for coeff
satd highbd sse2: use tran_low_t for coeff
|
|
Remove redundant memory accesses.
Change-Id: I8049074bdba5f49eab7e735b2b377423a69cd4c8
|
|
The intrinsic version reduces the average cycles from 183 to 175.
Change-Id: I7c1bcdb0a830266e93d8347aed38120fb3be0e03
|
|
* changes:
hadamard highbd ssse3: use tran_low_t for coeff
hadamard highbd neon: use tran_low_t for coeff
hadamard highbd sse2: use tran_low_t for coeff
|
|
|
|
BUG=webm:1365
Change-Id: I43521ad32b6c96737a8ef2b8c327f901fd7eaf84
|
|
BUG=webm:1365
Change-Id: I013659f6b9fbf9cc52ab840eae520fe0b5f883fb
|
|
BUG=webm:1365
Change-Id: I374dfc08732932382043905f128e928b08cb4f57
|
|
BUG=webm:1365
Change-Id: I7e15192ead3a3631755b386f102c979f06e26279
|