libvpx.git

author	Yunqing Wang <yunqingwang@google.com>	2011-06-28 09:14:13 -0400
committer	Yunqing Wang <yunqingwang@google.com>	2011-07-22 09:28:06 -0400
commit	20bd1446c0c3ee82c6be9102ed911477639908c5 (patch)
tree	93b6abf071dc9766c5d3e0d38fc232b7103a5f4e /vp8/common/invtrans.h
parent	b5ea2fbc2c1554769848774c836aad262af95072 (diff)

Preload reference area to an intermediate buffer in sub-pixel motion search

In sub-pixel motion search, the search range is small (+/- 3 pixels). Preload the whole search area from the reference buffer into a 32-byte aligned buffer. Then, during the search, load reference data from this buffer instead. This keeps the data in cache and reduces the penalty for accesses that cross a cache line.

For the tulip clip, tests on an Intel Core2 Quad machine (Linux) showed these encoder speed improvements:
  3.4% at --rt --cpu-used=-4
  2.8% at --rt --cpu-used=-3
  2.3% at --rt --cpu-used=-2
  2.2% at --rt --cpu-used=-1

A test on an Atom notebook showed only a 1.1% speed improvement (speed=-4). A test on a Xeon machine also showed less improvement, since unaligned data access latency is greatly reduced in newer cores.

Next, I will apply a similar idea to the other 2 sub-pixel search functions for encoding speed > 4.

Make this change exclusively for x86 platforms.

Change-Id: Ia7bb9f56169eac0f01009fe2b2f2ab5b61d2eb2f
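
The preload step described in the message could look roughly like the sketch below. This is not the code from the commit: the names preload_search_area, sad_at, BLOCK, RANGE, AREA and PRE_STRIDE are hypothetical, a 16x16 block is assumed, and the search is simplified to integer offsets with a plain SAD instead of the interpolated sub-pixel variance the encoder actually uses.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative sizes; assumptions, not values taken from the commit's diff. */
enum {
    BLOCK = 16,               /* assumed block width and height */
    RANGE = 3,                /* search range in pixels, as stated in the message */
    AREA = BLOCK + 2 * RANGE, /* 22 rows/columns the search can touch */
    PRE_STRIDE = 32           /* row pitch of the intermediate buffer */
};

/*
 * Copy the AREA x AREA window around the block at `ref` into `pre`.
 * `ref` must have at least RANGE valid pixels on every side.
 * If `pre` is 32-byte aligned, every row of the copy starts on a
 * 32-byte boundary because PRE_STRIDE is 32.
 */
static void preload_search_area(const uint8_t *ref, int ref_stride,
                                uint8_t *pre)
{
    const uint8_t *src = ref - RANGE * ref_stride - RANGE;
    for (int r = 0; r < AREA; ++r)
        memcpy(pre + r * PRE_STRIDE, src + r * ref_stride, AREA);
}

/*
 * Sum of absolute differences between the current block and the
 * preloaded window at integer offset (dx, dy). All reference reads
 * go to the small aligned buffer, not to the reference frame.
 */
static unsigned sad_at(const uint8_t *cur, int cur_stride,
                       const uint8_t *pre, int dx, int dy)
{
    assert(dx >= -RANGE && dx <= RANGE && dy >= -RANGE && dy <= RANGE);
    const uint8_t *p = pre + (RANGE + dy) * PRE_STRIDE + (RANGE + dx);
    unsigned sad = 0;
    for (int r = 0; r < BLOCK; ++r)
        for (int c = 0; c < BLOCK; ++c)
            sad += (unsigned)abs(cur[r * cur_stride + c] - p[r * PRE_STRIDE + c]);
    return sad;
}
```

A caller would declare the buffer as, for example, `_Alignas(32) uint8_t pre[AREA * PRE_STRIDE];`, call preload_search_area once per block, and then evaluate every candidate offset with sad_at. The frame is read once per block instead of once per candidate, which is the effect the message attributes the speedup to.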

Diffstat (limited to 'vp8/common/invtrans.h')

0 files changed, 0 insertions, 0 deletions

