Age | Commit message (Collapse) | Author |
|
This reverts commit f1342a7b070ef61b9fbdf03e899ac2107cfcb6bd.
This breaks 32-bit builds:
runtime error: load of misaligned address 0xf72fdd48 for type 'const
__m128i' (vector of 2 'long long' values), which requires 16 byte
alignment
+ _mm_set1_epi64x is incompatible with some versions of visual studio
Change-Id: I6f6fc3c11403344cef78d1c432cdc9147e5c1673
|
|
This function now has an AVX intrinsics version which is about 80%
faster compared to the C implementation. This provides a 2-4% total
speed-up for encode, depending on encoding parameters. The function
utilizes 3 properties of the cost function lookup table, constructed
in 'cal_nmvjointsadcost' and 'cal_nmvsadcosts'.
For the joint cost:
- mvjointsadcost[1] == mvjointsadcost[2] == mvjointsadcost[3]
For the component costs:
- For all i: mvsadcost[0][i] == mvsadcost[1][i]
(equal per component cost)
- For all i: mvsadcost[0][i] == mvsadcost[0][-i]
(Cost function is even)
These must hold, otherwise the AVX version of the function cannot be used.
Change-Id: I184055b864c5a2dc37b2d8c5c9012eb801e9daf6
|
|
A new version of vp9_highbd_error_8bit is now available which is
optimized with AVX assembly. AVX itself does not buy us too much, but
the non-destructive 3 operand format encoding of the 128bit SSEn integer
instructions helps to eliminate move instructions. The Sandy Bridge
micro-architecture cannot eliminate move instructions in the processor
front end, so AVX will help on these machines.
Further 2 optimizations are applied:
1. The common case of computing block error on 4x4 blocks is optimized
as a special case.
2. All arithmetic is speculatively done on 32 bits only. At the end of
the loop, the code detects if overflow might have happened and if so,
the whole computation is re-executed using higher precision arithmetic.
This case however is extremely rare in real use, so we can achieve a
large net gain here.
The optimizations rely on the fact that the coefficients are in the
range [-(2^15-1), 2^15-1], and that the quantized coefficients always
have the same sign as the input coefficients (in the worst case they are
0). These are the same assumptions that the old SSE2 assembly code for
the non high bitdepth configuration relied on. The unit tests have been
updated to take this constraint into consideration when generating test
input data.
Change-Id: I57d9888a74715e7145a5d9987d67891ef68f39b7
|
|
If high bit depth configuration is enabled, but encoding in profile 0,
the code now falls back on optimized SSE2 assembler to compute the
block errors, similar to when high bit depth is not enabled.
Change-Id: I471d1494e541de61a4008f852dbc0d548856484f
|
|
Rename updated version of x86inc.asm
Use "private_prefix" instead of "program_name" and make vpx the default
prefix.
Change-Id: I4883a99b2aee8e5dc9f2c16a2e6f4b5d6e4de458
|
|
Change-Id: I20c7b42631b579fade6cf7ebf6d4c69b2fcb5e5e
|
|
This commit moves the module inverse transform functions from vp9
to vpx_dsp folder. The hybrid transform wrapper functions stay in
the vp9 folder, since it involves codec-specific data structures.
Change-Id: Ib066367c953d3d024c73ba65157bbd70a95c9ef8
|
|
Clean up the forward 2D-DCT function names in vpx_dsp.
Change-Id: I3117978596d198b690036e7eb05fe429caf3bc25
|
|
This completes the forward transform functions layout refactoring.
Change-Id: I996fb0fb795f41e2040f7b21db985774098aedbd
|
|
Move the 32x32 2D-DCT implementations from vp9/ to vpx_dsp/.
Change-Id: Id3980696f8b69906ff7a59ff9fb2b9013d60047d
|
|
Change-Id: Iba03852ce778c956200818e3473cfb2b48cf8d8e
|
|
This commit replaces vp9_idct.h with txfm_common.h in many SIMD
implementation files for precise file dependency.
Change-Id: If73dd726bb16537e7494f28538b0a169810f9756
|
|
Separate the common coefficient constant into vpx_dsp/txfm_common.h.
Move the SSE2 macro definitions to vpx_dsp/x86/txfm_common_sse2.h.
This clears the use case of vp9_idct.h in vpx_dsp folder.
Change-Id: I319735a2abf42888e5080ac14cfbcde34be7b121
|
|
Change-Id: I283d364a4e65ca9bf6ff581da1d0b498433c5402
|
|
This commit factors the 4x4, 8x8, and 16x16 2D-DCT forward
transform operations into vpx_dsp folder.
Change-Id: I084b117b79c0925edcbcabb93f62b9f4bf8dbe7d
|
|
Remove redundant file dependency.
Change-Id: I4708218157617dabe00e2e33e237be2838c16603
|
|
The SSE2 version high bit-depth forward hybrid transforms are
essentially using the C functions via cross referencing to 1-D
functions in vp9_dct.c. This commit unifies the two versions and
removes the unnecessary dependency.
Change-Id: Ib4d0702a138f8daf7d0bd97c141ee7088f293765
|
|
The following quantization functions were moved:
vp9_quantize_b
vp9_quantize_b_32x32
vp9_highbd_quantize_b
vp9_highbd_quantize_b_32x32
vp9_quantize_dc
vp9_quantize_dc_32x32
vp9_highbd_quantize_dc
vp9_highbd_quantize_dc_32x32
The purpose of doing that was to allow these functions to be shared
by multiple codecs.
Change-Id: Id8ab939f283353cdd07bd930d47db3d932a5d87f
|
|
The clamp calls with INT32_MIN and INT32_MAX have no effect at all on
int values passed in, therefore this commit removes those effectless
clamps and also adds more const intermediate results to make the code
more readable.
Change-Id: I66d8811f58bb74ec31cbec9a6c441983a662352e
|
|
Change-Id: I1bab0c104df2ec4825d050cd516e26ab635a7b3e
|
|
Change-Id: I66bf6720c396c89aa2d1fd26d5d52bf5d5e3dff1
|
|
Factor out the subtraction operator as common function.
Change-Id: I526e703477c6a290e0e3e3c8898f8bb1ca82779b
|
|
This commit fixes a potential integer overflow issue in function
hadamard_16x16. It adds corresponding dynamic range comment.
Change-Id: Iec22f3be345fb920ec79178e016378e2f65b20be
|
|
The only difference between the two was that the vp9 function allowed
for every step in the bilinear filter (16 steps) while vp8 only allowed
for half of those. Since all the call sites in vp9 (<< 1) the input, it
only ever used the same steps as vp8.
This will allow moving the subpel variance to vpx_dsp with the rest of
the variance functions.
Change-Id: I6fa2509350a2dc610c46b3e15bde98a15a084b75
|
|
subpel functions will be moved in another patch.
Change-Id: Idb2e049bad0b9b32ac42cc7731cd6903de2826ce
|
|
this file shouldn't be built directly, it is included in vp9_dct_sse2.c
to create a non-high-bitdepth and a high-bitdepth version
silences missing prototype warnings for the unused FDCT* functions
Change-Id: Ide6ff8c24ab31bdb0f833260505ae33660a1ad5b
|
|
this file shouldn't be built directly, it is included in vp9_dct_sse2.c
to create a non-high-bitdepth and a high-bitdepth version
silences missing prototype warnings for the unused FDCT32x32* functions
Change-Id: I0e38f16dae5ea1728de184ee2c89287d48675c51
|
|
this file shouldn't be built directly, it is included in vp9_dct_avx2.c
to create a non-high-bitdepth and a high-bitdepth version
silences missing prototype warnings for the unused FDCT32x32* functions
Change-Id: I4c19935c0e035b393be513bde735e9a78064a494
|
|
silences a missing declaration warning
Change-Id: I59a34e1a1377cf3529b678d7ec0122bd43ab1bf1
|
|
+ include vp9_rtcd.h
silences missing prototype warnings
Change-Id: I77902f07a454029baad4fe5fe6fc37c65644e6f7
|
|
silences missing prototype warnings
Change-Id: I773b6a6b5bd7c57db18c3b17c519534f80e131de
|
|
With the sad functions, and hopefully the variance functions soon,
moving to the vpx_dsp location, place the defines used in the
reference C code in a common location.
Change-Id: I4c8ce7778eb38a0a3ee674d2f1c488eda01cfeca
|
|
this macro was used inconsistently and only differs in behavior from
DECLARE_ALIGNED when an alignment attribute is unavailable. this macro
is used with calls to assembly, while generic c-code doesn't rely on it,
so in a c-only build without an alignment attribute the code will
function as expected.
Change-Id: Ie9d06d4028c0de17c63b3a27e6c1b0491cc4ea79
|
|
Create a new component, vpx_dsp, for code that can be shared
between codecs. Move the SAD code into the component.
This reduces the size of vpxenc/dec by 36k on x86_64 builds.
Change-Id: I73f837ddaecac6b350bf757af0cfe19c4ab9327a
|
|
vestigial. replace instances with memset() which they already were being
defined to.
Change-Id: Ie030cfaaa3e890dd92cf1a995fcb1927ba175201
|
|
vestigial. replace instances with memcpy() which they already were being
defined to.
Change-Id: Icfd1b0bc5d95b70efab91b9ae777ace1e81d2d7c
|
|
This reverts commit 004b9d83e37d355f590a6976a27b7b845d19a869
Change-Id: I2f2d0bdb9368c2c07f1d29a69cd461267a3a8743
|
|
This reverts commit eb8c667570aa83134c7db0690de9dbdde4d90291.
The patch caused mismatch while using multi-threads.
Change-Id: Icd646340af25b5d91e32f03ed3ea212e00e3e0be
|
|
Force split on 16x16 block (to 8x8) based on the minmax over the 8x8 sub-blocks.
Also increase variance threshold for 32x32, and add exit condiiton in choose_partition
(with very safe threshold) based on sad used to select reference frame.
Some visual improvement near moving boundaries.
Average gain in psnr/ssim: ~0.6%, some clips go up ~1 or 2%.
Encoding time increase (due to more 8x8 blocks) from ~1-4%, depending on clip.
Change-Id: I4759bb181251ac41517cd45e326ce2997dadb577
|
|
|
|
It uses about 10% less CPU cycles than the SSE2 intrinsic
implementation.
Change-Id: I91017c0c068679a214b98cdd4cff3a6facfb7499
|
|
|
|
|
|
Change-Id: If0ca8b25b4800d4336e6cbc97194cd9b01c5b5a3
|
|
|
|
Use 6 xmms instead of 8.
Change-Id: If976ad85d09191d2fb0565399d690f2869dbbcc7
|
|
This commit separates Hadamard transform/quantization operations
from rate and distortion computation in block_yrd. This allows one
to skip SATD computation when all transform blocks are quantized
to zero. It also uses a new block error function that skips
repeated computation of sum of squared residuals. It reduces the
CPU cycles spent on block error calculation in block_yrd by 40%.
Change-Id: I726acb2454b44af1c3bd95385abecac209959b10
|
|
This commit allows the quantizer to compare the AC coefficients to
the quantization step size to determine if further multiplication
operations are needed. It makes the quantization process 20% faster
without coding statistics change.
Change-Id: I735aaf6a9c0874c82175bb565b20e131464db64a
|
|
This reduces the 8x8 Hadamard transform cycles by 20%.
Change-Id: If34c5e02f3afa42244c6efabe121f7cf5d2df41b
|
|
This commit fixes the SSE2 version 8x8 Hadamard transform
alignment and makes it consistent with the C version.
Change-Id: I1304e5f97e0e5ef2d798fe38081609c39f5bfe74
|