Age | Commit message (Collapse) | Author |
|
Change-Id: I74cf028e8c732cd0dbc070326152d3085b824a80
|
|
|
|
Change-Id: I76c2720546b737cb63018a8ab6a3ff62a291786d
|
|
|
|
VS2013 Chromium builds failed with:
warning C4742: 'vp9_coefband_trans_8x8plus' has different alignment in
https://code.google.com/p/chromium/issues/detail?id=336620
Change-Id: I865f72bc23ae958531eeb5f497002c12e9a36fcd
|
|
The encoder's border is still 160, while the decoder's border will be 32.
With on-demand border extension using a separate border buffer,
the decoder's border no longer needs to be 160.
Change-Id: I93d5aaff15a33a2213e9761eaa37c5f2870747db
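For readers unfamiliar with the term, border extension means replicating edge pixels into padding around the visible frame, so that motion vectors pointing slightly outside the picture still read valid data. The sketch below shows the conventional whole-frame version only; it does not reproduce the on-demand scheme or the separate border buffer this commit describes. The function name and layout are assumptions made for illustration, not libvpx code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: `frame` points at the top-left visible pixel inside
 * an allocation that has `border` pixels of padding on every side. */
static void extend_border_sketch(uint8_t *frame, int stride, int w, int h,
                                 int border) {
  int r;
  assert(w > 0 && h > 0 && border >= 0);
  assert(stride >= w + 2 * border);

  /* Left and right: replicate the first and last pixel of each row. */
  for (r = 0; r < h; ++r) {
    uint8_t *row = frame + r * stride;
    memset(row - border, row[0], (size_t)border);
    memset(row + w, row[w - 1], (size_t)border);
  }

  /* Top and bottom: copy the already-extended first and last rows. */
  for (r = 1; r <= border; ++r) {
    memcpy(frame - r * stride - border, frame - border,
           (size_t)(w + 2 * border));
    memcpy(frame + (h - 1 + r) * stride - border,
           frame + (h - 1) * stride - border, (size_t)(w + 2 * border));
  }
}
```

The cost of this up-front pass grows with the border width, which is one reason a smaller decoder border is attractive.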
|
|
This commit deprecates the use of best_mv from encoding and bit-stream
writing stages. It hence removes the definition from MACROBLOCKD.
Change-Id: I8e5302775a2aa4a18900726df407bff881f2dfb1
|
|
border."
|
|
reference buffer is out of border.
Change-Id: Ic7ad136e54a4d68abe0fd4345146a86b0ba824e1
|
|
Change-Id: I660b53da8ebf3049832ce8a10721051c4e0ebb00
|
|
Change-Id: I3149e562fe9500914f67b6f908283edcdc381ac6
|
|
This reverts commit f9404f240642222775a371acde8fc0721b3812df.
This patch caused ASAN errors.
Change-Id: If15b7e581310e19061d111c69f2931809662ed19
|
|
|
|
|
|
|
|
|
|
Change-Id: I10c423bde7ea5a3bac9f14f35c73b6bc31c8f3e3
|
|
Rearranges the END_USAGE typedef to make it compatible with the
vpx user input.
Change-Id: Ic9fa9e9edbee7c0ad01e12e685b219582fcecd16
|
|
Change-Id: I83031180723ee59270ec8fb66b2f73c0796bee25
|
|
Change-Id: I7e53f6345a4cf89309262f50850c9ad08ed3c527
|
|
This reverts commit b645257121da20b422dbbebf02aae0fc6dff95d4.
Change-Id: I60d1bf57ae8e9eb6127f42f2d5a780124ac51b45
|
|
|
|
Change-Id: I1f0ae2edc3a96b33c0494d165ae756a8feba6184
|
|
This reverts commit 511d218c60b9b6c1ab9383db746815e907af0359.
In their current form, the intrinsics break the borg build.
Change-Id: Ied37936af841250ecff449802e69a3d3761c91b9
|
|
|
|
|
|
|
|
This commit further optimizes the SSE2 operations in the second 1-D
inverse 16x16 DCT with <10 non-zero coefficients. The average
runtime of this module goes down from 779 cycles to 725 cycles.
Change-Id: Iac31b123640d9b1e8f906e770702936b71f0ba7f
|
|
|
|
Optimizing all SSSE3 assembly for convolution:
1. vp9_filter_block1d4_h8_sse2
2. vp9_filter_block1d8_h8_sse2
3. vp9_filter_block1d16_h8_sse2
4. vp9_filter_block1d4_v8_sse2
5. vp9_filter_block1d8_v8_sse2
6. vp9_filter_block1d16_v8_sse2
The optimizations include:
- processing 2x8 elements in one 128-bit register instead of processing
8 elements in one 128-bit register.
- removing unnecessary loads.
This optimization gives a user-level gain of 2.4% for 480p input
and 1.6% for 720p input.
This optimization is done only for 64-bit builds.
Change-Id: Icb586dc0c938b56699864fcee6c52fd43b36b969
|
|
|
|
This commit is the first patch optimizing the SSE2 implementation of the inverse
16x16 DCT with <10 non-zero coefficients. It focuses on the first 1-D (row)
transformation. It exploits the fact that only the top-left 4x4 block contains
non-zero coefficients in a 2-D inverse 16x16 DCT with <10 coefficients.
The average runtime of the idct16x16_10 unit is reduced from
883 cycles to 779 cycles (12% faster).
For pedestrian_area_1080p, 300 frames at 4000 kbps, the speed 2 runtime goes
down from 310651 ms to 305910 ms. The decoding speed goes up from
80.37 fps to 80.87 fps.
Change-Id: Ic6f3ac5a637a76c07ba73ddaafe318a699fea645
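The saving described above can be illustrated with a scalar sketch: in a separable 2-D transform, rows of all-zero input produce all-zero output, so when only the top-left 4x4 block is non-zero the row pass can stop after 4 rows and each sum needs only 4 terms. The code below assumes a generic integer linear kernel as a stand-in for the real transform; the names are hypothetical, and it is neither the VP9 butterfly network nor the SSE2 code from this commit.

```c
#include <assert.h>
#include <stdint.h>

enum { kSize = 16 };

/* Full separable 2-D transform on row-major kSize x kSize arrays:
 * a 1-D pass over every row, then a 1-D pass over every column. */
static void transform2d_full(const int32_t *kernel, const int32_t *in,
                             int32_t *out) {
  int32_t tmp[kSize * kSize];
  int r, c, i, k;
  for (r = 0; r < kSize; ++r) {
    for (i = 0; i < kSize; ++i) {
      int32_t acc = 0;
      for (k = 0; k < kSize; ++k) acc += kernel[i * kSize + k] * in[r * kSize + k];
      tmp[r * kSize + i] = acc;
    }
  }
  for (c = 0; c < kSize; ++c) {
    for (i = 0; i < kSize; ++i) {
      int32_t acc = 0;
      for (k = 0; k < kSize; ++k) acc += kernel[i * kSize + k] * tmp[k * kSize + c];
      out[i * kSize + c] = acc;
    }
  }
}

/* Same result when only the top-left nz x nz inputs are non-zero. */
static void transform2d_top_left(const int32_t *kernel, const int32_t *in,
                                 int32_t *out, int nz) {
  int32_t tmp[kSize * kSize];
  int r, c, i, k;
  assert(nz >= 1 && nz <= kSize);
  /* Row pass: rows >= nz are all zero, and so are inputs at k >= nz. */
  for (r = 0; r < nz; ++r) {
    for (i = 0; i < kSize; ++i) {
      int32_t acc = 0;
      for (k = 0; k < nz; ++k) acc += kernel[i * kSize + k] * in[r * kSize + k];
      tmp[r * kSize + i] = acc;
    }
  }
  /* Column pass: intermediate rows >= nz would be zero, so skip them. */
  for (c = 0; c < kSize; ++c) {
    for (i = 0; i < kSize; ++i) {
      int32_t acc = 0;
      for (k = 0; k < nz; ++k) acc += kernel[i * kSize + k] * tmp[k * kSize + c];
      out[i * kSize + c] = acc;
    }
  }
}
```

With nz = 4 the row pass touches a quarter of the rows, which is the kind of saving the first 1-D (row) stage can exploit.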
|
|
|
|
|
|
Change-Id: I6cdd670d66288dbd66228f38bba6b30502d25362
|
|
Change-Id: I54513dc3b3321e0c0bb6b15ea5c34085ed80b4a4
|
|
For systems without __builtin_clz() or _BitScanReverse(), taken from libwebp
Change-Id: Iead257efc1772c466c79e1dc0356ed571d38d43e
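A portable fallback of this kind can be written as a short binary search for the highest set bit. The sketch below is a minimal illustration with hypothetical function names; it is not the code taken from the other library and makes no claim to match it line for line.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the most significant set bit, i.e. floor(log2(n)).
 * Hypothetical name; undefined for n == 0, like __builtin_clz(). */
static int msb_index_portable(uint32_t n) {
  int log = 0;
  uint32_t value = n;
  int shift;
  assert(n != 0);
  /* Test the upper half of the remaining bits: 16, 8, 4, 2, then 1. */
  for (shift = 16; shift > 0; shift >>= 1) {
    const uint32_t upper = value >> shift;
    if (upper != 0) {
      log += shift;
      value = upper;
    }
  }
  return log;
}

/* Leading-zero count of a 32-bit value, derived from the MSB index. */
static int clz_portable(uint32_t n) { return 31 - msb_index_portable(n); }
```

The loop always runs five iterations, so the cost is fixed regardless of the input value.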
|
|
More intra optimizations will be added.
Change-Id: I33ae8d93f6002bf7b64cc2669602d9e6bfa5a6e8
|
|
|
|
Optimizing the variance functions vp9_variance16x16, vp9_variance32x32,
vp9_variance64x64, vp9_variance32x16, vp9_variance64x32, and
vp9_mse16x16 by migrating them to AVX2.
Some of the functions were optimized by processing 32 elements instead of 16.
Others were optimized by processing 2 loop strides of 16
elements in a single 256-bit register.
This optimization gives a 2.4% - 2.7% user-level performance gain
and a 42% function-level gain.
Change-Id: I265ae08a2b0196057a224a86450153ef3aebd85d
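As background for what these kernels compute, a block variance between a source and a reference block is usually formed from the sum of differences and the sum of squared differences. The scalar sketch below assumes that common formulation; the function name is hypothetical, and this is not the AVX2 code from the commit nor necessarily the exact rounding libvpx uses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar reference: returns SSE - sum^2 / (w * h) and
 * writes the sum of squared differences to *sse. */
static unsigned int block_variance_ref(const uint8_t *src, int src_stride,
                                       const uint8_t *ref, int ref_stride,
                                       int w, int h, unsigned int *sse) {
  int64_t sum = 0;  /* signed sum of differences */
  uint64_t sq = 0;  /* sum of squared differences */
  int r, c;
  assert(w > 0 && h > 0);
  for (r = 0; r < h; ++r) {
    for (c = 0; c < w; ++c) {
      const int diff = src[c] - ref[c];
      sum += diff;
      sq += (uint64_t)(diff * diff);
    }
    src += src_stride;
    ref += ref_stride;
  }
  *sse = (unsigned int)sq;
  return (unsigned int)(sq - (uint64_t)((sum * sum) / (w * h)));
}
```

The inner loop is independent per pixel, which is why widening from 16 to 32 elements per iteration maps naturally onto 256-bit registers.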
|
|
Change-Id: I44eb44eb3f36c05d916ef140ef42cc84f72f99ec
|
|
|
|
|
|
|
|
This commit adds input/output ports to the IDCT8_1D macro function to
provide more flexibility in variable use. This allows several
buffer swap operations to be skipped.
Change-Id: I21f3450509537322293043b3281bfd3949868677
|
|
Adding RefBuffer to simplify reference buffer management. The struct has a
pointer to image data and scale factors relative to the current frame.
Change-Id: If38eb1491ff687cc11428aee339f3e052e2c5d9e
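One way to picture such a struct is a frame pointer bundled with fixed-point scale factors that map current-frame coordinates into the reference frame. Every type, field, and the 14-bit precision below are assumptions chosen for illustration; the actual RefBuffer definition in the codebase may differ.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_SCALE_SHIFT 14 /* hypothetical fixed-point precision */

typedef struct {
  const uint8_t *data; /* image data */
  int width, height, stride;
} FrameSketch; /* hypothetical frame descriptor */

typedef struct {
  int x_scale_fp; /* reference width / current width, fixed point */
  int y_scale_fp; /* reference height / current height, fixed point */
} ScaleSketch;

typedef struct {
  const FrameSketch *buf; /* pointer to the reference image */
  ScaleSketch sf;         /* scale relative to the current frame */
} RefBufferSketch;

/* Bind a reference frame and precompute its scale for the current frame. */
static void ref_buffer_setup(RefBufferSketch *ref, const FrameSketch *buf,
                             int cur_w, int cur_h) {
  assert(cur_w > 0 && cur_h > 0);
  ref->buf = buf;
  ref->sf.x_scale_fp = (buf->width << SKETCH_SCALE_SHIFT) / cur_w;
  ref->sf.y_scale_fp = (buf->height << SKETCH_SCALE_SHIFT) / cur_h;
}

/* Map an x coordinate in the current frame to the reference frame. */
static int ref_scaled_x(const RefBufferSketch *ref, int x) {
  return (int)(((int64_t)x * ref->sf.x_scale_fp) >> SKETCH_SCALE_SHIFT);
}
```

Keeping the scale next to the buffer pointer means callers no longer need to look the two up separately for each reference.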
|
|
This commit merges the initial buffer swap operations in idct8_1d_sse2
into the array transpose step, hence reducing number of instructions
therein.
Change-Id: I219f6f50813390d2ec3ee37eecf2a4a2b44ae479
|
|
This commit optimizes the SSE2 implementation of idct8x8_10. It exploits
the fact that only top-left 4x4 block contains non-zero coefficients,
and hence reduces the instructions needed.
The runtime of idct8x8_10_sse2 goes down from 216 to 198 CPU cycles,
estimated by averaging over 100000 runs. For pedestrian_area_1080p 300
frames coded at 4000kbps, the average decoding speed goes up from
79.3 fps to 79.7 fps.
Change-Id: I6d277bbaa3ec9e1562667906975bae06904cb180
|
|
|
|
|