[Figure: baseline JPEG compression pipeline: Load image → RGB to YCC → Downsample (optional) → DCT → Quantize → Zig-zag → Entropy coding]
Zig-zag ordering is applied to each of the Y, U and V components separately, and two different quantization tables are used, one for the Y component and one for the UV components. This information is then variable-length encoded using Huffman coding, again with two different encoding tables, one for the Y and one for the UV components. Once again, this can be implemented on the GPU (http://bit.ly/VSD-VLE).
By loading both the JPEG luma and chroma Huffman encoding lookup tables (LUTs) and the zig-zag mapping LUT onto the GPU, Huffman and zig-zag encoding can be performed in a single step.
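As a minimal sketch of this setup in CUDA (the packed-LUT layout and all names here are illustrative assumptions, not NorPix's actual code), the standard tables are copied once into constant memory, where every encoding kernel can read them:

```cuda
#include <cstdint>
#include <cuda_runtime.h>

// Zig-zag LUT from the JPEG standard: serialized position i maps to this
// row-major index within the 8 x 8 block.
__constant__ uint8_t c_zigzag[64];

// Hypothetical packed Huffman LUTs: entry = (code_length << 16) | code,
// indexed by the JPEG run/size (RRRRSSSS) symbol. Built on the host from
// the standard luma and chroma AC tables (ITU-T T.81, Annex K).
__constant__ uint32_t c_huffLumaAC[256];
__constant__ uint32_t c_huffChromaAC[256];

void upload_luts(const uint8_t zz[64],
                 const uint32_t lumaAC[256],
                 const uint32_t chromaAC[256])
{
    // One-time host-to-device copies; kernels then read the tables directly.
    cudaMemcpyToSymbol(c_zigzag,       zz,       64  * sizeof(uint8_t));
    cudaMemcpyToSymbol(c_huffLumaAC,   lumaAC,   256 * sizeof(uint32_t));
    cudaMemcpyToSymbol(c_huffChromaAC, chromaAC, 256 * sizeof(uint32_t));
}
```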
By default, the JPEG compression standard defines the size of a Minimum Coded Unit (MCU) as an 8 x 8 block of coefficients. By assigning these 64 coefficients to a single shared memory bank, up to 32 MCUs can be processed simultaneously.
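One way to realize that layout (a sketch, assuming 32-bit coefficients and variable names of my own choosing) is to store coefficient i of MCU t at s_mcu[i][t]: the 64 words of each MCU then sit at a stride of 32 words, entirely within one of the 32 four-byte shared memory banks, so 32 threads can each walk their own MCU without bank conflicts:

```cuda
__global__ void load_mcus(const int *g_coeffs)
{
    // Coefficient i of MCU t lives at s_mcu[i][t]. With 32 banks of 4-byte
    // words, column t always falls into bank t, so each thread owns a bank.
    __shared__ int s_mcu[64][32];

    int t   = threadIdx.x;             // 32 threads per block, one MCU each
    int mcu = blockIdx.x * 32 + t;     // global MCU index

    for (int i = 0; i < 64; ++i)
        s_mcu[i][t] = g_coeffs[mcu * 64 + i];
    __syncthreads();

    // ... zig-zag, RLE and Huffman passes then operate on s_mcu ...
}
```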
To apply the zig-zag ordering mapping table and perform run-length encoding (RLE) classification, 32 threads are used to process the 32 banks, and Huffman encoding is then performed using the standard JPEG Huffman encoding lookup tables. This process is repeated for each MCU block to process the Y, U and V image component planes.
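Inside such a kernel, each thread can serialize its own MCU through the zig-zag LUT while classifying zero runs for RLE. The fragment below sketches the AC-coefficient loop under the LUT layout assumed above; bitstream appending, runs longer than 15 (ZRL symbols) and the end-of-block marker are elided:

```cuda
// Per-thread AC pass over one MCU held in shared memory (t = threadIdx.x).
int run = 0;                                    // current zero-run length
for (int i = 1; i < 64; ++i) {                  // i = 0 is the DC coefficient
    int coeff = s_mcu[c_zigzag[i]][t];          // zig-zag reorder via LUT
    if (coeff == 0) { ++run; continue; }

    int mag  = coeff < 0 ? -coeff : coeff;
    int size = 32 - __clz(mag);                 // bit count of the magnitude
    uint32_t entry = c_huffLumaAC[(run << 4) | size];   // RRRRSSSS symbol

    // ... append entry's Huffman code plus the amplitude bits to this
    //     MCU's bitstream; emit ZRL for runs > 15 and EOB after the loop ...
    run = 0;
}
```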
Encoded data is then written to the JPEG File Interchange Format (JFIF), a format that includes the embedded encoded image data along with the coding and quantization tables. This packing step is not a trivial task and, on a CPU, is typically implemented as a sequential process.
Parallelizing this function on a GPU requires a different approach. First, the starting bit position of each MCU in the final JPEG buffer is calculated using a parallelized prefix sum algorithm (http://bit.ly/VSD-PFX). Knowing these offsets, the data for each MCU can then be packed in parallel using the stream processors of the GPU.
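A sketch of the offset computation using Thrust's scan (buffer names are placeholders): an exclusive prefix sum over the per-MCU bit lengths produced by the encoding pass yields each MCU's starting bit position in the output buffer:

```cuda
#include <thrust/device_vector.h>
#include <thrust/scan.h>

// d_bitLen[i] = encoded length of MCU i in bits (from the Huffman pass).
// After the exclusive scan, d_bitOff[i] is MCU i's first bit position in
// the final JPEG buffer, so every MCU can be packed concurrently.
void compute_offsets(const thrust::device_vector<unsigned long long> &d_bitLen,
                     thrust::device_vector<unsigned long long> &d_bitOff)
{
    thrust::exclusive_scan(d_bitLen.begin(), d_bitLen.end(), d_bitOff.begin());
    // Note: JPEG byte stuffing (a 0x00 inserted after every 0xFF data byte)
    // still has to be handled, e.g. in a follow-up pass over the packed stream.
}
```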
After packing, the data is moved to host memory for either archiving or streaming. Using the data in this JFIF image file, JPEG decoders can reconstruct the compressed image data.
Deciding which GPU to use to implement baseline JPEG algorithms is important, as compression speed will depend on the number of multiprocessors, the amount of shared memory and the number of registers (Figure 5). For instance, adding more CUDA cores without increasing shared memory size or the number of registers will not increase performance. After image compression, images can be copied to the CPU's host memory for archiving or data streaming. Although the GPU-to-host memory copy time is significant, compressed image data will generally be 10-20x smaller than the original uncompressed image.
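For that copy, something along these lines (a sketch; the buffer and size variables are placeholders) keeps the transfer proportional to the compressed size, ideally landing in pinned host memory allocated with cudaMallocHost:

```cuda
#include <cuda_runtime.h>

// Copy the compressed payload to pinned host memory for archiving or
// streaming. d_totalBytes holds the final byte count on the device
// (e.g. the prefix-sum total from the packing step).
void download_jpeg(void *h_jpegBuf,            // from cudaMallocHost()
                   const void *d_jpegBuf,
                   const size_t *d_totalBytes,
                   cudaStream_t stream)
{
    size_t jpegBytes = 0;
    cudaMemcpy(&jpegBytes, d_totalBytes, sizeof(jpegBytes),
               cudaMemcpyDeviceToHost);
    cudaMemcpyAsync(h_jpegBuf, d_jpegBuf, jpegBytes,
                    cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);   // h_jpegBuf is valid once this returns
}
```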
As an example, the StreamPix 7 multiple-camera DVR software from NorPix (Montréal, QC, Canada; www.norpix.com), optimized for NVIDIA CUDA graphics processors, can JPEG-compress 3.3 billion monochrome pixels or 2 billion color pixels per second using an NVIDIA GTX 1080. Using the software, 12 MPixel images (4096 x 3072 8-bit raw Bayer images) can be captured and compressed at 40 fps from four CoaXPress (CXP) cameras simultaneously using baseline JPEG image compression, preserving 75% image quality and allowing a 10-15x increase in image data storage compared to methods that record raw uncompressed image data.
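As a quick check on those figures: 4096 x 3072 pixels x 40 frames/s x 4 cameras ≈ 2.0 billion pixels per second, consistent with the quoted 2-billion color pixels per second throughput.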
While NorPix's StreamPix 7 multiple-camera DVR software uses an implementation of the baseline JPEG standard, the references to code samples used in this article are for illustration purposes only. Neither NorPix nor its affiliated companies or customers are responsible for the development or use of this code.
Figure 4: In the implementation of the baseline JPEG standard using color cameras, Bayer interpolation is first used to render the image in RGB space. Images are then transformed from RGB to YUV color space, a DCT is applied, followed by quantization, zig-zag re-ordering and run-length encoding. The resultant data is then written to a JFIF format that includes the embedded encoded image data and coding and quantization tables.
The number of pixels per second is defined as: image size X × image size Y × frame rate (frames per second).