Configuration   Frames per sec   Latency (msec)   GStreamer change
Base            20               45
Resized         50               20               Videoscale video/x-raw, width=800, height=600
Queue           60               400              Queue
Size=1          100              35               Queue max-size-buffers=1
interface, the algorithm can be easily tuned and
debugged, showing any intermediate results in
a numerical and graphical way.
Once the correct vision algorithm has been developed, it is important to have an implementation that can run at the desired throughput.
The implementation is optimized to run efficiently on the embedded AMD R-series SoC, specifically to ensure
that optimum use is made of the four Excavator x86 CPU cores on the device.
In this case, the implementation was done
using GStreamer (https://gstreamer.freedesktop.org), a multimedia framework that links
together a variety of custom-built or off-the-shelf media processing elements called plug-ins into a software pipeline. Hence, the first step
in the analysis process involved converting the
image processing software into a GStreamer
plug-in that was labeled “Euronote”.
Using GStreamer, the off-the-shelf Video4Linux
plug-in was used to fetch images into
the pipeline from the 2MPixel camera and
set the frame rate at 100fps. Then, another
GStreamer plug-in was used to send the images
to the Euronote plug-in, after which they could
be scaled to 1024 x 768 in I420 format
(http://bit.ly/VSD-I420), in which the
images are represented in the YUV color space.
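
The pipeline described above can be sketched as a GStreamer pipeline description string. This is a minimal sketch under assumptions, not the authors' actual pipeline: `v4l2src`, `videoscale`, `videoconvert` and `fakesink` are standard GStreamer elements, but the element name `euronote`, the device path, the element order, and the use of `videoconvert` to reach I420 are all assumptions made for illustration.

```python
def build_pipeline(device="/dev/video0", fps=100, width=1024, height=768):
    """Return a gst-launch-style pipeline description string.

    Assumptions (not from the article): the custom plug-in is registered
    under the element name "euronote", the camera is /dev/video0, and the
    output is discarded with fakesink.
    """
    stages = [
        f"v4l2src device={device}",        # Video4Linux capture element
        f"video/x-raw,framerate={fps}/1",  # caps filter requesting the frame rate
        "euronote",                        # hypothetical name of the custom plug-in
        "videoscale",                      # resizes the frames
        "videoconvert",                    # converts the pixel format if needed
        f"video/x-raw,format=I420,width={width},height={height}",
        "fakesink",                        # placeholder sink for the sketch
    ]
    # Elements in a pipeline description are linked with " ! ".
    return " ! ".join(stages)


if __name__ == "__main__":
    print(build_pipeline())
```

The resulting string could be handed to `gst-launch-1.0` on a system where a plug-in with that name is actually installed.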
Once the application was written, Mentor’s source analyzer was
employed to determine the speed
of execution. To do so, the code was
copied into the Mentor Embedded
Sourcery Codebench, from where
the binary code could be transferred to the target AMD processor
and executed. The
Sourcery Analyzer then graphically highlights CPU statistics that
show the time each core spent in a
given state, scheduling statistics that highlight which software threads ran on each core,
and the thread migration rate, or how frequently the system scheduler moved software
threads between cores.
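
To make the thread migration rate concrete, the following sketch counts how often a thread is observed on a different core than in its previous scheduling record. It only illustrates the metric and is not how the Sourcery Analyzer computes it. The record format `(time_s, thread_id, core)` and the thread names are hypothetical.

```python
from collections import defaultdict


def migration_stats(samples):
    """Count core migrations per thread and the overall migration rate.

    samples: iterable of (time_s, thread_id, core) records sorted by time.
    This record layout is a made-up format for illustration only.
    Returns (migrations_per_thread, migrations_per_second).
    """
    last_core = {}
    migrations = defaultdict(int)
    first_t = last_t = None
    for t, tid, core in samples:
        if first_t is None:
            first_t = t
        last_t = t
        # A migration is a change of core between consecutive records
        # of the same thread.
        if tid in last_core and last_core[tid] != core:
            migrations[tid] += 1
        last_core[tid] = core
    duration = (last_t - first_t) if first_t is not None else 0.0
    total = sum(migrations.values())
    rate = total / duration if duration > 0 else 0.0
    return dict(migrations), rate


if __name__ == "__main__":
    trace = [
        (0.0, "capture", 0),
        (0.5, "euronote", 1),
        (1.0, "capture", 2),
        (1.5, "euronote", 1),
        (2.0, "capture", 0),
    ]
    print(migration_stats(trace))
```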
The Sourcery Analyzer enables
a throughput analysis to be performed on the Euronote GStreamer application. If the system is not
meeting the image capture rate
demanded by the application, it is
possible to access the Euronote software program from the Sourcery
However, if scaling the image size still
does not sufficiently increase the speed of the
system, it is possible to enhance performance
further by ensuring that all four ALUs on
the embedded processor are fully utilized. By
using the scaling tracer on the Mentor Embedded Sourcery Analyzer, the number of ALUs
in use on the SoC can be determined.
If the processing power is underused, it
is possible to optimize the execution of the
program so that multiple threads, or components of the Euronote application, can be
run in parallel. To do so, the GStreamer code
can again be accessed from the analyzer and
Figure 6: The first stage of image processing involves locating the notes in each of the images. To do so,
a Canny filter is applied to detect the edges of the notes. With the edge detection performed, a filling
algorithm is applied to fill out all closed regions in the image, after which the contours of the notes can be
detected to determine the specific location of the notes within the image.
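
The filling and localization steps in the caption above can be illustrated on a tiny binary edge map. This is a sketch under assumptions, not the authors' implementation. The Canny step is skipped and the edge map is given directly. "Filling closed regions" is implemented as a flood fill of the background from the image border, and the note location is reduced to a bounding box rather than a true contour.

```python
from collections import deque


def fill_closed_regions(edges):
    """Fill every region enclosed by edge pixels.

    edges: list of rows of 0/1 values, where 1 marks an edge pixel.
    Background pixels reachable from the image border (4-connectivity)
    stay 0; everything else (edges and enclosed pixels) becomes 1.
    """
    h, w = len(edges), len(edges[0])
    outside = [[False] * w for _ in range(h)]
    todo = deque()
    # Seed the flood fill with all non-edge pixels on the border.
    for y in range(h):
        for x in range(w):
            on_border = y in (0, h - 1) or x in (0, w - 1)
            if on_border and edges[y][x] == 0:
                outside[y][x] = True
                todo.append((y, x))
    # Spread through connected non-edge pixels.
    while todo:
        y, x = todo.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w:
                if not outside[ny][nx] and edges[ny][nx] == 0:
                    outside[ny][nx] = True
                    todo.append((ny, nx))
    return [[0 if outside[y][x] else 1 for x in range(w)] for y in range(h)]


def bounding_box(mask):
    """Return (x_min, y_min, x_max, y_max) of all set pixels, or None."""
    points = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not points:
        return None
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)


if __name__ == "__main__":
    edges = [
        [0, 0, 0, 0, 0, 0],
        [0, 1, 1, 1, 1, 0],
        [0, 1, 0, 0, 1, 0],
        [0, 1, 1, 1, 1, 0],
        [0, 0, 0, 0, 0, 0],
    ]
    filled = fill_closed_regions(edges)
    print(filled, bounding_box(filled))
```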
Figure 7: Once the position of a note is determined, an
ROI is located in the image at the position where features
of a legitimate bank note would be invisible when the
note is illuminated by the IR light source.
Figure 8: By using the scaling tracer on the Mentor Embedded Sourcery Analyzer, the number of
ALUs in use on the SoC can be determined. If the processing power is underutilized, the execution of the program can be optimized so that multiple threads, or components of the Euronote
application, can be scheduled to run in parallel.
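
In GStreamer, a `queue` element decouples the elements before and after it into separate threads. The same structure can be illustrated with plain Python threads and a bounded queue. This is only a sketch of the idea and is not GStreamer code. The stage functions and the doubling "processing" step are hypothetical stand-ins, and the parameter name `max_size_buffers` merely mirrors the queue property listed in the table.

```python
import queue
import threading


def run_two_stage(frames, max_size_buffers=1):
    """Run a capture stage and a processing stage in separate threads.

    A bounded queue sits between the stages: when it is full, the capture
    stage blocks instead of piling up frames, which keeps the number of
    buffered frames (and hence the added latency) small.
    """
    buffers = queue.Queue(maxsize=max_size_buffers)
    results = []
    done = object()  # sentinel marking the end of the stream

    def capture():
        for frame in frames:
            buffers.put(frame)  # blocks while the queue is full
        buffers.put(done)

    def process():
        while True:
            frame = buffers.get()
            if frame is done:
                break
            results.append(frame * 2)  # stand-in for the real processing

    threads = [threading.Thread(target=capture), threading.Thread(target=process)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_two_stage(range(5)))
```

The sketch shows the structure rather than the speedup: standard CPython threads do not execute CPU-bound Python code on several cores at once, whereas native GStreamer elements in separate threads can.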
Figure 9: Prior to optimization, the system was capable of capturing images at a rate of 20fps,
while the latency between image acquisition and processed output was 45ms. After the image data
was resized, the image capture rate increased to 50fps with 20ms latency. By introducing a queue
to the process, system throughput was increased to 60fps with a 400ms latency. Finally, by limiting
the maximum size of the buffer, the system captured images at 100fps with 35ms latency.
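
One way to read these figures is to estimate how many frames are inside the pipeline at once, using Little's law (items in flight = throughput x latency). This is an interpretation added here, not an analysis from the article, and it assumes steady-state operation. The frame rates and latencies are the reported values; the function name is hypothetical.

```python
def frames_in_flight(fps, latency_ms):
    """Estimate the average number of frames inside the pipeline.

    Applies Little's law under a steady-state assumption:
    frames in flight = throughput (frames/s) * latency (s).
    """
    return fps * latency_ms / 1000.0


# Reported results: configuration -> (frames per second, latency in ms)
RESULTS = {
    "Base": (20, 45),
    "Resized": (50, 20),
    "Queue": (60, 400),
    "Size=1": (100, 35),
}

if __name__ == "__main__":
    for name, (fps, latency_ms) in RESULTS.items():
        print(f"{name}: {frames_in_flight(fps, latency_ms):.1f} frames in flight")
```

Under that assumption the unbounded queue holds roughly 24 frames in flight, which would account for the 400ms latency, while limiting the queue to a single buffer brings the estimate down to about 3.5 frames.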