During the recent Consumer
Electronics Show, I talked to many chip and software vendors whose designs
are being used in some of the feature-packed, multimedia-laden mobile and
embedded devices introduced there. Listening to them, I was constantly reminded
of a scene from
Steve Martin’s L.A. Story.
Martin, playing a Los Angeles TV weatherman, gets in his car in the morning
to go to work. But instead of getting on the freeway system - clogged to a
standstill with traffic beyond its capacity - Steve travels to work by way of an
elaborate system of short-cuts: through a neighbor’s yard, down an alley, across
an empty lot, through a car wash, zigzagging through a parking lot, and so on.
It was that or face driving on a freeway system that was designed fifty years
ago and unable to handle today’s volume of traffic. The scene was hilarious
because it was an accurate reflection of the real world, and Martin’s solution
wasn’t that much of an exaggeration of the extremes to which L.A. drivers
sometimes resort.
Embedded SoC traffic jams
In many embedded applications in mobile devices and portable electronics
systems, developers and builders of the silicon are in the same situation.
Builders of MP3 players, video recorders, mobile TV devices and all-in-one
mobile phones with video capability are driven by the need to deliver multimedia
content over high-bandwidth wired and wireless Internet connections at higher
and higher data rates. But they are having problems with outdated shared bus
architectures that simply can’t handle the increased traffic loads.
The use of multi-core CPUs in such designs only partially addresses such
problems, and in other ways exacerbates them, because to move the data around
the chip it has been necessary to depend on a shared-bus “freeway” system that
is decades old and inadequate for present and future needs.
Of course, there are new freeway systems, such as networks-on-chip and
on-chip point-to-point, packet-based, serial switched fabric linkages, similar
in concept to
Infiniband,
PCI Express and
RapidIO at the board-to-board and
system-to-system level. Many of these chip-level alternatives and the problems
they raise are described in an excellent recent book “Networks
On Chips,” by Giovanni De Micheli and Luca Benini.
There are at least two problems I see with most such topologies for the
vendors of the devices that use these multimedia-optimized SoCs. First, there
are so many of them. How do you make a choice? How do you assess their
compatibility with existing “freeway” designs? Second, there are the numerous
software development issues. These are also covered extensively in the De
Micheli/Benini book.
After reading in their book about all the complicated software problems
ahead, I have come to the conclusion that even if we agree on a common next-generation
freeway system for on-chip traffic, the software problems alone will prevent its
widespread adoption for many years. Consider the amount of time it is taking for
the industry to develop a common set of standards for multicore software
development. So far I hear a lot of talk, with minimal action taken.
Making do with work-arounds
It should come as no surprise that, faced with such challenges, not a few
current licensees of core processors - including those from
ARM, MIPS,
Power, and
PowerPC - are taking a page
from the script for Steve Martin’s movie. They’re making do with what they have,
using current shared bus topologies where appropriate, replacing them where they
can, or finding work-arounds and shortcuts that get around the traffic jams when
they can’t.
For example, most recently, Atmel’s
ARM926EJ-S-based microcontroller - designed for what it calls human
interface applications with loads of graphics, audio and video - takes the
work-around approach to the extreme. It eliminates the data traffic bottlenecks
that often occur on the ARM architecture’s
traditional AMBA
bus and achieves on-chip data transfer rates of up to 41.6 Gbps.
No less innovative is Digi, with the bus
work-around it uses in its
NetSilicon
NS9360, deployed in several dedicated I/O devices it has built for
cellular gateways, WiFi device servers, and wireless video appliances. Similar
to the approach taken by Atmel, Digi sticks with the existing AMBA
AHB shared bus
topology but greatly modifies the peripheral DMA structure. The chip even
incorporates mechanisms that let the developer modify specific registers
to control directly, in software, how much bandwidth is allocated.
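To make the idea of software-controlled bandwidth allocation concrete, here is a minimal Python sketch of a weighted round-robin bus arbiter. It is an illustration only, not the NS9360's actual mechanism: the `weights` dictionary is a hypothetical stand-in for whatever register values the hardware exposes, the peripheral names are invented, and the model assumes every peripheral is requesting the bus on every cycle.

```python
from collections import Counter


def weighted_round_robin(weights, cycles):
    """Return the list of bus grants over `cycles` cycles.

    `weights` maps a peripheral name to an integer weight (a hypothetical
    stand-in for a bandwidth-allocation register). Every peripheral is
    assumed to request the bus on every cycle.
    """
    total = sum(weights.values())
    credit = {name: 0 for name in weights}
    grants = []
    for _ in range(cycles):
        # Each requester earns credit in proportion to its weight.
        for name in credit:
            credit[name] += weights[name]
        # The requester with the most credit wins this cycle and pays
        # the full round's worth of credit for the grant.
        winner = max(credit, key=credit.get)
        credit[winner] -= total
        grants.append(winner)
    return grants


# Hypothetical peripherals and weights, chosen only for illustration.
REGISTER_WEIGHTS = {"lcd": 4, "ethernet": 2, "usb": 1, "uart": 1}

if __name__ == "__main__":
    schedule = weighted_round_robin(REGISTER_WEIGHTS, 8)
    print(schedule)
    print(Counter(schedule))
```

Over one full round (as many cycles as the sum of the weights), each peripheral receives exactly as many grants as its weight, so changing a weight in software changes that peripheral's share of the bus.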
They are not alone. For example,
Faraday Technology has
opted for a QoS-aware non-blocking crossbar switch to get the intra-chip data
flow bandwidth it needed, as well as a smart DMA engine of its own design.
PortalPlayer also uses a crossbar switch
of its own design as an alternative to AMBA, and
NXP uses a modified bus architecture,
retaining AMBA for deterministic control and processing tasks and adding a
separate data-flow-optimized bus of its own design that handles media-rich
operations. Other companies have opted for the approach that
Cirrus Logic has taken. Direct and
simple, it just puts two AMBA buses on the chip and separates data flows such
that each processing element gets as much bandwidth as possible.
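The difference between these approaches can be sketched with a toy cycle count in Python. This is a simplified model under stated assumptions, not a description of any vendor's silicon: every transfer takes one cycle, the master and slave names are invented, each slave in the two-bus case sits on exactly one bus, and the crossbar is assumed to be ideally scheduled so that only transfers sharing a master or a slave port have to wait for each other.

```python
from collections import Counter


def shared_bus_cycles(transfers):
    """Single shared bus: every transfer is serialized."""
    return len(transfers)


def split_bus_cycles(transfers, bus_of):
    """Two (or more) buses: transfers serialize only within their own bus.

    `bus_of` maps each master to a bus number (a hypothetical partition).
    """
    load = Counter(bus_of[master] for master, _ in transfers)
    return max(load.values(), default=0)


def crossbar_cycles(transfers):
    """Ideal non-blocking crossbar: the busiest single port sets the time."""
    masters = Counter(master for master, _ in transfers)
    slaves = Counter(slave for _, slave in transfers)
    return max([*masters.values(), *slaves.values()], default=0)


# Hypothetical workload: (master, slave) pairs, one cycle each.
TRANSFERS = [
    ("cpu", "sdram"),
    ("cpu", "flash"),
    ("dma", "lcd"),
    ("dsp", "sram"),
    ("dma", "sram"),
]
BUS_OF = {"cpu": 0, "dma": 1, "dsp": 1}

if __name__ == "__main__":
    print("shared bus:", shared_bus_cycles(TRANSFERS))
    print("two buses: ", split_bus_cycles(TRANSFERS, BUS_OF))
    print("crossbar:  ", crossbar_cycles(TRANSFERS))
```

For this workload the shared bus needs five cycles, the two-bus split needs three, and the ideal crossbar needs two. Real designs add arbitration latency, burst lengths and memory timing that this model deliberately ignores.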
Others, such as Texas Instruments with its
OMAP, NXP with the
Nexperia, and
Toshiba et al. with
the Cell
architecture, have opted for a shared memory approach, on top of which they
layer various message-passing mechanisms, based as much as possible on existing
standards, such as Open MPI.
There are still a lot of questions that occur to me as the industry makes the
shift to this new architectural paradigm. How long can such workarounds be
effective? Are there any commonalities among the various new NoC bus
topologies that a developer can rely on to at least minimize the cost of
converting from the existing shared bus methods?
Do any of the new NoC alternatives incorporate features that make this
transition easier? Can the software solutions being considered to solve various
programming and debug issues with current homogeneous symmetric multi-cores be
extended to operate effectively in this much more complex, heterogeneous and
asymmetric multicore environment that NoCs represent?
What do you think? What approaches are you pursuing now? And in the future?
What is the best way to make the transition? The Steve Martin approach will only
work for so long.
Bernard Cole is editor in chief and
site leader for
iApplianceweb and site editor on Embedded.com as well as an
independent editorial services consultant working with high technology
companies. He welcomes your feedback. Call him at 602-288-7257 or send an
email to bccole@acm.org.