Current solutions for ensuring the viability of our platforms, i.e., that
manufactured platforms indeed work correctly, have either been pushed to their
limits or have proven cost-ineffective or inadequate in the face of enormous
complexity, parametric variations, environmental variations, and aging. We need
fundamental breakthroughs in design, verification, validation, and test
technologies to continue to produce and maintain working platforms at an
affordable cost. Addressing these immensely complex challenges requires
collaborative research in all areas of system validation, software and hardware
verification, post-silicon validation, manufacturing testing, and
post-deployment resiliency. Two themes, Platform Viability and Resilient
Systems, will jointly address these challenges. The Platform Viability theme
will target quality assurance from design specification to shipment, and will
explore shared solutions jointly with the Resilient Systems theme, which will
focus exclusively on post-deployment and lifetime resiliency.

The objective of the Platform Viability theme is to deliver low-cost solutions
that can guarantee the design and production of working platforms. Our overall
goals are (i) to develop solutions that collectively achieve coverage greater
than 99% for all relevant error/fault models used in software and hardware
verification, silicon debug, and manufacturing testing, while incurring less
than 5% area, performance, and power overheads in meeting these targets, (ii)
to deliver formal verification capabilities ensuring the real-time correctness
of concurrent hardware-software for heterogeneous many-core platforms with 100+
nodes, and (iii) to investigate new solutions for testing and verifying power
consumption and power management (in contrast to existing objectives for
functionality and speed).
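
As a rough illustration of how the quantitative targets in goal (i) might be checked, the sketch below computes coverage per fault model as detected faults divided by total modeled faults, and overhead as the relative increase over a baseline design. The class and function names, the per-metric reading of the 5% bound, and all sample numbers are assumptions made for illustration only; they are not part of the theme's stated methodology.

```python
from dataclasses import dataclass


@dataclass
class FaultModelResult:
    """Hypothetical record of a test campaign for one error/fault model."""
    name: str
    detected: int  # faults detected by the test or verification solution
    total: int     # total faults enumerated under this model

    @property
    def coverage(self) -> float:
        # Coverage is the fraction of modeled faults that were detected.
        return self.detected / self.total


def relative_overhead(baseline: float, with_solution: float) -> float:
    """Relative increase of a cost metric (area, delay, power) over baseline."""
    return (with_solution - baseline) / baseline


def meets_targets(results, overheads, min_coverage=0.99, max_overhead=0.05):
    """Return True if every fault model exceeds min_coverage and every
    overhead stays below max_overhead (assumed here to apply per metric)."""
    coverage_ok = all(r.coverage > min_coverage for r in results)
    overhead_ok = all(o < max_overhead for o in overheads.values())
    return coverage_ok and overhead_ok


# Made-up sample numbers, for illustration only.
results = [
    FaultModelResult("stuck-at", detected=9950, total=10000),
    FaultModelResult("transition", detected=9920, total=10000),
]
overheads = {
    "area": relative_overhead(100.0, 103.0),
    "performance": relative_overhead(1.00, 1.02),
    "power": relative_overhead(2.00, 2.08),
}
print(meets_targets(results, overheads))
```

Requiring every fault model to clear the threshold individually is the stricter reading of "all relevant error/fault models"; an aggregate coverage figure could hide a poorly covered model.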

To support the infrastructure and mobile segments, the modeling and verification
technologies must address the growing concern over concurrency-related bugs that
result from the sophisticated interactions between concurrent software and
hardware, as well as between the language-level concurrency abstractions and
the hardware-level abstractions. For future many-core designs, we must develop
scalable verification solutions in order to support architectural and
micro-architectural exploration as well as to ensure that the uncore
(non-processor) components work properly in the face of functional/electrical
bugs and manufacturing/reliability defects. Post-silicon validation is another
critical area demanding focused attention. The test solutions we develop will be
embedded and self-test in nature, and can thus support both packaged-chip and
bare-die testing. As a result, they will also support known-good-die testing for
3D integration.

To achieve overall cost reduction and quality improvement, we need to carefully
investigate the possibility of hardware resource sharing and joint optimization
among all post-silicon and deployment quality assurance functions, including
validation, calibration, manufacturing testing, adaptation, diagnosis, and
post-deployment testing. Several tasks within the theme and collaborations with
the Resilient Systems theme have been planned with this objective in mind.