Just one in a billion | Future Markets Magazine

Reliability is becoming an increasingly critical factor in microchips. This is because electronics are assuming more and more safety-critical functions – whether in automated driving, in medical technology or in industrial production using robots. Several approaches can be used to increase the reliability of microelectronics.

Reliability means that the microchip performs its tasks without fault throughout its entire service life. Up to now, the semiconductor industry has concentrated on quality control during the production process, followed by a test of the finished chip – but this only ensures that a product has been manufactured without fault, not that it will remain reliable in the field over the longer term. In the consumer industry, which is the main user of high-end chips with feature sizes of 10 nanometres or smaller, this has not yet been a major problem. There, it has so far generally been accepted that one chip in a million fails within an assumed service life of two years. However, since more and more high-end chips are also being used in safety-critical applications, they must become more reliable. The automotive industry, for example, is demanding that chips function for 18 years without fault, or that no more than one chip per billion fails in this time. The requirements are becoming more stringent in other markets, too. Smartphone manufacturers now specify that chips have to work for four years rather than the previous two. And in certain industrial and IoT applications where replacing sensors is difficult, chips sometimes have to last 20 years or longer.
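To make the gap between the two targets tangible, the figures above can be converted into an average failure rate. The following is a minimal sketch, not an industry calculation: it expresses both targets in FIT (failures per billion device-hours) and assumes continuous operation and a constant failure rate over the service life. Neither assumption comes from the article, and the function name `fit_rate` is purely illustrative.

```python
HOURS_PER_YEAR = 8760  # assumption: continuous operation, 24 h x 365 days


def fit_rate(failed_fraction, service_years):
    """Average failure rate in FIT (failures per 1e9 device-hours).

    Assumes a constant failure rate over the whole service life.
    """
    device_hours = service_years * HOURS_PER_YEAR
    return failed_fraction / device_hours * 1e9


# Consumer target from the text: one chip in a million within two years.
consumer = fit_rate(1e-6, 2)

# Automotive target from the text: one chip in a billion within 18 years.
automotive = fit_rate(1e-9, 18)

print(f"consumer:   {consumer:.4f} FIT")
print(f"automotive: {automotive:.2e} FIT")
print(f"ratio:      {consumer / automotive:.0f}x")
```

Under these assumptions, the automotive target corresponds to an average failure rate roughly 9,000 times lower than the consumer one, because the permitted fraction of failures is 1,000 times smaller and has to hold over a period nine times as long.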

Increasing reliability

In order to increase the reliability of a microchip, designers must have an overview of how all of the components interact. The circuit board, joining technology and chip housing must be designed with the environmental conditions of the future application in mind. Moisture, for example, can lead to corrosion in the chip, and vibration can cause connections to loosen.

Furthermore, the reliability of the actual semiconductor device must be considered. There are some rough guidelines here: chips with relatively coarse structures tend to be less susceptible to influences such as cosmic radiation or fluctuating operating voltages. In contrast, chips with a smaller area suffer less from mechanical stress factors such as vibration or differences in temperature. Chips are also subject to an ageing process, also referred to as wear or degradation. Electromigration causes breaks in conductor tracks, and transistor-level effects such as bias temperature instability (BTI) and hot carrier injection (HCI) play an ever-greater role in highly integrated chips. Due to the continuing miniaturisation of microelectronic components, these negative changes to the material properties have become even more varied and complicated. For example, the current densities and field strengths that occur locally within a circuit are more likely to reach critical values in smaller structures.
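The link between local current density and ageing can be illustrated with Black's equation, a widely used empirical model for electromigration in conductor tracks: MTTF = A · j^(-n) · exp(Ea / (k·T)). The sketch below is an illustration only and is not taken from the article. The exponent `n = 2` and the activation energy `ea_ev = 0.7` are assumed placeholder values that depend on material and process in practice, and the function name `relative_mttf` is hypothetical.

```python
import math

BOLTZMANN_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def relative_mttf(j_ratio, temp_k, ref_temp_k, n=2.0, ea_ev=0.7):
    """Mean time to failure relative to a reference operating point.

    Based on Black's equation, MTTF = A * j**(-n) * exp(Ea / (k * T)).
    The prefactor A cancels out in the ratio. n and ea_ev are assumed,
    illustrative values, not measured data.
    """
    current_term = j_ratio ** (-n)
    thermal_term = math.exp(
        ea_ev / BOLTZMANN_EV * (1.0 / temp_k - 1.0 / ref_temp_k)
    )
    return current_term * thermal_term


# Doubling the current density at unchanged temperature (85 °C = 358.15 K).
print(relative_mttf(2.0, 358.15, 358.15))

# Same current density, but the track runs 20 K hotter than the reference.
print(relative_mttf(1.0, 378.15, 358.15))
```

With the assumed exponent of 2, doubling the current density cuts the expected lifetime to a quarter, which shows why shrinking conductor cross-sections push designs towards critical values.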

In standard electronics, designers usually minimise the risk of failure by building a safety margin into their designs. However, this “over-design” is expensive and time-consuming, and it is no longer feasible as technologies become smaller and smaller.

Chips with integrated self-test

One way of at least detecting impending failures at an earlier stage is to integrate self-tests into the chip. With so-called built-in self-tests (BIST), integrated circuits are enhanced with hardware or software functions that they can use to test whether they are working correctly. The processor clock can be monitored, for example, by a “clock control” that detects any clock errors. If the worst comes to the worst, the system is automatically placed in a safe state and a corresponding signal is generated.
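The clock-monitoring idea can be sketched in software as follows. This is a simplified model under stated assumptions, not the implementation of any particular chip: it assumes the monitor counts clock ticks during a fixed reference window and compares the count with an expected value. The class `ClockMonitor` and all of its fields are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ClockMonitor:
    """Hypothetical model of a clock self-test; real BIST logic is hardware."""

    expected_ticks: int         # ticks expected per reference window
    tolerance: int              # permitted deviation in ticks
    safe_state: bool = False    # latched once a clock error is detected
    fault_signal: bool = False  # signal raised for the rest of the system

    def check_window(self, counted_ticks: int) -> bool:
        """Return True while the system remains in normal operation.

        A deviation beyond the tolerance latches the safe state and
        raises the fault signal; later good windows do not clear it.
        """
        if abs(counted_ticks - self.expected_ticks) > self.tolerance:
            self.safe_state = True
            self.fault_signal = True
        return not self.safe_state


monitor = ClockMonitor(expected_ticks=1000, tolerance=5)
print(monitor.check_window(1002))  # within tolerance
print(monitor.check_window(900))   # clock too slow: safe state is latched
print(monitor.check_window(1000))  # still in safe state
```

Latching the safe state rather than clearing it automatically is a deliberate choice in this sketch: after a detected clock error, the system stays in the safe state until something outside the monitor decides how to proceed.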

Predicting failures

Some solutions go one step further by monitoring the entire chip and using artificial intelligence to signal when a failure is about to occur. The Israeli company proteanTecs, for example, has developed an on-chip monitoring method. It combines a software platform based on machine learning with specially developed agents. These are integrated into the semiconductor design during the development stage and act as sensors in the chip. Reading them out and analysing the data obtained provides insight into the functionality and performance of semiconductors and electronic systems. Especially with new semiconductor generations, these results can be used to improve quality and reliability and to extend service life.

Simulating ageing

To avoid “over-design”, designers can also integrate a simulation of the expected ageing into the IC development process. This means that the reliability of a design can be predicted as early as the design stage. Fraunhofer IIS, for example, is developing approaches for this at its Division Engineering of Adaptive Systems EAS in Dresden. Under the slogan “Physics-of-Failure”, the researchers are linking knowledge about the physical failure mechanisms with approaches based on statistical data about failures in use. In the future, this should allow electronics design teams to assess potential reliability issues in semiconductors and systems efficiently – and to do so before they are manufactured.

Fingerprint for electronics

Trustworthiness is a topic that is closely related to reliability. This is because counterfeit chips or chips that have been tampered with can also cause failures during use. Researchers at Ulm University are therefore working on a forgery-proof physical “fingerprint” for electronic circuit boards, programmable circuits and integrated circuits (FPGAs and microcontrollers). The idea is based on the unavoidable process fluctuations that occur during the production of the components and cause minute deviations at the nanoscale. By recording these deviations in detail, it becomes possible to identify the component over its entire service life. This means that it is always possible at a later stage to find out whether a component is an original or whether it has been modified to the detriment of the application. The underlying premise is that uniquely identifying electronic components is the key to greater reliability.