From the graphics processing unit through neuromorphic chips to the quantum computer – the development of Artificial Intelligence chips is supporting many new advances.
AI-supported applications must keep pace with rapidly growing data volumes and often have to respond in real time as well. The classic CPUs found in every computer quickly reach their limits here because they process tasks largely sequentially. Significant improvements in performance, particularly in the context of deep learning, become possible when the individual processes are executed in parallel.
Hardware for parallel computing processes
A few years ago, the AI sector turned its attention to the graphics processing unit (GPU), a chip that had originally been developed for an entirely different purpose. It offers a massively parallel architecture, which can perform computing tasks in parallel using many smaller yet still efficient compute units. This is exactly what is required for deep learning. Manufacturers of graphics processing units are now building GPUs specifically for AI applications. A server with just one of these high-performance GPUs has a throughput 40 times greater than that of a dedicated CPU server.
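The idea of splitting one job into many independent pieces can be sketched in a few lines of Python. This is only an illustration of the structure, not of GPU programming: the function names are made up for this example, and Python threads will not deliver GPU-like speed-ups. The point is that each row of a matrix-vector product can be computed without waiting for any other row, which is the property that parallel hardware exploits.

```python
from concurrent.futures import ThreadPoolExecutor


def dot(row, vector):
    """Multiply one matrix row by the vector (one independent unit of work)."""
    return sum(a * b for a, b in zip(row, vector))


def matvec_sequential(matrix, vector):
    """Process the rows one after another, as a single sequential worker would."""
    return [dot(row, vector) for row in matrix]


def matvec_parallel(matrix, vector, workers=4):
    """Hand the rows to a pool of workers; no row depends on another row."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns the results in the same order as the input rows
        return list(pool.map(lambda row: dot(row, vector), matrix))


matrix = [[1, 2, 3], [4, 5, 6]]
vector = [1, 0, 1]
sequential_result = matvec_sequential(matrix, vector)
parallel_result = matvec_parallel(matrix, vector)
```

Both functions return the same result; only the scheduling of the work differs.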
However, even GPUs are now proving too slow for some AI companies. This in turn is having a significant impact on the semiconductor market. Traditional semiconductor manufacturers are now being joined by buyers and users of semiconductors – such as Microsoft, Amazon and even Google – who are themselves becoming semiconductor manufacturers (along with companies who want to produce chips to their own specifications). For example, Alphabet, the parent company behind Google, has developed its own Application-Specific Integrated Circuit (ASIC), which is specifically tailored to the requirements of machine learning. The second generation of this tensor processing unit (TPU) from Alphabet offers 180 teraflops of processing power, while the latest GPU from Nvidia offers 120 teraflops. Flops (Floating Point Operations Per Second) indicate how many simple mathematical calculations, such as addition or multiplication, a computer can perform per second.
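To get a feel for what these figures mean, the following sketch converts teraflops into the time needed for a fixed amount of work. The workload size is a hypothetical number chosen purely for illustration, and the calculation assumes the chip sustains its quoted peak rate, which real applications rarely achieve.

```python
TERA = 10 ** 12  # one teraflop = 10^12 floating point operations per second


def seconds_for_workload(operations, teraflops):
    """Idealised run time: total operations divided by operations per second."""
    return operations / (teraflops * TERA)


# Hypothetical workload of 10^18 floating point operations (illustrative only)
workload = 10 ** 18

tpu_seconds = seconds_for_workload(workload, 180)  # 180 teraflops, as quoted above
gpu_seconds = seconds_for_workload(workload, 120)  # 120 teraflops, as quoted above

# How many times faster the 180-teraflop chip would finish the same job
speedup = gpu_seconds / tpu_seconds
```

Under these assumptions the ratio depends only on the two peak figures, not on the size of the workload.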
Different performance requirements
Flops are not the only benchmark for the processing power of a chip. With AI processors, a distinction is made between performance in the training phase, which requires parallel computing processes, and performance in the application phase, which involves putting what has been learned into practice – known as inference. Here the focus is on deducing new knowledge from an existing knowledge base. “In contrast to the massively parallel training component of AI that occurs in the data centre, inferencing is generally a sequential calculation that we believe will be mostly conducted on edge devices such as smartphones and Internet of Things, or IoT, products,” says Abhinav Davuluri, analyst at Morningstar, a leading provider of independent investment research. Unlike cloud computing, edge computing involves decentralised data processing at the “edge” of the network. AI technologies are playing an increasingly important role here, as intelligent edge devices such as robots or autonomous vehicles do not have to transfer data to the cloud before analysis. Instead, they can analyse the data directly on site – saving the time and energy required for transferring data to the data centre and back again.
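The difference between the two phases can be made concrete with a deliberately tiny model. The sketch below is a minimal illustration under simple assumptions (a single weight, made-up sample data, plain gradient descent), not a description of how any particular chip or framework works. Training loops over the whole data set many times, whereas inference is a single cheap calculation with the finished weight.

```python
def predict(weight, x):
    """Inference: apply the learned weight to one new input."""
    return weight * x


def train(samples, learning_rate=0.05, epochs=200):
    """Training: repeatedly adjust the weight to reduce the mean squared error."""
    weight = 0.0
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to the weight;
        # each sample contributes independently, so this sum could be split
        # across many parallel compute units.
        gradient = sum(2 * (predict(weight, x) - y) * x for x, y in samples) / len(samples)
        weight -= learning_rate * gradient
    return weight


# Illustrative data following y = 2 * x
samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

weight = train(samples)            # heavy phase, typically run in the data centre
prediction = predict(weight, 5.0)  # light phase, could run on an edge device
```

The trained weight ends up very close to 2, so the prediction for the input 5.0 is very close to 10.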
Solutions for edge computing
For these edge computing applications, another chip variant – the Field-Programmable Gate Array (FPGA) – is currently establishing itself alongside CPUs, GPUs and ASICs. This is an integrated circuit into which a logic circuit can be loaded after manufacturing. Unlike processors, FPGAs are truly parallel in nature thanks to their multiple programmable logic blocks, which means that different processing operations do not have to compete for the same resource. Each individual processing task is assigned to a dedicated area on the chip and can thus be performed autonomously. Although FPGAs do not quite match the processing power of a GPU in the training process, they rank higher than graphics processing units when it comes to inference. Above all, they consume less energy than GPUs, which is particularly important for applications on small, mobile devices. Tests have shown, for example, that FPGAs can process more frames per second per watt than GPUs or CPUs. “We think FPGAs offer the most promise for inference, as they can be upgraded while in the field and could provide low latencies if located at the edge alongside a CPU,” says Morningstar analyst Davuluri.
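Frames per second per watt is simply throughput divided by power draw. The short sketch below shows how such a comparison could be calculated; the device names and all numbers are hypothetical placeholders, not measurements of any real FPGA, GPU or CPU.

```python
def frames_per_second_per_watt(frames_per_second, power_watts):
    """Energy efficiency: how many frames are processed per second for each watt drawn."""
    if power_watts <= 0:
        raise ValueError("power_watts must be positive")
    return frames_per_second / power_watts


# Hypothetical devices: (frames per second, power draw in watts)
devices = {
    "device_a": (300, 30),
    "device_b": (600, 150),
}

efficiency = {
    name: frames_per_second_per_watt(fps, watts)
    for name, (fps, watts) in devices.items()
}

# The most efficient device is not necessarily the one with the highest raw throughput
most_efficient = max(efficiency, key=efficiency.get)
```

In this made-up comparison, the second device processes twice as many frames per second but needs five times the power, so the first device is the more efficient one.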
More start-ups are developing Artificial Intelligence chips
More and more company founders – and investors – are recognising the opportunities offered by AI chips. At least 45 start-ups are currently working on such semiconductor solutions, and at least five of these have received more than USD 100 million from investors. According to market researchers at CB Insights, venture capitalists invested more than USD 1.5 billion in chip start-ups in 2017 – double the amount that was invested just two years earlier. British firm Graphcore has developed the Intelligence Processing Unit (IPU), a new technology for accelerating machine learning and Artificial Intelligence (AI) applications. The AI platform of US company Mythic performs hybrid digital/analogue calculations in flash arrays. The inference phase can therefore take place directly within the memory, where the “knowledge” of the neural network is stored, offering benefits in terms of performance and accuracy. China is one of the most active countries when it comes to Artificial Intelligence chip start-ups. The value of Cambricon Technologies alone is currently estimated at USD 1 billion. Among other things, the start-up has developed a neural network processor chip for smartphones.
New chip architectures for even better performance of Artificial Intelligence
Neuromorphic chips are emerging as the next phase in chip development. Their architecture mimics the way the human brain works in terms of learning and comprehension. A key feature of these chips is the removal of the separation between the processor unit and the data memory. Launched in 2017, neuromorphic test chips with over 100,000 neurons and more than 100 million synapses can unite training and inference on one chip. When in use, they should be able to learn autonomously at a rate that is 1 million times better than the third generation of neural networks. At the same time, they are highly energy-efficient.
Quantum computers represent a quantum leap for AI systems in the truest sense of the word. The big players in the IT sector, such as Google, IBM and Microsoft, as well as countries, intelligence services and even car manufacturers are investing in this technology. These computers are based on the principles of quantum mechanics. A quantum computer can perform each calculation step for all states at the same time. This means that it delivers exceptional processing power for the parallel processing of commands and has the potential to compute at a much higher speed than conventional computers. Although the technology may still be in its infancy, the race for faster and more reliable quantum processors is already well underway.
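What it means to act on all states at once can be hinted at with a small classical simulation. The sketch below is an illustration only, and its function names are invented for this example. It stores one amplitude for each of the 2^n basis states of n qubits and applies a Hadamard gate to every qubit, which spreads the register evenly across all states. A classical machine needs memory that doubles with each additional qubit to do this, whereas a quantum processor holds the state natively.

```python
import math


def apply_hadamard(state, qubit):
    """Apply a Hadamard gate to one qubit of a state vector of amplitudes."""
    h = 1 / math.sqrt(2)
    new_state = state[:]
    bit = 1 << qubit
    for i in range(len(state)):
        if (i & bit) == 0:
            # i and i | bit differ only in the chosen qubit
            a, b = state[i], state[i | bit]
            new_state[i] = h * (a + b)
            new_state[i | bit] = h * (a - b)
    return new_state


def uniform_superposition(n_qubits):
    """Start in the all-zero state and apply a Hadamard gate to every qubit."""
    state = [0.0] * (1 << n_qubits)
    state[0] = 1.0
    for qubit in range(n_qubits):
        state = apply_hadamard(state, qubit)
    return state


state = uniform_superposition(3)
# The probability of measuring each basis state is the squared amplitude
probabilities = [amplitude ** 2 for amplitude in state]
```

With three qubits the register holds eight basis states, each with the same measurement probability of one eighth.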