Different AI accelerator architectures may offer different performance tradeoffs, but they all require an associated software stack to deliver system-level performance; otherwise, the hardware could be underutilized. To connect high-level software frameworks such as TensorFlow™ or PyTorch™ with different AI accelerators, machine learning compilers are emerging to enable interoperability. A representative example is the Facebook Glow compiler.

Measuring the performance of AI accelerators has been a contentious topic. For an independent assessment of the training and inference performance of machine learning hardware, software, and services, teams can consult MLPerf, an independent organization formed by a group of engineers and researchers from industry and academia.

As intelligence moves to the edge in many applications, it is creating greater differentiation among AI accelerators. The edge offers a tremendous variety of applications that require AI accelerators to be specifically optimized for different characteristics, such as latency, energy efficiency, and memory, based on the needs of the end application. For example, while autonomous navigation demands a computational response latency limit of 20μs, voice and video assistants must recognize spoken keywords in less than 10μs and hand gestures within a few hundred milliseconds.

In the future, cognitive systems, which aim to simulate human thought processes, will emerge with greater prominence. Compared to today's neural networks, cognitive systems have a deeper understanding of how to interpret data at different levels of abstraction.
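Benchmarks like MLPerf score inference systems on latency percentiles as well as raw throughput, since tail latency is what matters for interactive edge workloads. As a rough, framework-agnostic illustration (not MLPerf's actual harness), the sketch below times repeated calls to a stand-in workload and reports p50/p99 latency; `dummy_infer` is a hypothetical placeholder for a real model invocation.

```python
import time

def measure_latency(fn, warmup=10, iters=100):
    """Time per-call latency of `fn` in milliseconds.

    Returns (p50, p99) percentiles, loosely in the spirit of the
    tail-latency metrics reported by inference benchmarks.
    """
    for _ in range(warmup):               # warm caches before timing
        fn()
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1e3)
    samples.sort()
    p50 = samples[len(samples) // 2]
    p99 = samples[min(len(samples) - 1, int(len(samples) * 0.99))]
    return p50, p99

# Hypothetical stand-in "inference" workload; a real harness would
# call the deployed model through its runtime instead.
def dummy_infer():
    sum(i * i for i in range(1000))

p50, p99 = measure_latency(dummy_infer)
print(f"p50={p50:.3f} ms  p99={p99:.3f} ms")
```

Because the samples are sorted before the percentiles are read off, p99 is always at least p50; a production harness would also pin clock frequencies and control for thermal throttling before trusting the numbers.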