At this week's 2020 Symposia on VLSI Technology and Circuits, Intel will present a series of research results and technical perspectives on the transformation of computing driven by data that is increasingly distributed across the core, edge, and endpoints. Chief Technology Officer Mike Mayberry will deliver a keynote speech entitled "Future Computing: How Data Transformation Reshapes VLSI", emphasizing the importance of moving from hardware/program-centric computing to data/information-centric computing.
"There is a huge volume of data flowing across distributed edge, network, and cloud infrastructure, which demands energy-efficient, powerful processing close to where the data is generated, but this processing is often constrained by bandwidth, memory, and power resources. The research Intel Labs is showcasing at the VLSI Symposia highlights several new approaches to more efficient computing that show promise for a range of applications, including robotics, augmented reality, machine vision, and video analytics. This research focuses on overcoming barriers to the movement and computation of data, which represent the biggest data challenges of the future."
– Vivek K. De, Intel Fellow, director of Circuit Technology Research, Intel Labs
What will be presented: Intel will present several research papers at the symposium that explore how greater intelligence and energy efficiency can be achieved in future edge-network-cloud systems to support the growing number of edge applications. Topics covered in the papers (a complete list appears at the end of this press release) include:
Use a ray-casting hardware accelerator to improve the efficiency and accuracy of 3D scene reconstruction for edge robotics
Paper: A Ray-Casting Accelerator in 10nm CMOS for Efficient 3D Scene Reconstruction in Edge Robotics and Augmented Reality Applications
Significance: Applications such as edge robotics and augmented reality require accurate, fast, and power-efficient reconstruction of complex 3D scenes from the large volumes of data produced by ray-casting operations in order to perform real-time dense simultaneous localization and mapping (SLAM). In this paper, Intel describes a ray-casting hardware accelerator that uses new techniques to preserve scene reconstruction accuracy while delivering high energy efficiency. These techniques include voxel overlap search and hardware-assisted approximation of voxels, which reduce local memory requirements and improve power efficiency for future edge robotics and augmented reality applications.
Use an event-driven visual data processing unit (EPU) to reduce the power consumption of deep learning-based video stream analysis
Paper: A 0.05pJ/pixel 70fps FHD 1Meps event-driven visual data processing unit
Significance: Real-time deep learning-based visual data analytics, used mainly in safety and security applications, must detect objects quickly across multiple video streams, which demands heavy computation and high memory bandwidth. Input frames from these cameras are usually downsampled to reduce the load, which lowers image accuracy. In this study, Intel demonstrates an event-driven visual data processing unit (EPU) that, combined with novel algorithms, directs the deep learning accelerator to process visual input based only on motion-driven regions of interest. This approach eases the heavy compute and memory requirements of visual analytics at the edge.
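To make the idea of processing only motion-based regions concrete, here is a minimal sketch in Python. It is not Intel's EPU algorithm, which the press release does not describe; it assumes plain frame differencing with a fixed threshold, and the names `motion_bounding_box`, `crop`, and `threshold` are hypothetical. The point is only that a downstream detector can be handed a small crop instead of a full, downsampled frame.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]          # grayscale image as rows of 0-255 values
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def motion_bounding_box(prev: Frame, curr: Frame,
                        threshold: int = 16) -> Optional[Box]:
    """Return the box enclosing all pixels that changed by more than
    `threshold` between two frames, or None if nothing moved.

    Assumption: simple frame differencing stands in for the event
    detection that real event-driven hardware would perform.
    """
    changed = [
        (r, c)
        for r, (prev_row, curr_row) in enumerate(zip(prev, curr))
        for c, (p, q) in enumerate(zip(prev_row, curr_row))
        if abs(q - p) > threshold
    ]
    if not changed:
        return None
    rows = [r for r, _ in changed]
    cols = [c for _, c in changed]
    return min(rows), min(cols), max(rows), max(cols)


def crop(frame: Frame, box: Box) -> Frame:
    """Cut the region of interest out of a frame at full resolution."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in frame[top:bottom + 1]]
```

In use, a caller would compute the box for each new frame, skip the deep learning step entirely when it is `None`, and otherwise pass only `crop(curr, box)` to the detector.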
Expand local memory bandwidth to meet the needs of artificial intelligence, machine learning, and deep learning applications
Paper: 2x bandwidth burst 6T-SRAM designed for workloads with limited memory bandwidth
Significance: Many AI chips, especially those used for natural language processing (such as voice assistants), are increasingly limited by local memory bandwidth. Addressing this typically means doubling the memory clock frequency or the number of memory banks, at the cost of power and area efficiency, which is a particular problem for area-constrained edge devices. In this research, Intel shows a 6T-SRAM array that delivers 2x read bandwidth on demand in burst mode, with 51% higher energy efficiency than frequency doubling and 30% higher area efficiency than doubling the number of banks.
All-digital binary neural network accelerator
Paper: 617TOPS/W All-Digital Binary Neural Network Accelerator in 10nm FinFET CMOS
Significance: In power- and resource-constrained edge devices, where some applications can tolerate low-precision outputs, analog binary neural networks (BNNs) have been used as an alternative to higher-precision neural networks, which are more computationally demanding and memory intensive. However, analog BNNs have lower prediction accuracy because they are less tolerant of process variation and noise. In this study, Intel demonstrates an all-digital BNN that achieves energy efficiency similar to analog in-memory techniques while providing better robustness and scalability to advanced process nodes.
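For readers unfamiliar with why binary neural networks are cheap to compute digitally, the sketch below shows the XNOR-and-popcount dot product commonly used in the BNN literature. It illustrates the general technique only; the press release does not describe the internals of Intel's accelerator. The function names and the encoding (bit 1 for +1, bit 0 for -1) are assumptions made for this example.

```python
from typing import Sequence


def pack_bits(values: Sequence[int]) -> int:
    """Pack a vector of +1/-1 values into an integer, one bit per element
    (assumed encoding: bit 1 means +1, bit 0 means -1)."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
        elif v != -1:
            raise ValueError("binary vectors may only contain +1 or -1")
    return word


def binary_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two packed +1/-1 vectors of length n.

    XNOR marks the positions where the signs agree; each agreement
    contributes +1 and each disagreement -1, so the result is
    2 * matches - n.
    """
    mask = (1 << n) - 1
    xnor = ~(a_bits ^ b_bits) & mask
    matches = bin(xnor).count("1")  # popcount
    return 2 * matches - n


def binary_neuron(inputs: int, weights: int, n: int, bias: int = 0) -> int:
    """Binarized neuron: sign of the dot product plus bias, as +1 or -1."""
    return 1 if binary_dot(inputs, weights, n) + bias >= 0 else -1
```

Because multiplication and accumulation reduce to bitwise logic and a bit count, the whole operation maps onto simple digital gates, which is the property that makes an all-digital design plausible at very low energy per operation.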
Other Intel research presented at the 2020 VLSI Symposia includes the following papers:
Future Computing: How Data Transformation Reshapes VLSI
Low-clock-power digital standard cell IPs for high-performance graphics/AI processors in 10nm CMOS
An autonomous reconfigurable power delivery network (RPDN) for many-core SoCs featuring dynamic current steering
GaN and Si transistors on 300mm Si(111) wafers enabled by 3D monolithic heterogeneous integration
Low-swing and column-multiplexed bitline techniques for low-Vmin, noise-tolerant, high-density 1R1W 8T-bitcell SRAM in 10nm FinFET CMOS
A dual-rail hybrid analog/digital LDO with dynamic current steering for tunable high PSRR and high efficiency
A 435MHz, 600Kops/J side-channel-attack-resistant crypto-processor for secure RSA-4K public-key encryption in 14nm CMOS
A 0.26% BER, 10^28 challenge-response, modeling-attack-resistant PUF in 14nm CMOS featuring stability-aware adversarial challenge selection
A side-channel-attack (SCA)-resistant AES engine in 14nm CMOS with 6000x time/frequency-domain leakage suppression, using a nonlinear digital low-dropout regulator cascaded with arithmetic countermeasures
CMOS-compatible process integration of SOT-MRAM with a heavy-metal bilayer bottom electrode and 10ns field-free SOT switching with STT assist
A 10nm SRAM design using gate-modulated self-collapse write assist, enabling a 175mV VMIN reduction with negligible power overhead