Intel's Cutting-Edge Tech for AI on the PC: AVX-512 in Consumer Processors
Whether CPU makers are "squeezing toothpaste", eking out only tiny improvements each generation, is a topic the market can never avoid, and jokes about Intel squeezing toothpaste never end. In fact, however, Intel has quietly brought many supercomputing technologies to consumer processors, such as AVX-512 and its VNNI extension. These steadily raise the processor's AI performance and, with it, overall performance.
Many users believe that PC processors have hit a bottleneck in both performance and applications. For example, browsing the web on a 16-core CPU is not much faster than on a 4-core CPU, so PC processors supposedly have "excess performance". In fact, this view is one-sided. Professional users, needless to say, always pursue maximum performance. And as Internet technology develops rapidly, ordinary users are working more and more with images and video, and their expectations keep rising. Tasks such as speech-to-text conversion and automatic enhancement of photos and videos demand far more processing power than text processing does. If the applications and technology of supercomputing can be brought to the consumer market, users will clearly feel PC performance improve, because supercomputing is built on massive parallelism and scale-out and places very high demands on core count and parallel throughput.
In fact, Intel has long foreseen this trend and demand and has been working toward it. More importantly, this is not a plan for the next 5 to 10 years: Intel has already brought a large amount of supercomputing technology to the consumer market and to ordinary users. Why can we say that?
Intel is a dominant player in the supercomputing market, especially for supercomputer CPUs. At present, nearly 95% of the 500 fastest supercomputers use Intel CPUs, so when it comes to core CPU technology for supercomputing, Intel is the undisputed leader. Intel's latest Xeon Scalable processors for supercomputing also bring many new features: beyond higher core counts, they offer the faster UPI point-to-point interconnect between CPUs and the Omni-Path interconnect between nodes. On the software side, the Intel Parallel Studio suite provides tools covering the development environment, performance tuning, high-performance math libraries and compilers, helping developers and users get the most performance out of their applications. Combined with thorough vectorization, this can deliver very large performance gains.
Inside the CPU core, the unassuming-looking Xeon processor supports AVX-512, the newest version of the Advanced Vector Extensions and the latest wide-vector data processing implementation for x86 CPUs. Intel provides 512-bit data paths and execution units, so the CPU can process 512 bits of packed vector data in a single operation, and expands the register file to 32 512-bit ZMM registers to hold intermediate data. AVX-512 also supports FMA (fused multiply-add) operations, doubling the vector processing width compared with the 256-bit AVX2 found in current mainstream and competing products. More importantly, a large number of supplementary extensions greatly accelerate certain specific operations, yielding more than a 2x speedup in those cases.
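To make "512 bits at a time" and FMA concrete, here is a minimal sketch of y = a·x + y (SAXPY) in C. It assumes GCC or Clang on x86-64, which provide `__attribute__((target("avx512f")))` and `__builtin_cpu_supports`; the function names are illustrative. Each `_mm512_fmadd_ps` handles 16 single-precision floats (16 × 32 bits = 512 bits) in one fused multiply-add, twice the 8 floats of a 256-bit AVX2 FMA. A scalar loop covers the leftover elements and any CPU without AVX-512.

```c
#include <stddef.h>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>
#define HAVE_AVX512_PATH 1

/* AVX-512 path: each iteration processes 16 floats (16 x 32 bits = 512 bits). */
__attribute__((target("avx512f")))
static void saxpy_avx512(float a, const float *x, float *y, size_t n) {
    __m512 va = _mm512_set1_ps(a);             /* broadcast a into all 16 lanes */
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m512 vx = _mm512_loadu_ps(x + i);    /* unaligned 512-bit loads */
        __m512 vy = _mm512_loadu_ps(y + i);
        vy = _mm512_fmadd_ps(va, vx, vy);      /* vy = va * vx + vy, rounded once */
        _mm512_storeu_ps(y + i, vy);
    }
    for (; i < n; i++)                         /* scalar tail for the last n % 16 items */
        y[i] = a * x[i] + y[i];
}
#endif

/* Portable fallback used when AVX-512 is not available. */
static void saxpy_scalar(float a, const float *x, float *y, size_t n) {
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* y[i] = a * x[i] + y[i]; chooses the AVX-512 path at run time if the CPU supports it. */
void saxpy(float a, const float *x, float *y, size_t n) {
#ifdef HAVE_AVX512_PATH
    if (__builtin_cpu_supports("avx512f")) {
        saxpy_avx512(a, x, y, n);
        return;
    }
#endif
    saxpy_scalar(a, x, y, n);
}
```

Dispatching at run time lets a single binary run on processors with and without AVX-512, which matters because support differs across product lines.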
This kind of data processing power needs demanding workloads to show what it can do. AVX-512 code paths are already widely used in supercomputing and scientific computing to improve efficiency. NAMD, GROMACS, LAMMPS, the Intel Media SDK and OSPRay-based custom rendering can all be accelerated with AVX-512, delivering faster or richer effects, graphics and multimedia processing. These supercomputing applications currently target professional users, but Intel has already started bringing the hardware to consumers: the current Core i9 X-series processors all support the AVX-512 instruction set and keep the same two 512-bit FMA units as high-end servers.
In laptops, the 10th-generation Intel Core Ice Lake processors launched last year also support AVX-512, and the entire Intel Core product line is expected to support AVX-512 and its latest extensions in the future. In other words, you don't need to wait 5 to 10 years: you can embrace supercomputing technology today. This also reflects Intel's foresight as a technology leader.
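Because AVX-512 and VNNI support varies between product lines, software normally checks at run time before using them. A minimal sketch, assuming GCC or Clang on x86 (`<cpuid.h>` with `__get_cpuid` and `__get_cpuid_count`). The bit positions follow Intel's CPUID documentation (leaf 7, sub-leaf 0: EBX bit 16 = AVX512F, ECX bit 11 = AVX512_VNNI); the OS must also enable saving of the 512-bit register state, which is checked through XGETBV. On non-x86 targets both functions simply report 0.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#define X86_CPUID 1
#endif

/* 1 if the OS saves SSE, AVX and AVX-512 state (XCR0 bits 1, 2, 5, 6, 7 = mask 0xE6). */
static int os_supports_avx512_state(void) {
#ifdef X86_CPUID
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx) || !(ecx & (1u << 27)))
        return 0;                              /* OSXSAVE not set: XGETBV is unusable */
    unsigned int xcr0_lo, xcr0_hi;
    __asm__ volatile("xgetbv" : "=a"(xcr0_lo), "=d"(xcr0_hi) : "c"(0));
    (void)xcr0_hi;                             /* the relevant bits are all in the low word */
    return (xcr0_lo & 0xE6u) == 0xE6u;
#else
    return 0;
#endif
}

/* 1 if the CPU reports AVX-512 Foundation and the OS supports its register state. */
int has_avx512f(void) {
#ifdef X86_CPUID
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;                              /* leaf 7 not available */
    return (ebx & (1u << 16)) != 0 && os_supports_avx512_state();
#else
    return 0;
#endif
}

/* 1 if AVX-512 VNNI (VPDPBUSD, VPDPWSSD, ...) can be used. */
int has_avx512_vnni(void) {
#ifdef X86_CPUID
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    return (ecx & (1u << 11)) != 0 && has_avx512f();
#else
    return 0;
#endif
}
```

A program would call these once at startup and select its AVX-512, VNNI or fallback code paths accordingly.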
Intel has also done extensive work on the speech-to-text use case mentioned earlier, and it has promoted the broad use of AI inference in speech, image and text recognition. VNNI (Vector Neural Network Instructions), built on AVX-512, is Intel's latest instruction-set extension for accelerating AI inference. By merging the three instructions previously needed for an 8-bit integer multiply-accumulate into a single instruction, it greatly speeds up int8 inference workloads such as the convolutions in neural networks.
The VNNI instruction VPDPBUSD performs the 8-bit multiply with 32-bit accumulation that previously took three instructions (VPMADDUBSW, VPMADDWD and VPADDD); its sibling VPDPWSSD does the same for 16-bit inputs, replacing two instructions.
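The C function below models what one 512-bit VPDPBUSD computes, purely to illustrate the semantics; in real code compilers expose the instruction as the `_mm512_dpbusd_epi32` intrinsic. Each of the 16 32-bit lanes multiplies four unsigned bytes by four signed bytes and adds the sum of those products to its accumulator. The function name `vpdpbusd_model` is hypothetical.

```c
#include <stdint.h>

/* Scalar model of one 512-bit VPDPBUSD (non-saturating form).
 * acc: 16 signed 32-bit lanes; a: 64 unsigned bytes; b: 64 signed bytes. */
void vpdpbusd_model(int32_t acc[16], const uint8_t a[64], const int8_t b[64]) {
    for (int lane = 0; lane < 16; lane++) {
        int32_t sum = 0;
        for (int k = 0; k < 4; k++) {
            int idx = lane * 4 + k;
            /* u8 x s8 product; the four-product sum always fits in int32 */
            sum += (int32_t)a[idx] * (int32_t)b[idx];
        }
        /* Hardware wraps around on overflow, so add in unsigned arithmetic
           (VPDPBUSDS is the variant that saturates instead). */
        acc[lane] = (int32_t)((uint32_t)acc[lane] + (uint32_t)sum);
    }
}
```

A side benefit over the old three-instruction sequence: VPMADDUBSW saturated its pairwise sums to 16 bits, whereas VPDPBUSD accumulates directly in 32 bits.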
With the 10th-generation Intel Core X-series and Ice Lake processors supporting AVX-512 VNNI, Intel has also brought its latest AI inference technology to the consumer market. Together with the latest image recognition, classification, speech and text recognition applications and the Intel OpenVINO inference optimization framework, this greatly improves the user experience in text and image recognition and speeds up some image processing tasks.
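Int8 inference works because weights and activations can be quantized to 8 bits with a scale factor, multiplied and accumulated in 32-bit integers (the step VNNI accelerates), and converted back to floating point at the end. The sketch below uses simple symmetric per-tensor quantization, one common scheme chosen for illustration and not necessarily what OpenVINO does internally; the function names are hypothetical. For simplicity both operands are signed here, whereas VPDPBUSD expects one unsigned operand, so production kernels typically shift activations into the unsigned range.

```c
#include <stddef.h>
#include <stdint.h>

/* Symmetric per-tensor quantization: scale = max|x| / 127, q[i] = round(x[i] / scale).
 * Returns the scale needed to convert back to floating point. */
float quantize_int8(const float *x, int8_t *q, size_t n) {
    float max_abs = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float v = x[i] < 0.0f ? -x[i] : x[i];
        if (v > max_abs) max_abs = v;
    }
    float scale = max_abs > 0.0f ? max_abs / 127.0f : 1.0f;
    for (size_t i = 0; i < n; i++) {
        float r = x[i] / scale;
        int v = (int)(r + (r >= 0.0f ? 0.5f : -0.5f));   /* round half away from zero */
        if (v > 127) v = 127;                            /* clamp to the symmetric range */
        if (v < -127) v = -127;
        q[i] = (int8_t)v;
    }
    return scale;
}

/* Dot product of two quantized vectors: int8 x int8 products summed in int32,
 * then rescaled to float. The int32 sum is safe for n up to about 133,000. */
float dot_int8(const int8_t *qa, float sa, const int8_t *qb, float sb, size_t n) {
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (int32_t)qa[i] * (int32_t)qb[i];
    return (float)acc * sa * sb;
}
```

The result is an approximation of the float dot product; the small loss of precision is usually acceptable for inference in exchange for much higher integer throughput.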
That's not all. Starting with the sixth-generation Core processors, Intel has supported the TSX (Transactional Synchronization Extensions) instruction set on some mainstream models. TSX provides hardware transactional memory and is designed for highly concurrent workloads such as database transactions. When multiple threads modify the same data table at once, the program normally has to take a lock so that every modification is checked and serialized. But the lock is itself implemented in program code, and acquiring it greatly reduces concurrency and adds CPU overhead. With TSX, the programmer can wrap a critical section containing transactional operations in a coarse-grained lock, while the hardware automatically detects actual data conflicts between operations. This keeps the transactions correct while uncovering parallelism between operations that do not really conflict. Emulator users have also been paying more and more attention to TSX support, because it can greatly improve the efficiency of demanding emulators such as the PS3 emulator. TSX is likewise a secret weapon for server-side database transactions: compared with products and competitors that lack it, simple transaction throughput can increase by up to 10x. This professional instruction set is supported on X-series processors and higher-end desktop processors (see Intel ARK for details).
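The pattern TSX enables is usually called lock elision: run the critical section as a hardware transaction and only really take the lock if the transaction aborts. A minimal sketch using the RTM intrinsics `_xbegin`, `_xend` and `_xabort` from `<immintrin.h>`, assuming GCC or Clang. The RTM path is compiled only when `__RTM__` is defined (for example with `-mrtm`); build it that way only for CPUs where CPUID reports RTM as enabled, since TSX has been disabled by microcode updates on many processors. Otherwise the code just uses a plain spinlock. The `deposit` function and `balance` variable are hypothetical stand-ins for a database row update.

```c
#include <stdatomic.h>
#if defined(__RTM__)
#include <immintrin.h>
#endif

static atomic_int table_lock = 0;   /* fallback lock: 0 = free, 1 = held */
static long balance = 0;            /* hypothetical shared row being updated */

static void lock_acquire(void) {
    int expected = 0;
    while (!atomic_compare_exchange_weak(&table_lock, &expected, 1))
        expected = 0;               /* CAS failed: reset the expected value and spin */
}

static void lock_release(void) {
    atomic_store(&table_lock, 0);
}

/* Add delta to balance, eliding the lock with an RTM transaction when compiled in. */
void deposit(long delta) {
#if defined(__RTM__)
    for (int attempt = 0; attempt < 3; attempt++) {
        unsigned int status = _xbegin();
        if (status == _XBEGIN_STARTED) {
            /* Reading the lock adds it to the transaction's read set, so if another
               thread acquires the lock, this transaction aborts automatically. */
            if (atomic_load_explicit(&table_lock, memory_order_relaxed) != 0)
                _xabort(0xff);
            balance += delta;
            _xend();                /* commit: the update becomes visible atomically */
            return;
        }
        /* Aborted (data conflict, capacity, explicit abort): retry a few times. */
    }
#endif
    lock_acquire();                 /* fallback path: really take the lock */
    balance += delta;
    lock_release();
}

long read_balance(void) {
    lock_acquire();
    long v = balance;
    lock_release();
    return v;
}
```

When threads touch different data, their transactions commit in parallel even though they all nominally "hold" the same coarse lock; only real conflicts force the serialized fallback.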
Innovative technology companies such as Intel have always been ahead of the industry. Through a powerful combination of software and hardware, Intel has brought many enterprise-class and supercomputing-class technologies to consumer users, bringing AI technology and performance to the PC, continually creating new experiences for consumers, and laying a solid foundation for meeting the needs of the latest applications. In my view, only Intel can do this at present. With this cutting-edge technology on its side, do you still think Intel's CPUs are just squeezing toothpaste?