Edge Inference Co-Processor Boosts Throughput, Cuts Power

According to Flex Logix Technologies, its InferX X1 edge inference co-processor delivers high throughput in edge applications with a single DRAM, yielding much higher throughput per watt than existing solutions. The company says the device's performance advantage is strongest at the low batch sizes typical of edge applications, where there is usually just one camera or sensor. At small batch sizes, InferX X1 performance approaches that of data center inference boards, and the architecture is optimized for large models that need hundreds of billions of operations per image. For example, for YOLOv3 real-time object recognition, InferX X1 processes 2-Mpixel images at 12.7 fps at batch size = 1. Processing time scales roughly linearly with image size, so the frame rate approximately doubles for 1-Mpixel images. All of this is achieved with a single DRAM.
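As a sanity check on those numbers, the stated scaling can be written as a one-line model: if per-frame work grows linearly with pixel count, frame rate is inversely proportional to it. The 12.7 fps and 2-Mpixel reference figures come from the article; the helper name is illustrative, not part of any Flex Logix API.

```python
def estimated_fps(megapixels, ref_fps=12.7, ref_megapixels=2.0):
    """Estimate frame rate at batch size 1, assuming per-frame work
    (and hence 1/fps) scales linearly with pixel count, as the
    article states for YOLOv3 on InferX X1."""
    return ref_fps * ref_megapixels / megapixels

print(round(estimated_fps(2.0), 1))  # 12.7 -- the quoted 2-Mpixel figure
print(round(estimated_fps(1.0), 1))  # 25.4 -- roughly double at 1 Mpixel
```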


InferX X1 will be available as chips for edge devices and on half-height, half-length PCIe cards for edge servers and gateways. It is programmed using the nnMAX Compiler, which accepts TensorFlow Lite or ONNX models; the internal architecture of the inference engine is hidden from the user.


The co-processor supports INT8, INT16, and bfloat16 numerics, with the ability to mix them across layers, enabling easy porting of models with throughput optimized at maximum precision. In INT8 mode, InferX applies the Winograd transformation to common convolution operations, accelerating those functions by 2.25x while minimizing bandwidth through on-chip, on-the-fly conversion of weights to Winograd form. To ensure no loss of precision, Winograd calculations are performed with 12-bit accuracy.
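The 2.25x figure follows from standard Winograd arithmetic: the common F(2x2, 3x3) variant produces a 2x2 output tile with 16 multiplications instead of the 36 a direct 3x3 convolution needs, and 36/16 = 2.25. A minimal 1D sketch of the underlying F(2,3) transform, which achieves the corresponding 6/4 = 1.5x reduction (this illustrates the general technique only and assumes nothing about Flex Logix's actual implementation):

```python
def winograd_f23(d, g):
    """Winograd F(2,3): two outputs of a 3-tap convolution over a
    4-sample input using 4 multiplications instead of 6.
    Nesting this transform in 2D gives F(2x2,3x3): 16 multiplications
    versus 36 for direct convolution, i.e. the 2.25x speedup."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    m1 = (d0 - d2) * g0
    m2 = (d1 + d2) * (g0 + g1 + g2) / 2
    m3 = (d2 - d1) * (g0 - g1 + g2) / 2
    m4 = (d1 - d3) * g2
    return [m1 + m2 + m3, m2 - m3 - m4]

def direct_conv3(d, g):
    """Reference: direct 3-tap valid convolution (6 multiplications)."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]

d, g = [1.0, 2.0, 3.0, 4.0], [1.0, 0.0, -1.0]
print(winograd_f23(d, g) == direct_conv3(d, g))  # True
```

The weight-side factors ((g0+g1+g2)/2 and (g0-g1+g2)/2) depend only on the filter, which is why converting weights to Winograd form can be done once, on the fly, as the article describes.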


The InferX X1 is scheduled to tape out in Q3 2019, with samples of chips and PCIe boards available shortly after. For more information, visit Flex Logix or email [email protected].
