UnifabriX Uses CXL To Improve HPC Performance
CXL promises to remake the way computing systems are architected. It runs on PCIe and can extend the memory attached to individual CPUs, but its biggest promise is in providing network-arbitrated memory pools that can allocate some higher-latency memory as required to CPUs or to software-defined virtual machines. CXL-based products are starting to appear on the market in 2023.
CXL looks set to remake data centers, but the advantages of higher-latency memory in high performance computing (HPC) applications have not been evident, at least until UnifabriX demonstrated bandwidth and capacity advantages with its CXL-based Smart Memory Node at the 2022 Supercomputing Conference (SC22). A just-released video shows UnifabriX demonstrations of these advantages for memory and storage HPC applications.
UnifabriX says that the product is based upon its Resource Processing Unit (RPU). The RPU is built into its CXL Smart Memory Node, shown below. This is a 2U rack-mounted server with serviceable EDSFF E3 media bays. The product holds up to 64TB of capacity in DDR5/DDR4 memory and NVMe SSDs.
The company says the product is compliant with CXL 1.1 and 2.0 and works on PCIe Gen5. It also says the product is CXL 3.0 ready and supports both PCIe Gen5 and CXL expansion. It also supports NVMe SSD access through CXL (SSD CXL over Memory). The product is meant for use in bare-metal and virtualized environments across a wide range of applications, including HPC, AI and databases.
As with other CXL products, the memory node offers expanded memory, but it can also provide higher performance. In particular, at the 2022 Supercomputing Conference (SC22) the HPCG performance benchmark was run both with the memory node and without it. The results are shown below.
For the conventional HPCG benchmark, performance initially increases roughly linearly with the number of CPU cores processing the benchmark. However, at about 50 CPU cores the performance flattens out, with no further improvement as more cores are added. By the time 100 cores are available, only about 50 of them are doing useful work. This is because there is no more memory bandwidth available.
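The plateau described above can be illustrated with a minimal roofline-style sketch. It is not the HPCG benchmark and does not use UnifabriX's measurements: the function name and every number in it (per-core throughput, memory bandwidth, bytes moved per floating-point operation) are hypothetical, chosen only so that the modeled plateau falls at 50 cores.

```python
def bandwidth_bound_perf(cores, per_core_gflops, mem_bw_gbs, bytes_per_flop):
    """Estimate sustained GFLOP/s as the lower of two ceilings.

    compute ceiling:   cores * per_core_gflops
    bandwidth ceiling: mem_bw_gbs / bytes_per_flop
    All inputs are illustrative assumptions, not measured values.
    """
    compute_limit = cores * per_core_gflops
    bandwidth_limit = mem_bw_gbs / bytes_per_flop
    return min(compute_limit, bandwidth_limit)


# Hypothetical machine: 1 GFLOP/s per core, 300 GB/s of direct-attached
# memory bandwidth, and 6 bytes of memory traffic per floating-point operation.
PER_CORE_GFLOPS = 1.0
MEM_BW_GBS = 300.0
BYTES_PER_FLOP = 6.0

if __name__ == "__main__":
    for cores in (25, 50, 75, 100):
        perf = bandwidth_bound_perf(cores, PER_CORE_GFLOPS, MEM_BW_GBS, BYTES_PER_FLOP)
        print(f"{cores:3d} cores -> {perf:.1f} GFLOP/s")
```

In this toy model the bandwidth ceiling is 300 / 6 = 50 GFLOP/s, so adding cores beyond 50 changes nothing, which mirrors the flattening seen in the benchmark.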
If the memory node is added to provide CXL memory in addition to the memory directly connected to the CPU cores, performance can continue to scale with core count. The memory node improves overall HPCG performance by moving lower-priority data from the CPU's near memory to the CXL far memory. This prevents the near memory from saturating and allows performance to keep scaling as more processor cores are added. As shown above, the memory node improved HPCG benchmark performance by more than 26%.
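Why offloading some traffic to far memory can raise the ceiling is easier to see with a small model. The sketch below assumes that near and far memory transfers proceed in parallel and that a fixed fraction of memory traffic is directed to the CXL tier. The function name, the parameters and all numbers are hypothetical. It does not describe how the UnifabriX RPU actually places data, and it is not meant to reproduce the measured 26% figure.

```python
import math


def tiered_perf(cores, per_core_gflops, near_bw_gbs, far_bw_gbs,
                bytes_per_flop, far_fraction):
    """Estimate sustained GFLOP/s with traffic split across two memory tiers.

    far_fraction is the share of memory traffic served by CXL far memory.
    Assumes near and far transfers overlap fully, so throughput is bounded
    by whichever resource (compute, near tier, far tier) saturates first.
    """
    if not 0.0 <= far_fraction <= 1.0:
        raise ValueError("far_fraction must be between 0 and 1")
    compute_limit = cores * per_core_gflops
    near_share = 1.0 - far_fraction
    # A tier that carries no traffic imposes no limit.
    near_limit = (math.inf if near_share == 0.0
                  else near_bw_gbs / (near_share * bytes_per_flop))
    far_limit = (math.inf if far_fraction == 0.0
                 else far_bw_gbs / (far_fraction * bytes_per_flop))
    return min(compute_limit, near_limit, far_limit)


if __name__ == "__main__":
    # Hypothetical: 100 cores at 1 GFLOP/s each, 300 GB/s near memory,
    # 100 GB/s CXL far memory, 6 bytes of traffic per floating-point operation.
    for fraction in (0.0, 0.2, 0.9):
        perf = tiered_perf(100, 1.0, 300.0, 100.0, 6.0, fraction)
        print(f"far_fraction={fraction:.1f} -> {perf:.1f} GFLOP/s")
```

With these made-up numbers, sending a modest share of traffic to the far tier relieves the near memory and lifts the ceiling, while sending too much makes the slower far tier the new bottleneck. That is consistent with moving only lower-priority data to CXL memory.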
The company has worked closely with Intel on its CXL solution, and Intel mentions these results, as well as other third-party testing, in its recent product brief about its Infrastructure Processing Units (IPUs) (Intel Agilex FPGA Accelerators Bring Improved TCO, Performance and Flexibility to 4th Gen Intel Xeon Platforms).
In addition to providing memory capacity and bandwidth improvements, the memory node can also provide NVMe SSD access through CXL. The company says that it plans to include memory, storage and networking through the CXL/PCIe interface, hence the name UnifabriX. With networking included as well, its boxes could replace top of rack (TOR) solutions as well as provide memory and storage access.
The UnifabriX memory node, using the company’s Resource Processing Unit, provides a path to overcome direct-connect DRAM bandwidth limitations in HPC applications using shared CXL memory.