Tencent’s tech team has optimized DeepSeek’s open-source DeepEP communication framework, boosting its performance across different network environments, according to the Chinese AI startup. Testing showed a 100% improvement on RoCE networks and a 30% gain on InfiniBand (IB), offering more efficient solutions for AI model training. On GitHub, DeepSeek acknowledged that the Chinese tech giant’s contribution had led to a “huge speedup.” DeepEP is a communication library tailored for mixture-of-experts (MoE) models and expert parallelism (EP), providing high-throughput, low-latency GPU kernels and supporting low-precision computing, including FP8. Tencent’s Starlink Networking team identified two main bottlenecks: underutilized bandwidth on dual-port NICs and latency introduced by CPU control. The gains on RoCE and IB followed from optimizations targeting these two bottlenecks. The enhanced framework is now fully open-source and has been deployed in training Tencent’s Hunyuan large model, where it has shown strong versatility in environments built on Tencent’s Starlink network and H20 servers, Chinese tech media outlet iThome reported. [iThome, in Chinese]