SenseNova 5.5: China's First Real-Time Multimodal AI Model

SenseTime has launched SenseNova 5.5, an upgraded version of its large language model (LLM) that incorporates SenseNova 5o, billed as China’s first real-time multimodal model.

SenseNova 5o marks a significant advancement in AI interaction, offering capabilities comparable to GPT-4o’s streaming interaction features. This enhancement enables users to interact with the model much like they would with a human, making it ideal for real-time conversation and speech recognition applications.

According to SenseTime, its latest model surpasses competitors across multiple benchmarks.

Dr. Xu Li, Chairman of the Board and CEO of SenseTime, remarked, “This year marks a pivotal moment for large models as they transition from unimodal to multimodal capabilities. Addressing user demands, SenseTime is dedicated to enhancing interactivity.

“Driven by practical applications and technological strides in multimodal streaming interactions, we anticipate unprecedented transformations in human-AI interactions.”

SenseTime’s upgraded SenseNova 5.5 demonstrates a 30% performance boost over its predecessor, SenseNova 5.0, released just two months earlier. Key improvements include enhanced mathematical reasoning, English proficiency, and instruction-following capabilities.

In a bid to democratize access to advanced AI tools, SenseTime introduced a cost-effective edge-side large model, reducing the cost per device to as low as RMB 9.90 ($1.36) per year, potentially accelerating adoption across various IoT devices.

Additionally, the company launched “Project $0 Go,” a free onboarding package for enterprise users migrating from OpenAI. The initiative includes a 50-million-token package and API migration consulting services, aimed at lowering barriers for businesses to adopt SenseNova.

SenseTime’s commitment to edge-side AI is exemplified in SenseChat Lite-5.5, which cuts inference time by 40% (to 0.19 seconds) and raises inference speed by 15% (to 90.2 words per second).
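
For readers who want a sense of the starting point, the figures above can be worked backward. The short Python sketch below assumes that both percentages are measured relative to the previous version's numbers, which the article does not state explicitly; the function names are illustrative only, and the results are rough implied baselines rather than figures published by SenseTime.

```python
def value_before_reduction(current: float, reduction: float) -> float:
    """Return the earlier value, given the value after a fractional reduction."""
    return current / (1.0 - reduction)


def value_before_increase(current: float, increase: float) -> float:
    """Return the earlier value, given the value after a fractional increase."""
    return current / (1.0 + increase)


# Figures quoted in the article for SenseChat Lite-5.5.
# Assumption: the percentages are relative to the previous version.
prior_inference_time_s = value_before_reduction(0.19, 0.40)
prior_speed_wps = value_before_increase(90.2, 0.15)

print(f"Implied earlier inference time: {prior_inference_time_s:.2f} s")
print(f"Implied earlier inference speed: {prior_speed_wps:.1f} words/s")
```

Under that assumption, the earlier model would have taken roughly 0.32 seconds and produced about 78 words per second.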

Expanding its AI applications, SenseTime introduced Vimi, a controllable AI avatar video generator enabling precise control over facial expressions and upper body movements from a single photo, advancing entertainment and interactive capabilities.

The SenseTime Raccoon Series, AI-native productivity tools, received upgrades: Code Raccoon boasts a five-fold faster response time and 10% greater coding accuracy, while Office Raccoon expands with a consumer-facing webpage and WeChat mini-app.

SenseTime’s large model technology is already transforming industries: in finance, enhancing compliance, marketing, and investment research; in agriculture, cutting material usage by 20% and boosting crop yields by 15%; and in cultural tourism, improving travel planning and booking efficiency.

With over 3,000 government and corporate customers across technology, healthcare, finance, and programming sectors, SenseTime solidifies its position as a leading AI innovator.

https://twitter.com/rowancheung/status/1810183314466037966