- Can doctors trust AI diagnostic tools enough to delegate tasks?
Towards physician-centered oversight of conversational diagnostic AI
- Can seeing the document like a human dramatically boost a RAG system's IQ?
Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding
- Can AI reconstruct super-slow-motion 4D models from regular-speed multi-camera video?
4DSloMo: 4D Reconstruction for High Speed Scene with Asynchronous Capture
- An embarrassingly simple defense against LLM abliteration attacks
Defending AI systems against a new form of attack
- Questioning the role of "chains of thought"
Beyond semantics: The unreasonable effectiveness of reasonless intermediate tokens
- Zero-shot voice cloning without transcription
MiniMax-Speech: Intrinsic zero-shot text-to-speech with a learnable speaker encoder
- X-Transfer Attacks
Towards super transferable adversarial attacks on CLIP
- Big companies are hacking AI's top leaderboard
Goodhart's Law comes for LLMs
- Using AI agents to make more realistic 3D scenes
Scenethesis is an agentic framework for 3D scene generation
- Recursively summarizing enables long-term dialog memory in LLMs
The challenge of long-term memory in AI conversations