We tackle the problem of localizing traffic surveillance cameras for cooperative perception. To overcome the lack of large-scale real-world intersection datasets, we introduce Carla Intersection, a new simulated dataset with 75 urban and rural intersections built in the CARLA simulator. We further introduce TrafficLoc, a novel neural network that localizes traffic cameras within a 3D reference map using a coarse-to-fine matching pipeline. For image-point cloud feature fusion, we propose a Geometry-guided Attention Loss that addresses cross-modal viewpoint inconsistencies. During coarse matching, we propose an Inter-Intra Contrastive Learning scheme that achieves precise alignment while preserving the distinctiveness of local intra-modal features within each image patch-point group pair. In addition, we introduce Dense Training Alignment with a soft-argmax operator so that additional features contribute when regressing the final position. Extensive experiments show that TrafficLoc improves localization accuracy over state-of-the-art image-to-point-cloud registration methods by a large margin (up to 86%) on Carla Intersection and generalizes well to real-world data. TrafficLoc also sets a new state of the art on the KITTI and nuScenes datasets, demonstrating strong localization ability across both in-vehicle and traffic cameras.
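As a minimal illustration of the soft-argmax idea mentioned above, the sketch below regresses a differentiable 2D position from a score map. It is not TrafficLoc's actual code; the function name `soft_argmax_2d`, the temperature value, and the map size are assumptions for demonstration only.

```python
# Illustrative sketch (not the authors' implementation): soft-argmax over a 2D
# score map, as one might use for dense training alignment when regressing a
# final position from matching scores.
import torch
import torch.nn.functional as F

def soft_argmax_2d(scores: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """scores: (B, H, W) matching scores; returns (B, 2) expected (x, y) coordinates."""
    b, h, w = scores.shape
    probs = F.softmax(scores.view(b, -1) / temperature, dim=-1).view(b, h, w)
    ys = torch.linspace(0, h - 1, h, device=scores.device)
    xs = torch.linspace(0, w - 1, w, device=scores.device)
    # Expected coordinates under the softmax distribution (fully differentiable).
    exp_y = (probs.sum(dim=2) * ys).sum(dim=1)
    exp_x = (probs.sum(dim=1) * xs).sum(dim=1)
    return torch.stack([exp_x, exp_y], dim=-1)

scores = torch.randn(4, 32, 32)        # e.g. image-patch / point-group similarity scores
print(soft_argmax_2d(scores).shape)    # torch.Size([4, 2])
```

Unlike a hard argmax, the expectation over the softmax distribution lets gradients flow from the regressed position back to all score entries, which is what allows additional features to influence training.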
LiDAR place recognition is a critical capability for autonomous navigation and cross-modal localization in large-scale outdoor environments. Existing approaches predominantly depend on pre-built dense 3D maps or aerial imagery, which impose significant storage overhead and lack real-time adaptability. In this paper, we propose OPAL, a novel network for LiDAR place recognition that leverages OpenStreetMap (OSM) as a lightweight and up-to-date prior. Our key innovation lies in bridging the domain gap between sparse LiDAR scans and structured OSM data through two carefully designed components: a cross-modal visibility mask that identifies maximal observable regions from both modalities to guide feature learning, and an adaptive radial fusion module that dynamically consolidates multiscale radial features into discriminative global descriptors. Extensive experiments on the augmented KITTI and KITTI-360 datasets demonstrate OPAL's superiority, achieving 15.98% higher top-1 recall at a 1 m threshold while running 12x faster at inference than state-of-the-art approaches.
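The sketch below shows one plausible way to fuse per-ring radial features into a single global descriptor with learned weights. It is a hedged approximation of the adaptive radial fusion idea, not OPAL's released module; the class name, feature dimension, and weight head are assumptions.

```python
# Hypothetical sketch of adaptive radial fusion: learn a weight per radial bin
# and pool the weighted bin features into one L2-normalized global descriptor.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveRadialFusion(nn.Module):
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        # Predict one fusion weight per radial ring from its own feature.
        self.weight_head = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, radial_feats: torch.Tensor) -> torch.Tensor:
        """radial_feats: (B, R, D) features for R radial bins; returns (B, D) descriptor."""
        w = torch.softmax(self.weight_head(radial_feats), dim=1)   # (B, R, 1) fusion weights
        desc = (w * radial_feats).sum(dim=1)                       # weighted sum over rings
        return F.normalize(desc, dim=-1)                           # unit-norm global descriptor

fusion = AdaptiveRadialFusion()
print(fusion(torch.randn(2, 8, 256)).shape)   # torch.Size([2, 256])
```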
Multimodal Large Language Models (MLLMs) achieve impressive performance when trained on massive datasets. Such datasets often contain sensitive or copyrighted content, raising significant data privacy concerns. Regulatory frameworks mandating the 'right to be forgotten' drive the need for machine unlearning, which removes target data without resource-intensive retraining. However, while well studied for text, visual concept unlearning in MLLMs remains underexplored. A primary challenge is removing a target visual concept precisely without disrupting model performance on related entities. To address this, we introduce AUVIC, a novel visual concept unlearning framework for MLLMs. AUVIC applies adversarial perturbations to enable precise forgetting, effectively isolating the target concept while avoiding unintended effects on similar entities. To evaluate our method, we construct VCUBench, the first benchmark designed to assess visual concept unlearning in group contexts. Experimental results demonstrate that AUVIC achieves state-of-the-art target forgetting rates while incurring minimal performance degradation on non-target concepts.
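To make the notion of an adversarial perturbation concrete, the sketch below shows a generic FGSM-style step on an image tensor. This only conveys the flavour of perturbation-based forgetting; it is not AUVIC's algorithm, and the model, loss, and epsilon value are placeholders.

```python
# Generic FGSM-style adversarial perturbation (illustrative only, not AUVIC):
# nudge the input in the direction that increases the model's loss.
import torch

def fgsm_perturb(model, image: torch.Tensor, target: torch.Tensor,
                 loss_fn, epsilon: float = 8 / 255) -> torch.Tensor:
    """Return image + epsilon * sign(grad of loss w.r.t. the image), clamped to [0, 1]."""
    image = image.clone().detach().requires_grad_(True)
    loss = loss_fn(model(image), target)
    loss.backward()
    perturbed = image + epsilon * image.grad.sign()
    return perturbed.clamp(0.0, 1.0).detach()

# Tiny dummy usage with a placeholder classifier.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
img, lbl = torch.rand(1, 3, 32, 32), torch.tensor([3])
adv = fgsm_perturb(model, img, lbl, torch.nn.functional.cross_entropy)
print(adv.shape)   # torch.Size([1, 3, 32, 32])
```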
This paper proposes SparseAlign, a fully sparse framework for cooperative object detection. By relying on sparse feature representations and alignment mechanisms, the framework achieves efficient multi-agent cooperative perception, significantly reducing communication overhead while maintaining detection accuracy and providing a novel solution for cooperative perception in autonomous driving and intelligent transportation systems.
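The sketch below illustrates, under assumed interfaces, why sparse feature sharing reduces communication: each agent transmits only its top-k most confident features and their coordinates instead of a dense feature map. This is a generic illustration, not SparseAlign's actual design.

```python
# Hypothetical sketch of sparse feature sharing between cooperating agents:
# keep only the k highest-scoring sparse features (plus coordinates) to transmit.
import torch

def select_sparse_features(feats: torch.Tensor, coords: torch.Tensor,
                           scores: torch.Tensor, k: int = 128):
    """feats: (N, D), coords: (N, 3), scores: (N,). Keep the k highest-scoring features."""
    k = min(k, feats.shape[0])
    idx = torch.topk(scores, k).indices
    return feats[idx], coords[idx]

feats, coords, scores = torch.randn(1024, 64), torch.rand(1024, 3), torch.rand(1024)
kept_feats, kept_coords = select_sparse_features(feats, coords, scores, k=128)
print(kept_feats.shape, kept_coords.shape)   # torch.Size([128, 64]) torch.Size([128, 3])
```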
This paper investigates the problem of localizing events in videos using multimodal queries. We propose a novel approach that handles queries from multiple modalities, including text, images, and audio, to accurately localize the relevant events in a video. Our method achieves state-of-the-art performance on multiple benchmark datasets, contributing to both video understanding and retrieval.
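As a minimal sketch of multimodal-query event localization, the code below fuses query embeddings from several modalities by averaging and scores each video clip by cosine similarity. The encoders, fusion rule, and dimensions are assumptions, not the paper's model.

```python
# Illustrative sketch: fuse multimodal query embeddings and pick the video clip
# with the highest cosine similarity as the localized event.
import torch
import torch.nn.functional as F

def localize(clip_feats: torch.Tensor, query_feats: list) -> int:
    """clip_feats: (T, D) per-clip features; query_feats: list of (D,) query embeddings."""
    query = F.normalize(torch.stack(query_feats).mean(dim=0), dim=-1)  # simple mean fusion
    clips = F.normalize(clip_feats, dim=-1)
    sims = clips @ query                 # (T,) cosine similarity per clip
    return int(sims.argmax())            # index of the best-matching clip

clip_feats = torch.randn(100, 512)
text_q, image_q = torch.randn(512), torch.randn(512)
print(localize(clip_feats, [text_q, image_q]))
```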
This paper presents Text2Loc, a novel method for 3D point cloud localization from natural language descriptions. We design a hierarchical Transformer architecture that understands the language description and accurately localizes the described target position in large-scale 3D point cloud scenes. Our method achieves excellent performance on multiple indoor and outdoor datasets, opening new research directions for language-based 3D scene understanding.
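The sketch below shows a coarse text-to-submap retrieval step in the spirit of a hierarchical, coarse-to-fine language-based localization pipeline. It is not Text2Loc's actual architecture; the embedding dimensions and function name are assumptions.

```python
# Hedged sketch of coarse retrieval: match a text embedding against per-cell
# map descriptors and return the top-k candidate cells for a fine stage.
import torch
import torch.nn.functional as F

def retrieve_top_k(text_emb: torch.Tensor, cell_embs: torch.Tensor, k: int = 5):
    """text_emb: (D,) encoded description; cell_embs: (M, D) submap descriptors."""
    sims = F.normalize(cell_embs, dim=-1) @ F.normalize(text_emb, dim=-1)   # (M,)
    return torch.topk(sims, k).indices   # candidate cells for fine localization

cell_embs = torch.randn(2000, 256)       # one descriptor per map cell
text_emb = torch.randn(256)              # encoded language description
print(retrieve_top_k(text_emb, cell_embs))
```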
This paper proposes CASSPR, a cross-attention single-scan place recognition method. CASSPR employs an innovative cross-attention architecture that enables efficient place recognition from a single LiDAR scan, significantly improving recognition accuracy and robustness in complex environments. Our method achieves state-of-the-art performance on multiple benchmark datasets, making it well suited for robot navigation and SLAM systems.
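To illustrate the cross-attention mechanism in general terms, the block below lets query features from one branch attend to key/value features from another. The module structure is an assumption for illustration, not CASSPR's exact design.

```python
# Generic cross-attention block: queries from one feature stream attend to
# keys/values from another, followed by a residual connection and LayerNorm.
import torch
import torch.nn as nn

class CrossAttentionBlock(nn.Module):
    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x_q: torch.Tensor, x_kv: torch.Tensor) -> torch.Tensor:
        """x_q: (B, Nq, D) query features; x_kv: (B, Nk, D) features from the other branch."""
        attended, _ = self.attn(x_q, x_kv, x_kv)
        return self.norm(x_q + attended)   # residual + normalization

block = CrossAttentionBlock()
out = block(torch.randn(2, 128, 256), torch.randn(2, 512, 256))
print(out.shape)   # torch.Size([2, 128, 256])
```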
This paper proposes SOE-Net, a self-attention and orientation encoding network for point cloud-based place recognition. The method captures the geometric structure of point clouds through an innovative orientation encoding technique and combines it with self-attention to learn long-range dependencies between points, significantly improving point cloud-based place recognition accuracy. SOE-Net achieves excellent performance on multiple benchmark datasets, making it well suited for mobile robot localization and navigation.
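The sketch below shows a point-wise self-attention unit that captures long-range dependencies between per-point features. It mirrors the general idea only and is not SOE-Net's exact orientation encoding or attention design; dimensions and names are assumptions.

```python
# Hedged sketch: multi-head self-attention over per-point features so every
# point can aggregate information from all other points (long-range context).
import torch
import torch.nn as nn

class PointSelfAttention(nn.Module):
    def __init__(self, dim: int = 128, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, point_feats: torch.Tensor) -> torch.Tensor:
        """point_feats: (B, N, D) per-point features; returns attended features."""
        attended, _ = self.attn(point_feats, point_feats, point_feats)
        return point_feats + attended      # residual connection

sa = PointSelfAttention()
print(sa(torch.randn(2, 1024, 128)).shape)   # torch.Size([2, 1024, 128])
```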