2025

SparseAlign: A Fully Sparse Framework for Cooperative Object Detection

Yunshuang Yuan, Yan Xia†, Daniel Cremers, Monika Sester

CVPR 2025

This paper proposes SparseAlign, a fully sparse framework for cooperative object detection. The framework achieves efficient multi-agent cooperative perception through sparse feature representation and alignment mechanisms. Our method significantly reduces communication overhead while maintaining detection accuracy, providing a novel solution for cooperative perception in autonomous driving and intelligent transportation systems.
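The communication saving that motivates sparse features can be illustrated with a toy example: instead of broadcasting a dense BEV feature map, an agent transmits only its strongest activations as (index, value) pairs. This is a hedged sketch of the general idea only, not SparseAlign's actual encoding; the function names and the 5% keep ratio are illustrative assumptions.

```python
import numpy as np

def sparsify(feat, keep=0.05):
    """Keep only the strongest activations of a dense feature map,
    so (indices, values) can be transmitted instead of the full tensor.
    Toy illustration of sparse-feature communication, not the paper's method."""
    flat = feat.ravel()
    k = max(1, int(keep * flat.size))
    idx = np.argpartition(np.abs(flat), -k)[-k:]   # top-k entries by magnitude
    return idx.astype(np.int32), flat[idx]

def densify(idx, vals, shape):
    """Receiver side: scatter the transmitted values back into a dense map."""
    out = np.zeros(int(np.prod(shape)), dtype=vals.dtype)
    out[idx] = vals
    return out.reshape(shape)

bev = np.random.default_rng(2).normal(size=(128, 128))  # hypothetical dense BEV map
idx, vals = sparsify(bev, keep=0.05)
ratio = (idx.nbytes + vals.nbytes) / bev.nbytes         # transmitted payload fraction
print(f"payload ratio: {ratio:.3f}")
```

Even this naive top-k scheme shrinks the payload to well under a tenth of the dense map, which is why fully sparse pipelines are attractive for bandwidth-limited vehicle-to-vehicle links.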

Cooperative Object Detection, Fully Sparse Framework, Autonomous Driving


Localizing Events in Videos with Multimodal Queries

Gengyuan Zhang, Mang Ling Ada Fok, Jialu Ma, Yan Xia†, Daniel Cremers, Philip Torr, Volker Tresp, Jindong Gu

CVPR 2025

This paper investigates the problem of localizing events in videos using multimodal queries. We propose a novel approach that can handle queries from multiple modalities including text, images, and audio to accurately localize relevant events in videos. Our method achieves state-of-the-art performance on multiple benchmark datasets, providing important contributions to the fields of video understanding and retrieval.

Multimodal Queries, Video Event Localization, Video Understanding
-

2024

Text2Loc: 3D Point Cloud Localization from Natural Language

Yan Xia*†, Letian Shi*, Zifeng Ding, João F. Henriques, Daniel Cremers

CVPR 2024

This paper presents Text2Loc, a novel method for 3D point cloud localization from natural language descriptions. We design a hierarchical Transformer architecture that can understand natural language descriptions and accurately localize target positions in large-scale 3D point cloud scenes. Our method achieves excellent performance on multiple indoor and outdoor datasets, opening new research directions for language-based 3D scene understanding.
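The coarse stage of language-based localization can be sketched as ranking pre-computed submap ("cell") embeddings by similarity to a text embedding, before any fine refinement. The sketch below assumes embeddings are already given; it is an illustrative retrieval baseline, not Text2Loc's hierarchical Transformer.

```python
import numpy as np

def retrieve_cells(text_emb, cell_embs, top_k=3):
    """Rank map cells by cosine similarity to a text-query embedding.
    Hedged sketch of coarse text-to-map retrieval; embeddings are assumed."""
    t = text_emb / np.linalg.norm(text_emb)
    c = cell_embs / np.linalg.norm(cell_embs, axis=1, keepdims=True)
    sims = c @ t                       # cosine similarity to every cell
    order = np.argsort(-sims)[:top_k]  # best-matching cells first
    return order, sims[order]

rng = np.random.default_rng(3)
cells = rng.normal(size=(100, 64))              # hypothetical per-cell embeddings
query = cells[42] + 0.1 * rng.normal(size=64)   # noisy copy of cell 42's embedding
top, sims = retrieve_cells(query, cells, top_k=3)
print(top[0])  # cell 42 ranks first
```

A fine-localization stage would then regress a precise position inside the retrieved cells, which is where a hierarchical architecture earns its keep.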

3D Point Cloud Localization, Hierarchical Transformer, Natural Language Processing

2023

CASSPR: Cross Attention Single Scan Place Recognition

Yan Xia*†, Mariia Gladkova*, Rui Wang, Qianyun Li, Uwe Stilla, João F. Henriques, Daniel Cremers

ICCV 2023

This paper proposes CASSPR, a cross-attention method for single-scan place recognition. Its cross-attention architecture enables efficient place recognition from a single LiDAR scan, significantly improving accuracy and robustness in complex environments. The method achieves state-of-the-art performance on multiple benchmark datasets and provides a useful building block for robot navigation and SLAM systems.
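The core operation named in the abstract, cross-attention, lets one set of features query another. The sketch below shows generic scaled dot-product cross-attention between two hypothetical feature branches; it is a minimal illustration of the mechanism, not CASSPR's architecture, and all shapes and names are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # numerically stable
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention.

    queries: (Nq, d) features from one branch
    keys/values: (Nk, d) features from the other branch
    Returns (Nq, d) features, each a key-similarity-weighted mix of values.
    """
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)   # (Nq, Nk) pairwise similarities
    weights = softmax(scores, axis=-1)       # each query attends over all keys
    return weights @ values

rng = np.random.default_rng(0)
point_feats = rng.normal(size=(8, 32))   # hypothetical per-point features
voxel_feats = rng.normal(size=(16, 32))  # hypothetical second-branch features
fused = cross_attention(point_feats, voxel_feats, voxel_feats)
print(fused.shape)  # (8, 32)
```

The point of the cross (rather than self) variant is that information flows between two different representations of the same scan, letting each compensate for the other's weaknesses.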

Place Recognition, Cross-Attention, LiDAR, Robot Navigation

2021

SOE-net: A self-attention and orientation encoding network for point cloud based place recognition

Yan Xia, Yusheng Xu, Shuang Li, Rui Wang, Juan Du, Daniel Cremers, Uwe Stilla

CVPR 2021

This paper proposes SOE-net, a self-attention and orientation encoding network for point-cloud-based place recognition. The method captures the local geometric structure of point clouds through an orientation encoding technique and uses self-attention to learn long-range dependencies between points, significantly improving place recognition accuracy. SOE-net achieves excellent performance on multiple benchmark datasets and supports mobile robot localization and navigation.
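Why orientation encoding helps can be seen with a toy descriptor: pooling neighbor offsets separately per octant keeps directional information that pooling over all neighbors at once would discard. The code below is a deliberately simplified illustration of that idea, not SOE-net's PointOE module; the neighborhood size and octant scheme are assumptions.

```python
import numpy as np

def orientation_encoding(points, k=8):
    """Toy orientation-aware point descriptor (NOT the paper's module):
    for each point, average the offsets to its k nearest neighbors
    separately in each of the eight octants of its local frame, so the
    descriptor retains which directions its neighbors lie in."""
    n = len(points)
    desc = np.zeros((n, 8, 3))
    for i, p in enumerate(points):
        offsets = points - p                      # vectors to all points
        dist = np.linalg.norm(offsets, axis=1)
        nbr = offsets[np.argsort(dist)[1:k + 1]]  # k nearest, skipping self
        octant = ((nbr[:, 0] > 0).astype(int)     # octant index from sign bits
                  + 2 * (nbr[:, 1] > 0)
                  + 4 * (nbr[:, 2] > 0))
        for o in range(8):
            sel = nbr[octant == o]
            if len(sel):
                desc[i, o] = sel.mean(axis=0)     # mean offset per octant
    return desc.reshape(n, 24)

pts = np.random.default_rng(1).normal(size=(64, 3))  # hypothetical scan points
desc = orientation_encoding(pts)
print(desc.shape)  # (64, 24)
```

A self-attention layer applied on top of such per-point descriptors would then mix information across distant points, which is the combination the abstract describes.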

Place Recognition, Self-Attention, Orientation Encoding, Point Cloud Processing