This paper proposes SparseAlign, a fully sparse framework for cooperative object detection. The framework enables efficient multi-agent cooperative perception through sparse feature representations and feature alignment mechanisms. It significantly reduces communication overhead while maintaining detection accuracy, which makes it a practical option for cooperative perception in autonomous driving and intelligent transportation systems.
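To make the communication argument concrete, the following minimal Python sketch compares the payload of a dense bird's-eye-view (BEV) feature grid with that of a sparse set of occupied cells. It illustrates the general idea under stated assumptions and is not SparseAlign's implementation: the BEV grid layout, the (row, col) coordinate encoding, the byte sizes, and the helper names `sparsify`, `dense_payload_bytes`, and `sparse_payload_bytes` are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical sparse BEV feature map: only occupied cells are stored,
# keyed by their (row, col) grid coordinates.
SparseFeatures = Dict[Tuple[int, int], List[float]]


def dense_payload_bytes(height: int, width: int, channels: int,
                        bytes_per_value: int = 4) -> int:
    """Bytes needed to transmit a full dense H x W x C feature grid."""
    return height * width * channels * bytes_per_value


def sparse_payload_bytes(features: SparseFeatures, bytes_per_value: int = 4,
                         bytes_per_coord: int = 2) -> int:
    """Bytes needed to transmit only occupied cells: 2 coordinates plus C values each."""
    total = 0
    for _coord, vec in features.items():
        total += 2 * bytes_per_coord + len(vec) * bytes_per_value
    return total


def sparsify(dense: List[List[List[float]]], threshold: float = 0.0) -> SparseFeatures:
    """Keep only cells whose feature vector has some |value| above the threshold."""
    out: SparseFeatures = {}
    for r, row in enumerate(dense):
        for c, vec in enumerate(row):
            if any(abs(v) > threshold for v in vec):
                out[(r, c)] = list(vec)
    return out


if __name__ == "__main__":
    H, W, C = 100, 100, 64
    grid = [[[0.0] * C for _ in range(W)] for _ in range(H)]
    for i in range(200):  # mark 200 distinct cells as occupied
        grid[i // W][i % W][0] = 1.0
    sparse = sparsify(grid)
    print(dense_payload_bytes(H, W, C))  # 2560000
    print(sparse_payload_bytes(sparse))  # 52000
```

With 200 occupied cells out of 10,000, the sparse message is about 2% of the dense one; real savings depend on scene occupancy and on how an actual system encodes its features.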
This paper studies the problem of localizing events in videos using multimodal queries. We propose an approach that accepts queries in several modalities, including text, images, and audio, and accurately localizes the corresponding events in a video. The method achieves state-of-the-art performance on multiple benchmark datasets, contributing to the fields of video understanding and retrieval.
This paper presents Text2Loc, a method for localizing positions in 3D point clouds from natural language descriptions. We design a hierarchical Transformer architecture that interprets the description and accurately localizes the target position within large-scale 3D point cloud scenes. The method performs strongly on multiple indoor and outdoor datasets, opening new research directions for language-based 3D scene understanding.
This paper proposes CASSPR (Cross Attention Single Scan Place Recognition), a place recognition method built on cross-attention. Its cross-attention architecture enables efficient place recognition from a single LiDAR scan, significantly improving recognition accuracy and robustness in complex environments. CASSPR achieves state-of-the-art performance on multiple benchmark datasets and provides technical support for robotic navigation and SLAM systems.
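The core operation named here can be illustrated with generic scaled dot-product cross-attention, in which tokens from one feature set query the tokens of another. This is a minimal pure-Python sketch, assuming two token sets extracted from the same scan (called `point_tokens` and `voxel_tokens` here purely for illustration); it omits the learned projections, multiple heads, and other details of CASSPR's actual architecture, which the summary above does not specify.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention: queries come from one set, keys/values from another."""
    d = len(keys[0])
    scale = 1.0 / math.sqrt(d)
    out: Matrix = []
    for q in queries:
        # Similarity of this query to every key in the other set.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the other set's value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


# Hypothetical token sets from two branches of the same LiDAR scan.
point_tokens = [[1.0, 0.0], [0.0, 1.0]]
voxel_tokens = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
fused = cross_attention(point_tokens, voxel_tokens, voxel_tokens)
```

Because each query's weights come from a softmax over the other set's keys, every output row is a convex combination of that set's value vectors, so information flows from one representation into the other.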
This paper proposes SOE-Net, a self-attention and orientation encoding network for point-cloud-based place recognition. The method captures the geometric structure of point clouds through orientation encoding and uses self-attention to learn long-range dependencies between points, significantly improving place recognition accuracy. SOE-Net achieves strong performance on multiple benchmark datasets, providing technical support for mobile robot localization and navigation.
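One common way to realize orientation encoding, used in PointSIFT-style modules, is to search for the nearest neighbor of a point separately in each of the eight octants around it, so that its local descriptor covers all directions instead of clustering on one side. The sketch below assumes that octant scheme; SOE-Net's exact encoding module, the search radius, and the function names `octant_index` and `orientation_neighbors` are assumptions for illustration, and the self-attention stage is not shown.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]


def octant_index(dx: float, dy: float, dz: float) -> int:
    """Map an offset to one of 8 octants (0..7) using the signs of its components."""
    return int(dx >= 0) * 4 + int(dy >= 0) * 2 + int(dz >= 0)


def orientation_neighbors(points: List[Point], center_idx: int,
                          radius: float) -> List[Optional[int]]:
    """Assumed PointSIFT-style encoding (hypothetical helper): for one center point,
    return the index of the nearest point in each octant within `radius`,
    or None if that octant is empty."""
    cx, cy, cz = points[center_idx]
    best: List[Optional[int]] = [None] * 8
    best_dist = [math.inf] * 8
    for i, (x, y, z) in enumerate(points):
        if i == center_idx:
            continue
        dx, dy, dz = x - cx, y - cy, z - cz
        dist = math.sqrt(dx * dx + dy * dy + dz * dz)
        if dist > radius:
            continue
        o = octant_index(dx, dy, dz)
        if dist < best_dist[o]:  # keep only the closest point per octant
            best_dist[o] = dist
            best[o] = i
    return best
```

The eight neighbor indices (or their features) would then be fed to a learned layer; collecting one neighbor per octant is what makes the encoding orientation-aware rather than purely distance-based.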