BrepLLM: Enabling Large Language Models to Understand Boundary Representations

Yilei Shi

Abstract

Current token-sequence-based Large Language Models (LLMs) struggle to directly process 3D Boundary Representation (B-rep) models, which contain complex geometric and topological information. To address this, we propose BrepLLM, the first multimodal framework that enables LLMs to directly parse and reason over raw B-rep data. BrepLLM adopts a two-stage training pipeline: cross-modal alignment pre-training followed by two-stage LLM fine-tuning. In the first stage, we design an adaptive UV sampling strategy that converts B-reps into graph representations integrating geometric and topological information. We then construct a hierarchical BrepEncoder that extracts features from geometric elements (faces and edges) and from topology, producing a global token and a sequence of node tokens. Via contrastive learning, we perform an initial alignment between the global token and the text embeddings of a frozen CLIP text encoder (ViT-L/14). In the second stage, we integrate the pre-trained BrepEncoder into the LLM and align the sequence of node tokens with a two-stage progressive strategy: (1) training an MLP-based semantic mapping network that uses the prior knowledge of a 2D-VLM to align the B-rep representation with the 2D visual semantic space; (2) applying LoRA for parameter-efficient fine-tuning of the Q-Former and the LLM backbone to achieve the final 3D-language generation capability. Furthermore, we construct the Brep2Text dataset, which contains 269,444 B-rep-text question-answer pairs. Experiments demonstrate that BrepLLM achieves state-of-the-art (SOTA) performance on 3D object classification and captioning tasks. The project page is available at https://user-deng.github.io/BrepLLM/.
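
To make the first-stage alignment concrete, the following is a minimal sketch of a contrastive objective between a batch of B-rep global tokens and the matching text embeddings. The abstract only states that contrastive learning is used. The symmetric CLIP-style form, the temperature of 0.07, and the function names `l2_normalize`, `log_softmax`, and `symmetric_contrastive_loss` are assumptions made for illustration and may differ from the authors' implementation.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def symmetric_contrastive_loss(brep_global, text_emb, temperature=0.07):
    """Hypothetical CLIP-style loss; row i of each input is a matching pair.

    brep_global: (batch, dim) global tokens from the B-rep encoder.
    text_emb:    (batch, dim) embeddings from the frozen text encoder.
    temperature: assumed value, not taken from the paper.
    """
    b = l2_normalize(brep_global)
    t = l2_normalize(text_emb)
    logits = b @ t.T / temperature           # (batch, batch) similarity matrix
    idx = np.arange(logits.shape[0])         # positives lie on the diagonal
    loss_b2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # B-rep -> text
    loss_t2b = -log_softmax(logits, axis=0)[idx, idx].mean()  # text -> B-rep
    return 0.5 * (loss_b2t + loss_t2b)

# Usage with random stand-in embeddings (no real B-rep or CLIP features here).
rng = np.random.default_rng(0)
brep_tokens = rng.normal(size=(8, 16))
text_tokens = brep_tokens + 0.1 * rng.normal(size=(8, 16))  # roughly aligned pairs
print(symmetric_contrastive_loss(brep_tokens, text_tokens))
```

In a setup like the one described, only the B-rep encoder would receive gradients from this loss, since the text encoder is frozen. The NumPy version above computes the loss value only and is not a training loop.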
