PhysGraph: Physically-Grounded Graph-Transformer Policies for Bimanual Dexterous Hand–Tool–Object Manipulation

UC San Diego

Abstract

Bimanual dexterous manipulation, particularly involving complex tool use, remains a formidable challenge in embodied AI due to the high-dimensional state space and sophisticated contact dynamics required to coordinate multi-fingered hands. Existing state-of-the-art (SOTA) methods typically rely on global policies that treat the system state as a flattened vector, thereby discarding the rich structural and topological information inherent to articulated hands. To address this, we present PhysGraph, a novel physically-grounded graph-transformer policy designed explicitly for challenging bimanual hand-tool-object manipulation. Unlike prior works, we formulate the bimanual system as a kinematic graph and introduce a per-link tokenization strategy that preserves fine-grained local state information. Crucially, we propose a physically-grounded bias generator that injects learning-based structural priors—including kinematic spatial distance, dynamic contact states, geometric proximity, and anatomical properties—directly into the attention mechanism. This allows the policy to explicitly reason about physical connectivity and interaction logic rather than learning it implicitly from sparse rewards. Extensive experiments on the OakInk2 dataset demonstrate that PhysGraph significantly outperforms baselines in manipulation precision and task success rate. Furthermore, the inherent topological flexibility of our architecture enables zero-shot generalization to unseen tool/object geometries and embodiment-agnostic deployment across diverse robotic hands (Shadow, Allegro, Inspire).
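As a rough sketch of the core mechanism described above (with made-up shapes and names, not the paper's implementation), the physically-grounded bias enters as an additive term on the pre-softmax attention logits:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def biased_attention(Q, K, V, phys_bias):
    """Scaled dot-product attention with an additive physical bias.

    Q, K, V: (num_tokens, d) arrays; phys_bias is (num_tokens, num_tokens),
    standing in for a learned function of kinematic distance, contact state,
    geometric proximity, and anatomy.
    """
    d = Q.shape[-1]
    logits = Q @ K.T / np.sqrt(d) + phys_bias  # bias injected before softmax
    return softmax(logits) @ V

# Toy check: a large negative bias effectively masks token 1 for every query.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(3, 4)) for _ in range(3))
bias = np.zeros((3, 3))
bias[:, 1] = -1e9
out = biased_attention(Q, K, V, bias)
```

Because the bias is additive, a sufficiently negative entry acts as a hard mask while moderate values softly re-weight attention toward physically related links.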

Contributions

  • We propose PhysGraph, the first graph-transformer policy for challenging high-DoF bimanual dexterous tool use, which explicitly models hand-tool-object interactions as a dynamic kinematic graph and processes per-link tokenized multi-modal observations.
  • We introduce a novel Physically-Grounded Bias Generator that injects learning-based structural priors into transformer attention—including spatial/topological bias, edge-type bias, geometric proximity bias, and anatomical priors via head-specific masking—enabling the policy to learn precise, physically plausible tool-use manipulation.
  • Extensive experiments on challenging bimanual tool-use tasks demonstrate that PhysGraph significantly outperforms SOTA baselines in success rate and motion fidelity, supports zero-shot generalization to unseen tools and objects across tasks, and is embodiment-agnostic across popular robotic dexterous hands (Shadow, Allegro, Inspire).
Overview

(a) Physical Graph & Tokenization: The bimanual workspace is modeled as a kinematic graph where nodes represent links of the left/right hands, tools, and objects. Nodes are connected by static edges (bones) and dynamic edges (contact). State-based multi-modal observations for each link are processed into parallel input tokens. (b) Physically-Grounded Bias Generator: This module computes four distinct biases, which are aggregated into a composite bias matrix. These biases are applied via Head-Specific Masking, allowing different attention heads to focus on specific physical relationships. (c) Graph Transformer Encoder: The tokenized inputs are processed by the Transformer encoder, where the Multi-Head Attention (MHA) is modulated by the bias generated in (b). (d) Output Heads: The globally encoded [POL] token is passed to MLP heads to predict the policy action distribution and value function.
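Steps (a) and (b) above can be sketched as follows; all link names, feature sizes, and channel-to-head assignments are hypothetical placeholders, not the paper's actual configuration:

```python
import numpy as np

# (a) Per-link tokenization: each graph node (hand link, tool, object)
# becomes one input token. Link list and feature size are illustrative only.
links = ["L_palm", "L_thumb_tip", "R_palm", "R_index_tip", "tool", "object"]
N, feat_dim = len(links), 8
rng = np.random.default_rng(0)
tokens = rng.normal(size=(N, feat_dim))  # parallel per-link input tokens

# (b) Four bias channels, one (N, N) matrix each, standing in for the learned
# kinematic-distance, contact, geometric-proximity, and anatomical biases.
channels = {name: rng.normal(size=(N, N))
            for name in ("kinematic", "contact", "proximity", "anatomy")}

# Head-specific masking: each attention head sums only its assigned channels,
# so different heads specialize in different physical relationships.
head_channels = {0: ("kinematic",), 1: ("contact", "proximity"), 2: ("anatomy",)}

def composite_bias(head):
    """Aggregate the bias channels visible to one attention head."""
    return sum(channels[c] for c in head_channels[head])

per_head_bias = np.stack([composite_bias(h) for h in range(3)])  # (heads, N, N)
```

In (c), each head's `(N, N)` slice would then be added to that head's attention logits before the softmax, as in standard additive-bias attention.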

Results

Bimanual Tool-Use Tasks

Zero-Shot Policy Generalization

Embodiment-Agnostic Validation

BibTeX

@misc{physgraph,
      title={PhysGraph: Physically-Grounded Graph-Transformer Policies for Bimanual Dexterous Hand-Tool-Object Manipulation},
      author={Runfa Blark Li and David Kim and Xinshuang Liu and Keito Suzuki and Dwait Bhatt and Nikola Raicevic and Xin Lin and Ki Myung Brian Lee and Nikolay Atanasov and Truong Nguyen},
      year={2026},
      eprint={2603.01436},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2603.01436},
}