OpenVLA OFT

Open Vision-Language-Action models with fine-tuning for robotic manipulation

Overview

OpenVLA (Open Vision-Language-Action) is an open-source model for robotic manipulation. It takes camera images and a natural-language instruction as input and predicts robot actions as output. OFT is a fine-tuning recipe that efficiently adapts the pretrained model to specific tasks.
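
To make the vision, language, and action pieces concrete, here is a minimal sketch of the input/output contract of a vision-language-action policy. It is a toy stand-in, not OpenVLA's API: the names `Observation`, `ToyVLAPolicy`, and `predict_action` are hypothetical, the "encoders" are placeholder arithmetic rather than neural networks, and the 7-dimensional action (for example, a 6-DoF end-effector delta plus a gripper command) is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Observation:
    """Hypothetical container for one policy input (not an OpenVLA type)."""
    image: List[List[float]]  # grayscale image, rows of intensities in [0, 1]
    instruction: str          # natural-language task description


class ToyVLAPolicy:
    """Stand-in policy that only illustrates the image + text -> action contract."""

    def __init__(self, action_dim: int = 7) -> None:
        # Assumed action size; a real robot setup defines its own action space.
        self.action_dim = action_dim

    def encode_vision(self, image: List[List[float]]) -> float:
        # Placeholder for a vision encoder: mean pixel intensity.
        pixels = [p for row in image for p in row]
        return sum(pixels) / len(pixels)

    def encode_language(self, instruction: str) -> float:
        # Placeholder for a language encoder: word count scaled into [0, 1].
        return min(len(instruction.split()), 10) / 10.0

    def predict_action(self, obs: Observation) -> List[float]:
        # Placeholder for an action head: combine both features, then clip
        # every action dimension to the range [-1, 1].
        v = self.encode_vision(obs.image)
        t = self.encode_language(obs.instruction)
        raw = [v - t + 0.1 * i for i in range(self.action_dim)]
        return [max(-1.0, min(1.0, a)) for a in raw]


policy = ToyVLAPolicy()
obs = Observation(
    image=[[0.0, 0.5], [0.5, 1.0]],
    instruction="pick up the red block",
)
action = policy.predict_action(obs)
print(len(action), action)
```

In a real system the two encoders and the action head would be learned networks, and fine-tuning would update some or all of their weights on demonstrations from the target task. The calling pattern stays the same: one observation in, one action vector out.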

Key Features

  • Vision-language-action architecture
  • Efficient fine-tuning with OFT (Optimized Fine-Tuning)
  • Multi-task robot manipulation
  • Generalization to new objects and environments