UI-TARS Desktop

The open-source multimodal AI agent stack connecting cutting-edge AI models and agent infrastructure.

Introduction

TARS is a multimodal AI agent stack that currently ships two projects: Agent TARS and UI-TARS-desktop.

News

  • [2025-06-25] - We released Agent TARS Beta and the Agent TARS CLI. Agent TARS Beta is a multimodal AI agent that aims to explore a way of working closer to human-like task completion, through rich multimodal capabilities (such as GUI Agent and Vision) and seamless integration with various real-world tools.
  • [2025-06-12] - 🎁 We are thrilled to announce the release of UI-TARS Desktop v0.2.0! This update introduces two powerful new features: Remote Computer Operator and Remote Browser Operator—both completely free. No configuration required: simply click to remotely control any computer or browser, and experience a new level of convenience and intelligence.
  • [2025-04-17] - 🎉 We're thrilled to announce the release of the new UI-TARS Desktop application v0.1.0, featuring a redesigned Agent UI. The application improves the computer-use experience, introduces new browser operation features, and supports the advanced UI-TARS-1.5 model for improved performance and precise control.
  • [2025-02-20] - 📦 Introduced the UI-TARS SDK, a powerful cross-platform toolkit for building GUI automation agents.
  • [2025-01-23] - 🚀 We updated the Cloud Deployment section of the Chinese-language GUI model deployment tutorial (中文版: GUI模型部署教程) with new information about the ModelScope platform. You can now use the ModelScope platform for deployment.

Agent TARS

Agent TARS is a general multimodal AI agent stack that brings the power of GUI Agent and Vision into your terminal, computer, browser, and product.
It primarily ships with a CLI and Web UI for usage.
It aims to provide a workflow that is closer to human-like task completion through cutting-edge multimodal LLMs and seamless integration with various real-world MCP tools.

Showcase

Please help me book the earliest flight from San Jose to New York on September 1st and the last return flight on September 6th on Priceline

https://github.com/user-attachments/assets/772b0eef-aef7-4ab9-8cb0-9611820539d8

For more use cases, please check out #842.

Core Features

  • 🖱️ One-Click Out-of-the-box CLI - Supports both headful (Web UI) and headless (server) execution.
  • 🌐 Hybrid Browser Agent - Control browsers using GUI Agent, DOM, or a hybrid strategy.
  • 🔄 Event Stream - Protocol-driven Event Stream drives Context Engineering and Agent UI.
  • 🧰 MCP Integration - The kernel is built on MCP and also supports mounting MCP Servers to connect to real-world tools.
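
To make the "GUI Agent, DOM, or a hybrid strategy" idea concrete, here is a minimal TypeScript sketch. It is not taken from the Agent TARS codebase: the type names, the `runAction` function, the stub backends, and the "try DOM first, then fall back to GUI" ordering are all assumptions made purely for illustration.

```typescript
// Hypothetical types: none of these names come from the Agent TARS codebase.
type Strategy = "dom" | "gui" | "hybrid";

interface ActionResult {
  ok: boolean;
  via: "dom" | "gui";
}

// A backend attempts one action (e.g. "click #submit") and reports whether it succeeded.
type Backend = (action: string) => boolean;

function runAction(
  strategy: Strategy,
  action: string,
  dom: Backend,
  gui: Backend,
): ActionResult {
  if (strategy === "dom") return { ok: dom(action), via: "dom" };
  if (strategy === "gui") return { ok: gui(action), via: "gui" };
  // Hybrid (assumed behavior): try the DOM path first,
  // then fall back to the vision-driven GUI path.
  if (dom(action)) return { ok: true, via: "dom" };
  return { ok: gui(action), via: "gui" };
}

// Stub backends for illustration only:
// the DOM stub handles actions that name a "#selector"; the GUI stub always succeeds.
const domStub: Backend = (action) => action.includes("#");
const guiStub: Backend = () => true;
```

With these stubs, `runAction("hybrid", "click #submit", domStub, guiStub)` resolves through the DOM backend, while an action with no selector, such as `"click the blue button"`, falls through to the GUI backend. The real agent may order or combine the two paths differently.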

Quick Start

# Launch with `npx`.
npx @agent-tars/cli@latest

# Install globally (requires Node.js >= 22)
npm install @agent-tars/cli@latest -g

# Run with your preferred model provider
agent-tars --provider volcengine --model doubao-1-5-thinking-vision-pro-250428 --apiKey your-api-key
agent-tars --provider anthropic --model claude-3-7-sonnet-latest --apiKey your-api-key

Visit the comprehensive Quick Start guide for detailed setup instructions.
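
Because the global install requires Node.js >= 22, it can help to check the runtime version before installing. The TypeScript sketch below is only an illustration: `MIN_NODE_MAJOR` and `meetsNodeRequirement` are hypothetical names, not part of the Agent TARS CLI.

```typescript
// Minimum major version, taken from the Quick Start note ("Node.js >= 22").
const MIN_NODE_MAJOR = 22;

// Returns true when a version string such as "22.3.0" or "v22.3.0"
// has a major version at or above the minimum.
function meetsNodeRequirement(
  version: string,
  minMajor: number = MIN_NODE_MAJOR,
): boolean {
  // Capture the leading major number, tolerating an optional "v" prefix.
  const match = /^v?(\d+)/.exec(version);
  if (!match) return false; // Not a recognizable version string.
  return Number(match[1]) >= minMajor;
}
```

In a Node.js script, the current runtime version can be passed in as `meetsNodeRequirement(process.versions.node)`.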

Documentation

🌟 Explore Agent TARS Universe 🌟

UI-TARS Desktop

UI-TARS Desktop is a native GUI agent driven by UI-TARS and the Seed-1.5-VL/1.6 series of models, available on your local computer and in a remote VM sandbox in the cloud.

Showcase

Example instructions (Local Operator and Remote Operator):

  • Please help me open the autosave feature of VS Code and delay AutoSave operations for 500 milliseconds in the VS Code setting.
  • Could you help me check the latest open issue of the UI-TARS-Desktop project on GitHub?

Features

  • 🤖 Natural language control powered by Vision-Language Model
  • 🖥️ Screenshot and visual recognition support
  • 🎯 Precise mouse and keyboard control
  • 💻 Cross-platform support (Windows/macOS/Browser)
  • 🔄 Real-time feedback and status display
  • 🔐 Private and secure - fully local processing
  • 🛠️ Effortless setup and intuitive remote operators

Quick Start

See Quick Start

Contributing

See CONTRIBUTING.md.

License

This project is licensed under the Apache License 2.0.

Citation

If you find our paper and code useful in your research, please consider giving the project a star :star: and citing it :pencil:

@article{qin2025ui,
  title={UI-TARS: Pioneering Automated GUI Interaction with Native Agents},
  author={Qin, Yujia and Ye, Yining and Fang, Junjie and Wang, Haoming and Liang, Shihao and Tian, Shizuo and Zhang, Junda and Li, Jiahao and Li, Yunxin and Huang, Shijue and others},
  journal={arXiv preprint arXiv:2501.12326},
  year={2025}
}

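Key Metrics

Overview

  • Name and owner: bytedance/UI-TARS-desktop
  • Primary language: TypeScript
  • Languages: 8
  • License: Apache License 2.0

Owner Activity

  • Created: 2025-01-19 09:04:43
  • Last pushed: 2025-08-14 15:10:34
  • Releases: 504
  • Latest release: v0.2.3
  • First release: @ui-tars/action-parser@1.0.0 (released 2025-01-22 21:29:46)

User Engagement

  • Stars: 16.3k
  • Watchers: 137
  • Forks: 1.4k
  • Commits: 682
  • Issues: 419
  • Open issues: 229
  • Pull requests: 612
  • Open pull requests: 20
  • Closed pull requests: 46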