|
142 | 142 | <span class="size-8 flex flex-row justify-center items-center"> |
143 | 143 |
|
144 | 144 |
|
145 | | -<svg xmlns="http://www.w3.org/2000/svg" width="20px" height="20px" viewBox="0 0 24 24"><path fill="none" stroke="currentColor" stroke-linecap="round" stroke-linejoin="round" stroke-width="1.5" d="M6 6.878V6a2.25 2.25 0 0 1 2.25-2.25h7.5A2.25 2.25 0 0 1 18 6v.878m-12 0q.354-.126.75-.128h10.5q.396.002.75.128m-12 0A2.25 2.25 0 0 0 4.5 9v.878m13.5-3A2.25 2.25 0 0 1 19.5 9v.878m0 0a2.3 2.3 0 0 0-.75-.128H5.25q-.396.002-.75.128m15 0A2.25 2.25 0 0 1 21 12v6a2.25 2.25 0 0 1-2.25 2.25H5.25A2.25 2.25 0 0 1 3 18v-6c0-.98.626-1.813 1.5-2.122"/></svg> |
| 145 | +<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 256 256" |
| 146 | + class="size-5"> |
| 147 | + <rect width="256" height="256" fill="none"></rect> |
| 148 | + <line x1="208" y1="128" x2="128" y2="208" fill="none" stroke="currentColor" stroke-linecap="round" |
| 149 | + stroke-linejoin="round" stroke-width="32"></line> |
| 150 | + <line x1="192" y1="40" x2="40" y2="192" fill="none" stroke="currentColor" stroke-linecap="round" |
| 151 | + stroke-linejoin="round" stroke-width="32"></line> |
| 152 | +</svg> |
146 | 153 |
|
147 | 154 |
|
148 | 155 | </span> |
@@ -463,7 +470,33 @@ <h1 class="pr-2">OpenApps</h1> |
463 | 470 | </div> |
464 | 471 | </div> |
465 | 472 | <div class="typography w-full flex-1 *:data-[slot=alert]:first:mt-0"> |
466 | | - <p>Learn how to set up OpenApps, run a GPT-5 agent and make changes to the envrionment.</p> |
| 473 | + <p>Digital agents open up the possibility of AI systems completing tedious tasks on your behalf, such as <code>"add an event to my calendar"</code> or even more complex multi-step tasks. Yet today's agents are still not reliable enough for many applications. To get there, we need lots of data for training and evaluation, as well as research into new recipes for training and deploying reliable agents.</p> |
| 474 | +<p>A few definitions to settle you in:</p> |
| 475 | +<div class="admonition note"> |
| 476 | +<p class="admonition-title">Digital (UI) Agent:</p> |
| 477 | +<p>completes tasks by interacting directly with apps the same way a human does (clicking, scrolling, and typing on your behalf)</p> |
| 478 | +</div> |
| 479 | +<div class="admonition note"> |
| 480 | +<p class="admonition-title">Reward:</p> |
| 481 | +<p>measures whether the agent completed the given task</p> |
| 482 | +</div> |
| 483 | +<p><img alt="landing" src="../images/pomdp.png" /></p> |
| 484 | +<h2 id="agents-under-the-hood">Agents under the hood</h2> |
| 485 | +<p>Digital agents are powered by a foundation model that can understand both text and image inputs. |
| 486 | +Agents receive a screenshot of the apps' current state (as a human would see them) along with the task goal ("delete Brooklyn Bridge from my favorite places"). Depending on its configuration, the agent can also track past actions or observations.</p> |
| 487 | +<p>The agent then outputs an action, such as <code>click</code> or <code>type</code>, that directly affects the apps. After each step, we check whether the task has been completed; the loop terminates when it has or when <code>max_steps</code> is reached.</p> |
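<p>The observe, act, and check loop described above can be sketched in Python. This is a minimal, hypothetical illustration, not the OpenApps API: <code>ToyCalendarEnv</code>, <code>Action</code>, <code>scripted_policy</code>, and <code>run_episode</code> are invented names, a scripted policy stands in for the foundation model, and the reward is assumed to be binary (1.0 once the task is done).</p>

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    """Hypothetical UI action, e.g. Action("click", target="save")."""
    kind: str
    target: str = ""
    text: str = ""


@dataclass
class ToyCalendarEnv:
    """Hypothetical stand-in for an app environment (not the OpenApps API)."""
    goal: str = "add an event to my calendar"
    events: list = field(default_factory=list)
    draft: str = ""

    def observe(self) -> dict:
        # A real agent would receive a screenshot and/or a text view of the apps.
        return {"goal": self.goal, "draft": self.draft, "events": list(self.events)}

    def step(self, action: Action) -> None:
        # Actions directly change the app state.
        if action.kind == "type":
            self.draft += action.text
        elif action.kind == "click" and action.target == "save" and self.draft:
            self.events.append(self.draft)
            self.draft = ""

    def reward(self) -> float:
        # Assumed binary reward: 1.0 if the task is complete, else 0.0.
        return 1.0 if self.events else 0.0


def scripted_policy(observation: dict) -> Action:
    # Placeholder for the foundation model: type a title, then click save.
    if not observation["draft"]:
        return Action("type", text="Team sync")
    return Action("click", target="save")


def run_episode(env, policy, max_steps: int = 10) -> tuple:
    # Records past (observation, action) pairs; an agent configured to
    # track history could condition on this list.
    history = []
    for step in range(max_steps):
        observation = env.observe()
        action = policy(observation)
        env.step(action)
        history.append((observation, action))
        if env.reward() == 1.0:
            # Task completed: terminate the loop early.
            return True, step + 1, history
    # max_steps reached without completing the task.
    return False, max_steps, history
```

<p>With this toy environment, <code>run_episode(ToyCalendarEnv(), scripted_policy)</code> succeeds after two steps (type, then click). A real agent would replace <code>scripted_policy</code> with a call to a multimodal model.</p>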
| 488 | +<p>The agent configuration in <code>configs/agents/default.yaml</code> contains:</p> |
| 489 | +<ul> |
| 490 | +<li>the list of available actions</li> |
| 491 | +<li><code>use_axtree</code>: provides a simplified text representation (accessibility tree) of each app's state as input</li> |
| 492 | +<li><code>use_screenshot</code>: provides a screenshot of the app as input</li> |
| 493 | +<li><code>save_som</code>: if true, saves the set-of-marks coordinates to <code>log_outputs/&lt;timestamp&gt;/set_of_marks_coordinates.json</code> (see example below)</li> |
| 494 | +</ul> |
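<p>To make the observation flags concrete, here is a hypothetical sketch of how flags like these could decide what the model receives as input. <code>AgentConfig</code> and <code>build_observation</code> are illustrative names, and the default action list is an assumption; the real schema is whatever <code>configs/agents/default.yaml</code> defines.</p>

```python
from dataclasses import dataclass


@dataclass
class AgentConfig:
    """Hypothetical mirror of a few documented keys (not the real schema)."""
    actions: tuple = ("click", "type", "scroll")  # assumed action list
    use_axtree: bool = True
    use_screenshot: bool = True
    save_som: bool = False  # only affects logging, so it is unused below


def build_observation(config: AgentConfig, goal: str,
                      screenshot: bytes, axtree_text: str) -> dict:
    """Assemble the model input from whichever modalities the config enables."""
    observation = {"goal": goal, "allowed_actions": list(config.actions)}
    if config.use_screenshot:
        # Image input: the app as a human would see it.
        observation["screenshot"] = screenshot
    if config.use_axtree:
        # Text input: simplified accessibility-tree view of the app state.
        observation["axtree"] = axtree_text
    return observation
```

<p>For instance, <code>AgentConfig(use_axtree=False)</code> yields a screenshot-only input, while <code>AgentConfig(use_screenshot=False)</code> yields a text-only one; the values actually set in the shipped YAML files may differ.</p>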
| 495 | +<p>See <code>UI-Tars-1.5-7B.yaml</code> for an example of a native computer-use agent, which uses screenshots to output actions such as click and type with coordinates. For an example of a multimodal agent that accepts simplified text inputs, see <code>GPT-5.1.yaml</code>.</p> |
| 496 | +<h2 id="openapps-building-blocks-for-digital-agent-research">OpenApps: building blocks for digital agent research</h2> |
| 497 | +<p>OpenApps is an easy-to-use environment for studying digital agents. It is written in Python, runs on a single CPU, and comes with six configurable apps for generating limitless data for training and evaluating digital agents.</p> |
| 498 | +<h3 id="hands-on-with-openapps">Hands on with OpenApps</h3> |
| 499 | +<p>Learn how to set up OpenApps, run a GPT-5 agent, and make changes to the environment.</p> |
467 | 500 | <iframe width="560" height="315" src="https://www.youtube.com/embed/gzNW_LXE7OE?si=qLh-r_CvheMIgIWd" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe> |
468 | 501 | </div> |
469 | 502 | </article> |
@@ -561,6 +594,36 @@ <h1 class="pr-2">OpenApps</h1> |
561 | 594 | <div class="flex flex-col gap-2 p-4 pt-0 text-sm"> |
562 | 595 | <p class="text-muted-foreground bg-background sticky top-0 h-6 text-xs">On This Page</p> |
563 | 596 |
|
| 597 | + |
| 598 | + |
| 599 | + |
| 600 | + <a href="#agents-under-the-hood" |
| 601 | + class="text-muted-foreground hover:text-foreground data-[active=true]:text-foreground text-[0.8rem] no-underline transition-colors data-[depth=3]:pl-4 data-[depth=4]:pl-6" |
| 602 | + data-active="false" data-depth="2"> |
| 603 | + Agents under the hood |
| 604 | + </a> |
| 605 | + |
| 606 | + |
| 607 | + |
| 608 | + <a href="#openapps-building-blocks-for-digital-agent-research" |
| 609 | + class="text-muted-foreground hover:text-foreground data-[active=true]:text-foreground text-[0.8rem] no-underline transition-colors data-[depth=3]:pl-4 data-[depth=4]:pl-6" |
| 610 | + data-active="false" data-depth="2"> |
| 611 | + OpenApps: building blocks for digital agent research |
| 612 | + </a> |
| 613 | + |
| 614 | + |
| 615 | + <a href="#hands-on-with-openapps" |
| 616 | + class="text-muted-foreground hover:text-foreground data-[active=true]:text-foreground text-[0.8rem] no-underline transition-colors data-[depth=3]:pl-4 data-[depth=4]:pl-6" |
| 617 | + data-active="false" data-depth="3"> |
| 618 | + Hands on with OpenApps |
| 619 | + </a> |
| 620 | + |
| 621 | + |
| 622 | + |
| 623 | + |
| 624 | + |
| 625 | + |
| 626 | + |
564 | 627 |
|
565 | 628 | </div> |
566 | 629 | <div class="h-12"></div> |
|