The quest to build the universal personal assistant (UPA) has been reignited, with both OpenAI and Google going head to head. But what would the ultimate personal assistant look like? Would it be an app, a browser, an OS, or even a physical robot?
Let’s think first about ideal product requirements for a universal personal assistant:
- Operator: Operates a computer the same way a human does—typing on the keyboard, moving/clicking the mouse, scrolling, etc.
- Integrator: Able to interact with tools/apps/OS via either GUI or APIs.
- Task Runner: Can run end-to-end tasks on multiple platforms (native OS, web) and across different apps.
- Scalable: Executes tasks faster than a human and in parallel.
- Private: Because the UPA might have root-level access on our device, local computation helps ensure privacy.
- Interactive: Requests confirmation or clarification from the user when needed.
- Respectful: Respects bot protection (CAPTCHAs, paywalls, robots.txt, firewalls, etc.).
- Headless: Allows the user and the assistant to work on the same computer simultaneously.
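To make the "Respectful" requirement concrete, here is a minimal sketch of how an assistant might consult a site's robots.txt before fetching a page. It uses Python's standard `urllib.robotparser`; the user agent name `upa-assistant` and the example rules are assumptions made up for illustration, and a real assistant would download the site's actual robots.txt first.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical user agent name for the assistant (an assumption, not a real product).
ASSISTANT_USER_AGENT = "upa-assistant"

# Example rules, invented for this sketch.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def may_fetch(robots_txt: str, url: str, user_agent: str = ASSISTANT_USER_AGENT) -> bool:
    """Return True if the given robots.txt text allows user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() expects the file's lines; no network access happens here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/public/page", "https://example.com/private/report"):
        print(url, "allowed" if may_fetch(EXAMPLE_ROBOTS_TXT, url) else "blocked")
```

This only covers robots.txt; CAPTCHAs, paywalls, and firewalls would each need their own handling.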
The technology to fulfill these requirements is rapidly emerging, which implies a new interface paradigm: from human → machine interaction to human → assistant → machine interaction. This approach can boost productivity and accessibility—particularly for people with disabilities.
Below, we’ll consider the main platforms on which a UPA might be built, evaluating each for:
- Third-Party Integrations
- Platform Integrations
- Compatibility
- Privacy
- Cost
- Scalability
- Mobile
- Overall Potential
We’ll then summarize and compare them in a simple scoring table.
1. Web App
Third-Party Integrations
- Integrations must happen via APIs. The user needs to authenticate and authorize the assistant for each service, which becomes cumbersome if you use many products.
- Payment information is often stored locally (e.g., OS keychain) or in browser storage, so the web app may not have direct access unless explicitly given permission.
- On the upside, most B2C and B2B products have a web application, so coverage is broad.
Platform Integrations
- Web apps do not have low-level access to hardware or platform-native apps.
- For example, a purely web-based assistant cannot open your local code editor or run commands in a terminal unless there’s an exposed API or user plugin to facilitate that.
Compatibility
- The main benefit is cross-platform compatibility. Browsers run everywhere.
- Most tools have a web version, so it’s fairly open.
Privacy
- The assistant would typically run in the cloud. All that private data living on a remote server could concern some users.
Cost
- Building a single cross-platform web app is cheaper than creating separate native apps.
Scalability
- Tasks can be offloaded to a server, so the user’s local hardware isn’t heavily taxed. Parallel execution is feasible.
Mobile
- A web app can run on mobile browsers, so this approach works on smartphones and tablets as well.
2. Browser Extension
Third-Party Integrations
- A significant advantage is that users are often already logged into various services (e.g., Gmail, Drive, Trello) via the browser.
- The user doesn’t necessarily re-enter credentials; sessions/cookies handle that.
- However, the extension does require permission from the browser (e.g., “read data on all sites”) and the user must agree to those privileges. It’s not a zero-permission scenario, but it avoids re-typing logins for each new integration.
Platform Integrations
- Extensions have more privileges than regular web apps regarding browser data (e.g., local storage, bookmarks, etc.).
- However, they’re still limited to browser capabilities. Directly controlling native OS apps is harder unless additional bridging software is installed.
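One existing form of "bridging software" is native messaging in Chromium-based browsers: the extension exchanges JSON messages with a small local program over stdin/stdout, and each message is prefixed with its length as a 32-bit integer in native byte order. The sketch below shows only that framing. The message fields (`action`, `path`) are hypothetical, and the host registration (manifest, permissions) is left out.

```python
import io
import json
import struct
from typing import Any, BinaryIO, Optional


def write_message(stream: BinaryIO, message: Any) -> None:
    """Write one message: 4-byte native-order length, then UTF-8 JSON."""
    payload = json.dumps(message).encode("utf-8")
    stream.write(struct.pack("=I", len(payload)))
    stream.write(payload)
    stream.flush()


def read_message(stream: BinaryIO) -> Optional[Any]:
    """Read one framed message, or return None when the stream has ended."""
    header = stream.read(4)
    if len(header) < 4:
        return None
    (length,) = struct.unpack("=I", header)
    payload = stream.read(length)
    return json.loads(payload.decode("utf-8"))


if __name__ == "__main__":
    # Round-trip through an in-memory buffer instead of real stdin/stdout.
    buffer = io.BytesIO()
    # Hypothetical request an extension might send to a local helper.
    write_message(buffer, {"action": "open_file", "path": "notes.txt"})
    buffer.seek(0)
    print(read_message(buffer))
```

In a real host the streams would be `sys.stdin.buffer` and `sys.stdout.buffer`, and the helper would decide which requested actions it is willing to perform.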
Privacy
- Since the browser handles authentication and payments, the extension itself might never see raw credentials.
- If the personal assistant uses local models (e.g., future small “on-device” LLMs), privacy is even better.
- Complex tasks, however, might need powerful remote models that the extension offloads to the cloud.
Scalability
- Each tab could run its own job. Parallelization is relatively straightforward.
- Running everything locally might create overhead if multiple instances of large models are required.
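The "one job per tab" idea, with a cap to limit local overhead, can be sketched with `asyncio`. This is an illustration only: `run_job` is a hypothetical stand-in that sleeps instead of driving a real tab, and the limit of three concurrent jobs is an arbitrary assumption.

```python
import asyncio
from typing import Dict, List


async def run_job(name: str, seconds: float) -> str:
    """Hypothetical stand-in for work done in one tab; it only sleeps."""
    await asyncio.sleep(seconds)
    return f"{name}: done"


async def run_jobs_in_parallel(jobs: Dict[str, float], max_concurrent: int = 3) -> List[str]:
    """Run all jobs concurrently, but never more than max_concurrent at once."""
    limiter = asyncio.Semaphore(max_concurrent)

    async def limited(name: str, seconds: float) -> str:
        async with limiter:
            return await run_job(name, seconds)

    # gather() returns results in the same order the jobs were submitted.
    return await asyncio.gather(*(limited(n, s) for n, s in jobs.items()))


if __name__ == "__main__":
    jobs = {"tab-1": 0.05, "tab-2": 0.05, "tab-3": 0.05, "tab-4": 0.05}
    print(asyncio.run(run_jobs_in_parallel(jobs)))
```

The semaphore is the design choice that matters here: it lets the number of simultaneous jobs be tuned to what the local machine (or model) can handle.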
Cost
- Browser extensions can be simpler to develop than full native software. They leverage existing browser infrastructure.
Compatibility
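- Extensions built for Chrome run on Chromium-based browsers (Chrome, Edge, Brave, etc.). Firefox supports a largely compatible WebExtensions API, but cross-browser differences still need testing.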
- Safari supports extensions too, but the development process is separate.
Mobile
- Chrome on mobile currently offers very limited extension support. Safari on iOS supports some extensions, but it’s more restrictive. Compatibility on mobile is not as strong as on desktop.
3. A Browser (Standalone)
Third-Party Integrations
- Like a web app, the user must authenticate each service, but the browser can use either API or GUI flows (clicking on login buttons, for example).
- This potentially supports a wider range of products than an API-only approach.
Platform Integrations
- A custom browser is a native app, so it has greater OS-level integration than a basic extension.
- But users must grant additional permissions for deeper integration with local apps. If you work with many apps, this can be cumbersome.
- The personal assistant can be deeply baked into the browser (e.g., voice commands, direct text interactions).
Privacy
- Similar benefits to an extension: the browser can handle auth and payments.
- A local AI model could be embedded (e.g., a future “on-device LLM”). Still, large or complex tasks might require server-based models.
Note: Google is rumored to be planning to incorporate smaller local LLMs (such as “Gemini Nano”) into the browser, but that remains speculative.
Compatibility
- Developing a new browser for multiple platforms (Mac, Windows, iOS, Android, etc.) is not trivial, even using Chromium.
- Each OS version needs maintenance.
Scalability
- Browsers can run multiple tabs/processes in parallel, and the entire thing could run in headless mode.
- The assistant only needs to become visible when the user interacts or needs confirmations.
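A rough sketch of that "surface only when needed" behaviour: the assistant runs steps in the background and calls back to the user only for steps flagged as sensitive. The `Step` type, the flag, and the example steps are all hypothetical names invented for this illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One unit of work in a task (hypothetical structure)."""
    description: str
    needs_confirmation: bool = False


def run_steps(steps: List[Step], confirm: Callable[[str], bool]) -> List[str]:
    """Run steps in order, asking the user only for flagged steps."""
    log: List[str] = []
    for step in steps:
        # Only flagged steps interrupt the user; the rest run silently.
        if step.needs_confirmation and not confirm(step.description):
            log.append(f"skipped: {step.description}")
            continue
        log.append(f"done: {step.description}")
    return log


if __name__ == "__main__":
    steps = [Step("open cart"), Step("submit payment", needs_confirmation=True)]
    # A real UI would show a prompt; here the user always declines.
    print(run_steps(steps, confirm=lambda description: False))
```

Passing `confirm` as a callback keeps the runner independent of how the prompt is shown (a browser popup, a voice query, a notification).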
Cost
- Building a cross-platform native browser is complex and expensive.
Mobile
- A custom browser can work on mobile if you build versions for iOS/Android. This is doable, but each platform has its own store policies and technical requirements.
4. A Mobile App
Third-Party Integrations
- The user would authenticate via OAuth, meaning each integrated product needs an API.
- Many apps do not offer deep APIs, and mobile OSes sandbox apps, preventing direct control of other apps.
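For a sense of what the OAuth step involves per service, here is a minimal sketch of building an OAuth 2.0 authorization-code request URL with the standard query parameters. The endpoint, client ID, redirect URI, and scope names are placeholders; every real service publishes its own values.

```python
import secrets
from typing import List, Optional, Tuple
from urllib.parse import parse_qs, urlencode, urlparse


def build_authorization_url(
    authorize_endpoint: str,
    client_id: str,
    redirect_uri: str,
    scopes: List[str],
    state: Optional[str] = None,
) -> Tuple[str, str]:
    """Return (url, state) for an OAuth 2.0 authorization-code request."""
    # A random state value lets the app match the callback to this request.
    state = state or secrets.token_urlsafe(16)
    query = urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": " ".join(scopes),
        "state": state,
    })
    return f"{authorize_endpoint}?{query}", state


if __name__ == "__main__":
    # All values below are placeholders for illustration.
    url, state = build_authorization_url(
        "https://auth.example.com/authorize",
        "upa-demo-client",
        "https://upa.example.com/callback",
        ["calendar.read", "mail.send"],
    )
    print(url)
    print(parse_qs(urlparse(url).query)["scope"])
```

The user would repeat a flow like this once per integrated product, which is where the friction described above comes from.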
Platform Integrations
- A native mobile app does have good access to device hardware (camera, mic, etc.), but direct OS-level control is often restricted by sandboxing.
- Because a personal assistant is essentially an integrator, these restrictions can hinder advanced automations.
Compatibility
- Separate iOS and Android versions are required. The codebases will differ significantly.
Scalability
- Mobile apps often can’t run unlimited background processes. The OS may pause or kill them.
- Offloading tasks to the cloud is possible, but purely local solutions may struggle to parallelize large tasks.
Mobile
- Obviously, a mobile app can run on mobile devices, but constraints around sandboxing, battery usage, and background processing remain.
5. OS Integration
Third-Party Integrations
- A UPA integrated into the operating system itself has the most straightforward path: it can use the OS keychain for credentials and unify web and native apps.
Platform Integrations
- Full OS-level access, not sandboxed. The assistant can open and manipulate all apps, services, and hardware resources.
Privacy
- The tech stack can run locally for maximum privacy. Credentials are handled by the OS.
- No need to share all data with a remote server, although remote tasks may still be desired for advanced AI computations.
Cost
- This is likely only feasible for major platform vendors (Apple, Microsoft, Google). Implementing an OS-level AI solution is expensive and typically out of reach for smaller companies.
Scalability
- The OS can run tasks in headless mode and notify the user only when necessary.
- Parallelization is limited by local hardware, but it’s still quite powerful.
Compatibility
- Only the company that controls the OS can do this effectively. So, this might remain exclusive to Apple, Google, Microsoft, etc.
- Third parties can’t easily create a universal OS-level integration for everyone.
Note: We’ve seen Apple Intelligence and Microsoft’s Copilot; neither is perfect yet. If only the OS vendors can build such an integration, innovation might be slower or less open to outside developers.
6. A New Hardware Device
Third-Party Integrations
- Typically relies on OAuth for web services; the user must add credentials for new products.
- Similar friction for each integration.
Platform Integrations
- New hardware implies a new OS or a heavily customized Android/Linux derivative.
- A new platform cuts both ways: it can be built from the ground up for an “assistant-first” experience, but it starts with very little existing software.
Privacy
- If the assistant runs on local hardware, data is more private.
- But to handle advanced tasks, you may still need remote AI models.
Cost
- Manufacturing new hardware is very expensive.
- Building a mass-market device with advanced AI would be a huge undertaking.
Scalability
- Could theoretically be designed for parallel background processes, but that requires robust on-device hardware, raising cost.
- Offloading to the cloud remains an option if connectivity is constant.
Compatibility
- A brand-new hardware/OS environment will have to integrate with existing services or rely on bridging apps/APIs.
- If the device runs Android (like some new wearables), it may reuse existing software to some extent.
Mobile
- Not directly applicable unless the new hardware is a phone or phone-like device.
7. A Robot
Many people’s dream assistant is a robot that can do house chores. Hardware exists, but “the brain” (AI) is still evolving. Such an assistant must operate in three environments: the real world, native OS, and the web.
- NVIDIA and Tesla (with Optimus) are exploring this domain.
- Currently, it’s easier to automate “white collar” digital tasks than “blue collar” physical tasks.
From a platform standpoint, this is similar to creating new hardware but with the added complexity of robotics and sensors.
Cost
- Extremely expensive; Tesla estimates Optimus could retail at around $30K.
- Plus, ongoing R&D and manufacturing costs.
Mobile
- N/A in the usual sense, though some robotic devices might use mobile-like hardware (ARM chips, etc.).
Evaluation
The scoring table below rates each option on seven criteria. Each criterion is scored from 1 to 5, except third-party integrations, which carries a ×2 weight and is shown already doubled (so it ranges up to 10). The scores are then summed for a rough comparison. These numbers are subjective but help illustrate trade-offs.
| Platform | 3rd Party Integrations (×2) | Platform Integrations | Privacy | Compatibility | Cost | Scalability | Mobile | Total |
|---|---|---|---|---|---|---|---|---|
| Web App | 6 | 2 | 3 | 5 | 4 | 5 | 5 | 30 |
| Browser Extension | 8 | 3 | 4 | 4 | 5 | 4 | 2.5 | 30.5 |
| Browser | 8 | 4 | 5 | 2 | 2 | 4 | 5 | 30 |
| Mobile App | 4 | 3 | 4 | 1 | 2 | 1 | 5 | 20 |
| OS Integration | 10 | 5 | 5 | 1 | 1 | 4 | 5 | 31 |
| Hardware Device | 4 | 5 | 4 | 1 | 1 | 4 | 5 | 24 |
| Robot | 4 | 5 | 4 | 1 | 1 | 3 | 5 | 23 |
- Note on Browser vs. Browser Extension Mobile Scores:
- A standalone browser (with the assistant built in) can theoretically be packaged for iOS/Android, hence we give it a higher mobile score (5).
- A pure extension is poorly supported on mobile Chrome, hence 2.5. Safari has some extension support, but it remains limited.
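The totals can be reproduced with a few lines of Python. This sketch copies the numbers straight from the table and assumes the third-party integrations column is already weighted, as the ×2 in its header suggests, so each total is a plain sum.

```python
from typing import Dict, List

# Column order: third-party integrations (already ×2), platform integrations,
# privacy, compatibility, cost, scalability, mobile. Values copied from the table.
SCORES: Dict[str, List[float]] = {
    "Web App": [6, 2, 3, 5, 4, 5, 5],
    "Browser Extension": [8, 3, 4, 4, 5, 4, 2.5],
    "Browser": [8, 4, 5, 2, 2, 4, 5],
    "Mobile App": [4, 3, 4, 1, 2, 1, 5],
    "OS Integration": [10, 5, 5, 1, 1, 4, 5],
    "Hardware Device": [4, 5, 4, 1, 1, 4, 5],
    "Robot": [4, 5, 4, 1, 1, 3, 5],
}


def totals(scores: Dict[str, List[float]]) -> Dict[str, float]:
    """Sum each platform's row of scores."""
    return {platform: sum(row) for platform, row in scores.items()}


def ranking(scores: Dict[str, List[float]]) -> List[str]:
    """Return platform names ordered from highest to lowest total."""
    summed = totals(scores)
    return sorted(summed, key=summed.get, reverse=True)


if __name__ == "__main__":
    summed = totals(SCORES)
    for platform in ranking(SCORES):
        print(f"{platform}: {summed[platform]}")
```

Changing a weight or a single score and re-running is a quick way to see how sensitive the ranking is, given that the top four totals sit within one point of each other.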
Conclusion
From the table, OS Integration scores the highest in raw technical potential. However, only major OS vendors (Apple, Microsoft, Google) can realistically implement that at scale. For everyone else, web apps and browser extensions emerge as the most practical solutions. Both are strong “integrators” because:
- They allow cross-platform compatibility.
- They can interface with a wide variety of web-based products.
- They don’t require building entirely new OS or hardware ecosystems.
Between the two, a browser extension often enjoys more seamless authentication (thanks to existing sessions) and is less likely to be stopped by bot blockers, because it acts inside the user’s regular browser session. However, it’s less friendly on mobile devices. Meanwhile, a web app is fully cross-platform, including mobile, but might require repeated authentication for each service and can be blocked by strict bot protections.
In short, a web app + browser extension combo is the most viable for broad adoption, especially on desktop. On mobile, the web app may be more reliable than an extension until mobile browsers fully embrace extension frameworks. Over time, we might see OS-level personal assistants from big tech or entirely new device categories, such as wearables and robotics, once the underlying AI is mature enough.