Browser Tools
Direct browser control via MCP for AI-assisted automation.
Overview
Browser tools let AI assistants control your browser directly:
- Navigate to pages
- Click buttons and links
- Fill forms
- Extract content
- Take screenshots
When to Use Browser Tools
| Use Case | Approach |
|---|---|
| Exploring a new site | Use snapshot + direct tools |
| Repeated automation | Create a workflow |
| Testing/debugging | Direct tools |
| AI research tasks | Browser tools + extraction |
Core Workflow
1. Navigate
Navigate to https://github.com/trending
Tool: navigate
2. Understand the Page
Take a snapshot of the current page
Tool: snapshot
Returns YAML with element refs:
- main [ref=1]
- article "Repository" [ref=2]
- heading "repo-name" [ref=3]
- link "Star" [ref=4]
3. Interact
Click the element with ref 4
Tool: click with ref: 4
4. Extract Data
Extract the text from the page title
Tool: extract with selector
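The four steps above can be sketched as a sequence of tool-call payloads. This is only an illustration: the `tool_call` helper is hypothetical, the payload shape mirrors the JSON examples elsewhere on this page, and the `h1` selector for the page title is an assumption.

```python
import json


def tool_call(name, **arguments):
    """Build a payload in the {"name", "arguments"} shape used on this page."""
    call = {"name": name}
    if arguments:
        call["arguments"] = arguments
    return call


# Navigate, snapshot, click by ref, then extract.
workflow = [
    tool_call("navigate", url="https://github.com/trending"),
    tool_call("snapshot"),
    tool_call("click", ref=4, element="Star link"),
    tool_call("extract", selector="h1"),  # assumed selector for the page title
]

for call in workflow:
    print(json.dumps(call))
```

Re-take the snapshot before reusing a ref after the page changes, since ref 4 here comes from the snapshot in step 2.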
Snapshot-Based Interaction
The snapshot tool returns element refs that provide reliable targeting:
Getting a Snapshot
// Tool call
{ "name": "snapshot" }
// Response
- Page URL: https://example.com
- heading "Welcome" [ref=1]
- button "Get Started" [ref=2]
- textbox "Email" [ref=3]
Using Refs
Refs are more reliable than selectors for dynamic pages:
// Click by ref
{
"name": "click",
"arguments": {
"ref": 2,
"element": "Get Started button"
}
}
// Type by ref
{
"name": "type",
"arguments": {
"ref": 3,
"text": "user@example.com",
"submit": true
}
}
Selector-Based Interaction
Use selectors in workflows, or whenever you already know the selector:
CSS Selectors
{
"name": "click",
"arguments": {
"selector": "button.btn-primary"
}
}
Text Selectors
{
"name": "click",
"arguments": {
"selector": "button:has-text('Submit')"
}
}
Attribute Selectors
{
"name": "click",
"arguments": {
"selector": "[data-testid='submit-btn']"
}
}
Scrolling in Containers
For nested scrollable areas (like chat lists):
1. Hover to Position
{
"name": "hover",
"arguments": {
"selector": ".message-list"
}
}
2. Scroll from Position
{
"name": "scroll",
"arguments": {
"direction": "down",
"amount": 500
}
}
The scroll starts from the hover position, not the viewport center.
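Because the scroll depends on the preceding hover, it can help to always emit the two calls as a pair. A minimal sketch, assuming the `hover` and `scroll` argument names shown above; the `scroll_in_container` helper itself is hypothetical:

```python
def scroll_in_container(selector, direction="down", amount=500):
    """Return the hover + scroll calls needed to scroll a nested container."""
    return [
        # Hover first so the scroll originates inside the container.
        {"name": "hover", "arguments": {"selector": selector}},
        {"name": "scroll", "arguments": {"direction": direction, "amount": amount}},
    ]


calls = scroll_in_container(".message-list")
```

Sending the `scroll` call alone would scroll from wherever the pointer was last positioned, which may be outside the container.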
Finding Selectors
Using capture_page
Get all interactive elements:
{ "name": "capture_page" }
Returns:
# Page Capture: GitHub
**URL:** https://github.com/trending
## data-testid Selectors
| Selector | Count |
|----------|-------|
| `[data-testid="repo-card"]` | 25 |
## Interactive Elements
| Type | Text | Selector |
|------|------|----------|
| button | Star | `button.star-button` |
| link | Sign in | `a[href="/login"]` |
Using Browser DevTools
- Right-click element → Inspect
- Find unique attributes
- Test selector in console:
document.querySelector('...')
Screenshots
Capture the current page:
{ "name": "screenshot" }
Returns a base64-encoded PNG that the AI can analyze.
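If you want to inspect or save the image yourself, the base64 string can be decoded with the standard library. A minimal sketch: how the string is carried inside the tool response is not specified here, so the code assumes you have already pulled it out as a plain string. The helper names are hypothetical.

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def decode_screenshot(b64_png):
    """Decode a base64 screenshot string and check that it is PNG data."""
    data = base64.b64decode(b64_png)
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data is not a PNG image")
    return data


def save_screenshot(b64_png, path):
    """Write the decoded screenshot to disk and return the byte count."""
    data = decode_screenshot(b64_png)
    with open(path, "wb") as fh:
        fh.write(data)
    return len(data)
```

Checking the signature catches the case where the string is an error message or a different image format before it is written to disk.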
Example: Research Task
User: Find the top 3 trending Python repositories on GitHub
AI Actions:
- Navigate to GitHub trending
{ "name": "navigate", "arguments": { "url": "https://github.com/trending/python" }}
- Snapshot the page
{ "name": "snapshot" }
- Extract repository names
{ "name": "extract", "arguments": { "selector": ".repo-name" }}
- Report findings
Example: Form Filling
User: Fill out the contact form on example.com
AI Actions:
- Navigate
{ "name": "navigate", "arguments": { "url": "https://example.com/contact" }}
- Snapshot to find form fields
{ "name": "snapshot" }
- Type into fields
{ "name": "type", "arguments": { "ref": 5, "text": "John Doe" }}
{ "name": "type", "arguments": { "ref": 6, "text": "john@example.com" }}
{ "name": "type", "arguments": { "ref": 7, "text": "Hello, I have a question..." }}
- Submit
{ "name": "click", "arguments": { "ref": 8, "element": "Submit button" }}
Limitations
What Browser Tools Can't Do
- Actions requiring native popups (file upload dialogs)
- Cross-origin iframe content (in some cases)
- Browser-level actions (downloads, extensions)
- Sites with strong anti-automation
Workarounds
| Limitation | Workaround |
|---|---|
| File uploads | Use native automation or manual step |
| Anti-bot | Use real browser with human verification |
| iframes | Navigate to iframe src directly if possible |
Best Practices
1. Always Snapshot First
Before interacting with a new page:
Take a snapshot so I can see what elements are available
2. Use Refs Over Selectors
Refs from snapshot are more reliable for dynamic pages.
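To use a ref programmatically, you first have to pull it out of the snapshot text. A minimal sketch, assuming each element is rendered as `- role "name" [ref=N]` as in the examples on this page; the `find_ref` helper is hypothetical and the real snapshot format may carry more detail:

```python
import re

# Matches lines such as: - button "Get Started" [ref=2]
SNAPSHOT_LINE = re.compile(
    r'-\s*(?P<role>[\w-]+)(?:\s+"(?P<name>[^"]*)")?\s*\[ref=(?P<ref>\d+)\]'
)


def find_ref(snapshot_text, role, name=None):
    """Return the ref of the first element matching role (and name), or None."""
    for match in SNAPSHOT_LINE.finditer(snapshot_text):
        if match["role"] != role:
            continue
        if name is not None and match["name"] != name:
            continue
        return int(match["ref"])
    return None


snapshot = '''- Page URL: https://example.com
- heading "Welcome" [ref=1]
- button "Get Started" [ref=2]
- textbox "Email" [ref=3]'''

ref = find_ref(snapshot, "button", "Get Started")
click_call = {"name": "click", "arguments": {"ref": ref, "element": "Get Started button"}}
```

Matching on role and accessible name rather than on CSS classes is what keeps this lookup stable when the page's markup changes.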
3. Wait for Dynamic Content
If the page loads dynamically:
- Navigate
- Wait a moment
- Snapshot
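Instead of a fixed pause, the wait can be expressed as polling: snapshot repeatedly until the content you expect shows up. A minimal sketch, assuming a caller-supplied `take_snapshot` function (hypothetical) that invokes the snapshot tool and returns its text:

```python
import time


def wait_for_snapshot(take_snapshot, expected_text, timeout=10.0, interval=0.5):
    """Poll take_snapshot() until expected_text appears; return that snapshot."""
    deadline = time.monotonic() + timeout
    while True:
        snapshot = take_snapshot()
        if expected_text in snapshot:
            return snapshot
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{expected_text!r} did not appear within {timeout}s")
        time.sleep(interval)  # give the page time to finish loading
```

The timeout keeps the loop from running forever when the content never loads, for example because the page requires sign-in.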
4. Verify Actions
After clicking/typing:
- Snapshot again
- Verify page changed as expected
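One way to check that the page changed is to compare the snapshot lines before and after the action. A minimal sketch with a hypothetical `snapshot_changes` helper. It assumes unchanged elements are rendered identically in both snapshots; if refs are renumbered between snapshots, compare on role and name instead.

```python
def snapshot_changes(before, after):
    """Return (added, removed) snapshot lines between two snapshots."""
    before_lines = [line.strip() for line in before.splitlines() if line.strip()]
    after_lines = [line.strip() for line in after.splitlines() if line.strip()]
    added = [line for line in after_lines if line not in before_lines]
    removed = [line for line in before_lines if line not in after_lines]
    return added, removed


before = '- textbox "Email" [ref=3]\n- button "Get Started" [ref=2]'
after = '- heading "Thanks for signing up" [ref=1]'

added, removed = snapshot_changes(before, after)
if not added and not removed:
    print("Page did not change; the action may not have taken effect")
```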
5. Handle Errors Gracefully
If an element is not found:
- Re-snapshot
- Try alternative selector
- Check if page state changed
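The recovery steps above can be sketched as a fallback loop. The `click` and `resnapshot` arguments are caller-supplied wrappers around the click and snapshot tools (hypothetical), and the `ElementNotFound` exception is an assumption about how your wrapper reports the "Element not found" error:

```python
class ElementNotFound(Exception):
    """Raised by the caller-supplied click function when nothing matches."""


def click_with_fallback(click, resnapshot, selectors):
    """Try each selector in order, re-snapshotting after every miss.

    Returns the selector that worked, or raises ElementNotFound.
    """
    for selector in selectors:
        try:
            click(selector)
            return selector
        except ElementNotFound:
            resnapshot()  # refresh the page structure before the next attempt
    raise ElementNotFound(f"none of {selectors} matched")
```

Ordering the selectors from most to least specific, for example a `data-testid` selector before a class-based one, reduces the chance of clicking the wrong element.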
Troubleshooting
"Element not found"
- Snapshot the page to see current structure
- Verify selector matches
- Check if element is in iframe
- Element may be dynamically loaded
"Browser not connected"
- Check that the extension is installed
- Navigate to a webpage
- Click extension icon → Connect
Clicks not working
- Element may be covered by overlay
- Try scrolling element into view
- Check for hover states that must trigger first
- Use hover before click