# Testing `pi-steel` Prompts for proving the extension works end to end inside `pi`. One prompt per line. Phrased the way a person would actually ask. Run each at least three times — web agents are noisy. Load the extension: ```bash pi -e /Users/nikola/dev/steel/steel-pi/dist/index.js ``` Or from this repo: ```bash pi -e . ``` Unit tests: ```bash npm test ``` ## Navigation and page identity Open https://example.com and tell me the page title and the final URL. Open https://example.com, then go back, and tell me where you ended up. Open https://example.com, then open https://news.ycombinator.com, then go back, and confirm you are on example.com again. Open https://httpstat.us/404 and tell me exactly what you see and what the URL resolved to. Try to open http://this-domain-should-not-exist-123.invalid and report the exact error without guessing. ## Screenshots and PDFs Open https://example.com and save a full-page screenshot. Give me the artifact path. Open https://example.com and save both a screenshot and a PDF. Confirm the two files are distinct and tell me their paths. Open https://news.ycombinator.com and take a screenshot of just the top navigation bar. Tell me which selector you used. Open https://example.com and try to screenshot a selector that does not exist. When that fails, recover with a full-page screenshot and report both attempts. ## Scraping and extracting Open https://example.com, scrape the page as markdown, and quote the main heading back to me. Open https://news.ycombinator.com and give me the first five story titles with their links as structured data. Open https://news.ycombinator.com, extract the first five story titles, then scrape the page as markdown, and confirm each extracted title actually appears in the scrape. Open https://httpbin.org/forms/post and list every visible form field with its label and type. Open https://example.com and tell me the visible text content in under 200 characters. ## Finding and clicking Open https://news.ycombinator.com and find the login link. Give me the top selector candidates and why you chose each. Open https://news.ycombinator.com, click the login link, and tell me the new page title and URL. Open https://news.ycombinator.com, click the login link, then go back, and prove you are on the front page again. Open https://news.ycombinator.com and click a selector that definitely does not exist. Return the raw error and whether the URL changed. ## Forms and typing Open https://httpbin.org/forms/post, fill in the customer name and telephone fields only, and return both the intended values and what the page actually shows in those fields. Open https://duckduckgo.com, type "steel browser" into the search box, submit, and give me the first three result titles. Open https://httpbin.org/forms/post, try to fill a field that does not exist, and report the exact failure instead of pretending it worked. ## Scrolling and waiting Open https://news.ycombinator.com, scroll to the bottom, and tell me the last visible story title. Open https://news.ycombinator.com, scroll down two viewports, extract five currently visible story titles, and confirm they appear in the scraped markdown after scrolling. Open https://www.google.com/maps/search/beauty+salons+in+seattle+wa, then use steel_scroll with selector `div[role="feed"]` to move the results pane down and confirm the visible listings changed. Open https://news.ycombinator.com, then use steel_scrape with format `markdown` and quote the first two story links. Open https://news.ycombinator.com, then use steel_scrape with the default format and confirm it returns readable text rather than raw HTML. Open https://example.com and wait for `h1` to appear before reading the page title. Open https://example.com and wait for a selector that will never appear with a 3 second timeout. Report the timeout cleanly. ## Session reuse Pin a session, open https://example.com, then in the same session open https://news.ycombinator.com, and confirm both pages were handled by the same browser instance. Pin a session, open https://news.ycombinator.com, click the login link, then release the session and tell me what state you left it in. Run two navigations back to back without pinning, and tell me whether a new session was created for each or the session was reused. ## Truthfulness Open https://example.com and tell me the color of every visible button. If there are no visible buttons, say so explicitly instead of inventing any. Open https://news.ycombinator.com and tell me whether there is a "Buy now" button. Do not claim it exists unless you can point to tool evidence. Open https://example.com and list every image on the page with its alt text. If there are no images, say that. ## Recovery Open https://news.ycombinator.com, try to click "Sign out", and when it fails, fall back to clicking "login" and report both attempts. Open https://example.com, try to extract a "pricing table", and when there is none, say so and offer what is actually on the page instead. Open https://httpbin.org/delay/5 with a 2 second timeout, let it fail, then retry with a longer timeout and report both runs. ## End-to-end journeys Open https://news.ycombinator.com, capture the first five story titles, take a screenshot, click through to the first story's comments page, and give me the story title, the comments URL, and both artifact paths. Open https://example.com, save a screenshot and a PDF, then navigate to https://news.ycombinator.com, save another screenshot, and return all three artifact paths with the URL each came from. Open https://duckduckgo.com, search for "hacker news", click the first organic result, confirm the final URL is news.ycombinator.com, and return a screenshot of the landing page. ## WebVoyager tasks Borrowed verbatim from the WebVoyager benchmark (https://github.com/MinorJerry/WebVoyager). Real sites, one clear goal, one checkable answer. Good for comparing our agent to published numbers. ### Friendly sites (no login, no heavy bot walls) Find a recipe for a vegetarian lasagna that has at least a four-star rating and uses zucchini on https://www.allrecipes.com. Find a five-star rated chocolate chip cookie recipe that takes less than 1 hour to make on https://www.allrecipes.com and tell me how many reviews it has. Compare the prices of the latest models of MacBook Air available on https://www.apple.com. Search https://arxiv.org for the latest preprints about "quantum computing" and give me the top three titles with authors. Read the latest health-related news article published on https://www.bbc.com/news and summarize the key points. Find the pronunciation, definition, and a sample sentence for the word "serendipity" on https://dictionary.cambridge.org. Search https://www.coursera.org for a beginner-level course on Python programming suitable for someone with no programming experience, and give me the top result. Look up the current standings for the NBA Eastern Conference on https://www.espn.com. Search https://github.com for an open-source project related to "climate change data visualization" and report the project with the most stars. Find a pre-trained sentiment analysis model on https://huggingface.co and return its name, downloads, and last update date. Ask https://www.wolframalpha.com for the derivative of x^2 at x = 5.6 and report the answer it returns. Use https://www.google.com to find the initial release date of "Guardians of the Galaxy Vol. 3" and return the date plus the source snippet. ### Hard sites (bot walls, captchas, heavy JS) Search https://www.amazon.com for an Xbox Wireless controller in green color rated above 4 stars and return the top result with price and rating. Find the cheapest available hotel room on https://www.booking.com for a three night stay starting 1 January in Jakarta for 2 adults, and return the hotel name and price. On https://www.google.com/travel/flights, show me one-way flights from Chicago to Paris for next Saturday and return the three cheapest options. Find 5 beauty salons with ratings greater than 4.8 in Seattle, WA on https://www.google.com/maps and return names, ratings, and addresses. ## Output contract For anything above where you care about grading, append: > Return JSON only with: task, status (success | partial | failure), tools_used, observed (raw facts from tool output), artifacts, errors, notes (your conclusions). Do not claim success without tool evidence.