About This Site
How this budget explorer was built, what role AI played, and what you should keep in mind when using it.
What is the Pittsburgh Budget Explorer?
This site makes the City of Pittsburgh's FY2026 Operating and Capital Budgets easier to browse. The city publishes its budget as large PDF documents — hundreds of pages of tables, narratives, and financial data. This project extracts that data from the PDFs and presents it in a searchable, sortable, and visual format.
The goal is to make budget information more accessible to residents, journalists, researchers, and anyone who wants to understand how Pittsburgh plans to spend public money — without needing to manually dig through PDFs.
How was it built?
The project has two main parts: data extraction (getting numbers out of the PDFs) and the web application (this site, which presents the data). Both were built through a collaboration between a human developer and AI tools.
Data extraction
The budget PDFs don't contain neatly structured data. They contain text laid out on pages so that it looks like tables. Extracting the data required building a custom spatial table extractor that reads the position of every word on each page, groups the words into rows and columns based on their coordinates, and reassembles the table structure. This is more involved than simple text extraction because the tables use whitespace (not borders) to separate columns, and many tables span multiple pages or have complex multi-line headers.
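To make the idea of coordinate-based grouping concrete, here is a minimal Python sketch. It is not the project's actual extractor: the Word class, the function names, and the tolerance values (y_tol, gap) are all assumptions chosen for illustration, and the sample coordinates are made up. It only shows the general technique of clustering words into rows by vertical position and splitting each row into cells wherever there is a wide horizontal gap.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One word with its position on the page (hypothetical structure)."""
    text: str
    x0: float   # left edge
    x1: float   # right edge
    top: float  # distance from the top of the page


def group_rows(words, y_tol=3.0):
    """Cluster words into rows: words whose 'top' values are close share a row."""
    rows = []
    for w in sorted(words, key=lambda w: (w.top, w.x0)):
        if rows and abs(w.top - rows[-1][0].top) <= y_tol:
            rows[-1].append(w)
        else:
            rows.append([w])
    # Within each row, order words left to right.
    return [sorted(r, key=lambda w: w.x0) for r in rows]


def split_cells(row, gap=15.0):
    """Split one row into cells wherever the whitespace between words exceeds 'gap'."""
    cells = [[row[0]]]
    for prev, cur in zip(row, row[1:]):
        if cur.x0 - prev.x1 > gap:
            cells.append([cur])
        else:
            cells[-1].append(cur)
    return [" ".join(w.text for w in cell) for cell in cells]


def extract_table(words, y_tol=3.0, gap=15.0):
    return [split_cells(r, gap) for r in group_rows(words, y_tol)]


# Made-up sample: two table rows with slightly uneven baselines.
sample = [
    Word("Public", 10, 40, 100.0), Word("Safety", 44, 75, 100.5),
    Word("1,200", 150, 180, 100.0), Word("1,350", 250, 280, 101.0),
    Word("Parks", 10, 38, 120.0), Word("300", 160, 180, 120.0),
    Word("(25)", 255, 280, 120.0),
]
table = extract_table(sample)
# table == [["Public Safety", "1,200", "1,350"], ["Parks", "300", "(25)"]]
```

A sketch like this splits each row independently, so it does not guarantee that cells line up in the same column across rows. Handling that, along with multi-page tables and multi-line headers, is where most of the real complexity lies.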
The extractor processes over 100 PDF sections across both the operating and capital budgets. Its output is validated against a regression test suite of expected CSV files — one per table — to catch errors whenever the extraction logic is changed.
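As a rough illustration of what such a regression check might look like, the Python sketch below compares freshly extracted rows against an expected CSV and reports every cell that differs. The function names and the sample data are hypothetical, and the expected CSV is held in a string here to keep the example self-contained; the real suite compares against stored files and may be organized quite differently.

```python
import csv
import io


def read_csv_rows(text):
    """Parse CSV text into a list of rows (each row is a list of strings)."""
    return [row for row in csv.reader(io.StringIO(text))]


def diff_tables(expected, actual):
    """Return (row, column, expected_value, actual_value) for every mismatched cell."""
    mismatches = []
    for r in range(max(len(expected), len(actual))):
        exp_row = expected[r] if r < len(expected) else []
        act_row = actual[r] if r < len(actual) else []
        for c in range(max(len(exp_row), len(act_row))):
            e = exp_row[c] if c < len(exp_row) else None
            a = act_row[c] if c < len(act_row) else None
            if e != a:
                mismatches.append((r, c, e, a))
    return mismatches


# Hypothetical expected output for one table, as it would be stored in a CSV file.
expected_csv = 'Department,2025,2026\nParks,300,325\n'

# Hypothetical extractor output where a value slipped into the wrong column.
extracted = [["Department", "2025", "2026"], ["Parks", "325", ""]]

mismatches = diff_tables(read_csv_rows(expected_csv), extracted)
# mismatches == [(1, 1, "300", "325"), (1, 2, "325", "")]
```

Reporting the row and column of each difference, rather than a simple pass or fail, makes it easier to see whether a change to the extraction logic shifted a whole column or damaged a single cell.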
The web application
This site is a static website built with Astro and styled with Tailwind CSS. Charts are rendered with Observable Plot. The site is generated from the extracted CSV data — there is no backend server or database. Once built, the site is entirely static HTML, CSS, and JavaScript.
What role did AI play?
This project was built using AI-assisted development — specifically, Claude Code, an AI coding tool made by Anthropic. The AI was used extensively throughout the project:
- Writing code: The AI wrote the majority of the Python extraction code, the web application components, and the data transformation scripts.
- Debugging extraction issues: When tables didn't extract correctly, the AI analyzed the spatial layout of words on the page and adjusted the extraction logic.
- Building the website: Page layouts, navigation, charts, search functionality, and interactive data views were largely AI-generated.
- Content generation: The glossary definitions and some descriptive text on the site were drafted by AI based on the budget documents.
However, a human developer directed every step. The human decided what to build, reviewed the AI's output, identified extraction errors by visually comparing results against the source PDFs, and made editorial and design decisions. The AI did not independently decide what data to extract or how to present it — it worked under human direction and review.
Think of the AI's role as similar to a highly productive assistant: it can write code quickly and suggest approaches, but a human is responsible for the goals, quality checks, and final decisions.
Important caveats about accuracy
This site is not an official City of Pittsburgh publication.
The data shown here was mechanically extracted from the city's published budget PDFs. It has not been reviewed or endorsed by the City of Pittsburgh. For authoritative budget figures, always refer to the official budget documents.
Extracting tabular data from PDFs is inherently imperfect. While this project uses a regression test suite covering hundreds of tables to minimize errors, there are known risks:
- Misaligned columns: When table columns are very close together or values are unusually long, the extractor may assign a number to the wrong column. Dollar amounts could appear under the wrong fiscal year heading.
- Merged or split rows: Multi-line cell contents (like long department names) may be incorrectly split into separate rows or merged with adjacent rows.
- Missing data: Some tables with unusual formatting may not be extracted at all, or individual rows may be dropped if their layout doesn't match expected patterns.
- Number formatting: Parenthetical negative numbers, footnote markers, or stray characters near dollar amounts could cause values to be read incorrectly.
- Aggregation errors: Summary statistics, charts, and "what changed" comparisons on this site are computed from the extracted data. If the underlying extraction has errors, the summaries will reflect those errors.
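The number-formatting risk above can be illustrated with a short Python sketch. This is an assumption about how a cell might be converted to a number, not the project's actual code: the parse_amount function and its rules are hypothetical. It treats a value wrapped in parentheses as negative and refuses to guess when a stray character, such as a footnote marker, is attached to the amount.

```python
import re

# Optional "(", optional "$", digits with optional thousands separators
# and decimals, optional ")". Anything else fails to match.
_AMOUNT = re.compile(r"^(\()?\$?\s*(\d[\d,]*(?:\.\d+)?)(\))?$")


def parse_amount(cell):
    """Convert a cell such as '$1,234' or '(1,234)' to a float.

    Returns None when the text is not a clean amount, so that a
    questionable value is flagged instead of silently misread.
    """
    match = _AMOUNT.match(cell.strip())
    if not match:
        return None
    open_paren, digits, close_paren = match.groups()
    if bool(open_paren) != bool(close_paren):
        return None  # unbalanced parenthesis, e.g. a cell cut off mid-value
    value = float(digits.replace(",", ""))
    return -value if open_paren else value
```

For example, parse_amount("(1,234)") returns -1234.0, while parse_amount("1,234*") returns None because of the trailing footnote marker. Returning None in doubtful cases is one possible design choice; it trades completeness for a lower chance of showing a wrong figure.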
If you spot something that looks wrong, you can verify it against the source PDF. The section pages on this site include page references that correspond to the original budget documents.
Source data and transparency
The source data for this site comes from two PDF documents published by the City of Pittsburgh's Office of Management and Budget:
- The FY2026 Operating Budget — covering department budgets, revenue, personnel, grants, trust funds, and financial forecasts.
- The FY2026 Capital Budget — covering the six-year capital improvement plan, including infrastructure, facilities, parks, and equipment projects.
No data on this site comes from sources other than these two documents. No figures have been manually entered, estimated, or editorially adjusted: every number shown was either extracted programmatically from the PDFs or computed from those extracted values, as with totals, charts, and year-over-year comparisons.
The extraction code, test suite, and website source code are available for inspection. Transparency about how the data was processed is a core goal of this project.
Who made this?
This project was created by a Pittsburgh resident interested in making public budget data more accessible. It is an independent, volunteer effort and is not affiliated with or endorsed by the City of Pittsburgh government.