The transparency gap

Public school districts produce a staggering amount of public data. Budgets, board meetings, enrollment numbers, test scores, school improvement plans. Almost none of it is easy to find, and even less of it is easy to understand.

I sit on the board of the Redwood City School District. I've watched parents try to navigate BoardDocs portals, squint at PDF budget documents, and give up trying to figure out when the next board meeting is. These are engaged parents. If they can't find it, who can?

It shouldn't be this hard. The information is already public. It just needs to be organized and made searchable.

So I built rcsd.info, an independent open-source data portal that pulls together public records from about a dozen sources and presents them in one place, in English and Spanish.

What's on the site

As of today, rcsd.info covers:

  • 58 board meetings with agendas, minutes, 1,724 attachments, and timestamped video links
  • 49 full meeting transcripts, generated from YouTube audio using AssemblyAI
  • 12 school profiles with enrollment, demographics, bell schedules, SARC data, and parent resource links
  • District calendars for 2025-26 and 2026-27
  • Budget and performance data from the district's LCAP, interim reports, and the CA School Dashboard
  • Special education enrollment by school and grade

Everything is bilingual and works on a phone.

How the pipeline works

The stack is simple on purpose. No frameworks, no database, no CMS. Just Node.js scripts that:

  1. Scrape meeting agendas from BoardDocs and Simbli (the district has used two different portals over the years)
  2. Download YouTube video metadata and audio for board meetings
  3. Transcribe meeting audio with AssemblyAI's Universal 3 Pro, which handles the English/Spanish mix well
  4. Map agenda items to video timestamps using Claude Haiku
  5. Generate static HTML from JSON data files
  6. Deploy to Cloudflare Pages, with data assets on R2
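Steps 4 and 5 meet in the timestamped video links. The sketch below assumes a hypothetical JSON shape for one meeting's mapping: a YouTube video ID plus agenda items with start times in seconds. This is not the repo's actual schema. The sketch renders that mapping to static HTML. The only external convention it relies on is YouTube's `t` query parameter, which starts playback at a given number of seconds.

```javascript
// Hypothetical shape of one meeting's agenda-to-timestamp mapping (not the repo's schema).
const meeting = {
  videoId: "abc123XYZ",
  items: [
    { title: "Call to Order", startSeconds: 0 },
    { title: "Public Comment", startSeconds: 754 },
    { title: "Budget Update & LCAP", startSeconds: 3723 },
  ],
};

// Escape text before interpolating it into HTML (titles come from scraped documents).
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Seconds -> "H:MM:SS" (or "M:SS" under an hour) for display.
function formatClock(totalSeconds) {
  const s = Math.floor(totalSeconds);
  const h = Math.floor(s / 3600);
  const m = Math.floor((s % 3600) / 60);
  const sec = String(s % 60).padStart(2, "0");
  return h > 0 ? `${h}:${String(m).padStart(2, "0")}:${sec}` : `${m}:${sec}`;
}

// YouTube jumps to a position via the t=<seconds>s query parameter.
function youtubeLinkAt(videoId, seconds) {
  return `https://www.youtube.com/watch?v=${encodeURIComponent(videoId)}&t=${Math.floor(seconds)}s`;
}

// Render one meeting's agenda as an ordered list of timestamped links.
function renderAgenda(m) {
  const rows = m.items.map((item) => {
    const href = youtubeLinkAt(m.videoId, item.startSeconds);
    return `  <li><a href="${escapeHtml(href)}">${formatClock(item.startSeconds)}</a> ${escapeHtml(item.title)}</li>`;
  });
  return `<ol>\n${rows.join("\n")}\n</ol>`;
}

console.log(renderAgenda(meeting));
```

Escaping matters here because agenda titles come from scraped HTML and PDFs and can contain `&`, `<`, or quotes.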

Every script is idempotent: it skips work that's already been done, so you can blow away the output and re-run from scratch. There are no manual steps and no special credentials beyond API keys for transcription and timestamp mapping.
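The skip-if-done behavior can be as simple as checking for an output file before doing any work. This is a minimal sketch. It assumes each step writes one JSON file per item, for example one transcript per video ID. `runStepOnce`, the demo paths, and the fake transcription function are hypothetical and are not the repo's actual helpers.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Run one pipeline step only if its output doesn't exist yet.
// `produce` is a hypothetical async function returning the data to save.
async function runStepOnce(outputPath, produce) {
  if (fs.existsSync(outputPath)) {
    return { skipped: true, outputPath };
  }
  const data = await produce();
  fs.mkdirSync(path.dirname(outputPath), { recursive: true });
  // Write to a temp file, then rename, so a crash mid-write never leaves
  // a partial file that a later run would mistake for finished work.
  const tmp = `${outputPath}.tmp-${process.pid}`;
  fs.writeFileSync(tmp, JSON.stringify(data, null, 2));
  fs.renameSync(tmp, outputPath);
  return { skipped: false, outputPath };
}

// Demo: a hypothetical transcription step keyed by video ID, run twice.
async function demo() {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "rcsd-demo-"));
  const out = path.join(dir, "transcripts", "abc123XYZ.json");
  const fakeTranscribe = async () => ({ videoId: "abc123XYZ", text: "..." });
  console.log(await runStepOnce(out, fakeTranscribe)); // skipped: false (writes the file)
  console.log(await runStepOnce(out, fakeTranscribe)); // skipped: true (file already exists)
}

demo().catch((err) => {
  console.error(err);
  process.exitCode = 1;
});
```

The write-then-rename detail is what makes "blow it away and re-run" safe. An interrupted run leaves at most a stray `.tmp` file. It never leaves a half-written output that the next run would wrongly skip.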

There's also an MCP (Model Context Protocol) server, so AI assistants like Claude can query the data directly. Ask it about school schedules, meeting agendas, lunch menus, whatever.
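This is a minimal sketch of what answering one such query looks like at the protocol level. It is not the site's actual server. It skips the official SDK, the transport, and initialization, and the tool name `get_bell_schedule` and its schedule data are placeholders. What it tries to show is the shape MCP uses for tool calls. A JSON-RPC 2.0 `tools/call` request names a tool and its arguments. The result carries a `content` array and an `isError` flag.

```javascript
// Placeholder data, not real schedules; a real server would read the site's JSON files.
const bellSchedules = {
  "Example School": { start: "8:15", end: "2:45" },
};

// Hypothetical tool registry: tool name -> handler(arguments) returning text.
const tools = {
  get_bell_schedule(args) {
    const school = args.school;
    if (!Object.hasOwn(bellSchedules, school)) {
      throw new Error(`No bell schedule found for "${school}"`);
    }
    const s = bellSchedules[school];
    return `${school}: ${s.start} to ${s.end}`;
  },
};

// Answer one JSON-RPC 2.0 request; transport (stdio/HTTP) and initialization are omitted.
function handleRequest(request) {
  const { id, method, params = {} } = request;
  if (method !== "tools/call") {
    return { jsonrpc: "2.0", id, error: { code: -32601, message: `Method not found: ${method}` } };
  }
  if (!Object.hasOwn(tools, params.name)) {
    // Unknown tool: reported as a JSON-RPC protocol error.
    return { jsonrpc: "2.0", id, error: { code: -32602, message: `Unknown tool: ${params.name}` } };
  }
  try {
    const text = tools[params.name](params.arguments ?? {});
    return { jsonrpc: "2.0", id, result: { content: [{ type: "text", text }], isError: false } };
  } catch (err) {
    // Tool execution failure: returned inside the result so the assistant can see it.
    return { jsonrpc: "2.0", id, result: { content: [{ type: "text", text: err.message }], isError: true } };
  }
}

const reply = handleRequest({
  jsonrpc: "2.0",
  id: 1,
  method: "tools/call",
  params: { name: "get_bell_schedule", arguments: { school: "Example School" } },
});
console.log(JSON.stringify(reply));
```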

Data provenance

If you're going to publish data about a school district, you'd better be able to trace every number back to its source. That's non-negotiable here:

  • Every pipeline has a methodology doc explaining where the data comes from and what transformations are applied
  • AI-generated content (meeting summaries, timestamp mappings) is labeled as such
  • Source documents link to the originals on the district's own sites
  • All the code is public on GitHub

Parents, journalists, and board members need to trust this data. If something looks wrong, anyone can trace it to the source and file an issue.
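Rules like these are easiest to keep when the build enforces them. This is a minimal sketch of such a check. It assumes each published record carries a `source.url` pointing at the original document and an explicit `aiGenerated` flag. Both field names are hypothetical and are not taken from the repo.

```javascript
// Return a list of provenance problems; an empty list means every record passes.
// Field names (source.url, aiGenerated) are hypothetical.
function provenanceProblems(records) {
  const problems = [];
  records.forEach((rec, i) => {
    const label = rec.id ?? `#${i}`;
    let url;
    try {
      url = new URL(rec.source?.url);
    } catch {
      problems.push(`${label}: missing or invalid source.url`);
    }
    if (url && url.protocol !== "https:" && url.protocol !== "http:") {
      problems.push(`${label}: source.url must be http(s)`);
    }
    // Require an explicit label rather than treating a missing flag as "not AI".
    if (typeof rec.aiGenerated !== "boolean") {
      problems.push(`${label}: aiGenerated must be true or false`);
    }
  });
  return problems;
}

const records = [
  { id: "meeting-summary-1", source: { url: "https://example.org/agenda.pdf" }, aiGenerated: true },
  { id: "enrollment-1", source: {} },
];
const problems = provenanceProblems(records);
// In a real build script you would also set process.exitCode = 1 when problems exist.
console.log(problems.length > 0 ? problems.join("\n") : "all records have provenance");
```

Requiring `aiGenerated` to be an explicit boolean means a new pipeline can't quietly publish unlabeled AI output just by forgetting the field.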

What I'd like to build next

The site works today, but there's a long list of things I haven't gotten to:

  • CAASPP test score trends by school, grade, and demographic group
  • Better coverage of committees (DLAC, LCAP advisory, safety) that don't always post public minutes
  • Live transcription and agenda tracking during board meetings
  • Richer school profiles: teacher retention, class sizes, parent survey results
  • Cross-district comparisons with neighboring districts

Adapt it for your district

The whole reason this is open source is so someone else can use it. If your district runs BoardDocs or Simbli (many California districts do), the scraping scripts should work with minor changes:

  1. Fork the GitHub repo
  2. Update data/schools.json with your schools
  3. Point the scraper at your district's BoardDocs or Simbli instance
  4. Add your district's YouTube channel
  5. Run the pipeline

The harder part is the supplementary data: calendars, demographics, budget documents. That stuff varies a lot between districts and usually requires some manual wrangling.

Get involved

This is a one-person volunteer project. I'm a software engineer who happens to sit on a school board. If you want to help or have a data request: