Methodology and sources
New Port Richey Online runs on public records from the city. Nothing here is leaked, scraped behind a login, or guessed; every number traces back to a meeting video, an agenda PDF, or an adopted-budget book the city already publishes. This page lays out where the data comes from, how it gets cleaned and indexed, and where the system can be wrong.
Sources
Where the data comes from
- Swagit video archive
Every meeting the city posts video for, going back to August 2015: City Council, the Community Redevelopment Agency (CRA), Work Sessions, and Special meetings. We pull the m3u8 audio streams with ffmpeg and transcribe them with OpenAI Whisper, using Swagit-provided captions when available.
- City transparency portal
Adopted budgets, annual financial reports (ACFRs), Capital Improvement Programs, and fee schedules. Adopted-budget PDFs from FY16-17 onward are parsed line-by-line into 76,860 structured budget rows.
- City Council page
Authoritative roster: mayor, deputy mayor, councilmembers, and their term start dates. Used to override the automated person dedup when it gets a name wrong.
- Boards and committees
Cultural Affairs, Environmental, Parks & Recreation, Land Development Review Board (LDRB), Pension Board, and the rest. We track when these come up across council meetings.
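To make "parsed line-by-line" concrete, here is a minimal Python sketch. It assumes a hypothetical line layout (an account code like `001-1000-511.31`, a description, and a trailing amount, with negatives in parentheses). The actual layout of the city's budget books, the regex, and the names `BudgetRow` and `parse_budget_lines` are illustrative, not the project's parser.

```python
import re
from dataclasses import dataclass
from decimal import Decimal
from typing import Iterable, List

# Hypothetical line layout: account code, description, trailing amount.
# Negative amounts are assumed to be printed in parentheses, e.g. "($1,200.00)".
LINE_RE = re.compile(
    r"^(?P<account>\d{3}-\d{4}-\d{3}\.\d{2})\s+"
    r"(?P<description>.+?)\s+"
    r"(?P<amount>\(?\$?\d[\d,]*(?:\.\d{2})?\)?)$"
)

@dataclass(frozen=True)
class BudgetRow:
    fiscal_year: str
    account: str
    description: str
    amount: Decimal

def parse_amount(text: str) -> Decimal:
    """Turn "12,500" or "($1,200.00)" into a signed Decimal."""
    negative = text.startswith("(") and text.endswith(")")
    value = Decimal(text.strip("()$").replace(",", ""))
    return -value if negative else value

def parse_budget_lines(lines: Iterable[str], fiscal_year: str) -> List[BudgetRow]:
    """Keep only lines that look like line items; skip everything else."""
    rows: List[BudgetRow] = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # headers, subtotals, and page furniture don't match
        rows.append(
            BudgetRow(
                fiscal_year=fiscal_year,
                account=match["account"],
                description=match["description"],
                amount=parse_amount(match["amount"]),
            )
        )
    return rows
```

Lines that don't match, such as subtotals and page headers, are skipped rather than guessed at, and amounts use `Decimal` so cents don't pick up floating-point error.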
Pipeline
How meetings become pages
Each meeting passes through six stages, fully automated. Every stage is idempotent; re-running picks up only the meetings still missing that stage.
- Index — scrape the Swagit video listing for new meetings, plus the city site for new agenda PDFs.
- Transcribe — pull the m3u8 audio with ffmpeg, run OpenAI Whisper (large-v3, no fallback). Resulting transcripts are imperfect (machine speech recognition; expect typos, dropped words, and occasional name garbles) but searchable.
- Segment — split the meeting transcript into per-agenda-item chunks, using the printed agenda titles as section markers, with an LLM pass to sort out items whose discussion overlaps in the same time window.
- Extract — for each item, pull entities mentioned: people, businesses, addresses, ordinance numbers, dollar amounts, votes, and outcomes. This is what powers the chips on every meeting page.
- Summarize — generate a one-sentence editorial recap per meeting (the line you see under each meeting on the All Meetings page) and, from the corpus as a whole, stories that connect patterns across multiple meetings (the homepage cards).
- Dedup — collapse surface-form people rows ("Altman", "Mr. Altman", "Peter Altman", "Councilman Altman") into one canonical person via an LLM dedup pass plus a hand-curated roster override file for the cases the LLM gets wrong.
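A minimal sketch of the idempotent re-run behavior, assuming each meeting's completed stages are tracked as a set (in memory here; a real system would persist this). It covers only the four per-meeting stages, since Index and Dedup work across the whole corpus. `run_pending`, `STAGES`, and the handler mapping are hypothetical names.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical per-meeting stages, in dependency order. Index and Dedup run
# across the whole corpus, so they are left out of this sketch.
STAGES = ["transcribe", "segment", "extract", "summarize"]

def run_pending(
    meetings: Dict[str, Set[str]],
    handlers: Dict[str, Callable[[str], None]],
) -> List[Tuple[str, str]]:
    """Run only the stages each meeting is still missing; return what ran."""
    ran: List[Tuple[str, str]] = []
    for meeting_id, done in meetings.items():
        for stage in STAGES:
            if stage in done:
                continue  # already complete: re-running skips it
            try:
                handlers[stage](meeting_id)
            except Exception:
                # Leave this stage and every later one pending for the next run.
                break
            done.add(stage)  # record completion only after success
            ran.append((stage, meeting_id))
    return ran
```

Because a stage is recorded as done only after its handler returns, a failure (say, a stream timeout during transcription) leaves that stage and everything after it pending, and the next run picks it up.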
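The Segment stage's use of agenda titles as markers can be sketched in simplified form. This assumes the transcript arrives as `(start_seconds, text)` pairs, similar to the start times and text in Whisper's segment output, and that a title counts as a marker only when its words are spoken verbatim. It omits the LLM pass for overlapping items; `segment_by_agenda` and `_normalize` are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# (start time in seconds, spoken text), similar to Whisper's segment output.
Segment = Tuple[float, str]

def _normalize(text: str) -> str:
    """Lowercase and collapse punctuation/whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()

def segment_by_agenda(
    segments: List[Segment], titles: List[str]
) -> Dict[str, List[Segment]]:
    """Split a transcript into chunks that start where each agenda title is spoken."""
    starts: List[Tuple[int, str]] = []
    cursor = 0
    for title in titles:
        # Pad with spaces so "budget" does not match inside "budgetary".
        needle = f" {_normalize(title)} "
        if not needle.strip():
            continue  # title had no letters or digits to match on
        # Search forward only, so markers stay in agenda order.
        for i in range(cursor, len(segments)):
            if needle in f" {_normalize(segments[i][1])} ":
                starts.append((i, title))
                cursor = i + 1
                break
    chunks: Dict[str, List[Segment]] = {}
    for n, (i, title) in enumerate(starts):
        # Each chunk runs until the next matched marker, or to the end.
        end = starts[n + 1][0] if n + 1 < len(starts) else len(segments)
        chunks[title] = segments[i:end]
    return chunks
```

Titles that are never spoken verbatim get no chunk, and anything before the first marker is left out; those gaps are the kind of case a follow-up LLM pass would need to handle.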
Limits
Where the data can be wrong
- Auto-transcripts are not verbatim. Speech-recognition errors are common, especially with names. Every transcript we surface is labeled as auto-generated and should not be quoted as the official record. For exact wording, consult the meeting video or the city clerk.
- Person dedup is approximate. The LLM pass that merges name variants is conservative, but it sometimes leaves duplicates standing and sometimes merges similar names that belong to different people. The roster page is the most curated; residents who speak repeatedly and appear in Voices may still have unmerged name variants listed separately.
- Budget data is from adopted budgets, not actual spending. The line items we have (76,860 of them) are what the council approved going into each fiscal year. Year-end actuals (in the city's annual financial report) sometimes differ.
- Some meetings have no transcript. A handful of older meetings have audio that won't transcribe cleanly (Granicus archive timeouts, garbled streams). Their pages still render, but without per-item transcript reveals or transcript-derived people quotes.
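To illustrate why person dedup is approximate, here is a rule-based stand-in in Python. It is not the project's LLM pass: it applies a hand-curated override map first, strips leading titles, and merges a bare surname only when exactly one roster member has it, returning `None` otherwise. `canonicalize`, `strip_titles`, `TITLES`, and the sample roster used below are hypothetical.

```python
import re
from typing import Dict, List, Optional

# Leading words treated as honorifics or titles (an assumed, incomplete list).
TITLES = {"mr", "mrs", "ms", "dr", "mayor", "deputy", "vice",
          "councilman", "councilwoman", "councilmember"}

def strip_titles(name: str) -> str:
    """Drop leading titles: "Councilman Altman" -> "Altman"."""
    words = re.findall(r"[A-Za-z'-]+", name)
    while words and words[0].lower() in TITLES:
        words.pop(0)
    return " ".join(words)

def canonicalize(surface: str, roster: List[str], overrides: Dict[str, str]) -> Optional[str]:
    """Map a surface-form name to one roster person, or None if unsure."""
    # 1. Hand-curated overrides win outright.
    if surface in overrides:
        return overrides[surface]
    bare = strip_titles(surface)
    # 2. Exact full-name match (case-insensitive).
    for person in roster:
        if bare.lower() == person.lower():
            return person
    # 3. A lone surname merges only if exactly one roster member has it.
    if bare and " " not in bare:
        matches = [p for p in roster if p.split()[-1].lower() == bare.lower()]
        if len(matches) == 1:
            return matches[0]
    # Ambiguous or unknown: leave it unmerged rather than guess.
    return None
```

With a roster of `["Peter Altman", "Jane Doe", "John Doe"]`, all four forms of "Altman" collapse to one person, while "Ms. Doe" stays unmerged unless an override names her. Refusing ambiguous surnames leaves duplicates standing instead of merging two different people.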
Fine print
Independence
New Port Richey Online is an independent project. It is not affiliated with the City of New Port Richey, the Community Redevelopment Agency, Pasco County, or any government body. Every story page links to the underlying meeting, agenda, or budget so readers can verify any claim against its source.