ARCH

System Architecture

The pipeline is a split-agent, disk-persisted design. Raw data is always written to disk before reports are generated — this prevents context-window overflows and makes individual stages re-runnable independently.

Data Sources
GitHub API (gh CLI) · engineers.json · Linear MCP · Sentry MCP · Slack MCP · Otter MCP · Anthropic Billing

Agent 1 (Data Pull) → raw files on disk (JSON/MD) → Agent 2 (Report Gen) → per-engineer HTML reports → Master Synthesis (00_MASTER.html)

Parallel Enrichment (runs once per sprint)
Linear MCP + Sentry MCP + Slack MCP + Otter MCP → Enrichment Agent → context-enrichment.md

Final Outputs
  • Sprint-Dashboard.html
  • Engineer-90Day-Breakdown.html
  • XX_Name-Deep-Dive.html (×26)
  • 00_MASTER-Engineering-Synthesis.html
  • 16 repos checked per engineer
  • 26 engineers in system
  • 90-day rolling window
  • 12 charts per deep-dive report
  • ~1,300 GitHub API calls per full run
01

Weekly Checklist

This is the confirmed working deploy protocol as of March 15, 2026. The pipeline is fully automated — no manual data entry. Follow these steps top to bottom every Monday morning.

🔴
GitHub collection MUST run as a Claude agent — never background Bash

Background Bash commands lose macOS keychain authentication. Running collect_github.py in background Bash returns 0 PRs for every engineer silently. Always run it interactively through a Claude agent (ask Claude Code: "run python3 scripts/collect_github.py").

STEP 0 — Pre-Run Checks (3 min)
  • Confirm you are in the right directory: cd "/Users/lokicheema/Desktop/Hemut Files/Dashboard"
  • Confirm no orphaned collection processes: ps aux | grep collect_github | grep -v grep — if any found, kill them: pkill -f collect_github.py
  • Check engineers.json — any new hires or terminations this week? Update before running.
  • Check if GitHub data was already pulled today: python3 -c "import json; d=json.load(open('data/github.json')); print(d.get('generated_at',''))" — if today's date shows, skip to Step 2.
STEP 1 — GitHub Data Collection via agent, not background Bash (3–8 min)

Ask Claude Code to run this command. Do NOT run it as a background Bash command or in a separate terminal.

python3 scripts/collect_github.py
  • Output shows timestamped progress per engineer: [HH:MM:SS] ✓ sumedh — 250 PRs, 4 repos, 8.2s
  • After completion, verify data was written:
# Check 1: file was updated today
ls -lh data/github.json

# Check 2: data has real PR counts (must be > 0)
python3 -c "import json; d=json.load(open('data/github.json')); print('Total PRs:', sum(v.get('total_prs',0) for v in d['engineers'].values()))"
  • Expected output: Total PRs: [some number above 0]. If 0, see Known Gotchas section (Issue G3).
  • Expected file size: >100 KB. If 19 KB, data is empty — see Issue G4.
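The two verification checks can be combined into one helper — a sketch that assumes the data/github.json layout used by the one-liners above (top-level engineers map with per-engineer total_prs):

```python
import json
import os

def verify_github_data(path="data/github.json", min_bytes=100_000):
    """Post-collection sanity check: real file size AND non-zero PR totals."""
    size_ok = os.path.getsize(path) > min_bytes  # a ~19 KB file means empty data (Issue G4)
    with open(path) as f:
        data = json.load(f)
    total_prs = sum(v.get("total_prs", 0) for v in data["engineers"].values())
    return {"size_ok": size_ok, "total_prs": total_prs,
            "ok": size_ok and total_prs > 0}
```

If ok comes back False, stop and work through the Known Gotchas section before re-running.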
⚠️
Skip this step if github.json was already pulled today

Check generated_at in data/github.json. If it matches today's date AND no new repos were added to engineers.json, skip directly to Step 2 with --skip-github.

STEP 2 — Run Full Pipeline + Deploy (8–12 min)

Standard weekly run (after fresh GitHub collection, or after any score_prs.py change):

cd "/Users/lokicheema/Desktop/Hemut Files/Dashboard"
./run.sh --skip-github --rebuild-cache --deploy

Fast re-run (github.json already pulled today, no scoring logic changes):

./run.sh --skip-github --deploy
# saves ~5 min and ~$0.08 vs --rebuild-cache

Fix-only re-run (after fixing generate_dashboard.py bugs, no new data needed):

./run.sh --skip-github --deploy
# no --rebuild-cache — re-uses scored data, only re-generates HTML

What the pipeline does (8–12 min total)

  • Step 2.1–2.7: CSV processing, PR scoring, data merge (~2 min)
  • Step 2.8: GPT-4o-mini scores PRs — silent for 3–5 min, this is normal
  • Step 2.9: Perplexity coaching queries — silent for 2–3 min, this is normal
  • Step 2.10: HTML generation (~1 min, generates 101+ files)
  • Step 2.11: Vercel deploy (~1 min)
⚠️
Long silences during steps 2.8 and 2.9 are normal

PR scoring (step 2.8) and coaching generation (step 2.9) run batch API calls with no intermediate output. Wait up to 5 min before concluding something is wrong. Only abort if no output for >15 min total.

STEP 3 — QA: 5 Parallel Haiku Agents (5–8 min)

This step is mandatory after every deploy. The dashboard is used in real 1:1 performance conversations — wrong numbers mean wrong assessments. See Section 13 for the full QA protocol and what to check.

  • Spin up 5 Haiku agents in parallel (one per batch)
  • Batch 1: sumedh.html, ashish.html, a2z2.html, avetis.html
  • Batch 2: federico.html, pranav.html, aaryan.html, harshiv.html
  • Batch 3: andrewfe.html, kalpan.html, darius.html, gabriele.html
  • Batch 4: tarun.html, tyler.html, justin.html, evan.html
  • Batch 5: sprint.html, team.html, spend.html
STEP 4 — Fix Issues and Re-Deploy, if needed (5–15 min)
  • If QA found rendering bugs (float formatting, wrong counts, etc.): fix generate_dashboard.py
  • Re-deploy without re-scoring: ./run.sh --skip-github --deploy
  • If QA found data bugs (wrong PR counts, wrong spend): check source data files in data/ first
  • Re-run QA after every fix until all batches pass
  • Full pipeline run + QA: ~20–25 min
  • Fast re-run (no new GitHub data, HTML fix only): ~12–15 min
02

Agent 1 Prompt Template — Data Pull

Agent 1 pulls all raw GitHub data for one engineer and saves it to disk. It makes live API calls and writes structured JSON/MD. Never combine Agent 1 and Agent 2 into a single agent — see Known Issues #4.

Variables to Fill

[ENGINEER_NAME]     Full name (e.g. Sumedh Kane)
[GITHUB_HANDLE]     Primary handle from engineers.json
[REPORT_NUMBER]     01–26 (zero-padded)
[ROLE]              From engineers.json role field
[MONTHLY_SALARY]    Integer USD from engineers.json
[START_DATE]        today − 90 days (ISO: YYYY-MM-DD)
[END_DATE]          today (ISO: YYYY-MM-DD)
[KNOWN_DATA]        Paste from prior week's report or engineers.json notes

Repos to Always Check

# All 16 repos in scope — always check all of them
Hemut-Prod
Voice-AI-Element
zones
RFP-Module
Web_Scrapers
hemut-sonar
tracking_backend
QuoteFlow_Backend
hemutchat
Hemut-News
SharedInfraLayer
MobileAppUpdated
Helm
Hemut-MCP
Hemut-Library
Hemut-Website

Skip Logins (Always Exclude)

# These are bots/service accounts — never count as engineer activity
hemut-swe          # Andrew FE Zhang's service account (zhang@hemut.com)
dependabot[bot]
use-tusk[bot]

Core Agent 1 Commands

# 1. Pull all merged PRs in window (run per repo)
gh pr list \
  --repo Hemut2025/[REPO] \
  --state merged \
  --author [GITHUB_HANDLE] \
  --json number,title,additions,deletions,mergedAt,body,labels,createdAt,headRefName \
  --limit 300 \
  | jq '[.[] | select(.mergedAt >= "[START_DATE]" and .mergedAt <= "[END_DATE]")]'

# 2. Per-PR detail: files changed
gh api repos/Hemut2025/[REPO]/pulls/[PR_NUMBER]/files

# 3. Per-PR detail: reviews received
gh api repos/Hemut2025/[REPO]/pulls/[PR_NUMBER]/reviews

# 4. Per-PR detail: commits
gh api repos/Hemut2025/[REPO]/pulls/[PR_NUMBER]/commits

# 5. CI runs for this engineer
gh api "repos/Hemut2025/[REPO]/actions/runs?actor=[GITHUB_HANDLE]&per_page=100"

# 6. Direct commits (not via PR)
gh api "repos/Hemut2025/[REPO]/commits?author=[GITHUB_HANDLE]&since=[START_DATE]T00:00:00Z&until=[END_DATE]T23:59:59Z&per_page=100"

# 7. Pre-window activity check (CRITICAL for tenure correction)
gh api "repos/Hemut2025/[REPO]/commits?author=[GITHUB_HANDLE]&until=[START_DATE]T00:00:00Z&per_page=1" \
  --jq '.[0].commit.author.date'

# 8. Reviews given by this engineer (not received)
gh api "repos/Hemut2025/[REPO]/pulls/[PR_NUMBER]/reviews" \
  | jq '[.[] | select(.user.login == "[GITHUB_HANDLE]")]'

# 9. Check rate limit before and after heavy batches
gh api rate_limit --jq '.resources.core | {remaining, reset}'
ℹ️
Sampling Rule

>100 PRs: sample every 5th PR for files/reviews/commits detail. 50–100 PRs: sample every 3rd PR. <50 PRs: pull detail for all PRs.
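As a sketch, the sampling rule reduces to a stride lookup (helper names are illustrative):

```python
def sample_stride(pr_count):
    """Detail-call sampling stride from the rule above."""
    if pr_count > 100:
        return 5   # every 5th PR
    if pr_count >= 50:
        return 3   # every 3rd PR
    return 1       # <50 PRs: full detail for all

def prs_for_detail(prs):
    """PRs to pull files/reviews/commits for; list metadata is always kept for every PR."""
    return prs[::sample_stride(len(prs))]
```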

Output JSON Structure

{
  "engineer": {
    "name": "[ENGINEER_NAME]",
    "github": "[GITHUB_HANDLE]",
    "role": "[ROLE]",
    "monthly_salary": [MONTHLY_SALARY],
    "known_data": {}
  },
  "repos_checked": [],          // list of all 16 repos, even those with 0 PRs
  "errors": {},                  // any repos that 404'd or errored
  "prs": [],                      // full PR objects with files/reviews/commits embedded
  "ci_runs": [],                 // CI run objects per repo
  "reviews_given": [],           // PRs this engineer reviewed (authored by others)
  "direct_commits": [],          // commits not attached to a PR
  "pre_window_first_commit": null, // ISO date string if exists; null = new hire
  "pull_timestamp": "",           // ISO datetime when Agent 1 ran
  "window": {
    "start": "[START_DATE]",
    "end": "[END_DATE]"
  }
}

Completion Signal

# Agent 1 must end with this exact line:
AGENT 1 COMPLETE — [NUMBER] PRs across [REPOS] repos. Saved to data/[XX]_[Name]-raw-data.json
Save path convention

File saves to: Outputs/2026-03-14_Engineer-Deep-Dives/data/[REPORT_NUMBER]_[firstname-lastname]-raw-data.json
Example: data/01_sumedh-kane-raw.json

03

Agent 2 Prompt Template — Report Generation

Agent 2 reads only from disk — zero live API calls. It reads the JSON produced by Agent 1 and generates the full HTML deep-dive report.

🔴
Never make live API calls in Agent 2

Agent 2 is a pure read-from-disk → write-HTML transformer. Any live calls violate the split-agent contract and make the stage non-repeatable.

Input / Output

# INPUT
data/[XX]_[Name]-raw-data.json   # or .md for older format files

# OUTPUT
[XX]_[Name]-Deep-Dive.html       # saved to Outputs/2026-03-14_Engineer-Deep-Dives/

Required Report Sections

00  Executive Summary (1-page verdict)
01  Engineer Profile & KPI Strip
02  90-Day PR Inventory
03  Merit Assessment
04  Code Quality Deep Dive
05  Collaboration & Influence
06  AI Tooling Analysis
07  Work Pattern & Wellbeing
08  Business Alignment & ROI
09  Charts (all 12 via Chart.js CDN)
10  Synthesis & Verdict

Required 12 Charts (Chart.js CDN)

# Chart.js CDN (always use this exact URL)
https://cdn.jsdelivr.net/npm/chart.js

# The 12 required charts:
1.  Weekly velocity line (13 weeks + team avg + frontier reference lines)
2.  Additions vs deletions stacked area chart
3.  Work type donut (Feature / Bug / Refactor / Infra / Chore / Test / Docs)
4.  Time-of-day heatmap (7 days × 24 hours, derived from commit timestamps)
5.  90-day commit calendar (13×7 HTML grid — NOT Chart.js, pure HTML/CSS)
6.  AI spend vs PR output dual-axis bar+line
7.  PR size histogram (bins: XS <50, S 50-200, M 200-500, L 500-2K, XL >2K lines)
8.  Depth/complexity trend over 13 weeks
9.  Cost efficiency trend ($/PR vs $9.15 frontier threshold)
10. Codebase coverage radar (one axis per repo)
11. PR lifecycle funnel (opened → reviewed → merged, with day counts)
12. Review quality scatter (reviews given vs review depth)

Design Constants

/* CSS variables — use these in every report */
--bg: #0f1724;
--surface: #1B2A4A;
--gold: #C9A84C;
--green: #2ecc71;
--red: #e74c3c;

/* Fonts (Google Fonts CDN) */
IBM Plex Sans   — body, UI
IBM Plex Mono   — code, metrics, chart labels

/* Google Fonts URL */
https://fonts.googleapis.com/css2?family=IBM+Plex+Mono:wght@400;500;600&family=IBM+Plex+Sans:wght@300;400;500;600;700&display=swap

Tenure Correction Rule — CRITICAL

⚠️
Always apply the tenure correction before calculating any velocity metric

Check pre_window_first_commit in the JSON.

If NOT null → engineer existed before the window. Any gap at the start of the 90-day window is a real dark period. Use the full 13-week denominator.

If null → engineer is new. Adjust all velocity metrics: PRs / (days_since_first_PR_in_window / 7), NOT PRs / 13 weeks. Reporting an engineer's pre-hire weeks as "inactive" is a critical error.
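A sketch of the corrected calculation (function and argument names are illustrative; pre_window_first_commit comes straight from the Agent 1 JSON):

```python
from datetime import date

def weekly_velocity(pr_count, first_pr_in_window, window_start, window_end,
                    pre_window_first_commit):
    """Tenure-corrected PRs/week."""
    if pre_window_first_commit is not None:
        # Existing engineer: gaps are real dark periods — full window denominator.
        active_days = (window_end - window_start).days
    else:
        # New hire: only count days since their first PR inside the window.
        active_days = (window_end - first_pr_in_window).days
    weeks = max(active_days, 7) / 7   # floor at one week to avoid divide-by-zero
    return pr_count / weeks
```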

04

Enrichment Agent Prompt

The Enrichment Agent runs once per sprint via MCP tools. No gh CLI needed. It provides the context layer — Linear ticket linkage, Sentry errors, Slack signals, and Otter meeting mentions — that GitHub data alone cannot provide.

Output Files

data/linear-data.md           — teams, cycles, issue counts, ticket linkage per engineer
data/sentry-data.md           — open issues per project, error counts, top issues
data/meetings-calendar-data.md — relevant Otter transcripts with engineer mentions
data/context-enrichment.md    — combined synthesis of all three sources above

Linear Pulls (via MCP)

# MCP tool: mcp__linear__list_teams
# → Get all team IDs and names

# MCP tool: mcp__linear__list_cycles (per team)
# → For each engineering team: SWAG, ASH, CRODIE, GUDA, ADT
# → Identify current active cycle

# MCP tool: mcp__linear__list_issues (per team, limit 50)
# → For each team: pull 50 most recent issues
# → Current cycle completion rate = completedIssueCount / totalIssueCount

# Ticket linkage per engineer:
# Count issues where gitBranchName CONTAINS engineer's github_handle
# This is the primary ticket-linkage metric

# Save as: data/linear-data.md
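The two Linear metrics above can be sketched like this, assuming issue and cycle dicts use the field names shown (gitBranchName, completedIssueCount, totalIssueCount):

```python
def ticket_linkage_count(issues, github_handle):
    """Issues whose gitBranchName contains the engineer's handle — the primary linkage metric."""
    handle = github_handle.lower()
    return sum(1 for issue in issues
               if handle in (issue.get("gitBranchName") or "").lower())

def cycle_completion_rate(cycle):
    """completedIssueCount / totalIssueCount for the active cycle."""
    total = cycle.get("totalIssueCount", 0)
    return cycle["completedIssueCount"] / total if total else None
```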

Sentry Pulls (via MCP)

# MCP tool: mcp__sentry__search_issues

Organization: hemut

# Projects to check:
command-backend
command-frontend
hemut-quoteflow-backend

# Query: unresolved errors in the last 90 days
Query: "is:unresolved"
Date range: last 90 days

# Save: count of unresolved issues, top 10 by event count
# Save as: data/sentry-data.md

Slack Pulls (via MCP)

# MCP tool: mcp__slack__slack_search_public_and_private

# Primary channel query
Query: "in:#hemutdevs after:[DATE_30_DAYS_AGO]"

# What to look for and save:
# - Sprint retrospective updates or announcements
# - Engineer shoutouts or performance callouts
# - Blockers or incident reports
# - Any anomalies (sudden silence, repeated issues)
# - Messages mentioning specific engineers by @handle or name

# Save relevant messages with: sender, timestamp, channel, full text
# Save as: data/context-enrichment.md (Slack section)

Otter Pulls (via MCP)

# Step 1: List recent transcripts
# MCP tool: mcp__otter__otter_list_transcripts
# → Get last 20 transcripts

# Step 2: For each transcript with engineer names in summary
# MCP tool: mcp__otter__otter_get_transcript
# → Pull full transcript text

# Key patterns to extract and save:
# - Offer discussions (comp, equity, timeline mentions)
# - Performance mentions (positive or corrective)
# - Technical decisions or architectural discussions
# - Sprint planning decisions that affect workload

# Save as: data/meetings-calendar-data.md
05

Master Synthesis Agent Prompt

The Synthesis Agent reads ALL data files and produces the single master HTML. It does no live data pulls — everything comes from disk. This is the most context-heavy agent in the pipeline; give it a full context window.

Input Files (read all before generating)

# Raw data files (all 26)
data/01_sumedh-kane-raw.json
data/02_Ashish-Lathkar-raw-data.json
data/03_... through data/26_...

# Enrichment context
data/context-enrichment.md
data/linear-data.md
data/sentry-data.md
data/meetings-calendar-data.md

# Benchmark reference
data/elite-benchmarks.md         ← frontier benchmarks (update manually quarterly)

# Engineer configuration
/Users/lokicheema/Desktop/Hemut\ Files/Dashboard/engineers.json

Required Output Sections

00  Executive Summary (5 key findings)
01  Corrected Rankings Table (dashboard vs actual)
02  Team Tiers (visual tier cards)
03  Dashboard Accuracy ("we were wrong" section)
04  Systemic Issues (4 patterns)
05  Individual Cards (all 26, 5 groups)
06  Action Items (red / yellow / green)
07  FT Conversion Pipeline
08  AI Spend Analysis (treemap + ROI)
09  Production Health (Sentry)
10  Team Composition

Section 04 — Systemic Issues (always include all 4)

  • Ticket linkage gaps — engineers without Linear branch name linkage (invisible in product tracking)
  • Review concentration — is one person reviewing most PRs? Bus factor risk.
  • AI tooling gaps — engineers with $0 AI spend vs peers. Are they blocked or unaware?
  • PR description discipline — % of PRs with <20 chars description. Affects reviewability and on-call debugging.

Output File

Outputs/2026-03-14_Engineer-Deep-Dives/00_MASTER-Engineering-Synthesis.html
06

engineers.json Maintenance

This file is the single source of truth for all engineer metadata. Everything downstream — Agent 1 prompts, report generation, the sprint dashboard, billing attribution — reads from it.

⚠️
Actual file location

/Users/lokicheema/Desktop/Hemut Files/Dashboard/engineers.json
This is NOT inside the Outputs folder. Don't move it.

Schema — Fields Per Engineer

{
  "id": "sumedh",                         // short key, lowercase, no spaces
  "name": "Sumedh Kane",
  "github_handle": "sumedhkane03",       // primary handle (verify by checking commits)
  "github_handle_alt": null,              // secondary handle if exists
  "cursor_email": "sumedh@hemut.com",    // email for Cursor/Claude billing attribution
  "claude_key_prefix": "sk-ant-...",     // prefix in Anthropic billing to match spend
  "role": "Founding Engineer",
  "monthly_salary": 5000,                // integer USD, 0 if unpaid/terminated
  "tier": 1,                              // 1-5 performance tier (1 = top)
  "status": "active",                     // active | terminated | unresolved | external
  "notes": ""                              // alt emails, billing quirks, anything unusual
}
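Since everything downstream reads this file, a quick validation pass before each run is cheap insurance. A sketch against the schema above (field and status names come from the schema; the helper itself is hypothetical):

```python
REQUIRED_FIELDS = {"id", "name", "github_handle", "cursor_email", "role",
                   "monthly_salary", "tier", "status"}
VALID_STATUSES = {"active", "terminated", "unresolved", "external"}

def validate_engineer(entry):
    """Return a list of problems; an empty list means the entry is usable downstream."""
    problems = [f"missing field: {f}" for f in sorted(REQUIRED_FIELDS - entry.keys())]
    if entry.get("status") not in VALID_STATUSES:
        problems.append(f"invalid status: {entry.get('status')!r}")
    if not isinstance(entry.get("monthly_salary"), int):
        problems.append("monthly_salary must be an integer USD amount")
    return problems
```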

Weekly Maintenance Checklist

  • New hire this week? Add entry with all fields filled.
  • Termination this week? Change status to "terminated", set monthly_salary to 0.
  • Salary change? Update monthly_salary.
  • GitHub handle confirmed? Update github_handle and add old one to github_handle_alt.
  • Tier change based on corrected metrics? Update tier.

Known Handle Quirks — Permanent Reference

# Federico Lora
Commits from: lora.fed.03@gmail.com (personal) + federico@hemut.com
Both emails → handle: lorafed
# Must search BOTH emails in billing attribution

# Pranav Guda — TWO ACCOUNTS
Primary: PranavGuda (pranav@hemut.com)
Dev/merge: hemoot (developer@hemut.com)
# Pull data for BOTH handles and merge in Agent 1

# Andrew FE Zhang — service account confusion
hemut-swe = zhang@hemut.com = Andrew FE Zhang's service account
Personal handle: zhemut
# Add hemut-swe to skip_logins. Use zhemut for pull.

# TWO ANDREW ZHANGS — completely different people
zhemut    = Andrew FE Zhang (founding eng, $5K/mo, zhemut GitHub)
zhangandrew2 = A2Z2 (summer intern, $500/mo, separate person)
# Never merge their data. Always confirm by email.

# Darius Mahjoob — bundled account history
Pre-Oct 2025 activity: under shared Hemut account, unattributable
Individual GitHub history starts: Oct 2025 under DMahjoob
# Use Linear join date as true start date proxy for pre-Oct work

# Bao Tran
GitHub: BaoT1301
Email: bao@hemut.com
# Separate from Pranav — different person entirely
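For the two-handle cases (e.g. Pranav Guda), Agent 1 has to merge pulls without double-counting. A sketch, assuming each PR record is annotated with its repo and PR number during the pull (that annotation is an assumption, not part of the raw gh output):

```python
def merge_handle_pulls(pulls_by_handle):
    """Merge PR lists pulled under multiple handles, de-duplicated by (repo, PR number)."""
    seen, merged = set(), []
    for prs in pulls_by_handle.values():
        for pr in prs:
            key = (pr["repo"], pr["number"])
            if key not in seen:
                seen.add(key)
                merged.append(pr)
    return merged
```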
07

Rate Limit Management

GitHub's authenticated rate limit is 5,000 requests/hour. A full 26-engineer run uses ~900–1,300 calls — well within budget. These numbers let you estimate safely.

  • 5,000 — GitHub API requests/hour (authenticated)
  • ~1,300 — max calls for a full 26-person run
  • 6 — max parallel Agent 1s
  • 3,700 — headroom after a full run

Call Budget Per Engineer

Engineer Type   PR Count     Sampling                  Est. API Calls
Small           <20 PRs      All PRs, full detail      15–25
Medium          20–100 PRs   Every 3rd PR for detail   40–80
Large           >100 PRs     Every 5th PR for detail   60–100
Zero/Inactive   0 PRs        Repo checks only          5–10
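The budget table as a lookup (values are the table's estimates, not measurements; the helper name is illustrative):

```python
def estimated_call_range(pr_count):
    """(low, high) estimated API calls for one engineer, per the budget table."""
    if pr_count == 0:
        return (5, 10)        # repo checks only
    if pr_count < 20:
        return (15, 25)       # all PRs, full detail
    if pr_count <= 100:
        return (40, 80)       # every 3rd PR for detail
    return (60, 100)          # every 5th PR for detail
```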

Total Budget Breakdown (Full 26-Person Run)

Group                             Count   Calls each   Total
Active engineers (medium/large)   14–16   ~60–80       ~900–1,200
Partial/zero engineers            10–12   ~5–10        ~100
TOTAL                             26      —            ~1,000–1,300

Parallelization Rules

# SAFE: Run up to 6 Agent 1s simultaneously
# UNSAFE: Running 7+ — will hit rate limits mid-pull

# If rate limited:
# → Agent will receive HTTP 403 with "rate limit exceeded"
# → Do NOT cancel. Wait and retry. The limit resets hourly — the exact reset time is in the response.
# → Check reset time: gh api rate_limit --jq '.resources.core.reset'
# → Convert unix timestamp to local time to know when to retry

# Stagger if needed: start each Agent 1 with a 30-second gap
# This distributes the burst load

# Check remaining limit before starting a batch:
gh api rate_limit --jq '.resources.core | {remaining, reset}'
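The reset value is a unix timestamp; converting it to local time (as the rules above suggest) can be sketched like this — feed it the stdout of the gh api rate_limit command:

```python
import datetime
import json

def parse_rate_limit(payload):
    """Parse `gh api rate_limit` JSON into (remaining calls, local reset time)."""
    core = json.loads(payload)["resources"]["core"]
    reset_local = datetime.datetime.fromtimestamp(core["reset"])  # unix ts → local time
    return core["remaining"], reset_local
```

For a live check, pass in subprocess.run(["gh", "api", "rate_limit"], capture_output=True, text=True).stdout.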

Sampling Rules

ℹ️
When to apply sampling

>100 PRs → sample every 5th PR for files/reviews/commits detail
50–100 PRs → sample every 3rd PR
<50 PRs → pull detail for all PRs

Sampling applies only to the per-PR deep-detail calls (files, reviews, commits). Always pull the full PR list metadata for every PR.

08

Weekly Sprint Dashboard Update

The Sprint Dashboard is the primary weekly tool — updated every sprint, even when you're not running full deep-dives. This is the 45-minute version of the workflow.

Files to Update

Outputs/2026-03-14_AI-Spend-Analytics/Sprint-Dashboard.html
Outputs/2026-03-14_AI-Spend-Analytics/Engineer-90Day-Breakdown.html

Pull Sprint PR Data

# PRs merged this sprint (run per repo, replace SPRINT_START with ISO date)
gh pr list \
  --repo Hemut2025/Hemut-Prod \
  --state merged \
  --json author,title,additions,deletions,mergedAt,labels \
  --limit 500 \
  | jq '[.[] | select(.mergedAt >= "SPRINT_START")]'

# Aggregate by author — append to the pr list pipeline above (total PRs per engineer this sprint)
  | jq 'group_by(.author.login) | map({login: .[0].author.login, count: length})'

# CI pass rate this sprint (per repo)
gh api "repos/Hemut2025/Hemut-Prod/actions/runs?per_page=100" \
  --jq '[.workflow_runs[] | {actor: .actor.login, conclusion: .conclusion, created_at: .created_at}]'

# CI pass rate formula:
# success_runs / (success_runs + failure_runs) — exclude skipped/cancelled

Pull AI Spend Data

# Anthropic Console:
console.anthropic.com → Billing → Usage → Export CSV
# Match rows by cursor_email or claude_key_prefix from engineers.json

# Cursor seats:
cursor.sh → Team Settings → Billing → Export usage
# Match by cursor_email from engineers.json
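Matching exported billing rows onto engineers can be sketched like this. The email and cost_usd column names are placeholders — check the real export headers before relying on them:

```python
import csv

def spend_by_engineer(csv_path, engineers):
    """Sum billing-export rows onto engineers by cursor_email (case-insensitive)."""
    by_email = {e["cursor_email"].lower(): e["id"] for e in engineers}
    totals = {e["id"]: 0.0 for e in engineers}
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            eng_id = by_email.get((row.get("email") or "").lower())
            if eng_id:
                totals[eng_id] += float(row.get("cost_usd") or 0)
    return totals
```

Remember the Federico quirk: some engineers bill under more than one email, so search every known alias.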

What to Update in SPRINT_DATA

// In Sprint-Dashboard.html, find the SPRINT_DATA constant and update:
const SPRINT_DATA = {
  engineers: [
    {
      id: "sumedh",
      sprint: {
        prs: 12,                // ← PRs merged this sprint
        aiSpend: 48.50,          // ← AI spend this sprint in USD
        ciPassRate: 0.94          // ← 0.0–1.0
      },
      ninety: {
        prs: 87,                // ← rolling 90-day PR total
        aiSpend: 312.00,         // ← rolling 90-day AI spend
        dpr: 3.59                // ← dollars per PR (aiSpend / prs)
      },
      sprintNote: ""              // ← new signals or flags this sprint
    },
    // ... repeat for all engineers ...
  ]
}
⚠️
Update BOTH files — they have separate SPRINT_DATA blocks

Sprint-Dashboard.html and Engineer-90Day-Breakdown.html are independent HTML files with their own embedded data objects. Updating one does not update the other.

09

Output Folder Structure

All engineering analytics live in one canonical folder. Do not rename or move it — the folder name is the baseline date, not the current date.

Outputs/
└── 2026-03-14_Engineer-Deep-Dives/            ← ALL engineering analytics live here
    ├── 00_MASTER-Engineering-Synthesis.html   ← regenerate every sprint
    ├── 01_Sumedh-Kane-Deep-Dive.html
    ├── 02_Ashish-Lathkar-Deep-Dive.html
    ├── 03_A2Z2-Andrew-Zhang-Deep-Dive.html
    ├── 04_Avetis-Avagyan-Deep-Dive.html
    ├── 05_Federico-Lora-Deep-Dive.html
    ├── 06_Pranav-Guda-Deep-Dive.html
    ├── 07_Aaryan-Jadhav-Deep-Dive.html
    ├── 08_Harshiv-Thakkar-Deep-Dive.html
    ├── 09_Andrew-FE-Zhang-Deep-Dive.html
    ├── 10_Kalpan-Bariya-Deep-Dive.html
    ├── 11_Darius-Mahjoob-Deep-Dive.html
    ├── 12_Gabriele-Ghione-Deep-Dive.html
    ├── 13_Tarun-Vadapalli-Deep-Dive.html
    ├── 14_Tyler-Kim-Deep-Dive.html
    ├── 15–25_[Name]-Deep-Dive.html            ← generate on-demand
    ├── 26_Justin-Jiang-Deep-Dive.html
    ├── WEEKLY-RUNBOOK.html                    ← this document
    ├── AGENT1-DATA-PULL-TEMPLATE.md           ← copy-paste prompt for Agent 1
    ├── AGENT2-REPORT-TEMPLATE.md              ← copy-paste prompt for Agent 2
    ├── DEEP-DIVE-PROMPT.md                    ← legacy single-agent prompt
    ├── README-HOW-TO-RUN.md                   ← original quick-start
    └── data/
        ├── 01_sumedh-kane-raw.json
        ├── 02_Ashish-Lathkar-raw-data.json
        ├── ... 03–26 ...
        ├── context-enrichment.md              ← enrichment agent output
        ├── linear-data.md                     ← Linear MCP pull
        ├── sentry-data.md                     ← Sentry MCP pull
        ├── meetings-calendar-data.md          ← Otter MCP pull
        └── elite-benchmarks.md                ← frontier benchmarks (update quarterly)

⚠ engineers.json ACTUALLY LIVES AT:
/Users/lokicheema/Desktop/Hemut Files/Dashboard/engineers.json
(NOT inside the Outputs folder — do not move it)

Outputs/2026-03-14_AI-Spend-Analytics/         ← sprint dashboard lives here (separate folder)
├── Sprint-Dashboard.html                      ← primary weekly tool
└── Engineer-90Day-Breakdown.html
ℹ️
Versioning policy

Do NOT rename the folder each sprint. Overwrite files in place. The folder name 2026-03-14 is the baseline/creation date, not the update date. Use git history if you need to recover a prior version.

10

Known Issues & Lessons Learned

Hard-won lessons from Sprint 13. Each one cost time to discover. Don't repeat them.

ISSUE 01
Dashboard Undercounting — Root Cause Fixed

The original Sprint Dashboard tracked only Hemut-Prod and a few other repos. Missing repos caused 5× undercounts for Gabriele, 2× for Darius, 1.5× for Ashish and A2Z2. Engineers appeared low-output when they were actually active — just in repos the dashboard didn't know about.

✓ Fix: engineers.json now has the complete 16-repo list. Agent 1 checks all repos every run.
ISSUE 02
Tenure vs Dark Periods — Don't Penalize New Hires

Engineers who joined mid-window showed "dark periods" at the start of the 90-day window that were actually just their pre-hire period. Reporting a new hire as "50% inactive" is misleading and incorrect.

✓ Fix: Always pull pre_window_first_commit. If null = new hire, adjust velocity denominator. If not null = real dark period, report it.
ISSUE 03
The Bundled Account Problem — Pre-Individual-GitHub Era

Before Hemut issued individual GitHub accounts, the team used a shared account. Darius's work before Oct 2025 is under the bundled account and cannot be individually attributed via GitHub. His GitHub history under DMahjoob starts October 2025.

✓ Fix: Noted in engineers.json notes field. Use Linear join date as the true start date proxy for pre-Oct activity context.
ISSUE 04 — CRITICAL
Split-Agent Approach Is Required — Never Combine Into One

A single agent that both pulls data AND generates the report fails at >50 PRs. The combined context (all PR data + chart generation + HTML writing) exceeds the model's effective context window, causing mid-run errors that lose all previously pulled data. You then have to start from scratch.

✓ Fix: Always Agent 1 (save to disk) → Agent 2 (read from disk). Two separate agent invocations. Never combine them, no exceptions.
ISSUE 05
Parallelization Limit — 6 Max, Not Unlimited

Running >6 Agent 1s simultaneously causes GitHub rate limit hits mid-pull, causing agents to either fail or produce partial data. Partial data is worse than no data because it's hard to detect.

✓ Fix: Max 6 concurrent Agent 1s. Fire Agent 2 immediately when each Agent 1 completes — pipeline becomes naturally self-staggering.
ISSUE 06
hemut-swe Handle Is Andrew FE Zhang's Service Account

The hemut-swe GitHub account (zhang@hemut.com) belongs to Andrew FE Zhang but is a service/merge account. If you pull data for both zhemut and hemut-swe, you double-count his output. If you pull for neither, you miss his work. Use only zhemut.

✓ Fix: hemut-swe is in the skip_logins list. Agent 1 pulls only via zhemut for Andrew FE Zhang.
ISSUE 07
PR Description Quality — Report Distribution, Not Average

Several engineers have 0-character descriptions on routine PRs but excellent, detailed descriptions on complex ones. Reporting "0 avg description length" misrepresents their behavior. The bimodal distribution is actually a reasonable practice (don't over-document trivial changes).

✓ Fix: In the report, show distribution (% with <20 chars, % with 20–200 chars, % with >200 chars) rather than a single average.
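The distribution fix can be sketched as (bucket labels follow the fix above):

```python
def description_distribution(prs):
    """Percent of PRs by description length bucket — report this, not a single average."""
    buckets = {"<20 chars": 0, "20-200 chars": 0, ">200 chars": 0}
    for pr in prs:
        n = len((pr.get("body") or "").strip())
        if n < 20:
            buckets["<20 chars"] += 1
        elif n <= 200:
            buckets["20-200 chars"] += 1
        else:
            buckets[">200 chars"] += 1
    total = len(prs) or 1
    return {k: round(100 * v / total, 1) for k, v in buckets.items()}
```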
ISSUE 08
CI Pass Rate Methodology Must Be Consistent

The Sprint Dashboard used a different CI pass rate calculation than the Agent 1 pulls, making the two documents contradict each other. The dashboard counted all run statuses; the agent excluded skipped/cancelled runs. This made the same engineer appear at different pass rates in different views.

✓ Fix: Standard formula everywhere — success_runs / (success_runs + failure_runs). Always exclude skipped and cancelled runs from both numerator and denominator.
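The standard formula as a helper — any run whose conclusion is not success or failure (skipped, cancelled, etc.) is excluded from both sides:

```python
def ci_pass_rate(runs):
    """success / (success + failure); skipped and cancelled never count."""
    success = sum(1 for r in runs if r.get("conclusion") == "success")
    failure = sum(1 for r in runs if r.get("conclusion") == "failure")
    finished = success + failure
    return success / finished if finished else None
```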
11

Quick Reference — All 26 Handles

Print this and keep it next to your keyboard. Sorted by report number. Verify handles against engineers.json before each run.

TWO ANDREW ZHANGS — DO NOT CONFUSE. EVER.
zhangandrew2 = A2Z2 Andrew Zhang — SWE INTERN, T3, works on fleet/Hemut-Prod, $500/mo
zhemut = Andrew FE Zhang — FOUNDING ENGINEER, T1, works on RFP Module, $5,000/mo

engineers.json is the source of truth — always look up the handle there; never resolve from memory. A past handle swap attributed 146 PRs to the wrong person — one of the worst data integrity failures in dashboard history.

 #   Name                GitHub Handle        Email                     $/mo     Status
 01  Sumedh Kane         sumedhkane03         sumedh@hemut.com          $5,000   Active
 02  Ashish Lathkar      Ashishlathkar77      ashish@hemut.com          $5,000   Active
 03  A2Z2 Andrew Zhang   zhangandrew2         a2z2@hemut.com            $500     Active
 04  Avetis Avagyan      Avetis-Av            avetis@hemut.com          $5,000   Active
 05  Federico Lora       lorafed              federico@hemut.com        $5,000   Active (DARK)
 06  Pranav Guda         PranavGuda / hemoot  pranav@hemut.com          $5,000   Active
 07  Aaryan Jadhav       AaryanJ45            aaryan@hemut.com          $500     Active
 08  Harshiv Thakkar     harshiv49            harshiv@hemut.com         $5,000   Active
 09  Andrew FE Zhang     zhemut               andrewzhang@hemut.com     $5,000   Active
 10  Kalpan Bariya       kalpan-hemut         kalpan@hemut.com          $5,000   Active
 11  Darius Mahjoob      DMahjoob             darius@hemut.com          $500     Active (remote)
 12  Gabriele Ghione     gabrieleghioneusc    gabriele@hemut.com        $1,000   Active
 13  Tarun Vadapalli     TeleVision05         tarun@hemut.com           $5,000   Active
 14  Tyler Kim           tjsook               tyler@hemut.com           $0       Unresolved
 15  Justin Jiang        justinjiang37        justin.jiang@hemut.com    $1,000   Active
 16  Evan Adami          EvanAd7              evan@hemut.com            $0       Part-time
 17  Bao Tran            BaoT1301             bao@hemut.com             $0       Unresolved
 18  Sahana              sahanashetty11       sahana@hemut.com          $0       TERMINATED
 19  Xavier              5029xavier           tian.alex98@gmail.com     $0       External
 20  Ani                 aniruthsivakumar00   ani@daydreams.digital     $0       External
 21  Alex Kruchten       —                    alex@hemut.com            $5,000   Active (Bain)
 22  Len Moran           —                    len@hemut.com             $0       Active (UCLA)
 23  Aidan Tack          —                    tack@hemut.com            $1,500   Active
 24  George Cole         gt-c                 george@hemut.com          $0       Unresolved
 25  Jan                 jmbassi              jan@hemut.com             $0       Unresolved
 26  Nishant Singh       —                    nishant@hemut.com         $0       Active

(— = no handle recorded; confirm in engineers.json)

Status Key

Active          — current, paid or unpaid contributor
Active (DARK)   — active but no recent commits
TERMINATED      — no longer with Hemut
Unresolved      — status unclear, needs confirmation
External        — contractor / outside contributor
12

Known Gotchas — Read Before Debugging

These are the failure modes discovered the hard way. Each one is a time sink if you hit it cold. Check this section before spending more than 5 minutes debugging any pipeline failure.

GOTCHA G1 — AUTH
gh auth status exits code 1 even with a valid token

On this machine, gh auth status exits code 1 due to a keyring warning about the hemoot account — even when the active token is fully functional. If you see "not authenticated" during collection, this is almost certainly a false negative.

✓ Correct check: gh auth token — exits 0 when a usable token exists. Never use gh auth status as an auth gate on this machine.
GOTCHA G2 — RACE CONDITION
Never run two collect_github.py processes simultaneously

Two simultaneous processes write to the same checkpoint files (data/github_{id}.json). The second process overwrites the first's valid data with blank data. Result: 0 PRs for all engineers, but the file is still present so you don't notice immediately.

✓ Before starting collection, run: ps aux | grep collect_github | grep -v grep. If any process running: pkill -f collect_github.py, then wait 5 seconds before starting.
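A guard along these lines (a sketch using pgrep; adjust for your environment) can be dropped into the top of a collection run:

```python
import os
import subprocess
import sys

def ensure_no_other_collector(pattern="collect_github.py"):
    """Refuse to start if another collection process is alive (avoids the G2 race)."""
    result = subprocess.run(["pgrep", "-f", pattern], capture_output=True, text=True)
    others = [int(p) for p in result.stdout.split() if int(p) != os.getpid()]
    if others:
        sys.exit(f"Another {pattern} process is running (pids {others}) — "
                 f"pkill -f {pattern}, wait 5 seconds, then retry.")
    return True
```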
GOTCHA G3 — GRAPHQL NODE LIMIT
collection returns 0 PRs for all engineers — the commits field is the culprit

The commits field in gh pr list --json triggers GitHub's GraphQL 500K node limit for repos with large PR histories. The entire query silently returns empty data — no error, no warning, just 0 results. This was the single largest data integrity failure in dashboard history.

✓ The fix is already in the current collect_github.py. Verify with: grep "changedFiles" scripts/collect_github.py — should show changedFiles,reviews NOT changedFiles,commits,reviews. Never add commits back to this query.
GOTCHA G4 — EMPTY DATA FILE
github.json shows 0 PRs after a seemingly successful collection

If collection appears to complete but the PR verification check shows 0, check the file size: ls -lh data/github.json. Expected size is >100 KB. A 19 KB file means collection ran but wrote empty data — typically caused by a race condition or auth failure that was not surfaced as an error.

✓ Fix: pkill -f collect_github.py, verify no processes remain, then re-run collection as an agent. Never re-run while another instance might still be active.
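The size-and-count check from G4 can run as a single sanity function. The JSON shape assumed below (`{"engineers": {id: {"prs": [...]}}}`) is a guess — adjust it to the real schema:

```python
import json
import os

def github_data_looks_sane(path="data/github.json", min_bytes=100_000):
    """G4 sanity check: a healthy github.json is >100 KB and contains
    at least one PR. A ~19 KB file means empty data was written."""
    if not os.path.exists(path) or os.path.getsize(path) < min_bytes:
        return False
    with open(path) as f:
        data = json.load(f)
    # Assumed layout: {"engineers": {id: {"prs": [...]}}}
    total_prs = sum(len(e.get("prs", []))
                    for e in data.get("engineers", {}).values())
    return total_prs > 0
```

Run it right after collection finishes; a `False` result means re-run as an agent, never in background Bash.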
GOTCHA G5 — LINEAR ZEROS
Linear shows 0/0/0% on all engineer pages

This means the data merge step did not wire Linear data into github_scored.json. The merge step still runs, but if it fails silently it leaves the Linear fields unpopulated in the output file.

✓ Fix: Run python3 scripts/merge_all_sources.py manually, then ./run.sh --skip-github --deploy. If Linear data is still 0, verify data/linear_data.json is non-empty and contains completed (not done) fields.
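The `completed`-vs-`done` check can be automated. A sketch, assuming linear_data.json is a dict keyed by engineer (the real layout may differ):

```python
import json

def linear_data_ok(path="data/linear_data.json"):
    """G5 check: file must be non-empty and every engineer entry must
    use the `completed` field -- a `done` field means stale schema."""
    with open(path) as f:
        data = json.load(f)
    if not data:
        return False
    return all("completed" in e and "done" not in e
               for e in data.values())
```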
GOTCHA G6 — SILENT PIPELINE STEPS
Deploy appears hung — it's not

Step 2.8 (GPT-4o-mini PR scoring) is silent for 3–5 minutes. Step 2.9 (Perplexity coaching) is silent for 2–3 minutes. These look like hangs but are batch API calls that produce no intermediate output.

✓ Only abort if there is zero output for more than 15 minutes total. The full pipeline takes 8–12 minutes; if it is still running at 12 minutes, wait until 15 before intervening.
GOTCHA G7 — FLOAT FORMATTING
Dollar amounts show as raw Python floats ($3,447.2700000000023)

This means a format_currency() call was missed at an HTML injection point in generate_dashboard.py. Raw Python floats leaking into HTML look deeply unprofessional and would be visible to engineers in 1:1s.

✓ Fix: In generate_dashboard.py, find the template injection point and wrap the value in format_currency() or f"${'{:,.2f}'.format(value)}". Then ./run.sh --skip-github --deploy.
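A minimal sketch of what `format_currency()` is expected to do, matching the f-string fix above (the real implementation in generate_dashboard.py may differ):

```python
def format_currency(value: float) -> str:
    """Round to cents, add thousands separators, prefix with $.
    Kills the raw-float leak described in G7."""
    return f"${value:,.2f}"
```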
GOTCHA G8 — SCORE CACHE STALE
After fixing score_prs.py, old scores are still showing

The scoring cache persists between runs to save API cost. After any change to score_prs.py logic, the cache contains scores computed with the old broken logic.

✓ Always use --rebuild-cache on the first deploy after any score_prs.py change. Cost: ~$0.06–0.08.

QA Protocol — Mandatory After Every Deploy

🔴
QA is not optional — this dashboard is used in 1:1 performance conversations

Wrong numbers in this dashboard mean wrong performance assessments that affect real people's comp and retention. A coaching item that cites a fabricated PR count or a spend number with floating-point noise will be read by an engineer as a factual statement about their work. Run QA after every single deploy.

QA Process — 5 Parallel Haiku Agents

Run these 5 batches simultaneously. Use Claude Haiku (not Sonnet/Opus) — it is fast, cheap, and sufficient for number-comparison verification. Each agent checks 3–4 engineer pages against the source data files.

Agent | Pages to Check | Source Data Files
Batch 1 | sumedh.html, ashish.html, a2z2.html, avetis.html | data/github_scored.json, data/ai_spend.json
Batch 2 | federico.html, pranav.html, aaryan.html, harshiv.html | data/github_scored.json, data/ai_spend.json
Batch 3 | andrewfe.html, kalpan.html, darius.html, gabriele.html | data/github_scored.json, data/ai_spend.json
Batch 4 | tarun.html, tyler.html, justin.html, evan.html | data/github_scored.json, data/ai_spend.json
Batch 5 | sprint.html, team.html, spend.html | data/github_scored.json, data/ai_spend.json, data/linear_data.json

What to Verify Per Engineer Page

  • Total PRs: matches data/github_scored.json → engineer → total_prs
  • AI spend: shows as $X,XXX.XX (no raw floats like $3,447.2700000000023)
  • Linear done/total: shows real numbers, not 0/0 with a "not synced" warning
  • Weighted rank: matches github_scored.json → weighted_rank
  • Model mix: shows real percentages (not all 0%) — should sum to ~100%
  • Coaching items: cite real PR counts from source data, no fabricated course names, no engineer's own name as action subject
  • Sprint table: count in header (N) matches KPI tile count
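The per-page number checks above can be scripted rather than eyeballed. A sketch of the Total-PRs comparison, assuming a "Total PRs" label precedes the figure in the rendered HTML (match the pattern to the real templates):

```python
import re

def check_total_prs(html_text: str, expected: int) -> bool:
    """QA helper: pull the first 'Total PRs' figure out of the page
    and compare it to the value from github_scored.json."""
    m = re.search(r"Total PRs[^0-9]*(\d+)", html_text)
    return bool(m) and int(m.group(1)) == expected
```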

⚠️ ENGINEER PAGE RULE — Never show on individual engineer pages

Individual engineer pages (public/engineers/*.html) must NOT display:
• Monthly or annual salary
• Tier number (T1/T2/T3)
• Output rank (#1 of 25, #14 of 25, etc.)
• Peer compensation comparisons

These metrics are shown to engineers in 1:1s. Displaying them hurts morale. Keep these on sprint.html, team.html, spend.html (manager-only views) instead.
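A lint for the engineer-page rule can catch leaks before deploy. The patterns below (salary mentions, T1/T2/T3 tier labels, "#N of M" ranks) are illustrative assumptions — tune them to the real templates:

```python
import re

FORBIDDEN = [r"salary", r"\bT[123]\b", r"#\d+ of \d+"]

def page_is_clean(html_text: str) -> bool:
    """True when none of the manager-only metrics appear on an
    individual engineer page."""
    return not any(re.search(p, html_text, re.IGNORECASE)
                   for p in FORBIDDEN)
```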

Fast QA Prompt (copy-paste for Haiku agent)

# Paste this to a Claude Haiku agent (one per batch)
Use Haiku model. Read:
1. public/engineers/sumedh.html
2. public/engineers/ashish.html
3. public/engineers/a2z2.html
4. data/github_scored.json

For each engineer, verify:
- Total PRs in HTML matches github_scored.json total_prs
- AI spend is formatted as $X,XXX.XX (no raw floats)
- Linear done/total shows non-zero values
- Weighted rank matches github_scored.json weighted_rank
- Model mix percentages are not all 0%

Output: one line per engineer in format:
[NAME]: PASS or FAIL — [brief note if FAIL]

After QA — Fix and Redeploy

# Fix rendering bug → fast redeploy (no re-scoring needed)
./run.sh --skip-github --deploy

# Fix data bug → check source files in data/ first, then redeploy
# If score_prs.py was changed: --rebuild-cache required
./run.sh --skip-github --rebuild-cache --deploy

What's Built — Current System State (March 15, 2026)

Reference for understanding what each script does and where to look when something breaks.

Pipeline Scripts

Script | What it does | Run.sh step | Notes
collect_github.py | Pulls all PR data from GitHub API for all engineers | Step 1 (manual, agent) | REBUILT. Parallel batches (5 workers), per-engineer checkpointing, timestamped progress, --resume flag, auth uses gh auth token
score_prs.py | Scores each PR by complexity using GPT-4o-mini | Step 2.8 | Caches results. Use --rebuild-cache after any logic change. Deployment PRs (staging→main) score a fixed 1.5/XS.
merge_all_sources.py | Merges GitHub, Linear, Sentry, AI spend into one data object per engineer | Step 2.7 | Produces github_scored.json. Linear field is completed (not done).
generate_coaching.py | Generates Perplexity + GPT-4o-mini coaching items per engineer | Step 2.9 | NEW (Mar 15). 5-layer surgical prompt. Cached by gap signature. Labels items [INDIVIDUAL ACTION] vs [PROCESS FIX]. Must pass total_prs and ticket_linked_pct explicitly.
generate_dashboard.py | Generates all HTML files in public/ | Step 2.10 | 101+ HTML files. format_currency() must be applied at every dollar injection point.

Output Files in public/

public/
├── index.html                           ← dashboard home / nav hub
├── engineers/
│   ├── sumedh.html                      ← per-engineer sprint page (25 files)
│   ├── ashish.html
│   └── ... (one per engineer) ...
├── sprint.html                          ← team-wide sprint view
├── team.html                            ← org tier map, spend treemap
├── spend.html                           ← AI spend analytics
├── sentry.html                          ← production error health
├── linear.html                          ← issue completion rates
├── code-quality.html                    ← PR quality metrics
├── WEEKLY-RUNBOOK.html                  ← this document (also in Dashboard root)
└── archive/                             ← superseded legacy files

NOTE: public/**/*.html is gitignored — regenerated on every pipeline run.
Save committed docs (runbook, playbooks) to Dashboard/ root, not public/.

Quarterly Artifacts (Manual — Do NOT Auto-Regenerate)

⚠️
These are NOT generated by run.sh — they are manually generated quarterly

The 14 engineer deep-dive HTML reports and 00_MASTER-Engineering-Synthesis.html are quarterly artifacts saved to the Dashboard root. Never include them in the weekly pipeline. Never save them to public/ (gitignored). Their value is depth and curation — the weekly pipeline would overwrite them with thinner auto-generated versions.

Dashboard/
├── 00_MASTER-Engineering-Synthesis.html   ← quarterly org narrative
├── 01_Sumedh-Kane-Deep-Dive.html          ← quarterly deep-dive (14 files)
├── 02_Ashish-Lathkar-Deep-Dive.html
└── ... (one per engineer) ...

Key Data Files

File | Contents | Updated by
data/github.json | Raw PR data for all engineers | collect_github.py (manual step 1)
data/github_scored.json | Scored + merged data (source of truth for HTML generation) | score_prs.py + merge_all_sources.py
data/ai_spend.json | Per-engineer AI spend from Anthropic billing CSV | process_csv.py (Step 2.2)
data/linear_data.json | Linear ticket completion rates per engineer | collect_linear.py
data/coaching_cache.json | Cached Perplexity + GPT coaching outputs (keyed by gap signature) | generate_coaching.py
engineers.json | Single source of truth for all engineer metadata | Manual — update on every hire/termination
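A pre-deploy preflight can confirm the key data files are present, non-empty, and fresh. A sketch, assuming `generated_at` in github.json is an ISO-format timestamp (the same field the Step 0 check reads); the function name `preflight` is illustrative:

```python
import datetime
import json
import pathlib

def preflight(data_dir="data"):
    """Return a list of problems; an empty list means safe to deploy."""
    required = ["github.json", "github_scored.json",
                "ai_spend.json", "linear_data.json"]
    problems = []
    for name in required:
        p = pathlib.Path(data_dir) / name
        if not p.exists() or p.stat().st_size == 0:
            problems.append(f"{name}: missing or empty")
    gh = pathlib.Path(data_dir) / "github.json"
    if gh.exists():
        stamp = json.loads(gh.read_text()).get("generated_at", "")
        if not stamp.startswith(datetime.date.today().isoformat()):
            problems.append(f"github.json stale: generated_at={stamp!r}")
    return problems
```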

Vercel Deployment

# Deploy is handled by run.sh --deploy flag
# Manual deploy if needed:
cd "/Users/lokicheema/Desktop/Hemut Files/Dashboard"
vercel --prod --yes

# Live URL:
https://public-delta-tawny.vercel.app/

# Runbook URL:
https://public-delta-tawny.vercel.app/WEEKLY-RUNBOOK.html

# After updating WEEKLY-RUNBOOK.html — copy to public/ and deploy:
cp WEEKLY-RUNBOOK.html public/WEEKLY-RUNBOOK.html
vercel --prod --yes
✓ Data current
GitHub: Mar 30, 2026 8:20pm (today)
AI Spend CSVs: Mar 30, 2026 8:20pm (today)