Personal · College · Side

Projects worth shipping.

Distributed real-time architecture, ML classification on imbalanced data, vector retrieval, and full-stack competition platforms. Each prioritizes reliability and measurable impact over surface polish.

2026
Real-Time · VR · Distributed Systems

Server-Authoritative Multiplayer VR Sandbox

The fundamental problem in multiplayer VR is state authority: when clients own their world, desync and divergence are inevitable. The challenge was building a sandbox where the server owns all terrain, physics, and object state, distributing updates to N Quest headsets at 72Hz without perceptible lag. Server-authoritative design from the ground up — no client-side state mutations.

Frame-rate sync: 72Hz
Multi-Quest sync: N→N
Terrain streaming: CSV

Built on Unity 2022.3 with Netcode for GameObjects, using a strict server-authoritative model: clients send input intentions, never state mutations. Terrain data streams from CSV frame buffers that are parsed server-side and distributed through Netcode's NetworkVariable system, so every connected Quest headset renders identical geometry. Because only the server writes state, the N clients cannot diverge and no expensive reconciliation is needed. Designed as a research-grade collaborative simulation platform, the architecture prioritizes deterministic state replication over cosmetic polish.
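The authority pattern described above can be illustrated outside Unity. The sketch below is not the project's Netcode code: it is a minimal Python model that assumes a hypothetical CSV layout (one terrain frame of height values per row), a hypothetical server-side clamp on movement, and made-up names such as `Server`, `Client`, `submit_input`, and `tick`.

```python
import csv
import io
from dataclasses import dataclass, field


def parse_terrain_frames(csv_text: str) -> list[list[float]]:
    """Parse one terrain frame per CSV row (assumed layout: a row of heights)."""
    reader = csv.reader(io.StringIO(csv_text))
    return [[float(cell) for cell in row] for row in reader if row]


@dataclass
class Client:
    """A headset stand-in: it only stores what the server last sent."""
    name: str
    terrain: list[float] = field(default_factory=list)
    positions: dict[str, float] = field(default_factory=dict)

    def on_snapshot(self, terrain: list[float], positions: dict[str, float]) -> None:
        # Clients copy server state; they never mutate it locally.
        self.terrain = list(terrain)
        self.positions = dict(positions)


@dataclass
class Server:
    """Owns terrain and object state; clients only submit input intentions."""
    frames: list[list[float]]
    clients: list[Client] = field(default_factory=list)
    positions: dict[str, float] = field(default_factory=dict)
    pending: list[tuple[str, float]] = field(default_factory=list)
    frame_index: int = 0

    def submit_input(self, client_name: str, move: float) -> None:
        # An intention ("I want to move by `move`"), not a state write.
        self.pending.append((client_name, move))

    def tick(self) -> None:
        # Apply queued inputs in arrival order, clamped by a (hypothetical) server rule.
        for name, move in self.pending:
            step = max(-1.0, min(1.0, move))
            self.positions[name] = self.positions.get(name, 0.0) + step
        self.pending.clear()
        terrain = self.frames[self.frame_index % len(self.frames)]
        self.frame_index += 1
        # Every client receives the same snapshot.
        for client in self.clients:
            client.on_snapshot(terrain, self.positions)


frames = parse_terrain_frames("0.0,0.5,1.0\n0.1,0.6,1.1\n")
a, b = Client("quest_a"), Client("quest_b")
server = Server(frames=frames, clients=[a, b])
server.submit_input("quest_a", 5.0)    # oversized request, clamped to 1.0
server.submit_input("quest_b", -0.25)
server.tick()
```

After one tick both clients hold the same terrain frame and the same positions, because the only writer is the server. In the real project this role would be played by Netcode's replication rather than a hand-written broadcast loop.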

  • Unity 2022.3
  • Netcode for GameObjects
  • Meta Quest
  • Server-Authoritative Model
  • CSV Terrain Streaming
  • NetworkVariable Sync
GitHub · soon
2026
Full-Stack · Backend Systems · Competition Platform

Arcanum — AI-Era Puzzle Competition Platform

The challenge: build a competition platform that handles high-volume concurrent submissions, prevents abuse without blocking legitimate players, and returns semantic feedback fast enough that it feels instant. 10 progressive levels. 10,000+ submissions in the first 10 hours. 90+ concurrent players. The puzzle concept is the product skin; the backend systems are the actual engineering problem.

Submissions in 10h: 10k+
Active players: 90+
Progressive levels: 10

React on Vercel with a Flask API on Render, backed by Supabase PostgreSQL. All puzzle validation is server-side, so no answer state is reachable via client inspection. The semantic feedback engine encodes player guesses into dense embeddings and scores their cosine similarity against the solution, returning a graduated warmth signal in place of binary feedback. Harder levels intentionally degrade signal resolution. A hint system with 24-hour refresh cycles gates progression without blocking it. Rate limiting and OTP-based account verification held up under 10,000+ competitive submissions in the first session; abuse prevention was a design requirement, not an afterthought.
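One way to picture "degraded signal resolution" is to snap the similarity score to a coarser grid on harder levels. The sketch below is an assumption, not Arcanum's actual scoring: the bucket schedule in `buckets_for_level` (20 buckets at level 1 down to 3 at level 10) and the function names are hypothetical, and the vectors are toy stand-ins for real embeddings.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def buckets_for_level(level: int, max_buckets: int = 20, min_buckets: int = 3) -> int:
    """Hypothetical schedule: fewer warmth buckets as the level rises from 1 to 10."""
    span = max_buckets - min_buckets
    return max_buckets - round(span * (level - 1) / 9)


def warmth(guess_vec: list[float], solution_vec: list[float], level: int) -> float:
    """Clip similarity to [0, 1], then snap it to the level's bucket grid."""
    sim = max(0.0, cosine_similarity(guess_vec, solution_vec))
    steps = buckets_for_level(level) - 1
    return round(sim * steps) / steps


solution = [1.0, 0.0]
guess = [0.6, 0.8]                          # cosine similarity of about 0.6
easy_signal = warmth(guess, solution, 1)    # fine-grained: 11/19
hard_signal = warmth(guess, solution, 10)   # coarse: one of 0.0, 0.5, 1.0
```

The same guess yields a finer reading on level 1 than on level 10, where only three warmth values exist. Because the score is computed server-side, the solution vector never has to leave the API.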

  • React
  • Flask
  • Supabase
  • PostgreSQL
  • Word Embeddings
  • Semantic Similarity
  • Rate Limiting
  • OTP Auth
  • Vercel
  • Render
GitHub · soon
2026
Full-Stack · Applied ML · Accessibility

Mnemo — Vocabulary Builder for Neurodivergent Learners

Vocabulary apps built around alphabetical rote memorization fail ADHD learners — flat lists collapse attention and treat all words as interchangeable. The challenge: design a vocab app where mnemonics carry the cognitive load, sourced from both AI generation and a community submission/voting layer, with per-user progress, saves, and notes that survive long enough to compound into real recall.

Mnemonic sources: AI + crowd
Progress + notes: Per-user
Community quality signal: Voting

Designed for ADHD attention patterns first — every screen privileges a single high-salience artifact (the mnemonic) over list density. Mnemonics are generated on demand via an LLM with prompt scaffolding that emphasizes vivid imagery, then surfaced alongside community submissions. A voting layer turns the crowd into the quality signal so weak mnemonics decay without explicit moderation. Per-user progress, saves, and notes are persisted in Postgres with row-level scoping; each word becomes a long-lived object the user can return to, not a flashcard that disappears after a session.
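A voting layer needs some ranking rule so that weak mnemonics sink without a moderator. The source does not say which rule Mnemo uses; the sketch below swaps in one common choice, the Wilson score lower bound, purely as an illustration. The record fields (`id`, `source`, `up`, `down`) and the vote counts are hypothetical.

```python
import math


def wilson_lower_bound(upvotes: int, downvotes: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the upvote fraction.

    Items with few votes get a cautious score, so a 2-0 mnemonic does not
    outrank a 40-10 one. z = 1.96 corresponds to roughly 95% confidence.
    """
    n = upvotes + downvotes
    if n == 0:
        return 0.0
    p = upvotes / n
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt((p * (1 - p) + z * z / (4 * n)) / n)
    return (centre - margin) / denom


# Hypothetical records mixing AI-generated and community-submitted mnemonics.
mnemonics = [
    {"id": "ai_b", "source": "ai", "up": 2, "down": 0},
    {"id": "crowd_c", "source": "crowd", "up": 3, "down": 12},
    {"id": "crowd_a", "source": "crowd", "up": 40, "down": 10},
]

ranked = sorted(
    mnemonics,
    key=lambda m: wilson_lower_bound(m["up"], m["down"]),
    reverse=True,
)
ranked_ids = [m["id"] for m in ranked]
```

With these counts the well-tested community mnemonic ranks first, the new AI mnemonic sits in the middle until it earns more votes, and the heavily downvoted one falls to the bottom.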

  • Next.js
  • React
  • TypeScript
  • Postgres
  • OpenAI API
  • Community Voting
  • Tailwind
GitHub · soon
2026
ML · RAG · Audio Systems

Canvas RAG — Conversational Coursework with Voice I/O

Canvas LMS scatters lectures, syllabi, readings, and assignments across modules with no unified query layer — students re-read the same material to find one sentence. The build: a conversational RAG layer over a student's entire Canvas catalog with bidirectional voice, so reviewing a course feels like asking a TA, not Ctrl-F across PDFs.

Bidirectional I/O: Voice
Full course corpus: Canvas
Retrieval eval: RAGAS

A FastAPI backend pulls coursework from Canvas, encrypts it at rest with Fernet, and chunks documents into a ChromaDB vector store keyed per course and per user. GPT OSS generates the conversational responses from the retrieved chunks, which keeps inference and embeddings out of third-party APIs where coursework privacy matters. Voice input runs through Whisper for transcription, and responses are spoken back via TTS so the entire loop is hands-free. Retrieval quality is continuously evaluated using RAGAS metrics (faithfulness, answer relevancy, context precision) so regressions in chunking or prompt strategy are caught before they ship.
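The chunking and keying step can be sketched with the standard library alone. This is an assumed approach, not the project's code: word-based chunks with a fixed overlap, plus a hypothetical naming scheme (`collection_key`) and record shape (`build_records`) for per-user, per-course storage. Chunk sizes here are illustrative.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into word-based chunks; consecutive chunks share `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def collection_key(user_id: str, course_id: str) -> str:
    """Hypothetical naming scheme for a per-user, per-course collection."""
    return f"user-{user_id}-course-{course_id}"


def build_records(user_id: str, course_id: str, doc_name: str, text: str,
                  chunk_size: int = 200, overlap: int = 40) -> list[dict]:
    """Attach an id and scoping metadata to each chunk (hypothetical record shape)."""
    key = collection_key(user_id, course_id)
    return [
        {
            "id": f"{key}:{doc_name}:{index}",
            "document": chunk,
            "metadata": {"user_id": user_id, "course_id": course_id, "source": doc_name},
        }
        for index, chunk in enumerate(chunk_text(text, chunk_size, overlap))
    ]


sample = " ".join(f"w{i}" for i in range(10))
records = build_records("u42", "cs101", "syllabus.pdf", sample, chunk_size=4, overlap=1)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, and the scoping metadata lets a query be restricted to one student's copy of one course.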

  • React
  • Python
  • FastAPI
  • ChromaDB
  • GPT OSS
  • Whisper
  • Fernet
  • RAGAS
GitHub · soon
2026
ML · NLP · Clinical Systems

Clinical Urgency Classification from EHR Notes

Emergency departments generate thousands of unstructured clinical notes daily. Triage depends on clinicians reading every note — a bottleneck that delays critical interventions. The task: classify free-text EHR notes into Stable, Deteriorating, or Critical urgency, on a dataset where critical cases represent less than 8% of samples.

Urgency taxonomy: 3-class
Critical class ratio: <8%
Primary metric: F1

The core challenge was class imbalance: with Critical cases at <8%, a model that never predicts Critical can still score above 92% accuracy, so accuracy alone is misleading. The pipeline starts with text preprocessing and TF-IDF feature extraction, then adds embedding-based representations that capture semantic similarity between symptom descriptions. Classification uses Logistic Regression, chosen deliberately over deep models because interpretability matters in clinical settings and the dataset doesn't justify transformer-scale compute. Evaluation prioritizes per-class F1, precision, and recall, with particular attention to recall on Critical, where a false negative has real patient cost.
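The reason per-class metrics matter more than accuracy here can be shown with a few lines of arithmetic. The sketch below uses synthetic labels only (not the project's data) and a deliberately degenerate "model" that never predicts Critical; the helper names are hypothetical.

```python
LABELS = ["Stable", "Deteriorating", "Critical"]


def accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_metrics(y_true: list[str], y_pred: list[str]) -> dict[str, dict[str, float]]:
    """Precision, recall, and F1 for each label, treating it as the positive class."""
    metrics = {}
    for label in LABELS:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


# Synthetic example: 7 Critical notes out of 100 (7%, under the 8% ratio).
y_true = ["Stable"] * 70 + ["Deteriorating"] * 23 + ["Critical"] * 7
# Degenerate predictions: every Critical note is labelled Deteriorating.
y_pred = ["Stable"] * 70 + ["Deteriorating"] * 23 + ["Deteriorating"] * 7

acc = accuracy(y_true, y_pred)
report = per_class_metrics(y_true, y_pred)
macro_f1 = sum(m["f1"] for m in report.values()) / len(LABELS)
```

Accuracy comes out at 0.93 while recall and F1 on Critical are both 0.0, which is exactly the failure mode that a single accuracy number hides.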

  • Python
  • scikit-learn
  • TF-IDF
  • Embeddings
  • Logistic Regression
  • Imbalanced Classification
  • Clinical NLP
GitHub · soon
2026
ML · Search · Vector Systems

Puzzle Feedback Engine — Vector Similarity Hint System

Traditional puzzle hint systems are binary — right or wrong. They tell you nothing about directionality. The challenge: build a feedback engine that understands how semantically close a player's attempt is to the solution and returns graduated "warmer / colder" signals.

Query latency: <50ms
Similarity search: pgvector
Distance metric: Cosine

Player inputs are encoded into dense vector embeddings and compared against the solution embedding using cosine similarity via Postgres pgvector. The distance maps to a graduated warmth scale: a continuous proximity signal rather than binary correct/incorrect. A milestone-based hint system triggers at semantic distance thresholds. The end-to-end query runs in under 50ms, indistinguishable from a static lookup to the player.
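The distance-to-hint mapping might look roughly like the sketch below. Everything named here is an assumption: the table and column names in the SQL string, the milestone thresholds, and the hint texts are hypothetical. The one fixed point is that `<=>` is pgvector's cosine distance operator. The SQL is shown as a string only, and a pure-Python `cosine_distance` stands in for the database so the example runs on its own.

```python
import math

# Hypothetical schema; `<=>` is pgvector's cosine distance operator.
DISTANCE_SQL = """
SELECT embedding <=> %(guess)s::vector AS distance
FROM puzzle_solutions
WHERE puzzle_id = %(puzzle_id)s
"""

# Hypothetical milestones: (maximum distance, hint unlocked at or below it).
MILESTONES = [
    (0.60, "right general topic"),
    (0.35, "same category as the answer"),
    (0.15, "very close, check the exact wording"),
]


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def warmth(distance: float) -> float:
    """Continuous proximity signal in [0, 1]; higher means warmer."""
    return max(0.0, min(1.0, 1.0 - distance))


def unlocked_hints(distance: float, already_seen: set[str]) -> list[str]:
    """Hints whose threshold has been crossed and that the player has not seen yet."""
    return [
        hint
        for threshold, hint in MILESTONES
        if distance <= threshold and hint not in already_seen
    ]


first_attempt = unlocked_hints(0.30, already_seen=set())
second_attempt = unlocked_hints(0.30, already_seen={"right general topic"})
```

A guess at distance 0.30 crosses the first two milestones but not the third, and a hint the player has already seen is not returned again.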

  • Postgres
  • pgvector
  • Embeddings
  • Cosine Similarity
  • Node.js
  • Supabase
  • Semantic Search
GitHub · soon