Personal · College · Side

Projects worth shipping.

Distributed real-time architecture, ML classification on imbalanced data, vector retrieval, and full-stack competition platforms. Each prioritizes reliability and measurable impact over surface polish.

2026

Real-Time · VR · Distributed Systems

Server-Authoritative Multiplayer VR Sandbox

The fundamental problem in multiplayer VR is state authority: when each client owns its own copy of the world, those copies inevitably desync and diverge. The challenge was building a sandbox where the server owns all terrain, physics, and object state and distributes updates to N Quest headsets at 72Hz without perceptible lag. The design is server-authoritative from the ground up: clients never mutate state directly.

Frame-rate sync: 72Hz
Multi-Quest sync: N→N
Terrain streaming: CSV

Built on Unity 2022.3 with Netcode for GameObjects, using a strict server-authoritative model: clients send input intents, never state mutations. Terrain data streams from CSV frame buffers that are parsed server-side and distributed through Netcode's NetworkVariable system, so every connected Quest headset renders identical geometry. Because only the server writes state, the authority pattern prevents desync across N clients without expensive client-side reconciliation. The project is designed as a research-grade collaborative simulation platform, and the architecture prioritizes deterministic state replication over cosmetic polish.
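Netcode for GameObjects is a C# library, so the following is only a language-neutral Python sketch of the authority pattern described above, not the project's code. It assumes a hypothetical `move` intent carrying a displacement, a hypothetical CSV frame layout (one row of terrain heights per line), a hypothetical speed cap, and a fixed 72Hz server tick. Clients can only queue intents; the server alone mutates state and emits the snapshot that would be replicated (in Unity, through NetworkVariables).

```python
import csv
import io
from dataclasses import dataclass, field

TICK_HZ = 72
TICK_SECONDS = 1.0 / TICK_HZ  # length of one server step at 72Hz


@dataclass
class Intent:
    """What a client may send: a request, never a state change."""
    client_id: int
    action: str  # hypothetical action name, e.g. "move"
    dx: float = 0.0
    dz: float = 0.0


@dataclass
class ServerWorld:
    terrain: list[list[float]] = field(default_factory=list)
    positions: dict[int, tuple[float, float]] = field(default_factory=dict)
    pending: list[Intent] = field(default_factory=list)
    tick: int = 0

    def load_terrain_frame(self, csv_text: str) -> None:
        """Parse one CSV terrain frame server-side (assumed layout: rows of heights)."""
        rows = csv.reader(io.StringIO(csv_text))
        self.terrain = [[float(v) for v in row] for row in rows if row]

    def submit(self, intent: Intent) -> None:
        """Called when a client message arrives; it only queues the request."""
        self.pending.append(intent)

    def step(self, max_speed: float = 1.5) -> dict:
        """Apply queued intents, then return the authoritative snapshot.

        max_speed (units per second) is a hypothetical validation limit.
        """
        limit = max_speed * TICK_SECONDS
        for intent in self.pending:
            if intent.action != "move":
                continue  # unknown intents are dropped, never trusted
            x, z = self.positions.get(intent.client_id, (0.0, 0.0))
            # Clamp the requested displacement so a client cannot teleport.
            dx = max(-limit, min(limit, intent.dx))
            dz = max(-limit, min(limit, intent.dz))
            self.positions[intent.client_id] = (x + dx, z + dz)
        self.pending.clear()
        self.tick += 1
        # Every client receives this same snapshot, so all render identical state.
        return {
            "tick": self.tick,
            "terrain": [row[:] for row in self.terrain],
            "positions": dict(self.positions),
        }
```

For example, a `move` intent with `dx=10.0` is clamped to `1.5 / 72` units for that tick: a client can request a teleport but never perform one, because validation happens where the state lives.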

Unity 2022.3
Netcode for GameObjects
Meta Quest
Server-Authoritative Model
CSV Terrain Streaming
NetworkVariable Sync

GitHub · soon

2026

Full-Stack · Backend Systems · Competition Platform

Arcanum — AI-Era Puzzle Competition Platform

The goal: build a competition platform that handles high-volume concurrent submissions, prevents abuse without blocking legitimate players, and returns semantic feedback fast enough to feel instant. The platform has 10 progressive levels, took 10,000+ submissions in its first 10 hours, and drew 90+ concurrent players. The puzzle concept is the product skin; the backend systems are the actual engineering problem.

Submissions in 10h: 10k+
Active players: 90+
Progressive levels: 10

The frontend is React on Vercel, backed by a Flask API on Render and Supabase PostgreSQL. All puzzle validation happens server-side, so no answer state is reachable through client inspection. The semantic feedback engine encodes player guesses into dense embeddings and scores their cosine similarity against the solution, returning a graduated warmth signal in place of binary feedback. Harder levels intentionally degrade signal resolution. A hint system with 24-hour refresh cycles gates progression without blocking it. Rate limiting and OTP-based account verification held up under 10,000+ competitive submissions in the first session; abuse prevention was a design requirement, not an afterthought.
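The description above doesn't say how the rate limiter works. As a hedged illustration only, here is a per-account sliding-window limiter in plain Python that a Flask route could call before accepting a submission. The limits, the in-memory store, and the `SlidingWindowLimiter` name are assumptions; a deployment with several API instances would need shared storage instead of a process-local dict.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` events per `window` seconds for each key."""

    def __init__(self, limit: int = 5, window: float = 10.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so behaviour can be tested deterministically
        self._events: dict[str, deque] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        """Record and accept an event for `key`, or reject it if over the limit."""
        now = self.clock()
        events = self._events[key]
        # Drop timestamps that have slid out of the window.
        while events and now - events[0] >= self.window:
            events.popleft()
        if len(events) >= self.limit:
            return False  # rejected attempts are not recorded
        events.append(now)
        return True
```

Because rejected attempts are not recorded, a player who hits the limit regains capacity as soon as older submissions age out of the window. That throttles bursts without locking anyone out, which matches the goal of stopping abuse without blocking legitimate players.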

React
Flask
Supabase
PostgreSQL
Word Embeddings
Semantic Similarity
Rate Limiting
OTP Auth
Vercel
Render

GitHub · soon

2026

Full-Stack · Applied ML · Accessibility

Mnemo — Vocabulary Builder for Neurodivergent Learners

Vocabulary apps built around alphabetical rote memorization fail ADHD learners: flat lists drain attention and treat all words as interchangeable. The challenge was to design a vocab app where mnemonics carry the cognitive load, sourced from both AI generation and a community submission and voting layer, with per-user progress, saves, and notes that persist long enough to compound into real recall.

Mnemonic sources: AI + crowd
Progress + notes: Per-user
Community quality signal: Voting

Designed for ADHD attention patterns first: every screen privileges a single high-salience artifact (the mnemonic) over list density. Mnemonics are generated on demand by an LLM, using prompt scaffolding that emphasizes vivid imagery, and surfaced alongside community submissions. A voting layer makes the crowd the quality signal, so weak mnemonics decay without explicit moderation. Per-user progress, saves, and notes are persisted in Postgres with row-level scoping; each word becomes a long-lived object the user can return to, not a flashcard that disappears after a session.
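The description says votes act as the quality signal but not how they are aggregated. One common choice, assumed here rather than taken from the project, is to rank by the lower bound of the Wilson score interval on the upvote fraction. A mnemonic with only a handful of votes gets a conservative score, so unproven or weak entries sink below well-supported ones with no moderator involved. The sketch is Python (the app itself is TypeScript), and `rank_mnemonics` with its `{"text", "up", "down"}` records is hypothetical.

```python
import math


def wilson_lower_bound(upvotes: int, downvotes: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the upvote fraction.

    z = 1.96 corresponds to a 95% confidence level.
    """
    n = upvotes + downvotes
    if n == 0:
        return 0.0  # no evidence yet: rank at the bottom
    p = upvotes / n
    z2 = z * z
    centre = p + z2 / (2 * n)
    margin = z * math.sqrt((p * (1 - p) + z2 / (4 * n)) / n)
    return (centre - margin) / (1 + z2 / n)


def rank_mnemonics(mnemonics: list[dict]) -> list[dict]:
    """Order hypothetical {"text", "up", "down"} records best-first."""
    return sorted(
        mnemonics,
        key=lambda m: wilson_lower_bound(m["up"], m["down"]),
        reverse=True,
    )
```

Under this scheme a mnemonic with a single upvote scores about 0.21, while one with 40 upvotes and 5 downvotes scores about 0.77 and ranks first, even though its raw upvote ratio is lower.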

Next.js
React
TypeScript
Postgres
OpenAI API
Community Voting
Tailwind

GitHub · soon

2026

ML · RAG · Audio Systems

Canvas RAG — Conversational Coursework with Voice I/O

Canvas LMS scatters lectures, syllabi, readings, and assignments across modules with no unified query layer, so students re-read the same material to find one sentence. The build: a conversational RAG layer over a student's entire Canvas catalog with bidirectional voice, so reviewing a course feels like asking a TA rather than pressing Ctrl-F across PDFs.

Bidirectional I/O: Voice
Full course corpus: Canvas
Retrieval eval: RAGAS

A FastAPI backend pulls coursework from Canvas, encrypts it at rest with Fernet, and chunks documents into a ChromaDB vector store keyed per course and per user. Retrieved chunks ground a GPT OSS model for the conversational layer, keeping inference and embeddings out of third-party APIs where coursework privacy matters. Voice input is transcribed with Whisper, and responses are spoken back via TTS, so the entire loop is hands-free. Retrieval quality is evaluated continuously with RAGAS metrics (faithfulness, answer relevancy, context precision), so regressions in chunking or prompt strategy are caught before they ship.
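The chunking and per-course, per-user keying step might look like the following stdlib-only Python sketch. The chunk size, the overlap, the character-window strategy, the `collection_name` scheme, and the `Chunk` record are all assumptions for illustration; the project may chunk by tokens or document structure instead.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    id: str
    text: str
    metadata: dict


def collection_name(user_id: str, course_id: str) -> str:
    """Hypothetical per-user, per-course collection key (letters, digits, hyphens)."""
    return f"user-{user_id}-course-{course_id}"


def chunk_document(doc_id: str, text: str, size: int = 800, overlap: int = 200) -> list[Chunk]:
    """Split text into overlapping character windows.

    The overlap keeps a sentence that straddles a boundary retrievable
    from at least one chunk.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    if not text:
        return []
    step = size - overlap
    chunks = []
    # Stop once a window reaches the end, so the tail is not emitted twice.
    for i, start in enumerate(range(0, max(len(text) - overlap, 1), step)):
        chunks.append(Chunk(
            id=f"{doc_id}-{i}",
            text=text[start:start + size],
            metadata={"doc_id": doc_id, "start": start},
        ))
    return chunks
```

The resulting ids, texts, and metadata line up with the `ids`, `documents`, and `metadatas` arguments of a ChromaDB collection's `add` method. Giving each user-course pair its own collection means a query can never pull in another student's material.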

React
Python
FastAPI
ChromaDB
GPT OSS
Whisper
Fernet
RAGAS

GitHub · soon

2026

ML · NLP · Clinical Systems

Clinical Urgency Classification from EHR Notes

Emergency departments generate thousands of unstructured clinical notes daily. Triage depends on clinicians reading every note, a bottleneck that delays critical interventions. The task: classify free-text EHR notes as Stable, Deteriorating, or Critical on a dataset where Critical cases make up less than 8% of samples.

Urgency taxonomy: 3-class
Critical class ratio: <8%
Primary metric: F1

The core challenge was class imbalance: with Critical cases under 8%, accuracy alone is meaningless. The pipeline starts with text preprocessing and TF-IDF feature extraction, then adds embedding-based representations that capture semantic similarity between symptom descriptions. Classification uses Logistic Regression, chosen deliberately over deep models because interpretability matters in clinical settings and the dataset doesn't justify transformer-scale compute. Evaluation prioritizes per-class F1, precision, and recall, with particular attention to recall on the Critical class, where a false negative has a real patient cost.
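Why accuracy misleads at this class ratio is easiest to see with a toy example. The stdlib Python below computes per-class precision, recall, and F1 by hand; the labels and predictions are invented for illustration and are not results from this project. In practice, scikit-learn's `classification_report` reports the same per-class figures.

```python
from collections import Counter

CLASSES = ("Stable", "Deteriorating", "Critical")


def per_class_metrics(y_true: list[str], y_pred: list[str]) -> dict[str, dict[str, float]]:
    """Precision, recall, and F1 for each class (0.0 where undefined)."""
    pairs = Counter(zip(y_true, y_pred))
    report = {}
    for c in CLASSES:
        tp = pairs[(c, c)]
        predicted = sum(n for (_, p), n in pairs.items() if p == c)
        actual = sum(n for (t, _), n in pairs.items() if t == c)
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[c] = {"precision": precision, "recall": recall, "f1": f1}
    return report


# Toy data (illustrative only): 7 of 100 notes are Critical, and the
# model mislabels 5 of those 7 as Deteriorating.
y_true = ["Stable"] * 50 + ["Deteriorating"] * 43 + ["Critical"] * 7
y_pred = ["Stable"] * 50 + ["Deteriorating"] * 48 + ["Critical"] * 2

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
metrics = per_class_metrics(y_true, y_pred)
```

Here accuracy is 0.95 while Critical recall is only 2/7 (about 0.29): a model that misses most critical patients still looks excellent on the headline number, which is why per-class recall and F1 lead the evaluation.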

Python
scikit-learn
TF-IDF
Embeddings
Logistic Regression
Imbalanced Classification
Clinical NLP

GitHub · soon

2026

ML · Search · Vector Systems

Puzzle Feedback Engine — Vector Similarity Hint System

Traditional puzzle hint systems are binary: right or wrong. They say nothing about direction. The challenge: build a feedback engine that measures how semantically close a player's attempt is to the solution and returns graduated "warmer / colder" signals.

Query latency: <50ms
Similarity search: pgvector
Distance metric: Cosine

Player inputs are encoded into dense vector embeddings and compared against the solution embedding by cosine similarity using Postgres pgvector. The distance maps to a graduated warmth scale: a continuous proximity signal rather than binary correct/incorrect. A milestone-based hint system triggers at semantic distance thresholds. End-to-end, a query completes in under 50ms, which to the player is indistinguishable from a static lookup.
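The similarity-to-warmth mapping and the milestone thresholds can be sketched in plain Python. In the real system the similarity would come from pgvector, whose `<=>` operator returns cosine distance (1 minus cosine similarity), so `1 - (embedding <=> query)` yields the value computed here by hand. The band cut-offs, labels, and milestone hint ids below are hypothetical, not the project's values.

```python
import math

# Hypothetical warmth bands, checked from hottest to coldest.
WARMTH_BANDS = [
    (0.90, "burning"),
    (0.75, "hot"),
    (0.55, "warm"),
    (0.35, "cool"),
    (float("-inf"), "cold"),
]
# Hypothetical milestones: similarity at which each hint unlocks.
MILESTONES = {0.55: "hint-1", 0.75: "hint-2"}


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b (0.0 if either is a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def feedback(guess: list[float], solution: list[float]) -> tuple[float, str, list[str]]:
    """Return (similarity, warmth label, hints unlocked at this proximity)."""
    sim = cosine_similarity(guess, solution)
    label = next(name for floor, name in WARMTH_BANDS if sim >= floor)
    hints = [hint for threshold, hint in sorted(MILESTONES.items()) if sim >= threshold]
    return sim, label, hints
```

Keeping the bands and milestones as data rather than branching logic makes them easy to tune per puzzle without touching the scoring code.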

Postgres
pgvector
Embeddings
Cosine Similarity
Node.js
Supabase
Semantic Search

GitHub · soon
