I'm a data scientist currently working as an AI Engineer (contract) at Wolters Kluwer. A recent University of Chicago graduate, I'm passionate about building practical, intelligent AI systems that actually work in production. I'm especially drawn to large language models, agent-based architectures, and AI research with direct industry applications. I've worked on everything from fine-tuning and deploying LLMs at scale to designing modern retrieval pipelines and AI agents that interact with tools, code, and people. I'm particularly interested in the edge cases: where models fail, where embeddings break down, and where clever system design can make all the difference. I enjoy solving hard problems at the intersection of machine learning, data science, and software engineering.
Built an end-to-end AI portfolio chatbot from scratch using RAG and hybrid vector search, enabling interactive conversations about my experience and projects. Integrated custom embeddings and retrieval pipelines to showcase real-time, LLM-powered Q&A. Click here to test it out!
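As an illustrative sketch (not the production pipeline), hybrid vector search typically blends a dense semantic score with a sparse keyword score before ranking. The function names, toy vectors, and the `alpha` blending weight below are assumptions for demonstration only:

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def keyword_score(query, doc):
    """Fraction of query terms that appear in the document (simple sparse signal)."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def hybrid_search(query, query_vec, docs, doc_vecs, alpha=0.7):
    """Rank docs by a weighted blend of dense (cosine) and sparse (keyword) scores."""
    scored = []
    for doc, vec in zip(docs, doc_vecs):
        score = alpha * cosine(query_vec, vec) + (1 - alpha) * keyword_score(query, doc)
        scored.append((score, doc))
    return [doc for _, doc in sorted(scored, reverse=True)]
```

In practice the dense vectors would come from an embedding model and the sparse side from something like BM25; the blend keeps exact-term matches from being drowned out by purely semantic neighbors.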
With two peers, conducted a computational semantic analysis of 44,000 Reddit posts from r/liberal and r/conservative using semantic embeddings, BERTopic, and entropy metrics to measure the rise of political echo chambers from 2011–2022. Found increasing intra-group semantic convergence and decreasing topic diversity in both subreddits, though to differing degrees, revealing asymmetric patterns of ideological insularity over time. Click here to read the project paper.
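One common way to quantify topic diversity, and a reasonable stand-in for the entropy metrics mentioned above (this is a generic sketch, not the paper's exact implementation), is the Shannon entropy of the topic distribution: fewer, more dominant topics mean lower entropy:

```python
import math
from collections import Counter

def topic_entropy(topic_labels):
    """Shannon entropy (in bits) of a topic-label distribution.

    Lower entropy = conversation concentrated in fewer topics
    (one signal of an emerging echo chamber).
    """
    counts = Counter(topic_labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

For example, a subreddit whose posts split evenly across four topics scores 2.0 bits, while one fixated on a single topic scores 0.0; tracking this per year shows whether diversity is shrinking.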
Conducted a comparative evaluation of instruction-tuned LLMs with and without chain-of-thought reasoning (DeepSeek-R1 vs. Qwen), using self-consistency metrics, semantic embedding similarity, BLEU, and ROUGE-L scores. Demonstrated that CoT fine-tuning improves functional reasoning accuracy while standard instruction-tuning favors linguistic fluency, highlighting distinct performance trade-offs at the system level. The project paper was ranked #1 by the Head of the Cognitive Science Department among 70+ student submissions. Click here to read the project paper.
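Of the metrics listed above, ROUGE-L is the one most often reimplemented by hand: it scores a candidate against a reference by their longest common subsequence of tokens. A minimal sketch (whitespace tokenization assumed; real evaluations typically use a library tokenizer):

```python
def lcs_len(a, b):
    """Length of the longest common subsequence of two token lists (classic DP)."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l_f1(candidate, reference):
    """ROUGE-L F1: harmonic mean of LCS-based precision and recall."""
    c, r = candidate.split(), reference.split()
    lcs = lcs_len(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)
```

Because it rewards word order as well as overlap, ROUGE-L tends to favor fluent, reference-like outputs, which is exactly why it can diverge from the functional-accuracy metrics when comparing CoT and standard instruction-tuned models.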
Developed a graph-based document retrieval system at Argonne National Lab using open-source LLMs to navigate government policy documents under strict data privacy constraints. Led the design and implementation of query augmentation, a hybrid retrieval pipeline (semantic + keyword), an agentic interface, and an LLM-assisted document reranking and evaluation system. The project won the "Best in Clinic" award from the UChicago Data Science Institute out of 20 project submissions in 2024. Click here to watch the technical presentation from Phase 1 of the project (June 2024).
Built a fine-grained dog breed classifier using DenseNet121 and transfer learning on the Stanford Dogs Dataset (20,580 images across 120 breeds). Adapted model architecture with custom Dense layers and dropout, achieving 82% cross-validation accuracy and 97.2% top-5 accuracy on unseen user-submitted dog images. Leveraged data augmentation, GlobalAveragePooling2D, and confusion matrix analysis to evaluate generalization performance across similar breed classes. Click here to read the project paper.
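The top-5 metric reported above is simple to compute from model outputs. A small, framework-agnostic sketch (toy probabilities; in the actual project these would come from the DenseNet121 model's softmax layer):

```python
def top_k_accuracy(probs, labels, k=5):
    """Fraction of examples whose true label is among the k highest-scoring classes.

    probs:  list of per-example class-probability lists
    labels: list of true class indices
    """
    hits = 0
    for p, y in zip(probs, labels):
        topk = sorted(range(len(p)), key=lambda i: p[i], reverse=True)[:k]
        hits += y in topk
    return hits / len(labels)
```

Top-5 accuracy is the natural headline metric for fine-grained classification: visually near-identical breeds (e.g. Siberian Husky vs. Alaskan Malamute) often trade the top spot, so "correct within the top five" better reflects usable model quality than strict top-1.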
Wolters Kluwer | UpToDate
Wolters Kluwer | UpToDate
Consultant.AI
Insait.IO
Argonne National Laboratory
Argonne National Laboratory
University of Chicago Department of Computer Science
University of Chicago Department of Computer Science
University of Chicago