Hey there!
I'm a PhD student at UT Austin, advised by Amy Zhang, and an NSF CSGrad4US Fellow. During my PhD, I've spent a lot of time at Meta Fundamental AI Research (FAIR) working on multimodal generative models.

My research focuses on unified models that jointly generate text and images, along with designing effective rewards and datasets for multimodal training.
Previously, I was an engineer at Meta working on various machine learning projects, including Meta's first generative AI product: AI-generated stickers.


News & Updates:

New Paper: Unified Text-Image Generation with Weakness-Targeted Post-Training

We post-train multimodal models to unify text and image generation in a single inference call, enabling the model to automatically transition between reasoning about an image and generating it. Our weakness-targeted synthetic dataset and reward function analysis lead to significant text-to-image performance improvements over the base model.

New Paper: Multi-Modal Language Models as Text-to-Image Model Evaluators

We present a text-to-image (T2I) model evaluation method leveraging vision-language models as evaluator agents that generate image prompts and judge the generated images. Our method's T2I model rankings match those of existing benchmarks while using 80x fewer prompts, and it achieves higher correlations with human judgments.

Started PhD at University of Texas at Austin

Advised by Amy Zhang.

Joined the Generative AI Team at Meta

Joined the team that founded generative AI efforts at Meta. Working on text-to-image diffusion models and their product applications. The New York Times wrote about our team here.