About Me
I'm a Research Scientist on the Seed team at TikTok/ByteDance, focusing on speech synthesis. I completed my M.S. in Computer Science at Columbia University.
My research interests include deep generative modeling, self-supervised representation and transfer learning, zero-shot learning, and knowledge distillation. I'm broadly interested in unifying neural audio generation and understanding to develop general auditory intelligence across speech, vision, and text modalities.
Publications
Seed-TTS: A family of high-quality versatile speech generation models
Seed Team, ByteDance
arXiv:2406.02430, Jun. 2024 (🔊 Demos)
VoiceShop: A unified speech-to-speech framework for zero-shot voice editing
Philip Anastassiou*, Zhenyu Tang*, Kainan Peng, Dongya Jia, Jiaxin Li, Ming Tu, Yuping Wang, Yuxuan Wang, Mingbo Ma
(*equal contribution)
arXiv:2404.06674, Apr. 2024 (🔊 Demos)