Research Intern - Audio-Visual VoiceAI (Open Source)
| Hours | Full-time, Part-time |
|---|---|
| Location | Sunnyvale, California |
Job Description
We’re looking for a Research Intern to join WhissleAI and help advance our open-source work at the intersection of speech, vision, and structured understanding, inspired by projects like:
- https://music.whissle.ai/
- advanced speech recognition: asr.whissle.ai
- recent multi-modal alignment research (example: https://aclanthology.org/2025.emnlp-main.845.pdf)
You’ll develop audio-visual foundation models that connect voice, context, and environment, enabling systems that can listen, see, and act coherently in real time. Most of this work is open-source and contributes directly to the broader research community.
Ideal candidate
- Undergrad, Master’s, or PhD student in CS, AI, or a related field
- Prior research experience (conference/workshop publications a plus)
- Strong background in one or more of: multimodal learning, audio-visual representation learning, speech modeling, or self-supervised methods
- Experience with PyTorch, Hugging Face, or similar frameworks
What you’ll do
- Prototype and evaluate audio-visual alignment models
- Extend our open-source ASR and meta-speech pipelines
- Collaborate on papers, demos, and real-time VoiceAI applications
Location: Remote
Type: Paid internship / research collaboration
Posting ID: 1185094136 Posted: 2025-11-19 Job Title: Research Intern Audio