Upload a short reference voice clip, provide its transcript, and generate speech in the same voice. Providing the reference text is recommended; if you leave it blank, the clip is transcribed with automatic speech recognition (ASR), which may be slower on first use.
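
The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: `resolve_reference_text`, `transcribe`, and `fake_asr` are hypothetical names introduced here, and the ASR step is represented by a stand-in function.

```python
from typing import Callable, Optional


def resolve_reference_text(
    ref_text: Optional[str],
    audio_path: str,
    transcribe: Callable[[str], str],
) -> str:
    """Return the transcript to pair with the reference clip.

    Uses the user-supplied text when it is non-blank; otherwise falls back
    to the (hypothetical) `transcribe` callable, which stands in for ASR.
    """
    if ref_text is not None and ref_text.strip():
        return ref_text.strip()
    # Blank or missing text: transcribe the clip instead.
    return transcribe(audio_path)


def fake_asr(path: str) -> str:
    """Stand-in ASR for demonstration only; a real one would run a speech model."""
    return f"transcript of {path}"
```

Passing the transcript explicitly skips the `transcribe` call entirely, which is why supplying the reference text is the faster path in this sketch.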