The former Goldman Sachs VP talks about building ethical voice AI, why the free plan only gives you 10 minutes, and what it takes to make synthetic speech that doesn’t creep people out.
By Frankie | March 2026 | 10 min read
This interview has been edited for clarity and length.
Ankur Edkie spent eight years at Goldman Sachs building AI and blockchain products, rising to Vice President before doing what every finance person secretly fantasizes about: quitting to start something of his own. In October 2020, he co-founded Murf AI with Sneha Roy and Divyanshu Pandey — a text-to-speech platform that now serves 6 million users across 195+ countries and counts 300+ Fortune 2000 companies as clients.
Murf’s pitch: professional-quality AI voiceovers without hiring voice actors. Their latest Falcon model promises 55-millisecond latency. I’ve been testing it, comparing voices, and digging into the pricing. Time to ask some uncomfortable questions.
Murf AI homepage: Ultra-realistic AI voice generator built for maximum speed and efficiency
Frankie: Ankur, you were a VP at Goldman Sachs. That’s not exactly a dead-end job. What possessed you to leave and start an AI voice company?
Ankur: I spent eight years there leading AI and blockchain technology products. It was an incredible experience, but I kept seeing the same problem: enterprises spending absurd amounts on voice content. Training videos, marketing materials, compliance narrations — every time they needed audio, it was a multi-week production involving voice actors, studios, and post-production. I thought: the technology exists to automate this. Why isn’t anyone doing it well? I started Murf with Sneha and Divyanshu because we believed synthetic speech could be both high-quality AND ethically produced. That “AND” is crucial.
Frankie: Let’s dig into the ethics thing because voice AI has a massive trust problem. Deepfakes, voice cloning scams, celebrities finding AI copies of their voices online. How does Murf handle this?
Ankur: This is foundational for us, not an afterthought. We follow three commitments: consent-first creation, where every voice in our library is from an artist who actively chose to participate. Full artist control — they can withdraw their voice at any time. And royalty sharing that grows with usage of their voice avatar. AI should augment human creativity, not replicate it without permission. We rejected the approach of scraping voices from the internet, which some competitors unfortunately did.
Frankie: “Royalty sharing that grows with usage” — that’s a bold claim. Can you give me numbers?
Ankur: I can’t share exact percentages because they vary by contract, but the model is designed so that as a voice becomes more popular on our platform, the artist earns more. It aligns incentives — the artist wants their voice to sound great and be widely used, and we want the same thing. It’s a partnership, not a transaction.
Frankie: I have to talk about the free plan. Ten minutes total — not per month, total — and no downloads. Ankur, that barely covers testing one voice style. How is anyone supposed to evaluate your platform with that?
Ankur: I hear you, and I know this is a common complaint. The free plan is designed as a taste, not a full evaluation. Here’s our reasoning: voice generation has real compute costs. Every minute of audio costs us money to generate. Unlike a SaaS tool where marginal cost per user is near zero, each voice generation request hits our GPU infrastructure. That said, we’re looking at expanding the free tier because I agree — 10 minutes isn’t enough to make an informed purchase decision. It’s something we’re actively working on.
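Ankur’s point about nonzero marginal cost is easy to sanity-check with back-of-envelope arithmetic. The GPU rate, real-time factor, and user counts below are illustrative assumptions, not Murf’s actual figures:

```python
# Back-of-envelope marginal cost of TTS generation.
# All figures are illustrative assumptions, not Murf's real costs.
GPU_COST_PER_HOUR = 2.00   # assumed cloud GPU rate, USD/hour
RTF = 0.05                 # assumed real-time factor: 1 min of audio in 3 s of GPU time

def cost_per_audio_minute(gpu_cost_per_hour: float, rtf: float) -> float:
    """GPU cost to synthesize one minute of audio."""
    gpu_minutes = rtf * 1.0                  # GPU-minutes spent per audio-minute
    return gpu_cost_per_hour / 60 * gpu_minutes

per_minute = cost_per_audio_minute(GPU_COST_PER_HOUR, RTF)
# 10 free minutes per user, across 6 million users, is real money:
free_tier_exposure = per_minute * 10 * 6_000_000

print(f"~${per_minute:.4f} per generated minute")
print(f"~${free_tier_exposure:,.0f} if every user burned the full free tier")
```

Even at fractions of a cent per minute, the free tier multiplied across millions of users adds up, which is the asymmetry Ankur is pointing at versus near-zero-marginal-cost SaaS.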
Frankie: Let’s talk about pricing surprises. I saw a marketing agency report spending $3,400/month on Murf after starting with a $79 Business plan. Hidden API costs, voice cloning fees, storage overages. Is “price creep” a known issue?
Ankur: I wouldn’t call it hidden — all our costs are documented. But I will admit our pricing page could do a better job of showing total cost of ownership for different use cases. API access, voice cloning, and storage are separate line items, and for an agency producing hundreds of voiceovers a month, those add up. We’re working on bundled pricing for high-volume users so there are fewer surprises. Transparency is something we take seriously, and if people are feeling surprised by their bills, that’s a problem we need to fix.
Murf AI pricing: From Free tier to Business at $66/month with voice cloning
Frankie: I tested your premium voices vs. basic voices. The quality gap is enormous. Premium voices sound almost human. Basic voices sound like they’re reading a hostage note. Why such a big difference?
Ankur: [Laughs] The “hostage note” comparison is new, but I get what you mean. Premium voices use our latest neural models with more training data and fine-tuning. Basic voices use older models that we keep for backward compatibility and lower-cost use cases. We’re gradually upgrading all voices to the new quality standard, but it takes time — each voice model requires extensive training and quality validation. Our Falcon model, launched in November 2025, represents where all our voices are heading: 55-millisecond latency, natural prosody, emotional range.
Frankie: Speaking of emotional range — that’s still the weak spot, right? Users say they can’t convey sarcasm, urgency, or subtle emotional shifts. How do you make AI sound genuinely expressive?
Ankur: You’re touching on one of the hardest problems in speech synthesis. Emotion in voice isn’t just about pitch — it’s rhythm, emphasis, breathing patterns, micro-pauses. Our current system lets you adjust speed, pitch, and emphasis markers, but I’ll be honest: we’re not at the level where you can say “read this sarcastically” and get a believable result. That’s a frontier for the entire industry, not just Murf. We’re making progress with our latest models, but true emotional AI speech is probably 2-3 years away from being production-ready.
Frankie: 200+ voices across 30+ languages. But users in non-English markets tell me the quality drops off a cliff. French, Japanese, Hindi — they sound noticeably worse than English. True?
Ankur: Partially true, and it’s something we’re actively addressing. English has the most training data available, so English voices are naturally more refined. But we’ve invested heavily in linguistic modeling for other languages — we’ve achieved 99.38% pronunciation accuracy across our supported languages, which is industry-leading. The gap is narrowing. We engage local talent in every market for development and QA, because a French voice needs to be validated by French speakers, not just language models.
Frankie: What’s the most unexpected use case you’ve seen?
Ankur: A museum in Europe used Murf to create audio guides in 12 languages overnight. They used to spend $50,000 and six months producing multilingual audio tours. With Murf, they did it in a week for a fraction of the cost. The quality was good enough that visitors couldn’t tell the difference. That’s the power of this technology — it doesn’t just save money, it makes things possible that were previously impractical.
Frankie: ElevenLabs is the elephant in the room. They’re considered the gold standard for voice AI. How do you compete?
Ankur: ElevenLabs is excellent at what they do, and I respect their technology. But we play a different game. ElevenLabs focuses on creative and entertainment use cases. Murf is built for enterprise — learning and development, marketing, compliance, internal communications. We offer team collaboration, brand voice consistency, Canva and Google Slides integrations, and enterprise security. Our customers aren’t podcasters — they’re Fortune 2000 companies that need 500 voiceovers in 15 languages with brand-consistent quality. That’s a fundamentally different product.
Frankie: Your Falcon model — 55ms latency, 130ms time-to-first-audio across 33 locations. Those are real-time voice agent numbers. Is Murf moving into the conversational AI space?
Ankur: Absolutely. Falcon is our bridge from content creation to real-time voice applications. We’re seeing massive demand from companies building voice agents, IVR systems, and customer service bots. They need high-quality, low-latency voice synthesis that sounds natural at conversational speed. Falcon was built specifically for that. It’s a huge market expansion for us — from recorded content to live conversation.
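The “time-to-first-audio” figure Frankie cites is straightforward to measure client-side: start a timer when the request goes out, stop it when the first audio chunk arrives. A minimal sketch against any streaming byte source — the `fake_stream` generator here is a stand-in with an assumed 130 ms first-chunk delay, not Murf’s actual API:

```python
import time
from typing import Iterable, Iterator

def time_to_first_audio(chunks: Iterable[bytes]) -> tuple[float, bytes]:
    """Return (seconds until the first non-empty chunk, that chunk)."""
    start = time.perf_counter()
    for chunk in chunks:
        if chunk:  # skip keep-alive / empty chunks
            return time.perf_counter() - start, chunk
    raise RuntimeError("stream ended with no audio")

def fake_stream(ttfa_ms: float = 130.0, n_chunks: int = 5) -> Iterator[bytes]:
    """Stand-in for a streaming TTS response with a given first-chunk delay."""
    time.sleep(ttfa_ms / 1000)
    for _ in range(n_chunks):
        yield b"\x00" * 320  # 10 ms of 16 kHz, 16-bit mono silence

ttfa, first = time_to_first_audio(fake_stream())
print(f"time-to-first-audio: {ttfa * 1000:.0f} ms, first chunk: {len(first)} bytes")
```

For a real voice agent, this is the number that matters more than total synthesis time: the caller hears speech as soon as the first chunk lands, while the rest streams in behind it.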
Frankie: Last one. IIT Kharagpur, Goldman Sachs, now AI startup CEO. What surprised you most about the transition from big corporate to startup life?
Ankur: How much faster you can move, and how much that matters. At Goldman, getting a product change approved took weeks of meetings and stakeholder alignment. At Murf, if a customer tells me something’s broken at 9am, we can ship a fix by lunch. That speed is addictive. But the flip side is there’s no safety net. At Goldman, if a project failed, you moved to the next one. At a startup, every failure costs you runway. The stakes are personal in a way that corporate never was. I wouldn’t trade it for anything.
Frankie’s Take
Ankur Edkie is a sharp operator. The Goldman Sachs polish shows — he’s articulate, measured, and knows exactly when to acknowledge a weakness versus defend a decision. His answers on ethical voice cloning were the most substantive I’ve heard from any AI voice company. Consent-first creation, artist withdrawal rights, and usage-based royalties aren’t just talking points — they’re competitive differentiators in an industry with serious trust issues.
But let’s be real: the pricing needs work. A 10-minute free plan is practically useless for evaluation. The gap between basic and premium voice quality is too large. And the “surprise bill” problem for high-volume users is real. Murf’s technology is legitimately impressive — the Falcon model’s latency numbers are outstanding — but the packaging and pricing need to match the product quality.
The enterprise positioning is smart. While ElevenLabs chases creators and podcasters, Murf is quietly signing Fortune 2000 companies that need 500 voiceovers in 15 languages. That’s a higher-value, stickier customer base. If they can fix the pricing transparency and expand the free tier, they’ll be very hard to dislodge from their enterprise niche.