ALTO
Vision & Roadmap
Voice-first AI driving agent. The complete product vision — from the problem we're solving to the 7-day MVP sprint and beyond.
01 — Problem
Driving time is dead time
The average person spends 1-2 hours driving every day. At 1.5 hours a day, that's roughly 550 hours per year (about 23 full days) sitting in a box doing nothing. No product solves this without putting a screen in your face.
| Product | Problem |
| Siri / Google Assistant | Reactive. No context, no memory, no follow-through. |
| CarPlay / Android Auto | Still a screen. Still tapping. IS the problem. |
| Podcasts / Music | Entertainment, not productivity. Nothing gets done. |
02 — Non-Negotiable
Safety is the foundation, not a feature
Everything we build sits on one absolute rule: the driver is never endangered. This is not a phone app you use while driving — it's a voice agent that replaces the need to touch your phone at all.
Zero Visual Demand
Eyes never leave the road
No screen to glance at. No notifications to read. No UI to navigate. The phone stays in your pocket or on the seat. All information is delivered by voice. All actions are confirmed by voice.
Zero Manual Input
Hands never leave the wheel
No tapping. No swiping. No on-screen buttons. No wake word to remember: the AI listens continuously in driving mode or activates via steering-wheel controls. 100% hands-free, always.
Cognitive Load Management
The AI adapts to the drive
Complex traffic? AI goes silent. Highway cruising? AI briefs you. Detects stress in your voice? Reduces information density. The driver's cognitive safety always comes first — productivity second.
Confirmation Before Action
Nothing happens without your voice
The AI suggests. You confirm. "Send this reply?" → "Yes." No auto-sending without approval. No irreversible actions without explicit voice consent. You are always in control.
This is what separates us from every other productivity tool. We don't put a screen in your car — we remove the need for one. The phone becomes invisible. That's not a limitation, it's the entire point. It's why Apple would want us in the App Store, why insurers would endorse us, and why regulators can't touch us.
03 — Product
Proactive AI co-driver
Not an assistant you command. A co-driver that runs your life. It knows your emails, messages, calendar, and tasks, and starts working through them the second you start driving.
"Morning. 47 min to office. Three things — Mike replied to your proposal, he's in. Want me to confirm and loop in legal? Your 10am moved to 11, I blocked your calendar. Your mom texted about Sunday dinner."
"Yes to Mike. Tell my mom I'll be there at 6."
→ Email sent to Mike with legal CC'd. WhatsApp to Mom: "Bin um 6 da!"
"What's on my to-do list?"
"Three items. Contract review due today — I can read the key changes. Investor deck needs approval — send to Lisa? And you wanted to book a restaurant for Friday."
"Read the contract. Send the deck to Lisa. Book Rocca, Friday 8pm, two people."
→ Reading contract... Deck forwarded. Restaurant booked.
Proactive intelligence
Starts briefing when you start driving. Prioritizes by urgency. Adapts to drive length. Fills silence with value, yields when you talk.
Real execution
Doesn't just inform — it acts. Sends replies, moves meetings, creates tasks, books restaurants. You confirm by voice.
Adaptive voice
Work context: Jarvis — calm, efficient, zero fluff. Personal context: warmer, knows your people by name, casual.
Safe by design
Voice-only. You approve actions by voice. Hands on wheel, eyes on road. Always. The phone is the speaker, not the interface.
04 — Why This Wins
Feature comparison
| Feature | Siri | CarPlay | Alto |
| Proactive briefings | No | No | Auto-starts |
| Multi-step actions | No | No | Unlimited chains |
| Cross-app context | No | No | Email+Cal+Msg |
| Conversation memory | No | No | Full drive |
| Voice reply to WhatsApp | Buggy | Requires taps | Fully hands-free |
| Calendar management | Basic | Read only | Read+modify+create |
| Zero screen interaction | No | No | 100% voice |
05 — Business Model
How it makes money
Free Tier
$0
5 drives/month. Messaging triage only. No calendar, no tasks, no actions. Enough to feel the magic.
Pro
$13.99/mo
Unlimited drives. All integrations. Proactive briefings. Full agent power. The tier we expect 90% of paying users to land on. Priced to cover the Unipile bridge and API costs with margin.
Enterprise
Custom
Team accounts. CRM integration. Custom workflows. Admin dashboard. Field sales teams — massive ROI.
06 — MVP Sprint
7 days. 1 agent. Ship it.
A focused sprint to build the core voice agent — from mic input to executed actions. Every day has one clear deliverable.
07 — Day 1
Voice Pipeline
Microphone to speaker. The foundation of everything. Get audio flowing before anything else matters.
| Stage | Component | Detail |
| Capture | Mic | AVAudioEngine |
| Transcribe | Whisper | On-device STT |
| Think | GPT-4o-mini | Reasoning |
| Synthesize | ElevenLabs | Streaming TTS |
| Output | Speaker | Bluetooth/Car |
- AVAudioEngine capture with noise cancellation
- Whisper STT — on-device, <500ms latency
- Audio streaming pipeline (capture → buffer → transmit)
- ElevenLabs TTS integration — streaming, not batch
- Bluetooth HFP audio routing
- End-to-end voice loop test (speak → hear response)
This is the hardest day. If the voice loop feels natural and fast, everything else is UI on top of it. If it's laggy, nothing else matters.
08 — Day 2
LLM Agent Core
The brain. Give the LLM tools, context, and the ability to chain actions.
- System prompt with Alto personality + driving context
- Tool-calling framework (function definitions → execution)
- Conversation memory (full drive session context)
- Action confirmation flow ("Send this?" → "Yes" → execute)
- Error handling + graceful fallbacks ("I couldn't reach Gmail, want me to try again?")
- Response streaming to TTS (don't wait for full response)
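One way to stream a response to TTS without waiting for the full LLM output is to cut the token stream into sentences and hand each sentence to the synthesizer as soon as it is complete. The sketch below is a minimal illustration under that assumption; `sentences_from_tokens` is a hypothetical helper, not part of any SDK, and the boundary rule (end punctuation followed by whitespace) is deliberately naive, so abbreviations like "Dr. Smith" would be split.

```python
import re
from typing import Iterable, Iterator

# Naive sentence boundary: ., ! or ? followed by whitespace (an assumption).
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def sentences_from_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a token stream."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _BOUNDARY.split(buffer)
        # Every part except the last one is a finished sentence.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        # The last part may still be growing, so keep it buffered.
        buffer = parts[-1]
    # Flush whatever is left when the stream ends.
    if buffer.strip():
        yield buffer.strip()


if __name__ == "__main__":
    stream = ["Three ", "items. ", "Contract review ", "due today. Send", " the deck?"]
    for sentence in sentences_from_tokens(stream):
        print(sentence)  # each sentence would be passed to the TTS client here
```

Sending sentence-sized chunks keeps the first audio early while still giving the voice model enough text for natural pacing.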
Framework
Tool Architecture
Each tool is a self-contained function: name, description, parameters, execute. The LLM decides which tools to call and in what order. Tools can chain — "read email, draft reply, send" is one agent turn.
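The tool shape described above (name, description, parameters, execute) could be sketched as follows. Everything here is illustrative: `Tool`, `ToolRegistry`, and the `needs_confirmation` flag are hypothetical names, and the flag is an assumption about how the voice-confirmation rule might be enforced at the tool layer.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Tool:
    name: str
    description: str
    parameters: dict[str, str]        # parameter name -> human-readable type
    execute: Callable[..., Any]
    needs_confirmation: bool = True   # assumption: side effects need a spoken "yes"


class ToolRegistry:
    def __init__(self) -> None:
        self._tools: dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        self._tools[tool.name] = tool

    def call(self, name: str, args: dict[str, Any], confirmed: bool = False) -> dict[str, Any]:
        tool = self._tools.get(name)
        if tool is None:
            return {"status": "error", "message": f"unknown tool: {name}"}
        missing = [p for p in tool.parameters if p not in args]
        if missing:
            return {"status": "error", "message": f"missing parameters: {missing}"}
        # Nothing with side effects runs until the driver has said yes.
        if tool.needs_confirmation and not confirmed:
            return {"status": "needs_confirmation", "prompt": f"{tool.description}?"}
        return {"status": "ok", "result": tool.execute(**args)}


if __name__ == "__main__":
    outbox: list[tuple[str, str]] = []
    registry = ToolRegistry()
    registry.register(Tool(
        name="send_email",
        description="Send this reply",
        parameters={"to": "string", "body": "string"},
        execute=lambda to, body: outbox.append((to, body)) or "sent",
    ))
    print(registry.call("send_email", {"to": "mike", "body": "Sounds good"}))
    print(registry.call("send_email", {"to": "mike", "body": "Sounds good"}, confirmed=True))
```

Keeping the confirmation check inside the registry rather than in the prompt means a hallucinated tool call still cannot execute on its own.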
Memory
Context Window
Every drive session maintains full conversation history. The agent knows what you asked 20 minutes ago. Cross-drive memory is stored server-side, so the agent also knows your preferences, contacts, and patterns.
09 — Day 3
WhatsApp via Unipile
The first real integration. Read messages, reply by voice, handle group chats. The demo moment.
- Unipile API authentication + webhook setup
- Fetch unread conversations (last N hours)
- Read message content to user via TTS
- Voice-to-reply: transcribe → compose → confirm → send
- Group chat handling (identify sender, summarize thread)
- Contact name resolution (match phone → name)
- Message queue for offline/delayed sends
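The offline/delayed send queue from the list above might look like the sketch below. It does not call Unipile: the `send` callable is a stand-in for whatever client function actually delivers a message, and `SendQueue`, `OutgoingMessage`, and the retry limit are hypothetical names and values chosen for illustration.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable


@dataclass
class OutgoingMessage:
    chat_id: str
    text: str
    attempts: int = 0


class SendQueue:
    """FIFO queue that keeps messages until a send succeeds or retries run out."""

    def __init__(self, send: Callable[[OutgoingMessage], bool], max_attempts: int = 3) -> None:
        self._send = send                      # returns True when delivery succeeded
        self._max_attempts = max_attempts
        self._pending: deque[OutgoingMessage] = deque()
        self.failed: list[OutgoingMessage] = []

    def enqueue(self, chat_id: str, text: str) -> None:
        self._pending.append(OutgoingMessage(chat_id, text))

    def flush(self) -> int:
        """Try each pending message once; return how many were delivered."""
        delivered = 0
        for _ in range(len(self._pending)):
            message = self._pending.popleft()
            message.attempts += 1
            if self._send(message):
                delivered += 1
            elif message.attempts >= self._max_attempts:
                self.failed.append(message)    # surface to the driver by voice
            else:
                self._pending.append(message)  # keep for the next flush
        return delivered

    def __len__(self) -> int:
        return len(self._pending)
```

Calling `flush()` whenever connectivity returns (for example after a tunnel) drains the queue in order; anything in `failed` is a candidate for a spoken "I couldn't send that, want me to try again?".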
"Any new WhatsApp messages?"
"Three conversations. Sarah sent 2 messages about the weekend trip — she's asking about hotels. Mike shared a link in the dev group. And your mom asked if you're coming for dinner Sunday."
"Reply to Sarah — tell her the Marriott downtown, I'll book it. Tell mom yes, around 6."
→ WhatsApp to Sarah: "Let's do the Marriott downtown — I'll book it." WhatsApp to Mom: "Ja, bin um 6 da!"
"Both sent. Want me to book the Marriott for the weekend?"
"Yes, Friday to Sunday, two people."
→ Searching hotels... Marriott Downtown available. Booking for 2 guests, Fri–Sun.
10 — Day 4
Gmail via Google API
Triage your inbox by voice. Read, reply, compose, archive — all hands-free.
- Google OAuth2 flow + token management
- Fetch inbox (unread, priority, categories)
- Email summarization (subject + sender + key content in 1 sentence)
- Voice reply: dictate → format → confirm → send
- Compose new email by voice
- Smart archive ("Mark the newsletters as read")
- Thread context (read full thread before replying)
| Action | Voice Command | API Call |
| Read inbox | "What emails do I have?" | messages.list + messages.get |
| Summarize | "Give me the highlights" | Batch get + LLM summarize |
| Reply | "Reply to Mike — sounds good, let's do Thursday" | messages.send (threadId) |
| Compose | "Email Lisa about the Q2 report" | messages.send (new) |
| Archive | "Archive everything from LinkedIn" | messages.batchModify |
| Search | "Find the contract from last week" | messages.list (q param) |
| Label | "Star the email from the investor" | messages.modify (labelIds) |
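As an illustration of the Reply row, the Gmail API's `messages.send` accepts a message resource whose `raw` field is a base64url-encoded RFC 2822 message and whose `threadId` ties it to an existing thread. The sketch below only builds that request body with the standard library; `build_reply_request` is a hypothetical helper, the addresses and IDs are made up, and the authenticated HTTP call itself is left out.

```python
import base64
from email.message import EmailMessage


def build_reply_request(to: str, subject: str, body: str,
                        thread_id: str, in_reply_to: str) -> dict[str, str]:
    """Build the JSON body for a threaded reply via messages.send."""
    message = EmailMessage()
    message["To"] = to
    # Replies keep the original subject, prefixed once with "Re:".
    message["Subject"] = subject if subject.lower().startswith("re:") else f"Re: {subject}"
    # These headers point at the Message-ID of the email being answered.
    message["In-Reply-To"] = in_reply_to
    message["References"] = in_reply_to
    message.set_content(body)
    raw = base64.urlsafe_b64encode(message.as_bytes()).decode("ascii")
    return {"raw": raw, "threadId": thread_id}


if __name__ == "__main__":
    request_body = build_reply_request(
        to="mike@example.com",
        subject="Proposal",
        body="Sounds good, let's do Thursday.",
        thread_id="thread-123",                  # hypothetical thread ID
        in_reply_to="<abc@mail.example.com>",    # hypothetical Message-ID
    )
    print(request_body["threadId"], len(request_body["raw"]))
```

In the confirmation flow, the agent would read the dictated body back to the driver before this request is ever sent.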
11 — Day 5
Calendar + Morning Briefing
Google Calendar integration plus the killer feature: proactive morning briefings when you start driving.
- Google Calendar API — read events for today/week
- Create events by voice ("Schedule lunch with Mike tomorrow at noon")
- Modify events ("Move my 2pm to 3pm")
- Conflict detection ("You already have something at 3 — want me to move it?")
- Morning briefing engine — aggregates calendar + email + messages
- Briefing priority algorithm (urgent first, then time-sensitive, then FYI)
- Drive-duration awareness ("You have 25 minutes — here's what matters")
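The priority order (urgent, then time-sensitive, then FYI) and the drive-duration cap can be combined in a few lines. This is a sketch under assumptions: `BriefingItem`, `plan_briefing`, the per-item speaking estimates, and the idea that a briefing may use 20% of the drive are all hypothetical choices, not figures from the plan.

```python
from dataclasses import dataclass

# Lower number = spoken earlier.
TIER_ORDER = {"urgent": 0, "time_sensitive": 1, "fyi": 2}


@dataclass(frozen=True)
class BriefingItem:
    summary: str
    tier: str            # "urgent", "time_sensitive" or "fyi"
    speak_seconds: int   # estimated time to read the item aloud


def plan_briefing(items: list[BriefingItem], drive_minutes: int,
                  briefing_share: float = 0.2) -> list[BriefingItem]:
    """Order items by tier and keep those that fit the speaking budget."""
    budget = drive_minutes * 60 * briefing_share   # assumed share of the drive
    ordered = sorted(items, key=lambda item: TIER_ORDER[item.tier])  # stable sort
    plan: list[BriefingItem] = []
    used = 0
    for item in ordered:
        if used + item.speak_seconds <= budget:
            plan.append(item)
            used += item.speak_seconds
    return plan


if __name__ == "__main__":
    inbox = [
        BriefingItem("Newsletter digest", "fyi", 20),
        BriefingItem("Contract review due today", "urgent", 40),
        BriefingItem("Your 10am moved to 11", "time_sensitive", 15),
        BriefingItem("Mom texted about Sunday dinner", "fyi", 10),
    ]
    for item in plan_briefing(inbox, drive_minutes=25):
        print(item.tier, "-", item.summary)
```

Items that do not fit are skipped rather than truncated, so a short drive hears only what matters and the rest can wait for the next one.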
The morning briefing is the moment the product becomes a habit. It's not "open an app" — it's "start driving and everything you need to know is spoken to you." That's the behavior change.
12 — Day 6
Detection, Interface, Onboarding
Driving detection to auto-start. Minimal UI. First-time setup flow.
Auto-Start
Driving Detection
Bluetooth connection + GPS speed + CoreMotion accelerometer. When 2 of 3 signals confirm driving, Alto activates. No button press. No "Hey Siri." You start driving, Alto starts working.
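The 2-of-3 rule is a simple majority vote over three boolean signals. In the sketch below the field names and the 15 km/h speed cutoff are assumptions made for illustration; on iOS the raw inputs would come from the platform APIs rather than from a plain data class.

```python
from dataclasses import dataclass

SPEED_THRESHOLD_KMH = 15.0   # assumed cutoff, not a figure from the plan


@dataclass(frozen=True)
class DrivingSignals:
    car_bluetooth_connected: bool   # phone is paired with the car's audio
    gps_speed_kmh: float            # latest GPS speed reading
    motion_says_automotive: bool    # motion classifier reports a vehicle


def is_driving(signals: DrivingSignals, required_votes: int = 2) -> bool:
    """Return True when at least `required_votes` of the three signals agree."""
    votes = [
        signals.car_bluetooth_connected,
        signals.gps_speed_kmh >= SPEED_THRESHOLD_KMH,
        signals.motion_says_automotive,
    ]
    return sum(votes) >= required_votes


if __name__ == "__main__":
    # Bluetooth connected and moving fast, motion sensor not yet sure: still driving.
    print(is_driving(DrivingSignals(True, 42.0, False)))
    # Only Bluetooth connected (sitting in the driveway): not driving.
    print(is_driving(DrivingSignals(True, 0.0, False)))
```

Requiring two signals avoids false starts from any single noisy source, such as a passenger's phone pairing with the car or a fast train ride without Bluetooth.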
The Screen
Minimal UI
One screen. A pulsing circle — idle, listening, thinking, speaking. That's it. No text to read. No buttons to tap. The phone sits face-down or in your pocket. The UI exists for parked mode only.
First Run
Onboarding
Connect Google account. Connect WhatsApp via Unipile. Set your name. Pick voice preference. Done in 90 seconds. First drive auto-starts a mini demo briefing to show what Alto can do.
- Bluetooth HFP / CarPlay detection
- CoreMotion + GPS speed-based driving detection
- State machine: idle → driving → parked
- Minimal UI — single screen with pulse animation
- Onboarding flow — OAuth + Unipile + preferences
- First-drive demo briefing
- Background audio session management
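The idle → driving → parked state machine from the list above can be written as a transition table. The event names, and the transitions out of the parked state (resuming a drive, or ending the session), are assumptions added so the example is complete; the plan itself only names the three states.

```python
from enum import Enum


class DriveState(Enum):
    IDLE = "idle"
    DRIVING = "driving"
    PARKED = "parked"


# (current state, event) -> next state. Event names are hypothetical.
_TRANSITIONS = {
    (DriveState.IDLE, "driving_detected"): DriveState.DRIVING,
    (DriveState.DRIVING, "vehicle_stopped"): DriveState.PARKED,
    (DriveState.PARKED, "driving_detected"): DriveState.DRIVING,  # assumed: drive resumes
    (DriveState.PARKED, "session_ended"): DriveState.IDLE,        # assumed: drive is over
}


def next_state(state: DriveState, event: str) -> DriveState:
    """Return the next state, or stay put when the event does not apply."""
    return _TRANSITIONS.get((state, event), state)


if __name__ == "__main__":
    state = DriveState.IDLE
    for event in ["driving_detected", "vehicle_stopped", "session_ended"]:
        state = next_state(state, event)
        print(event, "->", state.value)
```

Ignoring events that have no entry in the table keeps the machine safe against duplicate or out-of-order sensor callbacks.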
13 — Day 7
Ship it. Film it. Share it.
Polish, test the full flow, record the demo video. If the demo is compelling, you have a product-market fit signal.
- Full end-to-end flow test (start car → briefing → interactions → park → summary)
- Edge case handling (no internet, API failures, silence detection)
- Voice quality tuning (pacing, pauses, personality)
- Record 30-second demo (TikTok/Reels format — POV dashcam)
- Record 5-minute walkthrough (YouTube format — full commute)
- Create landing page with waitlist
- TestFlight build for 5 beta testers
The demo IS the product-market fit test. If people watch it and say "I need this" — you have something. If they say "that's cool" — iterate. The reaction to the demo determines everything that comes next.
14 — Architecture
System overview
How all the pieces connect. Voice in, actions out, everything in between.
| Stage | Component | Detail |
| Input | iOS Audio | AVAudioEngine |
| Agent | Claude / GPT | Reasoning + Tools |
| Tools | APIs | Gmail, Cal, Unipile |
| Layer | Technology | Purpose |
| iOS App | SwiftUI + AVAudioEngine | UI + audio capture |
| STT | Whisper (on-device) | Speech-to-text, <500ms |
| LLM | GPT-4o-mini / Claude Haiku | Reasoning + tool calling |
| TTS | ElevenLabs Turbo v2.5 | Voice synthesis, streaming |
| Messaging | Unipile API | WhatsApp bridge |
| Email | Google Gmail API | Read, reply, compose |
| Calendar | Google Calendar API | Events, scheduling |
| Backend | Cloudflare Workers + D1 | User data, conversation logs |
| Auth | Google OAuth2 | Account linking |
| Storage | Cloudflare R2 | Audio files, attachments |
15 — V2
After the MVP ships
The features that turn users into evangelists. Ship within 4-6 weeks post-MVP.
Monthly stats card. Shareable. Your Spotify Wrapped for driving productivity. Emails handled, messages replied, time saved. The viral loop.
VIRAL / RETENTION / CONTENT
Join meetings from the car. Auto-dial at meeting time. Mute in traffic. Take notes. Extract action items. "I was on the call while driving": that's the tweet.
CALLKIT / NOTES / VIRAL DEMO
Detect stress in voice. Bad day? Slower pacing, skip non-urgent items. Energetic? Faster briefing. Nobody else does this.
VOICE ANALYSIS / ADAPTIVE UX / DIFFERENTIATOR
Expand messaging beyond WhatsApp. iMessage via local device integration. Telegram via Bot API. Cover 90% of messaging.
IMESSAGE / TELEGRAM / COVERAGE
16 — V3
The moat deepens
The features that make switching impossible. Data flywheel kicks in.
Intelligence
Predictive Actions
Don't react. Predict. Gym on Tuesdays → "Heading to gym? Cleared your hour." Always email after meetings → "Draft the follow-up?" Raining + usually order food → "Your usual from the Thai place?" The AI learns your patterns and acts before you ask.
Physical
Hardware Puck
Small matte-black device. Magnetic dashboard mount. Far-field mic array, Bluetooth speaker, one button, USB-C. $49.99. Opens every car without CarPlay. Unboxing content. Review bait. Subscription Trojan horse.
Marketplace
Voice Personas
"The CEO" — ultra-efficient, zero fluff. "The Best Friend" — warm, jokes. "The Drill Sergeant" — zero tolerance. Creator-built personas. Community + content + revenue. Celebrity persona = instant virality.
17 — V4
Scale plays
B2B2C distribution. The plays that make VCs lose their minds.
B2B2C
Insurance Partnerships
Prove users aren't touching their phone while driving. That's data insurers pay for. "Use Alto → 15% off car insurance." The app becomes a money-saving tool. Insurers promote you to their customers. Free distribution at scale.
B2B
Enterprise
Field sales teams. Delivery drivers. Real estate agents. Anyone who drives for work. Team accounts, CRM integration, custom workflows, admin dashboard. Massive ROI — reclaim 500+ hours/year per employee.
18 — Integration Timeline
What ships when
Every integration mapped to a version and a ship date: sprint day for the MVP, week number for later versions.
| Integration | Version | Ships | Dependencies |
| Whisper STT | MVP | Day 1 | iOS audio permissions |
| ElevenLabs TTS | MVP | Day 1 | API key, streaming |
| GPT-4o-mini Agent | MVP | Day 2 | System prompt, tools |
| WhatsApp (Unipile) | MVP | Day 3 | Unipile account |
| Gmail | MVP | Day 4 | Google OAuth |
| Google Calendar | MVP | Day 5 | Google OAuth (shared) |
| Morning Briefing | MVP | Day 5 | Calendar + Gmail |
| Driving Detection | MVP | Day 6 | CoreMotion, Bluetooth |
| Onboarding | MVP | Day 6 | All OAuth flows |
| Demo Video | MVP | Day 7 | Full pipeline working |
| Drive Report | V2 | Week 9 | Usage analytics |
| Meeting Dial-In | V2 | Week 10 | CallKit, calendar |
| Emotional Detection | V2 | Week 11 | Voice analysis model |
| iMessage | V2 | Week 12 | Local device API |
| Telegram | V2 | Week 12 | Bot API |
| Predictive Engine | V3 | Week 18 | 30+ drives of data |
| Hardware Puck | V3 | Week 22 | Hardware partner |
| Insurance API | V4 | Week 30 | Driving data pipeline |
19 — Content Arsenal
The product IS the content
You don't market this app. You film it working. Every feature is a TikTok. Every demo is a viral moment. Every integration launch is a content event.
You drive 550 hours a year.
You accomplish exactly zero.
Your phone can run your entire life.
You're told not to touch it.
Siri can set a timer.
Congratulations.
What if your phone just handled everything?
While you drive. Without touching it. Ever.
Ready-to-film
Scripts that stop the scroll
Hand these to a creator or film them yourself. Each one is engineered to hook in under 2 seconds.
"I replied to 14 emails on my commute. Without touching my phone."
0:00 POV dashcam. Morning traffic. Phone sitting untouched on passenger seat. Counter appears: "0 emails handled."
0:03 AI voice kicks in: "Morning. 3 urgent emails. Mike confirmed the deal. Want me to loop in legal?"
0:08 Driver: "Yes. Reply to Sarah too: tell her Friday works." Counter ticks up.
0:12 AI: "Done. Your 2pm moved to 3. Updated your calendar. Mom texted about dinner Sunday."
0:18 Driver: "Tell her I'll be there at 7." Counter keeps climbing.
0:22 Montage: traffic, counter hitting 14. Phone never moves.
0:26 Text overlay: "14 emails. 6 messages. 2 calendar changes. 0 screen touches."
0:28 Cut to phone still on seat. Brand tagline: "Your phone does everything. You just drive."
HOOK: Counter overlay on dashcam — "14 emails" in first frame is the pattern interrupt. Nobody scrolls past that.
"My AI joined my Zoom while I was on the highway."
0:00 Dashboard POV. Highway. Clock reads 9:58 AM.
0:03 AI: "Your standup starts in 2 minutes. 6 people on the call. Sarah shared an agenda. Want the summary?"
0:09 Driver: "Summarize it. Dial me in."
0:12 AI: "Connecting... You're in. Muted during traffic. Three topics: Q1 numbers, product launch, hiring."
0:18 Meeting audio fades in. Someone asks: "Can we get your take on the launch timeline?"
0:22 AI whispers: "Your last update said March 15. Budget approved yesterday."
0:26 Driver answers with perfect context. Confident. No fumbling.
0:32 Call ends. AI: "4 action items captured. Send to the team?"
0:36 Driver: "Send it."
0:38 Text overlay: "Joined a meeting. Got briefed. Nailed my part. Never touched the phone."
HOOK: "My AI joined my Zoom" is inherently unbelievable. That's the hook. People watch to verify if it's real.
"I gave an AI my commute for 7 days. Here's what happened."
0:00 Face cam. "This app claims it can run your life while you drive. I'm testing it for a week."
0:05 DAY 1: "It replied to my mom. In German. Correctly. I'm slightly scared."
0:12 DAY 2: "It moved my dentist appointment because it saw my calendar conflict. I didn't ask it to."
0:20 DAY 3: "My boss asked why I'm suddenly so responsive. I said 'new system.' I lied. It's an AI."
0:28 DAY 5: "It reminded me about my dad's birthday. I forgot. It didn't. Flowers ordered."
0:36 DAY 7: Counter card: "47 emails, 23 messages, 8 calendar changes, 3 reservations. All by voice."
0:44 Face cam: "I can't go back. My commute is now the most productive hour of my day."
0:52 Final card: stats summary + brand logo. "Your phone does everything. You just drive."
HOOK: "I gave an AI my commute" — the challenge format is algorithm bait. Day-by-day creates narrative arc. Each day is its own shareable clip.
The moments
5 conversations that sell the product instantly
Each one is a movie scene. Each one makes someone DM the link to their group chat. These are the conversations the product is built to handle.
The Monday Morning Blitz
"Good morning. 47 minutes to office. You have 23 unread emails — 4 need replies today. Your 9:30 got pushed to 10. The design team is waiting on your feedback from Friday. And your wife asked if you can pick up Noah at 4."
"Reply to the design team — approved with one change, make the header darker. Tell my wife yes to Noah. What are the 4 urgent emails?"
"First: client contract. They sent a counter to your 90K proposal. The delta is in the payment terms, not the rate. Want me to read the relevant clause?"
"Read it."
"They want net-60 instead of net-30. Everything else is identical to your last proposal."
"Accept it. Send to legal for signature."
→ Design team email sent. WhatsApp to wife: "Klar, hole Noah um 4 ab." Contract acceptance forwarded to legal. 3 more urgent emails queued.
The Birthday Save
"Hey — your mom's birthday is in 2 days. Last year you sent tulips from Bloomon and a card. Want me to do the same? I can also draft a WhatsApp message."
"Same flowers but add chocolates. And yeah, write something nice."
"Done. Tulips + Belgian chocolate box, delivery Thursday morning. Here's the message: 'Alles Gute zum Geburtstag, Mama! Ich hab dich lieb. Kleines Paket kommt heute.' Send it?"
"Perfect. Send Thursday morning at 8."
→ Bloomon order placed: €45. WhatsApp scheduled for Thursday 8:00 AM. Calendar reminder removed — already handled.
The Double-Book Save
"Heads up — Lisa just scheduled a product review for 2pm tomorrow, but you already have the investor call at 2. Want me to move one?"
"Move Lisa to 3. Tell her I have a conflict."
"Lisa's calendar is open at 3 and 4. I'll book 3pm and send her a note. Also — you have 30 minutes free before the investor call. Want me to prep a summary of last month's metrics?"
"Yeah, pull the numbers and have them ready."
→ Lisa moved to 3pm. Message sent. Investor prep doc queued for 1:30pm with key metrics from your dashboard.
The One-Command Chain
"Book dinner for Friday. Somewhere nice. Italian. Two people."
"Rocca has a table at 8pm. Good reviews, 12 minutes from your office. Want me to book it and tell Sarah?"
"Do it."
→ Reservation confirmed: Rocca, Friday 8pm, 2 people. WhatsApp to Sarah: "Freitag 20 Uhr Rocca?" Calendar event created with address and directions.
The Contract From The Car
"New email from Müller AG. Revised contract, 14 pages. Want me to compare it to your last version?"
"Yes. What changed?"
"Three changes. Liability cap lowered from 500K to 200K. Payment terms now net-45 instead of net-30. And they added a non-compete clause for 12 months."
"Push back on the liability cap and the non-compete. Accept the payment terms."
"Draft reply: 'We can agree to net-45 payment terms. However, we'd need to maintain the original 500K liability cap and cannot accept the non-compete as written. Happy to discuss scope limitations as an alternative.' Send to Müller?"
"Send it."
→ Email sent to Müller AG. Contract comparison saved. Lawyer CC'd for visibility.
Controversy plays
Hot takes that split the internet
Controversy is distribution. These are designed to make people argue in the comments — and every argument is a view.
"Siri is a voicemail box with a personality disorder."
DEFENSE — Name one complex multi-step task Siri handles end-to-end. Read an email, draft a reply, check your calendar for conflicts, and send it? Can't. It's 2026 and Siri still asks "which email?" after you've already told it.
"CarPlay is the problem pretending to be the solution."
DEFENSE — It's a screen. In your car. That you tap while driving. Apple literally made the distraction prettier and called it safety. The solution to phone distraction isn't a bigger phone — it's no phone.
"If you're not productive while driving, you're choosing to waste 23 days a year."
DEFENSE — 550 hours. That's the math. Voice-only, 100% hands-free, eyes on road. Safer than changing a Spotify playlist. The dangerous thing isn't the AI — it's the phone you're already grabbing at red lights.
The engine
3x per week. Every week. Forever.
The content machine never stops. Each post targets a different audience on a different platform in a different format.
Monday
The Demo
30-second product demo. Real car, real AI, real tasks getting done. Always end with the phone sitting untouched.
TIKTOK / REELS / SHORTS
Wednesday
The Take
One bold thought piece. Hot take thread. Industry insight. The kind of post that gets 200 quote tweets arguing.
TWITTER / LINKEDIN
Friday
The Build
Behind the scenes. New feature preview. Integration reveal. Bug war story. Building in public creates community.
TWITTER / YOUTUBE / TIKTOK
Every new integration is a launch event. Slack? "I triaged 40 Slack messages driving to work." Restaurant booking? "I booked dinner, messaged my date, and blocked my calendar — in one sentence." Each integration = new video, new audience, new use case. Constant content. Constant momentum.
20 — The Flywheel
Every drive makes switching impossible
| Time | What happens |
| Week 1 | AI knows nothing about you |
| Week 4 | AI knows your contacts, schedule patterns, preferences |
| Week 12 | AI predicts your needs before you think of them |
| Week 26 | AI has handled 1,000+ actions — switching feels impossible |
| Year 1 | AI knows your life better than you do |
The product gets better the more you use it. Memory, predictions, emotional calibration — they all compound. This creates a switching cost no competitor can overcome by copying features. They'd need YOUR data. That's a moat you can't buy.
21 — Risks
What could go wrong
Eyes open. Plan for the worst.
| Risk | Severity | Mitigation |
| Voice latency >2s | HIGH | On-device STT, streaming TTS, edge computing |
| Unipile API instability | HIGH | Queue + retry, local message cache, fallback notification |
| Google OAuth rejection | MEDIUM | Follow review guidelines, minimal scope request |
| Background audio killed by iOS | HIGH | Audio session category, BGTaskScheduler, CarPlay entitlement |
| User speaks during TTS | MEDIUM | Barge-in detection, immediate pause + listen |
| Car noise degrades STT | MEDIUM | Noise cancellation, confidence thresholds, clarification prompts |
| App Store review rejection | MEDIUM | Privacy documentation, microphone usage justification |
| LLM hallucination in actions | HIGH | Confirmation before every action, no auto-execute |
22 — KPIs
How we know it works
The numbers that matter. If these move, we're winning.
| Metric | Target | Measurement |
| Voice round-trip latency | <1.5 seconds | Speak → first TTS byte |
| Action success rate | >85% | Completed / attempted actions |
| Drives per week (active user) | 5+ | Weekly active drives |
| Day-7 retention | >70% | Users active 7 days after signup |
| Morning briefing completion | >60% | Users who listen to full briefing |
| Messages handled per drive | 3+ | Avg messages triaged per drive |
| NPS score | >50 | Monthly survey |
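Two of these targets reduce to simple arithmetic over logged events: action success rate is completed divided by attempted, and latency is checked against the 1.5-second bar. The sketch below assumes latency is summarized by the median of per-turn samples, which the table does not specify; `action_success_rate` and `meets_targets` are hypothetical helper names.

```python
from statistics import median

LATENCY_TARGET_SECONDS = 1.5    # from the table: speak -> first TTS byte
SUCCESS_RATE_TARGET = 0.85      # from the table: completed / attempted


def action_success_rate(completed: int, attempted: int) -> float:
    """Completed / attempted actions; 0.0 when nothing was attempted."""
    if attempted == 0:
        return 0.0
    return completed / attempted


def meets_targets(latency_samples: list[float], completed: int, attempted: int) -> dict[str, bool]:
    """Compare logged measurements with the two targets above."""
    return {
        # Assumption: the median of the samples stands in for "the" latency.
        "latency": median(latency_samples) < LATENCY_TARGET_SECONDS,
        "action_success": action_success_rate(completed, attempted) > SUCCESS_RATE_TARGET,
    }


if __name__ == "__main__":
    print(meets_targets([1.2, 1.4, 1.1], completed=90, attempted=100))
```

A stricter variant could track a high percentile instead of the median, since the slowest turns are the ones a driver notices.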