[Animated data flow diagram]

AI Voice Chat Agent with Contextual Memory (n8n, Gemini, ElevenLabs)

Version: 1.0.0 | Last Updated: 2025-05-16

Integrates with:

OpenAI, Google Gemini, ElevenLabs

Overview

Unlock Interactive Voice Conversations with this AI Agent

This AI agent empowers you to build sophisticated voice-based conversational interfaces. It listens to user audio input, transcribes it to text with OpenAI's Whisper, and then uses Google Gemini's language model to understand intent and generate contextually relevant responses. The agent maintains conversation history using n8n's built-in memory management (specifically, LangChain's Window Buffer Memory for a given session key), enabling natural, multi-turn dialogues. Finally, it synthesizes the text response into natural-sounding speech with ElevenLabs and delivers it back to the user.

This AI Agent is perfect for solopreneurs and businesses looking to automate customer interactions, provide instant support, or create innovative voice-controlled applications.
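The contextual-memory behaviour described above can be pictured with a minimal sketch: keep only the last N exchanges per session key and replay them as context. This is illustrative only, not the LangChain Window Buffer Memory implementation the workflow actually uses.

```python
from collections import deque


class WindowBufferMemory:
    """Toy window-buffer memory: retains the last `window_size`
    (user, ai) exchanges for each session key."""

    def __init__(self, window_size: int = 5):
        self.window_size = window_size
        self.sessions: dict[str, deque] = {}

    def add_turn(self, session_key: str, user: str, ai: str) -> None:
        # deque(maxlen=...) silently drops the oldest turn when full.
        buf = self.sessions.setdefault(
            session_key, deque(maxlen=self.window_size)
        )
        buf.append((user, ai))

    def context(self, session_key: str) -> list:
        # Returns the turns the LLM would see as conversation history.
        return list(self.sessions.get(session_key, []))
```

With `window_size=5`, a sixth exchange pushes out the first one, which is why long-running sessions stay bounded in token cost but can "forget" early turns.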

Key Features & Benefits

  • End-to-End Voice Conversation: Handles audio input, AI-powered processing, and audio output seamlessly.
  • AI-Powered Speech-to-Text: Utilizes OpenAI Whisper for high-accuracy transcription of spoken language.
  • Intelligent Dialogue Management: Employs Google Gemini for understanding user queries and generating coherent, human-like responses.
  • Contextual Awareness: Features built-in memory (n8n Memory Manager & LangChain nodes connected to Window Buffer Memory) that remembers previous conversation turns for a given session key, enabling richer, more meaningful interactions.
  • High-Quality Text-to-Speech: Integrates with ElevenLabs to produce natural and engaging voice outputs.
  • Webhook Integration: Easily triggered by external systems or applications sending voice data (e.g., audio files) via HTTP POST requests.
  • Customizable & Extensible: Tailor the AI's persona, voice, and response style. Extend its capabilities by connecting to other n8n nodes and services to perform actions based on the conversation.
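The ElevenLabs text-to-speech step listed above is a plain HTTPS call, which is why the workflow implements it with an HTTP Request node. The sketch below builds an equivalent request with the Python standard library; the `model_id` value is an assumption, so check your ElevenLabs account for the models available to you.

```python
import json
import urllib.request

ELEVENLABS_TTS_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"


def build_tts_request(text: str, voice_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an ElevenLabs text-to-speech request.

    Mirrors what the workflow's HTTP Request node sends: a JSON body
    and the API key in an `xi-api-key` header.
    """
    payload = {
        "text": text,
        "model_id": "eleven_monolingual_v1",  # assumption; adjust to your account
    }
    return urllib.request.Request(
        ELEVENLABS_TTS_URL.format(voice_id=voice_id),
        data=json.dumps(payload).encode("utf-8"),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# To actually synthesize speech (requires a valid key and network access):
# audio = urllib.request.urlopen(build_tts_request("Hello!", "YOUR_VOICE_ID", "YOUR_KEY")).read()
# open("reply.mp3", "wb").write(audio)
```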

Use Cases

  • B2C E-commerce: Implement an interactive voice assistant for product inquiries, order status checks, or FAQs, providing instant, 24/7 customer support.
  • B2B SaaS: Create a voice-activated helpdesk assistant to guide users through software features or troubleshoot common issues, reducing support agent workload.
  • Marketing: Automate voice-based data entry or feedback collection for campaigns.
  • Internal Tooling: Build hands-free voice tools so solopreneurs can interact with their systems quickly.

Prerequisites

  • An n8n instance (Cloud or self-hosted).
  • OpenAI API Key (for Whisper Speech-to-Text).
  • Google Gemini API Key.
  • ElevenLabs API Key and a Voice ID from your ElevenLabs account.
  • A webhook URL: the 'Webhook' node generates one when the workflow is first activated, or you can predefine a static webhook path.

Setup Instructions

  1. Download the n8n workflow JSON file.
  2. Import the workflow into your n8n instance.
  3. Configure the 'OpenAI - Speech to Text' node: In the 'Credential for OpenAI API' field, select or create your OpenAI API credential.
  4. Configure the 'Google Gemini Chat Model' node: In the 'Credential for Google Gemini API' field, select or create your Google Gemini API credential.
  5. Configure the 'ElevenLabs - Generate Audio' (HTTP Request) node:
    • In the 'URL' parameter, replace {{voice id}} at the end of the URL with the actual Voice ID you obtained from your ElevenLabs Voice Library (e.g., https://api.elevenlabs.io/v1/text-to-speech/21m00Tcm4TlvDq8ikWAM).
    • For 'Authentication', ensure 'Generic Credential Type' is selected. For 'Generic Auth', choose 'HTTP Custom Auth'.
    • Click 'Create New Credential' for HTTP Custom Auth. Name it (e.g., 'ElevenLabs API Key'). For the credential details, use 'Header Auth' and add a header with Name: xi-api-key and Value: YOUR_ELEVENLABS_API_KEY.
  6. (Optional) Review the 'Window Buffer Memory' node. The Session Key is currently hardcoded (e.g., test-0dacb3b5-4bcd-47dd-8456-dcfd8c258204). For persistent memory for a specific user or session, keep this key consistent across requests. For multi-user scenarios, make it dynamic, e.g., by passing a session identifier in the webhook payload.
  7. (Optional) Customize the prompt in the 'Basic LLM Chain' node (under 'Text' or 'Messages' > 'AIMessagePromptTemplate') to refine the AI's persona or instructions.
  8. Activate the workflow. The 'Webhook' node will provide a URL (e.g., under 'Production URL' or 'Test URL').
  9. Test the AI Agent: Send a POST request to the webhook URL using a tool like Postman or cURL. The request must be multipart/form-data and include an audio file (e.g., .wav, .mp3) in a field named voice_message.
  10. The workflow should process the audio, generate a voice response, and send it back as binary audio data.
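Once the workflow is active, the webhook can be exercised from any HTTP client. The sketch below builds the same multipart/form-data POST described in step 9 using only the Python standard library; the webhook URL and audio file path are placeholders you must substitute.

```python
import io
import mimetypes
import os
import urllib.request
import uuid


def build_voice_request(webhook_url: str, audio_path: str,
                        field: str = "voice_message") -> urllib.request.Request:
    """Build a multipart/form-data POST carrying one audio file
    in the field the workflow expects (`voice_message`)."""
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(audio_path)[0] or "application/octet-stream"
    with open(audio_path, "rb") as f:
        audio = f.read()

    body = io.BytesIO()
    body.write(f"--{boundary}\r\n".encode())
    body.write(
        f'Content-Disposition: form-data; name="{field}"; '
        f'filename="{os.path.basename(audio_path)}"\r\n'.encode()
    )
    body.write(f"Content-Type: {content_type}\r\n\r\n".encode())
    body.write(audio)
    body.write(f"\r\n--{boundary}--\r\n".encode())

    return urllib.request.Request(
        webhook_url,
        data=body.getvalue(),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


# To send it and save the binary audio reply (step 10):
# resp = urllib.request.urlopen(build_voice_request("https://YOUR-N8N/webhook/...", "hello.wav"))
# open("reply.mp3", "wb").write(resp.read())
```

Postman or cURL work just as well; the essentials are the POST method, the multipart/form-data content type, and the `voice_message` field name.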

Tags:

AI Agent, Voice Automation, Google Gemini, OpenAI, ElevenLabs, NLP, Chatbot, Contextual Memory, Customer Engagement
