Multimodal AI Assistant for WhatsApp using Google Gemini & n8n

Version: 1.0.0 | Last Updated: 2025-05-16

Integrates with:

WhatsApp, Google Gemini, Langchain, Wikipedia

Overview

Unlock Intelligent WhatsApp Conversations with this Multimodal AI Agent

This n8n workflow turns your WhatsApp Business account into a smart, interactive communication channel. It acts as an AI assistant that accepts several message types from users: text, voice notes, videos, and images. Using Google Gemini's multimodal capabilities, it processes these inputs by transcribing audio, describing video content, and analyzing images. The core AI Agent, equipped with conversational memory and access to Wikipedia for factual information, then composes a relevant response and sends it back to the user on WhatsApp. It is a solid foundation for building customer support bots, information assistants, or any other interactive WhatsApp service.
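The per-type handling described above can be pictured as a small routing step. The following is a minimal sketch, not the workflow's actual node logic: it assumes each incoming message is a dictionary with `from` and `type` fields, as in a WhatsApp Business API message object, and the branch names are hypothetical labels chosen for illustration.

```python
from typing import Any

# Hypothetical branch names for illustration; the real workflow uses n8n nodes.
BRANCH_FOR_TYPE = {
    "text": "pass_through",        # text goes straight to the AI Agent
    "audio": "transcribe_audio",   # voice notes are transcribed
    "video": "describe_video",     # videos are described
    "image": "analyze_image",      # images are analyzed
}


def route_message(message: dict[str, Any]) -> dict[str, str]:
    """Pick a processing branch from the message's `type` field."""
    msg_type = message.get("type", "")
    # Any type not listed above (stickers, locations, ...) is flagged as unsupported.
    branch = BRANCH_FOR_TYPE.get(msg_type, "unsupported")
    return {"from": message.get("from", ""), "type": msg_type, "branch": branch}


if __name__ == "__main__":
    sample = {"from": "15551234567", "type": "audio", "audio": {"id": "MEDIA_ID"}}
    print(route_message(sample))
```

Whatever the branch, the result is text (a transcript, a description, or the original message) that the AI Agent can answer.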

Key Features & Benefits

  • Multimodal Input Processing: Handles text, audio (transcription), video (description), and image (analysis) messages seamlessly.
  • AI-Powered Understanding: Uses Google Gemini 1.5 Pro for content interpretation and response generation.
  • Conversational Memory: Keeps a separate session per user, so replies take earlier messages in the conversation into account.
  • Knowledge-Augmented Responses: Integrates with Wikipedia to provide well-informed answers.
  • Automated WhatsApp Interaction: Listens for incoming messages and sends AI-generated replies automatically.
  • Flexible & Extensible: Built on n8n, allowing easy customization and integration with other business systems.
  • Ready-to-Deploy Foundation: Provides a strong starting point for various WhatsApp automation use cases.

Use Cases

  • Automating responses to frequently asked questions on WhatsApp, including those sent as voice notes.
  • Providing instant information or support by understanding user queries from text, images, or videos.
  • Building interactive WhatsApp assistants for product inquiries or service bookings.
  • Qualifying leads received via WhatsApp by understanding multimedia messages.
  • Transcribing voice notes and summarizing long text messages for quick review.

Prerequisites

  • An n8n instance (Cloud or self-hosted).
  • A WhatsApp Business API account, with n8n WhatsApp credentials configured for both the Trigger node and the regular WhatsApp nodes.
  • A Google Gemini API key with access to a multimodal model such as gemini-1.5-pro-002.
  • The Gemini API key stored in the n8n 'Google Gemini (PaLM) Api' credential, which is used by both the HTTP Request nodes and the Langchain Gemini nodes.
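To illustrate what an HTTP Request node sends to Gemini for a voice note, here is a minimal sketch that only builds the request URL and JSON body for the Gemini REST `generateContent` method. It is an assumption about the request shape, not an export of the workflow's node settings: the prompt text and the `audio/ogg` MIME type are illustrative, and the API key is deliberately left out because n8n attaches it from the stored credential.

```python
import base64

# Base URL of the Gemini REST API (v1beta).
API_BASE = "https://generativelanguage.googleapis.com/v1beta"


def build_audio_request(
    model: str, audio_bytes: bytes, mime_type: str, prompt: str
) -> tuple[str, dict]:
    """Return the URL and JSON body for a generateContent call with inline audio."""
    url = f"{API_BASE}/models/{model}:generateContent"
    body = {
        "contents": [
            {
                "parts": [
                    # Instruction for the model (illustrative wording).
                    {"text": prompt},
                    # The media file, base64-encoded and sent inline.
                    {
                        "inline_data": {
                            "mime_type": mime_type,
                            "data": base64.b64encode(audio_bytes).decode("ascii"),
                        }
                    },
                ]
            }
        ]
    }
    return url, body


if __name__ == "__main__":
    url, body = build_audio_request(
        "gemini-1.5-pro-002", b"fake-audio", "audio/ogg", "Transcribe this voice note."
    )
    print(url)
    print(list(body["contents"][0]["parts"][1]["inline_data"]))
```

The same body shape applies to video, with the MIME type changed accordingly; very large files may need a different upload path than inline data.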

Setup Instructions

  1. Download the n8n workflow JSON file.
  2. Import the workflow into your n8n instance.
  3. Configure the 'WhatsApp Trigger' node with your WhatsApp Business API trigger credentials.
  4. Configure all 'WhatsApp' nodes (e.g., 'Get Audio URL', 'Respond to User') with your WhatsApp Business API credentials.
  5. Configure the 'Google Gemini Chat Model', 'Google Gemini Chat Model1', and 'Google Gemini Chat Model2' nodes with your Google Gemini API credentials.
  6. Ensure the HTTP Request nodes ('Google Gemini Audio', 'Google Gemini Video') are also configured to use your Google Gemini API credentials (via 'Node Credential Type: googlePalmApi').
  7. Verify the model names in all Google Gemini nodes (e.g., gemini-1.5-pro-002) match what your API key supports.
  8. Customize the system prompt in the 'AI Agent' node if needed for your specific assistant persona.
  9. Review the session key in the 'Window Buffer Memory' node (default: whatsapp-tutorial-{{ $json.from }}), which keeps a separate conversation history for each sender.
  10. Activate the workflow. If you self-host n8n, make sure the instance is publicly reachable so that WhatsApp can deliver webhooks to it.
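The session key in step 9 is what keeps one user's conversation separate from another's. As a rough sketch of the idea, and not n8n's actual implementation, the code below derives the key from the sender's number and keeps only the most recent messages per key. The window size of 5 and the class and function names are hypothetical, and the sketch counts individual messages for simplicity.

```python
from collections import defaultdict, deque

SESSION_PREFIX = "whatsapp-tutorial-"  # prefix from the workflow's default session key
WINDOW_SIZE = 5  # hypothetical window length, chosen for illustration


def session_key(sender: str) -> str:
    """Mirror the default expression whatsapp-tutorial-{{ $json.from }}."""
    return f"{SESSION_PREFIX}{sender}"


class WindowMemory:
    """Keep the last `window_size` messages for each session key."""

    def __init__(self, window_size: int = WINDOW_SIZE) -> None:
        # A deque with maxlen silently drops the oldest entry when full.
        self._sessions: dict[str, deque] = defaultdict(
            lambda: deque(maxlen=window_size)
        )

    def add(self, sender: str, role: str, text: str) -> None:
        self._sessions[session_key(sender)].append((role, text))

    def history(self, sender: str) -> list[tuple[str, str]]:
        # .get avoids creating an empty session for unknown senders.
        return list(self._sessions.get(session_key(sender), ()))


if __name__ == "__main__":
    memory = WindowMemory(window_size=2)
    memory.add("15551234567", "user", "Hi")
    memory.add("15551234567", "assistant", "Hello! How can I help?")
    memory.add("15551234567", "user", "What is n8n?")
    print(memory.history("15551234567"))
```

Changing the prefix in the session key is a simple way to keep this assistant's history apart from any other workflow that also stores memory per phone number.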

Tags:

AI Agent, WhatsApp Automation, Google Gemini, Multimodal AI, Chatbot, NLP, Customer Support, Langchain, Productivity

Want your own unique AI agent?

Talk to us - we know how to build custom AI agents for your specific needs.