AI RAG Agent: Context-Aware Document Chunking from Google Drive to Pinecone via OpenRouter & Gemini

Version: 1.0.0 | Last Updated: 2025-05-16

Core AI Power: 7/10
Automation Level: 6/10
Integration Reach: 4 systems
Setup Simplicity: 5/10
Adaptability: 7/10

Overview

Unlock Enhanced RAG Performance with this AI Agent

This AI Agent prepares documents stored in Google Drive for Retrieval-Augmented Generation (RAG) systems. It fetches a document, splits it into sections, uses an LLM via OpenRouter (e.g., Gemini) to generate a concise context for each section, embeds the context-enriched text with Google Gemini embeddings, and stores the resulting vectors in Pinecone. Embedding each section together with its document-level context is intended to improve the relevance and accuracy of the information your LLM applications retrieve.
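
The flow described above can be sketched as a small orchestration function. This is a minimal illustration under stated assumptions, not the workflow's actual code: prepareDocument and the four injected step functions (split, describe, embed, store) are hypothetical names standing in for the n8n nodes, and placing the generated context before the section text is an assumed ordering.

```javascript
// Hypothetical orchestration of the four stages. Each stage is injected so the
// sketch stays independent of any particular API client.
async function prepareDocument(documentText, steps) {
  const { split, describe, embed, store } = steps;
  const sections = split(documentText);
  const records = [];

  for (let i = 0; i < sections.length; i++) {
    const section = sections[i];
    // Ask the chat model for a short context that situates the section
    // within the full document.
    const context = await describe(documentText, section);
    // Assumption: the context is placed before the section so both are
    // embedded together as one piece of text.
    const text = `${context}\n\n${section}`;
    const values = await embed(text);
    records.push({ id: `section-${i}`, values, metadata: { text } });
  }

  // Hand all records to the vector store in one call.
  await store(records);
  return records;
}
```

Keeping the stages as plain functions makes it possible to swap the chat model, the embedding model, or the vector store without touching the loop.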

Key Features & Benefits

Automated Document Ingestion: Fetches documents directly from Google Drive.
Intelligent Sectioning: Splits documents into logical sections based on a custom separator.
AI-Powered Contextualization: Leverages an LLM via OpenRouter (configurable, e.g., Gemini) to analyze each section within the context of the entire document, generating a succinct summary to enhance its retrievability.
High-Quality Embeddings: Utilizes Google Gemini's text-embedding-004 model for creating robust vector embeddings.
Vector Storage Automation: Seamlessly inserts the contextualized and vectorized sections into your specified Pinecone index.
RAG Optimization: Designed to improve the quality of context provided to LLMs, leading to more accurate and relevant responses in RAG pipelines.
Customizable: Adaptable to various document structures and RAG system requirements.
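
As a rough sketch of the contextualization feature, the snippet below builds a prompt that shows the model both the whole document and one section, then wraps it in a request for OpenRouter's chat completions endpoint. The prompt wording is hypothetical (the template's real prompt is not reproduced here), the function names are invented for illustration, and the default model id is an assumption; any chat model available on your OpenRouter account can be substituted.

```javascript
// Hypothetical prompt builder: wraps the full document and the target section
// in simple tags so the model can tell them apart.
function buildContextPrompt(documentText, section) {
  return [
    "<document>",
    documentText,
    "</document>",
    "Here is the section we want to situate within the whole document:",
    "<section>",
    section,
    "</section>",
    "Give a short, succinct context that situates this section within the " +
      "overall document, to improve search retrieval of the section. " +
      "Answer with the context only.",
  ].join("\n");
}

// Builds (but does not send) a request for OpenRouter's chat completions API.
// The default model id is an assumption, not taken from the template.
function buildOpenRouterRequest(apiKey, prompt, model = "google/gemini-2.0-flash-001") {
  return {
    url: "https://openrouter.ai/api/v1/chat/completions",
    options: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        model,
        messages: [{ role: "user", content: prompt }],
      }),
    },
  };
}
```

To send the request, pass the pieces to fetch(request.url, request.options). OpenRouter follows the OpenAI chat completions response shape, so the generated context would be found at choices[0].message.content.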

Use Cases

B2C E-commerce: Enhance customer support chatbots by processing product catalogs and FAQs from Google Drive into a context-aware Pinecone index, enabling more precise answers to customer queries.
B2B SaaS: Build a powerful internal knowledge base or customer-facing help center by ingesting technical documentation, API guides, and case studies from Google Drive, making information easily searchable and contextually rich.
AI-Driven Research: Automate the preparation of large document sets for AI-driven research, ensuring that retrieved information is highly relevant to the query context.
Q&A Systems and Co-pilots: Streamline the data pipeline for sophisticated Q&A systems, content summarization tools, and intelligent co-pilots by providing them with contextually optimized data chunks.

Prerequisites

An n8n instance (Cloud or self-hosted).
Google Drive credentials with access to the target document(s).
Pinecone API Key and an existing Pinecone index (e.g., context-rag-test as used in the template).
OpenRouter API Key.
Google Cloud API Key with access to the Gemini API (specifically for models/text-embedding-004).
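
For orientation, here is roughly where the Google and Pinecone keys listed above would be used if the same calls were made outside n8n. This is a sketch, not what the n8n nodes execute internally: the function names and the indexHost parameter are hypothetical, and the requests are only built, not sent.

```javascript
// Builds (but does not send) a Gemini embedContent request for text-embedding-004.
function buildEmbeddingRequest(googleApiKey, text) {
  const model = "models/text-embedding-004";
  return {
    url: `https://generativelanguage.googleapis.com/v1beta/${model}:embedContent`,
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "x-goog-api-key": googleApiKey,
      },
      body: JSON.stringify({ model, content: { parts: [{ text }] } }),
    },
  };
}

// Builds (but does not send) a Pinecone upsert request.
// indexHost is a hypothetical parameter: the host shown for your index in the
// Pinecone console. Each record has the shape { id, values, metadata }.
function buildUpsertRequest(pineconeApiKey, indexHost, records) {
  return {
    url: `https://${indexHost}/vectors/upsert`,
    options: {
      method: "POST",
      headers: {
        "Api-Key": pineconeApiKey,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ vectors: records }),
    },
  };
}
```

The embedding response carries the vector at embedding.values. text-embedding-004 returns 768-dimensional vectors by default, so the Pinecone index should be created with a matching dimension.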

Setup Instructions

Download the n8n workflow JSON file.
Import the workflow into your n8n instance.
Configure 'Get Document From Google Drive' node:
- Authenticate your Google Drive account.
- Set the File ID parameter to the ID of the Google Document you wish to process.
- Ensure Options > Google File Conversion > Convert Google Docs to format is set to text/plain.
Configure 'OpenRouter Chat Model' node:
- Select or create new credentials for OpenRouter, providing your API Key.
- (Optional) Choose a specific model compatible with your OpenRouter account if different from the default.
Configure 'Embeddings Google Gemini' node:
- Select or create new credentials for Google Gemini (PaLM) API, providing your API Key.
- Ensure the Model Name is set (e.g., models/text-embedding-004).
Configure 'Pinecone Vector Store' node:
- Select or create new credentials for Pinecone, providing your API Key and Environment.
- In Pinecone Index, select List mode and choose or enter the name of your target Pinecone index (e.g., context-rag-test).
Review 'Split Document Text Into Sections' (Code Node):
- The current setup splits the document by —---------------------------—-------------[SECTIONEND]—---------------------------—-------------. If your document uses a different separator for sections, modify the split_text variable in the JavaScript code accordingly.
Review 'AI Agent - Prepare Context' node:
- The prompt is designed to generate context for each section. You can customize this prompt if needed for different contextualization strategies.
Activate the workflow.
Trigger the workflow manually via the 'When clicking "Test workflow"' trigger node to process your document.
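
For the 'Split Document Text Into Sections' step, the splitting logic can be sketched as below. This is an illustration rather than the template's exact Code node: splitIntoSections is a hypothetical helper, and the separator shown is a shortened stand-in for the template's longer split_text value. Inside an n8n Code node, the same function would be applied to whichever field holds the document text on the incoming item.

```javascript
// Splits raw document text on a literal separator, trims whitespace around
// each piece, and drops pieces that end up empty.
function splitIntoSections(text, separator) {
  if (!separator) {
    throw new Error("separator must be a non-empty string");
  }
  return text
    .split(separator)
    .map((section) => section.trim())
    .filter((section) => section.length > 0);
}

// Shortened, hypothetical separator for illustration only.
const split_text = "[SECTIONEND]";
const sample = "Intro text\n[SECTIONEND]\nPricing details\n[SECTIONEND]\n";
const sections = splitIntoSections(sample, split_text);
console.log(sections); // two sections: "Intro text" and "Pricing details"
```

Trimming and filtering guard against a trailing separator producing an empty section, which would otherwise be sent to the model and embedded as a blank chunk.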

Tags: AI Agent, RAG, Contextual Chunking, Google Drive, Pinecone, Gemini, OpenRouter, Vector Database, Document Processing
