AI Document Chat Agent: Google Drive, OpenAI & Pinecone RAG
Overview
Unlock Instant Answers from Your Documents with this AI Agent
This n8n workflow sets up a powerful Retrieval Augmented Generation (RAG) pipeline. First, it fetches a specified document from your Google Drive, breaks it into manageable chunks, and uses OpenAI to generate embeddings (numerical representations) for each chunk. These embeddings are then stored in a Pinecone vector database.
Once your document is processed, you can interact with it through a chat interface. When you ask a question, the agent embeds your query, searches Pinecone for the most relevant document chunks, and then feeds these chunks along with your question to an OpenAI chat model to generate a concise, context-aware answer.
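Conceptually, the two phases map onto standard LangChain building blocks (the same libraries n8n's AI nodes are built on). The TypeScript sketch below is a minimal standalone illustration of that flow, not the workflow itself: the index name, chunk sizes, prompt wording, and placeholder document text are all assumptions made for the example.

```ts
// Standalone sketch of the ingest + query flow; placeholder values throughout.
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
import { OpenAIEmbeddings, ChatOpenAI } from "@langchain/openai";
import { PineconeStore } from "@langchain/pinecone";
import { Pinecone } from "@pinecone-database/pinecone";

async function main() {
  const rawText = "…full text fetched from Google Drive…"; // placeholder document

  // 1. Split the document into overlapping chunks.
  const splitter = new RecursiveCharacterTextSplitter({ chunkSize: 1000, chunkOverlap: 200 });
  const docs = await splitter.createDocuments([rawText]);

  // 2. Embed each chunk with OpenAI and store the vectors in Pinecone.
  const embeddings = new OpenAIEmbeddings({ model: "text-embedding-ada-002" }); // 1536 dimensions
  const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
  const index = pinecone.index("documents"); // placeholder index name
  const store = await PineconeStore.fromDocuments(docs, embeddings, { pineconeIndex: index });

  // 3. At question time: retrieve the most similar chunks and answer from them.
  const question = "What does the refund policy say?";
  const context = (await store.similaritySearch(question, 4))
    .map((d) => d.pageContent)
    .join("\n---\n");

  const llm = new ChatOpenAI({ model: "gpt-3.5-turbo", temperature: 0 });
  const answer = await llm.invoke([
    ["system", `Answer using only the following context:\n${context}`],
    ["human", question],
  ]);
  console.log(answer.content);
}

main().catch(console.error);
```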
Key Features & Benefits
- Automated Knowledge Ingestion: Seamlessly pulls documents from Google Drive.
- Intelligent Document Processing: Uses Langchain's Recursive Character Text Splitter to intelligently segment documents for optimal retrieval.
- Advanced AI Embeddings: Leverages OpenAI's state-of-the-art models to create rich embeddings for semantic understanding.
- Scalable Vector Storage: Utilizes Pinecone for efficient storage and fast retrieval of document embeddings.
- Conversational Q&A: Employs OpenAI's chat models (e.g., GPT-3.5-turbo, GPT-4) via Langchain's QA chain for natural language interaction.
- Build Custom Knowledge Bases: Easily turn your business documents, SOPs, FAQs, or research papers into queryable assets.
- Two-Part Workflow: Clear separation between data loading (manual trigger) and chat interaction (chat trigger).
Use Cases
- Querying internal documentation (e.g., SOPs, product specs, legal documents) for quick, specific answers.
- Building a customer support assistant trained on your FAQs and help documentation from Google Drive.
- Creating a research tool that allows you to chat with academic papers or lengthy reports.
- Streamlining employee onboarding by enabling new hires to ask questions about company policies stored in Drive.
- Analyzing and extracting key information from sets of documents without manual reading.
Prerequisites
- An n8n instance (Cloud or self-hosted).
- OpenAI API Key with access to embedding models (e.g., `text-embedding-ada-002`) and chat models (e.g., `gpt-3.5-turbo` or `gpt-4`).
- Pinecone API Key, environment, and an existing Pinecone index. The index must be configured with 1536 dimensions to match OpenAI's `text-embedding-ada-002` model (a creation sketch follows this list).
- Google Drive credentials configured in n8n, with access to the target document.
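If you have not created the Pinecone index yet, the sketch below shows one way to do it with the official Pinecone TypeScript client, assuming a serverless project; the index name, cloud, and region are placeholders, and you can just as well create the index from the Pinecone console.

```ts
// Hypothetical index setup: name, cloud, and region are placeholders.
import { Pinecone } from "@pinecone-database/pinecone";

const pc = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });

await pc.createIndex({
  name: "documents",
  dimension: 1536, // must match OpenAI's text-embedding-ada-002
  metric: "cosine",
  spec: { serverless: { cloud: "aws", region: "us-east-1" } },
});
```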
Setup Instructions
- Download the n8n workflow JSON file (`ai-document-chat-agent-gdrive-pinecone-v1.0.0.json`).
- Import the workflow into your n8n instance.
- Configure Google Drive File URL: In the 'Set Google Drive file URL' node, update the `value` field with the complete URL of your Google Drive file. The 'Google Drive' node is intended to use this value (its 'File ID' parameter should ideally be set to an expression like `{{ $json.file_url }}`). If not, you might need to paste the URL directly into the 'Google Drive' node's 'File ID' field or set the expression manually.
- Set OpenAI Credentials: Enter your OpenAI API Key in the 'OpenAI API' credential fields for the following nodes: 'Embeddings OpenAI', 'Embeddings OpenAI2', and 'OpenAI Chat Model'.
- Configure Pinecone: a. Ensure you have a Pinecone index created with 1536 dimensions. b. In both the 'Insert into Pinecone vector store' and 'Read Pinecone Vector Store' nodes, select or create your Pinecone API credentials and specify your Pinecone index name and environment.
- (Optional) Adjust Text Splitting: If needed, modify the `Chunk Size` and `Chunk Overlap` parameters in the 'Recursive Character Text Splitter' node to fine-tune how the document is segmented (see the sketch after these steps).
- Load Your Document: Click the 'Test Workflow' button located at the bottom of the canvas (this triggers the 'When clicking 'Test Workflow' button' node). This process will fetch your document from Google Drive, split it, generate embeddings, and store them in your Pinecone index. Note: The 'Insert into Pinecone vector store' node is configured by default to clear the namespace on each run, which is useful for testing from a fresh state.
- Chat with Your Document: Once data loading is complete, click the 'Chat' button at the bottom of the canvas (this activates the 'When clicking 'Chat' button below' trigger). A chat interface will appear, allowing you to ask questions about the content of your document.
- Activate the workflow to keep it running or if you plan to trigger it via other means.
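To get a feel for how `Chunk Size` and `Chunk Overlap` affect segmentation before re-loading your document, you can experiment with the same LangChain text splitter outside n8n. The values and sample text below are only examples:

```ts
// Illustrative only: compare chunking behaviour for a sample text.
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";

const sampleText = "…paste a representative portion of your document here…";

for (const [chunkSize, chunkOverlap] of [[500, 50], [1000, 200], [2000, 200]]) {
  const splitter = new RecursiveCharacterTextSplitter({ chunkSize, chunkOverlap });
  const chunks = await splitter.createDocuments([sampleText]);
  // Smaller chunks give more precise retrieval but less context per hit;
  // overlap keeps sentences that straddle a boundary available in both chunks.
  console.log(`chunkSize=${chunkSize}, overlap=${chunkOverlap} -> ${chunks.length} chunks`);
}
```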
Want your own unique AI agent?
Talk to us - we know how to build custom AI agents for your specific needs.
Schedule a Consultation