AI Invoice Data Extractor: Gmail to Google Sheets via LlamaParse & OpenAI
Integrates with:
Overview
Unlock Automated Invoice Processing with this AI Agent
This AI-driven workflow streamlines your accounts payable process by automatically fetching PDF invoices from your Gmail, intelligently parsing them, extracting key information, and organizing it neatly into a Google Sheet. It's designed to handle complex PDFs, including those with tables and figures, ensuring accurate data capture without manual intervention.
This agent acts as a dedicated AI for invoice data extraction. It first monitors your Gmail for new invoices based on your filters. When an invoice PDF is detected and not yet processed (checked via a 'invoice synced' label), it's sent to LlamaParse for robust conversion into a machine-readable format (markdown). Then, an OpenAI LLM (e.g., GPT-3.5-turbo) is prompted to extract specific fields like invoice number, dates, supplier details, line items, and totals. The extracted data is structured according to a predefined schema using a Langchain output parser and then appended to your designated Google Sheet. Finally, the processed email is labeled in Gmail to prevent duplicate entries.
Key Features & Benefits
- Automated Email Monitoring: Watches your Gmail for incoming invoices with PDF attachments based on custom filters.
- Advanced PDF Parsing: Leverages LlamaParse to accurately convert complex PDFs, including embedded tables and figures, into markdown text, crucial for structured data extraction.
- AI-Powered Data Extraction: Uses OpenAI (GPT models via Langchain) to intelligently identify and extract predefined data fields from the parsed invoice content.
- Structured Data Output: Employs a Langchain structured output parser with a customizable JSON schema to ensure data is consistently formatted for direct use in spreadsheets or other systems.
- Google Sheets Integration: Automatically appends extracted invoice data to a specified Google Sheet for reconciliation, reporting, or further analysis.
- Duplicate Prevention: Labels processed emails in Gmail (e.g., 'invoice synced') to avoid redundant operations and maintain data integrity.
- Customizable Extraction Logic: Easily adapt the data fields to be extracted by modifying the JSON schema in the 'Structured Output Parser' node and tailoring the prompt in the 'Apply Data Extraction Rules' (LLM Chain) node.
- Handles Processing Delays: Includes logic to wait and check LlamaParse job status, ensuring the workflow proceeds only when parsing is complete.
Use Cases
- B2B SaaS: Automate the extraction of data from supplier PDF invoices (e.g., software subscriptions, hosting fees) received via email, directly populating financial reconciliation sheets in Google Sheets.
- E-commerce: Streamline vendor invoice processing by automatically capturing details from PDF attachments in Gmail, like shipping costs or inventory purchases, for easier accounting and inventory management.
- Solopreneurs & Founders: Save hours of manual data entry by having this AI Agent watch for client invoices or expense receipts in PDF, extracting key details into a centralized Google Sheet for financial tracking and tax preparation.
- Automation Heads & CTOs: Implement a robust, AI-driven solution for accounts payable that intelligently handles varied invoice formats, reducing manual errors, improving processing speed, and freeing up team capacity.
Prerequisites
- An n8n instance (Cloud or self-hosted).
- Gmail account credentials (OAuth2 recommended for secure connection).
- LlamaParse API Key from LlamaIndex Cloud. You can sign up at https://cloud.llamaindex.ai/. The free tier typically allows a generous number of PDF parses per day.
- OpenAI API Key with access to a model like
gpt-3.5-turbo-1106
or newer. - Google Sheets API credentials (OAuth2) to allow writing to your spreadsheet.
- A label named
invoice synced
(or your preferred name) pre-created in your Gmail account to mark processed emails.
Setup Instructions
- Download the n8n workflow JSON file (
ai-invoice-data-extractor-v1.0.0.json
) and import it into your n8n instance. - Configure Gmail Trigger ('Receiving Invoices'):
- Connect your Gmail account using OAuth2 credentials in the 'Receiving Invoices' node.
- Adjust the
sender
filter (e.g.,invoices@example.com
) or other query parameters (q
) as needed. - Ensure the
downloadAttachments
option is enabled. - The workflow checks for emails NOT having the
invoice synced
label. Make sure this label exists in your Gmail.
- Configure LlamaParse HTTP Request Nodes:
- In the 'Upload to LlamaParse' node, select or create 'HttpHeaderAuth' credentials. Name it (e.g., 'LlamaIndex API'). Set Header Name to
Authorization
and Value toBearer YOUR_LLAMAPARSE_API_KEY
(replaceYOUR_LLAMAPARSE_API_KEY
with your actual key). - Verify the URLs in 'Get Processing Status' and 'Get Parsed Invoice Data' nodes match current LlamaParse API endpoints if they ever change.
- In the 'Upload to LlamaParse' node, select or create 'HttpHeaderAuth' credentials. Name it (e.g., 'LlamaIndex API'). Set Header Name to
- Configure OpenAI & Langchain Nodes:
- In the 'OpenAI Model' node, select your OpenAI credential and choose a suitable model (the template uses
gpt-3.5-turbo-1106
). Adjusttemperature
if needed (0 is good for factual extraction). - In the 'Structured Output Parser' node, review and customize the
jsonSchema
. This defines the exact data points the AI will attempt to extract (e.g., 'Invoice date', 'invoice number', 'Line items'). Modify this to match your specific needs. - In the 'Apply Data Extraction Rules' (LLM Chain) node, critically review and refine the prompt in the
text
parameter. This prompt guides the LLM on how to extract information from the markdown content (provided by LlamaParse) based on your schema. The existing prompt is a good starting point.
- In the 'OpenAI Model' node, select your OpenAI credential and choose a suitable model (the template uses
- Configure Google Sheets Node ('Append to Reconciliation Sheet'):
- Connect your Google Sheets account using OAuth2 credentials.
- In the
documentId
field, select your target Google Spreadsheet. - In the
sheetName
field, select the specific sheet (tab) where data should be appended. - The 'Map Output' node prepares data based on the 'Structured Output Parser'. Ensure your Google Sheet has columns corresponding to the keys in your
jsonSchema
for seamless data entry.
- Configure Gmail Action Node ('Add "invoice synced" Label'):
- Select your Gmail account.
- Ensure the correct
labelIds
for your 'invoice synced' label is selected to mark emails after successful processing. You can find this ID by listing labels or using the picker if available.
- Test Thoroughly: Send a test email with a PDF invoice that matches your Gmail trigger filters. Manually execute the workflow or wait for the trigger to pick it up. Check each step for errors and verify data in Google Sheets.
- Activate the workflow to begin fully automated invoice processing.
Want your own unique AI agent?
Talk to us - we know how to build custom AI agents for your specific needs.
Schedule a Consultation