AI Company Research Agent: Automated Data Enrichment

Version: 1.0.0 | Last Updated: 2025-05-16

Integrates with:

OpenAI, Google Sheets, SerpAPI

Overview

Unlock Automated Company Intelligence with this AI Agent

This n8n workflow acts as an AI-powered research assistant, designed to gather comprehensive information about companies. It takes a company name or domain (from a Google Sheet), then uses an AI agent (leveraging OpenAI's GPT-4o) equipped with web searching (via SerpAPI or optionally ScrapingBee) and website content extraction tools to find and structure key data points. The results are then automatically populated back into your Google Sheet.

This agent is highly adaptable. You can customize the research parameters by modifying the prompt given to the AI and adjusting the desired output structure in the 'Structured Output Parser' node. This allows you to tailor the research to your specific needs, whether it's for sales prospecting, market analysis, or competitive intelligence.
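
One way to keep the prompt and the output structure aligned when customizing is to derive both from a single field list. The sketch below is only an illustration in Python, outside n8n: the field names, descriptions, and prompt wording are assumptions rather than the template's actual values, and it assumes the parser accepts a JSON Schema object.

```python
import json

# Hypothetical field list (name -> (JSON Schema type, description)).
# These names and descriptions are illustrative, not the template's own.
FIELDS = {
    "domain": ("string", "the company's primary website domain"),
    "has_api": ("boolean", "whether the company offers a public API"),
}


def build_prompt(company: str) -> str:
    """Render a research prompt that lists every field the agent must return."""
    lines = [f"- {name}: {desc}" for name, (_, desc) in FIELDS.items()]
    return f"Research the company '{company}' and report:\n" + "\n".join(lines)


def build_schema() -> dict:
    """Build a JSON Schema object whose properties mirror FIELDS."""
    return {
        "type": "object",
        "properties": {name: {"type": typ} for name, (typ, _) in FIELDS.items()},
        "required": list(FIELDS),
    }


if __name__ == "__main__":
    print(build_prompt("example.com"))
    print(json.dumps(build_schema(), indent=2))
```

Adding a new data point then means adding one entry to the field list; the generated prompt text and schema can be pasted into the agent node and the 'Structured Output Parser' node respectively.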

Find more detailed instructions and a video guide here.

Key Features & Benefits

  • AI-Driven Research: Leverages OpenAI's GPT-4o and a LangChain Agent to understand your research queries and orchestrate data collection.
  • Web Exploration: Utilizes SerpAPI (or ScrapingBee) for targeted Google searches and a sub-workflow to scrape and extract text content from websites.
  • Structured Data Output: Automatically extracts and organizes information into a predefined schema, including domain, LinkedIn URL, market (B2B/B2C), cheapest plan, API availability, free trial presence, enterprise plan details, and integrations.
  • Google Sheets Integration: Reads company inputs from and writes enriched data back to your Google Sheets, streamlining your data pipeline.
  • Batch Processing: Efficiently processes multiple companies by iterating through rows in your input sheet.
  • Customizable & Extensible: Easily modify the AI prompt and output schema to research different data points. The n8n platform allows further customization and integration with other tools.
  • Scheduled Automation: Can be set to run automatically on a schedule (e.g., every 2 hours) or triggered manually.
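
The structured output listed above can be pictured as a typed record. The following is a minimal Python sketch of that shape, not the workflow's actual schema: the field names (`linkedin_url`, `has_api`, and so on), their types, and the normalization rules are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CompanyProfile:
    """One enriched company record; field names are assumed, not the template's."""
    domain: str
    linkedin_url: Optional[str]
    market: str  # "B2B" or "B2C"
    cheapest_plan: Optional[float]
    has_api: bool
    has_free_trial: bool
    has_enterprise_plan: bool
    integrations: List[str] = field(default_factory=list)


def parse_profile(raw: dict) -> CompanyProfile:
    """Validate and normalize a raw agent result into a CompanyProfile."""
    market = str(raw.get("market", "")).upper()
    if market not in {"B2B", "B2C"}:
        raise ValueError(f"unexpected market value: {raw.get('market')!r}")
    price = raw.get("cheapest_plan")
    return CompanyProfile(
        domain=str(raw["domain"]).strip().lower(),
        linkedin_url=raw.get("linkedin_url") or None,
        market=market,
        cheapest_plan=float(price) if price is not None else None,
        has_api=bool(raw.get("has_api", False)),
        has_free_trial=bool(raw.get("has_free_trial", False)),
        has_enterprise_plan=bool(raw.get("has_enterprise_plan", False)),
        integrations=[str(item) for item in raw.get("integrations", [])],
    )
```

Rejecting an unexpected `market` value early is a design choice in this sketch: it surfaces malformed agent output before anything is written back to the sheet.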

Use Cases

  • For B2B SaaS: Automate lead enrichment by finding company domains, LinkedIn profiles, technology stacks, pricing tiers, and API availability.
  • For B2C E-commerce: Identify competitor pricing strategies, product features, free trial offers, and key integrations to inform your market positioning.
  • For Solopreneurs/Founders: Quickly gather intelligence on potential partners, competitors, or market segments without manual research overhead.
  • For Heads of Automation: Implement a scalable solution for continuous company data gathering and enrichment, feeding CRMs or sales intelligence platforms.

Prerequisites

  • An n8n instance (Cloud or self-hosted).
  • OpenAI API Key with access to a suitable model (e.g., GPT-4o is used in the template).
  • Google Sheets API credentials (OAuth2).
  • SerpAPI API Key. (Alternatively, a ScrapingBee API key if you plan to use the ScrapingBee tool node).
  • A Google Sheet prepared for input and output. You can use this template to get started; make your own copy of it before connecting the workflow.

Setup Instructions

  1. Download the n8n workflow JSON file (ai-company-research-agent-v1.0.0.json).
  2. Import the workflow into your n8n instance.
  3. Configure Google Sheets Nodes:
    • 'Get rows to enrich' node: Set up your Google Sheets OAuth2 credentials. Enter the Spreadsheet ID and Sheet Name from your copied template (or your own sheet). Ensure the sheet has a column for the company input (e.g., company_input, or input if you keep the Set node's {{ $json.input }} expression), plus row_number and enrichment_status columns. The enrichment_status column is used to filter rows, e.g., to find rows not yet processed.
    • 'Google Sheets - Update Row with data' node: Use the same Google Sheets credentials. Enter the Spreadsheet ID and Sheet Name. Verify the column mapping under 'Columns' > 'Value' matches your sheet and the data points from the 'AI Researcher Output Data' node. The row_number is used for matching.
  4. Configure AI Components:
    • 'OpenAI Chat Model' node: Enter your OpenAI API Key in the credentials section and select your desired model (e.g., gpt-4o).
    • 'AI company researcher' (Agent) node: Review and customize the prompt in the text parameter. This defines what information the AI should find.
    • 'SerpAPI - Search Google' tool node: Enter your SerpAPI API Key in the credentials section. If you prefer ScrapingBee, configure the 'Search Google with ScrapingBee' tool node instead, connect it to the agent, and disconnect the SerpAPI tool.
    • 'Structured Output Parser' node: Ensure the inputSchema matches the data points you've instructed the AI to find in the agent's prompt. This defines the structure of the output.
  5. Review Sub-Workflows (Tools):
    • 'Get website content' tool: This sub-workflow uses standard HTTP Request and HTML nodes. It should work out-of-the-box for most public websites.
  6. Test the Workflow: Run it manually with one or two sample companies in your Google Sheet to ensure everything is configured correctly.
  7. Activate Workflow: Once tested, activate the workflow. You can use the 'Schedule Trigger' (set to every 2 hours by default) for regular execution or trigger it manually via 'When clicking "Test workflow"'.
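
The sheet handling described in step 3 (filter on enrichment_status, match updates on row_number) can be sketched in plain Python as follows. This is only an approximation of the logic, not the n8n nodes themselves; the "done" status value and the sample rows are assumptions, while the column names come from the setup steps above.

```python
def rows_to_enrich(rows):
    """Return rows whose enrichment_status is empty, i.e. not yet processed."""
    return [r for r in rows if not str(r.get("enrichment_status") or "").strip()]


def build_update(row, result):
    """Merge the agent's result into an update keyed by the row's row_number."""
    # "done" is an assumed marker; use whatever status value your sheet expects.
    return {"row_number": row["row_number"], **result, "enrichment_status": "done"}


if __name__ == "__main__":
    sheet = [
        {"row_number": 2, "input": "acme.com", "enrichment_status": ""},
        {"row_number": 3, "input": "globex.com", "enrichment_status": "done"},
    ]
    for row in rows_to_enrich(sheet):
        # In the real workflow the agent would produce this result.
        print(build_update(row, {"domain": row["input"]}))
```

Because processed rows carry a non-empty status, a scheduled run that starts later will skip them and pick up only new rows.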

Tags:

AI Agent, Automation, OpenAI, Data Enrichment, Web Scraping, SerpAPI, Google Sheets, Lead Generation, Market Research