AI Agent: Prompt-Based Object Detection with Gemini 2.0

Version: 1.0.0 | Last Updated: 2025-05-16

Integrates with:

Google Gemini, HTTP Services

Overview

Unlock Precise Image Analysis with this AI Agent

This AI Agent uses the vision capabilities of Google's Gemini 2.0 model to perform prompt-based object detection. You provide an image and a natural language prompt (e.g., "Find all the rabbits in this image"); the agent identifies the specified objects, extracts their bounding box coordinates, and draws the boxes on the original image. This enables flexible, targeted image analysis without training a custom model for each object type.

Key Features & Benefits

  • AI-Driven Object Detection: Uses Google Gemini 2.0 (specifically gemini-2.0-flash-exp in this template) for promptable vision capabilities.
  • Natural Language Queries: Identify objects using simple text prompts, making complex analysis accessible (e.g., "all cars parked out of bounds").
  • Visual Feedback: Automatically draws bounding boxes on the image to clearly show detected objects.
  • Accurate Localization: Includes logic to scale the model's normalized coordinates (in the range 0-1000) to pixel positions on the original image.
  • Customizable & Extensible: Easily adapt the prompt for different objects or integrate the detected data into other business processes within n8n.
  • No Custom Model Training: Get started quickly by describing what you're looking for, instead of lengthy model training cycles for specific objects.
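
The coordinate scaling described under Accurate Localization can be sketched as follows. This is a minimal illustration in Python, not the template's own node logic: the function names are hypothetical, and the [ymin, xmin, ymax, xmax] ordering of box_2d is an assumption based on Google's published Gemini examples rather than something stated on this page. Looping over the detection list also shows one way to handle any number of boxes.

```python
from typing import Dict, List, Sequence, Tuple


def _to_pixels(value: float, size: int) -> int:
    """Map a 0-1000 normalized value onto an axis of `size` pixels, clamped to the image."""
    return max(0, min(size, round(value / 1000 * size)))


def scale_box(box_2d: Sequence[float], image_width: int, image_height: int) -> Tuple[int, int, int, int]:
    """Convert one normalized box to pixel coordinates (left, top, right, bottom).

    Assumption: box_2d is ordered [ymin, xmin, ymax, xmax] on a 0-1000 scale.
    """
    ymin, xmin, ymax, xmax = box_2d
    return (
        _to_pixels(xmin, image_width),   # left
        _to_pixels(ymin, image_height),  # top
        _to_pixels(xmax, image_width),   # right
        _to_pixels(ymax, image_height),  # bottom
    )


def scale_detections(detections: List[Dict], image_width: int, image_height: int) -> List[Dict]:
    """Scale every detection, however many the model returned."""
    return [
        {"label": d.get("label", ""), "box": scale_box(d["box_2d"], image_width, image_height)}
        for d in detections
    ]
```

Under these assumptions, scale_box([250, 100, 750, 500], 1920, 1080) gives (192, 270, 960, 810) for a 1920x1080 image.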

Use Cases

  • Automate tagging of product images in e-commerce with specific attributes (e.g., 'red summer dress with floral pattern') for improved search and filtering.
  • Moderate user-generated content for B2C platforms by detecting specific unwanted objects or themes in uploaded images based on textual descriptions.
  • Enhance data enrichment services for B2B SaaS by identifying key elements in business-related images (e.g., 'all logos on this exhibition stand', 'safety violations on a construction site').
  • Automate visual quality control in manufacturing or logistics by prompting for anomalies or specific items in inspection photos (e.g., 'dented boxes', 'incorrectly assembled parts').

Prerequisites

  • An n8n instance (Cloud or self-hosted).
  • Google Cloud Project with the Generative Language API enabled.
  • Google PaLM API key (credential type googlePalmApi in n8n) with access to a Gemini model such as gemini-2.0-flash-exp or a suitable newer version.
  • An image accessible via a URL for the test setup, or an alternative image input method for production.

Setup Instructions

  1. Download the n8n workflow JSON file.
  2. Import the workflow into your n8n instance.
  3. Configure the 'Gemini 2.0 Object Detection' HTTP Request node: Select or create your 'Google Palm API' credential. Ensure your Google Cloud account/project associated with this credential has the Generative Language API enabled and access to the specified Gemini model (e.g., gemini-2.0-flash-exp).
  4. Update the 'Get Test Image' HTTP Request node with the URL of the image you want to analyze. Alternatively, modify this part of the workflow to receive image data from a different trigger or node (e.g., webhook, form, local file).
  5. In the 'Gemini 2.0 Object Detection' node, customize the text field within the JSON body. This is your prompt to the AI (e.g., change "I want to see all bounding boxes of rabbits in this image." to describe the objects you're looking for).
  6. Review the response_schema in the 'Gemini 2.0 Object Detection' node. The current schema is {"type": "ARRAY", "items": {"type": "OBJECT", "properties": {"box_2d": {"type": "ARRAY", "items": {"type": "NUMBER"}}, "label": {"type": "STRING"}}}}. If you significantly change your prompting strategy or expect a different output structure from Gemini, adjust this schema accordingly so the JSON response still parses reliably.
  7. The 'Draw Bounding Boxes' node is pre-configured to draw up to 6 bounding boxes. If you anticipate detecting more objects, add more 'draw' operations within this node or implement a loop that handles a variable number of detections.
  8. Activate the workflow. You can test it using the 'Test workflow' button to see it in action with the configured image and prompt.
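
Steps 5 and 6 amount to sending a prompt, an image, and a response schema in one JSON body, then parsing the JSON text that comes back. The sketch below illustrates this in Python. It assumes the field names from Google's public generateContent REST documentation (contents, inline_data, generationConfig) and a response whose first candidate holds the JSON as text; the template's actual node body may differ, and build_request_body and parse_detections are hypothetical helpers, not part of the workflow.

```python
import base64
import json
from typing import Dict, List

# Schema copied from step 6: an array of {box_2d: number[], label: string}.
RESPONSE_SCHEMA = {
    "type": "ARRAY",
    "items": {
        "type": "OBJECT",
        "properties": {
            "box_2d": {"type": "ARRAY", "items": {"type": "NUMBER"}},
            "label": {"type": "STRING"},
        },
    },
}


def build_request_body(prompt: str, image_bytes: bytes, mime_type: str = "image/jpeg") -> Dict:
    """Assemble a generateContent-style body (assumed layout) with prompt, image, and schema."""
    return {
        "contents": [
            {
                "parts": [
                    {"text": prompt},
                    {
                        "inline_data": {
                            "mime_type": mime_type,
                            # The image is sent inline as base64 text.
                            "data": base64.b64encode(image_bytes).decode("ascii"),
                        }
                    },
                ]
            }
        ],
        "generationConfig": {
            "response_mime_type": "application/json",
            "response_schema": RESPONSE_SCHEMA,
        },
    }


def parse_detections(response: Dict) -> List[Dict]:
    """Extract the detection list, assuming the first candidate's first part is JSON text."""
    text = response["candidates"][0]["content"]["parts"][0]["text"]
    return json.loads(text)
```

Changing the prompt only touches the text part; changing the expected output structure means editing RESPONSE_SCHEMA and whatever code consumes the parsed list.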

Tags:

AI Agent, Google Gemini, Object Detection, Image Analysis, Vision AI, Automation, Multimodal AI, Prompt Engineering, AI
