AI Agent: Prompt-Based Object Detection with Gemini 2.0
Overview
Unlock Precise Image Analysis with this AI Agent
This AI Agent uses the vision capabilities of Google's Gemini 2.0 model to perform prompt-based object detection. You provide an image and a natural language prompt (e.g., "Find all the rabbits in this image"), and the agent identifies the specified objects, extracts their bounding box coordinates, and draws those boxes on the original image. This lets you target any object you can describe in text, without training a custom model for each object type.
Key Features & Benefits
- AI-Driven Object Detection: Leverages Google Gemini 2.0 (specifically gemini-2.0-flash-exp in this template) for cutting-edge, promptable vision capabilities.
- Natural Language Queries: Identify objects using simple text prompts, making complex analysis accessible (e.g., "all cars parked out of bounds").
- Visual Feedback: Automatically draws bounding boxes on the image to clearly show detected objects.
- Accurate Localization: Includes logic to scale the normalized coordinates returned by the AI model (which range from 0 to 1000) to pixel positions on the original image.
- Customizable & Extensible: Easily adapt the prompt for different objects or integrate the detected data into other business processes within n8n.
- No Custom Model Training: Get started quickly by describing what you're looking for, instead of lengthy model training cycles for specific objects.
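The coordinate scaling mentioned under Accurate Localization can be sketched roughly as follows. This is a minimal illustration, not the workflow's actual node code: it assumes each box arrives as [ymin, xmin, ymax, xmax] on a 0-1000 scale (the order Google's documentation describes for Gemini bounding boxes, which you should verify against your own responses), and the helper names scale_box and scale_detections are hypothetical.

```python
def scale_box(box_2d, image_width, image_height, scale=1000):
    """Convert one normalized box to pixel coordinates.

    Assumption: box_2d is ordered [ymin, xmin, ymax, xmax] on a 0..scale range.
    """
    ymin, xmin, ymax, xmax = box_2d
    return {
        "left": round(xmin * image_width / scale),
        "top": round(ymin * image_height / scale),
        "right": round(xmax * image_width / scale),
        "bottom": round(ymax * image_height / scale),
    }


def scale_detections(detections, image_width, image_height):
    """Scale every detection, however many the model returned."""
    return [
        {"label": d["label"], **scale_box(d["box_2d"], image_width, image_height)}
        for d in detections
    ]


if __name__ == "__main__":
    sample = [{"box_2d": [100, 200, 500, 600], "label": "rabbit"}]
    print(scale_detections(sample, image_width=2000, image_height=1000))
```

Because the x values are scaled by the image width and the y values by the image height, the same normalized box maps correctly onto images of any aspect ratio.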
Use Cases
- Automate tagging of product images in e-commerce with specific attributes (e.g., 'red summer dress with floral pattern') for improved search and filtering.
- Moderate user-generated content for B2C platforms by detecting specific unwanted objects or themes in uploaded images based on textual descriptions.
- Enhance data enrichment services for B2B SaaS by identifying key elements in business-related images (e.g., 'all logos on this exhibition stand', 'safety violations on a construction site').
- Automate visual quality control in manufacturing or logistics by prompting for anomalies or specific items in inspection photos (e.g., 'dented boxes', 'incorrectly assembled parts').
Prerequisites
- An n8n instance (Cloud or self-hosted).
- Google Cloud Project with the Generative Language API enabled.
- Google Palm API Key (credentials of type googlePalmApi in n8n) with access to a Gemini model like gemini-2.0-flash-exp or a suitable newer version.
- An image accessible via a URL for the test setup, or an alternative image input method for production.
Setup Instructions
- Download the n8n workflow JSON file.
- Import the workflow into your n8n instance.
- Configure the 'Gemini 2.0 Object Detection' HTTP Request node: Select or create your 'Google Palm API' credential. Ensure your Google Cloud account/project associated with this credential has the Generative Language API enabled and access to the specified Gemini model (e.g., gemini-2.0-flash-exp).
- Update the 'Get Test Image' HTTP Request node with the URL of the image you want to analyze. Alternatively, modify this part of the workflow to receive image data from a different trigger or node (e.g., webhook, form, local file).
- In the 'Gemini 2.0 Object Detection' node, customize the text field within the JSON body. This is your prompt to the AI (e.g., change "I want to see all bounding boxes of rabbits in this image." to describe the objects you're looking for).
- Review the response_schema in the 'Gemini 2.0 Object Detection' node. The current schema is {"type": "ARRAY", "items": {"type": "OBJECT", "properties": {"box_2d": {"type": "ARRAY", "items": {"type": "NUMBER"}}, "label": {"type": "STRING"}}}}. If you significantly change your prompting strategy or expect a different output structure from Gemini, you might need to adjust this schema accordingly for reliable JSON parsing.
- The 'Draw Bounding Boxes' node is pre-configured to draw up to 6 bounding boxes. If you anticipate detecting more objects, you'll need to add more 'draw' operations within this node or implement a loop if processing many detections dynamically.
- Activate the workflow. You can test it using the 'Test workflow' button to see it in action with the configured image and prompt.
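For orientation, the request body and response handling described in the steps above might look roughly like this outside n8n. It is a sketch under stated assumptions, not the template's exact node configuration: the field names inline_data, generationConfig and response_mime_type, and the candidates/content/parts path in the response, follow the public Generative Language REST API as commonly documented and should be checked against the workflow JSON. The helper names build_request_body and parse_detections are hypothetical, and no network call is made.

```python
import base64
import json

# Schema quoted from the setup steps: a list of {box_2d, label} objects.
RESPONSE_SCHEMA = {
    "type": "ARRAY",
    "items": {
        "type": "OBJECT",
        "properties": {
            "box_2d": {"type": "ARRAY", "items": {"type": "NUMBER"}},
            "label": {"type": "STRING"},
        },
    },
}


def build_request_body(prompt, image_bytes, mime_type="image/jpeg"):
    """Assemble a JSON body holding the prompt, the image and the schema.

    Assumption: field names match the generateContent REST endpoint.
    """
    return {
        "contents": [
            {
                "parts": [
                    {"text": prompt},  # the prompt you customize
                    {
                        "inline_data": {
                            "mime_type": mime_type,
                            "data": base64.b64encode(image_bytes).decode("ascii"),
                        }
                    },
                ]
            }
        ],
        "generationConfig": {
            "response_mime_type": "application/json",
            "response_schema": RESPONSE_SCHEMA,
        },
    }


def parse_detections(response):
    """Pull the detection list out of a generateContent-style response dict."""
    text = response["candidates"][0]["content"]["parts"][0]["text"]
    detections = json.loads(text)
    # Keep only entries with a four-number box and a label, whatever the count.
    return [
        d for d in detections
        if isinstance(d, dict) and len(d.get("box_2d", [])) == 4 and "label" in d
    ]
```

Returning a plain list of any length means downstream drawing logic can iterate over the detections instead of relying on a fixed number of draw operations.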