AI Agent for Medoid-Based Anomaly Detection Setup (Crops Dataset)

Version: 1.0.0 | Last Updated: 2025-05-16

Integrates with:

Qdrant, Voyage AI, Python (SciPy)

Overview

Unlock Advanced Anomaly Detection Setup with this AI Agent

This AI Agent automates the critical setup phase for anomaly detection in image datasets, specifically demonstrated with a crops dataset. It intelligently identifies and establishes representative cluster centers (medoids) and their corresponding similarity thresholds within a Qdrant vector database. This foundation is crucial for accurately flagging unusual or anomalous data points (e.g., diseased crops, foreign objects) in subsequent analysis.

This agent is part 2 of a 3-part series for setting up a complete anomaly detection pipeline. It prepares your Qdrant collection by defining the cluster centers and thresholds that the later anomaly detection step relies on.

Key Features & Benefits

  • Dual Medoid Setup Strategies: Implements two robust methods for defining cluster centers:
    • Distance Matrix Approach: Calculates pairwise distances within each data cluster (e.g., a specific crop type) to find the most central point (medoid), using SciPy for efficient matrix operations on data from Qdrant.
    • Multimodal Embedding Approach: Leverages Voyage AI to embed textual descriptions of "ideal" crop types and identifies the closest image vector in Qdrant as the medoid, offering a semantically rich centering method.
  • Automated Threshold Calculation: For each identified medoid, the agent determines a similarity threshold score. This score defines the boundary of "normalcy" for that cluster, enabling precise anomaly detection.
  • Qdrant Integration: Seamlessly interacts with your Qdrant vector database to fetch data, search points, calculate distance matrices (via Qdrant's API), and update point payloads with medoid information and thresholds.
  • AI-Driven Data Understanding: Uses Voyage AI to generate text embeddings from crop descriptions, then searches Qdrant for the semantically closest image vectors, giving a second, description-driven way to select medoids.
  • Data Preparation for Anomaly Detection: Prepares the dataset by identifying and tagging medoids and their thresholds, making the Qdrant collection ready for anomaly detection workflows (e.g., part 3 of this series).
  • Adaptable for Various Datasets: While demonstrated with agricultural crops, the principles and methods can be adapted for other image-based or vector-based anomaly detection tasks across different industries.
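
The two medoid strategies above can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not the workflow's actual code: it works on small in-memory NumPy arrays rather than on data fetched from Qdrant, it assumes cosine distance, and the function names medoid_by_distance_matrix and medoid_by_text_anchor are hypothetical. In the workflow itself, the text embedding comes from Voyage AI and the nearest-image lookup is done by Qdrant.

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist, squareform


def medoid_by_distance_matrix(vectors: np.ndarray) -> int:
    """Index of the point with the smallest total cosine distance to the rest."""
    distances = squareform(pdist(vectors, metric="cosine"))  # (n, n), symmetric
    return int(np.argmin(distances.sum(axis=1)))


def medoid_by_text_anchor(text_vector: np.ndarray, image_vectors: np.ndarray) -> int:
    """Index of the image vector closest (by cosine distance) to a text embedding."""
    distances = cdist(text_vector[None, :], image_vectors, metric="cosine")[0]
    return int(np.argmin(distances))


# Toy 2-D "embeddings" for one cluster (made-up values for illustration).
cluster = np.array([
    [1.0, -0.4],
    [1.0, -0.2],
    [1.0, 0.0],
    [1.0, 0.2],
    [1.0, 0.4],
])

central = medoid_by_distance_matrix(cluster)

# Stand-in for a Voyage AI text embedding of an "ideal" crop description.
text_embedding = np.array([1.0, 0.25])
anchor = medoid_by_text_anchor(text_embedding, cluster)

print(central, anchor)
```

The two strategies can select different points for the same cluster: the first depends only on the images, while the second depends on how the textual description is worded.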

Use Cases

  • For B2C e-commerce: Prepare image datasets of products to identify defective items or inconsistent listings by setting up 'normal' product medoids and similarity thresholds in a vector database.
  • For B2B SaaS (e.g., agricultural tech, manufacturing QA): Establish baseline visual characteristics of healthy crops, functioning machinery, or quality products to automatically flag images showing signs of disease, pests, defects, or equipment malfunction.
  • Automate the setup for visual quality control systems by defining 'golden sample' medoids and thresholds for various items or stages.
  • Streamline the data preparation phase for any AI-driven visual anomaly detection system, so that downstream detection works from clearly defined cluster centers and boundaries.

Prerequisites

  • An n8n instance (Cloud or self-hosted).
  • Qdrant Cloud cluster URL and API Key (Free Tier suitable for initial testing). Ensure n8n has credentials configured for Qdrant.
  • Voyage AI API Key with access to the voyage-multimodal-3 model. Ensure n8n has credentials configured for Voyage AI.
  • A Qdrant collection populated with image embeddings (e.g., from the 'agricultural-crops' dataset, as used in this example, or your own dataset). The points should have a payload field for filtering (e.g., crop_name) and named vectors (e.g., 'voyage' for Voyage AI embeddings).
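
As a rough illustration of the last prerequisite, a point in the collection might look like the dictionary below. The ID, vector values, and crop value are made up, and the helper is_usable_point is hypothetical; only the field name crop_name and the vector name voyage come from the example above.

```python
# Shape of a single Qdrant point as this workflow expects it (illustrative values).
example_point = {
    "id": 1,
    "vector": {"voyage": [0.12, -0.03, 0.88]},  # named vector; real embeddings are much longer
    "payload": {"crop_name": "rice"},           # field used to filter points by cluster
}


def is_usable_point(point: dict, vector_name: str = "voyage",
                    filter_field: str = "crop_name") -> bool:
    """Check that a point carries the named vector and the filter payload field."""
    vectors = point.get("vector")
    payload = point.get("payload") or {}
    has_vector = isinstance(vectors, dict) and bool(vectors.get(vector_name))
    return has_vector and filter_field in payload


print(is_usable_point(example_point))
```

A point stored with a single unnamed vector (a plain list instead of a dictionary) would fail this check, which mirrors the requirement for named vectors.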

Setup Instructions

  1. Download the n8n workflow JSON file.
  2. Import the workflow into your n8n instance.
  3. Configure the 'Qdrant cluster variables' Set node: Update qdrantCloudURL with your Qdrant instance URL and collectionName with your target collection name.
  4. Ensure your Qdrant API credentials are correctly configured in n8n and selected in all HTTP Request nodes that communicate with Qdrant (e.g., 'Total Points in Collection', 'Cluster Distance Matrix').
  5. Ensure your Voyage AI API credentials (HTTP Header Auth) are correctly configured in n8n and selected in the 'Embed text' HTTP Request node.
  6. In the 'Medoids Variables' and 'Text Medoids Variables' Set nodes, you can adjust the furthestFromCenter parameter. This controls which point (e.g., the 1st or the 5th most dissimilar point in the cluster) is used to define the cluster threshold.
  7. The 'Textual (visual) crop descriptions' Set node contains hardcoded descriptions. If using a different dataset or categories, update these descriptions or modify the workflow to generate/fetch them dynamically based on your crop_name (or equivalent) field.
  8. Verify that filter conditions in nodes like 'Cluster Distance Matrix' and 'Searching Score' correctly use your payload field (e.g., crop_name). Also, ensure the using parameter in search/query nodes matches your vector name in Qdrant (e.g., "voyage").
  9. This workflow is designed to run once, as a one-time setup of medoids and thresholds in your Qdrant collection. Activate the workflow and run a test execution.
  10. After successful execution, your Qdrant points representing medoids will have new payload fields like is_medoid: true, is_medoid_cluster_threshold: <score>, is_text_anchor_medoid: true, and is_text_anchor_medoid_cluster_threshold: <score>.
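
To make steps 6, 8, and 10 more concrete, here is a hedged sketch of how a threshold could be derived and written back. It assumes the threshold is the similarity score, relative to the medoid, of the furthestFromCenter-th most dissimilar cluster member; that is one reading of step 6, not a confirmed description of the workflow's internals. The helper names, the sample scores, the crop value, and the point ID are hypothetical. The filter and set-payload dictionaries follow the shape of Qdrant's REST request bodies, but nothing is sent to a server here.

```python
def cluster_threshold(scores_to_medoid: list[float], furthest_from_center: int = 1) -> float:
    """Similarity score of the k-th most dissimilar point (k is 1-based)."""
    if not 1 <= furthest_from_center <= len(scores_to_medoid):
        raise ValueError("furthest_from_center is out of range for this cluster")
    ascending = sorted(scores_to_medoid)  # lowest similarity first
    return ascending[furthest_from_center - 1]


def cluster_filter(field: str, value: str) -> dict:
    """Qdrant-style filter matching one payload value, e.g. a crop type."""
    return {"must": [{"key": field, "match": {"value": value}}]}


def medoid_payload_update(point_id: int, threshold: float) -> dict:
    """Request body for a set-payload call that tags a medoid point."""
    return {
        "payload": {"is_medoid": True, "is_medoid_cluster_threshold": threshold},
        "points": [point_id],
    }


# Made-up cosine similarities of cluster members to their medoid.
scores = [0.91, 0.62, 0.88, 0.75, 0.95]

threshold = cluster_threshold(scores, furthest_from_center=2)
body = medoid_payload_update(point_id=42, threshold=threshold)

print(threshold)
print(cluster_filter("crop_name", "rice"))
print(body)
```

In this sketch, a larger furthest_from_center skips the most extreme cluster members and so yields a higher, stricter similarity threshold.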

Tags:

AI Agent, Anomaly Detection, Qdrant, Voyage AI, Vector Database, Data Preparation, Machine Learning, Automation, Image Analysis
