┌────────────────────────────────────────┐
│ Incoming Product                       │
│ (via API POST)                         │
└─────────┬──────────────────────────────┘
          │
          ▼
┌────────────────────────────────────────┐
│ Validate SKU & Category                │
└─────────┬──────────────────────────────┘
          │
          ▼
┌────────────────────────────────────────┐
│ Fetch/Create Product                   │
│ from Database                          │
└─────────┬──────────────────────────────┘
          │
          ▼
┌────────────────────────────────────────┐
│ Get Category Rules (Cache)             │
└─────────┬──────────────────────────────┘
          │
          ▼
┌────────────────────────────────────────┐
│ AttributeQualityScorer                 │
│ (score_product method)                 │
└─────────┬──────────────────────────────┘
          │
          ▼
┌────────────────────────────────────────┐
│ Step 1: Check Mandatory Fields         │
│ Step 2: Check Standardization          │
│ Step 3: Check Missing Values           │
│ Step 4: Check Consistency              │
└─────────┬──────────────────────────────┘
          │
          ▼
┌────────────────────────────────────────┐
│ Calculate Weighted Final Score         │
│ - mandatory_fields * 0.4               │
│ - standardization * 0.3                │
│ - missing_values * 0.2                 │
│ - consistency * 0.1                    │
└─────────┬──────────────────────────────┘
          │
          ▼
┌────────────────────────────────────────┐
│ Generate AI Suggestions (Optional)     │
│ - Uses Gemini service                  │
│ - Suggest fixes for issues             │
└─────────┬──────────────────────────────┘
          │
          ▼
┌────────────────────────────────────────┐
│ Save AttributeScore in Database        │
│ - final_score, breakdown, issues       │
│ - suggestions, ai_suggestions          │
└─────────┬──────────────────────────────┘
          │
          ▼
┌────────────────────────────────────────┐
│ Return JSON Response to Client         │
│ {success, product_sku, score_result}   │
└────────────────────────────────────────┘

┌──────────────────────┐
│ Product Description  │
└─────────┬────────────┘
          │
          ▼
┌──────────────────────┐
│ spaCy NER            │
│ Extract:             │
│ - Brand              │
│ - Size               │
│ - Product            │
└─────────┬────────────┘
          │
          ▼
┌──────────────────────┐
│ AI Extraction        │
│ (Gemini Service)     │
└─────────┬────────────┘
          │
          ▼
┌──────────────────────┐
│ Return Attributes    │
│ as Dict              │
└──────────────────────┘

FOR SEO: hybrid approach combining KeyBERT for keyword extraction, sentence-transformers for semantic analysis, and existing Gemini API for intelligent SEO
suggestions.

# SEO & Discoverability Implementation Summary

## 📋 What Was Implemented

### Core Feature: SEO & Discoverability Scoring (15% weight)

A comprehensive SEO scoring system that evaluates product listings for search engine optimization and customer discoverability across 4 key dimensions:

| Dimension | Weight | What It Checks |
|-----------|--------|----------------|
| **Keyword Coverage** | 35% | Are mandatory attributes mentioned in title/description? |
| **Semantic Richness** | 30% | Description quality, vocabulary diversity, descriptive language |
| **Backend Keywords** | 20% | Presence of high-value search terms and category keywords |
| **Title Optimization** | 15% | Title length (50-100 chars), structure, no keyword stuffing |

## 🎯 Why This Approach?

### Technology Stack Chosen

| Technology | Purpose | Why This Choice |
|------------|---------|-----------------|
| **KeyBERT** | Keyword extraction | Fast, accurate, open-source. Best for e-commerce SEO |
| **Sentence-Transformers** | Semantic similarity | Lightweight, pre-trained models. Better than full LLMs |
| **Google Gemini** | AI suggestions | Already in your stack. Provides context-aware recommendations |
| **spaCy** | NLP preprocessing | Fast entity recognition, existing in your code |
| **RapidFuzz** | Fuzzy matching | Existing dependency, handles typos well |

### Alternatives Considered & Rejected

❌ **OpenAI GPT** - Too expensive ($0.02/1k tokens), slower, overkill for this use case
❌ **SEMrush/Ahrefs** - $100-500/month, external API, limited customization
❌ **LLaMA 2** - Requires GPU, complex setup, slower inference
❌ **Full BERT models** - Too heavy, KeyBERT uses lighter sentence transformers

## 📊 Integration Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                   API Request (views.py)                    │
└───────────────────────────┬─────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│        AttributeQualityScorer (attribute_scorer.py)         │
│  ┌───────────────────────────────────────────────────────┐  │
│  │  Mandatory Fields (34%)                               │  │
│  │  Standardization (26%)                                │  │
│  │  Missing Values (17%)                                 │  │
│  │  Consistency (8%)                                     │  │
│  │  ┌─────────────────────────────────────────────────┐  │  │
│  │  │  SEO & Discoverability (15%)  ← NEW             │  │  │
│  │  │  ├─ Keyword Coverage (35%)                      │  │  │
│  │  │  ├─ Semantic Richness (30%)                     │  │  │
│  │  │  ├─ Backend Keywords (20%)                      │  │  │
│  │  │  └─ Title Optimization (15%)                    │  │  │
│  │  └─────────────────────────────────────────────────┘  │  │
│  └───────────────────────────────────────────────────────┘  │
└───────────────────────────┬─────────────────────────────────┘
                            │
                            ├─────────────────────┐
                            │                     │
                            ▼                     ▼
                  ┌───────────────────┐  ┌──────────────────┐
                  │ SEOScorer         │  │ GeminiService    │
                  │ (seo_scorer.py)   │  │ (AI Suggestions) │
                  │                   │  │                  │
                  │ ├─ KeyBERT        │  │ Enhanced with    │
                  │ ├─ SentenceModel  │  │ SEO awareness    │
                  │ └─ NLP Analysis   │  │                  │
                  └───────────────────┘  └──────────────────┘
                            │
                            ▼
                    ┌───────────────┐
                    │ JSON Response │
                    │ with SEO data