```
┌────────────────────────────────────────┐
│ Incoming Product                       │
│ (via API POST)                         │
└───────────────────┬────────────────────┘
                    │
                    ▼
┌────────────────────────────────────────┐
│ Validate SKU & Category                │
└───────────────────┬────────────────────┘
                    │
                    ▼
┌────────────────────────────────────────┐
│ Fetch/Create Product                   │
│ from Database                          │
└───────────────────┬────────────────────┘
                    │
                    ▼
┌────────────────────────────────────────┐
│ Get Category Rules (Cache)             │
└───────────────────┬────────────────────┘
                    │
                    ▼
┌────────────────────────────────────────┐
│ AttributeQualityScorer                 │
│ (score_product method)                 │
└───────────────────┬────────────────────┘
                    │
                    ▼
┌────────────────────────────────────────┐
│ Step 1: Check Mandatory Fields         │
│ Step 2: Check Standardization          │
│ Step 3: Check Missing Values           │
│ Step 4: Check Consistency              │
└───────────────────┬────────────────────┘
                    │
                    ▼
┌────────────────────────────────────────┐
│ Calculate Weighted Final Score         │
│ - mandatory_fields * 0.4               │
│ - standardization * 0.3                │
│ - missing_values * 0.2                 │
│ - consistency * 0.1                    │
└───────────────────┬────────────────────┘
                    │
                    ▼
┌────────────────────────────────────────┐
│ Generate AI Suggestions (Optional)     │
│ - Uses Gemini service                  │
│ - Suggest fixes for issues             │
└───────────────────┬────────────────────┘
                    │
                    ▼
┌────────────────────────────────────────┐
│ Save AttributeScore in Database        │
│ - final_score, breakdown, issues       │
│ - suggestions, ai_suggestions          │
└───────────────────┬────────────────────┘
                    │
                    ▼
┌────────────────────────────────────────┐
│ Return JSON Response to Client         │
│ {success, product_sku, score_result}   │
└────────────────────────────────────────┘
```
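
The weighted-score step above can be sketched as follows. This is a minimal sketch, assuming every check returns a sub-score on a 0-100 scale; `BASE_WEIGHTS`, `weighted_score`, and `add_dimension` are hypothetical names, not taken from `attribute_scorer.py`. The `add_dimension` helper shows one plausible way to reconcile these weights with the integration architecture further down: scaling the four original weights by 0.85 to make room for a 15% SEO dimension gives 0.34, 0.255, 0.17, and 0.085, which appear to correspond to the 34%, 26%, 17%, and 8% shown there after rounding.

```python
# Hypothetical sketch: names below are illustrative, not from attribute_scorer.py.
# Assumption: every check returns a sub-score on a 0-100 scale.

BASE_WEIGHTS = {
    "mandatory_fields": 0.4,
    "standardization": 0.3,
    "missing_values": 0.2,
    "consistency": 0.1,
}


def weighted_score(breakdown: dict[str, float], weights: dict[str, float]) -> float:
    """Combine per-check sub-scores (0-100) into one weighted final score."""
    total = sum(weights.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"weights must sum to 1.0, got {total}")
    missing = set(weights) - set(breakdown)
    if missing:
        raise KeyError(f"missing sub-scores for: {sorted(missing)}")
    return sum(breakdown[name] * weight for name, weight in weights.items())


def add_dimension(weights: dict[str, float], name: str, share: float) -> dict[str, float]:
    """Give a new dimension `share` of the total and scale existing weights by (1 - share)."""
    if not 0.0 < share < 1.0:
        raise ValueError("share must be between 0 and 1")
    scaled = {key: weight * (1.0 - share) for key, weight in weights.items()}
    scaled[name] = share
    return scaled


if __name__ == "__main__":
    breakdown = {"mandatory_fields": 100, "standardization": 80,
                 "missing_values": 50, "consistency": 90}
    print(weighted_score(breakdown, BASE_WEIGHTS))  # 83.0
    with_seo = add_dimension(BASE_WEIGHTS, "seo_discoverability", 0.15)
    print({key: round(weight, 3) for key, weight in with_seo.items()})
```

Rescaling proportionally keeps the relative importance of the original four checks unchanged while the weights still sum to 1.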

```
┌────────────────────────────┐
│ Product Description        │
└─────────────┬──────────────┘
              │
              ▼
┌────────────────────────────┐
│ spaCy NER                  │
│ Extract:                   │
│ - Brand                    │
│ - Size                     │
│ - Product                  │
└─────────────┬──────────────┘
              │
              ▼
┌────────────────────────────┐
│ AI Extraction              │
│ (Gemini Service)           │
└─────────────┬──────────────┘
              │
              ▼
┌────────────────────────────┐
│ Return Attributes as Dict  │
└────────────────────────────┘
```
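
The diagram does not specify how the spaCy and Gemini results are combined into the returned dict. A minimal sketch, assuming NER values take precedence and the AI extraction only fills attributes that NER left missing or empty; `merge_attributes` and the sample values are hypothetical:

```python
from typing import Optional


def merge_attributes(
    ner_attrs: dict[str, Optional[str]],
    ai_attrs: dict[str, Optional[str]],
) -> dict[str, str]:
    """Merge two attribute dicts, preferring non-empty NER values.

    Assumption: spaCy NER output is trusted first; the AI (Gemini) output
    only fills keys that NER left missing or empty.
    """
    merged: dict[str, str] = {}
    # Keep NER key order first, then any extra keys found only by the AI step.
    keys = list(ner_attrs) + [k for k in ai_attrs if k not in ner_attrs]
    for key in keys:
        ner_value = (ner_attrs.get(key) or "").strip()
        ai_value = (ai_attrs.get(key) or "").strip()
        value = ner_value or ai_value
        if value:  # drop attributes neither source could fill
            merged[key] = value
    return merged


ner = {"brand": "Acme", "size": "", "product": "Running Shoe"}
ai = {"brand": "ACME Corp", "size": "10 US", "color": "Blue"}
print(merge_attributes(ner, ai))
# {'brand': 'Acme', 'size': '10 US', 'product': 'Running Shoe', 'color': 'Blue'}
```

If the AI extraction is considered more reliable for some attributes, the precedence could be set per key instead of globally.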

For SEO: a hybrid approach combining KeyBERT for keyword extraction, sentence-transformers for semantic analysis, and the existing Gemini API for context-aware SEO suggestions.
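
KeyBERT's basic ranking step and the sentence-transformers semantic checks both rest on cosine similarity between embedding vectors. The sketch below shows that ranking with hypothetical 3-dimensional toy vectors standing in for real model embeddings; `rank_keywords` is an illustrative name, not the KeyBERT API.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_keywords(doc_vec: Sequence[float],
                  candidate_vecs: dict[str, Sequence[float]],
                  top_n: int = 3) -> list[tuple[str, float]]:
    """Rank candidate keywords by similarity to the document embedding (KeyBERT-style)."""
    scored = [(kw, cosine_similarity(doc_vec, vec)) for kw, vec in candidate_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy vectors standing in for real sentence-transformer embeddings (hypothetical values).
doc = [0.8, 0.5, 0.1]
candidates = {
    "running shoe": [0.9, 0.4, 0.0],
    "breathable mesh": [0.6, 0.7, 0.2],
    "gift card": [0.0, 0.1, 0.9],
}
for keyword, score in rank_keywords(doc, candidates, top_n=2):
    print(f"{keyword}: {score:.3f}")
```

With the real libraries, the vectors would come from a sentence-transformers model's `encode` method, and KeyBERT wraps this ranking (plus optional diversification) behind `KeyBERT().extract_keywords`.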

# SEO & Discoverability Implementation Summary

## 📋 What Was Implemented

### Core Feature: SEO & Discoverability Scoring (15% of the final score)

An SEO scoring system that evaluates how well a product listing is optimized for search engines and how easily customers can discover it, across four dimensions:

| Dimension | Weight (within SEO score) | What It Checks |
|-----------|---------------------------|----------------|
| **Keyword Coverage** | 35% | Are mandatory attributes mentioned in the title or description? |
| **Semantic Richness** | 30% | Description quality, vocabulary diversity, descriptive language |
| **Backend Keywords** | 20% | Presence of high-value search terms and category keywords |
| **Title Optimization** | 15% | Title length (50-100 characters), structure, no keyword stuffing |
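
The four-dimension weighting above can be sketched as follows. The weights come from the table; the sub-score formulas (case-insensitive substring matching for keyword coverage, a linear length penalty and a flat 25-point stuffing penalty for titles) are hypothetical stand-ins, and semantic richness and backend keywords are passed in as placeholder scores rather than computed.

```python
import re
from collections import Counter

# Weights from the SEO table; the sub-score formulas below are hypothetical.
SEO_WEIGHTS = {
    "keyword_coverage": 0.35,
    "semantic_richness": 0.30,
    "backend_keywords": 0.20,
    "title_optimization": 0.15,
}


def keyword_coverage(attributes: dict[str, str], title: str, description: str) -> float:
    """Percent of non-empty attribute values found (case-insensitive substring) in title or description."""
    values = [v.lower() for v in attributes.values() if v]
    if not values:
        return 100.0
    text = f"{title} {description}".lower()
    hits = sum(1 for v in values if v in text)
    return 100.0 * hits / len(values)


def title_optimization(title: str, min_len: int = 50, max_len: int = 100,
                       max_repeats: int = 2) -> float:
    """Score title length against the 50-100 character target, then penalize repeated words."""
    length = len(title.strip())
    if min_len <= length <= max_len:
        score = 100.0
    elif length < min_len:
        score = 100.0 * length / min_len              # linear penalty for short titles
    else:
        score = max(0.0, 100.0 - (length - max_len))  # 1 point per extra character
    counts = Counter(re.findall(r"[a-z0-9]+", title.lower()))
    if counts and max(counts.values()) > max_repeats:
        score -= 25.0                                 # hypothetical keyword-stuffing penalty
    return max(0.0, score)


def seo_score(sub_scores: dict[str, float]) -> float:
    """Weighted SEO & Discoverability score (0-100) from the four sub-scores."""
    return sum(sub_scores[name] * weight for name, weight in SEO_WEIGHTS.items())


attrs = {"brand": "Acme", "color": "Blue", "size": "10"}
title = "Acme Men's Running Shoe, Lightweight Breathable Mesh, Blue"
desc = "Cushioned running shoe with a breathable mesh upper."

sub_scores = {
    "keyword_coverage": keyword_coverage(attrs, title, desc),  # "10" is not mentioned
    "semantic_richness": 70.0,   # placeholder; would come from the semantic analysis
    "backend_keywords": 60.0,    # placeholder; would come from keyword extraction
    "title_optimization": title_optimization(title),
}
print(round(seo_score(sub_scores), 1))
```

Plain substring matching is deliberately simple; it would also match "10" inside "100", which is one reason a tokenized or fuzzy match may be preferable in practice.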

## 🎯 Why This Approach?

### Technology Stack Chosen

| Technology | Purpose | Why This Choice |
|------------|---------|-----------------|
| **KeyBERT** | Keyword extraction | Fast, accurate, and open source; well suited to e-commerce SEO |
| **Sentence-Transformers** | Semantic similarity | Lightweight pre-trained models; lighter and cheaper to run than a full LLM |
| **Google Gemini** | AI suggestions | Already in your stack; provides context-aware recommendations |
| **spaCy** | NLP preprocessing | Fast named-entity recognition; already used in your code |
| **RapidFuzz** | Fuzzy matching | Existing dependency; tolerates typos well |
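
RapidFuzz is listed for typo-tolerant matching. To keep the sketch dependency-free, it uses the standard library's `difflib.SequenceMatcher` as a stand-in (a different similarity algorithm from RapidFuzz's, returning 0-1 instead of 0-100); `fuzzy_contains` and the 0.85 threshold are hypothetical.

```python
from difflib import SequenceMatcher


def fuzzy_contains(keyword: str, text: str, threshold: float = 0.85) -> bool:
    """True if `keyword` closely matches some same-length word window in `text`.

    Uses difflib's ratio (0..1) as a stdlib stand-in for RapidFuzz;
    `threshold` is a hypothetical cut-off, not a tuned value.
    """
    kw_tokens = keyword.lower().split()
    tokens = text.lower().split()
    n = len(kw_tokens)
    if n == 0 or len(tokens) < n:
        return False
    target = " ".join(kw_tokens)
    # Slide a window of n words over the text and compare each window to the keyword.
    for i in range(len(tokens) - n + 1):
        window = " ".join(tokens[i:i + n])
        if SequenceMatcher(None, target, window).ratio() >= threshold:
            return True
    return False


description = "Lightweigth running shoe with breathable mesh"  # note the typo
print(fuzzy_contains("lightweight", description))
print(fuzzy_contains("waterproof", description))
```

In the real code, `rapidfuzz.fuzz.ratio` could replace `SequenceMatcher(...).ratio()`, with the threshold rescaled to the 0-100 range.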

### Alternatives Considered & Rejected

- ❌ **OpenAI GPT**: too expensive ($0.02/1k tokens), slower, and overkill for this use case
- ❌ **SEMrush/Ahrefs**: $100-500/month, external API, limited customization
- ❌ **LLaMA 2**: requires a GPU, complex setup, slower inference
- ❌ **Full BERT models**: too heavy; KeyBERT relies on lighter sentence-transformer models instead

## 📊 Integration Architecture

```
┌────────────────────────────────────────────────────────────┐
│ API Request (views.py)                                     │
└───────────────────┬────────────────────────────────────────┘
                    │
                    ▼
┌────────────────────────────────────────────────────────────┐
│ AttributeQualityScorer (attribute_scorer.py)               │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ Mandatory Fields (34%)                                 │ │
│ │ Standardization (26%)                                  │ │
│ │ Missing Values (17%)                                   │ │
│ │ Consistency (8%)                                       │ │
│ │ ┌────────────────────────────────────────────────────┐ │ │
│ │ │ SEO & Discoverability (15%) ← NEW                  │ │ │
│ │ │ ├─ Keyword Coverage (35%)                          │ │ │
│ │ │ ├─ Semantic Richness (30%)                         │ │ │
│ │ │ ├─ Backend Keywords (20%)                          │ │ │
│ │ │ └─ Title Optimization (15%)                        │ │ │
│ │ └────────────────────────────────────────────────────┘ │ │
│ └────────────────────────────────────────────────────────┘ │
└───────────────────┬────────────────────────────────────────┘
                    │
                    ├───────────────────────┐
                    │                       │
                    ▼                       ▼
          ┌───────────────────┐   ┌───────────────────┐
          │ SEOScorer         │   │ GeminiService     │
          │ (seo_scorer.py)   │   │ (AI Suggestions)  │
          │ ├─ KeyBERT        │   │ Enhanced with     │
          │ ├─ SentenceModel  │   │ SEO awareness     │
          │ └─ NLP Analysis   │   │                   │
          └─────────┬─────────┘   └───────────────────┘
                    │
                    ▼
          ┌───────────────────┐
          │ JSON Response     │
          │ with SEO data     │
          └───────────────────┘
```