         ┌─────────────────────┐
         │   Incoming Product  │
         │   (via API POST)    │
         └─────────┬──────────┘
                   │
                   ▼
      ┌───────────────────────────┐
      │  Validate SKU & Category  │
      └─────────┬─────────────────┘
                │
                ▼
       ┌────────────────────────┐
       │  Fetch/Create Product  │
       │  from Database         │
       └─────────┬─────────────┘
                 │
                 ▼
      ┌────────────────────────────┐
      │  Get Category Rules (Cache) │
      └─────────┬──────────────────┘
                │
                ▼
     ┌─────────────────────────────┐
     │  AttributeQualityScorer      │
     │  (score_product method)      │
     └─────────┬───────────────────┘
               │
               ▼
 ┌────────────────────────────────────────┐
 │  Step 1: Check Mandatory Fields       │
 │  Step 2: Check Standardization        │
 │  Step 3: Check Missing Values         │
 │  Step 4: Check Consistency            │
 └─────────┬─────────────────────────────┘
           │
           ▼
 ┌────────────────────────────────────────┐
 │ Calculate Weighted Final Score         │
 │  - mandatory_fields * 0.4             │
 │  - standardization * 0.3              │
 │  - missing_values * 0.2               │
 │  - consistency * 0.1                  │
 └─────────┬─────────────────────────────┘
           │
           ▼
 ┌────────────────────────────────────────┐
 │  Generate AI Suggestions (Optional)    │
 │  - Uses Gemini service                  │
 │  - Suggest fixes for issues            │
 └─────────┬─────────────────────────────┘
           │
           ▼
 ┌────────────────────────────────────────┐
 │  Save AttributeScore in Database       │
 │  - final_score, breakdown, issues     │
 │  - suggestions, ai_suggestions        │
 └─────────┬─────────────────────────────┘
           │
           ▼
 ┌────────────────────────────────────────┐
 │      Return JSON Response to Client    │
 │  {success, product_sku, score_result} │
 └────────────────────────────────────────┘
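
The weighted final score step in the flowchart above can be sketched as follows. This is a minimal illustration, not the project's actual `score_product` code: the function name `weighted_final_score` and the assumption that each sub-score lies on a 0-100 scale are hypothetical; only the four weights come from the diagram.

```python
from typing import Dict

# Weights taken from the "Calculate Weighted Final Score" box in the flowchart.
WEIGHTS: Dict[str, float] = {
    "mandatory_fields": 0.4,
    "standardization": 0.3,
    "missing_values": 0.2,
    "consistency": 0.1,
}


def weighted_final_score(breakdown: Dict[str, float]) -> float:
    """Combine per-check scores (assumed to be 0-100) into one final score.

    Hypothetical helper: the real scorer may validate or scale differently.
    """
    missing = set(WEIGHTS) - set(breakdown)
    if missing:
        raise ValueError(f"missing sub-scores: {sorted(missing)}")
    total = sum(WEIGHTS[name] * breakdown[name] for name in WEIGHTS)
    return round(total, 2)


# Illustrative sub-scores, not real data.
breakdown = {
    "mandatory_fields": 100.0,
    "standardization": 80.0,
    "missing_values": 50.0,
    "consistency": 100.0,
}
final_score = weighted_final_score(breakdown)
print(final_score)
```

Because the four weights sum to 1.0, the final score stays on the same scale as the sub-scores.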


       ┌─────────────────────┐
       │ Product Description │
       └─────────┬──────────┘
                 │
                 ▼
          ┌─────────────┐
          │  spaCy NER  │
          │ Extract:    │
          │ - Brand     │
          │ - Size      │
          │ - Product   │
          └─────┬───────┘
                │
                ▼
        ┌───────────────────┐
        │ AI Extraction      │
        │ (Gemini Service)   │
        └─────┬─────────────┘
              │
              ▼
       ┌───────────────────┐
       │ Return Attributes │
       │ as Dict           │
       └───────────────────┘
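
The diagram above does not say how the spaCy and Gemini stages are combined, so the sketch below makes one assumption explicit: the NER pass runs first, and the AI pass only fills in attributes the NER pass could not find. All names (`extract_attributes`, `stub_ner`, `stub_ai`) are hypothetical, and the two stubs stand in for the real spaCy and Gemini calls so that the example runs without either dependency.

```python
import re
from typing import Callable, Dict, Optional

Attributes = Dict[str, Optional[str]]
Extractor = Callable[[str], Attributes]

# Attribute names from the diagram: Brand, Size, Product.
ATTRIBUTE_KEYS = ("brand", "size", "product")


def extract_attributes(
    description: str,
    ner_extractor: Extractor,
    ai_extractor: Extractor,
) -> Attributes:
    """Run the NER stage, then let the AI stage fill any gaps (assumed policy)."""
    attributes: Attributes = {key: None for key in ATTRIBUTE_KEYS}

    # Stage 1: keep every non-empty value the NER stage found.
    for key, value in ner_extractor(description).items():
        if key in attributes and value:
            attributes[key] = value

    # Stage 2: call the AI stage only if something is still missing.
    missing = [key for key in ATTRIBUTE_KEYS if attributes[key] is None]
    if missing:
        ai_result = ai_extractor(description)
        for key in missing:
            if ai_result.get(key):
                attributes[key] = ai_result[key]
    return attributes


def stub_ner(description: str) -> Attributes:
    """Stand-in for the spaCy stage: finds a size with a simple regex."""
    match = re.search(r"\b\d+(?:\.\d+)?\s?(?:ml|l|g|kg)\b", description, re.IGNORECASE)
    return {"size": match.group(0) if match else None}


def stub_ai(description: str) -> Attributes:
    """Stand-in for the Gemini stage: returns a canned answer."""
    return {"brand": "Acme", "size": "0.75 l", "product": "water bottle"}


attributes = extract_attributes("Acme insulated water bottle, 750 ml", stub_ner, stub_ai)
print(attributes)
```

Under this policy the size found by the first stage is kept even though the second stage proposes a different one. If the real pipeline lets the AI stage override NER results, the merge order would simply be reversed.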


For SEO, the tool uses a hybrid approach: KeyBERT for keyword extraction, sentence-transformers for semantic analysis, and the existing Gemini API for SEO suggestions.


# SEO & Discoverability Implementation Summary

## 📋 What Was Implemented

### Core Feature: SEO & Discoverability Scoring (15% weight)

The SEO score carries 15% of the overall quality score. It evaluates each product listing for search engine optimization and customer discoverability across four dimensions:

| Dimension | Weight | What It Checks |
|-----------|--------|----------------|
| **Keyword Coverage** | 35% | Are mandatory attributes mentioned in title/description? |
| **Semantic Richness** | 30% | Description quality, vocabulary diversity, descriptive language |
| **Backend Keywords** | 20% | Presence of high-value search terms and category keywords |
| **Title Optimization** | 15% | Title length (50-100 chars), structure, no keyword stuffing |
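
As a rough illustration of how the four dimensions in the table could combine, the sketch below implements two of the simpler checks and the weighted sum. It is not the code in `seo_scorer.py`: the function names, the 0-100 scale, and the plain substring match used for keyword coverage are all assumptions made for the example. Only the weights and the 50-100 character title range come from the table.

```python
from typing import Dict, Iterable

# Weights from the dimension table above.
SEO_WEIGHTS: Dict[str, float] = {
    "keyword_coverage": 0.35,
    "semantic_richness": 0.30,
    "backend_keywords": 0.20,
    "title_optimization": 0.15,
}


def keyword_coverage(attribute_values: Iterable[str], title: str, description: str) -> float:
    """Percentage of mandatory attribute values mentioned in title or description.

    Uses a naive case-insensitive substring match (an assumption; the real
    scorer may use fuzzy or semantic matching instead).
    """
    values = [v.strip().lower() for v in attribute_values if v and v.strip()]
    if not values:
        return 0.0
    text = f"{title} {description}".lower()
    hits = sum(1 for value in values if value in text)
    return 100.0 * hits / len(values)


def title_length_ok(title: str, min_len: int = 50, max_len: int = 100) -> bool:
    """True when the title length falls in the 50-100 character range."""
    return min_len <= len(title.strip()) <= max_len


def seo_score(sub_scores: Dict[str, float]) -> float:
    """Weighted sum of the four SEO sub-scores (each assumed to be 0-100)."""
    return round(sum(SEO_WEIGHTS[name] * sub_scores[name] for name in SEO_WEIGHTS), 2)


# Illustrative listing, not real data.
title = "Acme Stainless Steel Insulated Water Bottle, 750 ml, Leak-Proof Lid"
description = "Keeps drinks cold for 24 hours."
coverage = keyword_coverage(["Acme", "750 ml", "Blue"], title, description)
length_ok = title_length_ok(title)
score = seo_score({
    "keyword_coverage": 100.0,
    "semantic_richness": 80.0,
    "backend_keywords": 50.0,
    "title_optimization": 100.0,
})
print(coverage, length_ok, score)
```

In this example two of the three attribute values appear in the listing text, so coverage is about 66.7, and the title passes the length check.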

## 🎯 Why This Approach?

### Technology Stack Chosen

| Technology | Purpose | Why This Choice |
|------------|---------|-----------------|
| **KeyBERT** | Keyword extraction | Fast, open-source keyword extraction; a good fit for e-commerce SEO |
| **Sentence-Transformers** | Semantic similarity | Lightweight pre-trained models; lighter to run than a full LLM |
| **Google Gemini** | AI suggestions | Already in your stack. Provides context-aware recommendations |
| **spaCy** | NLP preprocessing | Fast entity recognition; already used in your code |
| **RapidFuzz** | Fuzzy matching | Existing dependency, handles typos well |

### Alternatives Considered & Rejected

❌ **OpenAI GPT** - Too expensive ($0.02/1k tokens), slower, overkill for this use case  
❌ **SEMrush/Ahrefs** - $100-500/month, external API, limited customization  
❌ **LLaMA 2** - Requires GPU, complex setup, slower inference  
❌ **Full BERT models** - Too heavy, KeyBERT uses lighter sentence transformers  

## 📊 Integration Architecture
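
The percentages in this section's diagram (34%, 26%, 17%, 8%, plus 15% for SEO) look like the earlier 0.4/0.3/0.2/0.1 weights scaled down to make room for the new dimension. That reading is an inference from the numbers rather than something the source states. The sketch below, with the hypothetical names `BASE_WEIGHTS` and `rescale_with_seo`, checks the arithmetic with exact fractions.

```python
from fractions import Fraction
from typing import Dict

# Original four weights from the scoring flowchart.
BASE_WEIGHTS: Dict[str, Fraction] = {
    "mandatory_fields": Fraction(40, 100),
    "standardization": Fraction(30, 100),
    "missing_values": Fraction(20, 100),
    "consistency": Fraction(10, 100),
}
SEO_WEIGHT = Fraction(15, 100)


def rescale_with_seo(base: Dict[str, Fraction], seo_weight: Fraction) -> Dict[str, Fraction]:
    """Shrink the existing weights proportionally so SEO can take its share."""
    remaining = 1 - seo_weight
    scaled = {name: weight * remaining for name, weight in base.items()}
    scaled["seo_discoverability"] = seo_weight
    return scaled


weights = rescale_with_seo(BASE_WEIGHTS, SEO_WEIGHT)
for name, weight in weights.items():
    print(f"{name}: {float(weight) * 100:.1f}%")
```

The exact scaled values are 34%, 25.5%, 17% and 8.5%, so the diagram's 26% and 8% are whole-number approximations that still total 100%.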

```
┌─────────────────────────────────────────────────────────────┐
│                     API Request (views.py)                   │
└───────────────────────────┬─────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│          AttributeQualityScorer (attribute_scorer.py)        │
│  ┌──────────────────────────────────────────────────────┐   │
│  │ Mandatory Fields (34%)                                │   │
│  │ Standardization (26%)                                 │   │
│  │ Missing Values (17%)                                  │   │
│  │ Consistency (8%)                                      │   │
│  │ ┌────────────────────────────────────────────────┐   │   │
│  │ │ SEO & Discoverability (15%) ← NEW              │   │   │
│  │ │  ├─ Keyword Coverage (35%)                      │   │   │
│  │ │  ├─ Semantic Richness (30%)                     │   │   │
│  │ │  ├─ Backend Keywords (20%)                      │   │   │
│  │ │  └─ Title Optimization (15%)                    │   │   │
│  │ └────────────────────────────────────────────────┘   │   │
│  └──────────────────────────────────────────────────────┘   │
└───────────────────────────┬─────────────────────────────────┘
                            │
                            ├──────────────────┐
                            │                  │
                            ▼                  ▼
              ┌───────────────────┐  ┌──────────────────┐
              │  SEOScorer        │  │ GeminiService    │
              │  (seo_scorer.py)  │  │ (AI Suggestions) │
              │                   │  │                  │
              │ ├─ KeyBERT        │  │ Enhanced with    │
              │ ├─ SentenceModel  │  │ SEO awareness    │
              │ └─ NLP Analysis   │  │                  │
              └───────────────────┘  └──────────────────┘
                            │
                            ▼
                    ┌───────────────┐
                    │  JSON Response │
                    │  with SEO data