
```
┌────────────────────────────────────────┐
│ Incoming Product                       │
│ (via API POST)                         │
└─────────┬──────────────────────────────┘
          │
          ▼
┌────────────────────────────────────────┐
│ Validate SKU & Category                │
└─────────┬──────────────────────────────┘
          │
          ▼
┌────────────────────────────────────────┐
│ Fetch/Create Product                   │
│ from Database                          │
└─────────┬──────────────────────────────┘
          │
          ▼
┌────────────────────────────────────────┐
│ Get Category Rules (Cache)             │
└─────────┬──────────────────────────────┘
          │
          ▼
┌────────────────────────────────────────┐
│ AttributeQualityScorer                 │
│ (score_product method)                 │
└─────────┬──────────────────────────────┘
          │
          ▼
┌────────────────────────────────────────┐
│ Step 1: Check Mandatory Fields         │
│ Step 2: Check Standardization          │
│ Step 3: Check Missing Values           │
│ Step 4: Check Consistency              │
└─────────┬──────────────────────────────┘
          │
          ▼
┌────────────────────────────────────────┐
│ Calculate Weighted Final Score         │
│ - mandatory_fields * 0.4               │
│ - standardization * 0.3                │
│ - missing_values * 0.2                 │
│ - consistency * 0.1                    │
└─────────┬──────────────────────────────┘
          │
          ▼
┌────────────────────────────────────────┐
│ Generate AI Suggestions (Optional)     │
│ - Uses Gemini service                  │
│ - Suggest fixes for issues             │
└─────────┬──────────────────────────────┘
          │
          ▼
┌────────────────────────────────────────┐
│ Save AttributeScore in Database        │
│ - final_score, breakdown, issues       │
│ - suggestions, ai_suggestions          │
└─────────┬──────────────────────────────┘
          │
          ▼
┌────────────────────────────────────────┐
│ Return JSON Response to Client         │
│ {success, product_sku, score_result}   │
└────────────────────────────────────────┘
```
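
The weighted-score step in the flow above can be sketched as follows. This is a minimal illustration, not the code in `attribute_scorer.py`: it assumes each check returns a score on a 0-100 scale, and the helper names `weighted_final_score` and `build_response`, as well as the nested layout of `score_result`, are hypothetical. Only the four weights and the top-level response keys come from the diagram.

```python
from typing import Dict

# Weights from the diagram above (pre-SEO scoring); they sum to 1.0.
WEIGHTS: Dict[str, float] = {
    "mandatory_fields": 0.4,
    "standardization": 0.3,
    "missing_values": 0.2,
    "consistency": 0.1,
}


def weighted_final_score(breakdown: Dict[str, float]) -> float:
    """Hypothetical helper: combine per-check scores (assumed 0-100) into one score."""
    missing = WEIGHTS.keys() - breakdown.keys()
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return round(sum(breakdown[name] * w for name, w in WEIGHTS.items()), 2)


def build_response(product_sku: str, breakdown: Dict[str, float]) -> dict:
    """Hypothetical helper: top-level keys follow the diagram; the nesting is assumed."""
    return {
        "success": True,
        "product_sku": product_sku,
        "score_result": {
            "final_score": weighted_final_score(breakdown),
            "breakdown": breakdown,
        },
    }
```

For example, component scores of 100, 80, 50 and 90 give 40 + 24 + 10 + 9 = 83.0. Raising an error on a missing component, rather than silently treating it as 0, keeps a skipped check from quietly dragging the final score down.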

```
┌──────────────────────────┐
│ Product Description      │
└─────────┬────────────────┘
          │
          ▼
┌──────────────────────────┐
│ spaCy NER                │
│ Extract:                 │
│ - Brand                  │
│ - Size                   │
│ - Product                │
└─────────┬────────────────┘
          │
          ▼
┌──────────────────────────┐
│ AI Extraction            │
│ (Gemini Service)         │
└─────────┬────────────────┘
          │
          ▼
┌──────────────────────────┐
│ Return Attributes        │
│ as Dict                  │
└──────────────────────────┘
```
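
The extraction flow above runs spaCy NER before the Gemini step. One way to combine the two, shown here as a sketch under stated assumptions, is to keep whatever NER finds and use the AI extractor only to fill attributes NER missed. The merge policy, the function name `extract_attributes` and the extractor signature are hypothetical; a real caller would pass in a spaCy-based extractor and a Gemini-backed one.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical extractor signature: description text -> {attribute name: value}.
Extractor = Callable[[str], Dict[str, str]]


def extract_attributes(
    description: str,
    ner_extractor: Extractor,
    ai_extractor: Optional[Extractor] = None,
    wanted: Tuple[str, ...] = ("brand", "size", "product"),
) -> Dict[str, str]:
    """Run the NER pass first; call the AI extractor only if NER left gaps."""
    # Keep non-empty NER results for the attributes we care about.
    attributes = {
        key: value
        for key, value in ner_extractor(description).items()
        if key in wanted and value
    }
    gaps = [key for key in wanted if key not in attributes]
    if gaps and ai_extractor is not None:
        for key, value in ai_extractor(description).items():
            # NER results take precedence; AI output only fills the gaps.
            if key in gaps and value:
                attributes[key] = value
    return attributes
```

With a stub NER that returns only `{"brand": "Acme"}` and a stub AI step that returns brand, size and product, the result keeps the NER brand and takes size and product from the AI output. Skipping the AI call when NER already covers every attribute avoids paying for a Gemini request on easy descriptions.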

For SEO, the approach is hybrid: KeyBERT for keyword extraction, sentence-transformers for semantic analysis, and the existing Gemini API for context-aware SEO suggestions.

# SEO & Discoverability Implementation Summary

## 📋 What Was Implemented

### Core Feature: SEO & Discoverability Scoring (15% weight)

An SEO scoring system that evaluates how well a product listing is optimized for search engines and for customer discovery. It contributes 15% of the overall attribute quality score and combines 4 dimensions:

| Dimension | Weight | What It Checks |
|-----------|--------|----------------|
| **Keyword Coverage** | 35% | Are mandatory attributes mentioned in the title or description? |
| **Semantic Richness** | 30% | Description quality, vocabulary diversity, descriptive language |
| **Backend Keywords** | 20% | Presence of high-value search terms and category keywords |
| **Title Optimization** | 15% | Title length (50-100 characters), structure, no keyword stuffing |
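
A minimal sketch of two of these dimensions and of how the four sub-scores combine. Only the weights (35/30/20/15) and the 50-100 character title range come from the table; the penalty sizes, the repeated-word rule for keyword stuffing, the substring match for keyword coverage and all function names are assumptions. Semantic richness and backend keywords are taken as precomputed 0-100 inputs because they depend on KeyBERT and sentence-transformers. The last function shows one reading of the 15% weight: the integration diagram's 34/26/17/8 percentages are consistent with scaling the original 0.4/0.3/0.2/0.1 weights by 0.85 (giving 34, 25.5, 17 and 8.5) and rounding to whole percentages.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Dimension weights from the table above (they sum to 1.0).
SEO_WEIGHTS: Dict[str, float] = {
    "keyword_coverage": 0.35,
    "semantic_richness": 0.30,
    "backend_keywords": 0.20,
    "title_optimization": 0.15,
}


def keyword_coverage(title: str, description: str, mandatory_values: Iterable[str]) -> float:
    """Percentage of mandatory attribute values found (case-insensitive substring) in title/description."""
    text = f"{title} {description}".lower()
    values = [v.strip().lower() for v in mandatory_values if v and v.strip()]
    if not values:
        return 100.0  # assumption: nothing to cover counts as full coverage
    hits = sum(1 for v in values if v in text)
    return 100.0 * hits / len(values)


def title_optimization(title: str, max_repeats: int = 2) -> float:
    """Penalise titles outside 50-100 characters and words repeated more than max_repeats times."""
    score = 100.0
    if not 50 <= len(title.strip()) <= 100:
        score -= 50.0  # hypothetical length penalty
    words = re.findall(r"[a-z0-9]+", title.lower())
    if any(count > max_repeats for count in Counter(words).values()):
        score -= 30.0  # hypothetical keyword-stuffing penalty
    return max(score, 0.0)


def seo_score(dimensions: Dict[str, float]) -> float:
    """Weighted SEO score from four 0-100 sub-scores."""
    return round(sum(dimensions[name] * w for name, w in SEO_WEIGHTS.items()), 2)


# Original component weights; SEO takes a 15% share of the overall score.
LEGACY_WEIGHTS = {"mandatory_fields": 0.4, "standardization": 0.3,
                  "missing_values": 0.2, "consistency": 0.1}
SEO_SHARE = 0.15


def overall_weights() -> Dict[str, float]:
    """Scale the legacy weights by (1 - SEO_SHARE) so all five still sum to 1.0."""
    weights = {name: w * (1 - SEO_SHARE) for name, w in LEGACY_WEIGHTS.items()}
    weights["seo"] = SEO_SHARE
    return weights
```

For example, the 56-character title "Acme Gentle Lavender Shampoo for Dry Hair, 500 ml Bottle" has no repeated words and scores 100 on title optimization, while "shampoo shampoo shampoo" incurs both penalties and scores 20.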

## 🎯 Why This Approach?

### Technology Stack Chosen

| Technology | Purpose | Why This Choice |
|------------|---------|-----------------|
| **KeyBERT** | Keyword extraction | Fast, open-source, no training needed; suits short product titles and descriptions |
| **Sentence-Transformers** | Semantic similarity | Lightweight pre-trained models; cheaper and faster than a full LLM for similarity scoring |
| **Google Gemini** | AI suggestions | Already in your stack; provides context-aware recommendations |
| **spaCy** | NLP preprocessing | Fast entity recognition; already used in your code |
| **RapidFuzz** | Fuzzy matching | Existing dependency; tolerant of typos and minor spelling variants |
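
To illustrate the fuzzy-matching role, assuming RapidFuzz is used to map near-miss attribute values (typos, casing) onto a standard vocabulary, the sketch below uses the standard library's `difflib.SequenceMatcher` as a dependency-free stand-in. With RapidFuzz installed, `rapidfuzz.process.extractOne` would play the same role, but its scores are on a 0-100 scale and will not match difflib's exactly, so thresholds need retuning. The function name `best_match` and the 0.8 threshold are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Iterable, Optional, Tuple


def best_match(
    value: str, vocabulary: Iterable[str], threshold: float = 0.8
) -> Optional[Tuple[str, float]]:
    """Return (closest standard value, similarity in 0-1), or None if nothing reaches threshold."""
    cleaned = value.strip().lower()
    best: Optional[Tuple[str, float]] = None
    for candidate in vocabulary:
        # ratio() = 2 * matching characters / total characters in both strings.
        ratio = SequenceMatcher(None, cleaned, candidate.strip().lower()).ratio()
        if best is None or ratio > best[1]:
            best = (candidate, ratio)
    if best is None or best[1] < threshold:
        return None
    return best
```

For example, `best_match("Cottn", ["Cotton", "Polyester", "Silk"])` returns `("Cotton", 10/11)`, about 0.91, while an unrelated value such as `"xyz"` returns `None` and can be flagged as non-standard instead of silently remapped.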

### Alternatives Considered & Rejected

- ❌ **OpenAI GPT**: too expensive ($0.02/1k tokens), slower, and overkill for this use case
- ❌ **SEMrush/Ahrefs**: $100-500/month, external API, limited customization
- ❌ **LLaMA 2**: requires a GPU, complex setup, slower inference
- ❌ **Full BERT models**: too heavy; KeyBERT uses lighter sentence-transformer models instead

## 📊 Integration Architecture

```
┌────────────────────────────────────────────────────────────┐
│ API Request (views.py)                                     │
└─────────────────────────────┬──────────────────────────────┘
                              │
                              ▼
┌────────────────────────────────────────────────────────────┐
│ AttributeQualityScorer (attribute_scorer.py)               │
│ ┌──────────────────────────────────────────────────────┐   │
│ │ Mandatory Fields (34%)                               │   │
│ │ Standardization (26%)                                │   │
│ │ Missing Values (17%)                                 │   │
│ │ Consistency (8%)                                     │   │
│ │ ┌────────────────────────────────────────────────┐   │   │
│ │ │ SEO & Discoverability (15%) ← NEW              │   │   │
│ │ │ ├─ Keyword Coverage (35%)                      │   │   │
│ │ │ ├─ Semantic Richness (30%)                     │   │   │
│ │ │ ├─ Backend Keywords (20%)                      │   │   │
│ │ │ └─ Title Optimization (15%)                    │   │   │
│ │ └────────────────────────────────────────────────┘   │   │
│ └──────────────────────────────────────────────────────┘   │
└─────────────────────────────┬──────────────────────────────┘
                              │
              ┌───────────────┴───────────────┐
              │                               │
              ▼                               ▼
┌──────────────────────────┐     ┌──────────────────────────┐
│ SEOScorer                │     │ GeminiService            │
│ (seo_scorer.py)          │     │ (AI Suggestions)         │
│                          │     │                          │
│ ├─ KeyBERT               │     │ Enhanced with            │
│ ├─ SentenceModel         │     │ SEO awareness            │
│ └─ NLP Analysis          │     │                          │
└──────────────────────────┘     └──────────────────────────┘
┌───────────────┐
│ JSON Response │
│ with SEO data │