Changes in title

Student Yadav, 3 months ago
parent
commit
1384a9bbf0
1 changed file with 306 additions and 148 deletions

core/services/gemini_service.py

@@ -1860,8 +1860,6 @@
 
 
 
-
-
 
 
 
@@ -2159,13 +2157,13 @@ class GeminiAttributeService:
             return result
     
     def _build_comprehensive_prompt(
-        self,
-        product: Dict,
-        issues: List[str],
-        rules: List[Dict],
-        scores: Dict
-    ) -> str:
-        """Build comprehensive prompt for all quality aspects with title structure analysis"""
+    self,
+    product: Dict,
+    issues: List[str],
+    rules: List[Dict],
+    scores: Dict
+) -> str:
+        """Build comprehensive prompt with MAXIMUM anti-hallucination enforcement and mandatory multi-element titles"""
         sku = product.get('sku', 'UNKNOWN')
         logger.debug(f"SKU {sku}: Building comprehensive prompt")
         
@@ -2186,147 +2184,310 @@ class GeminiAttributeService:
         import random
         quality_score_target = random.uniform(90.2, 95.9)
         
-        prompt = f"""Analyze this e-commerce product and provide comprehensive quality improvements including detailed title structure analysis.
-
-Note: quality_score_prediction should be in range of 90 to 95 
-
-PRODUCT DATA:
-SKU: {product.get('sku')}
-Category: {product.get('category')}
-Title: {product.get('title', '')[:250]}
-Description: {product.get('description', '')[:400]}
-Attributes: {json.dumps(product.get('attributes', {}), ensure_ascii=False)}
-
-QUALITY SCORES (out of 100):
-- Mandatory Fields: {scores.get('mandatory_fields', 0):.1f}
-- Standardization: {scores.get('standardization', 0):.1f}
-- Missing Values: {scores.get('missing_values', 0):.1f}
-- Consistency: {scores.get('consistency', 0):.1f}
-- SEO: {scores.get('seo_discoverability', 0):.1f}
-- Title Quality: {scores.get('title_quality', 0):.1f}
-- Description Quality: {scores.get('description_quality', 0):.1f}
-
-CATEGORY RULES:
-Mandatory Attributes: {', '.join(mandatory_attrs)}
-Valid Values: {json.dumps(valid_values_map, ensure_ascii=False)}
-
-ISSUES FOUND:
-Attributes ({len(attribute_issues)}):
-{chr(10).join(f"  • {i}" for i in attribute_issues[:8])}
-
-Title ({len(title_issues)}):
-{chr(10).join(f"  • {i}" for i in title_issues[:5])}
-
-Description ({len(desc_issues)}):
-{chr(10).join(f"  • {i}" for i in desc_issues[:5])}
-
-SEO ({len(seo_issues)}):
-{chr(10).join(f"  • {i}" for i in seo_issues[:5])}
-
-CATEGORY-SPECIFIC TITLE STRUCTURE GUIDELINES:
-
-For T-Shirts:
-Recommended sequence: Brand + Gender + Product Type + Key Feature + Material + Size + Color + Pack Size
-Element explanations:
-- Brand: Builds trust and improves SEO ranking
-- Gender: Targets specific audience (Men's/Women's/Unisex)
-- Product Type: Core identifier (T-Shirt, Tee, Polo)
-- Key Feature: Differentiator (Slim Fit, V-Neck, Graphic)
-- Material: Search relevance (Cotton, Polyester, Blend)
-- Size: Conversion factor (S/M/L/XL or Specific measurements)
-- Color: Visual match (Black, White, Navy Blue)
-- Pack Size: Value indicator (Pack of 3, Single)
-
-Examples:
-✓ Good: "Nike Men's Slim Fit Cotton T-Shirt, Black, Large"
-✓ Good: "Hanes Women's V-Neck Polyester Blend T-Shirt Pack of 3, White, Medium"
-✗ Bad: "Nice T-Shirt for Men" (missing brand, features, specifics)
-✗ Bad: "SUPER COMFORTABLE AMAZING TSHIRT BLACK" (all caps, no structure)
-
-For Food:
-Recommended sequence: Brand + Product Name + Flavor/Variety + Size/Weight + Type + Pack Size
-Element explanations:
-- Brand: Recognition and trust (Kellogg's, Organic Valley)
-- Product Name: Core identity (Corn Flakes, Whole Milk)
-- Flavor/Variety: Taste appeal (Original, Chocolate, Strawberry)
-- Size/Weight: Practical info (18 oz, 1 Gallon, 500g)
-- Type: Dietary needs (Organic, Gluten-Free, Low-Fat)
-- Pack Size: Bulk value (Box, 6-Pack, Family Size)
-
-Examples:
-✓ Good: "Kellogg's Corn Flakes Cereal, Original Flavor, 18 oz Box"
-✓ Good: "Organic Valley Whole Milk, 1 Gallon, Grass-Fed"
-✗ Bad: "Delicious Cereal" (missing brand, specifics, size)
-✗ Bad: "Food Product 500g" (generic, no appeal)
-
-For Chairs:
-Recommended sequence: Brand + Type + Key Feature + Material + Color + Additional Features
-Element explanations:
-- Brand: Quality assurance (Herman Miller, IKEA)
-- Type: Category search (Office Chair, Desk Chair, Gaming Chair)
-- Key Feature: Differentiator (Ergonomic, High Back, Swivel)
-- Material: Durability info (Mesh, Leather, Fabric)
-- Color: Aesthetic match (Black, Gray, White)
-- Additional Features: Conversion boost (Adjustable Arms, Lumbar Support)
-
-Examples:
-✓ Good: "Herman Miller Aeron Ergonomic Office Chair, Mesh Fabric, Black, Adjustable Arms"
-✓ Good: "IKEA Markus Swivel Desk Chair, Leather, Gray, High Back"
-✗ Bad: "Comfortable Chair" (missing brand, type, features)
-✗ Bad: "Chair for Office Black Color" (awkward structure, no features)
-
-CRITICAL INSTRUCTION - TITLE STRUCTURE ANALYSIS:
-You MUST analyze the current product title and identify which elements are present or missing based on the category-specific structure above. For each element in the recommended sequence, indicate:
-- "present": The element exists in the title with the actual value found
-- "missing": The element is not in the title
-- "value": The actual text/value found for that element (if present)
-
-Return ONLY this JSON structure:
-{{
-  "title_structure_analysis": {{
-    "category": "T-Shirts/Food/Chairs",
-    "recommended_sequence": ["Brand", "Gender", "Product Type", "Key Feature", "Material", "Size", "Color", "Pack Size"],
-    "current_title_breakdown": {{
-      "Brand": {{"status": "present/missing", "value": "Nike" or null, "explanation": "why it matters"}},
-      "Gender": {{"status": "present/missing", "value": "Men's" or null, "explanation": "targets audience"}},
-      "Product Type": {{"status": "present/missing", "value": "T-Shirt" or null, "explanation": "core identifier"}},
-      "Key Feature": {{"status": "present/missing", "value": "Slim Fit" or null, "explanation": "differentiator"}},
-      "Material": {{"status": "present/missing", "value": "Cotton" or null, "explanation": "search relevance"}},
-      "Size": {{"status": "present/missing", "value": "Large" or null, "explanation": "conversion factor"}},
-      "Color": {{"status": "present/missing", "value": "Black" or null, "explanation": "visual match"}},
-      "Pack Size": {{"status": "present/missing", "value": null, "explanation": "value indicator"}}
-    }},
-    "completeness_score": 75,
-    "missing_elements": ["Size", "Pack Size"],
-    "structure_quality": "good/fair/poor",
-    "structure_notes": "Brief assessment of title structure quality"
-  }},
-  "corrected_attributes": {{
-    "attr_name": "corrected_value"
-  }},
-  "missing_attributes": {{
-    "attr_name": "suggested_value"
-  }},
-  "improved_title": "optimized title following recommended sequence with all elements",
-  "improved_description": "enhanced description (50-150 words, features, benefits, specs, use cases)",
-  "seo_keywords": ["keyword1", "keyword2", "keyword3"],
-  "improvements": [
+        # Extract ALL data sources comprehensively
+        available_attrs = product.get('attributes', {})
+        title = product.get('title', '')
+        description = product.get('description', '')
+        category = product.get('category', '')
+        
+        # Helper function to safely extract values
+        def safe_extract(sources, keys):
+            """Extract first non-empty value from multiple sources and keys"""
+            for source in sources:
+                if not source:
+                    continue
+                for key in keys:
+                    val = source.get(key) if isinstance(source, dict) else None
+                    if val and str(val).strip() and str(val).strip().lower() not in ['null', 'none', 'n/a', 'na']:
+                        return str(val).strip()
+            return None
+        
+        # Extract from title by parsing common patterns
+        def extract_from_title(title_text, pattern_type):
+            """Extract information from title text"""
+            if not title_text:
+                return None
+            title_lower = title_text.lower()
+            
+            if pattern_type == 'brand':
+                # Brand is usually first word(s) before product type
+                words = title_text.split()
+                if words:
+                    return words[0]
+            elif pattern_type == 'size':
+                # Look for size patterns: 50ml, 30ml, L, M, S, XL, etc.
+                size_match = re.search(r'\b(\d+(?:\.\d+)?(?:ml|oz|g|kg|l|lb))\b', title_text, re.IGNORECASE)
+                if size_match:
+                    return size_match.group(1)
+                size_match = re.search(r"(?<![\w'])(XXS|XS|S|M|L|XL|XXL|XXXL)\b", title_text)  # case-sensitive: avoids matching the "s" in "Women's"
+                if size_match:
+                    return size_match.group(1)
+            elif pattern_type == 'color':
+                # Common colors
+                colors = ['black', 'white', 'blue', 'red', 'green', 'yellow', 'pink', 'purple', 'brown', 'grey', 'gray', 'beige', 'navy', 'orange']
+                for color in colors:
+                    if re.search(rf'\b{color}\b', title_lower):  # whole word only: "red" must not match "shirred"
+                        return color.title()
+            elif pattern_type == 'gender':
+                if re.search(r"\bwomen'?s?\b", title_lower):
+                    return "Women's"
+                elif re.search(r"\bmen'?s?\b", title_lower):  # whole word only: must not match "garment"
+                    return "Men's"
+                elif "unisex" in title_lower:
+                    return "Unisex"
+            
+            return None
+        
+        # Comprehensive extraction with multiple fallback sources
+        brand = safe_extract(
+            [available_attrs, {'title_extract': extract_from_title(title, 'brand')}],
+            ['brand', 'Brand', 'BRAND', 'manufacturer', 'Manufacturer', 'title_extract']
+        )
+        
+        gender = safe_extract(
+            [available_attrs, {'title_extract': extract_from_title(title, 'gender')}],
+            ['gender', 'Gender', 'GENDER', 'target_gender', 'title_extract']
+        )
+        
+        material = safe_extract(
+            [available_attrs],
+            ['material', 'Material', 'MATERIAL', 'fabric', 'Fabric']
+        )
+        
+        size = safe_extract(
+            [available_attrs, {'title_extract': extract_from_title(title, 'size')}],
+            ['size', 'Size', 'SIZE', 'volume', 'Volume', 'weight', 'Weight', 'title_extract']
+        )
+        
+        color = safe_extract(
+            [available_attrs, {'title_extract': extract_from_title(title, 'color')}],
+            ['color', 'Color', 'COLOR', 'colour', 'Colour', 'title_extract']
+        )
+        
+        product_type = safe_extract(
+            [available_attrs, {'category': category}],
+            ['product_type', 'type', 'Type', 'category', 'Category', 'product_category']
+        )
+        
+        # Extract key features from title and description
+        feature_keywords = ['puff sleeve', 'shirred', 'slim fit', 'regular fit', 'long lasting', 
+                        'resurfacing', 'moisturizing', 'hydrating', 'anti-aging', 'brightening',
+                        'eau de parfum', 'eau de toilette', 'retinol', 'ceramides', 'niacinamide']
+        
+        key_features = []
+        combined_text = f"{title} {description}".lower()
+        for feature in feature_keywords:
+            if feature in combined_text:
+                # Capitalize properly
+                key_features.append(' '.join(word.capitalize() for word in feature.split()))
+        
+        key_feature = ', '.join(key_features[:2]) if key_features else None
+        
+        # Create explicit data inventory
+        data_inventory = {
+            'Brand': brand,
+            'Gender': gender,
+            'Product Type': product_type or category,
+            'Key Feature': key_feature,
+            'Material': material,
+            'Size': size,
+            'Color': color
+        }
+        
+        # Filter to only available data
+        available_data = {k: v for k, v in data_inventory.items() if v}
+        missing_data = [k for k, v in data_inventory.items() if not v]
+        
+        # Create detailed inventory display
+        inventory_display = "\n".join([
+            f"  ✅ {k}: \"{v}\"" for k, v in available_data.items()
+        ])
+        
+        missing_display = "\n".join([
+            f"  ❌ {k}: NOT AVAILABLE - MUST NOT USE" for k in missing_data
+        ])
+        
+        prompt = f"""You are a strict e-commerce data validator. Generate ONLY factual product improvements.
+
+    🚫 ABSOLUTE PROHIBITIONS (WILL CAUSE FAILURE):
+    1. NEVER invent sizes (M, L, XL, S, etc.) if not in data below
+    2. NEVER invent materials (Cotton, Polyester, etc.) if not in data below
+    3. NEVER invent features (Slim Fit, Regular, etc.) if not in data below
+    4. NEVER use generic terms like "Long Lasting", "Standard", "Classic" unless in original data
+    5. The improved_title MUST contain AT LEAST 3 elements from available data
+    6. If only 1-2 elements available, reuse product type with key features from description
+
+    Note: quality_score_prediction should be in the range 90 to 96
+
+    ═══════════════════════════════════════════════════════════
+    PRODUCT DATA - THIS IS YOUR ONLY SOURCE OF TRUTH:
+    ═══════════════════════════════════════════════════════════
+    SKU: {product.get('sku')}
+    Category: {category}
+    Title: {title}
+    Description: {description[:500]}
+    All Attributes: {json.dumps(available_attrs, ensure_ascii=False)}
+
+    ═══════════════════════════════════════════════════════════
+    EXTRACTED DATA INVENTORY - USE ONLY THESE VALUES:
+    ═══════════════════════════════════════════════════════════
+    {inventory_display if inventory_display else "  (No attributes extracted)"}
+
+    {missing_display}
+
+    TOTAL AVAILABLE: {len(available_data)} elements
+    TOTAL MISSING: {len(missing_data)} elements
+
+    ⚠️ CRITICAL: Your improved_title can ONLY use values shown above with ✅
+
+    ═══════════════════════════════════════════════════════════
+    QUALITY SCORES (out of 100):
+    ═══════════════════════════════════════════════════════════
+    - Mandatory Fields: {scores.get('mandatory_fields', 0):.1f}
+    - Standardization: {scores.get('standardization', 0):.1f}
+    - Missing Values: {scores.get('missing_values', 0):.1f}
+    - Consistency: {scores.get('consistency', 0):.1f}
+    - SEO: {scores.get('seo_discoverability', 0):.1f}
+    - Title Quality: {scores.get('title_quality', 0):.1f}
+    - Description Quality: {scores.get('description_quality', 0):.1f}
+
+    CATEGORY RULES:
+    Mandatory Attributes: {', '.join(mandatory_attrs)}
+
+    ═══════════════════════════════════════════════════════════
+    ISSUES FOUND:
+    ═══════════════════════════════════════════════════════════
+    Attributes ({len(attribute_issues)}):
+    {chr(10).join(f"  • {i}" for i in attribute_issues[:8])}
+
+    Title ({len(title_issues)}):
+    {chr(10).join(f"  • {i}" for i in title_issues[:5])}
+
+    Description ({len(desc_issues)}):
+    {chr(10).join(f"  • {i}" for i in desc_issues[:5])}
+
+    SEO ({len(seo_issues)}):
+    {chr(10).join(f"  • {i}" for i in seo_issues[:5])}
+
+    ═══════════════════════════════════════════════════════════
+    TITLE CONSTRUCTION RULES:
+    ═══════════════════════════════════════════════════════════
+
+    RULE 1: MINIMUM LENGTH REQUIREMENT
+    - improved_title MUST contain AT LEAST 3 distinct elements
+    - If fewer than 3 elements available, extract more from description
+    - Single-word titles are STRICTLY FORBIDDEN
+
+    RULE 2: ELEMENT ORDERING (use available elements in this order)
+    For CLOTHING/DRESSES:
+    Brand → Gender → Product Type → Key Feature → Material → Size → Color
+    
+    For SKINCARE:
+    Brand → Product Type → Key Benefit → Skin Type → Key Ingredient → Size
+    
+    For PERFUME:
+    Brand → Product Name → Fragrance Type → Gender → Size → Concentration
+
+    RULE 3: EXTRACTION PRIORITY
+    1. Use explicit attribute values first (✅ marked above)
+    2. Extract from title if obvious (e.g., "Puff Sleeve" from "Puff Sleeve Dress")
+    3. Extract from description if clear (e.g., "Hydrating" from "delivers hydration")
+    4. NEVER invent if not extractable
+
+    ═══════════════════════════════════════════════════════════
+    EXAMPLES OF CORRECT BEHAVIOR:
+    ═══════════════════════════════════════════════════════════
+
+    Example 1 - DRESS:
+    Available: Brand="Blue Vanilla", Product Type="Dress", Key Feature="Puff Sleeve Shirred", Color="Blue"
+    Missing: Size, Material, Gender
+    ✅ CORRECT: "Blue Vanilla Dress Puff Sleeve Shirred Blue"
+    ❌ WRONG: "Blue Vanilla M Blue" (too short, invented size)
+    ❌ WRONG: "Blue Vanilla Dress Slim Fit Cotton M Blue" (invented Slim Fit, Cotton, M)
+
+    Example 2 - SKINCARE:
+    Available: Brand="CeraVe", Product Type="Moisturising Cream", Key Benefit="Hydrating", Key Ingredient="Ceramides", Size="50ml"
+    Missing: Skin Type, Material
+    ✅ CORRECT: "CeraVe Moisturising Cream Hydrating Ceramides 50ml"
+    ❌ WRONG: "CeraVe" (too short)
+    ❌ WRONG: "CeraVe Cream Hydrating Dry Skin 50ml" (invented "Dry Skin" - though in description, not in attributes)
+
+    Example 3 - PERFUME:
+    Available: Brand="Calvin Klein", Product Name="Euphoria", Fragrance Type="Eau de Parfum", Gender="Women", Size="50ml"
+    Missing: Concentration, Color
+    ✅ CORRECT: "Calvin Klein Euphoria Eau de Parfum Women 50ml"
+    ❌ WRONG: "Calvin Klein Euphoria Eau de Parfum Long Lasting" (invented "Long Lasting", missing size)
+
+    ═══════════════════════════════════════════════════════════
+    RESPONSE FORMAT:
+    ═══════════════════════════════════════════════════════════
+
+    Return ONLY this JSON structure:
+
     {{
-      "component": "attributes/title/description/seo",
-      "issue": "specific issue",
-      "suggestion": "how to fix",
-      "priority": "high/medium/low",
-      "confidence": "high/medium/low"
+    "data_validation": {{
+        "available_elements": {json.dumps(list(available_data.keys()), ensure_ascii=False)},
+        "available_count": {len(available_data)},
+        "missing_elements": {json.dumps(missing_data, ensure_ascii=False)},
+        "can_build_valid_title": true/false,
+        "reason": "explanation if cannot build valid title"
+    }},
+    "title_construction": {{
+        "elements_used": ["element1", "element2", "element3"],
+        "values_used": ["value1", "value2", "value3"],
+        "element_count": 3,
+        "construction_logic": "Explain how you built the title using ONLY available data"
+    }},
+    "improved_title": "MUST BE 3+ ELEMENTS, USING ONLY ✅ VALUES ABOVE",
+    "improved_description": "enhanced description (50-150 words, based ONLY on available product data)",
+    "seo_keywords": ["keyword1", "keyword2", "keyword3"],
+    "corrected_attributes": {{
+        "attr_name": "corrected_value (ONLY if data exists to correct)"
+    }},
+    "missing_attributes": {{
+        "attr_name": "Cannot suggest - no source data available"
+    }},
+    "improvements": [
+        {{
+        "component": "attributes/title/description/seo",
+        "issue": "specific issue",
+        "suggestion": "how to fix (state if data unavailable)",
+        "priority": "high/medium/low",
+        "confidence": "high/medium/low",
+        "requires_external_data": true/false
+        }}
+    ],
+    "quality_score_prediction": {quality_score_target:.1f},
+    "summary": "2-3 sentences on improvements, noting data limitations",
+    "hallucination_verification": {{
+        "passed": true/false,
+        "invented_data": [],
+        "all_data_sourced": true/false,
+        "title_meets_minimum_length": true/false
+    }}
     }}
-  ],
-  "quality_score_prediction": {quality_score_target:.1f},
-  "summary": "Brief 2-3 sentence summary of key improvements needed"
-}}
 
-CRITICAL: Keep response under 7000 tokens. Focus on top 5 most impactful improvements and complete title structure analysis."""
+    ═══════════════════════════════════════════════════════════
+    FINAL VERIFICATION BEFORE RESPONDING:
+    ═══════════════════════════════════════════════════════════
+    □ Does improved_title contain AT LEAST 3 elements?
+    □ Is EVERY element in improved_title present in "✅ Available" list?
+    □ Did I avoid ALL values marked with "❌ NOT AVAILABLE"?
+    □ Did I check that I didn't invent sizes (M, L, XL)?
+    □ Did I check that I didn't invent materials (Cotton, Polyester)?
+    □ Did I check that I didn't invent generic features (Long Lasting, Standard)?
+    □ Is my title longer than just 1-2 words?
+
+    If you cannot build a valid title with at least 3 elements from available data,
+    set "can_build_valid_title": false and explain why in the response."""
+        
+        logger.debug(f"SKU {sku}: Prompt built with maximum enforcement, final length: {len(prompt)} characters")
+        logger.debug(f"SKU {sku}: Available data elements: {list(available_data.keys())}")
+        logger.debug(f"SKU {sku}: Missing data elements: {missing_data}")
         
-        logger.debug(f"SKU {sku}: Prompt built, final length: {len(prompt)} characters")
         return prompt
+
+
     
     def _parse_response(self, response_text: str, sku: str = 'UNKNOWN') -> Dict:
         """Enhanced JSON parsing with fallback strategies"""
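
Review note on the hunk above: the `safe_extract` helper scans sources in order and, within each source, keys in order, so attribute values win over title-derived fallbacks. The standalone sketch below is an adaptation for illustration, not the committed code; the sample `attrs` and `title_fallback` data are made up, and unlike the commit it treats only `None` as absent before the string checks.

```python
from typing import Any, Dict, List, Optional

# Placeholder strings treated as "no value" (same markers as in the commit).
_EMPTY_MARKERS = {'null', 'none', 'n/a', 'na'}


def safe_extract(sources: List[Optional[Dict[str, Any]]], keys: List[str]) -> Optional[str]:
    """Return the first usable value, scanning sources in order, then keys in order."""
    for source in sources:
        if not isinstance(source, dict):
            continue  # skip None or non-dict sources
        for key in keys:
            val = source.get(key)
            if val is None:
                continue
            text = str(val).strip()
            if text and text.lower() not in _EMPTY_MARKERS:
                return text
    return None


# Illustrative data only (hypothetical, not from the repository).
attrs = {'Brand': 'N/A', 'color': 'Blue'}
title_fallback = {'title_extract': 'CeraVe'}

# 'Brand' holds a placeholder, so the title-derived fallback is used instead.
brand = safe_extract([attrs, title_fallback], ['brand', 'Brand', 'title_extract'])
color = safe_extract([attrs], ['color', 'Color'])
```

Because the fallback dictionary is listed after `available_attrs`, a title-derived guess is only used when no attribute key yields a usable value.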
@@ -2518,6 +2679,3 @@ CRITICAL: Keep response under 7000 tokens. Focus on top 5 most impactful improve
         
         logger.info(f"Generated {len(suggestions)} fallback suggestions")
         return suggestions
-    
-
-
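
Review note: the new prompt asks the model to self-report `hallucination_verification`, but nothing in this diff shows that claim being checked against the inventory. As a hedged sketch of how a caller might do so, the hypothetical helper `unsourced_title_words` below (not part of the commit) lists the words of an `improved_title` that no inventory value accounts for. It assumes `available_data` has the same shape as in the diff: element name mapped to a string value, with `Key Feature` possibly holding comma-separated features.

```python
import re
from typing import Dict, List


def unsourced_title_words(improved_title: str, available_data: Dict[str, str]) -> List[str]:
    """Return the words in improved_title that no inventory value accounts for."""
    remaining = improved_title.lower()
    # Consume longer values first so "Blue Vanilla" is removed before "Blue".
    for value in sorted(available_data.values(), key=len, reverse=True):
        # A value may bundle several comma-separated features.
        for part in value.split(','):
            part = part.strip().lower()
            if part:
                remaining = re.sub(r'\b' + re.escape(part) + r'\b', ' ', remaining)
    # Whatever word-like tokens are left were not sourced from the inventory.
    return re.findall(r"[\w']+", remaining)


# Example inventory taken from "Example 1 - DRESS" in the prompt.
inventory = {
    'Brand': 'Blue Vanilla',
    'Product Type': 'Dress',
    'Key Feature': 'Puff Sleeve, Shirred',
    'Color': 'Blue',
}
ok = unsourced_title_words("Blue Vanilla Dress Puff Sleeve Shirred Blue", inventory)
bad = unsourced_title_words("Blue Vanilla Dress Slim Fit Cotton M Blue", inventory)
```

An empty list means every word in the title traces back to the inventory; a non-empty list could be logged or used to reject the suggestion. This is a word-level check only and does not verify element ordering or the three-element minimum.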