Commit Graph

2 Commits

Author SHA1 Message Date
wasrusgen
fe472b0827 catalog: filter junk + background refresh + clear endpoint
FILTERING (catalog.py _save_results):
- CATEGORY_KEYWORDS: must contain category word ('холодильник', 'варочн', 'духов', etc.)
- CATEGORY_MIN_PRICE: filters parts/accessories (fridge >20k, hood >5k, etc.)
- PART_BLACKLIST: 'фильтр', 'лампочк', 'термодатчик', 'шланг', 'тэн', 'компрессор', etc.
- Previously had Asko light bulb (155₽), Miele dryer filter (376₽), Siemens cooktop in fridge category — all now filtered out

ASYNC REFRESH (main.py):
- POST /api/catalog/refresh queues background task, returns immediately
  (was sync, taking 3+ min → Cloudflare tunnel was killing connection)
- New GET /api/catalog/refresh_status for progress polling
- Concurrent refresh blocked (one at a time)

CLEAR ENDPOINT:
- POST /api/catalog/clear?cat=fridge clears one category
- POST /api/catalog/clear clears entire catalog (start over)

NEXT: clear current dirty data, re-seed fridge with filters
2026-05-12 07:09:33 +03:00
wasrusgen
9e652c4a34 catalog: models cache in Sheets — AI picks from real list, no SKU hallucination
NEW MODULE app/catalog.py:
- refresh_catalog(cats, sources, per_brand, delay) — runs parsers for seed brand+category pairs
- list_catalog(cat, tier, brand) — reads from Sheets
- list_for_ai(cats, tiers) — compact text for AI prompt context
- SEED_BRANDS_BY_TIER + CATEGORY_QUERIES — 22 brands × 8 cats = 176 combos
- Saves top-2 relevant results per (brand × cat), filters by brand presence in title
- Dedup by title hash within (cat, brand) bucket

SHEETS:
- ensure_sheet(name, headers) — auto-creates Catalog tab on first refresh
- Schema: id, category, brand, tier, model_name, search_query, price_min/max, image_url, source, url, last_seen_at

ENDPOINTS:
- POST /api/catalog/refresh?cat=X&per_brand=N — manual refresh (1 cat ~2-5 min)
- GET /api/catalog/list?cat=&tier=&brand= — read with filters
- GET /api/catalog/preview_ai?cats=fridge — debug what AI receives

AI PROMPT:
- Rule #0: if catalog passed in user prompt — MUST select only from there
- _build_catalog_context: filters by checklist.budget_preset → tier subset
  (luxe→premium, premium→premium+middle, middle→middle, budget→middle+budget)

_handle_podbor:
- Loads catalog subset, appends to user_prompt as 'ДОСТУПНЫЙ КАТАЛОГ МОДЕЛЕЙ'
- AI 'выбирай ТОЛЬКО из этого списка' rule reinforced

NEXT: trigger refresh manually for 1 category (~3 min), then real podbor test
to verify AI uses catalog models instead of hallucinating SKUs
2026-05-12 06:32:39 +03:00