Commit Graph

58 Commits

Author SHA1 Message Date
wasrusgen
5fdae262ef backend: parse_* endpoints sync (FastAPI threadpool) — fix Playwright asyncio conflict 2026-05-11 13:30:51 +03:00
wasrusgen
d5f290bd0a backend: Playwright + Chromium for JS-rendered sites (Я.Маркет, OZON fallback)
DOCKERFILE:
- + Chromium system deps (libnss3, libxkbcommon0, libgbm1, libgtk-3-0, etc.)
- + RUN python -m playwright install chromium (~150MB)
- + ENV PLAYWRIGHT_BROWSERS_PATH

REQUIREMENTS:
- + playwright >= 1.45

PARSERS:
- new playwright_engine.py — singleton browser, isolated context per request,
  blocks images/fonts/CSS to save memory, waits for selector + JS hydration
- yamarket.py — rewritten to use Playwright (Я.Маркет is React SPA)
- ozon.py — Playwright fallback when composer-api returns challenge (403)
- wb.py — exponential backoff on 429, still uses direct HTTP (JSON API, no JS needed)

STRATEGY (Hybrid Path C):
- Я.Маркет: Playwright (rendering JS)
- OZON: composer-api first, Playwright fallback
- WB: direct HTTP with backoff (JSON API, fast)
- DNS: kept but lower priority (Qrator hard to crack)
- No more proxy needed for primary path

DEPLOY: removed PROXY_STATIC_LIST from .env, expect ~5min for first build (Chromium download)
2026-05-11 13:25:05 +03:00
wasrusgen
3ee5275ea0 backend: PROXY_STATIC_LIST support (manual proxies without API token)
- proxy_pool now loads from both PROXY_STATIC_LIST (env, comma-separated) and PROXY6_TOKEN (API)
- Static list has priority, merged with API list (dedup by URL)
- /api/proxy_status returns masked proxy URLs for diagnostic (passwords hidden)
- Supports formats: 'http://user:pass@host:port' or 'host:port' (assumed http://)
2026-05-11 13:03:29 +03:00
wasrusgen
82425dbd88 backend: Proxy6 pool + parsers WB / OZON / Я.Маркет / DNS
PROXY POOL (app/proxy_pool.py):
- Loads active proxies from Proxy6.net API every 10 min
- Random rotation per request via proxied_client(timeout, headers)
- Graceful fallback to direct HTTP if PROXY6_TOKEN not set
- Config: PROXY6_TOKEN env var

PARSERS (app/parsers/):
- dns.py — refactored to use proxy_pool with retry+rotation on Qrator block
- wb.py — Wildberries JSON API (search.wb.ru), retries on 429
- ozon.py — OZON composer-api JSON (widgetStates extraction)
- yamarket.py — Я.Маркет HTML + embedded JSON parser
- __init__.py — enrich_one() fans out to all sources, aggregates min/max prices, max rating, sum reviews
- enrich_models() — batch enrich for AI by_category output

NEW DIAGNOSTIC ENDPOINTS (main.py):
- GET /api/parse_wb?q=...&limit=N
- GET /api/parse_ozon?q=...&limit=N
- GET /api/parse_yamarket?q=...&limit=N
- GET /api/parse_all?q=... — fan-out + aggregate
- GET /api/proxy_status — pool diagnostics (count, token configured, age)

PODBOR (main.py):
- _enrich_ai_with_dns -> _enrich_ai_marketplaces (uses all sources)

DEPLOY: needs PROXY6_TOKEN in /opt/zov-tech/deploy/.env on VPS, then docker compose build + up -d backend
2026-05-11 12:18:04 +03:00
wasrusgen
64edb76035 backend: new state-shape AI prompt + DNS parser MVP
AI PROMPT (ai.py):
- Документирует новую форму checklist (per_cat.answers, brand_strategy, single_brand, brands, budget_preset, pick_strategies)
- Просит вернуть 3-5 моделей по КАЖДОЙ категории (не одну)
- Новый формат ответа: by_category[cat].models[] с brand/model/price_min/price_max/search_query/pros/cons/tier
- Подробные правила для бренд-стратегий (single → вся техника одной марки; different → preferred/acceptable/avoid)
- Бюджет-пресеты с авто-распределением по категориям (fridge ~25%, hob ~12% и т.д.)

DNS PARSER (parsers/dns.py):
- search_dns(query, limit) — HTTP + BeautifulSoup
- Реалистичный User-Agent, фолбэк на JSON-LD если HTML-селекторы не сработали
- enrich_models(models) — обогащает список моделей от AI, добавляя dns: {title, price, image, url, rating, reviews}
- Вежливая задержка 0.4с между запросами

MAIN.PY:
- /api/parse_dns?q=... — тестовый эндпоинт для проверки парсера
- _handle_podbor теперь после AI вызывает _enrich_ai_with_dns для каждой модели
- _format_podbor_for_telegram переписан под новый формат by_category — выводит 3-5 моделей в каждой категории с pros/cons
- Fallback на старый формат items[] для совместимости

REQUIREMENTS:
- + beautifulsoup4 >= 4.12
- + lxml >= 5.2

DEPLOY: после пуша на VPS нужно пересобрать backend контейнер (docker compose up --build -d backend)
2026-05-11 11:42:37 +03:00
wasrusgen
b7fa20dc69 fix(backend/sheets): write ISO-string for datetime (gspread can't serialize datetime) 2026-05-10 22:28:13 +03:00
wasrusgen
5263840582 fix(backend): retry sheet open if cached _book is None (after PermissionError) 2026-05-10 21:06:02 +03:00
wasrusgen
0e5895bdc4 feat(infra): Python FastAPI backend + Docker compose for VPS deploy (GigaChat with Russian root CA) 2026-05-10 17:44:21 +03:00