Commit Graph

2 Commits

Author SHA1 Message Date
wasrusgen
c5f662f53d citilink: rewrite parser to walk up from a[href*=/product/] (CSS-in-JS resistant) 2026-05-11 13:57:18 +03:00
wasrusgen
e8b487891f backend: working parsers — OZON + Citilink (DOM via Playwright) + WB
DIAGNOSTIC RESULTS:
- OZON: 19 product links via Playwright on naked VPS-IP ✓
- Citilink: 112 data-meta-name Snippets ✓
- Wildberries: JSON API works with delays ✓
- Я.Маркет, DNS: blocked by ASN (need residential proxy)

OZON PARSER:
- Pure Playwright DOM (composer-api dropped — was blocked)
- Selects a[href*='/product/'], walks up to card div, extracts title/price/img
- Filters fake 'titles' like Распродажа, Скидка

CITILINK PARSER (new):
- Selects [data-meta-name*='Snippet'] or ProductCard markers
- Multiple title selectors fallback chain
- Filters out non-product hits

PARSERS/__init__.py:
- DEFAULT_SOURCES = (ozon, citilink, wb) — all work without proxy
- Я.Маркет, DNS kept but not default — usable when residential proxy added

NEW ENDPOINT:
- GET /api/parse_citilink?q=...&limit=N
2026-05-11 13:53:07 +03:00