# KaggleIngest: AI-Optimized Context Engine

KaggleIngest is a high-performance SaaS platform that transforms complex Kaggle competition data into token-efficient context for Large Language Models. It addresses the context-window bottleneck by intelligently ranking content and emitting the Token-Oriented Object Notation (TOON) v2.0 format.

## Technology Stack

- **Backend**: FastAPI (Python 3.12+), PostgreSQL (persistence and caching), Kaggle API.
- **Frontend**: React (Vite), neo-brutalist / premium dark theme.
- **Architecture**: PostgreSQL-as-everything. Redis and SQLite were replaced with a unified Postgres engine for state, full-text search, and UNLOGGED high-speed caching.
- **Security**: `X-API-Key`-based authentication with tiered usage management.

## Core Capabilities

- **Smart Context Ranking**: A custom scoring algorithm (`log(upvotes) * time_decay`) prioritizes significant, recent solution patterns.
- **TOON v2.0**: A proprietary data format achieving 30-60% token reduction compared to JSON while preserving semantic depth.
- **Dual-Track Ingestion**: Instant retrieval for cached competitions; automated background fetching for new requests.
- **Robust Parsing**: Resilient ingestion of legacy notebook formats and multi-encoding datasets.

## API Reference

Base URL: `http://localhost:8000` (local) or your deployed SaaS URL.

### 1. Authentication

- `POST /auth/signup`: Create an account and receive an API key (`ki_...`).
- `GET /auth/me`: Verify the session and check remaining credits.

### 2. Search

- `GET /search?query={slug}`: Ranked full-text and fuzzy search across Kaggle.

### 3. Competition Context

- `GET /competitions/{slug}`: Returns the full ingestion context.
  - If cached: returns `status: completed` with `toon_content`.
  - On a miss: returns `status: processing` and triggers a background fetch.

## TOON v2.0 Structure for LLMs

- **metadata**: Competition title, URL, categories, and evaluation metrics.
- **schema**: CSV file definitions, column names, and inferred data types.
- **sample_rows**: Tabular preview data (first 10 rows).
- **notebooks[N]{fields}**: Top-ranked notebooks containing community-vetted code.
- **content**: Cleaned, high-signal Python/R code cells and documentation.

## Guidelines for LLM Analysis

1. **Ranked Importance**: Solutions are delivered in order of community upvotes and relevance. Prioritize patterns found in the top 3 notebooks.
2. **Feature Engineering**: Compare multiple solution schemas to identify consensus feature sets.
3. **Evaluation Alignment**: Use the provided metadata to ensure models align with the competition's specific scoring metrics.
4. **Data Types**: Refer to the `schema` section before proposing data-processing pipelines.

---

© 2026 KaggleIngest | Production-Ready AI Context Infrastructure.