Publish local repo state
75
ai_dev_plan.md
Normal file
@@ -0,0 +1,75 @@
# Personal AI Agent: "About Me" Profile Generator

**Project Goal**
Build a showcase AI system that scans and summarizes your professional and personal work from self-hosted services (primarily Gitea for code and repos, plus Flatnotes, Trilium, and HedgeDoc for notes, ideas, and projects). The agent answers employer-style questions dynamically (e.g., "Summarize Giordano's coding projects and skills") with RAG-grounded responses, links, and image embeds where relevant.

Emphasize broad AI toolchain integration for skill development and portfolio impact: agentic workflows, RAG pipelines, orchestration, multi-LLM support. No frontend focus: queries are triggered from the terminal or an API only.

**Key Features**
- Periodic or full scanning of services to extract text, summaries, code snippets, links, and images.
- Populate and query a local vector DB (RAG) for semantic search.
- Agent reasons, retrieves, and generates responses with evidence (links/images).
- Multi-LLM fallback (DeepSeek primary, Gemini/OpenCode as terminal trigger and fallback).
- Scheduled, automated updates via pipelines.
- Local Docker deployment for privacy and control.

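The multi-LLM fallback listed above could be sketched roughly as follows. This is a minimal illustration, not the project's actual client code: `ask_with_fallback`, `AllProvidersFailed`, `fake_deepseek`, and `fake_gemini` are hypothetical names, and the two "providers" are local stand-ins that make no network calls.

```python
from typing import Callable, Sequence


class AllProvidersFailed(RuntimeError):
    """Raised when every configured LLM backend has failed."""


def ask_with_fallback(
    prompt: str,
    providers: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each (name, call) pair in order; return (name, answer) from the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real client would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Hypothetical stand-ins for real API clients (assumption: no network access here).
def fake_deepseek(prompt: str) -> str:
    raise TimeoutError("simulated outage")


def fake_gemini(prompt: str) -> str:
    return f"[gemini] answer to: {prompt}"


used, answer = ask_with_fallback(
    "Summarize Giordano's coding projects",
    [("deepseek", fake_deepseek), ("gemini", fake_gemini)],
)
print(used, "->", answer)
```

Keeping the providers as an ordered list of plain callables means the primary and fallback models can be swapped without touching the retry logic.
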
**Tools & Stack Overview**

| Category | Tool(s) | Purpose & Why Chosen | Integration Role |
|----------|---------|----------------------|------------------|
| Core Framework | LangChain / LangGraph | Build agent, tools, chains, RAG logic. Modular, industry-standard for LLM apps. | Heart of agent & retrieval |
| Crawling/Extraction | Selenium / Playwright + Firecrawl (via LangChain loaders) | Handle auth/dynamic pages (Gitea login/nav), structured extraction (Markdown/JSON). | Scan web views & APIs |
| Vector Database | Chroma | Local, lightweight RAG store. Easy Docker setup, native LangChain integration. | Store embeddings for fast semantic search |
| LLM(s) | DeepSeek (via API) + Gemini / OpenCode | DeepSeek: cheap, strong reasoning (primary). Gemini/OpenCode: terminal trigger/fallback. | Reasoning & generation |
| Data Pipeline / Scheduling | Apache Airflow (Docker) | Industry-standard for ETL and ETL-like scans (DAGs). Local install via the official Compose file. | Schedule periodic scans/updates to Chroma |
| Visual Prototyping | Flowise | No-code visual builder on LangChain. Quick agent/RAG prototyping & debugging. | Experiment with chains before code |
| Script/Workflow Orchestration | Windmill | Turn Python/LangChain scripts into reusable, scheduled flows. Dev-first, high growth. | Reactive workflows (e.g., on-commit triggers) |
| Event-Driven Automation | Activepieces | Connect services event-based (e.g., Gitea webhook → re-scan). AI-focused pieces. | Glue for reactive triggers |

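The "Gitea webhook → re-scan" idea in the table can be illustrated with a small helper that turns a push event into a list of files to re-embed or delete. This is a sketch under assumptions: the field names (`commits`, `added`, `modified`, `removed`, `repository.full_name`) are modeled on Gitea's push payload and should be checked against your instance, the helper name `paths_to_rescan` is hypothetical, and commits are assumed to be listed oldest first.

```python
def paths_to_rescan(payload: dict) -> dict:
    """Collect files to re-embed and files to delete from a push-style payload.

    Assumes commits are ordered oldest first; reverse the list beforehand
    if your webhook delivers them newest first.
    """
    changed, removed = set(), set()
    for commit in payload.get("commits", []):
        for path in commit.get("added", []) + commit.get("modified", []):
            changed.add(path)
            removed.discard(path)
        for path in commit.get("removed", []):
            removed.add(path)
            changed.discard(path)
    return {
        "repo": payload.get("repository", {}).get("full_name", ""),
        "upsert": sorted(changed),
        "delete": sorted(removed),
    }


# Illustrative payload (made-up repository and file names).
sample_payload = {
    "repository": {"full_name": "giordano/agent"},
    "commits": [
        {"added": ["README.md", "notes/old.md"], "modified": [], "removed": []},
        {"added": ["src/agent.py"], "modified": ["README.md"], "removed": ["notes/old.md"]},
    ],
}

plan = paths_to_rescan(sample_payload)
print(plan)
```

Only the paths under `upsert` need to be re-chunked and re-embedded; paths under `delete` can be dropped from the vector store, which keeps a partial re-scan much cheaper than a full crawl.
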
**High-Level Architecture & Flow**

1. **Ingestion Pipeline (Airflow + Crawlers)**
   - Airflow DAG runs on a schedule (daily/weekly) or manually.
   - Task 1: LangChain agent uses a Selenium/Playwright tool to browse and authenticate to services (e.g., Gitea repos, Flatnotes/Trilium pages).
   - Task 2: Firecrawl loader extracts structured content (text, code blocks, links, image URLs).
   - Task 3: LangChain chunks, embeds (DeepSeek embeddings), and upserts to the Chroma vector DB.
   - Optional: Activepieces listens for events (e.g., Gitea push webhook) → triggers a partial re-scan.

2. **Agent Runtime (LangChain/LangGraph + DeepSeek)**
   - Core agent (ReAct-style): Receives a query (e.g., via terminal/OpenCode: "opencode query 'Giordano's top projects'").
   - Tools: Retrieve from Chroma (RAG), fetch specific pages/images if needed.
   - LLM: DeepSeek for cost-effective reasoning/summarization. Fall back to Gemini for complex queries.
   - Output: Natural-language response with summaries, links (e.g., Gitea repo URLs), and embedded image previews (from scanned pages).

3. **Prototyping & Orchestration Layer**
   - Use Flowise to visually build and test agent chains/RAG flows before committing to code.
   - Windmill wraps scripts (e.g., the scan script) as jobs/APIs.
   - Activepieces adds event-driven glue (e.g., new note in Trilium → notify/update DB).

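The chunk, embed, upsert, and retrieve steps from the ingestion pipeline and agent runtime can be sketched end to end without any external services. Everything here is a stand-in chosen for illustration: `TinyVectorStore` is a hypothetical in-memory substitute for Chroma, `toy_embed` is a bag-of-words counter rather than a real embedding model, and the URLs are made up. The point is only the data flow and the stable chunk IDs that make re-ingestion an upsert.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into character windows of `size` that overlap by `overlap`."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    if not text:
        return []
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def toy_embed(text: str) -> Counter:
    """Stand-in 'embedding': lowercase token counts (not a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class TinyVectorStore:
    """Hypothetical in-memory replacement for a vector DB collection."""

    def __init__(self) -> None:
        self._rows: dict[str, tuple[Counter, str, dict]] = {}

    def __len__(self) -> int:
        return len(self._rows)

    def upsert(self, doc_id: str, text: str, metadata: dict) -> None:
        # Same ID overwrites the previous row, so re-scans do not duplicate chunks.
        self._rows[doc_id] = (toy_embed(text), text, metadata)

    def query(self, question: str, k: int = 3) -> list[tuple[float, str, dict]]:
        q = toy_embed(question)
        scored = [(cosine(q, vec), doc_id, meta) for doc_id, (vec, _, meta) in self._rows.items()]
        scored.sort(key=lambda row: row[0], reverse=True)
        return scored[:k]


def ingest(store: TinyVectorStore, source_id: str, text: str, url: str) -> list[str]:
    """Chunk one document and upsert each chunk under a stable '<source>#<n>' ID."""
    ids = []
    for n, chunk in enumerate(chunk_text(text)):
        doc_id = f"{source_id}#{n}"
        store.upsert(doc_id, chunk, {"url": url})
        ids.append(doc_id)
    return ids


store = TinyVectorStore()
ingest(store, "gitea-readme", "Airflow schedules weekly scans of Gitea repositories.",
       "https://gitea.example.com/giordano/agent")
ingest(store, "flatnotes-recipes", "Sourdough bread and pizza dough recipe.",
       "https://notes.example.com/recipes")

hits = store.query("Which Gitea repositories are scanned?", k=2)
for score, doc_id, meta in hits:
    print(f"{score:.2f}  {doc_id}  {meta['url']}")
```

Storing the source URL as metadata on every chunk is what later lets the agent attach links as evidence to its answers.
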
**Deployment & Running Locally**
- Everything in Docker Compose: Airflow (official image), Chroma, Python services (LangChain agent), optional Flowise/Windmill containers.
- Secrets: Env vars for API keys (DeepSeek, service auth).
- Trigger: Terminal via OpenCode/Gemini CLI → calls agent endpoint/script.
- Scale: Start simple (manual scans), add Airflow scheduling later.

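Reading secrets from environment variables, as described above, might look like the following. The variable names (`DEEPSEEK_API_KEY`, `GITEA_BASE_URL`, `GITEA_TOKEN`, `CHROMA_HOST`, `CHROMA_PORT`) and the `Settings` class are assumptions made for this sketch, not names taken from the plan; the default port 8000 is assumed to match the Chroma container's configuration.

```python
import os
from dataclasses import dataclass, field
from typing import Mapping

# Hypothetical variable names; adjust to whatever the Compose file defines.
REQUIRED = ("DEEPSEEK_API_KEY", "GITEA_BASE_URL", "GITEA_TOKEN")


@dataclass(frozen=True)
class Settings:
    gitea_base_url: str
    # repr=False keeps secrets out of logs and tracebacks.
    deepseek_api_key: str = field(repr=False)
    gitea_token: str = field(repr=False)
    chroma_host: str = "chroma"
    chroma_port: int = 8000


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, failing fast on missing secrets."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    return Settings(
        gitea_base_url=env["GITEA_BASE_URL"].rstrip("/"),
        deepseek_api_key=env["DEEPSEEK_API_KEY"],
        gitea_token=env["GITEA_TOKEN"],
        chroma_host=env.get("CHROMA_HOST", "chroma"),
        chroma_port=int(env.get("CHROMA_PORT", "8000")),
    )


# Demo with an explicit mapping so the example runs without real secrets.
demo_env = {
    "DEEPSEEK_API_KEY": "sk-demo",
    "GITEA_BASE_URL": "https://gitea.example.com/",
    "GITEA_TOKEN": "demo-token",
}
settings = load_settings(demo_env)
print(settings)
```

Failing at startup when a key is missing is usually easier to debug than a crawler or agent task failing halfway through a scheduled run.
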
**Skill Showcase & Portfolio Value**
- Demonstrates: Agentic AI, RAG pipelines, web crawling with auth, multi-tool orchestration, cost-optimized LLMs, local/self-hosted infra.
- Broad coverage: LangChain ecosystem + industry ETL (Airflow) + modern AI workflow tools (Flowise/Windmill/Activepieces).
- Low cost: DeepSeek keeps API bills minimal (often <$5/month even with frequent scans/queries).

**Next Steps (Implementation Phases)**
1. Set up local Docker env + Chroma + DeepSeek API key.
2. Build basic crawler tools (Selenium + Firecrawl) for Gitea/Flatnotes.
3. Prototype agent in Flowise, then code in LangChain.
4. Add Airflow DAG for scheduled ingestion.
5. Integrate Windmill/Activepieces for extras.
6. Test queries, refine summaries/links/images.

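For the final step, refining how summaries, links, and images are presented, one possible output format is plain Markdown with a sources list. The helper below is a sketch only: `render_answer` and the evidence keys (`title`, `url`, `image_url`) are hypothetical, and the URLs are placeholders.

```python
def render_answer(summary: str, evidence: list[dict]) -> str:
    """Render a summary plus de-duplicated sources as Markdown.

    Each evidence item is assumed to carry 'title' and 'url', and optionally
    'image_url' for an inline preview.
    """
    lines = [summary.strip(), "", "Sources:"]
    seen = set()
    for item in evidence:
        if item["url"] in seen:
            continue  # several chunks often point at the same page
        seen.add(item["url"])
        lines.append(f"- [{item['title']}]({item['url']})")
        if item.get("image_url"):
            lines.append(f"  ![{item['title']}]({item['image_url']})")
    return "\n".join(lines)


rendered = render_answer(
    "Giordano maintains a RAG agent. ",
    [
        {"title": "agent repo", "url": "https://gitea.example.com/giordano/agent",
         "image_url": "https://gitea.example.com/giordano/agent/flow.png"},
        {"title": "agent repo", "url": "https://gitea.example.com/giordano/agent"},
        {"title": "Design notes", "url": "https://notes.example.com/design"},
    ],
)
print(rendered)
```

De-duplicating by URL matters because retrieval typically returns several chunks from the same repository or note.
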
This setup positions you strongly for AI engineering roles while building real, integrated skills.

**Extra tools to add**
- AutoMaker
- AutoCoder - Together with AutoMaker, assists with set-and-forget, long-running AI review.
- OpenRouter - Single access point for any CLI, with a usage fee.
- Aider - CLI code and file editing, with OpenRouter for any model.
- Goose - Integrates with the system and MCP servers like ClawBot.