Personal AI Agent: "About Me" Profile Generator
Project Goal
Build a showcase AI system that scans and summarizes your professional/personal work from self-hosted services (primarily Gitea for code/repos, plus Flatnotes/Trillium/HedgeDoc for notes/ideas/projects). The agent answers employer-style questions dynamically (e.g., "Summarize Giordano's coding projects and skills") with RAG-grounded responses, links, and image embeds where relevant.
Emphasize broad AI toolchain integration for skill development and portfolio impact: agentic workflows, RAG pipelines, orchestration, multi-LLM support. No frontend focus — terminal/API-triggered queries only.
Key Features
- Periodic/full scanning of services to extract text, summaries, code snippets, links, images.
- Populate & query a local vector DB (RAG) for semantic search.
- Agent reasons, retrieves, generates responses with evidence (links/images).
- Multi-LLM support: DeepSeek as the primary model, Gemini as fallback, OpenCode/Gemini CLI as the terminal trigger.
- Scheduled/automated updates via pipelines.
- Local/Docker deployment for privacy & control.
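The retrieval-and-answer flow these features describe can be sketched end to end with stand-ins (a dict-backed store and word-overlap scoring replace Chroma and real embeddings; all URLs and names are illustrative):

```python
# Toy RAG flow: index scanned docs, retrieve by overlap score, build a grounded answer.
# Stand-ins: a list replaces Chroma; word-overlap replaces real embeddings.

def tokenize(text):
    return set(text.lower().split())

class MiniStore:
    def __init__(self):
        self.docs = []  # (text, metadata) pairs

    def add(self, text, link):
        self.docs.append((text, {"link": link}))

    def search(self, query, k=2):
        # Rank documents by how many query words they share.
        q = tokenize(query)
        scored = sorted(self.docs, key=lambda d: len(q & tokenize(d[0])), reverse=True)
        return scored[:k]

def answer(store, query):
    # Ground the response in retrieved evidence: summary text plus source links.
    hits = store.search(query)
    return {
        "summary": " ".join(text for text, _ in hits),
        "links": [meta["link"] for _, meta in hits],
    }

store = MiniStore()
store.add("Gitea repo: a LangChain agent for scanning notes", "https://git.example/agent")
store.add("Flatnotes entry: ideas for RAG pipelines", "https://notes.example/rag")
result = answer(store, "agent scanning")
print(result["links"][0])
```

In the real system, Chroma's similarity search replaces `MiniStore.search`, and DeepSeek generates the summary from the retrieved chunks instead of concatenating them.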
Tools & Stack Overview
| Category | Tool(s) | Purpose & Why Chosen | Integration Role |
|---|---|---|---|
| Core Framework | LangChain / LangGraph | Build agent, tools, chains, RAG logic. Modular, industry-standard for LLM apps. | Heart of agent & retrieval |
| Crawling/Extraction | Selenium / Playwright + Firecrawl (via LangChain loaders) | Handle auth/dynamic pages (Gitea login/nav), structured extraction (Markdown/JSON). | Scan web views & APIs |
| Vector Database | Chroma | Local, lightweight RAG store. Easy Docker setup, native LangChain integration. | Store embeddings for fast semantic search |
| LLM(s) | DeepSeek (via API) + Gemini / OpenCode | DeepSeek: cheap, strong reasoning (primary). Gemini/OpenCode: terminal trigger/fallback. | Reasoning & generation |
| Data Pipeline / Scheduling | Apache Airflow (Docker) | Industry-best for ETL/ETL-like scans (DAGs). Local install via official Compose. | Schedule periodic scans/updates to Chroma |
| Visual Prototyping | Flowise | No-code visual builder on LangChain. Quick agent/RAG prototyping & debugging. | Experiment with chains before code |
| Script/Workflow Orchestration | Windmill | Turn Python/LangChain scripts into reusable, scheduled flows. Dev-first, high growth. | Reactive workflows (e.g., on-commit triggers) |
| Event-Driven Automation | Activepieces | Connect services event-based (e.g., Gitea webhook → re-scan). AI-focused pieces. | Glue for reactive triggers |
High-Level Architecture & Flow
Ingestion Pipeline (Airflow + Crawlers)
- Airflow DAG runs on schedule (daily/weekly) or manually.
- Task 1: LangChain agent uses Selenium/Playwright tool to browse/authenticate to services (e.g., Gitea repos, Flatnotes/Trillium pages).
- Task 2: Firecrawl loader extracts structured content (text, code blocks, links, image URLs).
- Task 3: LangChain chunks and embeds the content (note: DeepSeek's API is chat-focused, so embeddings may require a separate model, e.g., a local sentence-transformers model), then upserts to the Chroma vector DB.
- Optional: Activepieces listens for events (e.g., Gitea push webhook) → triggers partial re-scan.
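Task 3 above can be sketched without the real libraries: fixed-size overlapping chunking (a simplified stand-in for LangChain's text splitters) plus an upsert keyed by stable chunk IDs, so re-scans overwrite rather than duplicate entries:

```python
import hashlib

def chunk(text, size=200, overlap=50):
    """Split text into overlapping character windows, like a basic text splitter."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def upsert(store, source_url, text):
    """Upsert chunks keyed by a deterministic ID so re-scans replace changed chunks."""
    for i, c in enumerate(chunk(text)):
        chunk_id = hashlib.sha256(f"{source_url}#{i}".encode()).hexdigest()[:16]
        store[chunk_id] = {"text": c, "source": source_url}
    return store

store = {}
upsert(store, "https://git.example/repo/README", "word " * 200)
print(len(store))
```

With Chroma, the same deterministic-ID pattern applies: passing stable `ids` to its upsert call makes scheduled re-scans idempotent instead of accumulating duplicates.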
Agent Runtime (LangChain/LangGraph + DeepSeek)
- Core agent (ReAct-style): receives a query (e.g., via terminal/OpenCode: opencode query "Giordano's top projects").
- Tools: Retrieve from Chroma (RAG), fetch specific pages/images if needed.
- LLM: DeepSeek for cost-effective reasoning/summarization. Fallback to Gemini if complex.
- Output: Natural response with summaries, links (e.g., Gitea repo URLs), embedded image previews (from scanned pages).
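A minimal sketch of the ReAct-style loop described above, with a keyword stub standing in for DeepSeek and the tool set trimmed to two placeholders (all names illustrative):

```python
# Minimal ReAct-style loop: the "LLM" is a stub that decides the next action;
# in the real system DeepSeek would produce the thought/action via LangGraph.

def retrieve_tool(query):
    # Placeholder for a Chroma similarity search.
    return "Found 3 Gitea repos matching: " + query

def fetch_page_tool(url):
    # Placeholder for fetching a specific page or image.
    return "Fetched page content from " + url

TOOLS = {"retrieve": retrieve_tool, "fetch_page": fetch_page_tool}

def stub_llm(query, observations):
    # Decide the next action; finish once we have retrieved evidence.
    if not observations:
        return ("retrieve", query)
    return ("final", "Summary based on: " + observations[-1])

def run_agent(query, max_steps=4):
    observations = []
    for _ in range(max_steps):
        action, arg = stub_llm(query, observations)
        if action == "final":
            return arg
        observations.append(TOOLS[action](arg))
    return "Gave up after max steps."

print(run_agent("Giordano's top projects"))
```

The `max_steps` cap mirrors the iteration limit LangChain agents use to prevent a reasoning loop from running indefinitely.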
Prototyping & Orchestration Layer
- Use Flowise to visually build/test agent chains/RAG flows before committing to code.
- Windmill wraps scripts (e.g., scan script) as jobs/APIs.
- Activepieces adds event-driven glue (e.g., new note in Trillium → notify/update DB).
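Windmill exposes a Python script's `main` function as a schedulable job or API; a hypothetical wrapper for the scan script might look like this (the scan logic itself is a placeholder):

```python
# Hypothetical Windmill job: Windmill runs a script's `main` function as a
# job/API with typed parameters. The scan logic here is a placeholder.

def scan_service(service_url: str) -> list[str]:
    # Placeholder for the real Selenium/Firecrawl scan of a service.
    return [f"{service_url}/repo-a", f"{service_url}/repo-b"]

def main(service_url: str = "https://git.example", full_rescan: bool = False) -> dict:
    pages = scan_service(service_url)
    return {"scanned": len(pages), "full_rescan": full_rescan, "pages": pages}

print(main()["scanned"])
```

An Activepieces flow (e.g., triggered by a Gitea push webhook) could then call this job's HTTP endpoint with `full_rescan=False` for an incremental update.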
Deployment & Running Locally
- Everything in Docker Compose: Airflow (official image), Chroma, Python services (LangChain agent), optional Flowise/Windmill containers.
- Secrets: Env vars for API keys (DeepSeek, service auth).
- Trigger: Terminal via OpenCode/Gemini CLI → calls agent endpoint/script.
- Scale: Start simple (manual scans), add Airflow scheduling later.
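The terminal trigger can be a small CLI that reads the DeepSeek key from an environment variable, per the secrets note above; the agent call itself is stubbed and all names are illustrative:

```python
import argparse
import os

def query_agent(question: str, api_key: str) -> str:
    # Placeholder: the real version would call the LangChain agent endpoint.
    return f"[agent answer to: {question}]"

def run(argv=None) -> str:
    parser = argparse.ArgumentParser(prog="aboutme", description="Query the profile agent")
    parser.add_argument("question", help="Employer-style question to answer")
    args = parser.parse_args(argv)
    api_key = os.environ.get("DEEPSEEK_API_KEY")
    if not api_key:
        raise SystemExit("Set DEEPSEEK_API_KEY in the environment (see Secrets).")
    return query_agent(args.question, api_key)

# Demo only: a real deployment sets the key outside the script.
os.environ.setdefault("DEEPSEEK_API_KEY", "demo-key")
print(run(["Summarize Giordano's coding projects"]))
```

OpenCode or the Gemini CLI would shell out to this script (or an equivalent HTTP endpoint) rather than embedding the agent logic themselves.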
Skill Showcase & Portfolio Value
- Demonstrates: Agentic AI, RAG pipelines, web crawling with auth, multi-tool orchestration, cost-optimized LLMs, local/self-hosted infra.
- Broad coverage: LangChain ecosystem + industry ETL (Airflow) + modern AI workflow tools (Flowise/Windmill/Activepieces).
- Low cost: DeepSeek keeps API bills minimal (often <$5/month even with frequent scans/queries).
Next Steps (Implementation Phases)
- Setup local Docker env + Chroma + DeepSeek API key.
- Build basic crawler tools (Selenium + Firecrawl) for Gitea/Flatnotes.
- Prototype agent in Flowise, then code in LangChain.
- Add Airflow DAG for scheduled ingestion.
- Integrate Windmill/Activepieces for extras.
- Test queries, refine summaries/links/images.
This setup positions you strongly for AI engineering roles while building real, integrated skills.
Extra Tools to Consider
- AutoMaker
- AutoCoder: assists with set-and-forget, long-running AI review sessions.
- OpenRouter: a single access point to many models from any CLI, with usage-based fees.
- Aider: CLI code and file editing, usable with OpenRouter to reach any model.
- Goose: integrates with the local system and with MCP servers such as ClawBot.