Experiments
This section documents experiments in progress or recently completed. Each entry describes what I was trying to do, what I set up, and what I observed.
Experiments may change direction, stall, or be abandoned. That is normal. The value is in the attempt and what it reveals, not in reaching a predefined outcome.
AI-guided runtime
Status: active
Context: Most AI-assisted development generates code that a human then integrates. I wanted to test a different model: an AI that operates inside a predefined runtime with strict boundaries, rather than producing code to be copy-pasted elsewhere.
What I tested: A constrained execution environment where AI agents can invoke pre-approved operations (read, transform, validate) but cannot modify the runtime itself. The human defines the perimeter. The AI works within it.
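To make the perimeter concrete, here is a minimal Python sketch of the idea (the operation names, the AgentRequest shape, and the registry API are illustrative assumptions, not the actual runtime): the human registers a small set of approved callables at setup time, and any agent request outside that registry fails loudly.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class AgentRequest:
    operation: str          # name of a pre-approved operation
    args: dict[str, Any]    # arguments proposed by the AI

class ConstrainedRuntime:
    def __init__(self) -> None:
        # The human defines the perimeter: only these operations exist.
        self._allowed: dict[str, Callable[..., Any]] = {}

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        """Called by the human at setup time, never by the agent."""
        self._allowed[name] = fn

    def execute(self, request: AgentRequest) -> Any:
        if request.operation not in self._allowed:
            # Out-of-perimeter calls fail loudly instead of degrading silently.
            raise PermissionError(f"operation not allowed: {request.operation}")
        return self._allowed[request.operation](**request.args)

# Human side: approve a small set of read/transform/validate operations.
runtime = ConstrainedRuntime()
runtime.register("validate", lambda text: len(text) > 0)

# Agent side: the AI can only ask for registered operations by name.
result = runtime.execute(AgentRequest("validate", {"text": "hello"}))
```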
What worked: Constraining the action space made the AI’s behavior more predictable and easier to audit. Failures surfaced as visible errors instead of subtle misbehavior.
What didn’t: Defining the right level of abstraction for the allowed operations took longer than expected. Too granular, and the AI could not compose useful actions. Too broad, and the constraints lost their purpose.
Composable module scaffolding
Status: paused
Context: I wanted to test whether a set of standardized modules (frontend, backend, infrastructure) could be scaffolded and then extended by AI-assisted workflows without losing structural coherence over time.
What I tested: A generator that produces a baseline project with clear module boundaries, naming conventions, and integration points. After scaffolding, I used AI tools to extend the modules — adding endpoints, views, and deployment configurations.
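A rough sketch of what the scaffolding step produces, in Python (the module names, file layout, and convention file are illustrative assumptions, not the real generator): each module gets an explicit place to declare its integration points, and the conventions live in the repository so later AI-assisted extensions can read them.

```python
from pathlib import Path

MODULES = ["frontend", "backend", "infrastructure"]

def scaffold(root: Path) -> None:
    """Create a baseline project with explicit module boundaries."""
    for module in MODULES:
        mod_dir = root / module
        mod_dir.mkdir(parents=True, exist_ok=True)
        # Each module declares its integration points in one predictable place.
        (mod_dir / "INTERFACE.md").write_text(
            f"# {module} integration points\n\n(list exported endpoints and contracts here)\n"
        )
    # Conventions are part of the scaffold, not tribal knowledge.
    (root / "CONVENTIONS.md").write_text(
        "- module directories: frontend/, backend/, infrastructure/\n"
        "- file names: snake_case\n"
        "- cross-module calls go through declared integration points only\n"
    )

scaffold(Path("example-project"))
```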
What worked: The initial scaffolding held up well. AI-assisted extensions stayed consistent when the naming conventions were explicit and enforced.
What didn’t: When I relaxed the conventions to allow more flexibility, the AI-generated code started drifting from the original structure within a few iterations. Conventions need to be constraints, not suggestions.
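One way to make a convention a constraint rather than a suggestion is a check that fails the build when the convention is violated. A sketch, assuming a snake_case file-naming rule on a Python codebase (both are illustrative choices, not the actual setup):

```python
import re
import sys
from pathlib import Path

SNAKE_CASE = re.compile(r"^[a-z0-9_]+\.py$")

def check_conventions(root: Path) -> list[str]:
    """Collect naming violations instead of merely warning about them."""
    violations = []
    for path in root.rglob("*.py"):
        if not SNAKE_CASE.match(path.name):
            violations.append(f"{path}: file name is not snake_case")
    return violations

if __name__ == "__main__":
    problems = check_conventions(Path("."))
    for p in problems:
        print(p)
    # A non-zero exit code makes the convention enforceable in CI, not advisory.
    sys.exit(1 if problems else 0)
```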
Human-in-the-loop agent for technical decisions
Status: active
Context: I wanted to explore agents that assist with technical decisions (architecture, dependency selection, deployment strategy) without making those decisions autonomously.
What I tested: An agent that receives a problem description, proposes two or three options with trade-offs, and waits for a human decision before proceeding. The agent does not act on its own.
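The interaction contract is small enough to sketch in Python (the Option shape and the propose() stub are placeholders for the real model call; nothing here is the actual agent): the agent only produces options, and execution stops until a human picks one.

```python
from dataclasses import dataclass

@dataclass
class Option:
    title: str
    tradeoffs: str

def propose(problem: str, project_context: str) -> list[Option]:
    """Placeholder: in the experiment this is an LLM call grounded in
    project-specific context (existing dependencies, past decisions)."""
    return [
        Option("Option A", "simpler, but adds a new dependency"),
        Option("Option B", "reuses the existing stack, more code to maintain"),
    ]

def decide(problem: str, project_context: str) -> Option:
    options = propose(problem, project_context)
    for i, opt in enumerate(options, start=1):
        print(f"{i}. {opt.title}: {opt.tradeoffs}")
    # The agent stops here: nothing proceeds without an explicit human choice.
    choice = int(input("Choose an option: "))
    return options[choice - 1]
```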
What worked: The quality of the proposals improved significantly when the agent had access to project-specific context (existing dependencies, past decisions, constraints). Generic proposals were rarely useful.
What didn’t: The interaction model is slower than fully autonomous agents. This is acceptable for architectural decisions but impractical for high-frequency tasks. The right boundary between autonomy and consultation is still unclear.
Notification-driven micro-app
Status: completed
Context: A small experiment to test whether a useful personal tool could be designed, built, and deployed in a single working session using AI assistance.
What I tested: A minimal app with a daily notification system. The goal was not the app itself but the workflow: from idea to deployed, functional software in one evening.
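The notification core itself is only a few lines. A sketch assuming Python and the third-party schedule package (the 09:00 time and the print() stand-in for a real notification are illustrative, not the deployed app):

```python
import time
import schedule  # third-party: pip install schedule

def notify() -> None:
    # In the experiment this was a device notification; print() stands in here.
    print("Daily reminder: check today's task.")

# Fire the notification once per day at a fixed local time.
schedule.every().day.at("09:00").do(notify)

while True:
    schedule.run_pending()
    time.sleep(60)
```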
What worked: The core loop (describe, generate, test, deploy) was smooth. AI handled boilerplate and configuration well. The constraint of a single session forced clear prioritization.
What didn’t: Edge cases around notification timing and device compatibility required manual debugging that the AI could not resolve from context alone. The last 10% of polish took disproportionate effort.