Universal Python Unit Test Generator AI Agent

August 2025

About

An AI-powered Python test generation agent that analyzes codebases, generates contextual unit tests, validates them, and auto-heals failures across multiple LLM backends.

Overview

Developed a Universal Python Unit Test Generator AI Agent that automatically analyzes Python codebases and generates comprehensive, context-aware unit tests. The system supports multiple LLM backends and adapts to different project types such as web applications, machine learning projects, data science workflows, scripts, and libraries.

The agent is designed to reduce manual testing effort, improve code reliability, and accelerate test creation for Python projects of any complexity.

Objectives

Automate unit test generation for Python codebases
Support multiple AI providers and local LLM execution
Generate tests based on project type and code structure
Validate and auto-heal generated tests
Improve developer productivity and test coverage

Tech Stack

Language: Python
Testing Frameworks: pytest, unittest
AI Backends: OpenAI GPT, Google Gemini, Anthropic Claude, Ollama
Code Analysis: AST-based parsing
Async Processing: Python async workflows
Tooling: CLI, environment variables, caching, logging

Key Features

Multi-LLM backend support with OpenAI, Gemini, Claude, and Ollama
AST-based code analysis for deep understanding of modules, functions, and classes
Automatic project type detection for web, ML, data science, scripts, and libraries
Context-aware unit test generation with positive cases, edge cases, and error handling
Framework-specific test patterns for Flask, Django, FastAPI, TensorFlow, scikit-learn, and more
Syntax validation and import resolution before test creation
Auto-healing mechanism to fix common test generation issues
Asynchronous processing for large codebases
Intelligent caching to reduce repeated API calls
Exponential backoff and retry handling for LLM API failures
CLI support with interactive and direct execution modes
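
To make the "syntax validation and import resolution" feature above concrete, here is a minimal sketch using only the standard library. The function names `check_syntax` and `unresolved_imports` are hypothetical and not taken from the project; the agent's actual validation layer may work differently.

```python
import ast
import importlib.util
from typing import Optional


def check_syntax(source: str) -> Optional[str]:
    """Return None if the source parses, otherwise a short error message."""
    try:
        ast.parse(source)
    except SyntaxError as exc:
        return f"line {exc.lineno}: {exc.msg}"
    return None


def unresolved_imports(source: str) -> list[str]:
    """List top-level packages imported by the source that cannot be located.

    Assumption: relative imports are skipped, because resolving them needs
    the package context of the generated test file.
    """
    missing: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue
        for name in names:
            top_level = name.split(".")[0]
            # find_spec returns None when a top-level module is not installed.
            if importlib.util.find_spec(top_level) is None and top_level not in missing:
                missing.append(top_level)
    return missing
```

A generated test file would be written to disk only if `check_syntax` returns `None` and `unresolved_imports` returns an empty list; anything else could be handed to the auto-healing step.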

Architecture / Design

Designed the system as a modular AI agent pipeline:

Codebase Input → AST Analysis → Project Type Detection → Dependency Mapping → LLM Prompt Generation → Test Generation → Validation → Auto-Healing → Test Output
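
As an illustration of one stage in the pipeline above, project type detection could be approximated by counting which well-known packages a codebase imports. This is a sketch under stated assumptions: the hint table, the `data_science` entries, and the `"generic"` fallback are invented for the example and are not the project's actual rules.

```python
import ast

# Hypothetical hint table: import names that suggest a project type.
FRAMEWORK_HINTS = {
    "web": {"flask", "django", "fastapi"},
    "ml": {"tensorflow", "sklearn"},
    "data_science": {"pandas", "numpy"},
}


def detect_project_type(sources: list[str]) -> str:
    """Guess a project type from the top-level packages the sources import."""
    imported: set[str] = set()
    for source in sources:
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.Import):
                imported.update(alias.name.split(".")[0] for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                imported.add(node.module.split(".")[0])

    # Score each type by how many of its hint packages appear.
    scores = {kind: len(imported & hints) for kind, hints in FRAMEWORK_HINTS.items()}
    best = max(scores, key=lambda kind: scores[kind])
    return best if scores[best] > 0 else "generic"
```

On a tie, the first type in `FRAMEWORK_HINTS` wins; a real detector would likely also look at file layout and configuration files.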

The architecture supports interchangeable LLM backends, allowing developers to choose cloud-based models or local Ollama models depending on privacy, cost, and performance requirements.

Implementation

Implemented AST-based parsing to extract functions, classes, imports, and module metadata
Built a strategy-based test generation flow based on detected project type
Integrated multiple LLM providers through a unified backend interface
Added validation layers for syntax checking, import resolution, and execution readiness
Implemented caching to reduce duplicate LLM requests by 60–80%
Added retry logic with configurable timeouts for reliable API usage
Built CLI options for model selection, repository path, framework choice, dry-run mode, and verbose logging
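
The AST-based extraction step listed above can be sketched with the standard `ast` module. `ModuleSummary` and `summarize_module` are hypothetical names chosen for this example, and the sketch only looks at top-level definitions; the project's real analyzer presumably collects more metadata.

```python
import ast
from dataclasses import dataclass, field


@dataclass
class ModuleSummary:
    """Names a test generator would need in order to write a prompt."""

    functions: list[str] = field(default_factory=list)
    classes: dict[str, list[str]] = field(default_factory=dict)  # class -> methods
    imports: list[str] = field(default_factory=list)


def summarize_module(source: str) -> ModuleSummary:
    """Collect top-level functions, classes with their methods, and imports."""
    summary = ModuleSummary()
    function_types = (ast.FunctionDef, ast.AsyncFunctionDef)
    for node in ast.parse(source).body:  # top-level statements only
        if isinstance(node, function_types):
            summary.functions.append(node.name)
        elif isinstance(node, ast.ClassDef):
            summary.classes[node.name] = [
                item.name for item in node.body if isinstance(item, function_types)
            ]
        elif isinstance(node, ast.Import):
            summary.imports.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # Keep leading dots so relative imports stay recognizable.
            summary.imports.append("." * node.level + (node.module or ""))
    return summary
```

The resulting summary can be serialized into the LLM prompt so the model sees exactly which callables exist, instead of guessing from raw source text.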

Outcomes

Reduced manual unit test writing effort by more than 80%
Supported Python projects ranging from small scripts to enterprise-scale codebases
Generated tests for positive flows, edge cases, and exception scenarios
Improved developer productivity by automating repetitive testing workflows
Enabled privacy-friendly test generation using local Ollama models

Performance Highlights

Small projects: ~30 seconds test generation time
Medium projects: ~2 minutes average generation time
Large projects: ~8 minutes average generation time
API caching reduces repeated LLM calls by 60–80%
Supports concurrent processing for faster test generation
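
The caching idea behind the figure above can be illustrated with a small in-memory cache keyed by a hash of the model name and prompt. This is a sketch, not the project's implementation: `PromptCache` is a hypothetical name, the store is a plain dictionary rather than anything persistent, and the hit rate it achieves depends entirely on how often identical prompts recur.

```python
import hashlib
from typing import Callable


class PromptCache:
    """Wrap a model call so identical (model, prompt) pairs are sent only once."""

    def __init__(self, call_model: Callable[[str], str]) -> None:
        self._call_model = call_model
        self._store: dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def generate(self, model: str, prompt: str) -> str:
        # Include the model name in the key so different models never share replies.
        key = hashlib.sha256(f"{model}\n{prompt}".encode("utf-8")).hexdigest()
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        response = self._call_model(prompt)
        self._store[key] = response
        return response
```

Tracking `hits` and `misses` makes it possible to measure the reduction in API calls for a given run instead of assuming it.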

Impact

This project demonstrates how AI agents can assist developers in writing reliable and maintainable software. By combining code analysis, LLM reasoning, validation, and auto-healing, the system creates a practical developer productivity tool for modern Python teams.

The project showcases:

AI-assisted software engineering
Developer tooling
Test automation
Multi-LLM orchestration
Code intelligence and static analysis

Scalability & Reliability

Async processing for concurrent file handling
Configurable retry and timeout policies
Local LLM support for private codebases
Dry-run mode for safe validation
Exclusion patterns for large repositories
Logging and diagnostics for troubleshooting
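
Several of the points above (concurrent file handling, retry and timeout policies, exclusion patterns) can be combined in one short `asyncio` sketch. All names and default values here (`with_retries`, `process_files`, four attempts, a 0.5 second base delay) are assumptions for illustration, and `handler` stands in for whatever coroutine actually calls the LLM.

```python
import asyncio
import fnmatch
import random
from typing import Awaitable, Callable, Iterable


async def with_retries(
    operation: Callable[[], Awaitable[str]],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    timeout: float = 30.0,
) -> str:
    """Run operation with a per-attempt timeout and exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return await asyncio.wait_for(operation(), timeout=timeout)
        except (asyncio.TimeoutError, ConnectionError):
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt; a little jitter avoids synchronized retries.
            delay = base_delay * (2 ** attempt)
            await asyncio.sleep(delay + random.uniform(0, delay / 10))
    raise AssertionError("unreachable")


async def process_files(
    paths: Iterable[str],
    handler: Callable[[str], Awaitable[str]],
    max_concurrency: int = 4,
    exclude: Iterable[str] = (),
    base_delay: float = 0.5,
) -> dict[str, str]:
    """Process non-excluded paths concurrently, at most max_concurrency at a time."""
    patterns = list(exclude)
    selected = [
        p for p in paths if not any(fnmatch.fnmatch(p, pat) for pat in patterns)
    ]
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(path: str) -> tuple[str, str]:
        async with semaphore:
            result = await with_retries(lambda: handler(path), base_delay=base_delay)
            return path, result

    return dict(await asyncio.gather(*(worker(p) for p in selected)))
```

The semaphore caps the number of in-flight requests, which matters for rate-limited cloud APIs as much as for a local model with limited memory.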

Key Learnings

Building AI agents for software engineering workflows
Designing multi-provider LLM abstractions
Static analysis using Python AST
Test validation and auto-healing strategies
Developer experience design through CLI tooling

Role

AI Tooling / Python Developer
