PROMPT · coding

Codebase Archaeology

A systematic method for understanding an unfamiliar codebase fast. Instead of reading code top-to-bottom, follow the data flow: entry points → routing → business logic → data layer → external integrations. Produces an architecture map, a list of key abstractions, and a "where to look" guide in about 30 minutes. The method works for any stack; the example commands assume a Node/TypeScript project.

exploration · onboarding · architecture · understanding · documentation



The Problem

You've been dropped into an unfamiliar codebase. Reading every file isn't realistic. You need to understand the architecture, the key patterns, and where to make changes, and you need to do it fast.

The Method: Follow the Data

Don't read code top-to-bottom. Follow how data flows through the system:

Layer 1: Entry Points (5 min)

Where does the outside world touch this system?

Find the main entry points:
- package.json scripts (start, dev, build)
- Dockerfile / docker-compose.yml
- Route definitions (Next.js App Router, Express routes, API handlers)
- Event listeners / queue consumers
- Cron jobs / scheduled tasks

Ask: What triggers this system to do something?
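The entry-point sweep can be scripted as a first pass. A minimal sketch, assuming a Node/TypeScript project: `find_entry_points` is a hypothetical helper, and its grep patterns are guesses at common scheduler and queue APIs (node-cron's `cron.schedule`, BullMQ's `new Worker`, amqplib's `channel.consume`, generic `.subscribe(` calls). Treat the matches as leads to read, not a complete inventory.

```shell
# Hypothetical helper: list likely entry points under a project root.
# File names and grep patterns are assumptions for a Node/TypeScript project.
find_entry_points() {
  local root="${1:-.}"
  # Scripts and container definitions, skipping installed dependencies
  find "$root" -path '*/node_modules' -prune -o \
    \( -name 'package.json' -o -name 'Dockerfile' -o -name 'docker-compose.yml' \) -print
  # Source files that appear to register cron jobs or queue consumers
  grep -rlE --include='*.ts' --exclude-dir=node_modules \
    'cron\.schedule\(|new Worker\(|\.consume\(|\.subscribe\(' "$root" || true
}
```

Usage: `find_entry_points .` from the repository root.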

Layer 2: Routing and Middleware (5 min)

How do requests get to business logic?

- Auth middleware (who can access what)
- Validation layer (what input is accepted)
- Route → Controller/Handler mapping
- Error handling boundaries

Ask: What happens between "request arrives" and "business logic runs"?
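Express runs middleware in the order it is registered, and it treats a function with four arguments `(err, req, res, next)` as an error handler. Grepping for both gives a rough picture of the chain and its error boundaries. A minimal sketch, assuming an Express app in TypeScript whose error handlers name their first parameter `err`; both function names are hypothetical.

```shell
# Hypothetical helper: print middleware registrations with file and line number.
# Assumes Express-style app.use(...) / router.use(...) calls in .ts files.
list_middleware() {
  local root="${1:-.}"
  grep -rnE --include='*.ts' --exclude-dir=node_modules \
    '(app|router)\.use\(' "$root" || true
}

# Hypothetical helper: find four-argument error handlers (err, req, res, next).
# Assumes the first parameter is named err; other names will be missed.
list_error_handlers() {
  local root="${1:-.}"
  grep -rnE --include='*.ts' --exclude-dir=node_modules \
    '\(err[^,)]*, *req[^,)]*, *res[^,)]*, *next' "$root" || true
}
```

Reading the `list_middleware` output top to bottom within one file approximates the order a request passes through.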

Layer 3: Business Logic (10 min)

Where are the important decisions made?

- Service layer / use cases / domain logic
- Key domain models and their relationships
- State machines / workflow definitions
- Business rules and validations

Ask: What are the 3-5 most important concepts in this domain?
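One heuristic for spotting the core concepts: the internal modules imported from the most places tend to be the central domain types. A minimal sketch, assuming ES-module `import ... from "./path"` syntax in `.ts` files; `most_imported` is hypothetical. Because it collapses each path to its last segment, different modules that share a name (such as `index` or `types`) are counted together.

```shell
# Hypothetical helper: rank internal modules by how many import statements reference them.
# Assumes ES-module imports in .ts files; only relative paths ("./", "../") are counted.
most_imported() {
  local root="${1:-.}" limit="${2:-10}"
  grep -rhoE --include='*.ts' --exclude-dir=node_modules \
    "from ['\"]\\.[^'\"]+['\"]" "$root" |
    sed -e 's/^from //' | tr -d "\"'" |  # strip the keyword and quotes
    awk -F/ '{print $NF}' |              # ../domain/order -> order
    sed -e 's/\.[jt]s$//' |              # order.js -> order
    sort | uniq -c | sort -rn | head -n "$limit"
}
```

Usage: `most_imported src 5` lists five candidates for the Domain Model section.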

Layer 4: Data Layer (5 min)

How is state persisted?

- Database schema / migrations
- ORM models / repositories
- Cache layers
- File storage

Ask: What's the source of truth? Where does important data live?
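Listing model names is a fast way to see where important data lives. A minimal sketch, assuming the project uses Prisma (`model Name {` blocks in a `.prisma` schema) or TypeORM (an `@Entity()` decorator on a class); `list_models` is hypothetical, and other ORMs need their own patterns.

```shell
# Hypothetical helper: list model/entity names from Prisma schemas and TypeORM entities.
# Both patterns are assumptions; Sequelize, Drizzle, Django, etc. need different ones.
list_models() {
  local root="${1:-.}"
  {
    # Prisma schema blocks look like: model Order {
    grep -rhE --include='*.prisma' --exclude-dir=node_modules \
      '^model [A-Za-z0-9_]+' "$root" | awk '{print $2}'
    # TypeORM: an @Entity(...) line, then the class declaration on the next line
    grep -rhE -A1 --include='*.ts' --exclude-dir=node_modules '@Entity\(' "$root" |
      sed -nE 's/.*class ([A-Za-z0-9_]+).*/\1/p'
  } | sort -u
}
```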

Layer 5: External Integrations (5 min)

What other systems does this talk to?

- API clients / SDKs
- Message queues / event buses
- Third-party services (payment, email, auth)
- External database connections

Ask: What breaks if external system X goes down?
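Most integrations are configured through environment variables, so the variables the code reads are a good index of what it depends on. A minimal sketch, assuming a Node/TypeScript codebase that reads config as `process.env.NAME`; both function names are hypothetical. The second one diffs the result against `.env.example` to surface dependencies nobody documented.

```shell
# Hypothetical helper: environment variables the code reads via process.env.NAME.
# Bracket access such as process.env["NAME"] is not matched by this pattern.
list_env_vars() {
  local root="${1:-.}"
  grep -rhoE --include='*.ts' --exclude-dir=node_modules \
    'process\.env\.[A-Z_][A-Z0-9_]*' "$root" |
    sed -e 's/^process\.env\.//' | sort -u
}

# Hypothetical helper: variables read in code but not declared in .env.example.
undocumented_env_vars() {
  local root="${1:-.}" documented
  documented=$(mktemp)
  # Keep NAME from NAME=value lines; comments and blank lines are skipped
  sed -nE 's/^([A-Z_][A-Z0-9_]*)=.*/\1/p' "$root/.env.example" | sort -u > "$documented"
  # comm -23 prints lines that appear only in the first (sorted) input
  list_env_vars "$root" | comm -23 - "$documented"
  rm -f "$documented"
}
```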

Archaeology Commands

Run these to build a quick mental map:

# Project structure overview
find . -path ./node_modules -prune -o -type f -name "*.ts" -print | head -50
ls -la src/

# Dependencies — what libraries matter
jq '.dependencies' package.json

# Database schema — what data exists
find . -path ./node_modules -prune -o \( -name "*.migration.*" -o -path "*/migrations/*" -o -name "schema.*" \) -type f -print

# Routes — what the API surface looks like
grep -rnE --include="*.ts" --exclude-dir=node_modules "(router|app)\.(get|post|put|patch|delete)\(" .

# Environment — what config this needs
cat .env.example

# Tests — what's actually tested (reveals what's important)
find . -path ./node_modules -prune -o \( -name "*.test.*" -o -name "*.spec.*" \) -print

# Git log — what changed recently (reveals active areas)
git log --oneline -20
git log --oneline --since="2 weeks ago" -- src/
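The `--oneline` log shows which commits landed, not which files they touched. Counting file names across recent commits turns it into a ranked list of hotspots. A minimal sketch; `hot_files` is a hypothetical name, and its defaults (`src/`, last two weeks) simply mirror the command above.

```shell
# Hypothetical helper: rank files by how many recent commits touched them.
hot_files() {
  local since="${1:-2 weeks ago}" path="${2:-src/}"
  # --format= suppresses commit headers, leaving only the names from --name-only
  git log --since="$since" --name-only --format= -- "$path" |
    grep -v '^$' | sort | uniq -c | sort -rn | head -n 20
}
```

Usage: `hot_files "1 month ago" lib/` widens the window and points it at a different directory.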

Output: Architecture Map

After the archaeology, produce:

## Architecture Map: {project name}

### Stack
- Runtime: {Node 22 / Python 3.12 / ...}
- Framework: {Next.js / Express / Django / ...}
- Database: {PostgreSQL / MongoDB / ...}
- Key libraries: {top 5 dependencies that shape the architecture}

### Entry Points
- {Web: Next.js app router at /app}
- {API: REST at /api, {N} endpoints}
- {Background: {queue/cron description}}

### Domain Model (top 5 concepts)
1. {User} — {what it represents, key relationships}
2. {Order} — ...

### Key Patterns
- {Repository pattern for data access}
- {Service layer for business logic}
- {Middleware chain for auth/validation}

### Where to Look
- To add a new API endpoint: {path}
- To change business logic for X: {path}
- To modify the database schema: {path}
- To add a new background job: {path}

### Gotchas
- {Things that aren't obvious from the code structure}
- {Implicit conventions the team follows}
- {Known technical debt or fragile areas}

Anti-Patterns

  • Reading every file (you'll drown before you understand)
  • Starting with tests (tests tell you what the code does, but not why or how it fits together)
  • Asking "what does this codebase do" without narrowing the question
  • Ignoring git history (recent commits reveal the active areas)
  • Skipping .env.example (config tells you what the system depends on)