---
name: codebase-archaeology
title: Codebase Archaeology
description: "Systematic method for understanding an unfamiliar codebase fast. Instead of reading code top-to-bottom, follow the data flow: entry points → routing → business logic → data layer → external integrations. Produces an architecture map, key abstractions list, and \"where to look\" guide in about 30 minutes. Works for any stack."
category: coding
tags:
  - exploration
  - onboarding
  - architecture
  - understanding
  - documentation
source: https://madejski.ai/promptoteka/codebase-archaeology
locale: en
license: MIT
---

# Codebase Archaeology

## The Problem

You've been dropped into an unfamiliar codebase. Reading every file is impossible. You need to understand the architecture, key patterns, and where to make changes — fast.

## The Method: Follow the Data

Don't read code top-to-bottom. Follow how data flows through the system:

### Layer 1: Entry Points (5 min)
Where does the outside world touch this system?

```
# Find the main entry points
- package.json scripts (start, dev, build)
- Dockerfile / docker-compose.yml
- Route definitions (app router, express routes, API handlers)
- Event listeners / queue consumers
- Cron jobs / scheduled tasks
```

**Ask:** What triggers this system to do something?
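
Most of the files in the list above can be located with a single `find`. This is a rough sketch: `find_entry_points` is a hypothetical helper name, and the file-name patterns are common conventions that a given repository may not follow.

```bash
# List files that usually define how a project starts or gets deployed.
# Skips node_modules and .git so vendored manifests don't drown the results.
find_entry_points() {
  local root="${1:-.}"
  find "$root" \
    -type d \( -name node_modules -o -name .git \) -prune -o \
    -type f \( -name package.json -o -name 'Dockerfile*' \
               -o -name 'docker-compose*.yml' -o -name 'crontab*' \) -print |
    sort
}
```

Run `find_entry_points .` from the repository root and open each hit. Route definitions and queue consumers live in source files, so they still need a separate search.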

### Layer 2: Routing and Middleware (5 min)
How do requests get to business logic?

```
- Auth middleware (who can access what)
- Validation layer (what input is accepted)
- Route → Controller/Handler mapping
- Error handling boundaries
```

**Ask:** What happens between "request arrives" and "business logic runs"?
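
For Express-style applications, middleware registration tends to go through `app.use(...)` or `router.use(...)`, so a search for those calls gives a first view of the request pipeline. The sketch below assumes that convention and TypeScript or JavaScript sources; `list_middleware` is a hypothetical helper name, and other frameworks need a different pattern.

```bash
# Print every app.use(...) / router.use(...) call with file and line number.
# Within one file, line order is registration order, which is usually execution order.
list_middleware() {
  local root="${1:-.}"
  grep -rnE '(app|router)\.use\(' \
    --include='*.ts' --include='*.js' \
    --exclude-dir=node_modules "$root"
}
```

`list_middleware src` prints lines in `path:line:code` form. An empty result usually means the framework registers middleware some other way, for example through decorators or a config file.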

### Layer 3: Business Logic (10 min)
Where are the important decisions made?

```
- Service layer / use cases / domain logic
- Key domain models and their relationships
- State machines / workflow definitions
- Business rules and validations
```

**Ask:** What are the 3-5 most important concepts in this domain?
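
One rough way to shortlist candidate concepts is to rank exported classes by how many files mention them. This is a heuristic sketch, not a definitive measure: it assumes TypeScript sources that use `export class`, and `rank_domain_classes` is a hypothetical helper name.

```bash
# Rank exported class names by the number of .ts files that mention them.
# Output: "<file count> <class name>", top five, most referenced first.
rank_domain_classes() {
  local root="${1:-.}" name count
  grep -rhoE 'export (default )?class [A-Za-z_][A-Za-z0-9_]*' \
    --include='*.ts' --exclude-dir=node_modules "$root" |
    awk '{ print $NF }' | sort -u |
    while read -r name; do
      count=$(grep -rlw "$name" --include='*.ts' --exclude-dir=node_modules "$root" |
        wc -l | tr -d ' ')
      printf '%s %s\n' "$count" "$name"
    done |
    sort -rn | head -n 5
}
```

Treat the ranking as a reading list. Widely referenced utility classes (loggers, HTTP wrappers) will score high without being domain concepts, so filter those out by hand.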

### Layer 4: Data Layer (5 min)
How is state persisted?

```
- Database schema / migrations
- ORM models / repositories
- Cache layers
- File storage
```

**Ask:** What's the source of truth? Where does important data live?
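
When migrations are written as plain SQL, the table names alone give a quick inventory of what is persisted. The sketch below assumes `.sql` migration files with `CREATE TABLE` statements; `list_tables` is a hypothetical helper name, and ORM-generated schemas (Prisma, Django models, and so on) need a different search.

```bash
# Extract table names from CREATE TABLE statements in .sql files.
# Handles optional IF NOT EXISTS and double-quoted identifiers.
list_tables() {
  local root="${1:-.}"
  grep -rhoiE 'create table (if not exists )?"?[a-z_][a-z0-9_.]*' \
    --include='*.sql' --exclude-dir=node_modules "$root" |
    awk '{ print $NF }' | tr -d '"' | sort -u
}
```

`list_tables migrations` prints one table name per line. Tables that were later dropped or renamed still appear, so check the result against the live schema.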

### Layer 5: External Integrations (5 min)
What other systems does this talk to?

```
- API clients / SDKs
- Message queues / event buses
- Third-party services (payment, email, auth)
- External database connections
```

**Ask:** What breaks if external system X goes down?
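
Configuration often names the external systems before the code does. As a sketch, the function below pulls variable names out of an env template when they look like endpoints or credentials. The suffix list (`URL`, `HOST`, `KEY`, `TOKEN`, `SECRET`, `DSN`) is an assumed naming convention, and `list_external_config` is a hypothetical helper name.

```bash
# Print env variable names that look like external endpoints or credentials.
# Defaults to .env.example in the current directory.
list_external_config() {
  local env_file="${1:-.env.example}"
  grep -E '^[A-Z0-9_]*(URL|HOST|KEY|TOKEN|SECRET|DSN)[A-Z0-9_]*=' "$env_file" |
    cut -d= -f1 | sort
}
```

Each name it prints is a dependency to trace: find where the variable is read, then which client or SDK uses it.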

## Archaeology Commands

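Run these to build a quick mental map. The examples assume a Node.js/TypeScript project; swap the file globs and the manifest (`package.json`) for your stack: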

```bash
# Project structure overview
find . -type f -name "*.ts" -not -path "*/node_modules/*" | head -50
ls -la src/

# Dependencies — what libraries matter
jq '.dependencies' package.json

# Database schema — what data exists
find . \( -name "*.migration.*" -o -name "schema.*" \) -not -path "*/node_modules/*"

# Routes — what the API surface looks like
grep -rnE "router\.|app\.(get|post|put|delete)" --include="*.ts" --exclude-dir=node_modules .

# Environment — what config this needs
cat .env.example

# Tests — what's actually tested (reveals what's important)
find . \( -name "*.test.*" -o -name "*.spec.*" \) -not -path "*/node_modules/*"

# Git log — what changed recently (reveals active areas)
git log --oneline -20
git log --oneline --since="2 weeks ago" -- src/
```
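
The git log commands show recent commits, but a per-file change count is often more useful for spotting active areas. The sketch below ranks paths by how often they appear in a list of changed files read from standard input; `rank_churn` is a hypothetical helper name.

```bash
# Read changed file paths from stdin (one per line, blank lines allowed)
# and print the N most frequently changed paths with their counts.
rank_churn() {
  grep -v '^$' | sort | uniq -c | sort -rn | head -n "${1:-10}"
}

# Usage (run inside a git repository):
#   git log --since="3 months ago" --pretty=format: --name-only | rank_churn 10
```

Files that change often are usually where the business logic lives or where the design is under strain. Either way they are worth reading early.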

## Output: Architecture Map

After the archaeology, produce:

```markdown
## Architecture Map: {project name}

### Stack
- Runtime: {Node 22 / Python 3.12 / ...}
- Framework: {Next.js / Express / Django / ...}
- Database: {PostgreSQL / MongoDB / ...}
- Key libraries: {top 5 dependencies that shape the architecture}

### Entry Points
- {Web: Next.js app router at /app}
- {API: REST at /api, {N} endpoints}
- {Background: {queue/cron description}}

### Domain Model (top 5 concepts)
1. {User} — {what it represents, key relationships}
2. {Order} — ...

### Key Patterns
- {Repository pattern for data access}
- {Service layer for business logic}
- {Middleware chain for auth/validation}

### Where to Look
- To add a new API endpoint: {path}
- To change business logic for X: {path}
- To modify the database schema: {path}
- To add a new background job: {path}

### Gotchas
- {Things that aren't obvious from the code structure}
- {Implicit conventions the team follows}
- {Known technical debt or fragile areas}
```

## Anti-Patterns

- Reading every file (you'll drown before you understand)
- Starting with tests (tests tell you *what* matters but not *why* or *how it fits together*; skim them after you have traced the data flow)
- Asking "what does this codebase do" without narrowing the question
- Ignoring git history (recent commits reveal the active areas)
- Skipping .env.example (config tells you what the system depends on)
