# The Knowledge PR: Structured Learning Extraction

Created: 2025-07-29
Context: How Knowledge PRs work - the key mechanism for extracting learning from code
## What is a Knowledge PR?

A Knowledge PR is a pull request to the knowledge branch (usually `main`) that extracts and documents learnings from experiments or implementations. Unlike code PRs that integrate features, Knowledge PRs integrate understanding.
## Anatomy of a Knowledge PR

### Title Format

`Extract learnings from [branch-type]/[branch-name]`

Examples:

- `Extract learnings from experiment/redis-cache`
- `Extract learnings from impl/search-v2`
- `Extract learnings from hotfix/session-timeout`
### PR Description Template

```markdown
# Knowledge PR: [Branch Name]

## Summary
Brief overview of what was attempted and why.

## Hypotheses Tested
- [ ] Hypothesis 1: [What you thought would happen]
  - Result: [What actually happened]
  - Evidence: [Specific metrics, errors, or observations]

## Key Discoveries

### Constraints Found
- [System/tool] has limit of [specific number] because [reason]
- Performance degrades when [condition] due to [cause]

### Patterns Identified
- Pattern name: [Descriptive name]
  - Context: [When to use]
  - Solution: [What to do]
  - Trade-offs: [Costs and benefits]

### New Problems Uncovered
- [Problem]: [Description] #problem:[tag]
- [Problem]: [Description] #problem:[tag]

## Outcome

### For Experiments
**Result**: Success/Partial Success/Failed/Abandoned
**Learning Value**: High/Medium/Low
**Follow-up**: [Next experiments suggested]

### For Implementations
**Decision**: Proceeding to release / Needs rework / Abandoning
**Release Branch**: impl/feature → release/v2.1 (if applicable)
**Major Changes from Plan**: [What changed during implementation]

## Knowledge Artifacts Created/Updated

### New Files
- `knowledge/patterns/[pattern-name].md` - [Description]
- `knowledge/constraints/[constraint-name].md` - [Description]

### Updated Files
- `knowledge/learnings/[topic].md` - Added [what]
- `experiments.yaml` - Updated with outcome

## Reusable Insights
1. **For Future Developers**: [Key takeaway]
2. **For Architecture Decisions**: [Key insight]
3. **For Similar Problems**: [Pattern or anti-pattern]
```
## Types of Knowledge PRs

### 1. Experiment Knowledge PR

Focus on hypothesis validation:
```markdown
# Knowledge PR: experiment/websocket-scaling

## Summary
Tested WebSocket server scaling to understand connection limits.

## Hypotheses Tested
- ✅ Can handle 50k connections per server
  - Result: Failed at exactly 50k
  - Evidence: OOM error, each connection uses ~100KB
- ❌ Linear performance scaling
  - Result: Performance drops at 30k connections
  - Evidence: CPU spike, GC pressure

## Key Discoveries

### Hard Limits
- Max connections: 50,000 (memory bound)
- Optimal range: 20,000-30,000
- Formula: `max_conn = available_memory / 100KB`

### New Pattern
**Pattern**: Connection Pooling with Backpressure
- Return 503 when connections > 45k
- Prevents cascade failures
- Allows graceful degradation
```
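The formula and backpressure rule above can be expressed as a small admission guard. A minimal sketch, assuming the numbers from the experiment (~100KB per connection, 45k threshold); the function names are illustrative:

```javascript
// Sketch of "connection pooling with backpressure": derive the memory-bound
// ceiling from the measured per-connection cost, then shed load with 503s
// well before the hard limit so the server degrades instead of crashing.
const BYTES_PER_CONNECTION = 100 * 1024; // ~100KB measured per connection

function maxConnections(availableMemoryBytes) {
  return Math.floor(availableMemoryBytes / BYTES_PER_CONNECTION);
}

function admitConnection(currentConnections, backpressureThreshold = 45000) {
  // Reject early with 503 Service Unavailable instead of OOMing at 50k.
  if (currentConnections >= backpressureThreshold) {
    return { admitted: false, status: 503 };
  }
  return { admitted: true, status: 101 }; // 101 Switching Protocols
}
```

With 5GB available, `maxConnections` lands close to the observed 50k ceiling, which is what makes the 45k shedding threshold a sensible safety margin.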
### 2. Implementation Knowledge PR

Focus on integration challenges and real-world complexities:
````markdown
# Knowledge PR: impl/payment-system-v2

## Summary
Implemented new payment system integrating Stripe, PayPal, and crypto.

## Initial Assumptions vs Reality

### Assumed: Single webhook endpoint would work
**Reality**: Each provider has different retry/timeout behavior
**Solution**: Provider-specific endpoints with shared processing

### Assumed: Transactions are atomic
**Reality**: Network failures create partial states
**Solution**: Saga pattern with compensation

## Patterns Developed

### Payment Provider Abstraction

```typescript
interface PaymentProvider {
  // ChargeResult is illustrative; each adapter defines its own result type
  charge(): Promise<ChargeResult>
  webhook(): WebhookHandler
}
```

Works for all providers with provider-specific adapters.

## Integration Challenges
1. **Webhook deduplication**: Providers send duplicates
   - Solution: 24-hour cache of webhook IDs
2. **Currency precision**: Mixing cents/dollars/satoshis
   - Solution: Everything in smallest unit internally
3. **Testing payments**: Can't test production flows
   - Solution: Record/replay test framework
````
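The deduplication fix in challenge 1 can be sketched as a time-bounded ID cache. A minimal sketch assuming the 24-hour window from the example; the class name and injectable clock are illustrative:

```javascript
// Sketch of webhook deduplication: remember seen webhook IDs for 24 hours
// and drop replays. A Map preserves insertion order, so expired entries
// can be pruned from the front in one pass.
const DEDUP_WINDOW_MS = 24 * 60 * 60 * 1000;

class WebhookDeduper {
  constructor(now = Date.now) {
    this.now = now;        // injectable clock, handy for testing
    this.seen = new Map(); // webhookId -> timestamp first seen
  }

  isDuplicate(webhookId) {
    this.prune();
    if (this.seen.has(webhookId)) return true;
    this.seen.set(webhookId, this.now());
    return false;
  }

  prune() {
    const cutoff = this.now() - DEDUP_WINDOW_MS;
    for (const [id, ts] of this.seen) {
      if (ts >= cutoff) break; // insertion order == time order
      this.seen.delete(id);
    }
  }
}
```

A real deployment would back this with shared storage (e.g. Redis with a TTL) so deduplication survives restarts and works across instances.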
### 3. Hotfix Knowledge PR

Focus on root cause and prevention:

````markdown
# Knowledge PR: hotfix/memory-leak

## Summary
Emergency fix for production memory leak causing hourly crashes.

## Root Cause Analysis

### Symptom
Node process memory growing 100MB/hour

### Investigation Path
1. Heap dumps showed EventEmitter listeners
2. Traced to WebSocket connection handler
3. Found missing `removeListener` on disconnect

### Root Cause

```javascript
// Before (leaking)
ws.on('message', this.handleMessage.bind(this))

// After (fixed)
const handler = this.handleMessage.bind(this)
ws.on('message', handler)
ws.on('close', () => ws.off('message', handler))
```

## Prevention Pattern

**Pattern**: Always Pair Event Listeners

```javascript
class SafeEventHandler {
  constructor() {
    this.cleanup = []
  }
  listen(emitter, event, handler) {
    emitter.on(event, handler)
    this.cleanup.push(() => emitter.off(event, handler))
  }
  destroy() {
    this.cleanup.forEach(fn => fn())
    this.cleanup = []
  }
}
```

## New Monitoring
- Alert when memory growth > 50MB/hour
- Weekly heap diff analysis
- Event listener count metrics
````
### 4. Failed Implementation Knowledge PR
Failures often produce the most valuable knowledge:
```markdown
# Knowledge PR: impl/microservices-migration (ABANDONED)
## Summary
Attempted to split monolith into 5 microservices. Abandoned after 3 months.
## Why It Failed
### Technical Challenges
1. **Data consistency**: Distributed transactions too complex
2. **Latency**: 5x increase in response time
3. **Debugging**: Lost ability to trace requests
### Organizational Challenges
1. **Team size**: Need 2-3 devs per service minimum
2. **Deployment complexity**: From 1 to 15 deploy steps
3. **Monitoring cost**: 10x increase
## Valuable Learnings
### When Microservices Make Sense
- [ ] Team > 50 developers
- [ ] Clear domain boundaries
- [ ] Different scaling needs per service
- [ ] Independent deployment crucial
- [ ] Budget for infrastructure team
We had none of these.
### Alternative Approach
**Modular Monolith Pattern**
- Separate modules in same codebase
- Clear interfaces between modules
- Can split later if needed
- 90% of benefits, 10% of complexity
## Cost of Attempt
- 3 developer-months
- $15k in infrastructure tests
- Valuable experience
- Clear architectural direction
```
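The modular-monolith alternative can be sketched in miniature. A hypothetical sketch (the "billing" module and its methods are illustrative): each module exposes a narrow public interface and hides its state, so today's in-process calls could later become service calls without touching callers:

```javascript
// Sketch of a modular-monolith boundary: the factory closes over private
// state, and only the returned interface is importable by other modules.
function createBillingModule() {
  const invoices = new Map(); // private state, unreachable from outside

  return {
    createInvoice(customerId, amountCents) {
      const id = `inv_${invoices.size + 1}`;
      invoices.set(id, { customerId, amountCents, paid: false });
      return id;
    },
    markPaid(id) {
      const inv = invoices.get(id);
      if (!inv) throw new Error(`unknown invoice ${id}`);
      inv.paid = true;
    },
    isPaid(id) {
      return invoices.get(id)?.paid ?? false;
    },
  };
}
```

Because other modules only ever see the interface, "can split later if needed" stays true: swapping the factory for an HTTP client preserves every call site.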
## Knowledge PR Review Process

### Review Checklist

**Accuracy**
- [ ] Claims backed by evidence
- [ ] Numbers are specific and measured
- [ ] Assumptions clearly marked

**Completeness**
- [ ] All hypotheses addressed
- [ ] Failures explained
- [ ] New questions documented

**Clarity**
- [ ] Would a new developer understand?
- [ ] Technical terms explained
- [ ] Examples provided

**Value**
- [ ] Prevents future mistakes
- [ ] Enables better decisions
- [ ] Patterns are reusable
### Who Reviews?

Depending on team size:

- **Small Team**: Anyone who wasn't on the experiment
- **Medium Team**: Designated knowledge reviewers
- **Large Team**: Domain experts for technical accuracy
### Review Comments Focus

Good review comments:
- "This limit - is it per process or per machine?"
- "Could you add the error message you saw?"
- "This pattern looks similar to what we did in Project X"
- "What made you choose 5000 for batch size?"

Not helpful:
- "Nice work!" (alone)
- "LGTM" (without reading)
- Style nitpicks on markdown
## Continuous Knowledge Integration

### Don't Wait Until The End

For long-running implementations, create periodic Knowledge PRs:

```
impl/big-feature
├── Week 1: Knowledge PR - Initial discoveries
├── Week 3: Knowledge PR - Architecture decisions
├── Week 5: Knowledge PR - Integration challenges
└── Week 7: Final Knowledge PR - Complete learnings
```
### The Daily Practice

During standup:
- "I discovered X yesterday" → Note for Knowledge PR
- "I'm stuck on Y" → Document in `problems/`
- "Z worked perfectly" → Capture pattern

### The Weekly Rhythm

Every Friday:
- Review the week's experiments
- Create Knowledge PRs for completed work
- Review pending Knowledge PRs
- Update patterns and constraints
## Measuring Knowledge PR Quality

### Metrics That Matter

**Specificity Score**
- Vague: "Caching helped performance"
- Specific: "Redis caching reduced p95 latency from 800ms to 200ms"

**Reusability Score**
- Low: "We fixed the bug"
- High: "Pattern: Always pair event listeners to prevent memory leaks"

**Evidence Score**
- Low: "It seemed faster"
- High: "Load test showed 3x throughput increase (logs in PR #123)"
### Knowledge Velocity

Track over time:
- Time from experiment → Knowledge PR
- Knowledge PRs per sprint
- Patterns discovered per month
- Problems prevented by past knowledge
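The first metric can be computed directly from timestamps. A minimal sketch; the record field names (`branchCompletedAt`, `knowledgePrMergedAt`) are illustrative, not from any particular tool:

```javascript
// Average lag, in days, between finishing a branch and merging its
// Knowledge PR. Inputs are epoch-millisecond timestamps.
function knowledgeLagDays(records) {
  const MS_PER_DAY = 24 * 60 * 60 * 1000;
  const lags = records.map(
    (r) => (r.knowledgePrMergedAt - r.branchCompletedAt) / MS_PER_DAY
  );
  return lags.reduce((a, b) => a + b, 0) / lags.length;
}
```

Fed from your Git host's API each sprint, this gives a trend line for the "create PR within 48 hours" target in the pitfalls below.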
## Common Pitfalls

### 1. Waiting Too Long
**Problem**: Knowledge gets stale, details forgotten
**Solution**: Create PR within 48 hours of completion

### 2. Too Implementation-Specific
**Problem**: "We used React hooks with Redux"
**Solution**: "Pattern: Separate state management from UI components"

### 3. Missing the "Why"
**Problem**: "We set timeout to 30 seconds"
**Solution**: "30s timeout because 95% of successful requests complete in <10s"

### 4. No Follow-Up Actions
**Problem**: Discoveries don't lead to improvements
**Solution**: Every problem discovered should create a new hypothesis
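Pitfall 3's reasoning can be reproduced from data rather than asserted: measure the latency distribution of successful requests, then set the timeout with headroom above p95. A hypothetical sketch; the 3x headroom factor is illustrative, not a recommendation from the text:

```javascript
// Derive a timeout from measured successful-request latencies: take p95
// (nearest-rank method), then multiply by a headroom factor so only
// genuinely stuck requests ever hit the timeout.
function percentile(sortedMs, p) {
  const idx = Math.ceil((p / 100) * sortedMs.length) - 1;
  return sortedMs[Math.max(0, idx)];
}

function suggestTimeoutMs(latenciesMs, headroomFactor = 3) {
  const sorted = [...latenciesMs].sort((a, b) => a - b);
  return percentile(sorted, 95) * headroomFactor;
}
```

Recording the measured p95 alongside the chosen timeout is exactly the "why" the pitfall asks for: a reviewer can re-run the calculation when traffic changes.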
## Templates and Tools

### Quick Command-Line Template

```bash
# Create Knowledge PR template
# (quoting 'EOF' keeps $(...) literal in the file; fill it in when opening the PR)
cat > .knowledge/pr-template.md << 'EOF'
# Knowledge PR: $(git branch --show-current)

## Summary
[What we tried and why]

## Hypotheses Tested
- [ ] [Hypothesis]: [Result]

## Key Discoveries
- [Discovery]: [Evidence]

## Outcome
Result: [Success/Failed/Partial]
Next: [What to try next]

## Files Changed
- [ ] Updated experiments.yaml
- [ ] Added patterns/[name].md
- [ ] Added constraints/[name].md
EOF
```
### GitHub PR Template

In `.github/pull_request_template/knowledge_pr.md`:

```markdown
<!--
This is a Knowledge PR template.
Delete sections that don't apply.
Be specific with numbers and evidence.
-->

## Knowledge PR Checklist
- [ ] Hypotheses have results
- [ ] Discoveries have evidence
- [ ] Patterns are abstracted
- [ ] Next steps identified
- [ ] experiments.yaml updated

[Rest of template...]
```
## The Cultural Shift

Knowledge PRs represent a fundamental shift:

- **From**: "Ship features" → **To**: "Ship features AND understanding"
- **From**: "Failed experiment = waste" → **To**: "Failed experiment = valuable data"
- **From**: "Knowledge in heads" → **To**: "Knowledge in repository"

Make Knowledge PRs a celebrated part of your process. They're not extra work - they're the work that makes all future work easier.