209 lines
5.6 KiB
Markdown
209 lines
5.6 KiB
Markdown
# Dataset Template: "Answer First" Schema
|
|
|
|
Use this template for all new Substrate datasets. The key principle: **put the answer at the top**.
|
|
|
|
---
|
|
|
|
## Template
|
|
|
|
```markdown
|
|
# [Dataset Title]
|
|
|
|
---
|
|
|
|
## 🎯 BEST ESTIMATE
|
|
|
|
| Metric | Value | Confidence | Last Updated |
|
|
|--------|-------|------------|--------------|
|
|
| **[Primary Metric]** | **[VALUE]** | [X%] | [YYYY-MM-DD] |
|
|
| **[Secondary Metric]** | **[VALUE]** | [X%] | [YYYY-MM-DD] |
|
|
|
|
**One-liner:** [12 words max - the crisp answer someone can quote]
|
|
|
|
**Caveat:** [Single most important limitation in one sentence]
|
|
|
|
---
|
|
|
|
## Quick Context
|
|
|
|
[2-3 sentences maximum:
|
|
- What this number represents
|
|
- Why it matters
|
|
- Major source of uncertainty]
|
|
|
|
---
|
|
|
|
## Methodology Summary
|
|
|
|
**Approach:** [One sentence on how this estimate was derived]
|
|
|
|
**Sources:**
|
|
- [Primary authoritative source with link]
|
|
- [Secondary source]
|
|
- [Tertiary source]
|
|
|
|
**Definition Used:** [How the metric was precisely defined]
|
|
|
|
---
|
|
|
|
## Detailed Findings
|
|
|
|
[Main body of research - tables, regional breakdowns, sector analysis, etc.]
|
|
|
|
---
|
|
|
|
## Source Analysis
|
|
|
|
### Why These Sources?
|
|
|
|
| Source | Strengths | Weaknesses | Weight Given |
|
|
|--------|-----------|------------|--------------|
|
|
| **[Source 1]** | [Strength] | [Weakness] | [High/Medium/Low] |
|
|
|
|
### Key Source Conflicts
|
|
|
|
1. [Where sources disagreed and how we resolved it]
|
|
|
|
---
|
|
|
|
## Research Metadata
|
|
|
|
| Attribute | Value |
|
|
|-----------|-------|
|
|
| **Research Date** | [YYYY-MM-DD] |
|
|
| **Researcher** | [Name/System] |
|
|
| **Method** | [Brief description] |
|
|
| **Confidence Level** | [X%] |
|
|
| **Known Gaps** | [Brief list] |
|
|
|
|
---
|
|
|
|
## Alternative Estimates & Why We Differ
|
|
|
|
[Recommended section - use when other estimates exist that might seem to contradict yours]
|
|
|
|
| Estimate | Source | What It Actually Measures | Why It Differs |
|
|
|----------|--------|--------------------------|----------------|
|
|
| **[Alternative 1]** | [Source] | [What it measures] | [Why different from ours] |
|
|
| **[Alternative 2]** | [Source] | [What it measures] | [Why different from ours] |
|
|
| **[Our estimate]** | This research | [What we measure] | [Our approach] |
|
|
|
|
### Why Our Approach
|
|
|
|
[2-4 bullet points explaining why you chose this measurement approach over alternatives:
|
|
- What makes it more appropriate for the question being answered
|
|
- Why it's more directly measurable or verifiable
|
|
- What constraints or sanity checks it passes
|
|
- Why apparent contradictions aren't actually contradictions]
|
|
|
|
**Key insight:** [One sentence explaining that different estimates often measure different things, not that one is "wrong"]
|
|
|
|
---
|
|
|
|
## Changelog
|
|
|
|
| Date | Change | Reason |
|
|
|------|--------|--------|
|
|
| [YYYY-MM-DD] | [What changed] | [Why it changed] |
|
|
|
|
---
|
|
|
|
## Full Data
|
|
|
|
[CSV data, detailed tables, raw research output, links to gists, etc.]
|
|
```
|
|
|
|
---
|
|
|
|
## Schema Requirements
|
|
|
|
### Mandatory Sections
|
|
|
|
1. **🎯 BEST ESTIMATE** - Must be the first content section after title
|
|
2. **One-liner** - 12 words max, quotable
|
|
3. **Caveat** - Single most important limitation
|
|
4. **Quick Context** - 2-3 sentences max
|
|
5. **Methodology Summary** - How was this derived
|
|
6. **Sources** - Where did data come from
|
|
7. **Alternative Estimates & Why We Differ** - Recommended when other estimates exist
|
|
8. **Changelog** - Track all revisions
|
|
|
|
### Mandatory Fields in BEST ESTIMATE Table
|
|
|
|
- **Value** - The actual number/range
|
|
- **Confidence** - Percentage (95%, 85%, 65%, etc.)
|
|
- **Last Updated** - Date of most recent validation
|
|
|
|
### Confidence Level Guidelines
|
|
|
|
| Level | Percentage | When to Use |
|
|
|-------|------------|-------------|
|
|
| Very High | 95%+ | Official government data, single authoritative source, widely agreed |
|
|
| High | 85-94% | Multiple corroborating sources, minor definitional variation |
|
|
| Medium | 65-84% | Extrapolated from good sources, definitional uncertainty |
|
|
| Low | <65% | Limited data, significant methodological issues, contested |
|
|
|
|
### Changelog Requirements
|
|
|
|
Every revision must include:
|
|
- **Date** of change
|
|
- **What** specifically changed
|
|
- **Why** it changed (what new evidence/analysis prompted revision)
|
|
|
|
---
|
|
|
|
## Anti-Patterns to Avoid
|
|
|
|
1. **Burying the answer** - Never make someone scroll to find the number
|
|
2. **No confidence level** - Every estimate needs uncertainty bounds
|
|
3. **Stale dates** - Always show when last validated
|
|
4. **Methodology before answer** - People want the answer first, then methodology
|
|
5. **No changelog** - Revisions without history erodes trust
|
|
6. **Comparing incomparables** - Always note when similar-sounding metrics measure different things
|
|
|
|
---
|
|
|
|
## Examples
|
|
|
|
### Good: Knowledge Worker Compensation
|
|
|
|
```markdown
|
|
## 🎯 BEST ESTIMATE
|
|
|
|
| Metric | Value | Confidence | Last Updated |
|
|
|--------|-------|------------|--------------|
|
|
| **Global Knowledge Worker Compensation** | **$35-50 trillion/year** | 65% | December 2025 |
|
|
|
|
**One-liner:** Global knowledge workers earn $35-50T annually in wages and benefits.
|
|
|
|
**Caveat:** "Knowledge worker" has no standard definition - range reflects definitional uncertainty.
|
|
```
|
|
|
|
### Good: US GDP
|
|
|
|
```markdown
|
|
## 🎯 BEST ESTIMATE
|
|
|
|
| Metric | Value | Confidence | Last Updated |
|
|
|--------|-------|------------|--------------|
|
|
| **US GDP (2024)** | **$29.17 trillion** | 99% | December 2025 |
|
|
|
|
**One-liner:** US GDP is $29.17 trillion as of Q3 2024.
|
|
|
|
**Caveat:** Final Q4 revision may adjust by ±0.5%.
|
|
```
|
|
|
|
---
|
|
|
|
## Migration Guide
|
|
|
|
When updating existing datasets to this schema:
|
|
|
|
1. Add `## 🎯 BEST ESTIMATE` section at the very top
|
|
2. Extract the key metric into the table format
|
|
3. Write a 12-word one-liner
|
|
4. Identify the single most important caveat
|
|
5. Add `## Changelog` if not present
|
|
6. Ensure confidence levels are explicit
|
|
7. Move detailed methodology AFTER the answer sections
|