# Dataset Template: "Answer First" Schema Use this template for all new Substrate datasets. The key principle: **put the answer at the top**. --- ## Template ```markdown # [Dataset Title] --- ## 🎯 BEST ESTIMATE | Metric | Value | Confidence | Last Updated | |--------|-------|------------|--------------| | **[Primary Metric]** | **[VALUE]** | [X%] | [YYYY-MM-DD] | | **[Secondary Metric]** | **[VALUE]** | [X%] | [YYYY-MM-DD] | **One-liner:** [12 words max - the crisp answer someone can quote] **Caveat:** [Single most important limitation in one sentence] --- ## Quick Context [2-3 sentences maximum: - What this number represents - Why it matters - Major source of uncertainty] --- ## Methodology Summary **Approach:** [One sentence on how this estimate was derived] **Sources:** - [Primary authoritative source with link] - [Secondary source] - [Tertiary source] **Definition Used:** [How the metric was precisely defined] --- ## Detailed Findings [Main body of research - tables, regional breakdowns, sector analysis, etc.] --- ## Source Analysis ### Why These Sources? | Source | Strengths | Weaknesses | Weight Given | |--------|-----------|------------|--------------| | **[Source 1]** | [Strength] | [Weakness] | [High/Medium/Low] | ### Key Source Conflicts 1. [Where sources disagreed and how we resolved it] --- ## Research Metadata | Attribute | Value | |-----------|-------| | **Research Date** | [YYYY-MM-DD] | | **Researcher** | [Name/System] | | **Method** | [Brief description] | | **Confidence Level** | [X%] | | **Known Gaps** | [Brief list] | --- ## Alternative Estimates & Why We Differ [Recommended section - use when other estimates exist that might seem to contradict yours] | Estimate | Source | What It Actually Measures | Why It Differs | |----------|--------|--------------------------|----------------| | **[Alternative 1]** | [Source] | [What it measures] | [Why different from ours] | | **[Alternative 2]** | [Source] | [What it measures] | [Why different from ours] | | **[Our estimate]** | This research | [What we measure] | [Our approach] | ### Why Our Approach [2-4 bullet points explaining why you chose this measurement approach over alternatives: - What makes it more appropriate for the question being answered - Why it's more directly measurable or verifiable - What constraints or sanity checks it passes - Why apparent contradictions aren't actually contradictions] **Key insight:** [One sentence explaining that different estimates often measure different things, not that one is "wrong"] --- ## Changelog | Date | Change | Reason | |------|--------|--------| | [YYYY-MM-DD] | [What changed] | [Why it changed] | --- ## Full Data [CSV data, detailed tables, raw research output, links to gists, etc.] ``` --- ## Schema Requirements ### Mandatory Sections 1. **🎯 BEST ESTIMATE** - Must be the first content section after title 2. **One-liner** - 12 words max, quotable 3. **Caveat** - Single most important limitation 4. **Quick Context** - 2-3 sentences max 5. **Methodology Summary** - How was this derived 6. **Sources** - Where did data come from 7. **Alternative Estimates & Why We Differ** - Recommended when other estimates exist 8. **Changelog** - Track all revisions ### Mandatory Fields in BEST ESTIMATE Table - **Value** - The actual number/range - **Confidence** - Percentage (95%, 85%, 65%, etc.) - **Last Updated** - Date of most recent validation ### Confidence Level Guidelines | Level | Percentage | When to Use | |-------|------------|-------------| | Very High | 95%+ | Official government data, single authoritative source, widely agreed | | High | 85-94% | Multiple corroborating sources, minor definitional variation | | Medium | 65-84% | Extrapolated from good sources, definitional uncertainty | | Low | <65% | Limited data, significant methodological issues, contested | ### Changelog Requirements Every revision must include: - **Date** of change - **What** specifically changed - **Why** it changed (what new evidence/analysis prompted revision) --- ## Anti-Patterns to Avoid 1. **Burying the answer** - Never make someone scroll to find the number 2. **No confidence level** - Every estimate needs uncertainty bounds 3. **Stale dates** - Always show when last validated 4. **Methodology before answer** - People want the answer first, then methodology 5. **No changelog** - Revisions without history erodes trust 6. **Comparing incomparables** - Always note when similar-sounding metrics measure different things --- ## Examples ### Good: Knowledge Worker Compensation ```markdown ## 🎯 BEST ESTIMATE | Metric | Value | Confidence | Last Updated | |--------|-------|------------|--------------| | **Global Knowledge Worker Compensation** | **$35-50 trillion/year** | 65% | December 2025 | **One-liner:** Global knowledge workers earn $35-50T annually in wages and benefits. **Caveat:** "Knowledge worker" has no standard definition - range reflects definitional uncertainty. ``` ### Good: US GDP ```markdown ## 🎯 BEST ESTIMATE | Metric | Value | Confidence | Last Updated | |--------|-------|------------|--------------| | **US GDP (2024)** | **$29.17 trillion** | 99% | December 2025 | **One-liner:** US GDP is $29.17 trillion as of Q3 2024. **Caveat:** Final Q4 revision may adjust by ±0.5%. ``` --- ## Migration Guide When updating existing datasets to this schema: 1. Add `## 🎯 BEST ESTIMATE` section at the very top 2. Extract the key metric into the table format 3. Write a 12-word one-liner 4. Identify the single most important caveat 5. Add `## Changelog` if not present 6. Ensure confidence levels are explicit 7. Move detailed methodology AFTER the answer sections