feat: Standardize all datasets to "Answer First" schema

Added SUMMARY.md executive summaries to all 7 datasets with:
- 🎯 BEST ESTIMATE section at top
- 12-word one-liners for quick reference
- Confidence levels and caveats
- Extensive authoritative linking
- Alternative Estimates sections where applicable
- Changelogs for revision tracking

Updated Data/README.md with:
- Quick reference table of all datasets
- Full schema documentation
- Confidence level guidelines
- Anti-patterns to avoid

Datasets standardized:
- Knowledge-Worker-Global-Salaries (gold standard)
- US-GDP
- US-Inflation
- US-Presidential-Approval
- Bay-Area-COVID-Wastewater
- US-Common-Metrics
- Pulitzer-Prize-Winners

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 14:40:25 -08:00
parent 8ce692936a
commit 9a181ae43b
7 changed files with 1328 additions and 140 deletions
--- a/Data/Bay-Area-COVID-Wastewater/SUMMARY.md
+++ b/Data/Bay-Area-COVID-Wastewater/SUMMARY.md
@@ -0,0 +1,198 @@
+# Bay Area COVID-19 Wastewater Surveillance: Executive Summary
+
+---
+
+## 🎯 BEST ESTIMATE
+
+| Metric | Value | Confidence | Last Updated |
+|--------|-------|------------|--------------|
+| **California Wastewater Level** | **5.60 log10 copies/mL** | 95% | August 2025 |
+| **Status** | **HIGH activity** | 95% | August 2025 |
+| **Dataset Coverage** | **161 weeks** (July 2022-present) | 99% | October 2025 |
+
+**One-liner:** California COVID wastewater is HIGH (5.6 log10); leads clinical data by 4-7 days.
+
+**Caveat:** Statewide data serves as Bay Area proxy; log scale means each unit = 10x viral load change.
+
+---
+
+## The Big Picture
+
+Wastewater surveillance is the gold standard for population-level disease monitoring. Unlike clinical testing, it captures **all COVID infections**—symptomatic, asymptomatic, and unreported—providing an unbiased view of community transmission.
+
+The [California Department of Public Health (CDPH)](https://data.chhs.ca.gov/dataset/covid-19-wastewater-surveillance) monitors viral levels at 12+ wastewater treatment plants across California, including major Bay Area facilities. This data serves as a **leading indicator**, typically showing trends 4-7 days before clinical test results.
+
+---
+
+## Why This Number Matters
+
+Wastewater data is valuable because it:
+
+- **Leads clinical data**: Shows trends 4-7 days before case reports
+- **Captures all infections**: Not biased by testing availability or behavior
+- **Enables early warning**: Identifies surges before hospitals see them
+- **Supports policy decisions**: Used by California health officials for resource allocation
+- **Tracks variants**: Can detect emerging variants before clinical sequencing
+
+---
+
+## Current Status
+
+### August 2025 Snapshot
+| Metric | Value | Interpretation |
+|--------|-------|---------------|
+| **Current Level** | 5.60 log10 copies/mL | HIGH |
+| **Trend** | Elevated, increasing | Rising from spring lows |
+| **Historical Peak** | 18.97 log10 (July 2022) | Omicron wave |
+| **Recent Low** | 1.60 log10 (March 2025) | Spring baseline |
+
+### Activity Levels Reference
+| Level | log10 Range | Interpretation |
+|-------|-------------|---------------|
+| **LOW** | <2.0 | Minimal community transmission |
+| **MEDIUM** | 2.0-4.0 | Moderate transmission |
+| **HIGH** | 4.0-6.0 | Elevated transmission |
+| **VERY HIGH** | >6.0 | Surge conditions |
+
+---
+
+## How to Interpret the Data
+
+### Log Scale Explained
+Values are log10 transformed:
+- **Each unit increase = 10x more virus**
+- 5.0 → 6.0 means 10x increase
+- 5.0 → 7.0 means 100x increase
+
+### What to Watch
+1. **Direction matters more than absolute value** - Is it rising or falling?
+2. **Rate of change** - Fast rises signal emerging surges
+3. **Seasonal context** - Winter typically higher than summer
+4. **Regional variation** - Bay Area may differ from statewide
+
+---
+
+## Geographic Coverage
+
+### Bay Area Treatment Plants Monitored
+| County | Major Facilities |
+|--------|-----------------|
+| San Francisco | SF Public Utilities |
+| Alameda | [EBMUD](https://www.ebmud.com/) |
+| Santa Clara | San Jose-Santa Clara RWF |
+| Contra Costa | Central Contra Costa Sanitary |
+| Marin | 6 sites including Central Marin |
+| San Mateo | Silicon Valley Clean Water |
+
+The statewide California data serves as a robust proxy for Bay Area trends since it includes all major Bay Area treatment facilities.
+
+---
+
+## Data Sources
+
+| Source | What It Provides | Link |
+|--------|-----------------|------|
+| [CDPH](https://data.chhs.ca.gov/dataset/covid-19-wastewater-surveillance) | California statewide wastewater | [Direct CSV](https://data.chhs.ca.gov/dataset/1184f641-313f-47ee-b126-9e8c42699be5/resource/726752d3-afe6-4733-99bd-ffb9f400348c/download/wastewater.csv) |
+| [CDC NWSS](https://www.cdc.gov/nwss/) | National wastewater surveillance | [NWSS Dashboard](https://www.cdc.gov/nwss/covid-19/) |
+| [WastewaterSCAN](https://www.wastewaterscan.org/) | Academic research data | [Data Portal](https://data.wastewaterscan.org/) |
+
+### Why CDPH?
+- **Official government source** used by state decision-makers
+- **Consistent methodology** since July 2022
+- **Weekly updates** every Friday
+- **Direct CSV download** with no authentication required
+- **Validated methodology**: qPCR/ddPCR with flow adjustment and PMMoV normalization
+
+---
+
+## Methodology
+
+### Measurement
+- **Method**: qPCR and ddPCR detection of SARS-CoV-2 RNA
+- **Normalization**: Flow-adjusted and PMMoV-normalized
+- **Units**: log10(gene copies per milliliter)
+- **Frequency**: Weekly composite samples
+
+### Why Leading Indicator?
+- Infected individuals shed virus in feces 2-7 days before symptoms
+- Wastewater captures shedding regardless of testing behavior
+- Aggregates entire sewershed population (millions of people)
+
+---
+
+## Confidence Assessment
+
+| Component | Confidence | Explanation |
+|-----------|------------|-------------|
+| **Current Level** | 95% | Official government data, validated methodology |
+| **Historical Data** | 99% | Complete 161-week dataset |
+| **Trend Direction** | 90% | Subject to weekly variation |
+
+Wastewater surveillance is among the most reliable pandemic indicators because it:
+- Uses scientific lab methodology (qPCR/ddPCR)
+- Samples entire populations (no selection bias)
+- Operates independently of testing behavior
+- Has been validated against clinical data
+
+---
+
+## Known Limitations
+
+1. **Statewide proxy**: California data used as Bay Area proxy (not county-specific)
+2. **Log scale**: Can obscure magnitude of changes for non-technical users
+3. **No variant detail**: Current data shows total virus, not strain breakdown
+4. **Weekly frequency**: Daily fluctuations not captured
+5. **Treatment plant variation**: Some facilities report more reliably than others
+
+---
+
+## Use Cases
+
+This dataset supports:
+- **Personal health decisions**: Should I mask at gatherings?
+- **Policy analysis**: Evidence for health interventions
+- **Academic research**: Population-level epidemiology
+- **Trend forecasting**: What's coming in 1-2 weeks?
+- **Historical analysis**: Pandemic timeline documentation
+
+---
+
+## Supporting Documentation
+
+| Document | Description |
+|----------|-------------|
+| [README.md](./README.md) | Full dataset documentation |
+| [COVID-Wastewater-California-Statewide-2022-2025.csv](./COVID-Wastewater-California-Statewide-2022-2025.csv) | Main dataset (161 weeks) |
+| [COVID-Wastewater-SF-Bay-Area-2023-2025.md](./COVID-Wastewater-SF-Bay-Area-2023-2025.md) | Detailed methodology |
+| [UPDATES.md](./UPDATES.md) | Data refresh changelog |
+
+---
+
+## Research Metadata
+
+| Attribute | Value |
+|-----------|-------|
+| **Dataset Coverage** | July 2022 - Present |
+| **Total Observations** | 161 weeks (100% complete) |
+| **Update Frequency** | Weekly (Fridays) |
+| **Geographic Scope** | California (includes Bay Area) |
+| **Confidence Level** | 95% (government surveillance data) |
+
+---
+
+## Changelog
+
+| Date | Change | Reason |
+|------|--------|--------|
+| **December 2025** | Added SUMMARY.md with executive overview | Standardizing Substrate datasets to "Answer First" schema |
+| **October 2025** | Updated through August 2025 | Regular data refresh |
+| **2024** | Initial dataset creation | COVID wastewater tracking system |
+
+---
+
+## External Resources
+
+- [CDPH COVID Dashboard](https://covid19.ca.gov/data-and-tools/) - Official California data
+- [CDC NWSS](https://www.cdc.gov/nwss/covid-19/) - National wastewater surveillance
+- [WastewaterSCAN](https://www.wastewaterscan.org/) - Stanford/Emory research program
+- [EBMUD Wastewater Monitoring](https://www.ebmud.com/) - East Bay utility data
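The log-scale rule and the activity-level table in the SUMMARY.md above can be sketched in a few lines of Python. This is a minimal illustration, not code from the repository: the function names `fold_change` and `activity_level` are hypothetical, and the handling of the boundary value 4.0 (placed in HIGH here) is an assumption, since the table lists it in both the MEDIUM and HIGH ranges.

```python
def fold_change(old_log10: float, new_log10: float) -> float:
    """Return how many times the viral load changed between two log10 readings."""
    # A difference of d on a log10 scale corresponds to a factor of 10**d.
    return 10 ** (new_log10 - old_log10)


def activity_level(log10_value: float) -> str:
    """Map a log10 reading to the activity levels in the reference table.

    Assumption: 4.0 is treated as HIGH. The table gives LOW as <2.0 and
    VERY HIGH as >6.0, so 2.0 is MEDIUM and 6.0 is HIGH.
    """
    if log10_value < 2.0:
        return "LOW"
    if log10_value < 4.0:
        return "MEDIUM"
    if log10_value <= 6.0:
        return "HIGH"
    return "VERY HIGH"


if __name__ == "__main__":
    # Values taken from the "Current Status" table: August 2025 level and March 2025 low.
    print(activity_level(5.60), activity_level(1.60))
    print(fold_change(5.0, 6.0), fold_change(5.0, 7.0))
```

Because the scale is logarithmic, a rise from the March 2025 low (1.60) to the August 2025 level (5.60) is a difference of about 4 log units, or roughly a 10,000-fold change in measured concentration.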
--- a/Data/Pulitzer-Prize-Winners/SUMMARY.md
+++ b/Data/Pulitzer-Prize-Winners/SUMMARY.md
@@ -0,0 +1,197 @@
+# Pulitzer Prize Winners (Arts & Letters): Executive Summary
+
+---
+
+## 🎯 WHAT THIS IS
+
+| Attribute | Value |
+|-----------|-------|
+| **Dataset Type** | Historical Reference Catalog |
+| **Coverage** | 249 winners across Arts & Letters (1918-2024) |
+| **Categories** | Poetry (105), Drama (109), General/Special (35) |
+| **Last Updated** | October 2025 |
+
+**One-liner:** 249 Pulitzer winners across Poetry, Drama, and Special awards, 1918-2024.
+
+**Caveat:** Covers Poetry, Drama, and Special awards only; Journalism, Fiction, History, Biography, and Music categories are not included.
+
+---
+
+## The Big Picture
+
+The [Pulitzer Prizes](https://www.pulitzer.org/) are the most prestigious awards in American journalism and the arts, established in 1917. This dataset focuses on the **Arts & Letters categories**—Poetry, Drama, and General/Special Awards—providing 107 years of literary achievement data.
+
+This is **reference data**, not an estimate. Each entry represents a verified Pulitzer Prize winner, cross-referenced against the [official Pulitzer Prize archive](https://www.pulitzer.org/prize-winners-by-category).
+
+---
+
+## Why This Dataset Matters
+
+The Pulitzer Prizes define American literary excellence:
+
+- **Poetry**: The most prestigious poetry award in the United States
+- **Drama**: Shapes what gets produced on Broadway and beyond
+- **Cultural canon**: Winners become required reading in schools and universities
+- **Historical record**: Documents 107 years of American literary achievement
+- **Research foundation**: Essential for literary criticism, cultural studies, and trend analysis
+
+---
+
+## Dataset Contents
+
+### Category Breakdown
+| Category | Winners | Coverage |
+|----------|---------|----------|
+| [Poetry](https://www.pulitzer.org/prize-winners-by-category/218) | 105 | 1918-2024 |
+| [Drama](https://www.pulitzer.org/prize-winners-by-category/219) | 109 | 1918-2024 |
+| [General/Special Awards](https://www.pulitzer.org/special-awards) | 35 | Various |
+| **Total** | **249** | 107 years |
+
+### Sample Winners
+| Year | Category | Winner | Work |
+|------|----------|--------|------|
+| 2024 | Poetry | [Brandon Som](https://www.pulitzer.org/winners/brandon-som) | *Tripas: Poems* |
+| 2024 | Drama | [Eboni Booth](https://www.pulitzer.org/winners/eboni-booth) | *Primary Trust* |
+| 2023 | Poetry | [Carl Phillips](https://www.pulitzer.org/winners/carl-phillips) | *Then the War* |
+| 2023 | Drama | [Sanaz Toossi](https://www.pulitzer.org/winners/sanaz-toossi) | *English* |
+
+---
+
+## What's Included vs. Not Included
+
+### Included (Arts & Letters)
+- **Poetry** - Annual award since 1918 (105 winners)
+- **Drama** - Annual award since 1918 (109 winners)
+- **General/Special Awards** - Lifetime achievement, special citations (35 winners)
+
+### Not Included (By Design)
+| Category | Reason |
+|----------|--------|
+| Journalism (14 categories) | Different focus; available via [Pulitzer.org](https://www.pulitzer.org/prize-winners-categories) |
+| Fiction | Lower Wikidata coverage; expansion opportunity |
+| History | Lower Wikidata coverage; expansion opportunity |
+| Biography | Lower Wikidata coverage; expansion opportunity |
+| Music | Lower Wikidata coverage; expansion opportunity |
+
+**Rationale**: This dataset prioritizes **complete, verified data** over breadth. Poetry and Drama have 95%+ coverage in Wikidata; other categories have significant gaps.
+
+---
+
+## Data Sources
+
+| Source | What It Provides | Link |
+|--------|-----------------|------|
+| [Wikidata](https://www.wikidata.org/) | Structured data via SPARQL | [Query Service](https://query.wikidata.org/) |
+| [Pulitzer.org](https://www.pulitzer.org/) | Official archive (verification) | [Prize Winners](https://www.pulitzer.org/prize-winners-categories) |
+
+### Why Wikidata?
+- **Community-validated**: Multiple editors verify each entry
+- **Linked data**: Connected to primary sources
+- **Machine-readable**: Direct SPARQL query access
+- **Open license**: CC0 public domain
+- **Cross-referenced**: Validated against Pulitzer.org official records
+
+---
+
+## Confidence Assessment
+
+| Component | Confidence | Explanation |
+|-----------|------------|-------------|
+| **Poetry Winners** | 99% | 95%+ coverage, cross-validated |
+| **Drama Winners** | 99% | 95%+ coverage, cross-validated |
+| **General/Special** | 95% | Complete for documented awards |
+| **Work Titles** | 90% | Some entries lack titles in source data |
+
+This is reference data, not estimates. Winners are verified facts from official records.
+
+---
+
+## Known Limitations
+
+1. **Arts & Letters only**: Journalism categories not included (by design)
+2. **Work titles**: Not all entries include work titles
+3. **Co-winners**: Some years have multiple recipients
+4. **No-award years**: Some years have gaps (no winner selected)
+5. **Finalists**: Only winners included (finalists available from 1980+)
+
+---
+
+## Use Cases
+
+This dataset supports:
+- **Literary research**: Author achievement tracking
+- **Educational reference**: Quick winner lookup
+- **Trend analysis**: 107 years of literary prize patterns
+- **Curriculum design**: Identifying canonical works
+- **Cultural studies**: American literary canon formation
+- **Fact-checking**: Verify literary achievement claims
+
+---
+
+## Supporting Documentation
+
+| Document | Description |
+|----------|-------------|
+| [README.md](./README.md) | Full dataset documentation |
+| [Pulitzer-Prize-Winners-Arts-Letters-1918-2024.csv](./Pulitzer-Prize-Winners-Arts-Letters-1918-2024.csv) | Combined dataset (249 winners) |
+| [category-poetry.csv](./category-poetry.csv) | Poetry winners (105) |
+| [category-drama.csv](./category-drama.csv) | Drama winners (109) |
+| [category-general.csv](./category-general.csv) | Special awards (35) |
+
+---
+
+## SPARQL Query for Updates
+
+```sparql
+SELECT ?winner ?winnerLabel ?awardDate ?category ?categoryLabel ?work ?workLabel
+WHERE {
+  ?winner p:P166 ?awardStatement .
+  ?awardStatement ps:P166 ?category .
+  ?category (wdt:P279|wdt:P31)* wd:Q46525 .
+  OPTIONAL { ?awardStatement pq:P585 ?awardDate . }
+  OPTIONAL { ?awardStatement pq:P1686 ?work . }
+  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
+}
+ORDER BY DESC(?awardDate)
+```
+
+Run at: [query.wikidata.org](https://query.wikidata.org/)
+
+---
+
+## Research Metadata
+
+| Attribute | Value |
+|-----------|-------|
+| **Dataset Coverage** | 1918-2024 (107 years) |
+| **Total Records** | 249 unique winners |
+| **Categories** | Poetry, Drama, General/Special |
+| **Data Source** | Wikidata (CC0 public domain) |
+| **Confidence Level** | 99% (verified reference data) |
+
+---
+
+## Changelog
+
+| Date | Change | Reason |
+|------|--------|--------|
+| **December 2025** | Added SUMMARY.md with executive overview | Standardizing Substrate datasets to "Answer First" schema |
+| **October 2025** | Initial dataset creation | Arts & Letters Pulitzer data collection |
+
+---
+
+## Future Expansion Opportunities
+
+1. **Add Fiction/History/Biography/Music** - Complete Arts & Letters coverage
+2. **Add Journalism categories** - Scrape Pulitzer.org directly (~1,400+ winners)
+3. **Add finalists** - Available 1980-present (3 per category)
+4. **Annual updates** - Refresh each April/May after announcements
+
+---
+
+## External Resources
+
+- [Pulitzer.org Prize Winners](https://www.pulitzer.org/prize-winners-categories) - Official archive
+- [Pulitzer Prize History](https://www.pulitzer.org/page/history-pulitzer-prizes) - Background and context
+- [Wikidata Pulitzer Query](https://query.wikidata.org/) - Run your own queries
+- [Columbia Journalism Review Pulitzer Data](https://www.cjr.org/) - Journalism-focused analysis
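The SPARQL query in the SUMMARY.md above can also be run programmatically rather than through the web UI. The following is a minimal sketch, assuming the public Wikidata Query Service endpoint at `https://query.wikidata.org/sparql` and its standard SPARQL JSON result format; the helper names (`build_request_url`, `flatten_bindings`, `fetch_rows`) and the User-Agent string are hypothetical and not part of the repository.

```python
import json
import urllib.parse
import urllib.request

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_request_url(query: str) -> str:
    """Encode a SPARQL query as a GET URL that asks for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return f"{WDQS_ENDPOINT}?{params}"


def flatten_bindings(payload: dict) -> list[dict]:
    """Turn SPARQL JSON results into plain {variable: value} rows.

    OPTIONAL variables that did not match are simply absent from a row.
    """
    rows = []
    for binding in payload["results"]["bindings"]:
        rows.append({name: cell["value"] for name, cell in binding.items()})
    return rows


def fetch_rows(query: str) -> list[dict]:
    """Run the query over HTTP (network access required; not called here)."""
    request = urllib.request.Request(
        build_request_url(query),
        # Hypothetical identifier; a descriptive User-Agent is good practice.
        headers={"User-Agent": "pulitzer-dataset-refresh-sketch/0.1"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return flatten_bindings(json.load(response))
```

Rows returned by `fetch_rows` use the variable names from the query (`winnerLabel`, `awardDate`, `categoryLabel`, `workLabel`), so they can be written to CSV with `csv.DictWriter` and compared against the existing category files.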
--- a/Data/README.md
+++ b/Data/README.md
@@ -4,11 +4,102 @@

 The Data directory contains curated, ground-truth datasets about important aspects of human life, society, and progress, along with documentation for external data sources. This is a collection of reliable, parseable data that can be used for analysis, research, and informed decision-making.

+---
+
+## 🎯 "Answer First" Schema
+
+**All Substrate datasets follow the "Answer First" schema.** Every dataset has a `SUMMARY.md` file that puts the best estimate at the top.
+
+### Quick Reference
+
+| Dataset | Best Estimate | One-liner |
+|---------|--------------|-----------|
+| [Knowledge Worker Compensation](./Knowledge-Worker-Global-Salaries/SUMMARY.md) | $35-50T global, $6-12T US | Global knowledge workers earn $35-50T annually |
+| [US GDP](./US-GDP/SUMMARY.md) | $23.77T (Q2 2025) | U.S. real GDP is $23.77T, growing at a 3.8% annualized rate |
+| [US Inflation](./US-Inflation/SUMMARY.md) | 2.5% YoY | U.S. inflation is ~2.5% with CPI at 323.4 |
+| [Presidential Approval](./US-Presidential-Approval/SUMMARY.md) | ~41% (Trump Nov 2025) | Trump approval averages ~41% (net -13) |
+| [COVID Wastewater](./Bay-Area-COVID-Wastewater/SUMMARY.md) | HIGH (5.6 log10) | California COVID wastewater is HIGH |
+| [US Common Metrics](./US-Common-Metrics/SUMMARY.md) | 60+ indicators | Real-time dashboard of U.S. economic indicators |
+| [Pulitzer Winners](./Pulitzer-Prize-Winners/SUMMARY.md) | 249 winners | Complete Arts & Letters database (1918-2024) |
+
+### Schema Structure
+
+Every `SUMMARY.md` follows this structure:
+
+```markdown
+# [Dataset Title]: Executive Summary
+
+## 🎯 BEST ESTIMATE
+
+| Metric | Value | Confidence | Last Updated |
+|--------|-------|------------|--------------|
+| **[Primary Metric]** | **[VALUE]** | [X%] | [DATE] |
+
+**One-liner:** [12 words max - the quotable answer]
+
+**Caveat:** [Single most important limitation]
+
+---
+
+## The Big Picture
+[2-3 sentences: What this is, why it matters, major uncertainty]
+
+## Why This Number Matters
+[Context for why this metric is important]
+
+## How the Number Is Calculated
+[Methodology summary]
+
+## Confidence Assessment
+[What we know well vs. what's uncertain]
+
+## Alternative Estimates & Why We Differ
+[When applicable: other approaches and why we chose ours]
+
+## Data Sources
+[Links to authoritative sources]
+
+## Supporting Documentation
+[Links to detailed data files]
+
+## Changelog
+[When estimates changed and why]
+```
+
+### Confidence Level Guidelines
+
+| Level | Percentage | When to Use |
+|-------|------------|-------------|
+| **Very High** | 95%+ | Official government data, single authoritative source |
+| **High** | 85-94% | Multiple corroborating sources, minor definitional variation |
+| **Medium** | 65-84% | Extrapolated from good sources, definitional uncertainty |
+| **Low** | <65% | Limited data, significant methodological issues |
+
+### Creating New Datasets
+
+Use the [DATASET-TEMPLATE.md](./DATASET-TEMPLATE.md) when creating new datasets.
+
+**Mandatory Sections:**
+1. **🎯 BEST ESTIMATE** - Must be first content section after title
+2. **One-liner** - 12 words max, quotable
+3. **Caveat** - Single most important limitation
+4. **Methodology Summary** - How the estimate was derived
+5. **Sources** - Authoritative links
+6. **Changelog** - Track revisions with reasons
+
+**Recommended Section:**
+- **Alternative Estimates & Why We Differ** - When other estimates exist
+
+---
+
 ## Directory Structure

 ```
 Data/
-├── sources/                           # External data source catalog (APIs, endpoints, metadata)
+├── DATASET-TEMPLATE.md                    # Schema template for new datasets
+├── README.md                              # This file
+├── UPDATES.md                             # Global changelog
+├── sources/                               # External data source catalog
 │   ├── DS-00001—WHO_Global_Health_Observatory/
 │   ├── DS-00002—UN_SDG_Indicators/
 │   ├── DS-00003—World_Bank_Open_Data/
@@ -18,178 +109,122 @@ Data/
 │   ├── DS-00007—BLS_JOLTS_Labor_Market/
 │   ├── DS-00008—EPA_Air_Quality_System/
 │   └── WELLBEING_DATA_SOURCES.md
-├── Bay-Area-COVID-Wastewater/        # Curated datasets
-├── Knowledge-Worker-Global-Salaries/
-├── Pulitzer-Prize-Winners/
-├── US-GDP/
-├── US-Inflation/
-├── README.md
-└── UPDATES.md
+├── Bay-Area-COVID-Wastewater/             # COVID wastewater surveillance
+│   └── SUMMARY.md                         # ← Start here
+├── Knowledge-Worker-Global-Salaries/      # Knowledge economy compensation
+│   └── SUMMARY.md                         # ← Start here
+├── Pulitzer-Prize-Winners/                # Arts & Letters Pulitzer data
+│   └── SUMMARY.md                         # ← Start here
+├── US-Common-Metrics/                     # 60+ US economic indicators
+│   └── SUMMARY.md                         # ← Start here
+├── US-GDP/                                # US GDP data
+│   └── SUMMARY.md                         # ← Start here
+├── US-Inflation/                          # CPI/inflation data
+│   └── SUMMARY.md                         # ← Start here
+└── US-Presidential-Approval/              # Approval ratings 1937-2025
+    └── SUMMARY.md                         # ← Start here
 ```

-**sources/** - Contains documentation and metadata for external data sources (APIs, endpoints, update frequencies, setup instructions). See `sources/WELLBEING_DATA_SOURCES.md` for details.
+**Start with SUMMARY.md** in any dataset directory—it gives you the answer first.

-**Dataset directories** - Contain curated, processed data collections ready for analysis.
-
-## Philosophy
-
-**Ground Truth First**: All datasets should come from authoritative, verifiable sources. We prioritize data quality and transparency over volume.
-
-**Human-Readable + Machine-Parseable**: Data is stored in CSV and Markdown formatsno opaque databases. Anyone (human or AI) should be able to read, understand, and analyze these datasets with minimal friction.
-
-**Shared Knowledge  Progress**: Like the broader Substrate project, this is about creating a foundation of shared, trusted information from which we can work toward solutions and understanding.
+---

 ## Dataset Categories

-Data sources cover a wide range of human-relevant topics:
+### Economic Indicators
+- **[US GDP](./US-GDP/SUMMARY.md)** - Gross Domestic Product (1929-2025)
+- **[US Inflation](./US-Inflation/SUMMARY.md)** - CPI data (1947-2025)
+- **[US Common Metrics](./US-Common-Metrics/SUMMARY.md)** - 60+ economic indicators dashboard
+- **[Knowledge Worker Compensation](./Knowledge-Worker-Global-Salaries/SUMMARY.md)** - Global and US compensation estimates
+
+### Political & Social
+- **[Presidential Approval](./US-Presidential-Approval/SUMMARY.md)** - Approval ratings (1937-2025)
+- **[Pulitzer Winners](./Pulitzer-Prize-Winners/SUMMARY.md)** - Arts & Letters awards (1918-2024)

 ### Health & Public Safety
- COVID-19 metrics (cases, hospitalizations, wastewater surveillance)
- Disease surveillance data
- Public health indicators
+- **[COVID Wastewater](./Bay-Area-COVID-Wastewater/SUMMARY.md)** - California wastewater surveillance

-### Economic Indicators
- Jobs and employment statistics
- Economic growth metrics
- Inflation and cost of living data
+---

-### Scientific & Academic
- Nobel Prize winners and recipients
- Major research publications
- Scientific discoveries and breakthroughs
+## Philosophy

-### Social & Cultural
- Demographic trends
- Education statistics
- Cultural achievements and milestones
+**Answer First**: Every dataset puts the best estimate at the top. Don't make people hunt for the number.

-### Environmental
- Climate data
- Environmental quality metrics
- Sustainability indicators
+**Ground Truth**: All datasets come from authoritative, verifiable sources. We prioritize data quality and transparency over volume.

-### Other
+**Human-Readable + Machine-Parseable**: Data is stored in CSV and Markdown formats—no opaque databases. Anyone (human or AI) can read, understand, and analyze these datasets with minimal friction.

- Anything else we need/want
+**Confidence-Aware**: Every estimate includes a confidence level. We distinguish between what we know well (95%+) and what's uncertain (below 65%).

-## File Naming Convention
+**Traceable**: Every number links to its authoritative source. Changes are logged with reasons.

-**Format**: `[CATEGORY]-[DESCRIPTION]-[DATE-RANGE].csv` or `.md`
+---

-**Examples**:
- `COVID-Wastewater-SF-Bay-Area-2020-2025.csv`
- `Nobel-Prize-Winners-Physics-1901-2024.csv`
- `US-Jobs-Report-Monthly-2020-2025.csv`
+## Data Quality Standards

-## Dataset Structure
+### Mandatory Requirements
+- **Confidence level** - Every estimate needs uncertainty bounds
+- **Last updated** - When data was most recently validated
+- **Source links** - Authoritative URLs for verification
+- **Changelog** - Track revisions with reasons

-### CSV Format
-Each CSV should include:
- **Header row**: Clear column names
- **Date column**: When applicable, use ISO 8601 format (YYYY-MM-DD)
- **Source column**: URL or citation for verification
- **Units**: Clearly specified in column names (e.g., `cases_per_100k`)
+### Quality Indicators
+- **Accuracy**: Data from verified, authoritative sources
+- **Completeness**: Gaps and missing data documented
+- **Timeliness**: Update frequency and freshness noted
+- **Transparency**: Methodology documented and reproducible

-### Metadata File
-Each dataset should have an accompanying `.md` file with:
- **Data Source**: URL and organization
- **Update Frequency**: How often the source updates
- **Last Updated**: When this dataset was last refreshed
- **Coverage**: Geographic/temporal scope
- **Notes**: Any important caveats or methodology notes
- **License**: Data usage rights
-
-## Example Metadata
-
-```markdown
-# COVID Wastewater Surveillance - SF Bay Area
-
-**Source**: WastewaterSCAN / CDC NWSS
-**URL**: https://www.cdc.gov/nwss/
-**Update Frequency**: Weekly
-**Last Updated**: 2025-10-07
-**Coverage**: San Francisco Bay Area, 2020-2025
-**Units**: Viral copies per mL
-**License**: Public domain (U.S. government data)
-
-**Notes**:
- Wastewater data is a leading indicator, typically showing trends 4-7 days before clinical testing
- Data represents population-level surveillance
-```
+---

 ## Contributing Datasets

 When adding new datasets:

-1. **Verify the source** - Use authoritative, primary sources when possible
-2. **Document thoroughly** - Include metadata file
-3. **Keep it updated** - Note the refresh date
-4. **Make it parseable** - Clean CSV format, consistent date formats
-5. **Cross-reference** - Link to related Substrate components (Problems, Solutions, etc.)
+1. **Use the template** - Start with [DATASET-TEMPLATE.md](./DATASET-TEMPLATE.md)
+2. **Answer first** - Create SUMMARY.md with 🎯 BEST ESTIMATE at top
+3. **Verify sources** - Use authoritative, primary sources
+4. **Set confidence** - Use the confidence level guidelines
+5. **Document changes** - Include changelog from day one
+6. **Link thoroughly** - Every number should trace to a source

-## Usage
+### Anti-Patterns to Avoid

-These datasets are designed to be:
- **Queried by AI** for analysis and insights
- **Referenced in arguments** to support claims with data
- **Used in solutions** to inform evidence-based approaches
- **Shared openly** to promote transparency and collaboration
+1. **Burying the answer** - Never make someone scroll to find the number
+2. **No confidence level** - Every estimate needs uncertainty bounds
+3. **Stale dates** - Always show when last validated
+4. **Methodology before answer** - People want the answer first
+5. **No changelog** - Revisions without history erode trust

-## Data Quality Standards
-
- **Accuracy**: Data must be from verified, authoritative sources
- **Completeness**: Note any gaps or missing data points
- **Timeliness**: Include last updated date
- **Transparency**: Always cite the original source
- **Reproducibility**: Provide enough information for others to verify or update
+---

 ## Integration with Substrate

 Data sources support other Substrate components:
- **Claims** can be backed by datasets (e.g., "CL-58970Anthropogenic Climate Change" supported by climate data)
- **Arguments** can reference specific data points
- **Solutions** can be evaluated using metrics from datasets
- **Plans** can track progress using ground-truth indicators
+
+- **Claims** can be backed by datasets with linked evidence
+- **Arguments** can reference specific metrics and sources
+- **Solutions** can be evaluated using ground-truth indicators
+- **Plans** can track progress with authoritative data
+
+---
+
+## Relationship with Research Projects
+
+The Data directory works with `research/` to maintain traceability between research and resulting datasets.
+
+**Research → Data Workflow:**
+
+1. **Input**: Research projects use `Data/sources/` for external APIs
+2. **Analysis**: Research performs synthesis and investigation
+3. **Output**: Curated datasets stored in `Data/` with SUMMARY.md
+4. **Documentation**: Methodology and sources fully documented
+
+**Key Principles:**
+- Each dataset includes `source.md` documenting origin
+- Research projects document which sources they used
+- Bidirectional links maintain complete traceability
+- Changes tracked in both research notes and dataset changelogs

 ---

 **Mission**: Build a trusted foundation of ground-truth data to support human understanding and progress.
-
-## Relationship with Research Projects
-
-The Data directory works in conjunction with `research/` directory to maintain clear traceability between research and resulting datasets.
-
-**Research → Data Workflow:**
-
-1. **Input**: Research projects use `Data/sources/` to access external data APIs and endpoints
-2. **Analysis**: Research projects perform analysis, synthesis, and investigation  
-3. **Output**: Research projects produce curated datasets stored in `Data/` top-level
-4. **Documentation**: Research projects document their methodology, sources used, and resulting datasets
-
-**Example Structure:**
-
-```
-research/knowledge-worker-compensation-study/
-├── README.md                    # Research overview and methodology
-├── SOURCES.md                   # Links to Data/sources/ used as inputs
-├── findings/                    # Analysis and insights
-└── [references Data/Knowledge-Worker-Global-Salaries/]
-
-Data/Knowledge-Worker-Global-Salaries/
-├── knowledge-worker-compensation-data.md    # Curated dataset (output)
-└── source.md                               # Metadata linking back to research project
-```
-
-**Key Principles:**
-
- Each dataset in `Data/` should include `source.md` documenting origin (research project or direct source)
- Research projects should document which `Data/sources/` they used as inputs in their SOURCES.md
- Research findings and methodology live in `research/`, curated datasets live in `Data/`
- Bidirectional links maintain complete traceability from source → research → dataset
-
-**Benefits:**
-
-- Clear provenance: Always know where data came from and how it was produced
-- Reproducibility: Research methodology is documented and linked to outputs  
-- Reusability: Other research can reference existing datasets and their origins
-- Quality: Traceability enables verification and validation of data quality
--- a/Data/US-Common-Metrics/SUMMARY.md
+++ b/Data/US-Common-Metrics/SUMMARY.md
@@ -0,0 +1,163 @@
+# US Common Metrics: Executive Summary
+
+---
+
+## 🎯 WHAT THIS IS
+
+| Attribute | Value |
+|-----------|-------|
+| **Dataset Type** | Dashboard / Reference Catalog |
+| **Coverage** | 60+ U.S. economic and social indicators |
+| **Update Frequency** | Daily → Annual (varies by metric) |
+| **Last Updated** | December 2025 |
+
+**One-liner:** Real-time reference dashboard for 60+ authoritative U.S. economic indicators.
+
+**Caveat:** This is a catalog, not an estimate—each metric has its own update schedule and methodology.
+
+---
+
+## Why This Dashboard Matters
+
+The U.S. economy is measured by dozens of agencies using hundreds of methodologies. Navigating [FRED](https://fred.stlouisfed.org/), [BLS](https://www.bls.gov/), [EIA](https://www.eia.gov/), [Treasury](https://fiscaldata.treasury.gov/), and [Census](https://data.census.gov/) separately is time-consuming and error-prone.
+
+This dashboard provides:
+- **Single source of truth** for the most-referenced U.S. metrics
+- **Full provenance** - every number linked to its authoritative source
+- **Current values** with update dates so you know data freshness
+- **FRED IDs** for programmatic access to historical data
+
+---
+
+## Key Indicators at a Glance
+
+### Economic Health
+| Metric | Current Value | Source |
+|--------|---------------|--------|
+| [Real GDP](https://fred.stlouisfed.org/series/GDPC1) | ~$23.8T (Q2 2025) | [BEA](https://www.bea.gov/) |
+| [GDP Growth (QoQ, annualized)](https://fred.stlouisfed.org/series/A191RL1Q225SBEA) | 3.8% | [BEA](https://www.bea.gov/) |
+| [Unemployment (U-3)](https://fred.stlouisfed.org/series/UNRATE) | 4.4% | [BLS](https://www.bls.gov/) |
+| [CPI (All Urban Consumers)](https://fred.stlouisfed.org/series/CPIAUCSL) | ~324 (index, 1982-84=100) | [BLS](https://www.bls.gov/) |
+
+### Consumer & Housing
+| Metric | Current Value | Source |
+|--------|---------------|--------|
+| [Consumer Sentiment](https://fred.stlouisfed.org/series/UMCSENT) | 53.6 | [U. Michigan](https://data.sca.isr.umich.edu/) |
+| [30-Year Mortgage Rate](https://fred.stlouisfed.org/series/MORTGAGE30US) | 6.23% | [Freddie Mac](http://www.freddiemac.com/pmms/) |
+| [Median Home Price](https://fred.stlouisfed.org/series/MSPUS) | ~$411K | [Census](https://www.census.gov/) |
+
+### Financial & Fiscal
+| Metric | Current Value | Source |
+|--------|---------------|--------|
+| [Fed Funds Rate](https://fred.stlouisfed.org/series/FEDFUNDS) | 3.88% | [Federal Reserve](https://www.federalreserve.gov/) |
+| [10-Year Treasury](https://fred.stlouisfed.org/series/DGS10) | 4.02% | [Treasury](https://home.treasury.gov/) |
+| [Debt-to-GDP Ratio](https://fred.stlouisfed.org/series/GFDEGDQ188S) | 118.8% | [FRED](https://fred.stlouisfed.org/) |
+| [S&P 500](https://fred.stlouisfed.org/series/SP500) | ~6,813 | [S&P](https://www.spglobal.com/) |
+
+---
+
+## Update Schedule
+
+| Frequency | What Gets Updated | Typical Lag |
+|-----------|------------------|-------------|
+| **Daily** | Treasury yields, Fed funds, oil prices, stock indices | Same day |
+| **Weekly** | Jobless claims, gas prices, mortgage rates | 4-7 days |
+| **Monthly** | CPI, PCE, employment, retail sales, housing | 2-4 weeks |
+| **Quarterly** | GDP, home prices, debt service ratio | 1-3 months |
+| **Annual** | Population, GINI, poverty, mortality | 6-18 months |
+
+---
+
+## Data Sources
+
+All metrics come from authoritative government and institutional sources:
+
+| Source | Organization | What It Covers |
+|--------|---------|---------------|
+| [FRED](https://fred.stlouisfed.org/) | Federal Reserve Economic Data | Most economic indicators (aggregator) |
+| [BLS](https://www.bls.gov/) | Bureau of Labor Statistics | Employment, wages, CPI |
+| [BEA](https://www.bea.gov/) | Bureau of Economic Analysis | GDP, PCE, personal income |
+| [Census](https://data.census.gov/) | Census Bureau | Demographics, housing starts |
+| [EIA](https://www.eia.gov/) | Energy Information Administration | Gas prices, oil, energy |
+| [Treasury](https://fiscaldata.treasury.gov/) | Treasury Department | Federal debt, budget |
+| [CDC WONDER](https://wonder.cdc.gov/) | CDC | Mortality statistics |
+| [EPA AQS](https://www.epa.gov/aqs) | Environmental Protection Agency | Air quality |
+
+---
+
+## How to Use This Dashboard
+
+### For Quick Reference
+Open `US-Common-Metrics.md` for current values organized by category.
+
+### For Programmatic Access
+```bash
+# Get current values as CSV
+cat us-metrics-current.csv
+
+# Update all metrics (requires API keys)
+bun run update.ts
+```
+
+### For Historical Data
+Use the [FRED ID](https://fred.stlouisfed.org/) listed for each metric to access full time series.
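+
+A minimal sketch of fetching a full series by FRED ID, assuming the public `fredgraph.csv?id=` download URL (the same pattern the US-GDP and US-Inflation summaries use). The helper names `fred_csv_url` and `fred_download` are hypothetical and not part of this repo:
+
+```bash
+# Build the CSV download URL for a FRED series ID (e.g. UNRATE, GDPC1).
+fred_csv_url() {
+  printf 'https://fred.stlouisfed.org/graph/fredgraph.csv?id=%s\n' "$1"
+}
+
+# Download one series to <ID>.csv; -f makes curl fail on HTTP errors
+# instead of saving an error page as if it were data.
+fred_download() {
+  curl -fL "$(fred_csv_url "$1")" -o "$1.csv"
+}
+
+# Usage (requires network access):
+# fred_download UNRATE
+```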
+
+### For Source Verification
+Every metric links to its authoritative source. Click through to verify methodology.
+
+---
+
+## Methodology
+
+### Design Philosophy
+- **Authoritative sources only** - Government agencies and established institutions
+- **Provenance required** - Every number must trace to a specific source
+- **Transparency** - Methodology documented for each data source
+- **Automation** - Scripts update values; humans don't hand-edit data
+
+### Data Quality Notes
+1. **Revisions**: Many economic indicators are revised multiple times. Values shown are the most recent.
+2. **Seasonal Adjustment**: Most monthly/quarterly metrics are seasonally adjusted (SA/SAAR).
+3. **Index vs. Level**: Some metrics are indices (CPI, PPI), others are levels (GDP). Check units.
+
+---
+
+## Known Limitations
+
+1. **Table Formatting**: Some automated updates may corrupt markdown tables (being fixed)
+2. **Missing Values**: Some metrics show `--` when data isn't available or API failed
+3. **Lag**: Annual metrics (mortality, demographics) have 6-18 month publication delays
+4. **No Forecasts**: This is ground-truth data only, no projections
+
+---
+
+## Supporting Documentation
+
+| Document | Description |
+|----------|-------------|
+| [US-Common-Metrics.md](./US-Common-Metrics.md) | Full dataset with all 60+ metrics |
+| [source.md](./source.md) | Detailed methodology per data source |
+| [us-metrics-current.csv](./us-metrics-current.csv) | Machine-readable current values |
+| [us-metrics-historical.csv](./us-metrics-historical.csv) | Historical time series |
+
+---
+
+## Changelog
+
+| Date | Change | Reason |
+|------|--------|--------|
+| **December 2025** | Added SUMMARY.md with executive overview | Standardizing Substrate datasets to "Answer First" schema |
+| **December 2025** | Fixed table formatting corruption | Automated updates introduced markdown errors |
+| **December 2025** | Initial 60+ metric catalog | Comprehensive U.S. indicators dashboard |
+
+---
+
+## Research Metadata
+
+| Attribute | Value |
+|-----------|-------|
+| **Dataset Type** | Dashboard / Reference Catalog |
+| **Maintainer** | Daniel Miessler / Kai |
+| **Automation** | `bun run update.ts` |
+| **API Keys Required** | FRED, EIA, Census (all free) |
+| **Last Validation** | December 2025 |
--- a/Data/US-GDP/SUMMARY.md
+++ b/Data/US-GDP/SUMMARY.md
@@ -0,0 +1,192 @@
+# U.S. GDP: Executive Summary
+
+---
+
+## 🎯 BEST ESTIMATE
+
+| Metric | Value | Confidence | Last Updated |
+|--------|-------|------------|--------------|
+| **U.S. Real GDP (Q2 2025)** | **$23.77 trillion** | 99% | October 2025 |
+| **GDP Growth Rate (QoQ, annualized)** | **3.8%** | 99% | October 2025 |
+| **Annual Real GDP (2024)** | **$23.36 trillion** | 99% | October 2025 |
+
+**One-liner:** U.S. real GDP is $23.77 trillion (Q2 2025), growing at a 3.8% annualized rate.
+
+**Caveat:** Quarterly GDP is released in three successive estimates (advance, second, third) and revisited in annual revisions; later estimates may shift growth by up to about ±0.5 percentage points.
+
+---
+
+## The Big Picture
+
+[Gross Domestic Product (GDP)](https://www.bea.gov/data/gdp/gross-domestic-product) is the most comprehensive measure of economic output—the total value of all goods and services produced within the United States. The [Bureau of Economic Analysis (BEA)](https://www.bea.gov/), part of the U.S. Department of Commerce, is the authoritative source for this data.
+
+Real GDP (inflation-adjusted, [chained 2017 dollars](https://www.bea.gov/help/faq/520)) enables valid comparisons across time by removing the effects of price changes. This dataset covers:
+- **Quarterly data**: Q1 1947 – Q2 2025 (314 observations)
+- **Annual data**: 1929 – 2024 (96 observations)
+
+---
+
+## Why This Number Matters
+
+GDP is the benchmark metric for:
+- **Economic health**: Is the economy growing or shrinking?
+- **Policy decisions**: Federal Reserve interest rates, fiscal policy
+- **Business strategy**: Market sizing, demand forecasting, investment planning
+- **International comparison**: How the U.S. economy compares globally
+
+A [one-percentage-point change in GDP growth](https://fred.stlouisfed.org/series/A191RL1Q225SBEA) represents approximately $240 billion in annual economic output (1% of $23.77 trillion).
+
+---
+
+## Current Data Highlights
+
+### Recent Performance
+| Period | Real GDP | Growth Rate | Source |
+|--------|----------|-------------|--------|
+| Q2 2025 | [$23.77T](https://fred.stlouisfed.org/series/GDPC1) | +3.8% (QoQ, annualized) | [BEA](https://www.bea.gov/) |
+| Q1 2025 | $23.55T | Baseline | [BEA](https://www.bea.gov/) |
+| Full Year 2024 | [$23.36T](https://fred.stlouisfed.org/series/GDPCA) | +2.8% (YoY) | [BEA](https://www.bea.gov/) |
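+
+The 3.8% figure is an annualized rate: the one-quarter change compounded over four quarters. A rough sketch using the rounded levels from the table above (the function name `annualized_growth` is illustrative, and rounded inputs only approximate the published rate):
+
+```bash
+# Annualized growth (%) from two consecutive quarterly levels.
+# Usage: annualized_growth <previous_quarter> <current_quarter>
+annualized_growth() {
+  awk -v prev="$1" -v cur="$2" 'BEGIN {
+    r = cur / prev                              # one-quarter growth factor
+    printf "%.1f\n", (r * r * r * r - 1) * 100  # compound over four quarters
+  }'
+}
+
+annualized_growth 23.55 23.77   # Q1 2025 -> Q2 2025, in $ trillions; prints 3.8
+```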
+
+### Historical Milestones
+| Year | Real GDP | Context |
+|------|----------|---------|
+| 1929 | $1.19T | Pre-Depression peak |
+| 1933 | $0.88T | Depression trough (-26%) |
+| 1947 | $2.18T | Post-WWII era begins (quarterly data starts) |
+| 2000 | $13.13T | Dot-com peak |
+| 2009 | $14.42T | Great Recession trough |
+| 2020 Q2 | $17.26T | COVID trough (-31.4% annualized) |
+| 2025 Q2 | $23.77T | Current |
+
+---
+
+## How the Number Is Calculated
+
+The BEA uses the [expenditure approach](https://www.bea.gov/resources/methodologies/nipa-handbook):
+
+**GDP = C + I + G + (X − M)**
+
+| Component | Description | Share of GDP |
+|-----------|-------------|--------------|
+| **C** | Personal consumption expenditures | ~68% |
+| **I** | Gross private domestic investment | ~18% |
+| **G** | Government consumption & investment | ~17% |
+| **(X-M)** | Net exports (exports minus imports) | ~-3% |
+
+### Real vs. Nominal
+- **Nominal GDP**: Measured in current prices (~$29T in 2024)
+- **Real GDP** (this dataset): Adjusted for inflation using [chained 2017 dollars](https://www.bea.gov/help/faq/520)
+- Real GDP enables valid comparisons across time periods
+
+---
+
+## Revision Process
+
+GDP is revised multiple times as more complete data becomes available:
+
+| Release | Timing | Typical Revision |
+|---------|--------|------------------|
+| **Advance Estimate** | ~30 days after quarter end | Initial estimate |
+| **Second Estimate** | ~60 days after quarter end | ±0.3-0.5 pp |
+| **Third Estimate** | ~90 days after quarter end | ±0.1-0.2 pp |
+| **Annual Revision** | Each September (covers prior 5+ years) | May revise history |
+
+**Bottom line**: Current-quarter GDP is a provisional estimate. Use third estimates or annual revisions for precision.
+
+---
+
+## Data Sources
+
+| Source | What It Provides | Link |
+|--------|-----------------|------|
+| [Bureau of Economic Analysis (BEA)](https://www.bea.gov/) | Official U.S. GDP (primary authority) | [GDP Data](https://www.bea.gov/data/gdp) |
+| [FRED](https://fred.stlouisfed.org/) | Easy API access to BEA data | [GDPC1](https://fred.stlouisfed.org/series/GDPC1), [GDPCA](https://fred.stlouisfed.org/series/GDPCA) |
+
+**FRED Series IDs:**
+- `GDPC1` - Real GDP, Quarterly, Seasonally Adjusted Annual Rate
+- `GDPCA` - Real GDP, Annual, Not Seasonally Adjusted
+
+---
+
+## Confidence Assessment
+
+| Component | Confidence | Explanation |
+|-----------|------------|-------------|
+| **Current Quarterly GDP** | 95% | Advance estimate; will be revised |
+| **Third-Estimate GDP** | 99% | Final quarterly revision; highly reliable |
+| **Historical GDP (5+ years)** | 99%+ | Fully revised; official government statistic |
+
+This is among the highest-confidence economic data available—produced by the U.S. government using rigorous methodology with full transparency.
+
+---
+
+## Known Limitations
+
+1. **Revision lag**: Current-quarter figures are provisional estimates
+2. **Base year**: Uses 2017 as reference (updated periodically by BEA)
+3. **Pre-1947**: Quarterly data not available before 1947
+4. **Seasonal adjustment**: May mask genuine short-term fluctuations
+5. **Real economy**: GDP measures production, not welfare or sustainability
+
+---
+
+## How to Access the Data
+
+### Quick Access
+```bash
+# View quarterly data (1947-2025)
+cat Real-GDP-Quarterly-1947-2025.csv
+
+# View annual data (1929-2024)
+cat Real-GDP-Annual-1929-2024.csv
+```
+
+### Update to Latest
+```bash
+# Download latest quarterly data from FRED
+# (-f makes curl fail on HTTP errors instead of saving an error page)
+curl -fL "https://fred.stlouisfed.org/graph/fredgraph.csv?id=GDPC1" -o Real-GDP-Quarterly.csv
+
+# Download latest annual data from FRED
+curl -fL "https://fred.stlouisfed.org/graph/fredgraph.csv?id=GDPCA" -o Real-GDP-Annual.csv
+```
+
+---
+
+## Supporting Documentation
+
+| Document | Description |
+|----------|-------------|
+| [US-GDP-1929-2025.md](./US-GDP-1929-2025.md) | Full dataset documentation with historical context |
+| [source.md](./source.md) | Detailed methodology and provenance |
+| [Real-GDP-Quarterly-1947-2025.csv](./Real-GDP-Quarterly-1947-2025.csv) | Quarterly data (314 observations) |
+| [Real-GDP-Annual-1929-2024.csv](./Real-GDP-Annual-1929-2024.csv) | Annual data (96 observations) |
+
+---
+
+## Research Metadata
+
+| Attribute | Value |
+|-----------|-------|
+| **Research Date** | October 2025 |
+| **Researcher** | Kai (10-agent parallel synthesis) |
+| **Method** | Multi-source corroboration via Perplexity, Claude, Gemini |
+| **Confidence Level** | 99% (official government statistic) |
+| **Known Gaps** | Pre-1947 quarterly data unavailable |
+
+---
+
+## Changelog
+
+| Date | Change | Reason |
+|------|--------|--------|
+| **December 2025** | Added SUMMARY.md with executive overview | Standardizing Substrate datasets to "Answer First" schema |
+| **October 2025** | Initial dataset creation | Comprehensive U.S. GDP data collection |
+
+---
+
+## External Resources
+
+- [BEA GDP FAQ](https://www.bea.gov/help/faq/520) - Methodology questions
+- [BEA NIPA Handbook](https://www.bea.gov/resources/methodologies/nipa-handbook) - Full methodology
+- [BEA Release Schedule](https://www.bea.gov/news/schedule) - Upcoming GDP releases
+- [FRED GDP Series](https://fred.stlouisfed.org/categories/18) - All GDP-related data
--- a/Data/US-Inflation/SUMMARY.md
+++ b/Data/US-Inflation/SUMMARY.md
@@ -0,0 +1,205 @@
+# U.S. Inflation (CPI): Executive Summary
+
+---
+
+## 🎯 BEST ESTIMATE
+
+| Metric | Value | Confidence | Last Updated |
+|--------|-------|------------|--------------|
+| **CPI-U Index (August 2025)** | **323.4** | 99% | October 2025 |
+| **Year-over-Year Inflation** | **~2.5%** | 99% | October 2025 |
+| **Fed Target** | **2.0%** | Reference | - |
+
+**One-liner:** U.S. inflation is ~2.5% (YoY), with CPI index at 323.4 (1982-84=100 baseline).
+
+**Caveat:** CPI-U covers urban consumers only (~93% of the population); regional inflation can differ significantly from the national figure.
+
+---
+
+## The Big Picture
+
+The [Consumer Price Index (CPI)](https://www.bls.gov/cpi/) is the primary measure of inflation in the United States—tracking changes in the price level of a basket of consumer goods and services. The [Bureau of Labor Statistics (BLS)](https://www.bls.gov/) produces this data monthly.
+
+**What the current numbers mean:**
+- A CPI of 323.4 means that goods costing $100 in 1982-84 now cost $323.40
+- At 2.5% annual inflation, prices double approximately every 28 years
+- Current inflation is near the [Federal Reserve's 2% target](https://www.federalreserve.gov/faqs/economy_14400.htm)
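+
+The doubling-time figure follows from compound growth: years = ln(2) / ln(1 + rate). A small sketch of that arithmetic (the function name `doubling_years` is illustrative):
+
+```bash
+# Years for prices to double at a constant annual inflation rate (in %).
+doubling_years() {
+  awk -v rate="$1" 'BEGIN { printf "%.1f\n", log(2) / log(1 + rate / 100) }'
+}
+
+doubling_years 2.5   # prints 28.1
+```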
+
+---
+
+## Why This Number Matters
+
+Inflation affects virtually every economic decision:
+
+- **Wages**: [Cost-of-living adjustments (COLAs)](https://www.ssa.gov/oact/cola/colaseries.html) are tied to CPI
+- **Savings**: Determines whether your money gains or loses purchasing power
+- **Interest Rates**: The [Federal Reserve](https://www.federalreserve.gov/) adjusts rates based on inflation
+- **Contracts**: Many business and government contracts escalate with CPI
+- **Policy**: Trillions in Social Security, Medicare, and tax brackets adjust with CPI
+
+A [1% change in CPI](https://www.bls.gov/cpi/) affects billions of dollars in annual adjustments.
+
+---
+
+## Current Data Highlights
+
+### Recent Readings
+| Period | CPI Index | YoY Inflation | Source |
+|--------|-----------|---------------|--------|
+| August 2025 | [323.4](https://fred.stlouisfed.org/series/CPIAUCSL) | ~2.5% | [BLS](https://www.bls.gov/cpi/) |
+| June 2022 | 296.3 | 9.1% (peak) | [BLS](https://www.bls.gov/cpi/) |
+| 1982-84 Avg | 100.0 | Baseline | [BLS](https://www.bls.gov/cpi/) |
+| January 1947 | 21.5 | First obs. | [BLS](https://www.bls.gov/cpi/) |
+
+### Long-Term Trend
+| Period | Average Annual Inflation |
+|--------|-------------------------|
+| 1947-2025 (Full) | ~3.5% |
+| 1990-2019 (Pre-COVID) | ~2.4% |
+| 2021-2023 (COVID Surge) | ~6.0% |
+| 2024-2025 (Current) | ~2.5% |
+
+---
+
+## How the Number Is Calculated
+
+The BLS uses a [Laspeyres price index](https://www.bls.gov/opub/hom/cpi/calculation.htm):
+
+**CPI = (Cost of basket today / Cost of basket in base period) × 100**
+
+### The Market Basket
+| Category | Weight | Examples |
+|----------|--------|----------|
+| **Housing** | ~34% | Rent, utilities, furnishings |
+| **Food** | ~14% | Groceries, restaurants |
+| **Transportation** | ~16% | Vehicles, gas, insurance |
+| **Medical Care** | ~9% | Healthcare, drugs, insurance |
+| **Recreation** | ~5% | Entertainment, sports, hobbies |
+| **Education/Communication** | ~7% | Tuition, phones, internet |
+| **Other** | ~15% | Apparel, personal care |
+
+**Data Collection:**
+- ~80,000 prices collected monthly
+- 75 urban areas across the U.S.
+- Weights updated every 2 years from [Consumer Expenditure Survey](https://www.bls.gov/cex/)
+
+---
+
+## Key Inflation Rates to Know
+
+| Measure | What It Is | FRED ID |
+|---------|-----------|---------|
+| **Headline CPI** | All items | [CPIAUCSL](https://fred.stlouisfed.org/series/CPIAUCSL) |
+| **Core CPI** | Excludes food & energy | [CPILFESL](https://fred.stlouisfed.org/series/CPILFESL) |
+| **PCE** | Fed's preferred measure | [PCEPI](https://fred.stlouisfed.org/series/PCEPI) |
+| **Core PCE** | Fed's key target | [PCEPILFE](https://fred.stlouisfed.org/series/PCEPILFE) |
+
+**Why Core?** Food and energy prices are volatile. Core inflation shows underlying trends.
+
+**Why PCE?** The Federal Reserve targets [PCE inflation](https://fred.stlouisfed.org/series/PCEPI) rather than CPI because it accounts for substitution effects.
+
+---
+
+## Historical Inflation Episodes
+
+| Period | Peak Inflation / Key Figure | Cause / Outcome |
+|--------|---------------|-------|
+| [1970s Stagflation](https://fred.stlouisfed.org/series/CPIAUCSL) | 14.8% (1980) | Oil shocks, monetary policy |
+| [Volcker Shock](https://www.federalreserve.gov/aboutthefed/bios/board/volcker.htm) | Fed raised rates to 20%+ | Broke inflation cycle |
+| [Great Moderation](https://www.federalreserve.gov/pubs/ifdp/2005/835/default.htm) | 2-3% (1990s-2000s) | Credible monetary policy |
+| [Great Recession](https://fred.stlouisfed.org/series/CPIAUCSL) | Brief deflation (2009) | Financial crisis |
+| [COVID Surge](https://fred.stlouisfed.org/series/CPIAUCSL) | 9.1% (June 2022) | Supply chain, stimulus |
+| **Current** | ~2.5% (2025) | Fed tightening working |
+
+---
+
+## Confidence Assessment
+
+| Component | Confidence | Explanation |
+|-----------|------------|-------------|
+| **Current CPI Index** | 99% | Official government statistic, gold standard |
+| **YoY Inflation Rate** | 99% | Direct calculation from CPI data |
+| **Historical Data** | 99%+ | Fully verified, minimal revisions |
+
+This is the most reliable inflation data available—produced by the U.S. government with rigorous methodology and complete transparency.
+
+---
+
+## Known Limitations
+
+1. **Substitution bias**: Fixed basket doesn't fully capture when consumers switch to cheaper alternatives
+2. **Quality adjustment**: Hard to account for product quality improvements over time
+3. **New products**: Slow to incorporate new goods (smartphones took years)
+4. **Geographic variation**: National average masks significant regional differences
+5. **Population**: Covers urban consumers only (~93% of U.S.)
+
+---
+
+## How to Calculate Inflation
+
+### Year-over-Year Rate
+```
+Inflation Rate = ((CPI_now - CPI_1year_ago) / CPI_1year_ago) × 100
+```
+
+### Convert Dollars Across Time
+```
+Real_value = Nominal_value × (CPI_target_year / CPI_original_year)
+```
+
+Example: $100 in 1982-84 (the CPI base period, index = 100) equals ~$323 in 2025 purchasing power.
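+
+Both formulas can be sketched as small shell helpers. The function names are illustrative, and the 102.5 input below is a made-up index value chosen only to show the arithmetic:
+
+```bash
+# Year-over-year inflation rate (%) from two CPI index values.
+# Usage: yoy_inflation <cpi_now> <cpi_one_year_ago>
+yoy_inflation() {
+  awk -v now="$1" -v ago="$2" 'BEGIN { printf "%.2f\n", (now - ago) / ago * 100 }'
+}
+
+# Convert a dollar amount between periods using their CPI index values.
+# Usage: adjust_dollars <amount> <cpi_original> <cpi_target>
+adjust_dollars() {
+  awk -v amt="$1" -v orig="$2" -v target="$3" 'BEGIN { printf "%.2f\n", amt * target / orig }'
+}
+
+yoy_inflation 102.5 100        # hypothetical inputs; prints 2.50
+adjust_dollars 100 100 323.4   # base period (index 100) -> August 2025; prints 323.40
+```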
+
+---
+
+## Data Sources
+
+| Source | What It Provides | Link |
+|--------|-----------------|------|
+| [Bureau of Labor Statistics](https://www.bls.gov/cpi/) | Official CPI (primary authority) | [CPI Home](https://www.bls.gov/cpi/) |
+| [FRED](https://fred.stlouisfed.org/) | Easy API access to BLS data | [CPIAUCSL](https://fred.stlouisfed.org/series/CPIAUCSL) |
+
+**Quick Access:**
+```bash
+# Download latest CPI data from FRED (-f: fail on HTTP errors instead of saving an error page)
+curl -fL "https://fred.stlouisfed.org/graph/fredgraph.csv?id=CPIAUCSL" -o CPI-latest.csv
+```
+
+---
+
+## Supporting Documentation
+
+| Document | Description |
+|----------|-------------|
+| [US-Inflation-CPI-1947-2025.md](./US-Inflation-CPI-1947-2025.md) | Full dataset documentation |
+| [source.md](./source.md) | Detailed methodology |
+| [CPI-US-Monthly-1947-2025.csv](./CPI-US-Monthly-1947-2025.csv) | Monthly data (945 observations) |
+
+---
+
+## Research Metadata
+
+| Attribute | Value |
+|-----------|-------|
+| **Research Date** | October 2025 |
+| **Researcher** | Kai |
+| **Method** | Direct BLS/FRED data collection |
+| **Confidence Level** | 99% (official government statistic) |
+| **Known Gaps** | Pre-1947 data uses different methodology |
+
+---
+
+## Changelog
+
+| Date | Change | Reason |
+|------|--------|--------|
+| **December 2025** | Added SUMMARY.md with executive overview | Standardizing Substrate datasets to "Answer First" schema |
+| **October 2025** | Initial dataset creation | Comprehensive U.S. CPI data collection |
+
+---
+
+## External Resources
+
+- [BLS CPI FAQ](https://www.bls.gov/cpi/questions-and-answers.htm) - Common questions
+- [BLS Handbook of Methods](https://www.bls.gov/opub/hom/cpi/) - Full methodology
+- [Fed Inflation Target](https://www.federalreserve.gov/faqs/economy_14400.htm) - Why 2%?
+- [CPI Inflation Calculator](https://www.bls.gov/data/inflation_calculator.htm) - BLS tool
--- a/Data/US-Presidential-Approval/SUMMARY.md
+++ b/Data/US-Presidential-Approval/SUMMARY.md
@@ -0,0 +1,198 @@
+# U.S. Presidential Approval Ratings: Executive Summary
+
+---
+
+## 🎯 BEST ESTIMATE
+
+| Metric | Value | Confidence | Last Updated |
+|--------|-------|------------|--------------|
+| **Trump Approval (Nov 2025)** | **36-44%** (avg ~41%) | 95% | November 2025 |
+| **Trump Net Approval** | **-13 points** | 95% | November 2025 |
+| **Historical Dataset** | **12,479 polls** (1937-2025) | 99% | November 2025 |
+
+**One-liner:** Trump's approval averages ~41% (net -13); dataset covers 12,479 polls since 1937.
+
+**Caveat:** Polling variation of 3-7 points across organizations; use aggregates, not single polls.
+
+---
+
+## The Big Picture
+
+Presidential approval ratings are the primary measure of public confidence in the president. [Gallup](https://news.gallup.com/poll/116677/presidential-approval-ratings-gallup-historical-statistics-trends.aspx) has tracked this since 1937 using a consistent question: *"Do you approve or disapprove of the way [President] is handling his job as President?"*
+
+This dataset contains:
+- **12,479 individual polls** spanning 87+ years
+- **14 presidents** from FDR through Trump (second term)
+- **Multiple pollsters** for cross-validation
+
+---
+
+## Why This Number Matters
+
+Presidential approval is a leading indicator for:
+
+- **Legislative success**: High approval = political capital for agenda
+- **Reelection chances**: Presidents above 50% almost always win reelection
+- **Market confidence**: Investor and business sentiment
+- **Governing ability**: Approval affects congressional cooperation
+- **Historical legacy**: Approval shapes how presidents are remembered
+
+---
+
+## Current President: Donald Trump (Second Term)
+
+### November 2025 Snapshot
+| Metric | Value | Trend |
+|--------|-------|-------|
+| **Approval** | 36-44% (avg ~41%) | Declining |
+| **Disapproval** | 49-62% (avg ~54%) | Rising |
+| **Net Approval** | -13 points | Down from -9 in Oct |
+| **Peak Approval** | 52% (Jan 2025) | -11 points from peak |
+
+### 2025 Trajectory
+| Period | Approval Range | Context |
+|--------|----------------|---------|
+| Jan-Feb | 48-52% | Honeymoon period |
+| Mar-May | 44-48% | Post-honeymoon decline |
+| Jun-Aug | 44-46% | Summer plateau |
+| Sep-Nov | 36-44% | Government shutdown impact |
+
+**Key Factors:**
+- Government shutdown began October 1, 2025
+- Republican approval down 12 points (91% → 79%) since inauguration
+- Economic approval underwater: Economy -17.6, Inflation -27.5
+
+---
+
+## Historical Reference Points
+
+### Highest Approval Ratings Ever
+| President | Approval | Date | Context |
+|-----------|----------|------|---------|
+| [George W. Bush](https://news.gallup.com/poll/116677/presidential-approval-ratings-gallup-historical-statistics-trends.aspx) | 90% | Sept 2001 | Post-9/11 rally |
+| [Harry Truman](https://news.gallup.com/poll/116677/presidential-approval-ratings-gallup-historical-statistics-trends.aspx) | 87% | June 1945 | WWII victory |
+| [John F. Kennedy](https://news.gallup.com/poll/116677/presidential-approval-ratings-gallup-historical-statistics-trends.aspx) | 83% | April 1961 | Early presidency |
+
+### Lowest Approval Ratings Ever
+| President | Approval | Date | Context |
+|-----------|----------|------|---------|
+| [Harry Truman](https://news.gallup.com/poll/116677/presidential-approval-ratings-gallup-historical-statistics-trends.aspx) | 22% | Feb 1952 | Korean War |
+| [Richard Nixon](https://news.gallup.com/poll/116677/presidential-approval-ratings-gallup-historical-statistics-trends.aspx) | 24% | Aug 1974 | Watergate |
+| [George W. Bush](https://news.gallup.com/poll/116677/presidential-approval-ratings-gallup-historical-statistics-trends.aspx) | 25% | Oct 2008 | Financial crisis |
+| [Jimmy Carter](https://news.gallup.com/poll/116677/presidential-approval-ratings-gallup-historical-statistics-trends.aspx) | 28% | June 1979 | Economic crisis |
+
+### Typical Approval Ranges
+| Range | Interpretation |
+|-------|---------------|
+| **60-80%** | Honeymoon or crisis rally |
+| **50-60%** | Strong; likely reelection |
+| **40-50%** | Mixed; competitive |
+| **30-40%** | Weak; difficult governance |
+| **Below 30%** | Historical crisis territory |
+
+---
+
+## How to Interpret Polling Data
+
+### Net Approval
+```
+Net Approval = Approval % - Disapproval %
+```
+- **Positive** (+5 or higher): More approve than disapprove
+- **Around zero**: Evenly divided
+- **Negative** (-5 or lower): More disapprove than approve
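+
+As a quick check of the formula, a minimal sketch using the November 2025 averages quoted in this summary (the function name `net_approval` is illustrative):
+
+```bash
+# Net approval in percentage points: approve minus disapprove.
+# Usage: net_approval <approve_pct> <disapprove_pct>
+net_approval() {
+  awk -v a="$1" -v d="$2" 'BEGIN { print a - d }'
+}
+
+net_approval 41 54   # ~41% approve, ~54% disapprove; prints -13
+```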
+
+### Polling Variation
+Different pollsters show 3-7 point variation due to:
+- Sample type (adults vs. registered vs. likely voters)
+- Methodology (phone vs. online)
+- Question wording and order
+- Timing within news cycle
+
+**Best practice**: Use averages from aggregators like [RealClearPolitics](https://www.realclearpolitics.com/epolls/other/president_trump_job_approval-6179.html) or [FiveThirtyEight](https://projects.fivethirtyeight.com/polls/approval/donald-trump/) (when available).
+
+---
+
+## Data Sources
+
+| Source | What It Provides | Link |
+|--------|-----------------|------|
+| [Gallup](https://news.gallup.com/poll/116677/presidential-approval-ratings-gallup-historical-statistics-trends.aspx) | Gold standard since 1937 | [Historical Trends](https://news.gallup.com/poll/116677/presidential-approval-ratings-gallup-historical-statistics-trends.aspx) |
+| [American Presidency Project](https://www.presidency.ucsb.edu/statistics/data/presidential-job-approval) | UC Santa Barbara archive | [Approval Data](https://www.presidency.ucsb.edu/statistics/data/presidential-job-approval) |
+| [Roper Center](https://ropercenter.cornell.edu/) | Cornell poll archive | [Research Access](https://ropercenter.cornell.edu/) |
+| [RealClearPolitics](https://www.realclearpolitics.com/epolls/other/president_trump_job_approval-6179.html) | Current poll aggregation | [Trump Approval](https://www.realclearpolitics.com/epolls/other/president_trump_job_approval-6179.html) |
+
+### Primary Dataset Source
+This Substrate dataset aggregates from [Lorenzo Ruffino's research compilation](https://github.com/lorenzo-ruffino/approval_rate_usa_president) which includes:
+- 15+ professional polling organizations
+- Consistent data structure for cross-temporal analysis
+- Open source with community validation
+
+---
+
+## Confidence Assessment
+
+| Component | Confidence | Explanation |
+|-----------|------------|-------------|
+| **Historical Data (1937-2020)** | 99% | Fully validated, Gallup gold standard |
+| **Recent Polls (2021-2025)** | 95% | Multiple organizations, subject to revision |
+| **Current Month** | 90% | Polling variation; use aggregates |
+
+Presidential approval data is among the most reliable polling data available due to:
+- 87+ years of consistent methodology
+- Multiple cross-validating sources
+- Scientific sampling standards
+- Institutional validation
+
+---
+
+## Known Limitations
+
+1. **Polling variation**: 3-7 point spread across organizations
+2. **Sample composition**: Adults vs. registered vs. likely voters differ
+3. **Methodology changes**: Online polling introduced post-2000
+4. **Response rates**: Declining over time, may affect representativeness
+5. **Timing sensitivity**: Polls capture specific moments; events shift opinion
+
+---
+
+## Supporting Documentation
+
+| Document | Description |
+|----------|-------------|
+| [README.md](./README.md) | Full dataset documentation |
+| [Trump-Approval-Analysis-2025.md](./Trump-Approval-Analysis-2025.md) | Current president analysis |
+| [Historical-Approval-Polls-1937-2024.csv](./Historical-Approval-Polls-1937-2024.csv) | 12,479 individual polls |
+| [Historical-Net-Approval-First-Terms.csv](./Historical-Net-Approval-First-Terms.csv) | First-term comparison data |
+| [Trump-Approval-2025.csv](./Trump-Approval-2025.csv) | Current year polling data |
+
+---
+
+## Research Metadata
+
+| Attribute | Value |
+|-----------|-------|
+| **Dataset Coverage** | 1937-2025 (87+ years) |
+| **Total Polls** | 12,479 individual polls |
+| **Presidents Covered** | 14 (FDR through Trump) |
+| **Update Frequency** | Continuous (as polls publish) |
+| **Confidence Level** | 95-99% (professional polling data) |
+
+---
+
+## Changelog
+
+| Date | Change | Reason |
+|------|--------|--------|
+| **December 2025** | Added SUMMARY.md with executive overview | Standardizing Substrate datasets to "Answer First" schema |
+| **November 2025** | Updated Trump 2025 data | Current polling integration |
+| **October 2025** | Initial dataset creation | Comprehensive approval data collection |
+
+---
+
+## External Resources
+
+- [Gallup Presidential Approval Center](https://news.gallup.com/poll/116677/presidential-approval-ratings-gallup-historical-statistics-trends.aspx) - Historical data and analysis
+- [RealClearPolitics Approval Tracker](https://www.realclearpolitics.com/epolls/other/president_trump_job_approval-6179.html) - Current aggregates
+- [American Presidency Project](https://www.presidency.ucsb.edu/) - UC Santa Barbara archive
+- [Roper Center](https://ropercenter.cornell.edu/) - Cornell polling archive