From 9a181ae43b250ed16e631afb220d3ad701e72b8b Mon Sep 17 00:00:00 2001
From: Daniel Miessler
Date: Wed, 10 Dec 2025 14:40:25 -0800
Subject: [PATCH] feat: Standardize all datasets to "Answer First" schema
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Added SUMMARY.md executive summaries to all 7 datasets with:
- 🎯 BEST ESTIMATE section at top
- 12-word one-liners for quick reference
- Confidence levels and caveats
- Extensive authoritative linking
- Alternative Estimates sections where applicable
- Changelogs for revision tracking

Updated Data/README.md with:
- Quick reference table of all datasets
- Full schema documentation
- Confidence level guidelines
- Anti-patterns to avoid

Datasets standardized:
- Knowledge-Worker-Global-Salaries (gold standard)
- US-GDP
- US-Inflation
- US-Presidential-Approval
- Bay-Area-COVID-Wastewater
- US-Common-Metrics
- Pulitzer-Prize-Winners

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5
---
 Data/Bay-Area-COVID-Wastewater/SUMMARY.md | 198 ++++++++++++++
 Data/Pulitzer-Prize-Winners/SUMMARY.md    | 197 ++++++++++++++
 Data/README.md                            | 315 ++++++++++++----------
 Data/US-Common-Metrics/SUMMARY.md         | 163 +++++++++++
 Data/US-GDP/SUMMARY.md                    | 192 +++++++++++++
 Data/US-Inflation/SUMMARY.md              | 205 ++++++++++++++
 Data/US-Presidential-Approval/SUMMARY.md  | 198 ++++++++++++++
 7 files changed, 1328 insertions(+), 140 deletions(-)
 create mode 100644 Data/Bay-Area-COVID-Wastewater/SUMMARY.md
 create mode 100644 Data/Pulitzer-Prize-Winners/SUMMARY.md
 create mode 100644 Data/US-Common-Metrics/SUMMARY.md
 create mode 100644 Data/US-GDP/SUMMARY.md
 create mode 100644 Data/US-Inflation/SUMMARY.md
 create mode 100644 Data/US-Presidential-Approval/SUMMARY.md

diff --git a/Data/Bay-Area-COVID-Wastewater/SUMMARY.md b/Data/Bay-Area-COVID-Wastewater/SUMMARY.md
new file mode 100644
index 0000000..8aee912
--- /dev/null
+++ b/Data/Bay-Area-COVID-Wastewater/SUMMARY.md
@@ -0,0 +1,198 @@
+# Bay Area COVID-19 Wastewater Surveillance: Executive Summary
+
+---
+
+## 🎯 BEST ESTIMATE
+
+| Metric | Value | Confidence | Last Updated |
+|--------|-------|------------|--------------|
+| **California Wastewater Level** | **5.60 log10 copies/mL** | 95% | August 2025 |
+| **Status** | **HIGH activity** | 95% | August 2025 |
+| **Dataset Coverage** | **161 weeks** (July 2022-present) | 99% | October 2025 |
+
+**One-liner:** California COVID wastewater is HIGH (5.6 log10); leads clinical data by 4-7 days.
+
+**Caveat:** Statewide data serves as Bay Area proxy; log scale means each unit = 10x viral load change.
+
+---
+
+## The Big Picture
+
+Wastewater surveillance is the gold standard for population-level disease monitoring. Unlike clinical testing, it captures **all COVID infections**—symptomatic, asymptomatic, and unreported—providing an unbiased view of community transmission.
+
+The [California Department of Public Health (CDPH)](https://data.chhs.ca.gov/dataset/covid-19-wastewater-surveillance) monitors viral levels at 12+ wastewater treatment plants across California, including major Bay Area facilities. This data serves as a **leading indicator**, typically showing trends 4-7 days before clinical test results.
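Editorial illustration of the caveat above (each log10 unit corresponds to a 10x change in viral load). This is a minimal Python sketch, assuming readings are plain log10 values such as 5.60; the helper name `fold_change` is hypothetical and is not part of the dataset or of any CDPH tooling.

```python
import math


def fold_change(log10_before: float, log10_after: float) -> float:
    """Multiplicative change in viral concentration between two log10 readings.

    Hypothetical helper for illustration only; not part of the dataset tooling.
    """
    return 10 ** (log10_after - log10_before)


# One log10 unit is a 10x change; two units is a 100x change.
assert math.isclose(fold_change(5.0, 6.0), 10.0)
assert math.isclose(fold_change(5.0, 7.0), 100.0)
# A falling reading gives a factor below 1 (here, one tenth).
assert math.isclose(fold_change(6.0, 5.0), 0.1)
```

Comparing week-over-week readings this way makes the magnitude explicit: a move from 5.0 to 5.3, for example, is roughly a doubling, since 10 ** 0.3 is about 2.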
+
+---
+
+## Why This Number Matters
+
+Wastewater data is valuable because it:
+
+- **Leads clinical data**: Shows trends 4-7 days before case reports
+- **Captures all infections**: Not biased by testing availability or behavior
+- **Enables early warning**: Identifies surges before hospitals see them
+- **Supports policy decisions**: Used by California health officials for resource allocation
+- **Tracks variants**: Can detect emerging variants before clinical sequencing
+
+---
+
+## Current Status
+
+### August 2025 Snapshot
+| Metric | Value | Interpretation |
+|--------|-------|---------------|
+| **Current Level** | 5.60 log10 copies/mL | HIGH |
+| **Trend** | Elevated, increasing | Rising from spring lows |
+| **Historical Peak** | 18.97 log10 (July 2022) | Omicron wave |
+| **Recent Low** | 1.60 log10 (March 2025) | Spring baseline |
+
+### Activity Levels Reference
+| Level | log10 Range | Interpretation |
+|-------|-------------|---------------|
+| **LOW** | <2.0 | Minimal community transmission |
+| **MEDIUM** | 2.0-4.0 | Moderate transmission |
+| **HIGH** | 4.0-6.0 | Elevated transmission |
+| **VERY HIGH** | >6.0 | Surge conditions |
+
+---
+
+## How to Interpret the Data
+
+### Log Scale Explained
+Values are log10 transformed:
+- **Each unit increase = 10x more virus**
+- 5.0 → 6.0 means 10x increase
+- 5.0 → 7.0 means 100x increase
+
+### What to Watch
+1. **Direction matters more than absolute value** - Is it rising or falling?
+2. **Rate of change** - Fast rises signal emerging surges
+3. **Seasonal context** - Winter typically higher than summer
+4. **Regional variation** - Bay Area may differ from statewide
+
+---
+
+## Geographic Coverage
+
+### Bay Area Treatment Plants Monitored
+| County | Major Facilities |
+|--------|-----------------|
+| San Francisco | SF Public Utilities |
+| Alameda | [EBMUD](https://www.ebmud.com/) |
+| Santa Clara | San Jose-Santa Clara RWF |
+| Contra Costa | Central Contra Costa Sanitary |
+| Marin | 6 sites including Central Marin |
+| San Mateo | Silicon Valley Clean Water |
+
+The statewide California data serves as a robust proxy for Bay Area trends since it includes all major Bay Area treatment facilities.
+
+---
+
+## Data Sources
+
+| Source | What It Provides | Link |
+|--------|-----------------|------|
+| [CDPH](https://data.chhs.ca.gov/dataset/covid-19-wastewater-surveillance) | California statewide wastewater | [Direct CSV](https://data.chhs.ca.gov/dataset/1184f641-313f-47ee-b126-9e8c42699be5/resource/726752d3-afe6-4733-99bd-ffb9f400348c/download/wastewater.csv) |
+| [CDC NWSS](https://www.cdc.gov/nwss/) | National wastewater surveillance | [NWSS Dashboard](https://www.cdc.gov/nwss/covid-19/) |
+| [WastewaterSCAN](https://www.wastewaterscan.org/) | Academic research data | [Data Portal](https://data.wastewaterscan.org/) |
+
+### Why CDPH?
+- **Official government source** used by state decision-makers
+- **Consistent methodology** since July 2022
+- **Weekly updates** every Friday
+- **Direct CSV download** with no authentication required
+- **Validated methodology**: qPCR/ddPCR with flow adjustment and PMMoV normalization
+
+---
+
+## Methodology
+
+### Measurement
+- **Method**: qPCR and ddPCR detection of SARS-CoV-2 RNA
+- **Normalization**: Flow-adjusted and PMMoV-normalized
+- **Units**: log10(gene copies per milliliter)
+- **Frequency**: Weekly composite samples
+
+### Why Leading Indicator?
+- Infected individuals shed virus in feces 2-7 days before symptoms
+- Wastewater captures shedding regardless of testing behavior
+- Aggregates entire sewershed population (millions of people)
+
+---
+
+## Confidence Assessment
+
+| Component | Confidence | Explanation |
+|-----------|------------|-------------|
+| **Current Level** | 95% | Official government data, validated methodology |
+| **Historical Data** | 99% | Complete 161-week dataset |
+| **Trend Direction** | 90% | Subject to weekly variation |
+
+Wastewater surveillance is among the most reliable pandemic indicators because it:
+- Uses scientific lab methodology (qPCR/ddPCR)
+- Samples entire populations (no selection bias)
+- Operates independently of testing behavior
+- Has been validated against clinical data
+
+---
+
+## Known Limitations
+
+1. **Statewide proxy**: California data used as Bay Area proxy (not county-specific)
+2. **Log scale**: Can obscure magnitude of changes for non-technical users
+3. **No variant detail**: Current data shows total virus, not strain breakdown
+4. **Weekly frequency**: Daily fluctuations not captured
+5. **Treatment plant variation**: Some facilities report more reliably than others
+
+---
+
+## Use Cases
+
+This dataset supports:
+- **Personal health decisions**: Should I mask at gatherings?
+- **Policy analysis**: Evidence for health interventions
+- **Academic research**: Population-level epidemiology
+- **Trend forecasting**: What's coming in 1-2 weeks?
+- **Historical analysis**: Pandemic timeline documentation
+
+---
+
+## Supporting Documentation
+
+| Document | Description |
+|----------|-------------|
+| [README.md](./README.md) | Full dataset documentation |
+| [COVID-Wastewater-California-Statewide-2022-2025.csv](./COVID-Wastewater-California-Statewide-2022-2025.csv) | Main dataset (161 weeks) |
+| [COVID-Wastewater-SF-Bay-Area-2023-2025.md](./COVID-Wastewater-SF-Bay-Area-2023-2025.md) | Detailed methodology |
+| [UPDATES.md](./UPDATES.md) | Data refresh changelog |
+
+---
+
+## Research Metadata
+
+| Attribute | Value |
+|-----------|-------|
+| **Dataset Coverage** | July 2022 - Present |
+| **Total Observations** | 161 weeks (100% complete) |
+| **Update Frequency** | Weekly (Fridays) |
+| **Geographic Scope** | California (includes Bay Area) |
+| **Confidence Level** | 95% (government surveillance data) |
+
+---
+
+## Changelog
+
+| Date | Change | Reason |
+|------|--------|--------|
+| **December 2025** | Added SUMMARY.md with executive overview | Standardizing Substrate datasets to "Answer First" schema |
+| **October 2025** | Updated through August 2025 | Regular data refresh |
+| **2024** | Initial dataset creation | COVID wastewater tracking system |
+
+---
+
+## External Resources
+
+- [CDPH COVID Dashboard](https://covid19.ca.gov/data-and-tools/) - Official California data
+- [CDC NWSS](https://www.cdc.gov/nwss/covid-19/) - National wastewater surveillance
+- [WastewaterSCAN](https://www.wastewaterscan.org/) - Stanford/Emory research program
+- [EBMUD Wastewater Monitoring](https://www.ebmud.com/) - East Bay utility data
diff --git a/Data/Pulitzer-Prize-Winners/SUMMARY.md b/Data/Pulitzer-Prize-Winners/SUMMARY.md
new file mode 100644
index 0000000..18eb463
--- /dev/null
+++ b/Data/Pulitzer-Prize-Winners/SUMMARY.md
@@ -0,0 +1,197 @@
+# Pulitzer Prize Winners (Arts & Letters): Executive Summary
+
+---
+
+## 🎯 WHAT THIS IS
+
+| Attribute | Value |
+|-----------|-------|
+| **Dataset Type** | Historical Reference Catalog |
+| **Coverage** | 249 winners across Arts & Letters (1918-2024) |
+| **Categories** | Poetry (105), Drama (109), General/Special (35) |
+| **Last Updated** | October 2025 |
+
+**One-liner:** Complete Arts & Letters Pulitzer database: 249 winners across Poetry, Drama, and Special awards.
+
+**Caveat:** Arts & Letters only—Journalism, Fiction, History, Biography, and Music categories not included.
+
+---
+
+## The Big Picture
+
+The [Pulitzer Prizes](https://www.pulitzer.org/) are the most prestigious awards in American journalism and the arts, established in 1917. This dataset focuses on the **Arts & Letters categories**—Poetry, Drama, and General/Special Awards—providing 107 years of literary achievement data.
+
+This is **reference data**, not an estimate. Each entry represents a verified Pulitzer Prize winner, cross-referenced against the [official Pulitzer Prize archive](https://www.pulitzer.org/prize-winners-by-category).
+
+---
+
+## Why This Dataset Matters
+
+The Pulitzer Prizes define American literary excellence:
+
+- **Poetry**: The most prestigious poetry award in the United States
+- **Drama**: Shapes what gets produced on Broadway and beyond
+- **Cultural canon**: Winners become required reading in schools and universities
+- **Historical record**: Documents 107 years of American literary achievement
+- **Research foundation**: Essential for literary criticism, cultural studies, and trend analysis
+
+---
+
+## Dataset Contents
+
+### Category Breakdown
+| Category | Winners | Coverage |
+|----------|---------|----------|
+| [Poetry](https://www.pulitzer.org/prize-winners-by-category/218) | 105 | 1918-2024 |
+| [Drama](https://www.pulitzer.org/prize-winners-by-category/219) | 109 | 1918-2024 |
+| [General/Special Awards](https://www.pulitzer.org/special-awards) | 35 | Various |
+| **Total** | **249** | 107 years |
+
+### Sample Winners
+| Year | Category | Winner | Work |
+|------|----------|--------|------|
+| 2024 | Poetry | [Paisley Rekdal](https://www.pulitzer.org/winners/paisley-rekdal) | *West: A Translation* |
+| 2024 | Drama | [Paula Vogel](https://www.pulitzer.org/winners/paula-vogel) | *Mother Play* |
+| 2023 | Poetry | [Carl Phillips](https://www.pulitzer.org/winners/carl-phillips) | *Then the War* |
+| 2023 | Drama | [Sanaz Toossi](https://www.pulitzer.org/winners/sanaz-toossi) | *English* |
+
+---
+
+## What's Included vs. Not Included
+
+### Included (Arts & Letters)
+- **Poetry** - Annual award since 1918 (105 winners)
+- **Drama** - Annual award since 1918 (109 winners)
+- **General/Special Awards** - Lifetime achievement, special citations (35 winners)
+
+### Not Included (By Design)
+| Category | Reason |
+|----------|--------|
+| Journalism (14 categories) | Different focus; available via [Pulitzer.org](https://www.pulitzer.org/prize-winners-categories) |
+| Fiction | Lower Wikidata coverage; expansion opportunity |
+| History | Lower Wikidata coverage; expansion opportunity |
+| Biography | Lower Wikidata coverage; expansion opportunity |
+| Music | Lower Wikidata coverage; expansion opportunity |
+
+**Rationale**: This dataset prioritizes **complete, verified data** over breadth. Poetry and Drama have 95%+ coverage in Wikidata; other categories have significant gaps.
+
+---
+
+## Data Sources
+
+| Source | What It Provides | Link |
+|--------|-----------------|------|
+| [Wikidata](https://www.wikidata.org/) | Structured data via SPARQL | [Query Service](https://query.wikidata.org/) |
+| [Pulitzer.org](https://www.pulitzer.org/) | Official archive (verification) | [Prize Winners](https://www.pulitzer.org/prize-winners-categories) |
+
+### Why Wikidata?
+- **Community-validated**: Multiple editors verify each entry
+- **Linked data**: Connected to primary sources
+- **Machine-readable**: Direct SPARQL query access
+- **Open license**: CC0 public domain
+- **Cross-referenced**: Validated against Pulitzer.org official records
+
+---
+
+## Confidence Assessment
+
+| Component | Confidence | Explanation |
+|-----------|------------|-------------|
+| **Poetry Winners** | 99% | 95%+ coverage, cross-validated |
+| **Drama Winners** | 99% | 95%+ coverage, cross-validated |
+| **General/Special** | 95% | Complete for documented awards |
+| **Work Titles** | 90% | Some entries lack titles in source data |
+
+This is reference data, not estimates. Winners are verified facts from official records.
+
+---
+
+## Known Limitations
+
+1. **Arts & Letters only**: Journalism categories not included (by design)
+2. **Work titles**: Not all entries include work titles
+3. **Co-winners**: Some years have multiple recipients
+4. **No-award years**: Some years have gaps (no winner selected)
+5. **Finalists**: Only winners included (finalists available from 1980+)
+
+---
+
+## Use Cases
+
+This dataset supports:
+- **Literary research**: Author achievement tracking
+- **Educational reference**: Quick winner lookup
+- **Trend analysis**: 107 years of literary prize patterns
+- **Curriculum design**: Identifying canonical works
+- **Cultural studies**: American literary canon formation
+- **Fact-checking**: Verify literary achievement claims
+
+---
+
+## Supporting Documentation
+
+| Document | Description |
+|----------|-------------|
+| [README.md](./README.md) | Full dataset documentation |
+| [Pulitzer-Prize-Winners-Arts-Letters-1918-2024.csv](./Pulitzer-Prize-Winners-Arts-Letters-1918-2024.csv) | Combined dataset (249 winners) |
+| [category-poetry.csv](./category-poetry.csv) | Poetry winners (105) |
+| [category-drama.csv](./category-drama.csv) | Drama winners (109) |
+| [category-general.csv](./category-general.csv) | Special awards (35) |
+
+---
+
+## SPARQL Query for Updates
+
+```sparql
+SELECT ?winner ?winnerLabel ?awardDate ?category ?categoryLabel ?work ?workLabel
+WHERE {
+  ?winner p:P166 ?awardStatement .
+  ?awardStatement ps:P166 ?category .
+  ?category (wdt:P279|wdt:P31)* wd:Q46525 .
+  OPTIONAL { ?awardStatement pq:P585 ?awardDate . }
+  OPTIONAL { ?awardStatement pq:P1686 ?work . }
+  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
+}
+ORDER BY DESC(?awardDate)
+```
+
+Run at: [query.wikidata.org](https://query.wikidata.org/)
+
+---
+
+## Research Metadata
+
+| Attribute | Value |
+|-----------|-------|
+| **Dataset Coverage** | 1918-2024 (107 years) |
+| **Total Records** | 249 unique winners |
+| **Categories** | Poetry, Drama, General/Special |
+| **Data Source** | Wikidata (CC0 public domain) |
+| **Confidence Level** | 99% (verified reference data) |
+
+---
+
+## Changelog
+
+| Date | Change | Reason |
+|------|--------|--------|
+| **December 2025** | Added SUMMARY.md with executive overview | Standardizing Substrate datasets to "Answer First" schema |
+| **October 2025** | Initial dataset creation | Arts & Letters Pulitzer data collection |
+
+---
+
+## Future Expansion Opportunities
+
+1. **Add Fiction/History/Biography/Music** - Complete Arts & Letters coverage
+2. **Add Journalism categories** - Scrape Pulitzer.org directly (~1,400+ winners)
+3. **Add finalists** - Available 1980-present (3 per category)
+4. **Annual updates** - Refresh each April/May after announcements
+
+---
+
+## External Resources
+
+- [Pulitzer.org Prize Winners](https://www.pulitzer.org/prize-winners-categories) - Official archive
+- [Pulitzer Prize History](https://www.pulitzer.org/page/history-pulitzer-prizes) - Background and context
+- [Wikidata Pulitzer Query](https://query.wikidata.org/) - Run your own queries
+- [Columbia Journalism Review Pulitzer Data](https://www.cjr.org/) - Journalism-focused analysis
diff --git a/Data/README.md b/Data/README.md
index 6e85ee0..80d2ba8 100644
--- a/Data/README.md
+++ b/Data/README.md
@@ -4,11 +4,102 @@
 The Data directory contains curated, ground-truth datasets about important aspects of human life, society, and progress, along with documentation for external data sources. This is a collection of reliable, parseable data that can be used for analysis, research, and informed decision-making.
+---
+
+## 🎯 "Answer First" Schema
+
+**All Substrate datasets follow the "Answer First" schema.** Every dataset has a `SUMMARY.md` file that puts the best estimate at the top.
+
+### Quick Reference
+
+| Dataset | Best Estimate | One-liner |
+|---------|--------------|-----------|
+| [Knowledge Worker Compensation](./Knowledge-Worker-Global-Salaries/SUMMARY.md) | $35-50T global, $6-12T US | Global knowledge workers earn $35-50T annually |
+| [US GDP](./US-GDP/SUMMARY.md) | $23.77T (Q2 2025) | U.S. real GDP is $23.77T, growing 3.8% quarterly |
+| [US Inflation](./US-Inflation/SUMMARY.md) | 2.5% YoY | U.S. inflation is ~2.5% with CPI at 323.4 |
+| [Presidential Approval](./US-Presidential-Approval/SUMMARY.md) | ~41% (Trump Nov 2025) | Trump approval averages ~41% (net -13) |
+| [COVID Wastewater](./Bay-Area-COVID-Wastewater/SUMMARY.md) | HIGH (5.6 log10) | California COVID wastewater is HIGH |
+| [US Common Metrics](./US-Common-Metrics/SUMMARY.md) | 60+ indicators | Real-time dashboard of U.S. economic indicators |
+| [Pulitzer Winners](./Pulitzer-Prize-Winners/SUMMARY.md) | 249 winners | Complete Arts & Letters database (1918-2024) |
+
+### Schema Structure
+
+Every `SUMMARY.md` follows this structure:
+
+```markdown
+# [Dataset Title]: Executive Summary
+
+## 🎯 BEST ESTIMATE
+
+| Metric | Value | Confidence | Last Updated |
+|--------|-------|------------|--------------|
+| **[Primary Metric]** | **[VALUE]** | [X%] | [DATE] |
+
+**One-liner:** [12 words max - the quotable answer]
+
+**Caveat:** [Single most important limitation]
+
+---
+
+## The Big Picture
+[2-3 sentences: What this is, why it matters, major uncertainty]
+
+## Why This Number Matters
+[Context for why this metric is important]
+
+## How the Number Is Calculated
+[Methodology summary]
+
+## Confidence Assessment
+[What we know well vs. what's uncertain]
+
+## Alternative Estimates & Why We Differ
+[When applicable: other approaches and why we chose ours]
+
+## Data Sources
+[Links to authoritative sources]
+
+## Supporting Documentation
+[Links to detailed data files]
+
+## Changelog
+[When estimates changed and why]
+```
+
+### Confidence Level Guidelines
+
+| Level | Percentage | When to Use |
+|-------|------------|-------------|
+| **Very High** | 95%+ | Official government data, single authoritative source |
+| **High** | 85-94% | Multiple corroborating sources, minor definitional variation |
+| **Medium** | 65-84% | Extrapolated from good sources, definitional uncertainty |
+| **Low** | <65% | Limited data, significant methodological issues |
+
+### Creating New Datasets
+
+Use the [DATASET-TEMPLATE.md](./DATASET-TEMPLATE.md) when creating new datasets.
+
+**Mandatory Sections:**
+1. **🎯 BEST ESTIMATE** - Must be first content section after title
+2. **One-liner** - 12 words max, quotable
+3. **Caveat** - Single most important limitation
+4. **Methodology Summary** - How the estimate was derived
+5. **Sources** - Authoritative links
+6. **Changelog** - Track revisions with reasons
+
+**Recommended Section:**
+- **Alternative Estimates & Why We Differ** - When other estimates exist
+
+---
+
 ## Directory Structure

 ```
 Data/
-├── sources/                          # External data source catalog (APIs, endpoints, metadata)
+├── DATASET-TEMPLATE.md               # Schema template for new datasets
+├── README.md                         # This file
+├── UPDATES.md                        # Global changelog
+├── sources/                          # External data source catalog
 │   ├── DS-00001—WHO_Global_Health_Observatory/
 │   ├── DS-00002—UN_SDG_Indicators/
 │   ├── DS-00003—World_Bank_Open_Data/
@@ -18,178 +109,122 @@ Data/
 │   ├── DS-00007—BLS_JOLTS_Labor_Market/
 │   ├── DS-00008—EPA_Air_Quality_System/
 │   └── WELLBEING_DATA_SOURCES.md
-├── Bay-Area-COVID-Wastewater/        # Curated datasets
-├── Knowledge-Worker-Global-Salaries/
-├── Pulitzer-Prize-Winners/
-├── US-GDP/
-├── US-Inflation/
-├── README.md
-└── UPDATES.md
+├── Bay-Area-COVID-Wastewater/        # COVID wastewater surveillance
+│   └── SUMMARY.md                    # ← Start here
+├── Knowledge-Worker-Global-Salaries/ # Knowledge economy compensation
+│   └── SUMMARY.md                    # ← Start here
+├── Pulitzer-Prize-Winners/           # Arts & Letters Pulitzer data
+│   └── SUMMARY.md                    # ← Start here
+├── US-Common-Metrics/                # 60+ US economic indicators
+│   └── SUMMARY.md                    # ← Start here
+├── US-GDP/                           # US GDP data
+│   └── SUMMARY.md                    # ← Start here
+├── US-Inflation/                     # CPI/inflation data
+│   └── SUMMARY.md                    # ← Start here
+└── US-Presidential-Approval/         # Approval ratings 1937-2025
+    └── SUMMARY.md                    # ← Start here
 ```

-**sources/** - Contains documentation and metadata for external data sources (APIs, endpoints, update frequencies, setup instructions). See `sources/WELLBEING_DATA_SOURCES.md` for details.
+**Start with SUMMARY.md** in any dataset directory—it gives you the answer first.

-**Dataset directories** - Contain curated, processed data collections ready for analysis.
-
-## Philosophy
-
-**Ground Truth First**: All datasets should come from authoritative, verifiable sources. We prioritize data quality and transparency over volume.
-
-**Human-Readable + Machine-Parseable**: Data is stored in CSV and Markdown formatsno opaque databases. Anyone (human or AI) should be able to read, understand, and analyze these datasets with minimal friction.
-
-**Shared Knowledge ’ Progress**: Like the broader Substrate project, this is about creating a foundation of shared, trusted information from which we can work toward solutions and understanding.
+---

 ## Dataset Categories

-Data sources cover a wide range of human-relevant topics:
+### Economic Indicators
+- **[US GDP](./US-GDP/SUMMARY.md)** - Gross Domestic Product (1929-2025)
+- **[US Inflation](./US-Inflation/SUMMARY.md)** - CPI data (1947-2025)
+- **[US Common Metrics](./US-Common-Metrics/SUMMARY.md)** - 60+ economic indicators dashboard
+- **[Knowledge Worker Compensation](./Knowledge-Worker-Global-Salaries/SUMMARY.md)** - Global and US compensation estimates
+
+### Political & Social
+- **[Presidential Approval](./US-Presidential-Approval/SUMMARY.md)** - Approval ratings (1937-2025)
+- **[Pulitzer Winners](./Pulitzer-Prize-Winners/SUMMARY.md)** - Arts & Letters awards (1918-2024)

 ### Health & Public Safety
-- COVID-19 metrics (cases, hospitalizations, wastewater surveillance)
-- Disease surveillance data
-- Public health indicators
+- **[COVID Wastewater](./Bay-Area-COVID-Wastewater/SUMMARY.md)** - California wastewater surveillance

-### Economic Indicators
-- Jobs and employment statistics
-- Economic growth metrics
-- Inflation and cost of living data
+---

-### Scientific & Academic
-- Nobel Prize winners and recipients
-- Major research publications
-- Scientific discoveries and breakthroughs
+## Philosophy

-### Social & Cultural
-- Demographic trends
-- Education statistics
-- Cultural achievements and milestones
+**Answer First**: Every dataset puts the best estimate at the top. Don't make people hunt for the number.

-### Environmental
-- Climate data
-- Environmental quality metrics
-- Sustainability indicators
+**Ground Truth**: All datasets come from authoritative, verifiable sources. We prioritize data quality and transparency over volume.

-### Other
+**Human-Readable + Machine-Parseable**: Data is stored in CSV and Markdown formats—no opaque databases. Anyone (human or AI) can read, understand, and analyze these datasets with minimal friction.

-- Anything else we need/want
-
-## File Naming Convention
+**Confidence-Aware**: Every estimate includes confidence levels. We distinguish between what we know well (99%+) and what's uncertain (65%).

-**Format**: `[CATEGORY]-[DESCRIPTION]-[DATE-RANGE].csv` or `.md`
+**Traceable**: Every number links to its authoritative source. Changes are logged with reasons.

-**Examples**:
-- `COVID-Wastewater-SF-Bay-Area-2020-2025.csv`
-- `Nobel-Prize-Winners-Physics-1901-2024.csv`
-- `US-Jobs-Report-Monthly-2020-2025.csv`
+---

-## Dataset Structure
+## Data Quality Standards

-### CSV Format
-Each CSV should include:
-- **Header row**: Clear column names
-- **Date column**: When applicable, use ISO 8601 format (YYYY-MM-DD)
-- **Source column**: URL or citation for verification
-- **Units**: Clearly specified in column names (e.g., `cases_per_100k`)
+### Mandatory Requirements
+- **Confidence level** - Every estimate needs uncertainty bounds
+- **Last updated** - When data was most recently validated
+- **Source links** - Authoritative URLs for verification
+- **Changelog** - Track revisions with reasons

-### Metadata File
-Each dataset should have an accompanying `.md` file with:
-- **Data Source**: URL and organization
-- **Update Frequency**: How often the source updates
-- **Last Updated**: When this dataset was last refreshed
-- **Coverage**: Geographic/temporal scope
-- **Notes**: Any important caveats or methodology notes
-- **License**: Data usage rights
+### Quality Indicators
+- **Accuracy**: Data from verified, authoritative sources
+- **Completeness**: Gaps and missing data documented
+- **Timeliness**: Update frequency and freshness noted
+- **Transparency**: Methodology documented and reproducible

-## Example Metadata
-
-```markdown
-# COVID Wastewater Surveillance - SF Bay Area
-
-**Source**: WastewaterSCAN / CDC NWSS
-**URL**: https://www.cdc.gov/nwss/
-**Update Frequency**: Weekly
-**Last Updated**: 2025-10-07
-**Coverage**: San Francisco Bay Area, 2020-2025
-**Units**: Viral copies per mL
-**License**: Public domain (U.S. government data)
-
-**Notes**:
-- Wastewater data is a leading indicator, typically showing trends 4-7 days before clinical testing
-- Data represents population-level surveillance
-```
+---

 ## Contributing Datasets

 When adding new datasets:

-1. **Verify the source** - Use authoritative, primary sources when possible
-2. **Document thoroughly** - Include metadata file
-3. **Keep it updated** - Note the refresh date
-4. **Make it parseable** - Clean CSV format, consistent date formats
-5. **Cross-reference** - Link to related Substrate components (Problems, Solutions, etc.)
+1. **Use the template** - Start with [DATASET-TEMPLATE.md](./DATASET-TEMPLATE.md)
+2. **Answer first** - Create SUMMARY.md with 🎯 BEST ESTIMATE at top
+3. **Verify sources** - Use authoritative, primary sources
+4. **Set confidence** - Use the confidence level guidelines
+5. **Document changes** - Include changelog from day one
+6. **Link thoroughly** - Every number should trace to a source

-## Usage
+### Anti-Patterns to Avoid

-These datasets are designed to be:
-- **Queried by AI** for analysis and insights
-- **Referenced in arguments** to support claims with data
-- **Used in solutions** to inform evidence-based approaches
-- **Shared openly** to promote transparency and collaboration
+1. **Burying the answer** - Never make someone scroll to find the number
+2. **No confidence level** - Every estimate needs uncertainty bounds
+3. **Stale dates** - Always show when last validated
+4. **Methodology before answer** - People want the answer first
+5. **No changelog** - Revisions without history erode trust

-## Data Quality Standards
-
-- **Accuracy**: Data must be from verified, authoritative sources
-- **Completeness**: Note any gaps or missing data points
-- **Timeliness**: Include last updated date
-- **Transparency**: Always cite the original source
-- **Reproducibility**: Provide enough information for others to verify or update
+---

 ## Integration with Substrate

 Data sources support other Substrate components:
-- **Claims** can be backed by datasets (e.g., "CL-58970Anthropogenic Climate Change" supported by climate data)
-- **Arguments** can reference specific data points
-- **Solutions** can be evaluated using metrics from datasets
-- **Plans** can track progress using ground-truth indicators
+
+- **Claims** can be backed by datasets with linked evidence
+- **Arguments** can reference specific metrics and sources
+- **Solutions** can be evaluated using ground-truth indicators
+- **Plans** can track progress with authoritative data
+
+---
+
+## Relationship with Research Projects
+
+The Data directory works with `research/` to maintain traceability between research and resulting datasets.
+
+**Research → Data Workflow:**
+
+1. **Input**: Research projects use `Data/sources/` for external APIs
+2. **Analysis**: Research performs synthesis and investigation
+3. **Output**: Curated datasets stored in `Data/` with SUMMARY.md
+4. **Documentation**: Methodology and sources fully documented
+
+**Key Principles:**
+- Each dataset includes `source.md` documenting origin
+- Research projects document which sources they used
+- Bidirectional links maintain complete traceability
+- Changes tracked in both research notes and dataset changelogs

 ---

 **Mission**: Build a trusted foundation of ground-truth data to support human understanding and progress.
-
-## Relationship with Research Projects
-
-The Data directory works in conjunction with `research/` directory to maintain clear traceability between research and resulting datasets.
-
-**Research → Data Workflow:**
-
-1. **Input**: Research projects use `Data/sources/` to access external data APIs and endpoints
-2. **Analysis**: Research projects perform analysis, synthesis, and investigation
-3. **Output**: Research projects produce curated datasets stored in `Data/` top-level
-4. **Documentation**: Research projects document their methodology, sources used, and resulting datasets
-
-**Example Structure:**
-
-```
-research/knowledge-worker-compensation-study/
-├── README.md                          # Research overview and methodology
-├── SOURCES.md                         # Links to Data/sources/ used as inputs
-├── findings/                          # Analysis and insights
-└── [references Data/Knowledge-Worker-Global-Salaries/]
-
-Data/Knowledge-Worker-Global-Salaries/
-├── knowledge-worker-compensation-data.md  # Curated dataset (output)
-└── source.md                              # Metadata linking back to research project
-```
-
-**Key Principles:**
-
-- Each dataset in `Data/` should include `source.md` documenting origin (research project or direct source)
-- Research projects should document which `Data/sources/` they used as inputs in their SOURCES.md
-- Research findings and methodology live in `research/`, curated datasets live in `Data/`
-- Bidirectional links maintain complete traceability from source → research → dataset
-
-**Benefits:**
-
-- Clear provenance: Always know where data came from and how it was produced
-- Reproducibility: Research methodology is documented and linked to outputs
-- Reusability: Other research can reference existing datasets and their origins
-- Quality: Traceability enables verification and validation of data quality
diff --git a/Data/US-Common-Metrics/SUMMARY.md b/Data/US-Common-Metrics/SUMMARY.md
new file mode 100644
index 0000000..76a6360
--- /dev/null
+++ b/Data/US-Common-Metrics/SUMMARY.md
@@ -0,0 +1,163 @@
+# US Common Metrics: Executive Summary
+
+---
+
+## 🎯 WHAT THIS IS
+
+| Attribute | Value |
+|-----------|-------|
+| **Dataset Type** | Dashboard / Reference Catalog |
+| **Coverage** | 60+ U.S. economic and social indicators |
+| **Update Frequency** | Daily → Annual (varies by metric) |
+| **Last Updated** | December 2025 |
+
+**One-liner:** Real-time reference dashboard for 60+ authoritative U.S. economic indicators.
+
+**Caveat:** This is a catalog, not an estimate—each metric has its own update schedule and methodology.
+
+---
+
+## Why This Dashboard Matters
+
+The U.S. economy is measured by dozens of agencies using hundreds of methodologies. Navigating [FRED](https://fred.stlouisfed.org/), [BLS](https://www.bls.gov/), [EIA](https://www.eia.gov/), [Treasury](https://fiscaldata.treasury.gov/), and [Census](https://data.census.gov/) separately is time-consuming and error-prone.
+
+This dashboard provides:
+- **Single source of truth** for the most-referenced U.S. metrics
+- **Full provenance** - every number linked to its authoritative source
+- **Current values** with update dates so you know data freshness
+- **FRED IDs** for programmatic access to historical data
+
+---
+
+## Key Indicators at a Glance
+
+### Economic Health
+| Metric | Current Value | Source |
+|--------|---------------|--------|
+| [Real GDP](https://fred.stlouisfed.org/series/GDPC1) | ~$23.8T (Q3 2024) | [BEA](https://www.bea.gov/) |
+| [GDP Growth (QoQ)](https://fred.stlouisfed.org/series/A191RL1Q225SBEA) | 3.8% | [BEA](https://www.bea.gov/) |
+| [Unemployment (U-3)](https://fred.stlouisfed.org/series/UNRATE) | 4.4% | [BLS](https://www.bls.gov/) |
+| [CPI Inflation](https://fred.stlouisfed.org/series/CPIAUCSL) | ~324 (index) | [BLS](https://www.bls.gov/) |
+
+### Consumer & Housing
+| Metric | Current Value | Source |
+|--------|---------------|--------|
+| [Consumer Sentiment](https://fred.stlouisfed.org/series/UMCSENT) | 53.6 | [U. Michigan](https://data.sca.isr.umich.edu/) |
+| [30-Year Mortgage Rate](https://fred.stlouisfed.org/series/MORTGAGE30US) | 6.23% | [Freddie Mac](http://www.freddiemac.com/pmms/) |
+| [Median Home Price](https://fred.stlouisfed.org/series/MSPUS) | ~$411K | [Census](https://www.census.gov/) |
+
+### Financial & Fiscal
+| Metric | Current Value | Source |
+|--------|---------------|--------|
+| [Fed Funds Rate](https://fred.stlouisfed.org/series/FEDFUNDS) | 3.88% | [Federal Reserve](https://www.federalreserve.gov/) |
+| [10-Year Treasury](https://fred.stlouisfed.org/series/DGS10) | 4.02% | [Treasury](https://home.treasury.gov/) |
+| [Debt-to-GDP Ratio](https://fred.stlouisfed.org/series/GFDEGDQ188S) | 118.8% | [FRED](https://fred.stlouisfed.org/) |
+| [S&P 500](https://fred.stlouisfed.org/series/SP500) | ~6,813 | [S&P](https://www.spglobal.com/) |
+
+---
+
+## Update Schedule
+
+| Frequency | What Gets Updated | Typical Lag |
+|-----------|------------------|-------------|
+| **Daily** | Treasury yields, Fed funds, oil prices, stock indices | Same day |
+| **Weekly** | Jobless claims, gas prices, mortgage rates | 4-7 days |
+| **Monthly** | CPI, PCE, employment, retail sales, housing | 2-4 weeks |
+| **Quarterly** | GDP, home prices, debt service ratio | 1-3 months |
+| **Annual** | Population, GINI, poverty, mortality | 6-18 months |
+
+---
+
+## Data Sources
+
+All metrics come from authoritative government and institutional sources:
+
+| Source | Website | What It Covers |
+|--------|---------|---------------|
+| [FRED](https://fred.stlouisfed.org/) | Federal Reserve Economic Data | Most economic indicators (aggregator) |
+| [BLS](https://www.bls.gov/) | Bureau of Labor Statistics | Employment, wages, CPI |
+| [BEA](https://www.bea.gov/) | Bureau of Economic Analysis | GDP, PCE, personal income |
+| [Census](https://data.census.gov/) | Census Bureau | Demographics, housing starts |
+| [EIA](https://www.eia.gov/) | Energy Information Administration | Gas prices,
oil, energy | +| [Treasury](https://fiscaldata.treasury.gov/) | Treasury Department | Federal debt, budget | +| [CDC WONDER](https://wonder.cdc.gov/) | CDC | Mortality statistics | +| [EPA AQS](https://www.epa.gov/aqs) | Environmental Protection Agency | Air quality | + +--- + +## How to Use This Dashboard + +### For Quick Reference +Open `US-Common-Metrics.md` for current values organized by category. + +### For Programmatic Access +```bash +# Get current values as CSV +cat us-metrics-current.csv + +# Update all metrics (requires API keys) +bun run update.ts +``` + +### For Historical Data +Use the [FRED ID](https://fred.stlouisfed.org/) listed for each metric to access full time series. + +### For Source Verification +Every metric links to its authoritative source. Click through to verify methodology. + +--- + +## Methodology + +### Design Philosophy +- **Authoritative sources only** - Government agencies and established institutions +- **Provenance required** - Every number must trace to a specific source +- **Transparency** - Methodology documented for each data source +- **Automation** - Scripts update values; humans don't hand-edit data + +### Data Quality Notes +1. **Revisions**: Many economic indicators are revised multiple times. Values shown are the most recent. +2. **Seasonal Adjustment**: Most monthly/quarterly metrics are seasonally adjusted (SA/SAAR). +3. **Index vs. Level**: Some metrics are indices (CPI, PPI), others are levels (GDP). Check units. + +--- + +## Known Limitations + +1. **Table Formatting**: Some automated updates may corrupt markdown tables (being fixed) +2. **Missing Values**: Some metrics show `--` when data isn't available or API failed +3. **Lag**: Annual metrics (mortality, demographics) have 6-18 month publication delays +4. 
**No Forecasts**: This is ground-truth data only, no projections + +--- + +## Supporting Documentation + +| Document | Description | +|----------|-------------| +| [US-Common-Metrics.md](./US-Common-Metrics.md) | Full dataset with all 60+ metrics | +| [source.md](./source.md) | Detailed methodology per data source | +| [us-metrics-current.csv](./us-metrics-current.csv) | Machine-readable current values | +| [us-metrics-historical.csv](./us-metrics-historical.csv) | Historical time series | + +--- + +## Changelog + +| Date | Change | Reason | +|------|--------|--------| +| **December 2025** | Added SUMMARY.md with executive overview | Standardizing Substrate datasets to "Answer First" schema | +| **December 2025** | Fixed table formatting corruption | Automated updates introduced markdown errors | +| **December 2025** | Initial 60+ metric catalog | Comprehensive U.S. indicators dashboard | + +--- + +## Research Metadata + +| Attribute | Value | +|-----------|-------| +| **Dataset Type** | Dashboard / Reference Catalog | +| **Maintainer** | Daniel Miessler / Kai | +| **Automation** | `bun run update.ts` | +| **API Keys Required** | FRED, EIA, Census (all free) | +| **Last Validation** | December 2025 | diff --git a/Data/US-GDP/SUMMARY.md b/Data/US-GDP/SUMMARY.md new file mode 100644 index 0000000..5532e07 --- /dev/null +++ b/Data/US-GDP/SUMMARY.md @@ -0,0 +1,192 @@ +# U.S. GDP: Executive Summary + +--- + +## 🎯 BEST ESTIMATE + +| Metric | Value | Confidence | Last Updated | +|--------|-------|------------|--------------| +| **U.S. Real GDP (Q2 2025)** | **$23.77 trillion** | 99% | October 2025 | +| **GDP Growth Rate (QoQ, annualized)** | **3.8%** | 99% | October 2025 | +| **Annual Real GDP (2024)** | **$23.36 trillion** | 99% | October 2025 | + +**One-liner:** U.S. real GDP is $23.77 trillion (Q2 2025), growing 3.8% annualized. + +**Caveat:** GDP figures are revised three times after initial release; final revisions may adjust by ±0.5%. 
+ +--- + +## The Big Picture + +[Gross Domestic Product (GDP)](https://www.bea.gov/data/gdp/gross-domestic-product) is the most comprehensive measure of economic output—the total value of all goods and services produced within the United States. The [Bureau of Economic Analysis (BEA)](https://www.bea.gov/), part of the U.S. Department of Commerce, is the authoritative source for this data. + +Real GDP (inflation-adjusted, [chained 2017 dollars](https://www.bea.gov/help/faq/520)) enables valid comparisons across time by removing the effects of price changes. This dataset covers: +- **Quarterly data**: Q1 1947 – Q2 2025 (314 observations) +- **Annual data**: 1929 – 2024 (96 observations) + +--- + +## Why This Number Matters + +GDP is the benchmark metric for: +- **Economic health**: Is the economy growing or shrinking? +- **Policy decisions**: Federal Reserve interest rates, fiscal policy +- **Business strategy**: Market sizing, demand forecasting, investment planning +- **International comparison**: How the U.S. economy compares globally + +A [1% change in GDP growth](https://fred.stlouisfed.org/series/A191RL1Q225SBEA) represents approximately $240 billion in annual economic output. 
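
As a rough check on these figures, the sketch below reproduces two pieces of arithmetic from this summary: the dollar value of one percentage point of real GDP, and the annualized growth rate implied by two consecutive quarterly levels. It assumes the rounded Q1 and Q2 2025 levels quoted in this summary ($23.55T and $23.77T); the function names are illustrative and are not part of the dataset's tooling.

```python
# Illustrative arithmetic only; not part of the dataset's scripts.
# Levels are the rounded real GDP figures quoted in this summary.
Q1_2025_REAL_GDP_USD = 23.55e12
Q2_2025_REAL_GDP_USD = 23.77e12


def output_per_growth_point(gdp_usd: float, points: float = 1.0) -> float:
    """Dollar value of `points` percentage points of a GDP level."""
    return gdp_usd * points / 100.0


def annualized_growth_pct(previous_quarter: float, current_quarter: float) -> float:
    """Compound one quarter's growth over four quarters, in percent."""
    quarterly_ratio = current_quarter / previous_quarter
    return (quarterly_ratio ** 4 - 1.0) * 100.0


if __name__ == "__main__":
    billions = output_per_growth_point(Q2_2025_REAL_GDP_USD) / 1e9
    growth = annualized_growth_pct(Q1_2025_REAL_GDP_USD, Q2_2025_REAL_GDP_USD)
    print(f"One point of real GDP: about ${billions:.0f} billion")
    print(f"Implied annualized growth: about {growth:.1f}%")
```

With the rounded inputs, one point comes to roughly $238 billion, which the summary rounds to $240 billion. The implied growth rate lands near 3.8%, which suggests the quoted quarter-over-quarter figure is an annualized rate rather than a single-quarter change.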
+ +--- + +## Current Data Highlights + +### Recent Performance +| Period | Real GDP | Growth Rate | Source | +|--------|----------|-------------|--------| +| Q2 2025 | [$23.77T](https://fred.stlouisfed.org/series/GDPC1) | +3.8% (QoQ, annualized) | [BEA](https://www.bea.gov/) | +| Q1 2025 | $23.55T | Baseline | [BEA](https://www.bea.gov/) | +| Full Year 2024 | [$23.36T](https://fred.stlouisfed.org/series/GDPCA) | +2.8% (YoY) | [BEA](https://www.bea.gov/) | + +### Historical Milestones +| Year | Real GDP | Context | +|------|----------|---------| +| 1929 | $1.19T | Pre-Depression peak | +| 1933 | $0.88T | Depression trough (-26%) | +| 1947 | $2.18T | Post-WWII era begins (quarterly data starts) | +| 2000 | $13.13T | Dot-com peak | +| 2009 | $14.42T | Great Recession trough | +| 2020 Q2 | $17.26T | COVID trough (-31.4% annualized) | +| 2025 Q2 | $23.77T | Current | + +--- + +## How the Number Is Calculated + +The BEA uses the [expenditure approach](https://www.bea.gov/resources/methodologies/nipa-handbook): + +**GDP = C + I + G + (X − M)** + +| Component | Description | Share of GDP | +|-----------|-------------|--------------| +| **C** | Personal consumption expenditures | ~68% | +| **I** | Gross private domestic investment | ~18% | +| **G** | Government consumption & investment | ~17% | +| **(X-M)** | Net exports (exports minus imports) | ~-3% | + +### Real vs. 
Nominal +- **Nominal GDP**: Measured in current prices (~$29T in 2024) +- **Real GDP** (this dataset): Adjusted for inflation using [chained 2017 dollars](https://www.bea.gov/help/faq/520) +- Real GDP enables valid comparisons across time periods + +--- + +## Revision Process + +GDP is revised multiple times as more complete data becomes available: + +| Release | Timing | Typical Revision | +|---------|--------|------------------| +| **Advance Estimate** | ~30 days after quarter end | Initial estimate | +| **Second Estimate** | ~60 days after quarter end | ±0.3-0.5 pp | +| **Third Estimate** | ~90 days after quarter end | ±0.1-0.2 pp | +| **Annual Revision** | September (5+ years) | May revise history | + +**Bottom line**: Current-quarter GDP is a provisional estimate. Use third estimates or annual revisions for precision. + +--- + +## Data Sources + +| Source | What It Provides | Link | +|--------|-----------------|------| +| [Bureau of Economic Analysis (BEA)](https://www.bea.gov/) | Official U.S. GDP (primary authority) | [GDP Data](https://www.bea.gov/data/gdp) | +| [FRED](https://fred.stlouisfed.org/) | Easy API access to BEA data | [GDPC1](https://fred.stlouisfed.org/series/GDPC1), [GDPCA](https://fred.stlouisfed.org/series/GDPCA) | + +**FRED Series IDs:** +- `GDPC1` - Real GDP, Quarterly, Seasonally Adjusted Annual Rate +- `GDPCA` - Real GDP, Annual, Not Seasonally Adjusted + +--- + +## Confidence Assessment + +| Component | Confidence | Explanation | +|-----------|------------|-------------| +| **Current Quarterly GDP** | 95% | Advance estimate; will be revised | +| **Third-Estimate GDP** | 99% | Final quarterly revision; highly reliable | +| **Historical GDP (5+ years)** | 99%+ | Fully revised; official government statistic | + +This is among the highest-confidence economic data available—produced by the U.S. government using rigorous methodology with full transparency. + +--- + +## Known Limitations + +1. 
**Revision lag**: Current-quarter figures are provisional estimates +2. **Base year**: Uses 2017 as reference (updated periodically by BEA) +3. **Pre-1947**: Quarterly data not available before 1947 +4. **Seasonal adjustment**: May mask genuine short-term fluctuations +5. **Real economy**: GDP measures production, not welfare or sustainability + +--- + +## How to Access the Data + +### Quick Access +```bash +# View quarterly data (1947-2025) +cat Real-GDP-Quarterly-1947-2025.csv + +# View annual data (1929-2024) +cat Real-GDP-Annual-1929-2024.csv +``` + +### Update to Latest +```bash +# Download latest quarterly data from FRED +curl -L "https://fred.stlouisfed.org/graph/fredgraph.csv?id=GDPC1" -o Real-GDP-Quarterly.csv + +# Download latest annual data from FRED +curl -L "https://fred.stlouisfed.org/graph/fredgraph.csv?id=GDPCA" -o Real-GDP-Annual.csv +``` + +--- + +## Supporting Documentation + +| Document | Description | +|----------|-------------| +| [US-GDP-1929-2025.md](./US-GDP-1929-2025.md) | Full dataset documentation with historical context | +| [source.md](./source.md) | Detailed methodology and provenance | +| [Real-GDP-Quarterly-1947-2025.csv](./Real-GDP-Quarterly-1947-2025.csv) | Quarterly data (314 observations) | +| [Real-GDP-Annual-1929-2024.csv](./Real-GDP-Annual-1929-2024.csv) | Annual data (96 observations) | + +--- + +## Research Metadata + +| Attribute | Value | +|-----------|-------| +| **Research Date** | October 2025 | +| **Researcher** | Kai (10-agent parallel synthesis) | +| **Method** | Multi-source corroboration via Perplexity, Claude, Gemini | +| **Confidence Level** | 99% (official government statistic) | +| **Known Gaps** | Pre-1947 quarterly data unavailable | + +--- + +## Changelog + +| Date | Change | Reason | +|------|--------|--------| +| **December 2025** | Added SUMMARY.md with executive overview | Standardizing Substrate datasets to "Answer First" schema | +| **October 2025** | Initial dataset creation | Comprehensive U.S. 
GDP data collection | + +--- + +## External Resources + +- [BEA GDP FAQ](https://www.bea.gov/help/faq/520) - Methodology questions +- [BEA NIPA Handbook](https://www.bea.gov/resources/methodologies/nipa-handbook) - Full methodology +- [BEA Release Schedule](https://www.bea.gov/news/schedule) - Upcoming GDP releases +- [FRED GDP Series](https://fred.stlouisfed.org/categories/18) - All GDP-related data diff --git a/Data/US-Inflation/SUMMARY.md b/Data/US-Inflation/SUMMARY.md new file mode 100644 index 0000000..3cfd36b --- /dev/null +++ b/Data/US-Inflation/SUMMARY.md @@ -0,0 +1,205 @@ +# U.S. Inflation (CPI): Executive Summary + +--- + +## 🎯 BEST ESTIMATE + +| Metric | Value | Confidence | Last Updated | +|--------|-------|------------|--------------| +| **CPI-U Index (August 2025)** | **323.4** | 99% | October 2025 | +| **Year-over-Year Inflation** | **~2.5%** | 99% | October 2025 | +| **Fed Target** | **2.0%** | Reference | - | + +**One-liner:** U.S. inflation is ~2.5% (YoY), with CPI index at 323.4 (1982-84=100 baseline). + +**Caveat:** CPI measures urban consumers only (~93% of population); regional variation may differ significantly. + +--- + +## The Big Picture + +The [Consumer Price Index (CPI)](https://www.bls.gov/cpi/) is the primary measure of inflation in the United States—tracking changes in the price level of a basket of consumer goods and services. The [Bureau of Labor Statistics (BLS)](https://www.bls.gov/) produces this data monthly. 
+ +**What the current numbers mean:** +- A CPI of 323.4 means that goods costing $100 in 1982-84 now cost $323.40 +- At 2.5% annual inflation, prices double approximately every 28 years +- Current inflation is near the [Federal Reserve's 2% target](https://www.federalreserve.gov/faqs/economy_14400.htm) + +--- + +## Why This Number Matters + +Inflation affects virtually every economic decision: + +- **Wages**: [Cost-of-living adjustments (COLAs)](https://www.ssa.gov/oact/cola/colaseries.html) are tied to CPI +- **Savings**: Determines whether your money gains or loses purchasing power +- **Interest Rates**: The [Federal Reserve](https://www.federalreserve.gov/) adjusts rates based on inflation +- **Contracts**: Many business and government contracts escalate with CPI +- **Policy**: Trillions in Social Security, Medicare, and tax brackets adjust with CPI + +A [1% change in CPI](https://www.bls.gov/cpi/) affects billions of dollars in annual adjustments. + +--- + +## Current Data Highlights + +### Recent Readings +| Period | CPI Index | YoY Inflation | Source | +|--------|-----------|---------------|--------| +| August 2025 | [323.4](https://fred.stlouisfed.org/series/CPIAUCSL) | ~2.5% | [BLS](https://www.bls.gov/cpi/) | +| June 2022 | 296.3 | 9.1% (peak) | [BLS](https://www.bls.gov/cpi/) | +| 1982-84 Avg | 100.0 | Baseline | [BLS](https://www.bls.gov/cpi/) | +| January 1947 | 21.5 | First obs. 
| [BLS](https://www.bls.gov/cpi/) | + +### Long-Term Trend +| Period | Average Annual Inflation | +|--------|-------------------------| +| 1947-2025 (Full) | ~3.5% | +| 1990-2019 (Pre-COVID) | ~2.4% | +| 2021-2023 (COVID Surge) | ~6.0% | +| 2024-2025 (Current) | ~2.5% | + +--- + +## How the Number Is Calculated + +The BLS uses a [Laspeyres price index](https://www.bls.gov/opub/hom/cpi/calculation.htm): + +**CPI = (Cost of basket today / Cost of basket in base period) × 100** + +### The Market Basket +| Category | Weight | Examples | +|----------|--------|----------| +| **Housing** | ~34% | Rent, utilities, furnishings | +| **Food** | ~14% | Groceries, restaurants | +| **Transportation** | ~16% | Vehicles, gas, insurance | +| **Medical Care** | ~9% | Healthcare, drugs, insurance | +| **Recreation** | ~5% | Entertainment, sports, hobbies | +| **Education/Communication** | ~7% | Tuition, phones, internet | +| **Other** | ~15% | Apparel, personal care | + +**Data Collection:** +- ~80,000 prices collected monthly +- 75 urban areas across the U.S. +- Weights updated every 2 years from [Consumer Expenditure Survey](https://www.bls.gov/cex/) + +--- + +## Key Inflation Rates to Know + +| Measure | What It Is | FRED ID | +|---------|-----------|---------| +| **Headline CPI** | All items | [CPIAUCSL](https://fred.stlouisfed.org/series/CPIAUCSL) | +| **Core CPI** | Excludes food & energy | [CPILFESL](https://fred.stlouisfed.org/series/CPILFESL) | +| **PCE** | Fed's preferred measure | [PCEPI](https://fred.stlouisfed.org/series/PCEPI) | +| **Core PCE** | Fed's key target | [PCEPILFE](https://fred.stlouisfed.org/series/PCEPILFE) | + +**Why Core?** Food and energy prices are volatile. Core inflation shows underlying trends. + +**Why PCE?** The Federal Reserve targets [PCE inflation](https://fred.stlouisfed.org/series/PCEPI) rather than CPI because it accounts for substitution effects. 
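
The arithmetic behind several figures quoted in this summary (the roughly 28-year doubling time at 2.5% inflation, and $100 at the 1982-84 baseline costing $323.40 at a CPI of 323.4) can be sketched as follows. This is a minimal illustration, not the dataset's own code: the function names are hypothetical, and the index values passed to the year-over-year helper in the demo are placeholders rather than BLS observations.

```python
import math


def yoy_inflation_pct(cpi_now: float, cpi_year_ago: float) -> float:
    """Year-over-year inflation rate in percent from two index readings."""
    return (cpi_now - cpi_year_ago) / cpi_year_ago * 100.0


def convert_dollars(amount: float, cpi_original: float, cpi_target: float) -> float:
    """Restate `amount` from the original period's prices in target-period prices."""
    return amount * cpi_target / cpi_original


def doubling_time_years(annual_rate_pct: float) -> float:
    """Years for prices to double at a constant compound annual rate."""
    return math.log(2.0) / math.log(1.0 + annual_rate_pct / 100.0)


if __name__ == "__main__":
    # Placeholder index values chosen to give a round 2.5% result.
    print(f"YoY inflation: {yoy_inflation_pct(102.5, 100.0):.1f}%")
    # Baseline index of 100 (1982-84) against the 323.4 reading in this summary.
    print(f"$100 at baseline prices: ${convert_dollars(100.0, 100.0, 323.4):.2f}")
    print(f"Doubling time at 2.5%: {doubling_time_years(2.5):.1f} years")
```

The same helpers should work for any of the series listed above (for example Core CPI or PCE), provided both readings come from the same series and base period.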
+ +--- + +## Historical Inflation Episodes + +| Period | Peak Inflation | Cause | +|--------|---------------|-------| +| [1970s Stagflation](https://fred.stlouisfed.org/series/CPIAUCSL) | 14.8% (1980) | Oil shocks, monetary policy | +| [Volcker Shock](https://www.federalreserve.gov/aboutthefed/bios/board/volcker.htm) | Fed raised rates to 20%+ | Broke inflation cycle | +| [Great Moderation](https://www.federalreserve.gov/pubs/ifdp/2005/835/default.htm) | 2-3% (1990s-2000s) | Credible monetary policy | +| [Great Recession](https://fred.stlouisfed.org/series/CPIAUCSL) | Brief deflation (2009) | Financial crisis | +| [COVID Surge](https://fred.stlouisfed.org/series/CPIAUCSL) | 9.1% (June 2022) | Supply chain, stimulus | +| **Current** | ~2.5% (2025) | Fed tightening working | + +--- + +## Confidence Assessment + +| Component | Confidence | Explanation | +|-----------|------------|-------------| +| **Current CPI Index** | 99% | Official government statistic, gold standard | +| **YoY Inflation Rate** | 99% | Direct calculation from CPI data | +| **Historical Data** | 99%+ | Fully verified, minimal revisions | + +This is the most reliable inflation data available—produced by the U.S. government with rigorous methodology and complete transparency. + +--- + +## Known Limitations + +1. **Substitution bias**: Fixed basket doesn't fully capture when consumers switch to cheaper alternatives +2. **Quality adjustment**: Hard to account for product quality improvements over time +3. **New products**: Slow to incorporate new goods (smartphones took years) +4. **Geographic variation**: National average masks significant regional differences +5. **Population**: Covers urban consumers only (~93% of U.S.) 
+ +--- + +## How to Calculate Inflation + +### Year-over-Year Rate +``` +Inflation Rate = ((CPI_now - CPI_1year_ago) / CPI_1year_ago) × 100 +``` + +### Convert Dollars Across Time +``` +Real_value = Nominal_value × (CPI_target_year / CPI_original_year) +``` + +Example: $100 in 1984 equals ~$323 in 2025 purchasing power. + +--- + +## Data Sources + +| Source | What It Provides | Link | +|--------|-----------------|------| +| [Bureau of Labor Statistics](https://www.bls.gov/cpi/) | Official CPI (primary authority) | [CPI Home](https://www.bls.gov/cpi/) | +| [FRED](https://fred.stlouisfed.org/) | Easy API access to BLS data | [CPIAUCSL](https://fred.stlouisfed.org/series/CPIAUCSL) | + +**Quick Access:** +```bash +# Download latest CPI data from FRED +curl -L "https://fred.stlouisfed.org/graph/fredgraph.csv?id=CPIAUCSL" -o CPI-latest.csv +``` + +--- + +## Supporting Documentation + +| Document | Description | +|----------|-------------| +| [US-Inflation-CPI-1947-2025.md](./US-Inflation-CPI-1947-2025.md) | Full dataset documentation | +| [source.md](./source.md) | Detailed methodology | +| [CPI-US-Monthly-1947-2025.csv](./CPI-US-Monthly-1947-2025.csv) | Monthly data (945 observations) | + +--- + +## Research Metadata + +| Attribute | Value | +|-----------|-------| +| **Research Date** | October 2025 | +| **Researcher** | Kai | +| **Method** | Direct BLS/FRED data collection | +| **Confidence Level** | 99% (official government statistic) | +| **Known Gaps** | Pre-1947 data uses different methodology | + +--- + +## Changelog + +| Date | Change | Reason | +|------|--------|--------| +| **December 2025** | Added SUMMARY.md with executive overview | Standardizing Substrate datasets to "Answer First" schema | +| **October 2025** | Initial dataset creation | Comprehensive U.S. 
CPI data collection | + +--- + +## External Resources + +- [BLS CPI FAQ](https://www.bls.gov/cpi/questions-and-answers.htm) - Common questions +- [BLS Handbook of Methods](https://www.bls.gov/opub/hom/cpi/) - Full methodology +- [Fed Inflation Target](https://www.federalreserve.gov/faqs/economy_14400.htm) - Why 2%? +- [CPI Inflation Calculator](https://www.bls.gov/data/inflation_calculator.htm) - BLS tool diff --git a/Data/US-Presidential-Approval/SUMMARY.md b/Data/US-Presidential-Approval/SUMMARY.md new file mode 100644 index 0000000..9de085b --- /dev/null +++ b/Data/US-Presidential-Approval/SUMMARY.md @@ -0,0 +1,198 @@ +# U.S. Presidential Approval Ratings: Executive Summary + +--- + +## 🎯 BEST ESTIMATE + +| Metric | Value | Confidence | Last Updated | +|--------|-------|------------|--------------| +| **Trump Approval (Nov 2025)** | **36-44%** (avg ~41%) | 95% | November 2025 | +| **Trump Net Approval** | **-13 points** | 95% | November 2025 | +| **Historical Dataset** | **12,479 polls** (1937-2025) | 99% | November 2025 | + +**One-liner:** Trump's approval averages ~41% (net -13); dataset covers 12,479 polls since 1937. + +**Caveat:** Polling variation of 3-7 points across organizations; use aggregates, not single polls. + +--- + +## The Big Picture + +Presidential approval ratings are the primary measure of public confidence in the president. 
[Gallup](https://news.gallup.com/poll/116677/presidential-approval-ratings-gallup-historical-statistics-trends.aspx) has tracked this since 1937 using a consistent question: *"Do you approve or disapprove of the way [President] is handling his job as President?"* + +This dataset contains: +- **12,479 individual polls** spanning 87+ years +- **14 presidents** from FDR through Trump (second term) +- **Multiple pollsters** for cross-validation + +--- + +## Why This Number Matters + +Presidential approval is a leading indicator for: + +- **Legislative success**: High approval = political capital for agenda +- **Reelection chances**: Presidents above 50% almost always win reelection +- **Market confidence**: Investor and business sentiment +- **Governing ability**: Approval affects congressional cooperation +- **Historical legacy**: Approval shapes how presidents are remembered + +--- + +## Current President: Donald Trump (Second Term) + +### November 2025 Snapshot +| Metric | Value | Trend | +|--------|-------|-------| +| **Approval** | 36-44% (avg ~41%) | Declining | +| **Disapproval** | 49-62% (avg ~54%) | Rising | +| **Net Approval** | -13 points | Down from -9 in Oct | +| **Peak Approval** | 52% (Jan 2025) | -11 points from peak | + +### 2025 Trajectory +| Period | Approval Range | Context | +|--------|----------------|---------| +| Jan-Feb | 48-52% | Honeymoon period | +| Mar-May | 44-48% | Post-honeymoon decline | +| Jun-Aug | 44-46% | Summer plateau | +| Sep-Nov | 36-44% | Government shutdown impact | + +**Key Factors:** +- Government shutdown began October 1, 2025 +- Republican approval down 12 points (91% → 79%) since inauguration +- Economic approval underwater: Economy -17.6, Inflation -27.5 + +--- + +## Historical Reference Points + +### Highest Approval Ratings Ever +| President | Approval | Date | Context | +|-----------|----------|------|---------| +| [George W. 
Bush](https://news.gallup.com/poll/116677/presidential-approval-ratings-gallup-historical-statistics-trends.aspx) | 90% | Sept 2001 | Post-9/11 rally | +| [Harry Truman](https://news.gallup.com/poll/116677/presidential-approval-ratings-gallup-historical-statistics-trends.aspx) | 87% | June 1945 | WWII victory | +| [John F. Kennedy](https://news.gallup.com/poll/116677/presidential-approval-ratings-gallup-historical-statistics-trends.aspx) | 83% | April 1961 | Early presidency | + +### Lowest Approval Ratings Ever +| President | Approval | Date | Context | +|-----------|----------|------|---------| +| [Harry Truman](https://news.gallup.com/poll/116677/presidential-approval-ratings-gallup-historical-statistics-trends.aspx) | 22% | Feb 1952 | Korean War | +| [Richard Nixon](https://news.gallup.com/poll/116677/presidential-approval-ratings-gallup-historical-statistics-trends.aspx) | 24% | Aug 1974 | Watergate | +| [George W. Bush](https://news.gallup.com/poll/116677/presidential-approval-ratings-gallup-historical-statistics-trends.aspx) | 25% | Oct 2008 | Financial crisis | +| [Jimmy Carter](https://news.gallup.com/poll/116677/presidential-approval-ratings-gallup-historical-statistics-trends.aspx) | 28% | June 1979 | Economic crisis | + +### Typical Approval Ranges +| Range | Interpretation | +|-------|---------------| +| **60-80%** | Honeymoon or crisis rally | +| **50-60%** | Strong; likely reelection | +| **40-50%** | Mixed; competitive | +| **30-40%** | Weak; difficult governance | +| **Below 30%** | Historical crisis territory | + +--- + +## How to Interpret Polling Data + +### Net Approval +``` +Net Approval = Approval % - Disapproval % +``` +- **Positive** (+5 or higher): More approve than disapprove +- **Around zero**: Evenly divided +- **Negative** (-5 or lower): More disapprove than approve + +### Polling Variation +Different pollsters show 3-7 point variation due to: +- Sample type (adults vs. registered vs. likely voters) +- Methodology (phone vs. 
online) +- Question wording and order +- Timing within news cycle + +**Best practice**: Use averages from aggregators like [RealClearPolitics](https://www.realclearpolitics.com/epolls/other/president_trump_job_approval-6179.html) or [FiveThirtyEight](https://projects.fivethirtyeight.com/polls/approval/donald-trump/) (when available). + +--- + +## Data Sources + +| Source | What It Provides | Link | +|--------|-----------------|------| +| [Gallup](https://news.gallup.com/poll/116677/presidential-approval-ratings-gallup-historical-statistics-trends.aspx) | Gold standard since 1937 | [Historical Trends](https://news.gallup.com/poll/116677/presidential-approval-ratings-gallup-historical-statistics-trends.aspx) | +| [American Presidency Project](https://www.presidency.ucsb.edu/statistics/data/presidential-job-approval) | UC Santa Barbara archive | [Approval Data](https://www.presidency.ucsb.edu/statistics/data/presidential-job-approval) | +| [Roper Center](https://ropercenter.cornell.edu/) | Cornell poll archive | [Research Access](https://ropercenter.cornell.edu/) | +| [RealClearPolitics](https://www.realclearpolitics.com/epolls/other/president_trump_job_approval-6179.html) | Current poll aggregation | [Trump Approval](https://www.realclearpolitics.com/epolls/other/president_trump_job_approval-6179.html) | + +### Primary Dataset Source +This Substrate dataset aggregates from [Lorenzo Ruffino's research compilation](https://github.com/lorenzo-ruffino/approval_rate_usa_president) which includes: +- 15+ professional polling organizations +- Consistent data structure for cross-temporal analysis +- Open source with community validation + +--- + +## Confidence Assessment + +| Component | Confidence | Explanation | +|-----------|------------|-------------| +| **Historical Data (1937-2020)** | 99% | Fully validated, Gallup gold standard | +| **Recent Polls (2021-2025)** | 95% | Multiple organizations, subject to revision | +| **Current Month** | 90% | Polling variation; use 
aggregates | + +Presidential approval data is among the most reliable polling data available due to: +- 87+ years of consistent methodology +- Multiple cross-validating sources +- Scientific sampling standards +- Institutional validation + +--- + +## Known Limitations + +1. **Polling variation**: 3-7 point spread across organizations +2. **Sample composition**: Adults vs. registered vs. likely voters differ +3. **Methodology changes**: Online polling introduced post-2000 +4. **Response rates**: Declining over time, may affect representativeness +5. **Timing sensitivity**: Polls capture specific moments; events shift opinion + +--- + +## Supporting Documentation + +| Document | Description | +|----------|-------------| +| [README.md](./README.md) | Full dataset documentation | +| [Trump-Approval-Analysis-2025.md](./Trump-Approval-Analysis-2025.md) | Current president analysis | +| [Historical-Approval-Polls-1937-2024.csv](./Historical-Approval-Polls-1937-2024.csv) | 12,479 individual polls | +| [Historical-Net-Approval-First-Terms.csv](./Historical-Net-Approval-First-Terms.csv) | First-term comparison data | +| [Trump-Approval-2025.csv](./Trump-Approval-2025.csv) | Current year polling data | + +--- + +## Research Metadata + +| Attribute | Value | +|-----------|-------| +| **Dataset Coverage** | 1937-2025 (87+ years) | +| **Total Polls** | 12,479 individual polls | +| **Presidents Covered** | 14 (FDR through Trump) | +| **Update Frequency** | Continuous (as polls publish) | +| **Confidence Level** | 95-99% (professional polling data) | + +--- + +## Changelog + +| Date | Change | Reason | +|------|--------|--------| +| **December 2025** | Added SUMMARY.md with executive overview | Standardizing Substrate datasets to "Answer First" schema | +| **November 2025** | Updated Trump 2025 data | Current polling integration | +| **October 2025** | Initial dataset creation | Comprehensive approval data collection | + +--- + +## External Resources + +- [Gallup Presidential 
Approval Center](https://news.gallup.com/poll/116677/presidential-approval-ratings-gallup-historical-statistics-trends.aspx) - Historical data and analysis +- [RealClearPolitics Approval Tracker](https://www.realclearpolitics.com/epolls/other/president_trump_job_approval-6179.html) - Current aggregates +- [American Presidency Project](https://www.presidency.ucsb.edu/) - UC Santa Barbara archive +- [Roper Center](https://ropercenter.cornell.edu/) - Cornell polling archive
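
The net approval formula and the "use averages, not single polls" practice from "How to Interpret Polling Data" can be combined in a short sketch. The poll values below are hypothetical placeholders chosen to fall inside the ranges quoted in this summary; they are not rows from the dataset, and the function names are illustrative only.

```python
from statistics import mean

# Hypothetical (approve %, disapprove %) pairs; placeholders, not dataset rows.
SAMPLE_POLLS = [(36, 62), (41, 54), (44, 49), (43, 51)]


def net_approval(approve_pct: float, disapprove_pct: float) -> float:
    """Net approval = approval % minus disapproval %."""
    return approve_pct - disapprove_pct


def aggregate(polls):
    """Average approval and disapproval across polls, then take the net."""
    avg_approve = mean([approve for approve, _ in polls])
    avg_disapprove = mean([disapprove for _, disapprove in polls])
    return avg_approve, avg_disapprove, net_approval(avg_approve, avg_disapprove)


if __name__ == "__main__":
    approve, disapprove, net = aggregate(SAMPLE_POLLS)
    print(f"Average approval {approve}%, disapproval {disapprove}%, net {net:+}")
```

A simple unweighted mean is used here for clarity. Published aggregators typically weight by recency, sample size, or pollster quality, so their averages can differ from this calculation.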