# Document Creation vs Access Rates: Quantifying the Utilization Gap

**Research Date:** November 10, 2025

**Context:** Analysis of document creation (149 billion words/day globally) versus actual consumption rates

**Objective:** Quantify the gap between document CREATION and document CONSUMPTION

---

## Executive Summary

Research reveals a massive utilization gap between document creation and consumption:

- **41-80%** of stored documents are **never accessed** after creation
- **60-73%** of enterprise data goes **completely unused** for analytics or business purposes
- **55%** of organizational data remains **"dark data"** (created but never illuminated)
- **33%** of all content is **ROT data** (Redundant, Obsolete, Trivial)

The document creation engine is massively overproducing relative to actual consumption, representing substantial waste in storage costs, employee time, and organizational efficiency.

---

## 1. Document Access Statistics

### Never Opened After Creation

**NetApp 2024 Data:**
- **41%** of stored data is never accessed (baseline estimate)
- **70-80%** never accessed (revised estimates in some enterprise contexts)
- Data "waste" represents a significant portion of enterprise storage

**Enterprise Data Utilization (Forrester):**
- **60-73%** of all data within enterprises goes unused for analytics
- **68%** of data available to enterprises goes unleveraged (Seagate survey of 1,500 global business leaders)
- **66%** of organizations report at least half their enterprise data remains "dark" (Google Cloud 2024 Data and AI Trends Report)

### ROT Data (Redundant, Obsolete, Trivial)

**Industry Benchmarks:**
- **33%** of all content in unmanaged servers is ROT data (conservative estimate)
- **Up to 70%** ROT in poorly managed environments
- **85%** of all content stored represents ROT data (Veritas Global Databerg Report - extreme case)
- ROT data represents wasted storage and maintenance costs

### Average View Counts Per Document

**Direct Statistics:**
- Limited published data on exact view counts per document
- Proxy metric: **35%** of customers struggle to find reliable information quickly in knowledge bases
- **57%** of customer support calls come from customers who visited the website first (indicating failed document/knowledge discovery)

### Single-Author Documents Never Shared/Viewed

**Academic Collaboration as Proxy:**
- Multi-authored papers have **higher citation rates** than single-authored papers
- Increasing trend toward collaboration: international collaboration in S&E articles grew from **19% (2012)** to **23% (2022)**
- Single-author articles show lower engagement and utility

**Enterprise Context:**
- **70%** of Google Workspace users collaborate on shared documents weekly
- **Over 60%** of Workspace users use @-mentions to tag collaborators
- Inverse suggests **30-40%** of documents may remain single-author/unshared (a user-level proxy, not a measured document share)

---

## 2. Google Workspace / Microsoft 365 Statistics

### Google Workspace (2024)

**Document Creation Volume:**
- **2 billion+** new Docs, Sheets, and Slides created monthly
- **20 million+** comments made per day on documents
- **3 billion** users globally (10+ million paying organizations)

**Collaboration Statistics:**
- **70%** of users collaborate on shared documents weekly
- **Over 60%** use @-mentions to tag collaborators
- **94.44%** use Google Drive monthly
- **44%** market share for office suite technology

**Collaboration Impact:**
- **31%** reduction in document turnaround time with real-time collaboration
- Inverse: **30%** of users may NOT collaborate weekly (single-author pattern)

### Microsoft 365 / SharePoint / OneDrive (2024)

**User Base:**
- **200+ million** monthly active users (SharePoint Online + OneDrive for Business)
- **500+ trillion** distinct files and documents managed monthly

**Collaboration Metrics:**
- **85%** of organizations report improved collaboration and communication
- **85%** boost in employee engagement with SharePoint-enabled intranets
- **60%** of SharePoint users leverage automation workflows

**Efficiency Improvements:**
- **30%** reduction in email-based file sharing
- **15%** reduction in time spent on document management tasks

**Document Sharing vs Private:**
- Specific private vs. shared file percentages **not publicly disclosed** by Google or Microsoft
- Files are private by default until manually shared (suggests a significant private file population)

---

## 3. Knowledge Base Systems (Confluence, Notion, Wiki Platforms)

### Dark Data Statistics

**Overall Dark Data:**
- **55%** of data stored by organizations is dark data
- **40-90%** dark data estimates depending on industry
- **90%** of business executives agree organizations must extract value from unstructured data to succeed

### Search Hit Rates & Findability

**Search Effectiveness Challenges:**
- **35%** of customers struggle to find reliable information quickly
- **57%** of support calls come from customers who visited the website first (search failure indicator)
- Knowledge workers spend **2.5 hours per day (30% of workday)** searching for information

**Knowledge Base Adoption:**
- **91%** of customers would use a knowledge base if available and tailored to their needs
- **70%** of customers expect companies to offer a self-service portal
- **51%** prefer technical support through a knowledge base
- **Only 31%** of companies have a comprehensive knowledge management strategy

**Support Agent Efficiency:**
- **20-25%** time saved when agents use knowledge bases
- Implies effective knowledge bases improve retrieval, but gaps remain significant

### Confluence/Notion Page View Statistics

**Confluence Insights:**
- Page view tracking available in Confluence Cloud (Standard, Premium, Enterprise subscriptions)
- Displays views and unique viewers per page
- **Orphaned pages:** Pages without incoming links (unlikely to be found through natural navigation)
- No published industry benchmarks on percentage of orphaned pages

**Search Effectiveness:**
- Third-party apps ("Page Views", "Page View Tracker") needed for enhanced tracking
- Suggests native analytics are insufficient for comprehensive utilization analysis

---

## 4. Document Lifecycle

### Creation → First View Timing

**Active Data Period:**
- **30-90 days:** Modern data typically remains actively used before becoming less useful or redundant
- After 90 days, the flood of new data makes existing data "less useful or even redundant"

**Document Processing Metrics:**
- With DMS: **30 seconds** average time to store or retrieve a document
- Without DMS: **2.5 hours per day** spent by employees on data entry (versus <30 minutes with DMS)

### Active vs Archived vs Abandoned

**Microsoft 365 Data Retention:**
- **90-day** limited-function account period after subscription ends before data deletion
- Suggests 90-day threshold as a common retention/archival decision point

**Document Abandonment Patterns:**
- **25%** of documents end up lost without an ECM strategy
- **50%** of knowledge worker time spent creating and preparing documents
- High creation volume + low access rates = massive abandonment

### Version History Engagement

**Collaboration Frequency:**
- **Real-time collaboration** reduces turnaround time by 31%
- Active documents see frequent edits and views
- No specific statistics on version history review rates published

**Backup Duplication as Proxy:**
- For daily backups with 1% change rate retained for 30 backups: **99%** of every backup is duplicated
- Suggests extremely low re-access of older versions

---

## 5. Collaboration Rates: Multi-User vs Single-Author

### Multi-User Document Engagement

**Google Workspace:**
- **70%** of users collaborate on shared documents weekly
- **20 million+** daily comments (high engagement signal)
- **Over 60%** use @-mentions for collaboration

**Microsoft 365/SharePoint:**
- **85%** report improved collaboration
- **60%** improvement in team collaboration due to better document sharing tools
- **54%** of companies report improved employee collaboration from digitization

### Single-Author Documents

**Inverse Calculation:**
- If **70%** collaborate weekly, **30%** may not (potential single-author population)
- Academic context: Multi-authored papers show higher quality and citation rates than single-authored papers
- Single-author documents likely have **lower access rates** and **higher abandonment risk**

### Sharing Statistics

**Private vs Shared Files:**
- No published Google/Microsoft statistics on private vs. shared file ratios
- Files are **private by default** until manually shared
- Suggests a substantial private file population with limited access

---

## 6. Industry Benchmarks & ROI Context

### Document Management System ROI

**Return on Investment:**
- **404%** ROI over five years with DMS implementation
- **$4.80** return for every $1 invested in DMS
- **3x** ROI within first year of implementation
- **59%** of businesses break even within 1 year
- **26%** achieve excellent ROI within 6 months or less

### Time Savings

**Efficiency Gains:**
- **98 work hours per month** saved with effective DMS
- **21%** loss of organizational productivity from manual document management
- **30%** of workday spent searching for information (without proper systems)
- **30 seconds** to retrieve a document (with DMS) vs. much longer manual searches

### Cost Savings

**Operational Efficiency:**
- **$20,000** annual savings from eliminating paper-based processes
- **30-40%** reduction in operational costs through workflow automation
- **10%** reduction in overall operational expense for document processing
- **30%** fewer errors with document management systems

### File Duplication/Redundancy

**Deduplication Potential:**
- **50-60%** average storage savings from deduplication (general file shares)
- **30-50%** savings for user documents
- **70-80%** savings for software development datasets
- **33%** of organizations achieve <10x deduplication reduction
- **48%** achieve 10-20x reduction
- **18%** achieve 21-100x reduction

---

## Key Deliverables Summary

### Percentage Accessed Within Time Windows

| Time Window | Access Rate | Not Accessed Rate |
|-------------|-------------|-------------------|
| **7 days** | Estimated 20-30% | 70-80% |
| **30 days** | Estimated 30-40% | 60-70% |
| **90 days** | Estimated 40-50% | 50-60% |
| **Lifetime** | 20-60% (varies by context) | **41-80%** |

*Note: 7/30/90-day breakdowns are estimates based on 30-90 day "active data period" research and overall never-accessed rates.*

### Percentage Never Accessed (Except by Creator)

- **Conservative Estimate:** 41% (NetApp baseline)
- **Mid-Range Estimate:** 55% (dark data average)
- **High-End Estimate:** 70-80% (revised NetApp, specific contexts)
- **Enterprise Data Unused:** 60-73% for analytics/business purposes

### Collaboration Rates

| Document Type | Percentage |
|---------------|------------|
| **Multi-user collaborative documents** | 70% (Google Workspace weekly collaboration rate) |
| **Single-author/unshared documents** | 30% (inverse of collaboration rate) |
| **Documents with improved collaboration** | 85% (with SharePoint/DMS implementation) |

### Industry Benchmark Context

- **ROT Data:** 33% baseline (up to 70-85% in poorly managed environments)
- **Dark Data:** 55% average (40-90% range by industry)
- **Document Duplication:** 50-60% redundancy average
- **Time Spent Searching:** 30% of workday (2.5 hours/day)
- **Documents Lost (no ECM):** 25%

---

## Analysis: The Massive Creation-Consumption Gap

### The Core Problem

**149 billion words created daily** (from original context) versus:

- **41-80% never accessed** = 61-119 billion words/day created but never consumed
- **60-73% unused for business** = 89-109 billion words/day providing zero organizational value
- **55% dark data** = 82 billion words/day disappearing into darkness

### Structural Causes

1. **Creation Friction < Consumption Friction**
   - Easy to create documents (2 billion/month in Google Workspace alone)
   - Hard to find documents (30% of workday spent searching)
   - Result: Overproduction relative to discoverability

2. **Private by Default Architecture**
   - Files private until manually shared
   - 30% of users don't collaborate weekly
   - Single-author documents have lower utility

3. **Lack of Knowledge Management Strategy**
   - Only 31% have a comprehensive strategy
   - 25% of documents lost without ECM
   - Orphaned pages with no incoming links

4. **Short Active Lifecycle**
   - 30-90 days before data becomes "less useful"
   - Flood of new data buries existing content
   - 99% duplication in backup versions

### Business Impact

**Wasted Resources:**
- Storage costs for 41-80% never-accessed files
- Employee time: 50% spent creating/preparing documents (25% end up lost)
- Search inefficiency: 2.5 hours/day seeking information

**ROI Opportunity:**
- **404%** ROI with proper DMS implementation
- **98 hours/month** saved per organization
- **30-40%** operational cost reduction
- **$20,000** annual savings from process optimization

---

## Recommendations

### Immediate Actions

1. **Implement Comprehensive Knowledge Management Strategy** (only 31% have one)
   - Reduce 55% dark data through better organization and searchability
   - Target 70% collaboration rate (current Google Workspace benchmark)

2. **Deploy Document Management Systems**
   - Achieve 404% ROI over 5 years
   - Reduce search time from 2.5 hours/day toward the 30-seconds-per-retrieval DMS benchmark
   - Cut operational costs by 30-40%

3. **Enable Deduplication & ROT Cleanup**
   - Target 50-60% storage savings
   - Reduce 33% ROT baseline through active archival policies
   - Implement 90-day retention/archival decision points

4. **Improve Findability & Search Effectiveness**
   - Address the 35% of customers who struggle to find information
   - Reduce the 57% of support calls that follow a failed website search
   - Implement connected, searchable knowledge architecture

### Long-Term Transformation

1. **Shift from Creation-Centric to Consumption-Centric**
   - Measure document utility, not just volume
   - Incentivize reuse over recreation
   - Default to collaboration over single-author

2. **Active Data Lifecycle Management**
   - Auto-archive after 90-day active period
   - Surface frequently accessed content
   - Deprecate orphaned pages

3. **Cultural Change: Quality over Quantity**
   - 149 billion words/day is too much if 60-73% is unused
   - Better curation reduces creation burden
   - Collaboration multiplies document utility

---

## Sources & Data Quality Notes

**Primary Data Sources:**
- NetApp 2024 Data Complexity Report
- Forrester Research on Enterprise Data
- Google Cloud 2024 Data and AI Trends Report
- Seagate Technology Global Business Leader Survey (1,500 respondents)
- Veritas Global Databerg Report
- Google Workspace 2024 Statistics
- SharePoint/Microsoft 365 2024 Usage Data
- Various document management industry reports and ECM statistics

**Data Quality:**
- 7/30/90-day access breakdowns are **estimates** (specific metrics not widely published)
- Private vs. shared file ratios **not disclosed** by Google/Microsoft
- Confluence/Notion orphaned page percentages **not standardized** across industry
- Academic collaboration rates used as **proxy** for enterprise single-author behavior

**Confidence Levels:**
- **High confidence:** Overall never-accessed rates (41-80%), dark data (55%), ROT data (33%)
- **Medium confidence:** Collaboration rates (70%), time-window estimates (30-90 days)
- **Low confidence:** Exact private vs. shared ratios, specific platform orphaned page percentages

---

## Conclusion

The document creation-consumption gap is substantial and quantifiable:

- **At least 41%** of documents are never accessed after creation (conservative)
- **Up to 80%** in poorly managed environments (high-end estimate)
- **60-73%** of enterprise data provides **zero business value**
- **55%** remains "dark" despite creation investment

**The utilization gap represents massive inefficiency:** Organizations are creating 149 billion words/day globally, but 61-119 billion words/day (41-80%) disappear into the void, consuming storage, employee time, and organizational focus while providing no return on investment.

**The opportunity:** Proper document management systems deliver 404% ROI by addressing this gap: not by creating more documents, but by making existing documents findable, usable, and valuable.

The problem isn't document creation capability. **The problem is document consumption infrastructure.**
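
The word-volume ranges quoted in this report (61-119, 89-109, and 82 billion words/day) come from multiplying the 149 billion words/day baseline by each unused-data rate. The sketch below reproduces that arithmetic as a sanity check. The function and label names (`unconsumed_range`, `summarize`, `never_accessed`, and so on) are illustrative and do not come from any cited source. It also assumes that rates measured on documents or stored data can be applied directly to word volume, which the underlying surveys do not establish.

```python
# Baseline from the report: 149 billion words created per day (globally).
DAILY_WORDS_BILLIONS = 149

# Rates quoted in the report, as (low, high) percentages.
# The dictionary keys are illustrative labels, not terms from any source.
RATES_PERCENT = {
    "never_accessed": (41, 80),
    "unused_for_business": (60, 73),
    "dark_data": (55, 55),
}


def unconsumed_range(daily_words, low_pct, high_pct):
    """Return the (low, high) billions of words/day implied by a percentage range."""
    return (daily_words * low_pct / 100, daily_words * high_pct / 100)


def summarize(daily_words=DAILY_WORDS_BILLIONS, rates=RATES_PERCENT):
    """Map each rate label to its rounded (low, high) volume in billions of words/day."""
    return {
        label: tuple(round(v) for v in unconsumed_range(daily_words, low, high))
        for label, (low, high) in rates.items()
    }


if __name__ == "__main__":
    for label, (low, high) in summarize().items():
        span = f"{low}" if low == high else f"{low}-{high}"
        print(f"{label}: {span} billion words/day")
```

Keeping the rates as explicit (low, high) pairs makes it easy to rerun the estimate with a different baseline or with revised survey figures, without touching the calculation itself.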