Data Universe
SN13: Collects and organises social media data from across the web for AI training
Web-scale social media data, decentralized. Miners collectively scrape and store real-time data from X, Reddit, and YouTube, forming what aims to be the world's largest open-source social media dataset. Validators enforce quality through continuous sampling, authenticity checks, and reputation-based incentives.
// The world's largest open social media dataset.
Data Universe is a Macrocosmos subnet where miners scrape, store, and serve social media data at massive scale. The network creates a distributed data lake of social media content that businesses, researchers, and AI developers can tap into for market research, sentiment analysis, and AI training.
The simple version: Imagine a library that collects every public social media post in real time. Instead of one company controlling the archive (and charging whatever they want for access), hundreds of independent collectors maintain the library, and automated inspectors verify everything is real. Data Universe is that library.
Centralized equivalent: Think Brandwatch, Sprout Social's listening tools, or Twitter's now-expensive Enterprise API, but decentralized, open, and without single-vendor lock-in.
How it works:
- Miners scrape and store data from X, Reddit, and YouTube. Rewards are based on data volume, freshness (linear decay over 30 days), source diversity, and alignment with demand from the Gravity product. Miners who upload to public S3 storage for comprehensive validation earn bonus credits. Credibility below 80% results in exponentially reduced rewards.
- Validators maintain a complete map of all data across the network. They verify authenticity through probabilistic sampling and comprehensive S3 validation every 6 hours. Each miner's credibility is tracked using an exponential moving average. Duplicated data across many miners is worth less, incentivizing unique coverage.
- The problem it solves: Social media data is expensive and centralized. Twitter/X's API pricing pushed most researchers and small companies out. AI training requires diverse, fresh social data, and traditional providers create vendor lock-in.
- The opportunity: The social media analytics market exceeds $15 billion. Every AI company, market researcher, and brand strategist needs real-time social data.
- The Bittensor advantage: 212 distributed miners scrape across platforms simultaneously, creating broader coverage than any single scraper. The credibility system (exponential penalty below 80%) aggressively punishes bad data, maintaining quality without centralized oversight.
- Traction signals: 2,001 commits across 14 contributors with 6-11 commits per week. Led by William Squires (CEO) and Steffen Cruz (CTO) at Macrocosmos. 68 GitHub stars. 212 active miners. Gravity product creates organic demand. Dev activity score: 7.2/10.
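The validator-side credibility tracking described in the list above can be sketched as an exponential moving average over spot-check results. This is a minimal illustration, not the subnet's actual code: the function name `update_credibility`, the smoothing factor `alpha`, and the starting credibility of 0.5 are all assumptions made for the example.

```python
def update_credibility(current: float, sample_passed: bool, alpha: float = 0.15) -> float:
    """Blend the latest validation outcome into a miner's credibility.

    `alpha` is an assumed smoothing factor; a higher value reacts faster
    to recent results. Outcomes are scored 1.0 (pass) or 0.0 (fail).
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    outcome = 1.0 if sample_passed else 0.0
    return alpha * outcome + (1.0 - alpha) * current


# Hypothetical run of four spot checks for one miner.
credibility = 0.5  # assumed starting value
for passed in [True, True, False, True]:
    credibility = update_credibility(credibility, passed)
print(f"credibility after 4 checks: {credibility:.3f}")
```

With an average of this kind, a single failed check lowers credibility only partially, while a run of failures pulls it steadily toward zero.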
Category: Data Scraping and Archival | Centralized Competitors: Brandwatch, Sprout Social, Twitter Enterprise API, Meltwater, Talkwalker
Data Universe's competitive advantage is the Gravity product. Gravity creates organic demand by matching data buyers with the miner pool. When a buyer wants specific data (say, all Reddit posts about a topic), the demand signal flows through Gravity to miners, who earn up to 5x multipliers for job-matched data. This creates a flywheel: more buyers attract more miners, more miners produce more data, more data attracts more buyers.
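One way to picture the demand signal is as a lookup table of job weights that miners consult when deciding what to scrape. The sketch below is an illustration only: the `jobs` dictionary, the `(source, label)` key shape, and the baseline of 1.0 for unmatched data are assumptions. Only the 5x cap comes from the description above.

```python
MAX_MULTIPLIER = 5.0  # cap stated in the text ("up to 5x")
BASELINE = 1.0        # assumed weight for data that matches no job


def job_multiplier(source: str, label: str, jobs: dict[tuple[str, str], float]) -> float:
    """Return the reward multiplier for a (source, label) pair.

    `jobs` is a hypothetical mapping from (source, lowercase label) to a
    requested weight; weights are clamped to [BASELINE, MAX_MULTIPLIER].
    """
    weight = jobs.get((source, label.lower()), BASELINE)
    return min(max(weight, BASELINE), MAX_MULTIPLIER)


# Hypothetical demand: one buyer wants posts from a particular subreddit.
jobs = {("reddit", "r/bittensor_"): 3.0}
print(job_multiplier("reddit", "r/Bittensor_", jobs))  # matched job
print(job_multiplier("x", "#ai", jobs))                # no matching job
```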
Mechanism:
The incentive design is sophisticated. Data freshness decays linearly over 30 days (then becomes worthless), preventing miners from sitting on old data. Scarcity scoring means data that many miners hold is worth less, rewarding unique coverage. And the credibility multiplier (credibility^2.5) makes it always worse to fake data than to honestly report a smaller store.
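The three factors above can be combined into a rough scoring sketch. The 30-day linear decay and the 2.5 exponent come from the description. The scarcity formula (value split evenly across holders), the tuple layout of `buckets`, and the function names are assumptions chosen for illustration, not the subnet's actual implementation.

```python
MAX_AGE_DAYS = 30.0    # from the text: data older than 30 days is worthless
CREDIBILITY_EXP = 2.5  # from the text: reward scales with credibility^2.5


def freshness(age_days: float) -> float:
    """Linear decay from 1.0 at age 0 down to 0.0 at 30 days and beyond."""
    return max(0.0, 1.0 - age_days / MAX_AGE_DAYS)


def scarcity(holders: int) -> float:
    """Assumed form: value is split evenly among miners holding the same data."""
    return 1.0 / max(1, holders)


def miner_score(buckets: list[tuple[float, float, int]], credibility: float) -> float:
    """Score a miner from (size, age_days, holders) buckets and a credibility in [0, 1]."""
    raw = sum(size * freshness(age) * scarcity(holders) for size, age, holders in buckets)
    return raw * credibility ** CREDIBILITY_EXP


# Honest miner: reports 100 units of fresh, unique data and keeps full credibility.
honest = miner_score([(100.0, 0.0, 1)], credibility=1.0)
# Inflating miner: claims 200 units, half fail validation, credibility falls to 0.5.
inflated = miner_score([(200.0, 0.0, 1)], credibility=0.5)
print(honest, round(inflated, 1))  # 100.0 versus about 35.4
```

In this single illustrative case, doubling the claimed store while halving credibility cuts the score by roughly two thirds, which is the behaviour the credibility exponent is meant to produce.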
The S3 validation mechanism adds another quality layer: miners who upload datasets for comprehensive checking earn bonuses equivalent to 90-120 MB of perfect data. This optional verification incentivizes transparency.
The codebase is substantial: 2,001 commits across 14 contributors with consistent 6-11 commits per week. The Macrocosmos team is one of the more established development groups in Bittensor.
Market metrics are healthy. At 41,340 TAO market cap with 2,847 holders, Data Universe is solidly mid-tier. Gini of 0.625 is moderate. The 90-day return of +15% is positive.
Risks:
- Platform dependency: Scraping social media platforms may violate their terms of service. Platform countermeasures could disrupt the supply.
- Regulatory risk: Mass social media data collection faces increasing regulatory scrutiny under GDPR, CCPA, and similar frameworks.
- Moderate outflows: -194 TAO net 7-day flow despite positive 90-day trajectory.
- Data commoditization: Raw social media data is increasingly available from multiple sources. The premium is in enrichment and analysis, not raw scraping.