ReadyAI
SN33: Analyses conversations to extract structured knowledge and insights from dialogue
A low-cost, decentralized data structuring pipeline. Miners process raw text data and generate semantic tags and annotations using LLMs. Validators establish ground truth by tagging the full data source, then score miner submissions based on cosine distance between miner tags and ground truth tags.
// Teaching AI to hold a conversation.
ReadyAI (the Conversation Genome Project) is a subnet that turns raw, unstructured text data into structured, AI-ready data. Miners receive data windows from validators, process them through LLMs to generate semantic tags and annotations, and return structured metadata. Validators score submissions by comparing miner-generated tags against their own ground truth tags.
The simple version: you have a pile of raw text, say podcast transcripts, documents, or web-scraped content. ReadyAI's miners tag and annotate that text so AI applications can actually use it. Think of it as turning a messy filing cabinet into a perfectly organized database.
Centralized equivalent: Think Scale AI's data labeling or Labelbox, but using LLMs instead of human annotators and decentralized competition instead of managed workforces.
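The miner-side tagging step can be sketched as a single LLM call plus parsing. This is a minimal illustration, not the subnet's actual code: `llm_complete` is a hypothetical stand-in for whichever LLM client the miner configures (GPT-4o by default), and the prompt wording is an assumption.

```python
import json

# Assumed prompt shape; the real subnet defines its own prompts.
TAGGING_PROMPT = (
    "Extract the main semantic tags from the following text. "
    "Respond with a JSON list of short tag strings.\n\n{text}"
)

def tag_window(text: str, llm_complete) -> list[str]:
    """Ask the LLM for semantic tags and parse its JSON response.

    `llm_complete` is any callable that takes a prompt string and
    returns the model's raw text completion.
    """
    raw = llm_complete(TAGGING_PROMPT.format(text=text))
    # Normalize tags so scoring compares like with like.
    return [t.strip().lower() for t in json.loads(raw)]
```

In the real pipeline the miner would also embed each tag and return the vectors alongside the strings, since scoring happens in embedding space.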
How it works:
- Miners receive data windows (subsets of a larger data source) from validators. They process these windows using LLMs (GPT-4o by default, with support for Anthropic and local models) to generate semantic tags, annotations, and embeddings. Three miners receive the same window for competitive evaluation.
- Validators pull raw data, generate an overview of ground truth tags for the full data source, create data windows for miners, and score submissions. Scoring uses cosine distance between miner tag embeddings and the ground truth tag neighborhood. The final score weights top 3 unique tag scores (55%), overall mean (25%), median (10%), and top score (10%).
- The problem it solves: AI applications need structured data, but structuring raw data is expensive and slow. Human annotators are inconsistent and don't scale. LLMs are now cheaper and more accurate than human annotators for many data structuring tasks.
- The opportunity: Every business sitting on unstructured data (documents, transcripts, customer interactions) needs a pipeline to make it AI-ready. ReadyAI provides that at low cost.
- The Bittensor advantage: 241 active miners mean massive parallel processing. The fractal data mining approach lets miners process diverse data sources. Cosine distance scoring ensures quality without subjective human judgment.
- Traction signals: 1,146 commits across 23 contributors. 241 active miners. Recent MCP (Model Context Protocol) reference integration with llms.txt site summaries. Dev activity score: 7.3/10.
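The score weighting described above (top-3 unique tags 55%, overall mean 25%, median 10%, top score 10%) can be sketched as a simple aggregation. The exact treatment inside each bucket is an assumption for illustration; only the weights come from the source.

```python
import statistics

def final_score(tag_scores: list[float], unique_tag_scores: list[float]) -> float:
    """Combine per-tag cosine scores into one miner score.

    `tag_scores` are scores for all tags a miner submitted;
    `unique_tag_scores` are scores for tags not shared with other miners.
    Weights follow the 55/25/10/10 breakdown described above.
    """
    top3_unique = sorted(unique_tag_scores, reverse=True)[:3]
    return (
        0.55 * statistics.mean(top3_unique)   # top 3 unique tag scores
        + 0.25 * statistics.mean(tag_scores)  # overall mean
        + 0.10 * statistics.median(tag_scores)
        + 0.10 * max(tag_scores)              # single best tag
    )
```

Because the weights sum to 1.0, a miner whose every tag scores 1.0 receives a final score of exactly 1.0.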
Category: Data Structuring and Annotation | Centralized Competitor: Scale AI, Labelbox, Appen, Snorkel AI
ReadyAI's value proposition is straightforward: raw data in, structured data out. The subnet capitalizes on two trends: LLMs are now cheaper and more accurate than human annotators for many structuring tasks, and distributed compute lets the pipeline scale horizontally.
Mechanism:
Validators serve as both data providers and quality controllers. They pull from a raw data source (a minimum of 50,000 items is recommended to prevent miners from reusing results), generate comprehensive ground truth tags, then create data windows for miners. Each window goes to three miners simultaneously.
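The windowing and three-miner assignment can be sketched as follows. Window size and the miner-selection policy are assumptions for illustration; the source only specifies that each window reaches three miners.

```python
import random

def make_windows(items: list[str], window_size: int) -> list[list[str]]:
    """Split a raw data source into fixed-size windows (size is assumed)."""
    return [items[i:i + window_size] for i in range(0, len(items), window_size)]

def assign_miners(windows: list[list[str]], miner_ids: list[int],
                  per_window: int = 3) -> dict[int, list[int]]:
    """Map each window index to three distinct miners for competitive
    evaluation. Random selection is an assumed policy."""
    return {i: random.sample(miner_ids, per_window)
            for i in range(len(windows))}
```

Sending the same window to multiple miners is what makes the unique-tag component of scoring meaningful: validators can tell shared tags from novel ones.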
Miners process their windows using LLMs and return tags, annotations, and vector embeddings. The scoring system compares miner outputs against ground truth using cosine distance in the embedding space. Penalties apply for missing shared tags, insufficient unique tags, or low-quality outputs. This creates a clear optimization target: produce tags that are both accurate (close to ground truth) and novel (unique, meaningful contributions).
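Scoring a miner tag against the ground truth "neighborhood" can be sketched as taking the best cosine similarity to any ground truth embedding. Treating the neighborhood as a max over ground-truth vectors is an assumption; the source specifies only that cosine distance in embedding space is the metric.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Standard cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def score_tag(miner_vec: list[float],
              ground_truth_vecs: list[list[float]]) -> float:
    """Score one miner tag as its best similarity to the ground truth
    tag neighborhood (max over ground truth embeddings, an assumption)."""
    return max(cosine_similarity(miner_vec, g) for g in ground_truth_vecs)
```

A tag identical to some ground truth tag scores 1.0; an orthogonal tag scores 0.0, which is where the penalties for low-quality outputs would kick in.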
The recent llms.txt MCP integration adds a reference data layer: a searchable database of AI-readable website summaries that validators and miners can query for context.
The codebase has 1,146 commits across 23 contributors with 2-5 commits per week. Recent work on auto-fetch and MCP reference integration suggests the team is building toward programmatic access to the data pipeline.
Market metrics are mid-tier. At 35,923 TAO market cap with 1,968 holders, ReadyAI has decent backing. Gini of 0.702 and root proportion of 0.189 are moderate. The 90-day return of +6% is mildly positive, though the 30-day is down 18%.
Risks:
- 30-day decline: -18% suggests cooling interest despite active development.
- LLM cost dependency: Both miners and validators require LLM API access (OpenAI keys). Cost fluctuations directly impact mining profitability.
- No dedicated website: The subnet lacks a product page, making it harder for potential data buyers to discover and evaluate the service.
- Quality ceiling: The scoring system depends on validator ground truth quality. If validators use weak LLMs or poor data sources, the entire pipeline degrades.