An artificial intelligence training image data set developed by decentralized AI solution provider OORT has seen considerable success on Google’s platform Kaggle.
OORT’s Diverse Tools Kaggle data set listing was released in early April; since then, it has climbed to the first page in multiple categories. Kaggle is a Google-owned online platform for data science and machine learning competitions, learning and collaboration.
Ramkumar Subramaniam, core contributor at crypto AI project OpenLedger, told Cointelegraph that “a front-page Kaggle ranking is a strong social signal, indicating that the data set is engaging the right communities of data scientists, machine learning engineers and practitioners.”
Max Li, founder and CEO of OORT, told Cointelegraph that the firm “observed promising engagement metrics that validate the early demand and relevance” of its training data gathered through a decentralized model. He added:
Li also said that OORT plans to release multiple data sets in the coming months. Among those is an in-car voice commands data set, one for smart home voice commands and another for deepfake videos meant to improve AI-powered media verification.
First page in multiple categories
Cointelegraph independently verified that the data set reached the first page in Kaggle’s General AI, Retail & Shopping, Manufacturing, and Engineering categories earlier this month. At the time of publication, it had lost those positions, following data set updates on May 6 and May 14 that may or may not be related.
While recognizing the achievement, Subramaniam told Cointelegraph that “it’s not a definitive indicator of real-world adoption or enterprise-grade quality.” He said that what sets OORT’s data set apart “is not just the ranking, but the provenance and incentive layer behind the data set.” He explained:
Lex Sokolin, partner at AI venture capital firm Generative Ventures, said that while he does not think these results are hard to replicate, “it does show that crypto projects can use decentralized incentives to organize economically valuable activity.”
High-quality AI training data: a scarce commodity
Data published by AI research firm Epoch AI estimates that human-generated text AI training data will be exhausted in 2028. The pressure is high enough that investors are now mediating deals that grant AI companies rights to copyrighted materials.
Reports about increasingly scarce AI training data, and how that scarcity may limit growth in the space, have been circulating for years. While synthetic (AI-generated) data is increasingly used with at least some success, human data is still widely viewed as the better option: higher-quality data that leads to better AI models.
When it comes to images for AI training specifically, things are becoming increasingly complicated, with artists deliberately sabotaging training efforts. Nightshade, a tool meant to help artists protect their images from being used for AI training without permission, allows users to “poison” their images and severely degrade the performance of models trained on them.
Subramaniam said, “We’re entering an era where high-quality image data will become increasingly scarce.” He also recognized that this scarcity is made more dire by the increasing popularity of image poisoning:
In this situation, Subramaniam said that verifiable, community-sourced and incentivized data sets are “more valuable than ever.” According to him, such projects “can become not just alternatives, but pillars of AI alignment and provenance in the data economy.”