Datasets

Coverage across the data types modern AI products depend on.

Dragon AI supports both off-the-shelf and custom-built datasets, so customers can start quickly or scope a program around a specific model, domain, or deployment need.

Speech

ASR, speaker labeling, multilingual audio, transcription, and structured speech datasets.

Text

Instruction data, QA corpora, classification sets, knowledge assets, and language data for model training.

Image

Classification, detection, segmentation, captioning, and visual taxonomy annotation.

Video

Clip tagging, event detection, temporal segmentation, scene understanding, and multimodal alignment.

Audiovisual

Synchronized audio-video datasets for multimodal training, annotation, and evaluation workflows.

OCR and Documents

Document parsing, key information extraction, and layout-aware annotation.

Multimodal

Paired image-text, video-text, and speech-text collections for cross-modal alignment.

Fine-tuning

Supervised fine-tuning data, prompt-response pairs, and instruction-tuning sets.

Preference and RLHF

Ranking data and pairwise judgments for model alignment.

Evaluation

Benchmark creation, adversarial testing, and release-stage regression suites.

Delivery Options

Choose off-the-shelf data for speed or a custom program for specific model goals.

Off-the-Shelf Datasets

Best when teams need to move quickly, test feasibility, or accelerate prototyping with structured data that is already available.

Custom Dataset Programs

Best when data requirements involve unique domains, languages, policies, taxonomies, or evaluation criteria.