# SciMKG: A Multimodal Knowledge Graph for Science Education with Text, Image, Video, and Audio
## 🚀 Introduction
SciMKG is a large-scale multimodal educational knowledge graph (MEKG) covering text, images, videos, and audio for K-12 science education. It is automatically constructed using a novel LLM-powered pipeline for concept extraction and multimodal alignment.
- Four modalities covered: text, image, video, audio
- 1,356 knowledge points
- 34,630 multimodal concepts
- 403,400 triples
- 10,527 images · 10,425 videos · 34,630 audio clips
## 🔥 Framework
- **Extraction**: Multiple LLMs extract K–12 science concepts from MOOC subtitles.
- **Verification**: A SELF-REFINE-style self-feedback loop prunes ambiguous or irrelevant concepts.
- **Integration**: Self-consistency voting merges the outputs of the multiple LLMs.
- **Augmentation**: Concepts are expanded via ConceptNet and Wikipedia; rewritten text and audio are generated.
- **Multimodal Alignment**: Images, videos, and audio are aligned to concepts using multimodal LLMs (e.g., GPT-4o, Gemini).
This pipeline ensures robustness, high precision, and semantic consistency across modalities.
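The integration step above can be illustrated with a minimal majority-voting sketch. This is not the paper's implementation; the function name, the vote threshold, and the lowercase normalization are illustrative assumptions.

```python
from collections import Counter

def self_consistency_merge(extractions, min_votes=2):
    """Merge concept lists produced by several LLMs: keep a concept
    only if at least `min_votes` models extracted it (hypothetical sketch)."""
    votes = Counter()
    for concepts in extractions:
        # Deduplicate within each model's output and normalize case.
        votes.update({c.strip().lower() for c in concepts})
    return sorted(c for c, n in votes.items() if n >= min_votes)

# Example: three model outputs for one subtitle segment.
outputs = [
    ["Photosynthesis", "chlorophyll", "stomata"],
    ["photosynthesis", "Chlorophyll", "xylem"],
    ["photosynthesis", "chloroplast"],
]
print(self_consistency_merge(outputs))  # → ['chlorophyll', 'photosynthesis']
```

Concepts extracted by only one model ("stomata", "xylem", "chloroplast") are dropped, which is what gives the voting step its precision.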
## 📦 Installation & Usage
### Installation

```bash
pip install scimkg
```
### Usage

```python
import scimkg

# "video_path" and "pdf_path" are placeholders for your own source files.
kg = scimkg.SciMKG("video_path", "pdf_path")
triples = kg.build("subject")  # build triples for the given subject
rdf = triples.rdf()            # serialize the triples as RDF
```
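For readers unfamiliar with the RDF output format, here is a rough, dependency-free sketch of what serializing (head, relation, tail) triples to N-Triples looks like. The namespace URL, function name, and sample triples are illustrative, not SciMKG's actual schema.

```python
def to_ntriples(triples, base="http://example.org/scimkg/"):
    """Serialize (head, relation, tail) string triples to N-Triples lines."""
    lines = []
    for h, r, t in triples:
        lines.append(f"<{base}{h}> <{base}{r}> <{base}{t}> .")
    return "\n".join(lines)

# Hypothetical triples of the kind a science KG might contain.
triples = [
    ("Photosynthesis", "occursIn", "Chloroplast"),
    ("Chloroplast", "contains", "Chlorophyll"),
]
print(to_ntriples(triples))
```

In practice a library such as rdflib would handle escaping and alternative serializations (Turtle, RDF/XML); the sketch only shows the triple-to-line mapping.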
## 📊 Dataset Statistics
| Discipline | Knowledge Points | Concepts | Exercises | Triples |
|---|---|---|---|---|
| Biology | 526 | 16,839 | 255 | 191,928 |
| Physics | 521 | 11,015 | 288 | 145,666 |
| Chemistry | 309 | 6,776 | 220 | 65,806 |

| Modality | Items | Concept Coverage |
|---|---|---|
| Image | 10,527 | 39% |
| Video | 10,425 | 80% |
| Audio | 34,630 | 100% |
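The per-discipline rows are internally consistent with the headline numbers in the introduction (1,356 knowledge points, 34,630 concepts, 403,400 triples), as a quick check confirms:

```python
# Per-discipline rows from the statistics table above.
rows = {
    "Biology":   {"kp": 526, "concepts": 16839, "triples": 191928},
    "Physics":   {"kp": 521, "concepts": 11015, "triples": 145666},
    "Chemistry": {"kp": 309, "concepts": 6776,  "triples": 65806},
}
totals = {k: sum(r[k] for r in rows.values())
          for k in ("kp", "concepts", "triples")}
print(totals)  # → {'kp': 1356, 'concepts': 34630, 'triples': 403400}
```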
## 🧠 Applications
SciMKG enables:
- Multimodal educational question answering
- Multimodal question generation
- Cross-modal knowledge retrieval
- Intelligent tutoring systems
- Science education agents
- Curriculum-level analytics
## 📄 Citation
If you use SciMKG or our construction framework, please cite:
```bibtex
@inproceedings{SciMKG2026,
  title     = {SciMKG: A Multimodal Knowledge Graph for Science Education with Text, Image, Video and Audio},
  author    = {Lu, Tong and Wang, Zhichun and Zhou, Yaoyu and Guan, Yiming and Bai, Zhiyong and Du, Junsheng},
  booktitle = {AAAI},
  year      = {2026}
}
```