SciMKG: A Multimodal Knowledge Graph for Science Education with Text, Image, Video and Audio

🚀 Introduction

SciMKG is a large-scale multimodal educational knowledge graph (MEKG) covering text, images, videos, and audio for K-12 science education. It is automatically constructed using a novel LLM-powered pipeline for concept extraction and multimodal alignment.

Four modalities covered: text, image, video, audio
1,356 knowledge points
34,630 multimodal concepts
403,400 triples
10,527 images · 10,425 videos · 34,630 audio clips

🔥 Framework

SciMKG Framework

SciMKG is built using an Extraction–Verification–Integration–Augmentation (EVIA) pipeline, followed by a multimodal alignment step:

Extraction: Use multiple LLMs to extract K-12 science concepts from MOOC subtitles.
Verification: Apply self-feedback (SELF-REFINE) to prune ambiguous or irrelevant concepts.
Integration: Use self-consistency voting to merge the outputs of the different LLMs.
Augmentation: Expand concepts through ConceptNet and Wikipedia; generate rewritten text and audio.
Multimodal Alignment: Align images, videos, and audio to concepts using multimodal LLMs (e.g., GPT-4o, Gemini).

The pipeline is designed for robustness, high precision, and semantic consistency across modalities.
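The Integration step can be illustrated with a small voting sketch. This is only a minimal illustration, assuming that "self-consistency voting" means keeping a concept when enough extractors agree on it; the function name `merge_by_vote`, the `min_votes` threshold, the lowercase normalisation, and the sample data are all hypothetical and not taken from the SciMKG code.

```python
from collections import Counter

def merge_by_vote(extractions, min_votes=2):
    """Keep concepts proposed by at least `min_votes` extractors (assumed rule)."""
    votes = Counter()
    for concepts in extractions:
        # Normalise case and whitespace; a set counts each extractor once per concept.
        votes.update({c.strip().lower() for c in concepts})
    return sorted(c for c, n in votes.items() if n >= min_votes)

# Hypothetical outputs of three LLM extractors for one subtitle segment.
runs = [
    ["Photosynthesis", "chlorophyll", "sunlight"],
    ["photosynthesis", "Chlorophyll", "teacher"],
    ["photosynthesis ", "stomata", "chlorophyll"],
]
merged = merge_by_vote(runs)
print(merged)  # ['chlorophyll', 'photosynthesis']
```

Raising `min_votes` trades recall for precision: with three extractors, a threshold of 3 keeps only concepts that every model proposed.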

📦 Installation & Usage

Installation

pip install scimkg

Usage

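import scimkg

# "video_path,pdf_path" and "subject" are placeholders for your own inputs.
kg = scimkg("video_path,pdf_path")
# Build the triples for one subject, then export them as RDF.
triples = kg.build("subject")
rdf = triples.rdf()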
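For readers unfamiliar with RDF export, the sketch below shows what serializing knowledge-graph triples can look like in the line-based N-Triples format. It does not use the `scimkg` package, and the actual output of `rdf()` may differ; the namespace `http://example.org/scimkg/`, the helper names, and the sample triple are assumptions made for illustration.

```python
from urllib.parse import quote

BASE = "http://example.org/scimkg/"  # hypothetical namespace, not SciMKG's real one

def iri(label):
    """Turn a plain label into an IRI under the placeholder namespace."""
    return f"<{BASE}{quote(label.strip().replace(' ', '_'))}>"

def to_ntriples(triples):
    """Serialize (head, relation, tail) label triples as N-Triples lines."""
    return "\n".join(f"{iri(h)} {iri(r)} {iri(t)} ." for h, r, t in triples)

# Hypothetical triple for illustration only.
example = [("photosynthesis", "related to", "chlorophyll")]
nt = to_ntriples(example)
print(nt)
```

Each output line is one statement: subject, predicate, and object IRIs followed by a terminating dot.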

📊 Dataset Statistics

Discipline	Knowledge Points	Concepts	Exercises	Triples
Biology	526	16,839	255	191,928
Physics	521	11,015	288	145,666
Chemistry	309	6,776	220	65,806

Modality	Items	Concept Coverage
Image	10,527	39%
Video	10,425	80%
Audio	34,630	100%
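As a quick consistency check, the per-discipline rows can be summed and compared with the headline figures in the Introduction. The snippet below only re-adds numbers already listed in the table; the dictionary layout and key names are illustrative.

```python
# Per-discipline rows copied from the statistics table.
rows = {
    "Biology":   {"knowledge_points": 526, "concepts": 16839, "exercises": 255, "triples": 191928},
    "Physics":   {"knowledge_points": 521, "concepts": 11015, "exercises": 288, "triples": 145666},
    "Chemistry": {"knowledge_points": 309, "concepts": 6776,  "exercises": 220, "triples": 65806},
}

columns = ["knowledge_points", "concepts", "exercises", "triples"]
totals = {col: sum(row[col] for row in rows.values()) for col in columns}

# Headline figures from the Introduction.
assert totals["knowledge_points"] == 1356
assert totals["concepts"] == 34630
assert totals["triples"] == 403400

print(totals)
```

The three discipline rows account for all 1,356 knowledge points, 34,630 concepts, and 403,400 triples, and the exercise counts add up to 763.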

🧠 Applications

SciMKG enables:

Multimodal educational question answering
Multimodal question generation
Cross-modal knowledge retrieval
Intelligent tutoring systems
Science education agents
Curriculum-level analytics

📄 Citation

If you use SciMKG or our construction framework, please cite:

@article{SciMKG2026,
  title={SciMKG: A Multimodal Knowledge Graph for Science Education with Text, Image, Video and Audio},
  author={Tong Lu and Zhichun Wang and Yaoyu Zhou and Yiming Guan and Zhiyong Bai and Junsheng Du},
  year={2026},
  journal={AAAI}
}

Educational Knowledge Graph · Large Language Models · Concept Extraction · Multimodal Alignment

Authors

Tong Lu

PhD candidate

I am a second-year Ph.D. candidate in the School of Artificial Intelligence at Beijing Normal University, Beijing, China. I obtained a Bachelor of Science degree from Hebei GEO University and a Master of Engineering degree from Yunnan University. My research focuses on natural language processing.
