The IAEA Fusion Data Lake Project: Accelerating AI and Big Data Applications through Open Science and FAIR Data

Authors: Daljeet Singh Gahle, Matteo Barbarino | Date: 2026-04-02 | URL: https://arxiv.org/abs/2604.01797


Essence

Figure 1

Figure 1. Shows the high level architecture of the Fusion Data Lake: from the various data

This paper reports on the Fusion Data Lake project, the core infrastructure of the IAEA AI for Fusion initiative. It presents the architecture and implementation status of a global fusion data platform built on three pillars: an international data catalogue, data federation, and central storage. The platform aims to comply with the FAIR data principles and to support surrogate model and digital twin workflows.

Motivation

Achievement

• Platform architecture design and implementation: completed an integrated platform built on the three pillars of international data catalogue, federation, and central storage.
• Multi-institution data integration: converged the catalogues of four major tokamak/stellarator devices, MAST (UK), LHD (Japan), Alcator C-Mod (USA), and HL-2A (China), under a single FDL data model.
• Metadata standardization: defined a Minimal Metadata Model and aligned it with the ITER IMAS Data Dictionary.
• Data governance framework: established Terms of Service and rules for four access and privacy levels (Public, Internal, Restricted, Closed).
• Foundation for AI/ML workflows: provides a human-friendly web interface and a programmable API to support integration of surrogate models and digital twins.
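The four access levels listed above can be pictured as an ordered scale. The following is a minimal sketch under that assumption only: the level names come from the review, but the ordering, the `can_view` rule, and the catalogue entries are hypothetical and are not taken from the paper or from any FDL API.

```python
from enum import IntEnum


class AccessLevel(IntEnum):
    """Level names are from the review; the numeric ordering is an assumption."""
    PUBLIC = 0
    INTERNAL = 1
    RESTRICTED = 2
    CLOSED = 3


def can_view(user_clearance: AccessLevel, dataset_level: AccessLevel) -> bool:
    """Hypothetical rule: a user may view datasets at or below their clearance."""
    return user_clearance >= dataset_level


def visible_datasets(user_clearance, catalogue):
    """Filter (name, level) pairs down to the names the user may view."""
    return [name for name, level in catalogue if can_view(user_clearance, level)]


# Invented catalogue entries, for illustration only.
catalogue = [
    ("shot_summary", AccessLevel.PUBLIC),
    ("diagnostic_raw", AccessLevel.INTERNAL),
    ("campaign_plan", AccessLevel.RESTRICTED),
]
print(visible_datasets(AccessLevel.INTERNAL, catalogue))
```

A real governance scheme would likely need per-action permissions (the review notes that a view/download/edit matrix is still future work), so a single ordered scale should be read as a simplification.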

How

• Built an ETL pipeline and a medallion-structured data store on the Snowflake cloud platform
• Developed reusable, configuration-file-driven transformation logic through a metadata-driven ingestion pipeline
• Federation synchronization mechanism using the MAST Data Catalog REST API (Phase I)
• Developed integration patterns for diverse data sources such as Azure Blob Storage and MDSplus (Phase II)
• Implemented a tiered access control system (Public/Internal/Restricted/Closed) based on NUCLEUS accounts
• Developed an ontology through alignment with the ITER IMAS Data Dictionary
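The idea of metadata-driven ingestion in the list above can be sketched as follows: the per-source mapping lives in configuration, and one generic function turns a raw ("bronze") record into the shared ("silver") schema. This is an illustrative sketch only. All field names, the `SOURCE_CONFIGS` table, and `to_silver` are hypothetical and do not reflect the actual FDL schema, the MAST or LHD catalogue formats, or the Snowflake pipeline.

```python
from typing import Any, Callable

# Hypothetical per-source configuration: target field -> (source field, transform).
# Supporting a new source means adding an entry here, not writing new code.
SOURCE_CONFIGS: dict[str, dict[str, tuple[str, Callable[[Any], Any]]]] = {
    "MAST": {
        "device": ("machine", str.upper),
        "shot_id": ("shot", int),
        "duration_s": ("duration_ms", lambda ms: float(ms) / 1000.0),
    },
    "LHD": {
        "device": ("dev", str.upper),
        "shot_id": ("shot_no", int),
        "duration_s": ("length_sec", float),
    },
}


def to_silver(source: str, bronze_record: dict[str, Any]) -> dict[str, Any]:
    """Map one raw (bronze) record to the shared (silver) schema using config only."""
    config = SOURCE_CONFIGS[source]
    silver = {
        target: transform(bronze_record[field])
        for target, (field, transform) in config.items()
    }
    silver["source"] = source  # keep provenance alongside the harmonized fields
    return silver


# Invented raw records, for illustration only.
mast_raw = {"machine": "mast", "shot": "30420", "duration_ms": 500}
lhd_raw = {"dev": "lhd", "shot_no": 1234, "length_sec": "2.5"}
print(to_silver("MAST", mast_raw))
print(to_silver("LHD", lhd_raw))
```

The design point is that both sources end up with identical keys (`device`, `shot_id`, `duration_s`, `source`), which is what makes a single downstream data model possible.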

Originality

• Proposes a global data federation model that leverages the IAEA's position as a neutral international organization: unlike existing single-institution data systems, it is a decentralized structure built on an international cooperation framework
• A metadata strategy that combines FAIR compliance with ITER IMAS standardization: integration of a domain-specific ontology with technical infrastructure
• An incremental expansion model based on a three-phase PoC: risk management through a validated architecture, and improved acceptance by individual institutions

Limitation & Further Study

• Phase II is complete and Phase III is under way, so the platform is still at the pre-release stage; there is no full operational experience and no performance data from a large user base
• The Minimal Metadata Model is at an early level of detail, and alignment with ITER IMAS is still in progress, so the final ontology has not been settled
• The data governance strategy is explicitly labelled 'provisional', and development of a permission matrix (view/download/edit) is left as future work
• The paper lacks technical details (e.g. federation synchronization latency, caching strategy, scalability limits) and performance benchmarks
• In-depth technical discussion of operational robustness, such as security, data privacy, and disaster recovery (DR), is insufficient

Evaluation

Novelty: 4/5 | Technical Soundness: 4/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

์ดํ‰: IAEA Fusion Data Lake ํ”„๋กœ์ ํŠธ๋Š” ๊ตญ์ œ ์œตํ•ฉ ์—๋„ˆ์ง€ ๊ณต๋™์ฒด์˜ AI/ML ์—ญ๋Ÿ‰ ๊ฐ•ํ™”๋ฅผ ์œ„ํ•œ ์ „๋žต์  ์ธํ”„๋ผ๋กœ, FAIR ์›์น™๊ณผ ๋ฐ์ดํ„ฐ ํŽ˜๋”๋ ˆ์ด์…˜ ๋ชจ๋ธ์„ ์ ์ ˆํžˆ ๊ฒฐํ•ฉํ•œ ์‹ค์šฉ์  ๊ตฌํ˜„์„ ์ œ์‹œํ•œ๋‹ค. Phase II ์™„๋ฃŒ ๋ฐ Phase III ์ง„ํ–‰ ์ค‘์ธ ์ƒํ™ฉ์—์„œ MASTยทLHDยทAlcator C-ModยทHL-2A 4๊ฐœ ์ฃผ์š” ์žฅ์น˜์˜ ๋‹ค์ค‘ ๋ฐ์ดํ„ฐ ํ†ตํ•ฉ ์„ฑ๊ณต์€ ๊ธ€๋กœ๋ฒŒ ํ˜‘๋ ฅ ์ฒด๊ณ„์˜ ๊ธฐ์ˆ ์  ์‹คํ˜„์„ฑ์„ ์ž…์ฆํ•˜๋Š” ์˜๋ฏธ์žˆ๋Š” ์„ฑ๊ณผ์ด๋‹ค. ๋‹ค๋งŒ ๋ณธ ๋…ผ๋ฌธ์€ ์ธํ”„๋ผ ํ˜„ํ™ฉ ๋ณด๊ณ ์— ์ค‘์ ์„ ๋‘๊ณ  ์žˆ์–ด (1) ์‹ค์ œ ์šด์˜ ํ™˜๊ฒฝ์—์„œ์˜ ์„ฑ๋Šฅ, (2) ๋จธ์‹ ๋Ÿฌ๋‹ ์›Œํฌํ”Œ๋กœ ํ†ตํ•ฉ์˜ ๊ตฌ์ฒด์  ์‚ฌ๋ก€, (3) ๋ฐ์ดํ„ฐ ์งˆ ๋ณด์ฆ(Data Quality Assurance) ๋ฉ”์ปค๋‹ˆ์ฆ˜ ๋“ฑ ๊ตฌํ˜„ ์„ธ๋ถ€์‚ฌํ•ญ์ด ๋ถ€์กฑํ•˜๋ฉฐ, (4) ๊ฑฐ๋ฒ„๋„Œ์Šค ์ „๋žต์ด ์•„์ง provisional ๋‹จ๊ณ„์ด๋ฏ€๋กœ ์ •์ฑ… ๊ตฌํ˜„์˜ ๊ตฌ์ฒด์„ฑ์ด ์ œํ•œ์ ์ด๋‹ค. ๊ทธ๋Ÿผ์—๋„ ๋ถˆ๊ตฌํ•˜๊ณ  pre-release ๋‹จ๊ณ„์—์„œ 3๊ฐœ๊ตญ ์ด์ƒ์˜ ๋‹ค๊ธฐ๊ด€ ๋ฐ์ดํ„ฐ ์ˆ˜๋ ด๊ณผ ํ‘œ์ค€ํ™”๋œ ๋ฉ”ํƒ€๋ฐ์ดํ„ฐ ๋ชจ๋ธ ์ •์˜๋Š” ์‹ค์งˆ์  ์ง„์ „์ด๋ฉฐ, ๊ตญ์ œ ๋น…๋ฐ์ดํ„ฐ ์ธํ”„๋ผ ๊ตฌ์ถ•์˜ ์„ ๋ก€๋กœ ๋†’์€ ๊ฐ€์น˜๋ฅผ ์ง€๋‹Œ๋‹ค.

Related papers worth reading

Foundational work
Provides theoretical background on simulation infrastructure based on exascale computing.
Foundational work
Comprehensively reviews the role of foundation models and data infrastructure in materials science, offering a theoretical basis for building global data platforms.
Foundational work
822 covers methodology for evaluating the reliability of AI agents from a scientific standpoint, which helps with system reliability issues when AI is used on large-scale global platforms such as 3257.
Alternative approach
A paper on building a global automated-science ecosystem that connects autonomous experiment platforms worldwide; it presents a future vision similar to that of the data lake project.
Follow-up work
A discussion of automating scientific discovery with foundation models, related to the potential uses of this paper's global data catalogue.
Application
A case of AI-accelerated peer review and large-scale data use, which is a potential application of the fusion data lake.
Application
254 presents a case of building an agentic data platform and connects in practical terms to the global data lake infrastructure design of 3257.
Application
Applicable to real-time control and to the direction of agent frameworks when using fusion simulation and data lakes.