
Open access code: querying one of the world's largest mineral databases in natural language, for mineral co-occurrence analysis and heat map visualization in geoscience data analysis.
Interesting paper from Zhang et al. (2025) at the University of Idaho, connecting OpenAI's GPT-4o Large Language Model (LLM), through prompt engineering, to the mineral database Mindat via the Python OpenMindat API. Mindat is viewed over 50 million times a year, making a significant contribution to the geological community.
The authors claim the LLM workflow enhances the efficiency of exploratory data analysis. This looks very promising; it may be interesting to quantify this claim in the medium term through empirical information science research with geoscientists. The resulting output, prompts and code are all open-sourced on GitHub to support open science, with funding from the US National Science Foundation (NSF).
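For readers unfamiliar with mineral co-occurrence analysis, here is a minimal sketch of the underlying idea: counting how often each pair of minerals is reported at the same locality. This is not the authors' code and does not call the OpenMindat API; the locality names and mineral lists are toy data invented for illustration, and the function name `cooccurrence_matrix` is hypothetical.

```python
from collections import Counter
from itertools import combinations

# Toy data, invented for illustration only (not real Mindat records):
# each hypothetical locality maps to the set of minerals reported there.
localities = {
    "Locality A": {"Quartz", "Pyrite", "Galena"},
    "Locality B": {"Quartz", "Pyrite"},
    "Locality C": {"Quartz", "Calcite"},
    "Locality D": {"Galena", "Pyrite", "Calcite"},
}


def cooccurrence_matrix(locality_minerals):
    """Return (sorted mineral names, square matrix of shared-locality counts)."""
    minerals = sorted(set().union(*locality_minerals.values()))
    pair_counts = Counter()
    for present in locality_minerals.values():
        # Count each unordered pair of minerals once per locality.
        for a, b in combinations(sorted(present), 2):
            pair_counts[(a, b)] += 1

    index = {name: i for i, name in enumerate(minerals)}
    matrix = [[0] * len(minerals) for _ in minerals]
    for (a, b), n in pair_counts.items():
        matrix[index[a]][index[b]] = n
        matrix[index[b]][index[a]] = n  # keep the matrix symmetric
    return minerals, matrix


minerals, matrix = cooccurrence_matrix(localities)
for name, row in zip(minerals, matrix):
    print(f"{name:8s}", row)
```

A matrix like this could then be passed to a plotting library (for example matplotlib's `imshow`) to draw the kind of heat map the paper describes; the published workflow may well compute and normalize co-occurrence differently.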
Paper here: Jiyin Zhang, Cory Clairmont, Xiang Que, Wenjia Li, Weilin Chen, Chenhao Li, Xiaogang Ma (2025). Streamlining geoscience data analysis with an LLM-driven workflow, Applied Computing and Geosciences, 25. https://doi.org/10.1016/j.acags.2024.100218.
GitHub here: https://github.com/ChuBL/LLM_Driven_Mindat_Workflow