
I have to thank Richard Scott from BHP, who pointed me to this recently published (June 2023) paper: “Learning a Foundation Language Model for Geoscience Knowledge Understanding and Utilization” by Deng et al. (2023). https://arxiv.org/abs/2306.05064
I thought the community at large would be interested.
The researchers, mainly from Shanghai Jiao Tong University, claim to have created the world’s first geoscience LLM (at least in the public domain; I know of a company that built its own in 2022) using 1.2 million PDFs from Deep Time Digital Earth (DDE, https://www.ddeworld.org). They call it K2.
They applied transfer learning to LLaMA-7B, continuing its training on the geoscience content, and then fine-tuned it further on GeoSignal, a geoscience instruction dataset, for entity extraction, classification and summarisation. They also created a benchmark test set for geoscience LLMs.
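For anyone wondering what an instruction dataset looks like in practice, here is a rough sketch of how records for those three tasks might be turned into training text. This is not taken from the K2 code or from GeoSignal: the `InstructionExample` class, the prompt template, and the sample records are all hypothetical, made up purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class InstructionExample:
    """One hypothetical instruction-tuning record (not the GeoSignal schema)."""
    task: str
    instruction: str
    input_text: str
    output: str


# Assumed prompt layout; the template K2 actually uses may differ.
PROMPT_TEMPLATE = (
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input_text}\n\n"
    "### Response:\n"
)


def build_training_text(example: InstructionExample) -> tuple[str, str]:
    """Return (prompt, full_text); the model learns to continue prompt with output."""
    prompt = PROMPT_TEMPLATE.format(
        instruction=example.instruction,
        input_text=example.input_text,
    )
    return prompt, prompt + example.output


# Illustrative records only, one per task named in the post.
examples = [
    InstructionExample(
        task="entity_extraction",
        instruction="List the geological entities mentioned in the text.",
        input_text="The Morrison Formation contains sandstone and mudstone.",
        output="Morrison Formation; sandstone; mudstone",
    ),
    InstructionExample(
        task="classification",
        instruction="Classify the rock as igneous, sedimentary or metamorphic.",
        input_text="Granite",
        output="igneous",
    ),
    InstructionExample(
        task="summarisation",
        instruction="Summarise the text in one sentence.",
        input_text="Basalt is a fine-grained volcanic rock. It forms when lava cools quickly.",
        output="Basalt is a fine-grained rock formed from rapidly cooled lava.",
    ),
]

for ex in examples:
    prompt, full_text = build_training_text(ex)
    print(f"--- {ex.task} ---")
    print(full_text)
```

During fine-tuning, the training loss is usually computed only on the response part, which is why the sketch returns the prompt separately from the full text.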
The model is available on GitHub:
https://github.com/davendw49/k2
There is also a demo: