Showing terms similar to a search query, in this case ‘Pegmatite’, using text embeddings in a graph network. This can support data exploration. The thickness of each edge reflects cosine similarity: the thicker the green line, the more similar the two terms. For this visualisation I loaded the text embeddings into NetworkX and displayed the graph with PyVis (Python).
Graph networks typically consist of nodes and edges. Here the nodes are terms from word embeddings trained on millions of words of geoscience text. The edges are the cosine similarity between word vectors: the more similar two words are to each other (based on the words they co-occur with), the closer to 1 the cosine will be.
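As a rough illustration of how edges like these could be derived, here is a minimal sketch in plain Python. It is not the code behind the visualisation above: the three-dimensional vectors are invented toy values (real embeddings have far more dimensions), and `similar_term_edges`, `top_n` and `max_width` are hypothetical names chosen for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors, invented for illustration only (not real embeddings).
toy_vectors = {
    "pegmatite": [0.9, 0.1, 0.3],
    "granite": [0.8, 0.2, 0.4],
    "aquifer": [0.1, 0.9, 0.2],
}


def similar_term_edges(query, vectors, top_n=2, max_width=10.0):
    """Return (query, term, similarity, width) tuples, most similar first.

    The width scales linearly with similarity, so a thicker edge means
    a more similar term. Negative similarities are drawn with width 0.
    """
    scored = [
        (term, cosine_similarity(vectors[query], vec))
        for term, vec in vectors.items()
        if term != query
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [
        (query, term, sim, max(sim, 0.0) * max_width)
        for term, sim in scored[:top_n]
    ]


edges = similar_term_edges("pegmatite", toy_vectors)
for source, target, sim, width in edges:
    print(f"{source} -> {target}: similarity={sim:.3f}, width={width:.1f}")
```

Each tuple holds the two node names, the similarity and a drawing width, so it could be added as a weighted edge to a NetworkX graph before rendering. The linear scaling from similarity to width is just one simple choice.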
Exploring networks such as these can lead to data-driven discoveries. Use cases across the geoscience sector range from mining to geothermal, renewables to oil and gas, radioactive waste storage to carbon capture, geohazards to hydrogeology, and geotechnical work to academic geoscience.
#naturallanguageprocessing #python #geosciences #subsurface #digital #visualization #languagemodels #knowledgegraph #semanticsearch #geology #earthscience