From a collection of documents to a graph visualization in 60 minutes using open source tools. As a quick exploratory view of a collection of documents too vast to read, these techniques may be useful.
Using tools like Python, create n-grams from your text (bi-grams and tri-grams keep it quick). Terms are nodes, the adjacency between neighbouring terms (the spaces) forms the edges, and frequency of occurrence can be visualised through the thickness or colour of the edges (the lines connecting nodes). Load these into any graph database or graph structure and visualise. Store URIs so you can drill down to the documents once you find an interesting association.
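As a rough illustration of the bi-gram step, here is a minimal sketch using only the Python standard library. The function names (`tokenize`, `build_bigram_graph`), the `doc://` URIs and the two sample sentences are assumptions made up for this example, not the code or data used for the original visualization.

```python
import re
from collections import Counter, defaultdict

def tokenize(text):
    # Lowercase and keep alphabetic runs only; a fuller pipeline
    # would probably also remove stop words.
    return re.findall(r"[a-z]+", text.lower())

def build_bigram_graph(docs):
    """docs maps a document URI to its text (hypothetical input shape).

    Returns (edge_weights, edge_sources):
      edge_weights counts how often each (term, next_term) pair occurs,
      edge_sources records which document URIs contain that pair.
    """
    edge_weights = Counter()
    edge_sources = defaultdict(set)
    for uri, text in docs.items():
        tokens = tokenize(text)
        # Each pair of neighbouring terms becomes one edge.
        for left, right in zip(tokens, tokens[1:]):
            edge = (left, right)
            edge_weights[edge] += 1
            edge_sources[edge].add(uri)
    return edge_weights, edge_sources

# Invented sample documents, keyed by placeholder URIs.
docs = {
    "doc://a": "Seismic inversion improves reservoir models",
    "doc://b": "Seismic inversion of reservoir data",
}
weights, sources = build_bigram_graph(docs)
```

The edge weights can drive line thickness or colour in whichever graph tool you load them into, and the stored URIs give the drill-down path from an edge back to its documents.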
Use semantic techniques to enhance accuracy, and/or measures like Pointwise Mutual Information (PMI) to pick out term pairs that co-occur more often than chance would suggest. Example below from Feb 2017, when I used 6,000 PDF articles from SEG Journal (courtesy of GSW).
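For reference, PMI for a pair of terms is commonly defined as log2( p(x, y) / (p(x) p(y)) ). A minimal sketch of scoring bi-grams this way, assuming simple relative frequencies as the probability estimates; the `pmi_scores` function, its `min_count` parameter and the sample tokens are invented for illustration.

```python
import math
from collections import Counter

def pmi_scores(tokens, min_count=1):
    """Score each adjacent term pair by PMI = log2(p(x, y) / (p(x) * p(y)))."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_unigrams = sum(unigrams.values())
    n_bigrams = sum(bigrams.values())
    scores = {}
    for (x, y), count in bigrams.items():
        if count < min_count:
            # Rare pairs tend to get inflated PMI, so allow a cut-off.
            continue
        p_xy = count / n_bigrams
        p_x = unigrams[x] / n_unigrams
        p_y = unigrams[y] / n_unigrams
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores

# Invented sample text: "the" is frequent, "fault zone" is a tight pairing.
tokens = "the fault zone the fault zone the basin the ore".split()
scores = pmi_scores(tokens)
```

In this toy input, ("fault", "zone") scores higher than ("the", "fault") even though both pairs occur twice, because "the" is common everywhere. That is the kind of discrimination raw frequency alone does not give you.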
Very quick and very simple. When the context is limited (visualising the entire graph structure is normally too dense), this may be capable of highlighting surprising connections and facilitating learning.
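One way to limit the context is to keep only the edges around a term of interest, above some weight threshold. A minimal sketch; the `neighbourhood` function, the `min_weight` default and the sample edge weights are hypothetical.

```python
def neighbourhood(edge_weights, seed, min_weight=2):
    """Keep only edges that touch the seed term and meet the weight threshold."""
    return {
        edge: weight
        for edge, weight in edge_weights.items()
        if seed in edge and weight >= min_weight
    }

# Invented edge weights: (term, next_term) -> frequency.
edge_weights = {
    ("gold", "deposit"): 5,
    ("gold", "price"): 1,
    ("copper", "deposit"): 4,
    ("deposit", "model"): 2,
}
sub = neighbourhood(edge_weights, "deposit")
```

Visualising `sub` rather than the full edge set keeps the picture readable while still showing what sits next to the chosen term.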
#bigdata #analytics #geology #visualization #graph