The GCKEngine was developed as part of my undergraduate thesis at Arizona State University. The project began as an attempt to create a general-purpose semantic search engine, but the focus eventually shifted to the difficulty of automatic document classification and tagging. Once it became clear that automatic tagging would be far easier given arbitrarily specific ontologies, the project turned into an attempt to create a program that could, given a set of training documents, create a reasonably accurate and complete weighted ontology for the domain those documents represent.

Automatic ontology-building is powered by Google: by running a Google search on multiple terms and analyzing the results, one can ascertain a relationship between the terms. This thesis attempts only to establish a basic "association level" between two terms, but I believe that deeper and more complex relationships could also be ascertained with the same approach.
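As a rough illustration of the idea, and not the thesis's actual implementation, an "association level" between two terms could be estimated from search result counts. The sketch below assumes a Jaccard-style ratio (results containing both terms divided by results containing either). The function names, the hit_count callable, and all of the counts are hypothetical; no real search API is called.

```python
from itertools import combinations


def association_level(hits_a: int, hits_b: int, hits_both: int) -> float:
    """Jaccard-style association in [0, 1] from result counts.

    Assumption: this formula is illustrative only; the thesis may
    measure association differently.
    """
    union = hits_a + hits_b - hits_both
    if union <= 0:
        return 0.0
    # Clamp, since real search engines report approximate counts.
    return max(0.0, min(1.0, hits_both / union))


def build_weighted_ontology(terms, hit_count):
    """Return {(term_a, term_b): weight} for every unordered pair of terms.

    hit_count is a hypothetical callable mapping a query string to the
    number of results a search engine reports for it.
    """
    ontology = {}
    for a, b in combinations(sorted(terms), 2):
        ontology[(a, b)] = association_level(
            hit_count(a), hit_count(b), hit_count(f"{a} {b}")
        )
    return ontology


# Made-up counts standing in for live search results.
FAKE_COUNTS = {
    "ontology": 1000,
    "semantic": 2000,
    "banana": 5000,
    "ontology semantic": 600,
    "banana ontology": 10,
    "banana semantic": 20,
}


def fake_hit_count(query: str) -> int:
    return FAKE_COUNTS.get(query, 0)


ontology = build_weighted_ontology(
    ["ontology", "semantic", "banana"], fake_hit_count
)
for pair, weight in sorted(ontology.items(), key=lambda kv: -kv[1]):
    print(pair, round(weight, 4))
```

With these invented counts, the related pair ("ontology", "semantic") receives a much higher weight than either pairing with "banana", which is the kind of signal a weighted ontology would record as an edge weight between terms.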

This project found that an ontology of reasonable accuracy and completeness could be generated using only a very basic automatic ontology generation program (the titular "GCKEngine"). I believe that further work in this field could result in very accurate ontology generators, which could then be used to power the semantic web.

The images show an Apache Solr database of documents tagged using automatically generated ontologies; as can be seen, the resources are accurately tagged. The images also show the GCKEngine architecture and important snippets of code from the engine itself.

The thesis may be found in ASU's Digital Thesis Repository here. It may also be downloaded using the links below.

If you wish to learn more about the GCKEngine, please contact me.

View the GCKEngine