Tuesday 25th November 2008
Data mining

In this study by Laura Ruotsalainen, four efficient tools for analysing patent documents were tested: Thomson Reuters' Aureka and Thomson Data Analyzer, Biowisdom's OmniViz, and STN's STN AnaVist. All four tools analyse both structured and unstructured data. They all visualize the results obtained from clustering the text fields of patent documents, and either provide basic statistical graphs themselves or contain filters for producing them with other solutions.

Approximately 80% of scientific and technical information can be found in patent documents alone, according to a study carried out by the European Patent Office. Patents are also a unique source of information since they are collected, screened and published according to internationally agreed standards. In addition to being an extremely valuable source of technology intelligence, patent documents offer businesses competitive intelligence by revealing a competitor's strengths and strategies. Information gained from patents can also help in locating partners for cross-licensing and collaboration.

Since the patent system was established, more than 60m patent
applications have been published. It would be impossible to find and
analyse relevant documents manually. The need for analysis and
evaluation tools for patents has been acknowledged by many solution providers.

New solutions are continuously coming onto the market: tools
for reading and evaluating individual patents, and tools for analysing
sets of patent documents. Solutions of the latter type can be further
divided, roughly, into two groups: tools for retrieving and preparing
basic statistics for patent documents, and tools for visualisation and
advanced analysis of patents. The former group deals only with data
in a structured form, whereas the latter also analyses unstructured text
and other data.
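
To make the first group concrete, here is a minimal sketch of the kind of basic statistics that can be prepared from structured patent fields. It does not show how any of the tools tested works internally; the records, the field names (`assignee`, `year`, `ipc`) and the helper `top_list` are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical structured records; field names and values are illustrative only.
patents = [
    {"assignee": "Company A", "year": 2005, "ipc": "G06F"},
    {"assignee": "Company B", "year": 2006, "ipc": "G06F"},
    {"assignee": "Company A", "year": 2006, "ipc": "H04L"},
    {"assignee": "Company A", "year": 2007, "ipc": "G06F"},
]


def top_list(records, field, n=3):
    """Return the n most frequent values of a structured field, with counts."""
    return Counter(record[field] for record in records).most_common(n)


# A "top list" of applicants and a simple filings-per-year series.
assignee_top = top_list(patents, "assignee")
filings_per_year = sorted(Counter(record["year"] for record in patents).items())

print(assignee_top)
print(filings_per_year)
```

Statistics like these need only structured fields. Analysing abstracts or claims, as the second group of tools does, would require text processing beyond this sketch.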

Comparison of the features of the tools. The number of plus signs is intended
to make it easier for readers to evaluate each tool against their specific needs.


The tools were tested with two cases, evaluating their ability to offer
technology and business intelligence from patent documents for companies' daily business. Being aware of the state of the art of relevant technology areas is crucial for a company's innovation process.

Knowledge of developed techniques and products forestalls overlapping
R&D projects and thereby prevents unnecessary investment. Equally
important is the recognition of other actors operating in the field.
Benchmarking and evaluating a competitor's R&D and market strategies
aids in managing one's own processes and locating possible parties for
collaboration or cross-licensing.

This study took the point of view of a patent analyst with a basic
understanding of patent data but no special knowledge of data mining
techniques or the tools tested. All the tools evaluated are very useful for the task and quite easy to adopt for daily work. All four had some strengths and weaknesses in comparison to each other.

In conclusion, OmniViz and Thomson Data Analyzer are tools for sophisticated and diversified mathematical analysis of the data. Aureka and AnaVist are convenient for easily visualizing basic statistics and "top lists" of the data and for making stylish patent maps.

  • OmniViz's unique features, compared with the other tools tested, are the ability to visualize clustered data from many different points of view and the ability to evaluate some attributes with patent map animations.
  • Thomson Data Analyzer offers efficient tools for comparing different subsets of the data, e.g. for identifying unique values of an attribute.
  • Aureka is the only tool to allow citation analyses and has the most illustrative patent map.
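
The subset comparison mentioned for Thomson Data Analyzer can be illustrated with a small sketch. This is not the tool's own method, only one simple way to express "values of an attribute unique to one subset"; the records, the `ipc` field and the function `unique_values` are assumptions made for the example.

```python
def unique_values(subset, others, field):
    """Return the values of `field` found in `subset` but not in `others`."""
    own = {record[field] for record in subset}
    seen_elsewhere = {record[field] for record in others}
    return own - seen_elsewhere


# Hypothetical subsets, e.g. the patents of two competitors.
company_a = [{"ipc": "G06F"}, {"ipc": "H04L"}]
company_b = [{"ipc": "G06F"}, {"ipc": "A61K"}]

# Classification codes that appear only in Company A's subset.
only_a = unique_values(company_a, company_b, "ipc")
print(only_a)
```

In practice such a comparison might point to technology areas where only one company is filing, which is the kind of competitive intelligence the study describes.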

