A patent recognition system using text matching in published journals

A law firm specialising in patent applications asked us to build a system that finds existing patents, reducing the likelihood of duplication during the patent application process.

Our solution was a server into which the national monthly journal of patent applications could be imported in PDF format. The journal contains one paragraph per patent application, in a loosely structured format.

The server extracted all words and stored them in a PostgreSQL database in a searchable format. Lawyers could then look up potential patent words and phrases, and the search would find matching results, taking into account duplicated and replaced letters.
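To illustrate what tolerance for duplicated and replaced letters can mean in practice, here is a minimal sketch in Python. It is not the deployed code: the function names and the `max_replaced` threshold are hypothetical, and the real system ran its searches against the database rather than in application code.

```python
from itertools import groupby
from typing import Optional


def collapse_repeats(word: str) -> str:
    """Lower-case a word and collapse runs of the same letter ("Appple" -> "aple")."""
    return "".join(ch for ch, _ in groupby(word.lower()))


def replaced_letters(a: str, b: str) -> Optional[int]:
    """Count positions where two equal-length strings differ; None if lengths differ."""
    if len(a) != len(b):
        return None
    return sum(1 for x, y in zip(a, b) if x != y)


def matches(query: str, candidate: str, max_replaced: int = 1) -> bool:
    """True if the words agree once duplicated letters are collapsed,
    allowing up to max_replaced substituted letters (an assumed threshold)."""
    diff = replaced_letters(collapse_repeats(query), collapse_repeats(candidate))
    return diff is not None and diff <= max_replaced
```

With these assumptions, `matches("Kodak", "Koddak")` and `matches("Kodak", "Kodac")` both succeed, while `matches("Kodak", "Nikon")` does not.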

All results were scored to indicate their similarity to the search words, and users could open the matching journal entry to check for potential infringement.
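The scoring formula is not described here, so the following Python sketch only shows the general idea of ranking candidates by similarity. It uses the standard library's `difflib.SequenceMatcher` as a stand-in measure; the `similarity` and `rank` helpers and the `min_score` cut-off are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def similarity(query: str, candidate: str) -> float:
    """Case-insensitive similarity between 0.0 (nothing shared) and 1.0 (identical)."""
    return SequenceMatcher(None, query.lower(), candidate.lower()).ratio()


def rank(query: str, candidates: List[str], min_score: float = 0.6) -> List[Tuple[float, str]]:
    """Score every candidate, drop weak matches, and return the best matches first."""
    scored = [(similarity(query, c), c) for c in candidates]
    kept = [(score, word) for score, word in scored if score >= min_score]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)
```

For example, `rank("Kodak", ["Nikon", "Koddak", "Kodak"])` puts the exact match first, the near match second, and drops "Nikon" because it falls below the assumed cut-off.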

This system was written over a period of a few weeks and deployed to the client’s servers.