Linguistic Analysis in InfoWatch Solutions | InfoWatch

Linguistic analysis is one of the advantages of InfoWatch solutions over those of competitors, since only this technique can ensure a high detection rate for critical information at every stage of the information lifecycle, including immediately after the information has been created. It also allows the format of a document to be determined and its meaning to be understood. The technique gives good results even on small fragments of text, which may be inserted into any document or sent in informal correspondence or via an instant messaging system (ICQ, Jabber, etc.).

Content Filtering Database

Definition of a CFD

A content filtering database (CFD) is a hierarchically structured list (tree) of categories with an arbitrary number of nested levels. The categories are defined using probabilistic and mathematical methods, and the database contains the words and expressions that make it possible to determine the topic and confidentiality level of a document.
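
The tree structure described above can be pictured with a small Python sketch. This is only an illustration, not the InfoWatch data model: the `Category` class, its field names, the numeric `confidentiality` level, and the sample categories and terms are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Set, Tuple


@dataclass
class Category:
    """One node of a hypothetical CFD tree (all names here are illustrative)."""
    name: str
    terms: Set[str] = field(default_factory=set)      # words and expressions for this category
    confidentiality: int = 0                          # assumed numeric level, higher = more sensitive
    children: List["Category"] = field(default_factory=list)

    def walk(self, path: Tuple[str, ...] = ()) -> Iterator[Tuple[Tuple[str, ...], "Category"]]:
        """Yield (path, category) for this node and every nested category, depth first."""
        here = path + (self.name,)
        yield here, self
        for child in self.children:
            yield from child.walk(here)


# A tiny made-up tree; nesting can go to any depth.
cfd = Category("Company", children=[
    Category("Finance", {"invoice", "budget"}, confidentiality=2, children=[
        Category("Payroll", {"salary", "bonus"}, confidentiality=3),
    ]),
    Category("Legal", {"contract", "nda"}, confidentiality=2),
])

paths = ["/".join(p) for p, _ in cfd.walk()]
```

Here `paths` lists every category with its full position in the tree, which is one simple way to refer to a nested category unambiguously.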

In this technique, the topic of a text is determined automatically using a previously created content filtering database (CFD). A CFD not only describes the categories of information that circulate within a company; it also takes into account various attributes that determine the confidentiality of that information, including the specific nature of the company's business and its security requirements. As a result of linguistic analysis, a text is automatically assigned to the appropriate categories based on its topic and content. The analyzed information may contain terms (words and phrases) from different categories, so it can be assigned to one or several CFD categories.

Reliable filtering of information by category depends on a well-constructed database. The main technique in CFD-assisted linguistic analysis is to search the fragment being analyzed for the words and phrases that describe confidential data, which the CFD organizes by category.
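
As a rough illustration of this search, the sketch below matches a text against per-category term lists and returns every category that has at least one hit, so a single text can fall into several categories. It is a simplified stand-in and not the InfoWatch algorithm: the `CFD_TERMS` table, the category names, and the `categorize` function are hypothetical, and plain whole-word matching replaces the probabilistic methods mentioned above.

```python
import re

# Hypothetical CFD content: category path -> words and phrases (illustrative only).
CFD_TERMS = {
    "Finance/Payroll": ["salary", "bonus payment"],
    "Legal/Contracts": ["contract", "non-disclosure agreement"],
}


def categorize(text, cfd_terms=CFD_TERMS):
    """Return {category: [matched terms]} for every category with at least one hit."""
    # Lowercase and collapse punctuation so multi-word phrases match across commas, line breaks, etc.
    normalized = " ".join(re.findall(r"[\w-]+", text.lower()))
    hits = {}
    for category, terms in cfd_terms.items():
        found = [t for t in terms
                 if re.search(r"\b" + re.escape(t) + r"\b", normalized)]
        if found:
            hits[category] = found
    return hits


result = categorize("Please sign the contract before the salary review.")
```

In this example the text contains terms from two categories, so `result` has an entry for both; a text with no matching terms yields an empty dictionary.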

Creation of a CFD


Advantages of the Technology

  • proactive protection, including for 'zero day' data;
  • automatic classification of analyzed text;
  • support for all European languages, plus linguistic support for Russian, English, French, German, Spanish, Italian, Ukrainian, Arabic, Polish, Romanian and Latvian, with automatic language detection;
  • handling of multilingual documents;
  • support for inflections (dictionary and fuzzy morphology), with automatic selection of the morphological analysis technique;
  • a predefined general content filtering database, with the ability to use industry databases and add your own.
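
To show why inflection support matters for term matching, here is a minimal sketch that reduces a word to a base form with a dictionary lookup and falls back to crude suffix stripping for unknown words. It only approximates the idea of combining dictionary and fuzzy morphology: the `LEMMAS` table, the `SUFFIXES` list, and the `normalize` function are assumptions for this example, cover English only, and say nothing about how InfoWatch implements morphological analysis.

```python
# Hypothetical lemma dictionary: inflected form -> base form (illustrative only).
LEMMAS = {"salaries": "salary", "paid": "pay", "contracts": "contract"}

# Crude fallback for words missing from the dictionary; longer suffixes are tried first.
SUFFIXES = ("ing", "ed", "es", "s")


def normalize(word):
    """Map a word to a base form: dictionary lookup first, suffix stripping second."""
    word = word.lower()
    if word in LEMMAS:
        return LEMMAS[word]
    for suffix in SUFFIXES:
        # Keep at least three characters so very short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


base_forms = [normalize(w) for w in ["Salaries", "budgets", "signed", "bus"]]
```

With both the CFD terms and the analyzed text normalized this way, "salaries" in a message can match the term "salary" in the database. The suffix fallback is deliberately naive and will produce non-words for some inputs.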
