Assigning Advertising Banners to Specific Texts

Starting Point
There is a large number of advertising banners (e.g., 1,000 to 1 million), and each banner comes with a few keywords or a block of text (in German, English, French, Italian or Spanish).

Goal
For any given document (in German, English, French, Italian or Spanish), find the advertising banners whose content best matches that document.

Possible Solution with InfoCodex
In a first step, the given banners are analyzed for content on the basis of their keywords or short texts and automatically arranged in a logically structured information map (“virtual bookshelf”). Banners with similar content are filed in the same compartment. InfoCodex performs this structuring without human intervention, but it can be influenced if necessary.
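How InfoCodex actually builds its information map is not described here. As a rough illustration of the idea that banners with similar content end up in the same compartment, the following sketch substitutes a much simpler technique: leader clustering on word overlap (Jaccard similarity). The function names and the `threshold` value are hypothetical.

```python
from __future__ import annotations

import re


def tokens(text: str) -> frozenset[str]:
    """Lower-cased set of words in a banner's keywords or short text."""
    return frozenset(re.findall(r"\w+", text.lower()))


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Share of words two banners have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def file_into_compartments(banners: dict[int, str], threshold: float = 0.3) -> list[list[int]]:
    """Leader clustering (hypothetical stand-in for the information map):
    each banner joins the first compartment whose founding banner is
    similar enough; otherwise it opens a new compartment."""
    compartments: list[tuple[frozenset[str], list[int]]] = []
    for banner_id, text in banners.items():
        words = tokens(text)
        for leader_words, members in compartments:
            if jaccard(words, leader_words) >= threshold:
                members.append(banner_id)
                break
        else:
            compartments.append((words, [banner_id]))
    return [members for _, members in compartments]


if __name__ == "__main__":
    banners = {
        1: "ski holidays alps snow",
        2: "snow ski rental alps",
        3: "mortgage bank loan",
        4: "bank loan interest rates",
    }
    print(file_into_compartments(banners))  # [[1, 2], [3, 4]]
```

Unlike a real information map, this greedy approach depends on input order and knows nothing about synonyms or neighboring compartments; it only shows the "similar content, same compartment" principle.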

Auto-Categorization

The documents that arrive sporadically (and to which matching banners are to be assigned) are continuously analyzed for content and “placed” in the information map on the basis of a well-founded similarity measure. The result is a short list of the best-matching banners:
Banner 37: 95% relevance
Banner 2021: 92% relevance
Banner 195: 87% relevance
etc.
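The similarity measure InfoCodex uses is not specified in this text. The sketch below substitutes a common textbook measure, cosine similarity of word-count vectors, to show how a document can be scored against every banner and reduced to a short ranked list with percentage relevance. All names and the `top_k` parameter are hypothetical. Note that plain word counts do not match across languages, so a multilingual system has to handle that in some other way.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def term_vector(text: str) -> Counter[str]:
    """Word counts of a text (hypothetical, language-naive representation)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter[str], b: Counter[str]) -> float:
    """Cosine similarity of two word-count vectors (0.0 to 1.0)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def best_banners(document: str, banners: dict[int, str], top_k: int = 3) -> list[tuple[int, int]]:
    """Return up to top_k (banner_id, relevance %) pairs, best match first,
    leaving out banners with no word in common with the document."""
    doc_vec = term_vector(document)
    scored = [(banner_id, cosine(doc_vec, term_vector(text))) for banner_id, text in banners.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [(banner_id, round(100 * score)) for banner_id, score in scored[:top_k] if score > 0]


if __name__ == "__main__":
    banners = {37: "ski holidays alps snow", 195: "bank loan", 2021: "snow boots"}
    document = "cheap ski holidays in the alps with lots of snow"
    for banner_id, relevance in best_banners(document, banners):
        print(f"Banner {banner_id}: {relevance}% relevance")
```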

Technical Details
The InfoCodex software components are available as API modules and can also be offered as web services.

The software runs on Windows, Linux (Debian, Suse, Red Hat) or Unix (Solaris, IBM AIX, HP Unix).

Further Documentation
In principle, the proposed solution corresponds to the classification of new documents into a predefined classification scheme, as described in the enclosed document (“Matching a Fixed Classification System with InfoCodex”).

Security Gaps in Search Engines

Theories and allegations are one thing – but it is functionality in practice that counts.

Suppose your documents have been indexed by the Google Search Appliance. Run any search and note, e.g., the seventh result. Then change the access rights of this document so that your user account no longer has read access to it. Now submit the same search again and see what happens…

Enterprise Search, Security and Privacy – InfoCodex makes a difference.

1. Enterprise versus Internet search
The assumption that similar approaches could be used for enterprise and internet search “turns out to be surprisingly faulty” (Marc Strohlein: Executive Guide to Search, BusinessWeek, May 15, 2006; see also Alan Cane: The future of search: It’s how, not where, you look, Financial Times, March 28, 2007). The most important difference is that internet search engines do not have to care about security at all.

As a consequence, search engines originally developed for the internet have difficulty satisfying the security and privacy requirements of an enterprise environment (see, e.g., Gartner Research: Manage Google’s desktop search now or lock it out, February 16, 2006; or Gartner Research: Google enterprise search has its limits, March 13, 2006).

2. Access rights for enterprise document repositories (security)
It is generally expected that a user of an enterprise search engine sees only those documents for which he or she has the necessary access rights. This means that a search engine must respect “file system security”, i.e., adhere to the access rights of the underlying network. This requirement, however, is not always met, even when the vendor claims to have a “sophisticated security system”. Genuine support for file system security can seriously affect performance (search speed), so the designer has to steer between Scylla and Charybdis: security on one side, performance on the other.

The system administrator or a user may need to change the access rights of some files (e.g., because a search engine has displayed search results to unauthorized users). In such a case, the enterprise search engine should react immediately to the modified access rights. This requirement is seldom met; InfoCodex is one of the very few systems that support it.
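The difference between checking access rights once at indexing time and checking them at query time (often called early and late binding) explains the experiment described at the beginning of this post. The sketch below uses a hypothetical in-memory `FileSystem` as a stand-in for the network's access-control lists; it is not how any particular product is implemented. It only shows why a copied ACL goes stale while a live check reacts immediately, at the cost of one check per hit, which is the performance side of the trade-off.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FileSystem:
    """Hypothetical stand-in for the network's live access-control lists."""
    readers: dict[str, set[str]] = field(default_factory=dict)

    def can_read(self, user: str, path: str) -> bool:
        return user in self.readers.get(path, set())


@dataclass
class SearchIndex:
    fs: FileSystem
    postings: dict[str, set[str]] = field(default_factory=dict)      # term -> paths
    acl_snapshot: dict[str, set[str]] = field(default_factory=dict)  # path -> readers at index time

    def add(self, path: str, text: str) -> None:
        for term in text.lower().split():
            self.postings.setdefault(term, set()).add(path)
        # Early binding: copy the readers once, when the document is indexed.
        self.acl_snapshot[path] = set(self.fs.readers.get(path, set()))

    def search_early_binding(self, user: str, term: str) -> list[str]:
        """Filter hits with the ACLs copied at indexing time (may be stale)."""
        hits = self.postings.get(term.lower(), set())
        return sorted(p for p in hits if user in self.acl_snapshot.get(p, set()))

    def search_late_binding(self, user: str, term: str) -> list[str]:
        """Ask the live file system about every hit (always current)."""
        hits = self.postings.get(term.lower(), set())
        return sorted(p for p in hits if self.fs.can_read(user, p))


if __name__ == "__main__":
    fs = FileSystem({"a.doc": {"alice"}, "b.doc": {"alice", "bob"}})
    idx = SearchIndex(fs)
    idx.add("a.doc", "budget 2007")
    idx.add("b.doc", "budget draft")
    fs.readers["b.doc"].discard("bob")  # revoke bob's read access after indexing
    print(idx.search_early_binding("bob", "budget"))  # ['b.doc'] (stale)
    print(idx.search_late_binding("bob", "budget"))   # []
```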

3. Highly sensitive data and privacy
Today’s systems for handling file access rights in an enterprise network offer great flexibility on various levels. But this also means that administration has become really difficult, which leads to more human errors and negligence.

Enterprise search engines make it easy to discover information stored on networks, so relying on “file system security” alone might not be enough, given the risk of mistakes in the access-right settings. Handling highly sensitive data securely and privately requires additional measures. InfoCodex achieves this by creating protected sub-domains in which selected users or groups hold full sovereign rights. Even system administrators have no access to the search and viewing functions in those protected sub-domains.
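A minimal sketch of the protected sub-domain idea, assuming a simple model in which a hit must pass the normal file-system check and, if it lies in a protected sub-domain, an additional owner check that deliberately ignores administrator status. `User`, `ProtectedSubDomain`, and `visible` are hypothetical names, not the InfoCodex API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    groups: frozenset[str] = frozenset()
    is_admin: bool = False


@dataclass(frozen=True)
class ProtectedSubDomain:
    """Hypothetical model: only the listed owner users/groups may search or view."""
    name: str
    owner_users: frozenset[str] = frozenset()
    owner_groups: frozenset[str] = frozenset()

    def may_search_and_view(self, user: User) -> bool:
        # user.is_admin is intentionally not consulted: admin status grants nothing here.
        return user.name in self.owner_users or bool(user.groups & self.owner_groups)


def visible(user: User, path: str, fs_readers: dict[str, set[str]],
            protected: dict[str, ProtectedSubDomain]) -> bool:
    """A hit must pass the ordinary file-system check and, if it lies in a
    protected sub-domain, that sub-domain's own owner check as well."""
    if user.name not in fs_readers.get(path, set()):
        return False
    domain = protected.get(path)
    return domain is None or domain.may_search_and_view(user)
```

With this layering, a mistake in the file-system ACLs (for example, granting an administrator read access to a salary file) is not enough on its own to expose a document in a protected sub-domain.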

Via the InfoCodex Blog.

Follow-Up:

Follow-Up II: Namics also has an opinion on this matter.

Follow-Up III: It seems that if you change the file-system access rights of a single file via Windows Explorer, Windows Explorer will no longer find the file, but GSA still will. InfoCodex will not find the file anymore either. This is where the Google Search Appliance simply lacks security and privacy, no matter what Matthew Glotzbach says.