Enterprise Search, Security and Privacy – InfoCodex makes a difference.

1. Enterprise versus Internet search
The assumption that similar approaches could be used in enterprise and internet searches “turns out to be surprisingly faulty” (Marc Strohlein: Executive Guide to Search, BusinessWeek, May 15, 2006; see also Alan Cane: The future of search: It’s how, not where, you look, Financial Times, March 28, 2007). The most important difference is that internet search engines do not have to care about security at all.

As a consequence, search engines originally developed for the internet find it hard to satisfy the security and privacy requirements of an enterprise environment (see, e.g., Gartner Research: Manage Google’s desktop search now or lock it out, 16 Feb 2006; or Gartner Research: Google enterprise search has its limits, 13 Mar 2006).

2. Access rights for enterprise document repositories (security)
It is generally expected that a user of an enterprise search engine should see only those documents for which he has the necessary access rights. This means that the search engine must respect “File system security”, i.e. the access rights of the underlying network. This requirement, however, is not always met, even if the product supplier claims to have a “sophisticated security system”. Genuine support for “File system security” can seriously affect performance (search speed); it is a balancing act between Scylla and Charybdis (security and performance).

Sometimes the access rights of certain files must be changed by the system administrator or by a user (e.g. because the search engine has displayed results to unauthorized users). In such a case, the enterprise search engine should react immediately to the modified access rights, a requirement that is seldom met (InfoCodex is one of the very few systems that support this feature).
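One common way to make changed access rights take effect immediately is to check permissions at the moment a query runs ("late binding") instead of copying them into the index at crawl time. The Ruby sketch below illustrates only that idea, under loose assumptions: `AclStore` is a hypothetical stand-in for the file system's live access-control lists and `SecureSearch` a hypothetical keyword index. Neither describes how InfoCodex is actually implemented.

```ruby
require "set"

# Hypothetical stand-in for the live access-control lists of the file system.
class AclStore
  def initialize
    @readers = {} # path => Set of users allowed to read it
  end

  def grant(path, user)
    (@readers[path] ||= Set.new) << user
  end

  def revoke(path, user)
    @readers[path]&.delete(user)
  end

  def readable?(path, user)
    @readers.fetch(path, Set.new).include?(user)
  end
end

# Hypothetical keyword index that stores paths only, never access rights.
class SecureSearch
  def initialize(acl)
    @acl = acl
    @index = Hash.new { |hash, term| hash[term] = [] }
  end

  def add(path, text)
    text.downcase.split.uniq.each { |term| @index[term] << path }
  end

  # Rights are looked up at query time, so a revoked right takes effect
  # on the very next search without re-indexing anything.
  def search(term, user)
    @index.fetch(term.downcase, []).select { |path| @acl.readable?(path, user) }
  end
end

acl = AclStore.new
acl.grant("/hr/salaries.xls", "alice")
index = SecureSearch.new(acl)
index.add("/hr/salaries.xls", "Salaries 2007")
p index.search("salaries", "alice") # prints ["/hr/salaries.xls"]
acl.revoke("/hr/salaries.xls", "alice")
p index.search("salaries", "alice") # prints []
```

The cost of this design is exactly the performance side of the trade-off: every hit requires a permission lookup while the user waits for results.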

3. Highly sensitive data and privacy
Today’s systems for managing file access rights in an enterprise network offer great flexibility on various levels. But this also means that administration has become really difficult, leading to more human mistakes and negligence.

Enterprise search engines make it easier to discover information stored on networks, and relying on “File system security” alone might not be enough, given the possible errors in the access-right settings. Highly sensitive data and privacy require additional measures. In the InfoCodex system, this is achieved by creating protected sub-domains in which selected users/groups hold full sovereign rights. Even system administrators have no access to the search and viewing functions in those protected sub-domains.
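A protected sub-domain can be pictured as an access check that consults only an explicit owner list and ignores administrative roles altogether. The following is a minimal Ruby sketch under that assumption; `ProtectedDomain`, `visible_documents` and the `admin:` flag are hypothetical names chosen for illustration, not the InfoCodex API.

```ruby
require "set"

# Hypothetical model of a protected sub-domain: only its owners may search
# or view its documents, and an administrator role grants nothing here.
class ProtectedDomain
  attr_reader :name

  def initialize(name, owners)
    @name = name
    @owners = Set.new(owners)
    @documents = []
  end

  def add_document(path)
    @documents << path
  end

  # Only an existing owner can extend ownership; there is no override.
  def add_owner(granting_user, new_owner)
    unless owner?(granting_user)
      raise SecurityError, "#{granting_user} is not an owner of #{name}"
    end
    @owners << new_owner
  end

  def owner?(user)
    @owners.include?(user)
  end

  # Visibility depends solely on ownership. The admin flag is accepted but
  # deliberately ignored, so system administrators are locked out as well.
  def visible_documents(user, admin: false)
    owner?(user) ? @documents.dup : []
  end
end

board = ProtectedDomain.new("board", ["ceo"])
board.add_document("/board/minutes.doc")
p board.visible_documents("ceo")               # prints ["/board/minutes.doc"]
p board.visible_documents("root", admin: true) # prints []
```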

Via the InfoCodex Blog.

Follow-Up:

Follow-Up II: Namics also has an opinion on this matter.

Follow-Up III: It seems that if you change the file-system access rights of a file via Windows Explorer, Windows Explorer will no longer find the file, but the GSA (Google Search Appliance) will still find it. InfoCodex will not find the file anymore either. This is where the Google Search Appliance simply lacks security and privacy, no matter what Matthew Glotzbach says.

Heap fragmentation in a long-running Ruby process

Abstract

In a long-running ruby process with a highly dynamic object space, we encountered performance degradation and eventually memory-allocation failures due to heap fragmentation. The problem can be mitigated by linking ruby against ptmalloc3.

Hi all! I’m writing this mail in the hope that my experiences may point you in the right direction if you ever encounter a similar problem. Naturally, I would be delighted to read your comments and advice on my conclusions and the steps taken.

http://ch.oddb.org [1] provides information on the Swiss health-care market. Behind an Apache/mod_ruby setup lies a single ruby process, which acts as a DRb server. Predating Ruby on Rails, the application is based on home-grown libraries [2-4].

A couple of weeks ago we experienced a spike in user requests. Although the application seemed to scale well most of the time, we began experiencing outages after a couple of hours. Whenever that happened, CPU load rose to 100% and DRb requests hung, sometimes for several minutes. At the same time, memory usage started rising considerably. If left running long enough, the application would crash with a NoMemoryError: ‘Failed to allocate Memory’ – even though there was still plenty of memory available in the system.

Thanks to Jamis Buck [5] and Mauricio Fernandez [6], I was able to determine that the application was stuck for several seconds in glibc’s realloc, which may be called (via ruby_xrealloc) from practically anywhere within ruby where a new or enlarged chunk of memory is required.

Having established the diagnosis, heap fragmentation [7], I could try a couple of things to improve the performance of our application, all revolving around the principle of creating fewer objects, in particular fewer Strings, Arrays and Hashes. By eliminating a number of obvious suspects (mostly related to the on-demand sorting of values stored in a large Hash), I was able to raise the life expectancy of our application considerably – close, but no cigar.
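A typical way to create fewer objects in such a case is to cache the sorted result and re-sort only after the Hash has changed, rather than building new Arrays with `values.sort` on every request. A sketch, assuming reads far outnumber writes; `SortedValueCache` is a hypothetical name, not a class from the oddb.org or ODBA code.

```ruby
# Hypothetical wrapper around a large Hash: the sorted values are computed
# once and reused until the Hash changes, instead of allocating new Arrays
# on every request.
class SortedValueCache
  def initialize
    @data = {}
    @sorted = nil
  end

  def [](key)
    @data[key]
  end

  def []=(key, value)
    @data[key] = value
    @sorted = nil # invalidate; re-sort lazily on the next read
  end

  def delete(key)
    @sorted = nil if @data.key?(key)
    @data.delete(key)
  end

  # Returns one shared, frozen Array; callers must not try to modify it.
  def sorted_values
    @sorted ||= @data.values.sort.freeze
  end
end

prices = SortedValueCache.new
prices["b"] = 2
prices["a"] = 3
prices["c"] = 1
p prices.sorted_values                              # prints [1, 2, 3]
p prices.sorted_values.equal?(prices.sorted_values) # prints true
```

The trade-off is that every write discards the cached Array, so this only pays off when the Hash is read much more often than it is modified.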

 

And then – all praise bugzilla – I found a bug report [8] describing almost exactly my problems and leading me to ptmalloc3 [9]. Glibc’s malloc implementation is based on ptmalloc2, and can be replaced simply by linking ruby against ptmalloc3.

As far as I understand, ptmalloc3 does not eliminate heap fragmentation. However, thanks to the bitwise tree employed in the newer version, it finds free chunks of the right size several orders of magnitude faster. Additionally, it seems that glibc 2.5 abandons its attempts to find a best-fit chunk after a while (possibly after 10000 tries) and instead expands the heap as long as possible, finally failing to allocate memory. This causes first the rapid rise in memory usage and later the observed NoMemoryError.
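To make the failure mode concrete, here is a toy first-fit allocator in Ruby. It illustrates external fragmentation only, with made-up sizes, and does not model the data structures of glibc or ptmalloc3: after every other block is freed, half of the arena is free, yet no single hole can serve a slightly larger request, and each failed request scans the entire free list.

```ruby
# Toy model of external heap fragmentation (not how glibc or ptmalloc3
# manage memory). Free space is a list of [offset, size] holes, searched
# first-fit from the lowest offset.
class ToyHeap
  attr_reader :probes # number of holes inspected so far

  def initialize(size)
    @free = [[0, size]]
    @probes = 0
  end

  # Returns the offset of the allocated block, or nil if no single hole
  # is large enough, even when enough free bytes exist in total.
  def malloc(size)
    @free.each_with_index do |(offset, hole), i|
      @probes += 1
      next if hole < size
      if hole == size
        @free.delete_at(i)
      else
        @free[i] = [offset + size, hole - size]
      end
      return offset
    end
    nil
  end

  def free(offset, size)
    @free << [offset, size]
    @free.sort_by!(&:first)
    coalesce
  end

  def total_free
    @free.sum { |_, size| size }
  end

  def largest_hole
    @free.map(&:last).max || 0
  end

  private

  # Merge holes that touch each other into one larger hole.
  def coalesce
    merged = []
    @free.each do |offset, size|
      last = merged.last
      if last && last[0] + last[1] == offset
        last[1] += size
      else
        merged << [offset, size]
      end
    end
    @free = merged
  end
end

heap = ToyHeap.new(1000)
blocks = Array.new(100) { heap.malloc(10) }            # fill the arena completely
blocks.each_slice(2) { |first, _second| heap.free(first, 10) } # free every other block
p heap.total_free   # prints 500
p heap.largest_hole # prints 10
p heap.malloc(20)   # prints nil
```

A real allocator would, at that point, grow the heap (e.g. via brk or mmap) instead of returning nil, which is consistent with the rising memory usage described above.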

 

At this time, http://ch.oddb.org has been running – powered by ruby and ptmalloc3 – for a little more than 24 hours without displaying any of the signs I have come to associate with heap fragmentation. Significantly less time is spent allocating memory, and consequently in GC, and the overall memory footprint has decreased by about 30%.

I hope this is of use – thanks in advance for any thoughts you want to share.

Hannes Wyss

[1] Open Drug Database
http://scm.ywesee.com/?p=oddb.org;a=summary
[2] Object-Database Access and Object Cache
http://scm.ywesee.com/?p=odba;a=summary
[3] State-Based Session Management
http://scm.ywesee.com/?p=sbsm;a=summary
[4] Component-Based Html generator
http://scm.ywesee.com/?p=htmlgrid;a=summary
[5] Inspecting a live ruby process, Jamis Buck
http://weblog.jamisbuck.org/2006/9/22/inspecting-a-live-ruby-process
[6] Ruby live process introspection, Mauricio Fernandez
http://eigenclass.org/hiki.rb?ruby+live+process+introspection
[7] Heap fragmentation, Bruno R. Preiss
http://www.brpreiss.com/books/opus8/html/page425.html
[8] Glibc bugzilla report 4349, Mingzhou Sun, Tomash Brechko
http://sourceware.org/bugzilla/show_bug.cgi?id=4349
[9] Ptmalloc home, Wolfram Gloger
http://www.malloc.de/en/

Replaced glibc’s malloc with ptmalloc3

OK, this seems to kick some serious ass as far as our heap fragmentation at ODDB.org is concerned. Our CPU is no longer constantly at 99%.

Heap Fragmentation

It seems that we may be suffering from heap fragmentation at ODDB.org. Does anybody out there have experience with heap fragmentation and Ruby?

Update: Our heap fragmentation problem seems to be heading in this direction.

Enterprise Intelligence

I just watched this report on PBS.org, where I came across the name of Jeff Jonas and his post about Enterprise Intelligence. I believe he would be interested in the InfoCodex technology.

ODBA Improvements

From Hannes Wyss’s mail to the internal ywesee list:

The current ODBA commit contains several improvements and two bug fixes which, taken together, should bring long-term memory usage under control. We will only get final confirmation from continuous online operation.

  • The cleaner thread in ODBA.cache now runs at a higher (normal) priority and more often, but for shorter periods. Concretely, in each interval of about 10 seconds, 500 objects are checked and, where appropriate, removed from the cache.
  • Bugfix: collection elements that are loaded individually from the DB are now also registered in the cache.
  • CacheEntry keeps track of which objects access a given other object. This is no longer done via direct references but via odba_id/object_id, so this bookkeeping is guaranteed never to stand in the way of the GC. Until now the references were always removed in time and ‘should’ not have played a role anyway; now there is a guarantee.
  • The @collection entry has also been removed from CacheEntry. When saving a collection, it is necessary to know which of its elements are already in the database, which must be deleted, and which are new. Until now this was handled by exactly this @collection entry. Now the existing data is fetched directly from the DB each time.
  • Bugfix: in one particular constellation, printing an error message led to a memory spike. (Specifically: Narcotic#to_s depends on the substances in Narcotic@substances. For some instances of Narcotic, however, the corresponding object had been deleted or never saved. This should have been recorded with an error message; but because the error message included ODBA::Stub@container.to_s, the result was an endless loop of exceptions.) This bug took up half of the 3 days. In the end I was only able to find it thanks to this tool.
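The first item, a cleaner that does a little work often instead of a lot of work rarely, might look roughly like this in Ruby. This is a hypothetical sketch, not the ODBA code: the cache is assumed to be a plain Hash of `{ value:, last_access: }` entries, the defaults (500 entries per pass, a pass about every 10 seconds) simply mirror the numbers in the mail, and the 300-second idle limit is an arbitrary assumption.

```ruby
# Hypothetical batched cleaner, not the actual ODBA.cache implementation.
# The cache is assumed to be a Hash of key => { value:, last_access: Time }.
class BatchedCacheCleaner
  def initialize(cache, batch_size: 500, max_idle: 300)
    @cache = cache
    @batch_size = batch_size # entries inspected per pass
    @max_idle = max_idle     # seconds without access before eviction
    @cursor = 0              # position where the next pass resumes
  end

  # One short, bounded pass over at most @batch_size entries; returns the
  # keys that were evicted. The cursor walks the keys round-robin, which is
  # approximate: after evictions some keys are only reached one cycle later.
  def sweep_batch(now = Time.now)
    keys = @cache.keys
    return [] if keys.empty?
    @cursor = 0 if @cursor >= keys.size
    batch = keys[@cursor, @batch_size]
    @cursor += batch.size
    idle = batch.select { |key| now - @cache[key][:last_access] > @max_idle }
    idle.each { |key| @cache.delete(key) }
    idle
  end

  # Runs a pass about every `interval` seconds in a background thread at
  # normal priority. Real code would share a Mutex with all cache users.
  def start(interval: 10)
    Thread.new do
      loop do
        sweep_batch
        sleep interval
      end
    end
  end
end

now = Time.now
cache = {}
1200.times do |i|
  idle_for = i.even? ? 1000 : 0 # every second entry has been idle for long
  cache[i] = { value: i, last_access: now - idle_for }
end
cleaner = BatchedCacheCleaner.new(cache)
p cleaner.sweep_batch(now).size # prints 250
p cache.size                    # prints 950
```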
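The change to the @collection entry, fetching the stored state from the database at save time instead of keeping a copy in memory, essentially comes down to a set difference. A minimal sketch, assuming collection elements can be identified by ids; `diff_collection` and `CollectionDiff` are hypothetical names, and the database query that yields `stored_ids` is left out.

```ruby
CollectionDiff = Struct.new(:to_insert, :to_delete, :unchanged)

# stored_ids: ids currently persisted for this collection, fetched from the
# database at save time; current_ids: ids of the in-memory collection.
def diff_collection(stored_ids, current_ids)
  CollectionDiff.new(
    current_ids - stored_ids, # new in memory, not yet in the database
    stored_ids - current_ids, # in the database, but no longer in the collection
    current_ids & stored_ids  # already persisted, nothing to do
  )
end

diff = diff_collection([1, 2, 3], [2, 3, 4])
p diff.to_insert # prints [4]
p diff.to_delete # prints [1]
p diff.unchanged # prints [2, 3]
```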

Aargau reins in doctors over mail-order pharmacies

From NZZ Online.

Doctors who are shareholders of the mail-order pharmacy zur Rose AG may no longer sell medicines through this distribution channel in the canton of Aargau. According to a decision published on Tuesday by the Aargau health department, this mail-order trade constitutes a circumvention of the ban on self-dispensing by doctors. The department thereby upholds a complaint lodged by the Aargau pharmacists.

What is going on in Switzerland? In Germany, mail-order pharmacies have long been a matter of course and contribute significantly to lower drug costs. A distinction must be made between mail-order trade as such and the emergence of mail-order drug businesses owned exclusively by doctors. Mail-order trade in Switzerland must be liberalized further.

The patient fee (Patiententaxe) is another problem. Many mail-order pharmacies waive it. At some pharmacies, the patient fee can double the price of a medicine. In some cases the fee is charged even when the customer does not want an entry in the patient dossier. The Stadelhofen-Apotheke, for example, does not charge a patient fee because it is tired of the customers’ constant questions and loses too much time over them.

America’s health-care market is not as unfettered as it seems

From this week’s Economist. The article concludes as follows:

If America’s health-care regulations are as costly as they claim, the system is merely masquerading as a free-market model and may be no better than others.

Indeed, all the health-care systems of modern democracies are horribly slow and their services overpriced. The US health-care system leads the pack in terms of “bad service” and “high price”.

Bank on your health

Beyond the Blockbuster

From this week’s Economist.

But as Graham Higson, head of regulatory affairs at AstraZeneca, a British drugs firm, notes: “There’s no such thing as no risk. The industry simply cannot continue developing drugs exactly the same way it has for 40 years.”

Well, I agree. After 40 years the industry needs a healthy shift. As in marriage, you constantly have to reinvent yourself. Just “buying” a new spouse won’t get you very far.