Sunday, May 23, 2010

A hack

We had an interesting problem in the lab yesterday. The Snowglobe 2.0 Second Life Viewer allows "Shared Media," which comes down to web pages and the like being displayed on objects in Second Life.

It appears that whenever the user walks their avatar up to an object (called a "prim") that has a web page displayed on it, a process called SLPlugin is forked. As the avatar moves away from the object, the process is terminated at some point. Usually. Most of the time.

But occasionally, some of these processes survive. They seem to be very busy reading from a TCP connection to localhost that has been closed on the other end, getting an EAGAIN error over and over again. Whatever they are doing, they take up memory.

At some point, these SLPlugin processes fill up RAM. On our machines, it takes about 100 of them. The Linux OOM killer then kicks in and kills the main Second Life process. Kaboom.

The solution comes from an option to pkill: -o. It lets you kill the oldest of the processes matching some description. If you want to kill all but the 50 most recent SLPlugin processes, using zsh:

while [[ $(pgrep SLPlugin | wc -l) -gt 50 ]]; do pkill -o SLPlugin; done

Wrap that in another loop to execute it every 10 seconds. Voilà. It seems that in our application, there are never more than 50 non-stale plugin processes, and it all works.
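For the record, here is roughly what the wrapped version could look like. This is a sketch rather than the exact script we ran: the function names trim_oldest and watch_plugins are made up for illustration, and it assumes the procps versions of pgrep and pkill found on Linux. It sticks to plain sh syntax, so it should behave the same in zsh and bash.

```shell
# Kill the oldest processes matching a name ($1) until at most
# a given number ($2) remain. trim_oldest is a hypothetical helper name.
trim_oldest() {
    # pgrep prints one PID per line, so wc -l yields the match count.
    # -gt compares numerically.
    while [ "$(pgrep "$1" | wc -l)" -gt "$2" ]; do
        # -o restricts pkill to the single oldest matching process.
        # pkill exits non-zero when nothing was signalled; stop in that case.
        pkill -o "$1" || break
    done
}

# Trim SLPlugin down to 50 processes, then sleep, forever.
# The interval defaults to 10 seconds; pass another value as $1.
watch_plugins() {
    while true; do
        trim_oldest SLPlugin 50
        sleep "${1:-10}"
    done
}
```

Source the file and run watch_plugins in a spare terminal (or watch_plugins 5 for a shorter interval). Keeping the trimming step in its own function makes it easy to try with a different process name or limit before letting it loose on the real thing.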