Tech Notes

Tech Note #20030001

It appears that gethostbyname may fail when it is called from a thread created through the pthreads library. The failure shows up as a segmentation violation in one of the low-level routines that gethostbyname calls.

Problem Description:

An email filtering program (MailCorral), which runs under the sendmail milter interface, experiences random segmentation violations under heavy load. The milter interface runs the filter code in one or more threads created through the pthreads library, with the number of threads depending on the mail load.

Both the number of active threads (6-8) and the number of hostnames being resolved on the machine are high, because of the volume of mail being processed and the number of names being looked up through DNS.

The segmentation violation occurs in a low-level routine called by gethostbyname, which is in turn called by the filter code. The problem was observed on BSD Unix. The filtering program is written in C and links to the following libraries (ldd output):


    libkvm.so => /shlib/libkvm.so.0.0 (0x4806a000)
    libgcc.so.1 => /shlib/libgcc.so.1 (0x48073000)
    libc.so.2 => /shlib/libc.so.2 (0x48080000)

The segmentation violation occurs at random, in code that normally works, while the filter is resolving the host name "localhost". This lookup would normally be expected to succeed, since it does not involve DNS: the name can be resolved by a simple lookup in /etc/hosts, which is presumed not to be failure-prone.

Problem Discussion:

An Internet search turned up several reports of similar problems. Segmentation violations during name resolution appeared to be typical. One report also mentioned a fix that serializes multithreaded races in gethostbyname.

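Although no report of this exact problem was found, the search results suggest that gethostbyname is a plausible place for this kind of failure. POSIX does not require gethostbyname to be thread-safe, and it returns a pointer to a struct hostent that may live in static storage, which a concurrent call from another thread can overwrite while the first caller is still using it.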

Problem Resolution:

If you have the luxury of upgrading your C libraries to the latest release, you should do so (the routines in question are found in libc.so). The problem may well have been fixed in a later version.

If you cannot upgrade your C libraries, or if the upgrade does not help, you will have to code around the problem. If possible, move the calls to gethostbyname out of the threads, i.e. resolve the names in the main task before the threads are launched. This should also help performance. If the calls cannot be moved, serialize them with a mutex so that only one thread at a time is inside gethostbyname. Hold the lock until the result has been copied out, because the returned structure may be overwritten by the next call.
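A minimal sketch of the first approach, assuming the worker threads only need one address that is known in advance (such as "localhost"): the main thread calls gethostbyname once, copies the address out of the returned struct hostent, and only then starts the workers, which never call the resolver. The names resolve_then_launch, filter_worker, struct worker_ctx and NWORKERS are hypothetical and do not come from MailCorral or the milter library; the worker body is a stand-in for real filter work.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <pthread.h>
#include <string.h>

#define NWORKERS 8   /* illustrative; the note reports 6-8 active threads */

/* Hypothetical per-thread context. host_addr is filled in by the main
 * thread before the worker exists; the worker only reads it. */
struct worker_ctx {
    struct in_addr host_addr;   /* pre-resolved address (read-only here) */
    int saw_loopback;           /* result written back by the worker */
};

/* Stand-in for the real filter work: it uses the pre-resolved address
 * and never calls the resolver itself. */
static void *filter_worker(void *p)
{
    struct worker_ctx *ctx = p;
    ctx->saw_loopback = (ntohl(ctx->host_addr.s_addr) >> 24) == 127;
    return NULL;
}

/* Resolve NAME once in the calling (main) thread, then launch NWORKERS
 * workers that share the result. Returns how many workers saw a
 * 127.x.x.x address, or -1 on a resolution or thread-creation failure. */
int resolve_then_launch(const char *name, struct worker_ctx ctx[NWORKERS])
{
    pthread_t tids[NWORKERS];
    struct hostent *he;
    struct in_addr addr;
    int i, started = 0, count = 0;

    /* Step 1: resolve while no worker threads exist yet, so no other
     * thread can overwrite the hostent that gethostbyname() returns. */
    he = gethostbyname(name);
    if (he == NULL || he->h_addrtype != AF_INET ||
        he->h_length != (int)sizeof addr || he->h_addr_list[0] == NULL)
        return -1;
    memcpy(&addr, he->h_addr_list[0], sizeof addr);

    /* Step 2: give every worker its own copy of the address. */
    for (i = 0; i < NWORKERS; i++) {
        ctx[i].host_addr = addr;
        ctx[i].saw_loopback = 0;
        if (pthread_create(&tids[i], NULL, filter_worker, &ctx[i]) != 0)
            break;
        started++;
    }

    /* Step 3: wait for the workers that did start and tally results. */
    for (i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        count += ctx[i].saw_loopback;
    }
    return started == NWORKERS ? count : -1;
}
```

In a real milter the worker threads are created by the milter library rather than by the filter itself, so the equivalent point is presumably before the filter hands control to the library's main loop (smfi_main in libmilter).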

For MailCorral in particular, most of the calls to gethostbyname have been moved out of the threads. If the problem persists, the mutex solution will be investigated. So far, however, no recurrence has been observed on the affected system, and the problem appears to be resolved.
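If the mutex fallback is needed, it could look like the sketch below. It assumes a single process-wide lock (resolver_lock) that every thread takes before calling gethostbyname and holds until the address has been copied out, since the returned struct hostent may point into storage that the next call overwrites. The names locked_resolve_ipv4, resolve_from_threads, hammer_thread and MAX_THREADS are hypothetical, chosen for illustration, and the wrapper copies only the first IPv4 address.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <pthread.h>
#include <string.h>

#define MAX_THREADS 16   /* arbitrary cap for this sketch */

/* One process-wide lock guarding every gethostbyname() call made
 * through the wrapper below. */
static pthread_mutex_t resolver_lock = PTHREAD_MUTEX_INITIALIZER;

/* Resolve NAME and copy its first IPv4 address into *OUT.
 * Returns 0 on success, -1 if the name could not be resolved. */
int locked_resolve_ipv4(const char *name, struct in_addr *out)
{
    struct hostent *he;
    int rc = -1;

    pthread_mutex_lock(&resolver_lock);
    he = gethostbyname(name);
    if (he != NULL && he->h_addrtype == AF_INET &&
        he->h_length == (int)sizeof *out && he->h_addr_list[0] != NULL) {
        /* Copy before unlocking: after the unlock, another thread's
         * call may overwrite the storage that he points into. */
        memcpy(out, he->h_addr_list[0], sizeof *out);
        rc = 0;
    }
    pthread_mutex_unlock(&resolver_lock);
    return rc;
}

/* Hypothetical per-thread argument for the demonstration below. */
struct hammer_arg {
    const char *name;
    int iterations;
    int successes;      /* written only by the owning thread */
};

/* Repeatedly resolve the same name through the locked wrapper. */
static void *hammer_thread(void *p)
{
    struct hammer_arg *arg = p;
    struct in_addr addr;
    int i;

    for (i = 0; i < arg->iterations; i++)
        if (locked_resolve_ipv4(arg->name, &addr) == 0)
            arg->successes++;
    return NULL;
}

/* Run NTHREADS threads (1..MAX_THREADS), each resolving NAME ITERATIONS
 * times. Returns the total number of successful lookups, or -1 on bad
 * arguments or thread-creation failure. */
int resolve_from_threads(const char *name, int nthreads, int iterations)
{
    pthread_t tids[MAX_THREADS];
    struct hammer_arg args[MAX_THREADS];
    int i, started = 0, total = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;
    for (i = 0; i < nthreads; i++) {
        args[i].name = name;
        args[i].iterations = iterations;
        args[i].successes = 0;
        if (pthread_create(&tids[i], NULL, hammer_thread, &args[i]) != 0)
            break;
        started++;
    }
    for (i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        total += args[i].successes;
    }
    return started == nthreads ? total : -1;
}
```

The lock only protects callers that go through the wrapper. Any other code in the process that calls gethostbyname (or related resolver routines) directly can still race with it, so every call site in the filter would need to be converted.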