Socket accept and too many open files
26 Jun 2009PROBLEM
Socket accept()
fails with error Too many open files.
WHAT DOES THIS MEAN
It means the application is using more file descriptors than the maximum allowable set.
WHAT TO DO
Let’s start by finding the pid of our program using the following command:
ps axf | grep programName
Example:
[root@cow ~]$ ps axf | grep nags
11476 pts/0 S+ 0:01 | _ nags - current state : working
Continue by counting the file descriptors opened by our application ls -l /proc/pid/fd/ | wc -l
Example:
[root@cow ~]$ ls -l /proc/11476/fd/ | wc -l
7
[root@cow ~]$
Let’s now use ulimit -n
to find the maximum limit set for open files. Alternatively you can run ulimit -a
and look for open files:
Example:
[root@cow ~]$ ulimit -n
1024
[root@cow ~]$
There are two possible scenarios here: either your application is actually using that many file descriptors or you have a leakage somewhere produced by not properly closing file descriptors after they did their job. In order to see which of the two we’re dealing with, try to count your active connections or opened files using netstat -ap | grep pid/ | wc -l
Example:
[root@cow ~]$ netstat -ap | grep 11476/ | wc -l
3
[root@cow ~]$
If the application is indeed using as many file descriptors as the currently maximum limit set, the solution is to increase the maximum open files limit by using ulimit -n maxFDs
Example:
[root@cow ~]$ ulimit -n 65536
Please note that you must do this before starting your application.
If the application is not actively using that many file descriptors it means you have a leakage somewhere. Start verifying what happens to your sockets and whether you close them properly when the connection is lost or interrupted. Monitor the number of file descriptors before connecting a test client and after disconnecting it. It should be pretty obvious if there’s something wrong.
Useful debug commands
Here are some useful commands for debugging your application:
List the file descriptors used by a certain application
ls -l /proc/<pid>/fd/
Example:
[root@cow ~]$ ls -l /proc/11476/fd/
total 6
lrwx------ 1 root root 64 May 28 12:29 0 -> /dev/pts/0
lrwx------ 1 root root 64 May 28 12:29 1 -> /dev/pts/0
lrwx------ 1 root root 64 May 28 12:29 2 -> /dev/pts/0
lrwx------ 1 root root 64 May 28 12:29 3 -> socket:[230053]
lrwx------ 1 root root 64 May 28 12:29 4 -> socket:[230054]
lrwx------ 1 root root 64 May 28 12:29 5 -> socket:[230111]
[root@cow ~]$
List the file descriptors used by a certain application
netstat -ape | grep pid/
Example:
[root@cow ~]$ netstat -ape | grep 11476/
tcp 0 0 192.168.3.115:8888 *:* LISTEN root 230053 11476/nags - curren
tcp 0 0 192.168.3.115:50651 192.168.3.110:mysql ESTABLISHED root 230054 11476/nags - curren
tcp 0 0 192.168.3.115:50653 192.168.3.115:9000 ESTABLISHED root 230111 11476/nags - curren
[root@cow ~]$
Count the sockets in TIME_WAIT on a certain port
netstat -an | grep ':port[[:space:]]' | grep TIME_WAIT | wc -l
Example:
[root@cow ~]$ netstat -an | grep ':80[[:space:]]' | grep TIME_WAIT | wc -l
54
[root@cow ~]$
If you happen to have a large number of sockets in TIME_WAIT
, it may be a sign of not properly closing the sockets.