SWAP 0Kb, something is not well

jnarvaez · Sep 18, 2006

Hi, this is my top:

top - 16:23:32 up 8:13, 1 user, load average: 0.27, 0.39, 0.27
Tasks: 180 total, 1 running, 179 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.8% us, 0.3% sy, 0.0% ni, 98.7% id, 0.2% wa, 0.0% hi, 0.0% si
Mem: 2072640k total, 1778132k used, 294508k free, 182888k buffers
Swap: 1052248k total, 0k used, 1052248k free, 759352k cached

As you can see, swap usage is 0. This might seem really good, but sometimes I have only a few KB of physical memory free and still 0 KB of swap used. I don't think this is normal, and I'm experiencing some hangs on my box.

Is swap badly configured?

[]# cat /proc/swaps
Filename Type Size Used Priority
/dev/sda3 partition 1052248 0 -1
[]#

[]# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda2 5162828 3624740 1275828 74% /
/dev/sda1 505604 15934 463566 4% /boot
none 1036320 0 1036320 0% /dev/shm
/dev/sda6 34906088 24255680 8877236 74% /home
/dev/sda5 35278540 21064844 12421648 63% /var

[]# fdisk -l

Disk /dev/sda: 80.0 GB, 80000000000 bytes
255 heads, 63 sectors/track, 9726 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/sda1 * 1 65 522081 83 Linux
/dev/sda2 66 718 5245222+ 83 Linux
/dev/sda3 719 849 1052257+ 82 Linux swap
/dev/sda4 850 9726 71304502+ 5 Extended
/dev/sda5 850 5311 35840983+ 83 Linux
/dev/sda6 5312 9726 35463456 83 Linux

Any help would be appreciated.

wagnerch · Sep 19, 2006

Based on that particular snapshot, memory is not your issue. You have about 1.2GB of RAM available: free memory = free + buffers + cache, and as you can see you have roughly 760MB in cache, 180MB in buffers, and 294MB free. Linux will automatically shrink the cache and buffers as memory demand increases. The best thing to look at is the Total: line of "free -t".

Have you looked in your syslog (/var/log/messages)?
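
To make that arithmetic concrete, here is a minimal Python sketch that adds up the free, buffers, and cached figures from the top output in the first post. The function name `available_kb` is made up for illustration, and treating all of buffers and cache as reclaimable is a simplification (most of it is, but not necessarily every page).

```python
def available_kb(free_kb, buffers_kb, cached_kb):
    """Rough estimate of memory applications can still get:
    free pages plus buffers and page cache, most of which the
    kernel gives back when memory demand rises."""
    return free_kb + buffers_kb + cached_kb

# Values taken from the first top output in this thread (all in kB).
avail = available_kb(free_kb=294508, buffers_kb=182888, cached_kb=759352)
print(f"{avail} kB available, about {avail / 1024 / 1024:.2f} GiB")
```

With those inputs the sum is 1236748 kB, a bit under 1.2 GiB, which is why the box is not short of memory in that snapshot.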

jnarvaez · Sep 20, 2006

free -t

total used free shared buffers cached
Mem: 2072640 1911552 161088 0 297240 364068
-/+ buffers/cache: 1250244 822396
Swap: 1052248 11848 1040400
Total: 3124888 1923400 1201488

In /var/log/messages I don't see anything abnormal.

Thanks for your help.

wagnerch · Sep 20, 2006

If your server is "freezing", then you need to look for an Oops in the syslog. Also, can you describe what you mean by freezing? Does the server recover itself, or does it have to be power cycled? Provide as much detail as you can think of, and provide all of the things you looked at and thought of.

It is possible your server has bad memory, or some other faulty component that may be causing it to freeze or hang. Sometimes it is something as simple & stupid as a CPU fan. How long has the server been in operation?

jnarvaez · Sep 20, 2006

What do you mean by an "Oops"? I searched for "Oops" in my /var/log/messages and found nothing.

My server isn't really freezing: you can establish a connection to any open port (e.g. ssh or pop3), but you don't receive any response.

wagnerch · Sep 20, 2006

An Oops is a kernel error report (it may or may not lead to a panic), but your issue is something different. Have you monitored the server when these issues occur? Have you waited, say, a few minutes to see whether pop3 eventually responds? How do you recover? Is it one source IP or any source IP that has a problem? Does it always happen around the same time every day? Based on what you have sent so far, it doesn't appear to be a memory issue -- but it does appear to behave like a memory issue. Did you provide the top results after a reboot or at the time the issue was occurring?

The more detail you can provide the more it would help out.

jnarvaez · Sep 21, 2006

Thanks for your help, wagnerch. I will try to explain my problem as well as I can.

When the server "freezes", it simply stops responding, with nothing strange in /var/log/messages. pop3 doesn't respond even if you wait an hour; the connection opens but there is no response. To recover I have to reboot the entire system. The problem affects every source IP and every destination IP (I have several IPs on the same server). It usually occurs between 2:00 and 9:00 am (GMT+1), and not every day. The top I provided was taken in "working mode", without any problem. When I have the problem, I can't run top.

Here is a new top, from right now:
top - 10:44:57 up 3 days, 2:35, 2 users, load average: 0.61, 0.66, 0.62
Tasks: 226 total, 1 running, 225 sleeping, 0 stopped, 0 zombie
Cpu(s): 12.7% us, 3.1% sy, 0.0% ni, 77.9% id, 5.6% wa, 0.2% hi, 0.5% si
Mem: 2072640k total, 2007092k used, 65548k free, 352836k buffers
Swap: 1052248k total, 9064k used, 1043184k free, 442076k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
24439 root 18 0 45632 38m 6676 S 16.9 1.9 1:17.13 spamd
5577 apache 15 0 56948 33m 20m S 3.3 1.7 0:22.58 httpd
31196 apache 15 0 48812 27m 17m S 1.0 1.4 0:00.21 httpd
5409 apache 15 0 55568 32m 20m S 0.7 1.6 0:05.80 httpd
19238 apache 16 0 54240 30m 19m S 0.3 1.5 0:02.08 httpd
28330 qmails 16 0 2192 492 1324 S 0.3 0.0 0:01.11 qmail-send
32099 root 16 0 3720 1044 1680 R 0.3 0.1 0:01.33 top
32744 canalxau 15 0 5996 2268 3764 S 0.3 0.1 0:00.14 in.proftpd
1 root 16 0 1696 576 1424 S 0.0 0.0 0:02.47 init
2 root RT 0 0 0 0 S 0.0 0.0 0:01.81 migration/0
3 root 34 19 0 0 0 S 0.0 0.0 0:00.99 ksoftirqd/0
4 root RT 0 0 0 0 S 0.0 0.0 0:01.54 migration/1
5 root 34 19 0 0 0 S 0.0 0.0 0:01.13 ksoftirqd/1
6 root 5 -10 0 0 0 S 0.0 0.0 0:00.00 events/0
7 root 5 -10 0 0 0 S 0.0 0.0 0:00.00 events/1
8 root 7 -10 0 0 0 S 0.0 0.0 0:00.00 khelper
9 root 15 -10 0 0 0 S 0.0 0.0 0:00.00 kacpid

wagnerch · Sep 21, 2006

Have you tweaked any of your mysql settings or apache settings to deal with client load? If so, show us the configuration files.

A top of the system while it is working doesn't really tell us much other than that it is fine. It would be a good idea to log in and stay logged in, and do a few things when it starts going down the tubes:

free -t
ps fuxwa

Another thing: tell us what kind of services you are running. Is it mostly Apache and MySQL? Is there anything unusual running as the "apache" user?

jnarvaez · Sep 21, 2006

my.cnf

[mysqld]
safe-show-database
innodb_data_file_path=ibdata1:10M:autoextend
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock

old_passwords=1

max_allowed_packet = 16M

max_connections = 600
max_connect_errors = 10

wait_timeout=100
connect_timeout = 10
interactive_timeout=100

query_cache_limit = 1M
query_cache_size = 64M
query_cache_type = 1

table_cache = 1024
thread_concurrency=2

key_buffer=150M
join_buffer_size = 1M
record_buffer=1M
read_buffer_size = 1M
sort_buffer_size = 1M
read_rnd_buffer_size = 1M
myisam_sort_buffer_size=64M

#thread_cache_size = 256
thread_cache_size = 128

[mysql.server]
user=mysql
basedir=/var/lib

[safe_mysqld]
err-log=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

[mysqldump]
quick
max_allowed_packet = 16M

[myisamchk]
key_buffer = 64M
sort_buffer = 64M
read_buffer = 16M
write_buffer = 16M

httpd.conf

KeepAlive On
MaxKeepAliveRequests 1000
KeepAliveTimeout 10
<IfModule prefork.c>
StartServers 5
MinSpareServers 32
MaxSpareServers 64
ServerLimit 400
#MaxClients 256
MaxRequestsPerChild 0
</IfModule>

<IfModule worker.c>
StartServers 2
MaxClients 150
MinSpareThreads 25
MaxSpareThreads 75
ThreadsPerChild 25
MaxRequestsPerChild 0
</IfModule>

Normally I'm logged in to the server all day with top running, to watch the load level and which services eat the most resources (usually apache, mysql, qmail and spamassassin).

Attached is a file with a "ps fuxwa" from right now.

ps.txt

wagnerch · Sep 21, 2006

I see a few problems right away...

mysql 1851 0.0 7.8 423116 163140 ? Sl Sep18 0:22 \_ /usr/libexec/mysqld --defaults-file=/etc/my.cnf --basedir=/usr --datadir=/var/lib/mysql --user=mysql --pid-file=/var/run/mysqld/mysqld.pid --skip-locking --socket=/var/lib/mysql/mysql.sock
root 25869 0.0 7.6 419084 158284 ? Sl Sep18 0:25 /var/tomcat5/jdk1.5.0_06/bin/java -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djava.util.logging.config.file=/var/tomcat5/conf/logging.properties -Djava.endorsed.dirs=/var/tomcat5/common/endorsed -classpath :/var/tomcat5/bin/bootstrap.jar:/var/tomcat5/bin/commons-logging-api.jar -Dcatalina.base=/var/tomcat5 -Dcatalina.home=/var/tomcat5 -Djava.io.tmpdir=/var/tomcat5/temp org.apache.catalina.startup.Bootstrap start

1. mysql has a virtual size of about 400MB but only about 160MB resident. That could mean a lot of shared libraries are mmap'd in, or that about 240MB is swapped out.
2. Tomcat has a virtual size of about 400MB, similar to mysql. Not surprising, Tomcat can be a tad bit of a pig. If you don't use it, shut it down.

As for mysql, why on earth did you tweak ALL of those settings? If you actually get 600 connections your server will ultimately die.

http://jcole.us/blog/mysql/proposal-for-constraining-memory-usage/

Based on Jeremy Cole's calculations, the worst-case scenario is that mysql would consume somewhere around 58GB of RAM. Naturally that is unlikely to happen, because not all clients will be sorting and creating temp tables at the same time. MySQL functions best with few clients and fast, fast, fast queries; the goal should be to get them in and out as quickly as possible. One domain can cripple mysql with a badly tuned query.
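
A rough Python sketch of this kind of worst-case estimate follows. It assumes the common rule of thumb "global buffers + per-connection buffers x max_connections"; which settings count as per-connection is an assumption of this sketch, and only values visible in the my.cnf posted above are included. Server defaults that are not in that file (temporary table size, for example) are left out, so the result comes out lower than the figure quoted above, but it is still far beyond the 2GB in this box.

```python
MB = 1024 * 1024

# Settings copied from the my.cnf posted in this thread (sizes in bytes).
global_buffers = {
    "key_buffer": 150 * MB,
    "query_cache_size": 64 * MB,
}
# Treating these as per-connection buffers is an assumption of this sketch.
per_connection_buffers = {
    "sort_buffer_size": 1 * MB,
    "read_buffer_size": 1 * MB,
    "read_rnd_buffer_size": 1 * MB,
    "join_buffer_size": 1 * MB,
    "myisam_sort_buffer_size": 64 * MB,
}
max_connections = 600

def worst_case_bytes(global_bufs, per_conn_bufs, connections):
    """Upper bound: every connection allocates every per-connection buffer."""
    return sum(global_bufs.values()) + connections * sum(per_conn_bufs.values())

total = worst_case_bytes(global_buffers, per_connection_buffers, max_connections)
print(f"worst case: {total / 1024**3:.1f} GiB")
```

Dropping `max_connections` to 100 in the same calculation brings the bound down to roughly 7 GiB, which shows how strongly the connection limit drives the estimate.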

My recommendation is to reduce the number of connections, reel some of those parameters back into reality, turn on slow query logging, and start knocking the slow queries out. Trust me, I have seen this scenario quite a few times. If you can't tune the queries, then you may want to identify those bad tables/domains and try moving them to InnoDB tables. InnoDB tables do not have the same concurrency issues that MyISAM tables do. MyISAM tables use read locks to achieve read consistency, whereas InnoDB uses a transaction log to roll back changes so you get a read-consistent view. Other databases, such as Oracle and PostgreSQL, also use a transaction log to roll back changes. Oracle calls it undo or rollback segments, depending on the version of Oracle you are using.

jnarvaez · Sep 22, 2006

Hi again wagnerch,

I really need tomcat; I have some domains using JSP.

About mysql: as you can see, I'm not an expert. I tweaked these settings trying to get better results. I need 600 connections; I had 500 and sometimes I got a "Too many connections" message.

I have turned on the slow query log in mysql to try to identify slow queries. I suppose I have to wait at least a day to see something in the logs.

Thanks one more time.

wagnerch · Sep 22, 2006

I doubt you need 600 connections. You need to look at *why* you have 600 concurrent connections and figure out how to reduce that number.

Otherwise plan on increasing physical RAM from 2GB to 8GB and hope for the best.

jnarvaez · Sep 22, 2006

I agree with you, I should not need 600 connections (I have 300 domains but only about 5 with medium traffic). I hope logging slow queries will help.

So, do you think most of my problems are related to mysql?

I don't want to spend money upgrading RAM; I think it is better to spend time optimizing mysql.

wagnerch · Sep 22, 2006

I would say yes, but as with anything, there can be many factors at play. You really need to monitor the memory usage of mysql and see what happens there as the system degrades.

vmstat, iostat, mytop (mysql top), and mysqlreport are also good tools.

jnarvaez · Sep 28, 2006

Sorry for the delay, I have been very busy these days. This is what my slow query log shows today:

[root@lincl89 log]# cat slow-queries.log
/usr/libexec/mysqld, Version: 4.1.15-log. started with:
Tcp port: 3306 Unix socket: /var/lib/mysql/mysql.sock
Time Id Command Argument
# Time: 060923 3:55:14
# User@Host: admin[admin] @ localhost []
# Query_time: 18 Lock_time: 0 Rows_sent: 5174 Rows_examined: 5174
use himilce;
SELECT /*!40001 SQL_NO_CACHE */ * FROM `CMS_BACKUP_CONTENTS`;
# Time: 060923 3:55:28
# User@Host: admin[admin] @ localhost []
# Query_time: 12 Lock_time: 0 Rows_sent: 3094 Rows_examined: 3094
SELECT /*!40001 SQL_NO_CACHE */ * FROM `CMS_OFFLINE_CONTENTS`;
/usr/libexec/mysqld, Version: 4.1.15-log. started with:
Tcp port: 3306 Unix socket: /var/lib/mysql/mysql.sock
Time Id Command Argument
/usr/libexec/mysqld, Version: 4.1.15-log. started with:
Tcp port: 3306 Unix socket: /var/lib/mysql/mysql.sock
Time Id Command Argument

What do you think?
I'm going to monitor other things. I have to say I haven't had any more problems since last week.
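
Once a slow query log grows beyond a handful of entries, it helps to summarize it. Below is a minimal Python sketch that pulls the Query_time lines and the statement that follows each one out of a log in the format shown above, slowest first. The function name `parse_slow_log` is hypothetical, the sketch assumes each statement fits on one line, and the sample text is the two entries from this post.

```python
import re

QUERY_TIME = re.compile(
    r"#\s*Query_time:\s*(\d+)\s+Lock_time:\s*(\d+)"
    r"\s+Rows_sent:\s*(\d+)\s+Rows_examined:\s*(\d+)"
)

def parse_slow_log(text):
    """Return one dict per slow-log entry, sorted slowest first."""
    entries = []
    pending = None  # entry whose SQL statement has not been seen yet
    for raw in text.splitlines():
        line = raw.strip()
        match = QUERY_TIME.match(line)
        if match:
            secs, lock, sent, examined = map(int, match.groups())
            pending = {"query_time": secs, "lock_time": lock,
                       "rows_sent": sent, "rows_examined": examined,
                       "sql": None}
            continue
        if pending is None or not line or line.startswith("#"):
            continue  # server banner, blank line, or other comment
        if line.lower().startswith("use "):
            continue  # database switch, not the slow statement itself
        pending["sql"] = line
        entries.append(pending)
        pending = None
    return sorted(entries, key=lambda e: e["query_time"], reverse=True)

sample = """\
# Time: 060923 3:55:14
# User@Host: admin[admin] @ localhost []
# Query_time: 18 Lock_time: 0 Rows_sent: 5174 Rows_examined: 5174
use himilce;
SELECT /*!40001 SQL_NO_CACHE */ * FROM `CMS_BACKUP_CONTENTS`;
# Time: 060923 3:55:28
# User@Host: admin[admin] @ localhost []
# Query_time: 12 Lock_time: 0 Rows_sent: 3094 Rows_examined: 3094
SELECT /*!40001 SQL_NO_CACHE */ * FROM `CMS_OFFLINE_CONTENTS`;
"""

for entry in parse_slow_log(sample):
    print(entry["query_time"], "s:", entry["sql"])
```

For a real log, the text could be read from the file shown in the post (slow-queries.log) instead of the inline sample.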

jnarvaez · Sep 28, 2006

This is the result of mytop:

MySQL on localhost (4.1.15-log) up 6+05:21:35 [17:43:46]
Queries: 9.2M qps: 18 Slow: 10.0 Se/In/Up/De(%): 53/01/04/01
qps now: 12 Slow qps: 0.0 Threads: 82 ( 1/ 62) 53/03/09/00
Cache Hits: 3.8M Hits/s: 7.5 Hits now: 3.0 Ratio: 77.4% Ratio now: 48.4%
Key Efficiency: 99.9% Bps in/out: 1.1k/ 1.8k Now in/out: 1024.1/ 3.9k

Id User Host/IP DB Time Cmd Query or State
-- ---- ------- -- ---- --- ----------
517641 admin localhost loading_tv 0 Query show full process
517688 caforner localhost eckermann- 14 Sleep
517684 hlpromoci localhost hlpromocio 17 Sleep
517619 homelife localhost homelifesp 37 Sleep
517516 homelife localhost homelifesp 38 Sleep
517620 homelife localhost homelifesp 48 Sleep
517618 hlpromoci localhost hlpromocio 49 Sleep
517603 homelife localhost homelifesp 54 Sleep
517571 homelife localhost homelifesp 70 Sleep
516646 homelife localhost homelifesp 71 Sleep
516290 homelife localhost homelifesp 103 Sleep
517518 homelife localhost homelifesp 103 Sleep
517485 homelife localhost homelifesp 122 Sleep
517473 hlpromoci localhost hlpromocio 127 Sleep
514229 homelife localhost homelifesp 189 Sleep

this is from vmstat:
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 152188 29216 163892 731912 0 1 15 24 14 18 6 2 89 3

this is from iostat:
Linux 2.6.9-1.667smp 28/09/06

avg-cpu: %user %nice %system %iowait %steal %idle
6,18 0,00 2,16 2,84 0,00 88,81

Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 31,27 252,43 764,49 226809670 686892510

wagnerch · Sep 28, 2006

You should use those tools when you're having a problem. Get familiar with them by reading their man pages.

Typically I use "vmstat 1 9999" and watch it while I have a problem. It is a good idea to get some baseline measurements: what are the typical ranges for these values during low-usage and high-usage periods? Once you see something out of the ordinary, you need to use the other tools to investigate what is happening.
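
As an illustration of comparing a sample against a baseline, here is a small Python sketch that turns one vmstat row into named values and flags swap activity or high I/O wait. The function names and both thresholds are placeholders invented for this example; real limits should come from your own baseline measurements. The header and row are the vmstat output posted earlier in this thread.

```python
def parse_vmstat(header, row):
    """Pair vmstat column names with the integer values of one sample row."""
    names = header.split()
    values = [int(v) for v in row.split()]
    if len(names) != len(values):
        raise ValueError("header and row have different column counts")
    return dict(zip(names, values))

def check_sample(sample, max_swap_rate=10, max_iowait=20):
    """Flag values outside a baseline; both thresholds are placeholders."""
    found = []
    if sample["si"] > max_swap_rate or sample["so"] > max_swap_rate:
        found.append("swapping")
    if sample["wa"] > max_iowait:
        found.append("high iowait")
    return found

header = "r b swpd free buff cache si so bi bo in cs us sy id wa"
row = "0 0 152188 29216 163892 731912 0 1 15 24 14 18 6 2 89 3"
sample = parse_vmstat(header, row)
print(sample["swpd"], check_sample(sample))
```

Keep in mind that the first row vmstat prints is an average since boot, so the rows that follow are the ones worth checking during an incident.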

jnarvaez · Sep 29, 2006

OK, I will try these tools. Thanks again.
