RE: [Nolug] from Wimprine, Thomas on 2003-12-18 (nolugarchives)

From: Wimprine, Thomas <twimprine_at_stei.com>
Date: Thu, 18 Dec 2003 16:46:25 -0600
Message-ID: <30397D20E848D2119BA70008C724E28D118BFDE4@lajeffeex01.stei.com>

This box was built to do our email/spam/virus filtering; it's also running
BIND. It's not always stuck like this, but when it does get hit it stays
that way all day.
It's on server hardware: dual 1.0 GHz CPUs, 640 MB RAM, with hardware RAID 1
(RAID 0 just doesn't work for me here). This is a case of a test going so
well that there was no time to plan until I realized it was running like
crap, and now it's critical. Now we have to do something.

Dustin, I'll order the book tomorrow after I get paid. My Christmas present
to me. ;)

Thanks everyone,
Thomas

-----Original Message-----
From: scotth@scottharney.com [mailto:scotth@scottharney.com]
Sent: Thursday, December 18, 2003 1:21 PM
To: nolug@joeykelly.net
Subject: Re: [Nolug]

"Wimprine, Thomas" <twimprine@stei.com> writes:

It looks that way. What do your ears tell you? I.e., do you hear the
telltale sound of disk "thrashing"? Then the question is, what's
causing it. How much RAM is in this box?

So then, is your application being more heavily utilized than you
expected, or perhaps there is a memory leak in the app? Perhaps there
are some application-level optimizations you can deploy to limit the
swapping. Generally speaking, try to identify the problem app and
fix it first before moving outward to operating system and/or
hardware fixes (though you should have a base level of OS performance
expected, i.e. DMA on IDE drives and things like that).

> Memory problem?
>
>    procs                       memory        swap          io      system         cpu
>  r  b  w    swpd   free   buff  cache    si    so    bi    bo    in    cs  us  sy  id
>  0 15  1  735528  24320  11512  50556     3     2     1     2     3     2   1   2   1
>  0 15  4  728744   6704  12112  46152  1282   990  1618  1071   683   870  15   5  80
>  0 14  2  756392   6728  11916  45508   674  3304   816  3345   791   431   6   3  91
>  0 18  2  774876   6640  11864  46400   966  2754  1260  2809   798   602  12   4  84
>  3 14  2  783808   6644  11856  49724  1184  1784  1600  2242  1052   714  15   5  80
>  1 17  2  806468   6636  11332  45736  1666  3054  2065  3107   688   601  12   3  84
>  0 16  2  827192   6648  11104  49448  1678  3085  2364  3153   715   818  18   4  78
>  0 14  2  832764   6648  11284  50132   879  2012  1334  2068   761   759  11   3  86
>
> -----Original Message-----
> From: -ray [mailto:ray@ops.selu.edu]
> Sent: Thursday, December 18, 2003 12:30 PM
> To: 'Nolug'
> Subject: Re: [Nolug]
>
> On Thu, 18 Dec 2003, Wimprine, Thomas wrote:
>
>> I've looked in the top man page and didn't see anything real quick. What
>> scale does the "load average" use? At what point is the system running at
>> 100%?
>
> The 3 load average numbers are the average number of processes in the run
> queue for the past 1, 5, and 15 minutes. There really is no "scale", it
> just depends on how powerful your machine is. A large machine might run
> fine with a consistent load avg over 10, whereas a smaller machine would
> start choking around 3.
>
> Running at 100% of what? The main things you want to check are cpu,
> memory, disk i/o, and network i/o. Vmstat is a quick and powerful tool to
> check cpu and memory. Run 'vmstat 10', wait 10 seconds, and look at your
> cpu columns (us=user, sy=system, id=idle) to see how much idle cpu time
> you have. If idle cpu time is always low (under 20%), then you have a cpu
> bottleneck. Then look at the swap columns (si=swapin, so=swapout). If
> you're constantly swapping, then you might have a memory problem. You can
> also run 'iostat -x 10' to check which partition/disk is busiest, to
> pinpoint any disk/controller i/o bottlenecks.
>
> ray
>
> ___________________
> Nolug mailing list
> nolug@nolug.org
>

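The vmstat checks ray describes in the quoted message can be sketched in
Python. This is only a rough illustration and not something anyone in the
thread ran: the names parse_vmstat, diagnose and IDLE_FLOOR are made up, the
16-column layout is copied from the output pasted above (other vmstat
versions print different columns), and reading "constantly swapping" as
"si and so both nonzero in every sample" is an assumption.

```python
# Column order of the vmstat layout pasted in this thread (assumed fixed).
COLUMNS = ["r", "b", "w", "swpd", "free", "buff", "cache",
           "si", "so", "bi", "bo", "in", "cs", "us", "sy", "id"]

IDLE_FLOOR = 20  # percent; the "under 20%" rule of thumb from the thread


def parse_vmstat(text):
    """Return one dict per numeric sample line, skipping header lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == len(COLUMNS) and all(f.isdigit() for f in fields):
            rows.append(dict(zip(COLUMNS, map(int, fields))))
    return rows


def diagnose(rows):
    """Apply the two rules of thumb to every sample after the first."""
    samples = rows[1:]  # the first vmstat line is an average since boot
    findings = []
    if samples and all(r["id"] < IDLE_FLOOR for r in samples):
        findings.append("cpu")
    # Assumption: "constantly swapping" means si and so are both
    # nonzero in every interval.
    if samples and all(r["si"] > 0 and r["so"] > 0 for r in samples):
        findings.append("memory")
    return findings


# First three samples from the output quoted above.
SAMPLE = """\
 r  b  w    swpd   free   buff  cache    si    so    bi    bo    in    cs  us  sy  id
 0 15  1  735528  24320  11512  50556     3     2     1     2     3     2   1   2   1
 0 15  4  728744   6704  12112  46152  1282   990  1618  1071   683   870  15   5  80
 0 14  2  756392   6728  11916  45508   674  3304   816  3345   791   431   6   3  91
"""

if __name__ == "__main__":
    print(diagnose(parse_vmstat(SAMPLE)))
```

On those samples this prints ['memory']: idle CPU stays at 80% or higher,
but swap-in and swap-out are nonzero in every interval, which fits the
thrashing described above.
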
-- 
Scott Harney<scotth@scottharney.com>
"...and one script to rule them all."
gpg key fingerprint=7125 0BD3 8EC4 08D7 321D CEE9 F024 7DA6 0BC7 94E5
___________________
Nolug mailing list
nolug@nolug.org

Received on 12/18/03