

Server health report

~]# cat health-hostlinuxmx-221115-1031.txt | more

Last Reboot Time : `who -b | awk '
GB\t$(($FREESWAP * 100 / $TOTALSWAP ))%
FILENAME="health-`hostname`-`date +%y%m%d`-`date +%H%M`.txt"
echo -e "Reported file $FILENAME generated in current directory." $RESULT
echo "The program 'mail' is currently not installed."
cat $FILENAME | mail -s "$FILENAME" $EMAIL

#who command is used to get last reboot time, awk for processing output
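A minimal sketch of how the file-naming and mail fragments might fit together. The function names `report_filename`, `write_report`, and `deliver_report` are hypothetical, the report body is only a placeholder, and the empty `EMAIL` default is an assumption meaning "just save the file"; the original script may be organized differently.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: save a report under a timestamped name, mail it only if possible.

EMAIL=""   # assumption: an empty value means the report is only saved locally

# Build a name such as health-<hostname>-<yymmdd>-<HHMM>.txt
report_filename() {
    local host
    host="$(hostname 2>/dev/null || uname -n)"
    printf 'health-%s-%s-%s.txt\n' "$host" "$(date +%y%m%d)" "$(date +%H%M)"
}

# Placeholder body; the real report prints many more fields.
write_report() {
    echo "Hostname       : $(hostname 2>/dev/null || uname -n)"
    echo "Kernel Version : $(uname -r)"
}

# Mail the report when an address is set and the mail program exists.
deliver_report() {
    local file email
    file="$1"
    email="$2"
    if [ -z "$email" ]; then
        return 0    # no address: the file is simply kept
    fi
    if ! command -v mail >/dev/null 2>&1; then
        echo "The program 'mail' is currently not installed."
        return 1
    fi
    mail -s "$file" "$email" < "$file"
}

FILENAME="$(report_filename)"
write_report > "$FILENAME"
echo "Reported file $FILENAME generated in current directory."
deliver_report "$FILENAME" "$EMAIL"
```

Checking for `mail` before calling it keeps the script usable on hosts with no mail client installed: the report is still written to disk either way.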

#uptime command is used to get the uptime; with the sed command we can process the output to keep only the uptime.
#uname command with key -r returns the kernel version
Health Check Report (CPU,Process,Disk Usage, Memory)
#Print header, hostname (hostname command), kernel version (uname -r), uptime (from the uptime command) and last reboot time (from the who command)
#We will create a function to easily manage what to do with the output. If no email is provided, the log file will just be saved.
#Here we put the email address to send the report to.

Let's look in detail at this script, which helps to monitor a Linux server.

Check CPU usage; if it is higher than 80% (as an example), do something.

Modern CPUs have multiple cores, and often each core supports multiple threads. Moreover, cores often have variable clock speeds. It is therefore not simple to define how to compute a CPU load. Not to mention that a single vCPU can only be either 100% idle or 100% busy at any given time. There is no such thing as an instantaneously 80% busy CPU.

What you can do is get the average load for each vCPU during a period of time (mpstat provides that) or the average for all vCPUs combined (vmstat). Even if it is fully CPU bound, a hostile CPU consumer that is single-threaded might not blatantly show up in the latter case, because other vCPUs might be idle. If it is multi-threaded and CPU bound, it will be detected by both commands, but you have to make sure it is not a legitimate application or daemon that is loading your machine.

Another, more useful metric is derived from CPU contention, i.e. measuring how many threads are using and competing for the vCPU resources. This is what the load average is designed to show. Unfortunately, on Linux, the load average counts a thread in uninterruptible state as CPU load while in fact the CPU is idle and free to do other tasks, so you should pay attention to that factor and identify potential cases where the load average is high but the actual contention is low.

Finally, there might be situations where the run queue is very high but only for a very limited period of time. If the load average calculation, which uses sampling to get the run queue value, happens to pick the number at this very peak moment, the load average value will be strongly biased for several minutes or dozens of minutes.

Note: To output all results correctly, make sure all the above commands are working.
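A rough sketch of the 80% check under the interval-averaging view this section describes. Instead of an instantaneous reading, it samples the aggregate counters in `/proc/stat` twice and averages over the interval (roughly the "all vCPUs combined" figure), then prints the 1-minute load average per vCPU. `THRESHOLD`, `INTERVAL`, and the function names are illustrative assumptions rather than part of the original script, and the code assumes a Linux `/proc` filesystem.

```shell
#!/usr/bin/env bash
# Sketch: average CPU usage over an interval, plus load average per vCPU (Linux only).

THRESHOLD=80   # percent; the example value from the text
INTERVAL=1     # seconds between the two samples (assumption)

# Print "<busy> <total>" jiffies from the aggregate "cpu" line of /proc/stat.
read_cpu_counters() {
    local label user nice system idle iowait irq softirq steal rest total
    read -r label user nice system idle iowait irq softirq steal rest < /proc/stat
    total=$((user + nice + system + idle + iowait + irq + softirq + steal))
    echo "$((total - idle - iowait)) $total"
}

# Average busy percentage of all vCPUs combined over INTERVAL seconds.
cpu_busy_percent() {
    local busy1 total1 busy2 total2 dtotal pct
    set -- $(read_cpu_counters); busy1=$1; total1=$2
    sleep "$INTERVAL"
    set -- $(read_cpu_counters); busy2=$1; total2=$2
    dtotal=$((total2 - total1))
    if [ "$dtotal" -le 0 ]; then
        echo 0
        return
    fi
    pct=$((100 * (busy2 - busy1) / dtotal))
    if [ "$pct" -lt 0 ]; then pct=0; fi   # counters such as iowait can step backwards
    echo "$pct"
}

# 1-minute load average divided by the number of online vCPUs.
load_per_cpu() {
    local load1 rest cpus
    read -r load1 rest < /proc/loadavg
    cpus="$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)"
    awk -v l="$load1" -v n="$cpus" 'BEGIN { printf "%.2f\n", l / n }'
}

USAGE="$(cpu_busy_percent)"
LOAD="$(load_per_cpu)"
echo "CPU usage over ${INTERVAL}s: ${USAGE}% (1-minute load per vCPU: ${LOAD})"
if [ "$USAGE" -gt "$THRESHOLD" ]; then
    echo "CPU usage is above ${THRESHOLD}%, do something here."
fi
```

A longer `INTERVAL` smooths out short spikes. The load-per-vCPU figure should be read with the caveat above: on Linux it also counts threads in uninterruptible state.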
