I recently started testing Endian Firewall.
But a week ago the System & Traffic Graphs started to fail, and now I can't see any of those graphs in the web interface (the Proxy graphs keep working).
After some searching and googling I found out that the problem was in /usr/local/bin/rrdfix.sh. According to top, it takes several minutes and more than 10% CPU time to finish.
That script only deletes the RRD files in a couple of directories whose last timestamp lies in the future, so it shouldn't take long:
#!/bin/sh
DIRS="/var/log/rrd/ /var/ntop/"
rm /home/httpd/html/graphs/*.png &>/dev/null
find $DIRS -name "*.rrd" | \
while read F; do
NOW=$(date "+%s")
LAST=$(rrdtool last $F)
if [ "$LAST" -gt "$NOW" ]; then
echo "rrd file $F contains timestamps in future. Remove the file!"
rm -f $F
fi
done
The problem is that if you enable certain options in Traffic Monitoring, you end up with thousands of rrd files in /var/ntop/rrd/interfaces/br0/hosts/. The script runs date and rrdtool once for every one of them, and that many files cost a lot of CPU time, a lot of disk activity and a lot of time. That makes the graph refresh time out. So the main point is to limit the time rrdfix.sh takes.
My quick and dirty solution was to remove /var/ntop/ from DIRS in the script above. The System & Traffic graphs live in /var/log/rrd/, so I assume Traffic Monitoring handles the RRDs in /var/ntop/ itself, and therefore this script doesn't need to delete the traffic monitoring RRDs. Am I wrong about that?
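To check whether you are hitting the same problem, it may help to count how many rrd files the script has to walk through in each directory. This is only a rough sketch: count_rrd is a made-up helper name, and the two directories are the ones from DIRS in the script above.

```shell
#!/bin/sh
# Print the number of .rrd files below a directory (0 if it does not exist).
count_rrd() {
    if [ -d "$1" ]; then
        find "$1" -type f -name "*.rrd" | wc -l | tr -d ' '
    else
        echo 0
    fi
}

# Directories taken from DIRS in rrdfix.sh.
for dir in /var/log/rrd/ /var/ntop/; do
    echo "$dir: $(count_rrd "$dir") rrd files"
done
```

If the count for /var/ntop/ is in the thousands while /var/log/rrd/ holds only a handful, the per-file rrdtool calls are the likely cause of the slowdown.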
#!/bin/sh
DIRS="/var/log/rrd/"
# /var/ntop/ removed from DIRS. Is it really needed here?
rm /home/httpd/html/graphs/*.png &>/dev/null
find $DIRS -name "*.rrd" | \
while read F; do
NOW=$(date "+%s")
LAST=$(rrdtool last $F)
if [ "$LAST" -gt "$NOW" ]; then
echo "rrd file $F contains timestamps in future. Remove the file!"
rm -f $F
fi
done
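For anyone who wants to tidy up the loop as well, here is a possible variant. It is an untested sketch, not the script Endian ships: fix_rrds and RRD_DIRS are names made up for this example. It reads the clock once instead of once per file, quotes the file names, and skips any file for which rrdtool last fails or prints something that is not a number. The line that removes the PNG files is left out here and would stay as in the script above.

```shell
#!/bin/sh
# Sketch of a tidier check; RRD_DIRS and fix_rrds are illustrative names.
RRD_DIRS="${RRD_DIRS:-/var/log/rrd/}"

fix_rrds() {
    now=$(date "+%s")    # read the clock once, not once per file
    # RRD_DIRS is deliberately unquoted so it can list several directories.
    find $RRD_DIRS -type f -name "*.rrd" 2>/dev/null |
    while IFS= read -r f; do
        # Skip files that rrdtool cannot read.
        last=$(rrdtool last "$f" 2>/dev/null) || continue
        # Skip anything that is not a plain number.
        case "$last" in
            ''|*[!0-9]*) continue ;;
        esac
        if [ "$last" -gt "$now" ]; then
            echo "rrd file $f contains timestamps in the future. Removing it."
            rm -f "$f"
        fi
    done
}

fix_rrds
```

This still starts one rrdtool process per file, so it does not make scanning /var/ntop/ cheap. It only removes the repeated date calls and makes the loop safer with unusual file names.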
It's also important to limit the RRDs that Traffic Monitoring creates to the hosts you need (https://EFW_IP:3001/editPrefs.html, rrd.hostsFilter option), to reduce the number of rrd files in the hosts/ folder.
Now the script takes less than 1 second to finish, CPU usage has dropped and the graphs are displayed again.
Check the difference:
CPU time drops from 33.36% on Tuesday to 0.39% on Friday. The white bands are the periods when the graphs were unavailable because rrdfix.sh was timing out.