Now that Chavez can accurately report its temperature (see the previous blog post), I wanted to find out what was going on whenever the kernel reported a temperature spike. Every few weeks, the kernel had been logging messages along these lines:
Nov 17 08:02:39 chavez kernel: CPU0: Temperature above threshold
Nov 17 08:02:39 chavez kernel: CPU0: Running in modulated clock mode
So, I installed Swatch, a tool that monitors log files and takes action depending on what it finds.
I created the following .swatchrc file and put it in the home directory of a user in the adm group (which allows them to read the log files):
watchfor /chavez kernel: CPU/
    threshold on
    threshold chavez,type=limit,count=1,seconds=60
    # $_ gets replaced with the syslog line that matches
    exec "echo $_ >> /home/mayfirst/swatch.log"
    exec "top -b -n 1 | head -20 >> /home/mayfirst/swatch.log"
    exec "sensors | grep temp >> /home/mayfirst/swatch.log"
Then, I ran it with:
swatch --tail-file /var/log/syslog --daemon
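The threshold line in the .swatchrc above is what keeps the two kernel messages (which arrive in the same second) from triggering the exec actions twice. The following is a minimal Python sketch of that idea, assuming type=limit,count=1,seconds=60 means "act on at most one matching line per 60-second window". It is not Swatch's own code, and the class name RateLimitedMatcher and its clock parameter are made up for illustration.

```python
import re
import time

# Same pattern as the watchfor line in the .swatchrc.
PATTERN = re.compile(r"chavez kernel: CPU")


class RateLimitedMatcher:
    """Hypothetical helper: match log lines, but report a hit at most
    `count` times per `seconds`-long window (assumed meaning of type=limit)."""

    def __init__(self, pattern, count=1, seconds=60, clock=time.monotonic):
        self.pattern = pattern
        self.count = count
        self.seconds = seconds
        self.clock = clock          # injectable so the logic can be tested
        self.window_start = None    # when the current window began
        self.fired = 0              # hits reported in the current window

    def should_fire(self, line):
        # Lines that do not match the pattern never trigger anything.
        if not self.pattern.search(line):
            return False
        now = self.clock()
        # Start a fresh window on the first match or once the old one expires.
        if self.window_start is None or now - self.window_start >= self.seconds:
            self.window_start = now
            self.fired = 0
        # Report the hit only while the window's budget is not used up.
        if self.fired < self.count:
            self.fired += 1
            return True
        return False
```

With count=1, the "Temperature above threshold" line triggers the actions and the "Running in modulated clock mode" line that follows in the same second is ignored, so swatch.log gets one snapshot of top and sensors per spike rather than two.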
I love Swatch, with one major exception: I can't get it to restart properly when the log file has been rotated out from under it (the script needs a deadly kill -9 before it will exit).
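For context on the rotation problem: when a log is rotated, the old file is usually renamed and a new file is created under the original name, so a process that still holds the old file open stops seeing new lines. The sketch below shows one common way a log follower can notice this, by comparing the inode of the open file with the inode the path currently points to. It does not fix Swatch and is not how Swatch works internally; the function name was_rotated is hypothetical.

```python
import os


def was_rotated(path, open_file):
    """Return True if `path` no longer refers to the file we have open.

    `open_file` is a file object that was opened on `path` earlier.
    """
    try:
        current = os.stat(path)
    except FileNotFoundError:
        # The old file was moved away and no new one exists yet.
        return True
    opened = os.fstat(open_file.fileno())
    # A different inode (or device) means the name now points at a new file.
    return (current.st_ino, current.st_dev) != (opened.st_ino, opened.st_dev)
```

A follower built on this would call was_rotated whenever it hits end-of-file and, if it returns True, close the old handle and reopen the path. GNU tail's -F option follows a file by name in a similar spirit.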
Fortunately, it only took two days to get a temperature spike. The spike happened while upgrading Drupal from 4.6 to 4.7. The upgrade process drops and adds fields in tables that hold a massive number of records (or at least it did on the Drupal site I was upgrading). So the intense MySQL activity seems to have caused the spike, although I am still not sure whether hard drive activity or processor activity was responsible.
The normal temperature reported by sensors is about 40 to 50 C. The kernel alarm went off when the temperature reached 70 C. Running in modulated clock mode seemed to be very effective, as the temperature never got above 72 C.