Revision of 2021/02/11 14:43 by phil (edited section: My attempt at fixing the nvidia ganglia plugin). In the current revision the page has been removed (external edit, unknown date).
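====== Ganglia ======

====== CLUSTER SUMMARY VIEW FIX ======
Without this fix the cluster summary view does not appear and the main page is blank.

On Ubuntu 18.04 and 20.04, edit ''/usr/share/ganglia-webfrontend/cluster_view.php'' and replace ''%%""%%'' with ''%%array()%%'' where ''$context_metrics'' is initialised (line 26); the summary view then appears. The cause appears to be that PHP 7.1 and later raise a fatal error when ''[]'' is applied to a string instead of silently converting an empty string to an array.
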
  * [[https://bugs.launchpad.net/ubuntu/+source/ganglia-web/+bug/1822048|Launchpad bug report]]

<code>
/usr/share/ganglia-webfrontend# diff cluster_view.php cluster_view.php.orig
26c26
<   $context_metrics = array();
---
>   $context_metrics = "";
</code>
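The same edit can be scripted, e.g. if it has to be reapplied after a package upgrade. This is a minimal sketch of my own, not something shipped with ganglia-web: it assumes the broken line still reads exactly ''%%$context_metrics = "";%%'', it keeps an untouched ''.orig'' copy like the diff above, and the names ''patch_cluster_view'' and ''apply_fix'' are hypothetical.

<code python>
#!/usr/bin/env python3
"""Reapply the cluster summary view fix to ganglia-web's cluster_view.php.

Sketch only: assumes the broken line is the empty-string initialisation
of $context_metrics shown in the diff.
"""
import shutil
import sys

BROKEN = '$context_metrics = "";'
FIXED = "$context_metrics = array();"


def patch_cluster_view(source):
    """Return (new_source, changed), with the string initialiser replaced by array()."""
    if BROKEN not in source:
        # Already fixed, or the file no longer looks the way this sketch expects.
        return source, False
    return source.replace(BROKEN, FIXED), True


def apply_fix(path):
    """Patch `path` in place, keeping an untouched copy as `path`.orig."""
    with open(path, encoding="utf-8") as f:
        source = f.read()
    new_source, changed = patch_cluster_view(source)
    if not changed:
        print("nothing to do:", path)
        return False
    shutil.copy2(path, path + ".orig")
    with open(path, "w", encoding="utf-8") as f:
        f.write(new_source)
    print("patched:", path)
    return True


if __name__ == "__main__":
    # Usage: fix_cluster_view.py /usr/share/ganglia-webfrontend/cluster_view.php
    for target in sys.argv[1:]:
        apply_fix(target)
</code>

Run it as root with the path as its argument. Because the check looks for the broken line first, running it twice is harmless: the second run reports "nothing to do" and leaves the ''.orig'' copy alone.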

====== GPU Monitoring ======
  * [[https://github.com/ganglia/gmond_python_modules/tree/master/gpu/nvidia|NVIDIA gmond module]]
  * [[https://developer.nvidia.com/ganglia-monitoring-system|NVIDIA's main Ganglia page]]

The resources above are old and no longer work, and the module's patch for the web frontend does not apply cleanly either.

===== My attempt at fixing the nvidia ganglia plugin =====
Basically, I ran ''2to3'' over the module and the NVML bindings and made small edits so they run properly under Python 3. Whether that is enough for the plugin to actually work is unknown.

  * https://github.com/papamoose/gmond-python-module-gpu-nvidia
  * https://github.com/papamoose/nvidia-ml-py

The patch for ''host_view.php'' on the web frontend no longer applies cleanly, so I think the work above was unnecessary.
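For context, gmond's Python module interface is small: a module defines ''metric_init(params)'', which returns a list of metric descriptor dictionaries, and ''metric_cleanup()''; gmond then calls each descriptor's ''call_back'' with the metric name whenever it collects that metric. Below is a minimal sketch of such a module that reports only per-GPU temperature through pynvml. It is not the linked nvidia module; the metric names (''gpu0_temp'' and so on), the ''gpu'' group, the ''time_max'' value and the helper names are my own assumptions, and gmond still needs a matching ''.pyconf'' to load it.

<code python>
"""Hypothetical minimal gmond Python module: per-GPU temperature via pynvml.

gmond imports this file, calls metric_init() once, then calls each
descriptor's call_back with the metric name on every collection cycle.
"""

NVML = None  # the pynvml module, bound in metric_init()


def gpu_temp_handler(name):
    """Return the temperature in degrees C for a metric named 'gpu<N>_temp'."""
    index = int(name[len("gpu"):name.index("_")])
    handle = NVML.nvmlDeviceGetHandleByIndex(index)
    return int(NVML.nvmlDeviceGetTemperature(handle, NVML.NVML_TEMPERATURE_GPU))


def build_descriptors(gpu_count):
    """Build one gmond metric descriptor per GPU (names and group are illustrative)."""
    return [
        {
            "name": "gpu%d_temp" % i,
            "call_back": gpu_temp_handler,
            "time_max": 90,
            "value_type": "uint",
            "units": "C",
            "slope": "both",
            "format": "%u",
            "description": "Temperature of GPU %d" % i,
            "groups": "gpu",
        }
        for i in range(gpu_count)
    ]


def metric_init(params):
    """Called once by gmond; params holds the 'param' entries from the .pyconf."""
    global NVML
    import pynvml  # imported lazily so the helpers above load without pynvml

    pynvml.nvmlInit()
    NVML = pynvml
    return build_descriptors(pynvml.nvmlDeviceGetCount())


def metric_cleanup():
    """Called by gmond on shutdown."""
    if NVML is not None:
        NVML.nvmlShutdown()
</code>

Keeping NVML initialisation in ''metric_init'' means it happens once per gmond start rather than on every sample, which is also how the shutdown in ''metric_cleanup'' pairs up with it.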
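
This prints an XML report with a lot of relevant per-GPU data:
<code>
#!/usr/bin/python3
# nvidia_smi ships with the nvidia-ml-py bindings; XmlDeviceQuery() returns
# an nvidia-smi style XML report as a string.
import nvidia_smi
print(nvidia_smi.XmlDeviceQuery())
</code>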
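To turn that XML into numbers a module could report, the standard library's ''xml.etree.ElementTree'' is enough. A minimal sketch, assuming the report follows the ''nvidia-smi -q -x'' layout (an ''nvidia_smi_log'' root with one ''gpu'' element per device containing ''product_name'', ''temperature/gpu_temp'' and ''utilization/gpu_util''); fields that are missing or read ''N/A'' come back as ''None''. ''summarize_gpus'' and ''leading_number'' are hypothetical helper names.

<code python>
#!/usr/bin/env python3
"""Summarise GPU XML reports (sketch; assumes the nvidia-smi -q -x layout)."""
import sys
import xml.etree.ElementTree as ET


def leading_number(text):
    """Parse values such as '45 C' or '7 %' to 45.0 / 7.0; 'N/A', '' or None give None."""
    parts = (text or "").split()
    if not parts:
        return None
    try:
        return float(parts[0])
    except ValueError:
        return None


def summarize_gpus(xml_text):
    """Return one dict per <gpu> child of the root: id, name, temperature, utilisation."""
    root = ET.fromstring(xml_text)
    return [
        {
            "id": gpu.get("id"),
            "name": gpu.findtext("product_name"),
            "temp_c": leading_number(gpu.findtext("temperature/gpu_temp")),
            "util_pct": leading_number(gpu.findtext("utilization/gpu_util")),
        }
        for gpu in root.findall("gpu")
    ]


if __name__ == "__main__":
    # Usage: summarize_gpus.py report.xml [more.xml ...]
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            for row in summarize_gpus(f.read()):
                print(row)
</code>

For example, redirect the output of the script above (or of ''nvidia-smi -q -x'') to ''gpus.xml'' and pass that file as the argument; the file names are arbitrary. Reading from a file keeps the parsing testable on a machine without a GPU.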

Testing ''pynvml.py'':
<code>
#!/usr/bin/env python3
# List the driver version and every GPU that NVML can see.
import pynvml

pynvml.nvmlInit()
try:
    print("Driver Version:", pynvml.nvmlSystemGetDriverVersion())
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        print("Device", i, ":", pynvml.nvmlDeviceGetName(handle))
finally:
    # Release NVML even if a query raises NVMLError.
    pynvml.nvmlShutdown()
</code>
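Building on that test, the same bindings expose the values a GPU module would typically report. A minimal sketch: ''gpu_stats'' and ''print_all_gpu_stats'' are hypothetical helpers, and ''gpu_stats'' takes the ''pynvml'' module as an argument so it can be exercised with a stand-in object on a machine without a GPU. ''nvmlDeviceGetMemoryInfo'' reports bytes and ''nvmlDeviceGetUtilizationRates'' reports percentages.

<code python>
#!/usr/bin/env python3
"""Per-GPU memory and utilisation via pynvml (sketch)."""


def gpu_stats(nvml, index):
    """Return basic stats for GPU `index`; `nvml` is the pynvml module (or a stand-in)."""
    handle = nvml.nvmlDeviceGetHandleByIndex(index)
    mem = nvml.nvmlDeviceGetMemoryInfo(handle)  # .used / .total / .free, in bytes
    util = nvml.nvmlDeviceGetUtilizationRates(handle)  # .gpu / .memory, in percent
    return {
        "index": index,
        "mem_used_mib": mem.used / 2**20,
        "mem_total_mib": mem.total / 2**20,
        "gpu_util_pct": util.gpu,
        "mem_util_pct": util.memory,
    }


def print_all_gpu_stats():
    """Initialise NVML, print stats for every GPU, and always shut NVML down."""
    import pynvml

    pynvml.nvmlInit()
    try:
        for i in range(pynvml.nvmlDeviceGetCount()):
            print(gpu_stats(pynvml, i))
    finally:
        pynvml.nvmlShutdown()
</code>

On a host with the NVIDIA driver and pynvml installed, import the file and call ''print_all_gpu_stats()''.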