The Metrics Add-on for Infrastructure (TA-linux-metrics) can be used on Linux Forwarders to send Operating System metrics to Splunk without using collectd or the HTTP Event Collector (HEC), and it is fully compatible with the "Splunk App for Infrastructure":
https://splunkbase.splunk.com/app/3975/
Note: the output is formatted for multiple-measurement metric data points (Splunk v8.x only) which allows for significant license savings as a single metric data point can now contain multiple measurements and dimensions.
One of the most powerful features of the add-on is the ability to add custom dimensions to each metric.
Use the built-in Setup Page to configure the inputs on a Standalone Instance, or use a Deployment Server to push the add-on to your forwarders.
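If you use a Deployment Server, a minimal serverclass.conf sketch might look like the following (the server class name and whitelist pattern are illustrative; adjust them for your environment) :-
[serverClass:linux_metrics]
whitelist.0 = linux-*
[serverClass:linux_metrics:app:TA-linux-metrics]
restartSplunkd = true
stateOnClient = enabled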
You can configure custom dimensions (for example cloud, region, dc, and environment; see the dims.conf examples below) and they will be added to all of the metrics.
Example indexes.conf :-
[metrics_linux]
coldPath = $SPLUNK_DB/metrics_linux/colddb
homePath = $SPLUNK_DB/metrics_linux/db
thawedPath = $SPLUNK_DB/metrics_linux/thaweddb
datatype = metric
Install the add-on on your Linux servers and enable the inputs. Either use the built-in Setup Page, or copy the input stanzas from the default directory to the local directory (i.e. local/inputs.conf) and enable them as required:
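For example, a minimal local/inputs.conf sketch that enables the CPU usage input (the interval shown is an assumed value; copy the actual stanza names and intervals from default/inputs.conf) :-
[script://./bin/cpu_usage.sh]
disabled = 0
interval = 60
index = metrics_linux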
If you enable process monitoring, configure the relevant processes to monitor for your environment. Copy the stanza from the default directory to the local directory (i.e. local/process_mon.conf) and configure them as required:
[process_mon]
allowlist = bash,zsh,sshd,python.*
blocklist = splunkd
allowlist and blocklist should be comma separated without spaces.
Configure the relevant dimensions for your environment. Copy the dimensions from the default directory to the local directory (i.e. local/dims.conf) and configure them as required:
Note: you can set cloud to aws or gcp and the built-in scripts will auto-discover the Region and Availability Zone of the instance, e.g.
[all]
cloud = gcp
Shell environment variables are also supported, e.g.
[all]
environment = $Deploy_Environment
Note: the region and dc do not need to be configured if cloud is aws or gcp, i.e. only set these dimensions if cloud = false
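For example, an on-premises host might set all of the dimensions explicitly (the values below are purely illustrative) :-
[all]
cloud = false
region = au-east
dc = dc1
environment = production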
Install the "Splunk App for Infrastructure" on your Search Head
IMPORTANT: Update the 'sai_metrics_indexes' macro, e.g. index=metrics_linux
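For example, you can edit the macro under Settings > Advanced search > Search macros, or override it with a local/macros.conf sketch like the following (place it in the local directory of the app that defines the macro) :-
[sai_metrics_indexes]
definition = index=metrics_linux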
If you don't see any Entities under 'Investigate' in the Splunk App for Infrastructure :-
Error when enabling inputs via the Setup Page:
Encountered the following error while trying to update: Error while posting to url=/servicesNS/nobody/TA-linux-metrics/data/inputs/script/.%252Fbin%252Fcpu_usage.sh
Run the following search to confirm that metrics are being indexed :-
| mcatalog values(metric_name)
If no results are found, run the following search and specify your metrics index, e.g.
| mcatalog values(metric_name) WHERE index=metrics_linux
Add the 'metrics_linux' index to "Indexes searched by default" :-
If you see similar errors to the following in 'splunkd.log' on the forwarder :-
11-10-2020 16:26:45.553 +1100 WARN IndexProcessor - The metric name is missing for source=/opt/splunk/etc/apps/TA-linux-metrics/bin/cpu_usage.sh, sourcetype=cpu_usage, host=foo, index=metrics_linux. Metric event data without a metric name is invalid and cannot be indexed. Ensure the input metric data is not malformed. raw=["_time","metric_name:cpu.user","metric_name:cpu.system","metric_name:cpu.nice","metric_name:cpu.idle","metric_name:cpu.wait","metric_name:cpu.interrupt","metric_name:cpu.softirq","metric_name:cpu.steal","model","cloud","region","dc","environment","ip","os","os_version","kernel_version"]
11-10-2020 16:26:45.553 +1100 WARN IndexProcessor - The metric value=<unset> is not valid for source=/opt/splunk/etc/apps/TA-linux-metrics/bin/cpu_usage.sh, sourcetype=cpu_usage, host=foo, index=metrics_linux. Metric event data with an invalid metric value cannot be indexed. Ensure the input metric data is not malformed. raw=["_time","metric_name:cpu.user","metric_name:cpu.system","metric_name:cpu.nice","metric_name:cpu.idle","metric_name:cpu.wait","metric_name:cpu.interrupt","metric_name:cpu.softirq","metric_name:cpu.steal","model","cloud","region","dc","environment","ip","os","os_version","kernel_version"]
Check that the sourcetype configuration is based on metrics_csv and that your forwarder is at least v8.x.
If you have set "allowlist = " to monitor all processes but the "process_usage.sh" script uses 100% CPU and takes a long time to run, you may have hit a $PATH bug in one of your system profile scripts :-
# sudo chmod 0750 /etc/profile.d/jdk.sh
Developer:
Contributors:
Bug Fix: removed unnecessary tail command from process_total.sh
Optimized process_total.sh, df_usage.sh, & df_inodes.sh to use a subshell instead of a temp file
Optimized process_usage.sh to use arrays instead of a temp file
Added Inode Usage to the Setup page.
Process monitoring now uses allowlist & blocklist; updated df usage to use the -P option (POSIX compliant); added a new input for df inode usage; and bug fixes.
Initial Release