These steps need to be done in order.
1. Get the data in. Use one of these sourcetype names "access_common", "access_combined", "iis", "apache:access" or "aws:cloudfront:accesslogs".
2. Configure the sites that you want to monitor.
3. Run the "Generate Session" and "Generate pages" lookup searches.
4. Enable data model acceleration for the "Web" data model.
5. Configure goals (optional).
The Splunk App for Web Analytics currently supports data from Apache, IIS and AWS CloudFront logs. Make sure you use the sourcetypes access_common, access_combined, iis, apache:access or aws:cloudfront:accesslogs for this data. If you already have data in Splunk under a different sourcetype, you can either use sourcetype renaming or modify the eventtype web-traffic to include the names of your sourcetypes.
If you plan on using the sourcetype apache:access, you need to install the prerequisite Add-on as this app builds on top of the base field extractions from the Add-on for Apache:
If you plan on using the sourcetype aws:cloudfront:accesslogs, you need to install the prerequisite Add-on as this app builds on top of the base field extractions from the Splunk Add-on for Amazon Web Services:
The app comes with two sets of sample data for Apache and IIS. You can enable these static sample inputs by going into Settings->Data inputs->Files & Directories.
If your data is stored in an index that your Splunk user does not search by default, you need to add All non-internal indexes (or the specific index in question) to the Selected indexes in Access controls -> Roles -> [ROLE NAME].
The Splunk App for Web Analytics works in multi-website environments. Websites are configured from a combination of the host and the source fields. Each event with that combination is tagged with the corresponding website name in the field "site". You can use wildcards (*) in the Source and Host fields to select multiple files matching a pattern. A website setup form lets you add these entries easily.
Here are some examples of valid website configurations, with or without wildcards:
Site            Host     Source
roadrunner.com  server1  /var/log/httpd/access_log
roadrunner.com  server2  /var/log/httpd/access_log
Site            Host     Source
roadrunner.com  server   /var/log/httpd/access_
The data in the setup form will be stored in the lookup file called WA_settings.csv. You can also manually edit this file. The websites setup page can be found under Setup->Websites.
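To make the host/source matching concrete, here is a minimal Python sketch of how a wildcard rule could tag an event with a site. It is only an illustration, not the app's implementation: the column names (site, host, source), the wildcard patterns and the second site are assumptions made up for the example, and Python's fnmatch also treats ? and [] as wildcards, which Splunk's * matching does not.

```python
import csv
import io
from fnmatch import fnmatchcase

# Hypothetical rules mirroring the Site / Host / Source columns described above.
# The column names and patterns are assumptions, not a real WA_settings.csv.
RULES_CSV = """site,host,source
roadrunner.com,server*,/var/log/httpd/access_*
coyote.com,web01,/var/log/httpd/coyote_access_log
"""


def load_rules(text):
    """Parse the CSV text into a list of {site, host, source} dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def site_for(event_host, event_source, rules):
    """Return the site of the first rule whose host and source patterns both match."""
    for rule in rules:
        if fnmatchcase(event_host, rule["host"]) and fnmatchcase(event_source, rule["source"]):
            return rule["site"]
    return None  # no rule matched, so the event would have no "site" value


if __name__ == "__main__":
    rules = load_rules(RULES_CSV)
    print(site_for("server1", "/var/log/httpd/access_log", rules))
    print(site_for("db01", "/var/log/messages", rules))
```

An event that matches no rule gets no site, which corresponds to the "Site field not present" problem described in the troubleshooting notes.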
Once the data has been imported, run the two lookup searches "Generate user sessions" and "Generate pages". These lookups are used throughout the app. After the first run, they are updated automatically by two scheduled searches that run every 10 minutes and add any new data coming into the app. Running these lookup searches might take a long time depending on how much data you have in Splunk, but it's important that you let the searches finish before you move on to the next step. If you have too much data to run them over everything, you can change the time period from the default "All time" to something shorter. The lookup reports can be found under Setup->Lookups or by using the links above. It's important that these searches return results. If they do not, the app will not work.
The Splunk App for Web Analytics uses data model acceleration extensively to power the dashboards. Once the lookups in the previous step have completed, enable acceleration for the data model "Web". The data model can be found under Settings->Data models. Set the summary range according to how long you want to keep the data (> 1 Month). The data model is updated every 10 minutes so that the sessions get picked up properly. The data model acceleration needs to finish before you will see data in any dashboard except the "Real-Time" dashboard, which uses raw log data as its source. This means that you might not see data at first. The initial build can take many hours depending on how much data it has to process.
If you want to monitor certain browsing paths or pageviews, you can configure goals. This is useful if, for instance, you want to get conversion rates or funnel abandonment rates. You can find the Goals setup page under Goals->Goals Setup.
The goals are stored in a summary index called "goal_summary".
When you enable a goal, the app starts monitoring goal completions from the time you save the goal. To backfill goals, there is a search called "Generate Goal summary - Backfill", which can be found under the Goals menu. Please note that running this search multiple times will duplicate the goal completions. To reset the goals, you need to clean the "goal_summary" index.
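As a rough illustration of the two metrics mentioned above, the following Python sketch computes a conversion rate and per-step funnel abandonment from session counts. The definitions used here (completions divided by sessions, and the share of sessions lost between consecutive steps) are common conventions and an assumption on my part; the app's own calculations may differ. The function names, page paths and counts are made up for the example.

```python
def conversion_rate(total_sessions, sessions_with_goal):
    """Share of sessions that completed the goal (assumed definition)."""
    if total_sessions == 0:
        return 0.0
    return sessions_with_goal / total_sessions


def funnel_abandonment(step_counts):
    """Return (step, abandonment rate) for each step that has a following step.

    step_counts is a list of (step_name, sessions_reaching_step) in funnel order.
    The rate is the share of sessions at a step that never reach the next one.
    """
    rates = []
    for (name, reached), (_, next_reached) in zip(step_counts, step_counts[1:]):
        rate = 1 - next_reached / reached if reached else 0.0
        rates.append((name, rate))
    return rates


if __name__ == "__main__":
    # Hypothetical funnel: cart -> checkout -> confirmation page.
    funnel = [("/cart", 400), ("/checkout", 200), ("/thanks", 50)]
    print(conversion_rate(1000, 50))
    print(funnel_abandonment(funnel))
```

Because rerunning the backfill search duplicates goal completions, counts like these would be inflated after a repeated backfill until the index is cleaned.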
The app uses a third-party user agent parsing library that gets updated regularly. If you want to manually update these definitions, you can download a YAML file here:
Copy the file "regexes.yaml" to the folder
This update should take effect immediately.
Version 2.0.0 of the app makes a small change to the Web data model to increase compatibility with more Splunk versions. This might trigger a data model rebuild when upgrading. If you want to prevent this from happening, use the old data model definition file "web.conf" and delete the one provided by this app. It's recommended to use the new version of web.conf, as this is the version that will be used going forward.
The User Journey Flow dashboard now uses the official Sankey visualization add-on, which must be downloaded separately for this dashboard to work. You can find this add-on here: https://splunkbase.splunk.com/app/3112/
The goal_summary index is no longer created by the app. You need to add this index manually if you are using this feature. All old data is retained even though the app no longer creates the index; just create the index manually and it will work.
There is a new dashboard, "Response Times", which helps you find the slowest resources on your site. Your web server might not output the response time in the log by default, so this needs to be enabled for the dashboard to work. On IIS this is often pre-enabled, but on Apache and NGINX you need to add %D to the end of the log format settings of the server. More details on how to enable this here:
If you add %D to the end of the log format for the access_combined sourcetype, the field extractions will work by default.
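For reference, %D in an Apache log format writes the time taken to serve the request in microseconds. The sketch below shows what a combined-format line with a trailing %D value looks like and how that value could be pulled out in Python. It is not the app's field extraction: the regular expression, the field names and the sample line are assumptions for illustration, and quoted fields containing escaped quotes are not handled.

```python
import re

# Apache "combined" format with an optional trailing %D value (microseconds).
# The group names are hypothetical and do not mirror the app's extractions.
COMBINED_WITH_D = re.compile(
    r'^(?P<clientip>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<useragent>[^"]*)"'
    r'(?: (?P<response_time_us>\d+))?$'
)


def parse_line(line):
    """Return the parsed fields as a dict, or None if the line does not match."""
    match = COMBINED_WITH_D.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    micros = fields.pop("response_time_us")
    # Convert microseconds to milliseconds; None when %D is absent from the line.
    fields["response_time_ms"] = int(micros) / 1000 if micros is not None else None
    return fields


# Made-up sample line; the final number is the %D value.
SAMPLE = ('203.0.113.7 - - [10/Oct/2016:13:55:36 +0000] "GET /index.html HTTP/1.1" '
          '200 2326 "http://example.com/start" "Mozilla/5.0" 48213')

if __name__ == "__main__":
    print(parse_line(SAMPLE)["response_time_ms"])
```

Lines written before %D was added still parse in this sketch, with the response time left empty.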
Version 1.6 of the app uses the KV Store for the session lookups instead of a CSV file. This feature will only work on Splunk version 6.3 and above.
To use the app on Splunk 6.2, you need to continue using a CSV file for the lookup.
To enable this, replace the file, or its contents:
with the corresponding 6.2-compatible file that can be found under
Restart Splunk after this is done.
Splunk Answers thread on Splunk App for Web Analytics.
A lot of the problems customers have with the app have already been solved.
In the context of the app, try running the search:
Based on the output of this search, check the following:
No data returned
If this is not returning any results, I suspect you are not seeing the data because it is stored in a non-default index and your Splunk user does not search non-default indexes automatically. Another possibility is that you are not using any of the pre-configured sourcetypes. See Setup point 1 above.
Site field not present
If this is returning results, double check that each entry has the "site" field populated. It's crucial that this field exists in your data. See Setup point 2 above.
File field not present
Another field that is known to cause problems is the "file" field. It needs to be present in your field extractions. If it is not, you will not see "eventtype=pageview", which is necessary for the app to work. Make sure this field is extracted correctly.
As the app relies heavily on data model accelerations you will not see anything in any dashboards (except the "Real-Time" ones) until this acceleration has completed. Initially this could take a while. There is a "Data Model Audit" dashboard that will tell you if the acceleration is complete or not.
The user agent parsing is based on an add-on developed by David Shpritz (TA-user-agents) who in turn uses a Python module from:
Added support for the sourcetypes "apache:access" and "aws:cloudfront:accesslogs"
Added new panel "Session Time Distribution" on the Behavior dashboard
Fixed bugs in the comparison feature for the timecharts
Updated the user agent string parsing
- Added comparison time period throughout. Now easy to compare to last week or yesterday etc.
- The User Journey Flow dashboard now uses the official Sankey visualization add-on, which must be downloaded separately for this dashboard to work.
- There is a new dashboard, "Response Times", which helps you find the slowest resources on your site.
- Engagement Time per page now calculated
- Updated user agent parsing to take into account new browsers and mobile devices.
Read "Considerations for Upgrading to v2.0.0" in the documentation before upgrading.