Website input

The Website Input app provides a mechanism for scraping data from web-pages and indexing it in your Splunk instance so that it is searchable.

Features

  • Website Data Extraction: set up an input that extracts data from a web-page and gets it into Splunk
  • Data Preview: select the data that you would like to extract from a web-page and preview a sample of the output before you save the configuration
  • Website crawling: you can have the input crawl web-pages to automatically discover related content in other pages
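
As a rough illustration of what link discovery during crawling involves, the sketch below collects the links on a page and keeps the ones on the same host. It is not the app's crawler: the `discover_links` helper, the same-host rule, and the example.com URLs are all assumptions made for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def discover_links(page_url, html):
    """Return same-host absolute URLs linked from the page, in order, without duplicates.

    Hypothetical helper: restricting the crawl to the same host is an assumption.
    """
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    found = []
    for href in collector.hrefs:
        # Resolve relative links against the page URL and drop any #fragment.
        absolute = urljoin(page_url, href).split("#")[0]
        same_host = urlparse(absolute).netloc == host
        if same_host and absolute != page_url and absolute not in found:
            found.append(absolute)
    return found


SAMPLE_HTML = """
<html><body>
  <a href="/news/page2.html">Next</a>
  <a href="archive.html#top">Archive</a>
  <a href="https://other.example.org/">Elsewhere</a>
  <a href="/news/page2.html">Next again</a>
</body></html>
"""

links = discover_links("https://www.example.com/news/index.html", SAMPLE_HTML)
print(links)
```

A crawler would repeat this step on each discovered page until it reaches whatever page or depth limit it has been given.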

Configuration

Initial setup

After you install the app, it asks you to complete setup on the app configuration page. The setup page contains only options for configuring a proxy server. If you do not use a proxy server, you can just press Save.

Creating an input

You will need to create an input to define the websites that you would like to extract information from. You can set up a new input using the wizard, using the page in Splunk's manager at Settings » Data Inputs » Web-pages, or using the GUI provided in the app itself. The most difficult part of configuring the app is writing the CSS selector that captures the data you want. See W3schools for information on how to create CSS selectors.

You can usually ignore the "Output" section. This is only necessary if you want to name the fields that the input will get based on content within the page (see "Can I use attributes to set the field names?" for details).

The "Authentication" section can be left blank unless the web-page requires authentication. Only HTTP authentication is currently supported.
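
For readers unfamiliar with HTTP authentication, the sketch below shows what a client sends for the Basic scheme: the username and password are joined with a colon, base64-encoded, and placed in an Authorization header. It assumes the Basic scheme only, and the URL, credentials, and `build_authenticated_request` helper are hypothetical; it does not describe how the app itself builds its requests.

```python
import base64
import urllib.request


def build_authenticated_request(url, username, password):
    """Build a request carrying an HTTP Basic Authorization header (hypothetical helper)."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request


# Placeholder URL and credentials; no network call is made here.
request = build_authenticated_request(
    "https://www.example.com/private/report.html", "monitor", "s3cret"
)
print(request.get_header("Authorization"))
```

Pages that rely on a login form and session cookies do not use this mechanism, which is why they fall outside "HTTP authentication".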

Known Issues

The UI shows matches for a selector, but the preview shows none and the input matches nothing

A selector may show matches in the UI even though it matches nothing when executed in preview, because web-browsers sometimes manipulate the HTML before rendering it. This can happen when a table does not have a tbody element (which it is supposed to have): the browser adds the tbody element even though it doesn't exist in the original HTML.

To fix this, you can do one of the following:

  1. Use a selector that matches the original HTML even though it doesn't match in the preview page
  2. Make your selector more generic (for example, change "font > table > tr" to "font table tr")
  3. Make a selector that matches both (for example, "font > table > tr,font > table > tbody > tr")
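
The difference between these selectors can be seen with a small experiment. The sketch below is a deliberately simplified matcher written for this illustration (tag names, the child combinator ">", the descendant combinator, and comma groups only); it is not the parser the app uses, and `count_matches` and the sample markup are assumptions. It compares the original HTML with the version a browser would produce after inserting tbody.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class PathCollector(HTMLParser):
    """Records, for every element, the list of tag names from the root down to it."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.paths = []

    def handle_starttag(self, tag, attrs):
        self.paths.append(self.stack + [tag])
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop back to the matching open tag (tolerates unclosed children).
            while self.stack.pop() != tag:
                pass


def parse_selector(selector):
    """Split 'a > b c' into (combinator, tag) steps; only tag names are supported."""
    steps = []
    combinator = None
    for token in selector.replace(">", " > ").split():
        if token == ">":
            combinator = "child"
        else:
            steps.append((combinator if steps else None, token.lower()))
            combinator = "descendant"
    return steps


def path_matches(path, steps):
    """Check whether the element at the end of path satisfies the selector steps."""

    def match(step_index, path_index):
        if path_index < 0 or path[path_index] != steps[step_index][1]:
            return False
        if step_index == 0:
            return True
        if steps[step_index][0] == "child":
            return match(step_index - 1, path_index - 1)
        # Descendant combinator: any ancestor may satisfy the previous step.
        return any(match(step_index - 1, j) for j in range(path_index))

    return match(len(steps) - 1, len(path) - 1)


def count_matches(html, selector_group):
    """Count elements matched by any selector in a comma-separated group."""
    collector = PathCollector()
    collector.feed(html)
    alternatives = [parse_selector(part) for part in selector_group.split(",")]
    return sum(
        1 for path in collector.paths
        if any(path_matches(path, steps) for steps in alternatives)
    )


ORIGINAL_HTML = "<font><table><tr><td>a</td></tr><tr><td>b</td></tr></table></font>"
BROWSER_HTML = ("<font><table><tbody><tr><td>a</td></tr><tr><td>b</td></tr>"
                "</tbody></table></font>")

for selector in ("font > table > tbody > tr", "font > table > tr",
                 "font table tr", "font > table > tr,font > table > tbody > tr"):
    print(selector, "| original:", count_matches(ORIGINAL_HTML, selector),
          "| browser:", count_matches(BROWSER_HTML, selector))
```

In this sample, a selector that includes tbody matches only the browser's version, the strict child selector matches only the original HTML, and the generic and combined selectors match both.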

FAQs

See the links below for answers to frequently asked questions:

Can I specify more than one selector (to match different things on a single page)?

Can I use attributes to set the field names?

I changed the sourcetype and now the match field is no longer a multi-value field; what do I do?

The input isn't extracting content, even though I can see it in my web-browser

More Information

This project is open source. See GitHub for the source or LukeMurphey.net for more information.

Release Notes

Version 4.2.1
May 4, 2017

1) Improved compatibility with Splunk 6.6
2) Fixed issue where users sometimes could not enable inputs

Version 4.2
April 7, 2017

Added ability to only output results when they change

Version 4.1.3
April 3, 2017

1) Fixed issue where the host field could not be overridden
2) Reduced some unimportant log messages to debug level

Version 4.1.2
March 19, 2017

Added support for running the app on a Splunk free license

Version 4.1.1
March 13, 2017

Fixed issue where Firefox driver was not correctly added to the path on Windows

Version 4.1
March 9, 2017

1) Fixed issue where some sites could not be previewed
2) Fixed issue where selectors would not match an ID that was not lowercase
3) Added ability to include empty matches
4) Added ability to delete inputs

Version 4.0.2
Jan. 18, 2017

1) Fixed issue where HTTP authentication didn't work with Firefox
2) Fixed issue where Firefox rendering didn't work on headless environments
3) Other minor changes

Version 4.0.1
Dec. 3, 2016

Various bug fixes and minor improvements

Version 4.0
Dec. 1, 2016

Vastly updated UI, various bug fixes and lots of smaller enhancements

Version 3.2.1
Nov. 24, 2016

1) Improved compatibility with versions of Splunk
2) Fixed overly restrictive URL validation
3) Fixed issue where some parts of the stash file may not have been indexed, losing parts of large result sets
4) Fixed controller logs which were not sourcetyped correctly

Version 3.2
Sept. 21, 2016

* Added ability to view results in search from the modular input creation page
* Improved documentation on the search command options

Version 3.1.2
Sept. 20, 2016

Fixed problem where matches were not visible when the content is very long

Version 3.1.1
July 14, 2016

Fixed problem where you could not create new inputs

Version 3.1
July 11, 2016

Added ability to grant access to make inputs to non-admin users

Version 3.0
May 26, 2016

* Added ability to render pages using a browser (to get page contents after JS rendering has executed)
* MD5 and SHA224 hashes are now included in the results
* Added ability to output matches as separate fields
* Matches are now listed in results in the order in which they were discovered

Version 2.1
May 24, 2016

* Simplified the data input configuration screen
* Added ability to include the raw content in case you want to do your own parsing in SPL
* Added ability to specify a custom string that will separate extracted values
* Fixed incorrect reporting of matches count

Version 2.0
May 3, 2016

* Added ability to crawl websites

Version 1.2.0
Jan. 3, 2016

* Added the ability to use the tag names as the field names
* Fixed issue where the selector would sometimes not match if the content was upper-case and the selector wasn't
* Added a BNF file for the search command

Version 1.1.3
Dec. 16, 2015

The password no longer needs to be re-typed every time an input is modified

Version 1.1.2
Nov. 30, 2015

Fixed issue where fields without spaces were not being extracted as multi-value fields by default

Version 1.1.1
Sept. 7, 2015

Updated to the latest version of the modular input library; should fix problems where the input crashes

Version 1.1
Aug. 24, 2015

Added ability to specify the user-agent string

Version 1.0.5
June 22, 2015

* Fixed issue where web input controller used the incorrect logger name
* Fixed issue where you could not select the sourcetype correctly in some cases
* Added a search command for performing web scrapes from the search page

Version 1.0.4
March 28, 2015

* Fixed issue where some files could not be parsed because lxml sometimes fails to parse correctly encoded files
* Enhanced logging for when interval gap is too large and when checkpoint file could not be found

Version 1.0.3
Jan. 9, 2015

* Fixed issue where the input would not stay on the interval because it included processing time in the interval
* Fixed issue where the modular input logs were not sourcetyped correctly

Version 1.0.2
Nov. 29, 2014

Fixed issue where the input would:
* sometimes fail due to exception thrown from sleep() being interrupted
* sometimes fail due to splunkd connection failure
* ignore the host field that was set on the configuration page

Version 1.0.1
Nov. 12, 2014

Fixed issue where preview did not work

Version 1.0
Oct. 28, 2014

Added ability to use a proxy server

Version 0.9
Aug. 17, 2014

* Fixed issue where not all matches were returned
* Added preview dialog to modular input page
* Added raw_match_count to output, which counts CSS matches even if they include no text
* Fixed incompatibility with other apps that also import the modular_input base class
* Fixed issue where entering and then clearing the sourcetype causes an error
* Added ability to specify attributes that should be used for the field names

Version 0.8
July 13, 2014

Fixed problem where websites in non-ASCII encodings were not decoded correctly

Version 0.7
July 11, 2014

Version 0.6
July 8, 2014

* Switched to multi-value output of matches and added transform for parsing match field
* Fixed exception that could happen if the web-page was not available
* Put authentication fields in a separate location on the manager page

Version 0.5
July 7, 2014

A Splunk input for retrieving and indexing information from web-pages
