
Website input

The Website Input app provides a mechanism for scraping web-pages for data and indexing that data in your Splunk instance to make it searchable.

Features

  • Website Data Extraction: set up an input that will extract data from a web-page and get it into Splunk
  • Data Preview: select the data on a web-page that you would like to extract and preview the results to see a sample of what the output will look like before you save the configuration
  • Website crawling: you can have the input crawl web-pages to automatically discover related content in other pages

Configuration

Initial setup

Once you install the app, it will ask you to set it up on the app configuration page. The setup only contains options related to configuring a proxy server. If no proxy server is used, you can just press save.

Creating an input

You will need to create an input to define the websites that you would like to extract information from. You can set up a new input using the wizard, the page in Splunk's manager at Settings » Data Inputs » Web-pages, or the GUI provided in the app itself. The most difficult part of configuring the app is writing the CSS selector that will capture the data you want. See W3schools for information on how to create CSS selectors.

You can usually ignore the "Output" section. This is only necessary if you want to name the fields that the input will get based on content within the page (see "Can I use attributes to set the field names?" for details).

The "Authentication" section can be left blank unless the web-page requires authentication. Only HTTP authentication is supported at the current time.
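
For readers unfamiliar with HTTP authentication, the sketch below shows what a client sends when a page is protected this way. It assumes the site uses the Basic scheme; the helper names and the example URL are hypothetical, and this is not the app's own code, which may handle authentication differently.

```python
import base64
import urllib.request


def basic_auth_header(username, password):
    """Build the value of the Authorization header for the Basic scheme."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def build_request(url, username, password):
    """Create a request object that carries the credentials (nothing is sent here)."""
    request = urllib.request.Request(url)
    request.add_header("Authorization", basic_auth_header(username, password))
    return request


# Hypothetical URL and credentials, for illustration only.
request = build_request("http://example.com/status", "Aladdin", "open sesame")
print(request.get_header("Authorization"))
```

A page that instead presents a login form in the HTML is using forms authentication, which is a different mechanism from the header shown here.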

Known Issues

The UI shows matches for a selector, but the preview shows none and the input matches nothing

The UI may highlight matches for a selector even though the selector matches nothing when the preview or the input runs. This happens because web-browsers sometimes manipulate the HTML before rendering it. A common case is a table that does not have a tbody element (which tables are supposed to have): the browser adds the tbody element even though it doesn't exist in the original HTML.

To fix this, you can do one of the following:

  1. Use a selector that matches the original HTML even though it doesn't match in the preview page
  2. Make your selector more generic (like converting "font > table > tr" to "font table tr")
  3. Make a selector that matches both (like "font > table > tr, font > table > tbody > tr")
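
The effect of the inserted tbody element on the selectors above can be sketched with Python's standard library. This is only an illustration under stated assumptions: the markup is a hypothetical page fragment, and ElementTree path expressions stand in for the CSS selectors (a "/" step plays the role of the ">" child combinator and a "//" step plays the role of the descendant combinator). For simplicity the paths are anchored at the top of the fragment, which a CSS selector would not require.

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment as served by the site: the table has no tbody.
ORIGINAL = "<font><table><tr><td>a</td></tr><tr><td>b</td></tr></table></font>"

# The same fragment after a browser has inserted the implied tbody element.
RENDERED = (
    "<font><table><tbody><tr><td>a</td></tr><tr><td>b</td></tr></tbody></table></font>"
)

# ElementTree path stand-ins for the CSS selectors (assumed equivalents).
CHILD_ONLY = ["./font/table/tr"]    # font > table > tr
DESCENDANT = ["./font//table//tr"]  # font table tr
BOTH = [                            # font > table > tr, font > table > tbody > tr
    "./font/table/tr",
    "./font/table/tbody/tr",
]


def count_rows(markup, paths):
    """Count the distinct elements matched by any of the given paths."""
    root = ET.fromstring("<html>" + markup + "</html>")
    matched = set()
    for path in paths:
        matched.update(root.findall(path))
    return len(matched)


for label, paths in (("child only", CHILD_ONLY), ("descendant", DESCENDANT), ("both", BOTH)):
    print(label, count_rows(ORIGINAL, paths), count_rows(RENDERED, paths))
```

The child-only form finds the rows in the original markup but none once the tbody is present, while the more generic form and the combined form find the rows in both versions.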

FAQs

See the links below for answers to frequently asked questions:

Can I specify more than one selector (to match different things on a single page)?

Can I use attributes to set the field names?

I changed the sourcetype and now the match field is no longer a multi-value field; what do I do?

The input isn't extracting content, even though I can see it in my web-browser

More Information

This project is open source. See GitHub for the source or LukeMurphey.net for more information.

Release Notes

Version 4.5.8
Nov. 14, 2019

1. Adding support for Python 3
2. Fixing issues on Splunk 8.0.0
3. Updated the Geckodriver for Mac and Linux to version 0.26.0

Version 4.5.7
June 14, 2019

1) Fixed another error that occurred when outputting values as multi-valued fields
2) Updated the geckodriver to 0.24 so that newer versions of Firefox work
3) Added link to search logs to determine why browser test failed
4) Fixed issue where integrated browser test failed on the input wizard

Version 4.5.6
Feb. 22, 2019

1) Fixed error that occurred when outputting values as multi-valued fields
2) Fixed issue where proxy password from secure storage was not being used

Version 4.5.5
Nov. 10, 2018

1) Fixed issue where passwords were not loaded if there were more than 30
2) Improved styling on Splunk 7.0+

Version 4.5.4
July 9, 2018

1) Fixed issue where the "when_matches_change" setting of "output_results" output results even when the matches hadn't changed
2) Fixed issue where the severity chart on the health page filtered based on the severity filter and thus didn't show all entries

Version 4.5.3
June 15, 2018

Updating the styling to work better on Splunk 7.0 and 7.1

Version 4.5.2
March 14, 2018

1) Input now handles large files much better by only downloading the first 512 KB of the file
2) Updated the Chrome driver so that the input works with newer versions of Chrome
3) The input creation wizard auto-suggests a URL filter now when using spidering
4) Output is now streamed (as opposed to being cached) in order to reduce memory usage
5) The input now gracefully handles websites that return a bad encoding
6) Fixed issue where you could not drill-down on logs from the health dashboard

Version 4.5.1
Oct. 5, 2017

1) Input is now resilient to transient Splunkd outages
2) Fixed issue where index selection input was super-wide on Splunk 7.0

Version 4.5
Sept. 2, 2017

1) Added support for forms authentication with browsers
2) Fixed issue where user-agent string was not set for Firefox and Chrome
3) Fixed issue where the browser testing functionality on the UI didn't use the proxy server

Version 4.4
Aug. 7, 2017

1) Added support for forms authentication
2) Added ability to set a default value for the user-agent globally
3) Removed support for proxy authentication on Splunk Cloud

Version 4.3.0
July 21, 2017

1) Passwords are now stored using Splunk secure storage
2) Setup page has been updated to make it easier to use
3) Pages can now be rendered using Google Chrome
4) Added help page to guide users on how to use a web browser for rendering; added browser test to input page
5) Fixed a couple small bugs on the Overview dashboard

Version 4.2.1
May 4, 2017

1) Improved compatibility with Splunk 6.6
2) Fixed issue where users sometimes could not enable inputs

Version 4.2
April 7, 2017

Adding ability to only output results when they change

Version 4.1.3
April 3, 2017

1) Fixed issue where the host field could not be overridden
2) Reduced some unimportant log messages to debug level

Version 4.1.2
March 19, 2017

Added support for running the app on a Splunk free license

Version 4.1.1
March 13, 2017

Fixed issue where Firefox driver was not correctly added to the path on Windows

Version 4.1
March 9, 2017

1) Fixed issue where some sites could not be previewed
2) Fixed issue where selectors would not match an ID that was not lowercase
3) Added ability to include empty matches
4) Added ability to delete inputs

Version 4.0.2
Jan. 18, 2017

1) Fixed issue where HTTP authentication didn't work with Firefox
2) Fixed issue where Firefox rendering didn't work on headless environments
3) Other minor changes

Version 4.0.1
Dec. 3, 2016

Various bug fixes and minor improvements

Version 4.0
Dec. 1, 2016

Vastly updated UI, various bug fixes and lots of smaller enhancements

Version 3.2.1
Nov. 24, 2016

1) Improved compatibility with versions of Splunk
2) Fixed overly restrictive URL validation
3) Fixed issue where some parts of the stash file may not have been indexed, losing parts of large result sets
4) Fixed controller logs which were not sourcetyped correctly

Version 3.2
Sept. 21, 2016

* Added ability to view results in search from the modular input creation page
* Improved documentation on the search command options

Version 3.1.2
Sept. 20, 2016

Fixed problem where matches were not visible when the content is very long

Version 3.1.1
July 14, 2016

Fixed problem where you could not create new inputs

Version 3.1
July 11, 2016

Added ability to grant access to make inputs to non-admin users

Version 3.0
May 26, 2016

* Added ability to render pages using a browser (to get page contents after JS rendering has executed)
* MD5 and SHA224 hashes are now included in the results
* Added ability to output matches as separate fields
* Matches are now listed in results in the order that they were discovered

Version 2.1
May 24, 2016

* Simplified the data input configuration screen
* Added ability to include the raw content in case you want to do your own parsing in SPL
* Added ability to specify a custom string that will separate extracted values
* Fixed incorrect reporting of matches count

Version 2.0
May 3, 2016

* Added ability to crawl websites

Version 1.2.0
Jan. 3, 2016

* Added the ability to use the tag names as the field names
* Fixed issue where the selector would sometimes not match if the content was upper-case and the selector wasn't
* Added a BNF file for the search command

Version 1.1.3
Dec. 16, 2015

Password no longer must be re-typed every time an input is modified

Version 1.1.2
Nov. 30, 2015

Fixed issue where fields without spaces were not being extracted as multi-value fields by default

Version 1.1.1
Sept. 7, 2015

Updated to the latest version of the modular input library; should fix problems where the input crashes

Version 1.1
Aug. 24, 2015

Added ability to specify the user-agent string

Version 1.0.5
June 22, 2015

* Fixed issue where web input controller used the incorrect logger name
* Fixed issue where you could not select the sourcetype correctly in some cases
* Added a search command for performing web scrapes from the search page

Version 1.0.4
March 28, 2015

* Fixed issue where some files could not be parsed because lxml won't parse correctly encoded files sometimes
* Enhanced logging for when interval gap is too large and when checkpoint file could not be found

Version 1.0.3
Jan. 9, 2015

* Fixed issue where the input would not stay on the interval because it included processing time in the interval
* Fixed issue where the modular input logs were not sourcetyped correctly

Version 1.0.2
Nov. 29, 2014

Fixed issue where the input would:
* sometimes fail due to exception thrown from sleep() being interrupted
* sometimes fail due to splunkd connection failure
* ignore the host field that was set on the configuration page

Version 1.0.1
Nov. 12, 2014

Fixed issue where preview did not work

Version 1.0
Oct. 28, 2014

Added ability to use a proxy server

Version 0.9
Aug. 17, 2014

* Fixed issue where not all matches were returned
* Added preview dialog to modular input page
* Added raw_match_count to output, which counts CSS matches even if they included no text
* Fixed incompatibility with other apps that also import the modular_input base class
* Fixed issue where entering and then clearing the sourcetype causes an error
* Added ability to specify attributes that should be used for the field names

Version 0.8
July 13, 2014

Fixed problem where websites in non-ASCII encodings did not get decoded correctly

Version 0.7
July 11, 2014

Version 0.6
July 8, 2014

* Switched to multi-value output of matches and added transform for parsing match field
* Fixed exception that could happen if the web-page was not available
* Put authentication fields on a separate location on the manager page

Version 0.5
July 7, 2014

A Splunk input for retrieving and indexing information from web-pages
