SHA256 checksum (urlparser_100.tgz) e559324d55393941d8aefefb18ddb18bfbdf060baba91e86ef943d7e040763c9

URLParser

URLParser is a custom search command designed to parse URLs. Because it relies on the chunked protocol for custom search commands, URLParser is compatible with Splunk 6.4.0 and above.

URLParser is a community-supported app. Compared to UTBox, URLParser is faster, extracts more fields, and is easier to use.

 ... | urlparser [field=fieldname] [listname="*|iana|mozilla|..."] [mode=[simple|extended]]

Extracted fields

URLParser extracts the following fields from the submitted URLs:

  • url_domain
  • url_domain_without_tld
  • url_fragment
  • url_hostname
  • url_netloc
  • url_params
  • url_password
  • url_path
  • url_port
  • url_query
  • url_scheme
  • url_subdomain
  • url_subdomain_depth
  • url_subdomain_parts
  • url_tld
  • url_username

The url_subdomain_parts field can also be processed with the Splunk spath command to access the individual parts of the subdomain (url_subdomain.1, url_subdomain.2, ...).

Usage

The command signature is the following:

... | urlparser [field=fieldname] [mode=[simple|extended]] [listname="listname1|listname2|..."]

All arguments are optional. The default values are:

  • field: url
  • mode: extended
  • listname: mozilla

The simplest way to call urlparser is as follows:

... | urlparser

In the previous example, urlparser automatically works on the 'url' field, loads the 'mozilla' suffix list, and performs an 'extended' extraction of the fields.

Example

This example demonstrates the parsing of a 'complex' URL and how the Splunk spath command can be used to leverage the url_subdomain_parts field.

 | stats count
 | fields - count
 | eval url = "hTTp://je@n:pass:w@rd@images.www.gOOGle.Co.uk:256/iDNex.php?var=CALue32&ouech=gros#pouet"
 | urlparser
 | spath input=url_subdomain_parts
 | transpose

This simple example also illustrates that URLParser leaves the case of the input URL unchanged, which is fundamental when working with URLs containing Base64 data, for example (exfiltration scenarios and the like). Users who want to normalize URLs to lowercase can easily do so with Splunk's eval command and its lower() function.

Pro Tips

It is a good habit to filter URLs before sending them to urlparser, to avoid empty url fields or url values set to '-' (often seen in proxy logs).

... | search url=* url!="-" | urlparser

In some situations, using the stats command to deduplicate repeated URLs before parsing can be desirable.

Scripted Lookup

URLParser is also accessible as a scripted lookup. This is useful in situations where the custom search command cannot be used, for example when building a data model. The scripted lookup is slower than the custom search command.

... | eval list="iana|mozilla" | lookup urlparser_lookup url list

Passing a string argument to a scripted lookup requires a little trick, as illustrated in the previous example: the lists to use are set to 'iana' and 'mozilla' by a preliminary call to the Splunk eval command, and the resulting field is then passed to the lookup.
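
To make the reason for this trick more concrete, here is a minimal Python sketch of how a scripted (external) lookup typically exchanges data with Splunk: it reads CSV rows on standard input and writes the completed rows as CSV on standard output, so the only way to hand it a parameter is as a column of the row. This is not URLParser's actual script. The function name run_lookup, the choice of output columns, and the way the list names are collected are assumptions made for illustration only.

```python
import csv
import io
from urllib.parse import urlparse


def run_lookup(infile, outfile):
    """Hypothetical external-lookup loop: CSV rows in, completed CSV rows out."""
    reader = csv.DictReader(infile)
    fieldnames = list(reader.fieldnames or [])
    # Output columns chosen for illustration; the real lookup fills many more.
    for extra in ("url_scheme", "url_netloc"):
        if extra not in fieldnames:
            fieldnames.append(extra)
    writer = csv.DictWriter(outfile, fieldnames=fieldnames)
    writer.writeheader()

    seen_lists = []
    for row in reader:
        # The 'list' value arrives as an ordinary column, which is why the
        # search sets it with eval before calling the lookup.
        names = [n for n in (row.get("list") or "").split("|") if n]
        seen_lists.append(names)
        parsed = urlparse(row.get("url") or "")
        row["url_scheme"] = parsed.scheme
        row["url_netloc"] = parsed.netloc
        writer.writerow(row)
    return seen_lists


if __name__ == "__main__":
    src = io.StringIO("url,list\r\nhttp://example.com/a,iana|mozilla\r\n")
    dst = io.StringIO()
    print(run_lookup(src, dst))
    print(dst.getvalue())
```

In a real deployment the two streams would be sys.stdin and sys.stdout; in-memory streams are used here only to keep the sketch self-contained.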

Where are the statistical functions from UTBox?

URLParser focuses exclusively on URL parsing. In short, computing the Shannon entropy of a word, whether it is a domain name or not, is not part of the process of parsing a URL.

Options

mode

The mode option accepts two values, 'simple' or 'extended', so its usage is straightforward:

  • mode=simple
  • mode=extended

If an unknown value is submitted, the default mode 'extended' is used.

The 'simple' mode only calls Python's urlparse() function to extract basic elements from URLs, while the 'extended' mode extracts many more elements such as the TLD, the subdomain, the domain without the TLD, etc.
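
As a rough idea of what urlparse() provides on its own, the sketch below maps the attributes of Python's standard urlparse() result onto some of the field names listed earlier. The helper name simple_fields and the exact mapping are assumptions, not URLParser's code. The sketch uses Python 3's urllib.parse; in Python 2 the same function lives in the urlparse module.

```python
from urllib.parse import urlparse


def simple_fields(url):
    """Hypothetical mapping of urlparse() attributes to url_* field names."""
    parsed = urlparse(url)
    return {
        "url_scheme": parsed.scheme,
        "url_netloc": parsed.netloc,
        "url_path": parsed.path,
        "url_params": parsed.params,
        "url_query": parsed.query,
        "url_fragment": parsed.fragment,
        "url_username": parsed.username,
        "url_password": parsed.password,
        "url_hostname": parsed.hostname,
        "url_port": parsed.port,
    }


if __name__ == "__main__":
    url = ("hTTp://je@n:pass:w@rd@images.www.gOOGle.Co.uk:256"
           "/iDNex.php?var=CALue32&ouech=gros#pouet")
    for name, value in simple_fields(url).items():
        print(name, "=", value)
```

Note that the standard library lowercases the scheme and the hostname attribute, while netloc, path, query and fragment keep their original case. How URLParser itself keeps the case of every field unchanged is not described here, so this sketch should not be read as reproducing its exact output.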

listname

The listname option specifies one or more lists of known TLDs to load. URLParser ships with two default lists, the IANA list and the Mozilla Public Suffix List, but users can define their own custom lists to either complement or replace the default lists. Multiple lists can be loaded by separating their names with "|" (pipe).

Examples:

  • listname="iana" : load the TLD from the list 'iana' (one of the default lists)
  • listname="custom" : load the TLD from the list 'custom' (user defined list)
  • listname="mozilla|iana" : load the TLD from both 'iana' and 'mozilla' lists (default provided lists)
  • listname="iana|pouet" : load the TLD from the list 'iana' (default list) and the list 'pouet' (user defined list)
  • listname="iana|pouet|custom|mylist" : load the TLD from the lists 'iana', 'pouet', 'custom' and 'mylist'
  • listname="*" : load the TLD from all available lists (lists present in the suffix_lists directory)

There is no limit to the number of lists that can be loaded, and TLDs present in multiple lists are loaded only once (the underlying logic is a boolean OR).

List files are stored under the application directory ($APP_DIR/suffix_lists) and must be named using this syntax: suffix_list_<name in lowercase>.dat

Examples:

  • suffix_list_mozilla.dat
  • suffix_list_iana.dat
  • suffix_list_custom.dat
  • suffix_list_pouet.dat
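
The naming convention and the listname rules described above can be sketched in Python as follows. This is an illustration under stated assumptions, not URLParser's implementation: the function names resolve_lists and merge_entries are hypothetical, and silently skipping a name that has no matching file is a guess, since the behaviour for missing lists is not documented here.

```python
PREFIX = "suffix_list_"
EXTENSION = ".dat"


def resolve_lists(listname, available_files):
    """Map a listname value such as 'iana|pouet' or '*' to list file names."""
    # Index the files that follow the suffix_list_<name>.dat convention.
    by_name = {
        f[len(PREFIX):-len(EXTENSION)]: f
        for f in available_files
        if f.startswith(PREFIX) and f.endswith(EXTENSION)
    }
    if listname.strip() == "*":
        names = sorted(by_name)
    else:
        names = [n.strip().lower() for n in listname.split("|") if n.strip()]
    # dict.fromkeys drops repeated names while keeping their order.
    # Assumption: names without a matching file are ignored.
    return [by_name[n] for n in dict.fromkeys(names) if n in by_name]


def merge_entries(lists_of_entries):
    """Union of all entries: a TLD present in several lists is kept once."""
    merged = set()
    for entries in lists_of_entries:
        merged.update(entries)
    return merged


if __name__ == "__main__":
    files = [
        "suffix_list_mozilla.dat",
        "suffix_list_iana.dat",
        "suffix_list_custom.dat",
        "suffix_list_pouet.dat",
    ]
    print(resolve_lists("iana|pouet", files))
    print(resolve_lists("*", files))
    print(sorted(merge_entries([["com", "uk"], ["com", "pouet"]])))
```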

Creating a custom list

This section describes the format expected for the content of a custom list:

  • One TLD per line
  • Comments are ignored (lines starting with "#" or "//")
  • TLD must NOT start with a dot (".com" is wrong, "com" is correct)
  • The Mozilla Public Suffix List logic is supported (wildcards and exclamation marks)

Example:

// This is my custom list
pouet
*.yata
!coco.yata

Line 1: define "pouet" as a TLD.
www.domain.pouet: TLD=pouet, Domain=domain.pouet

Line 2: define that everything under ".yata" is part of the TLD
www.domain.cw.yata: TLD=cw.yata, Domain=domain.cw.yata
www.domain.hehe.yata: TLD=hehe.yata, Domain=domain.hehe.yata

Line 3: define an exception for the .yata TLD: coco.yata is NOT a TLD.
www.domain.coco.yata: TLD=yata, Domain=coco.yata
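
The three rules above can be reproduced with a short Python sketch of Public Suffix List style matching. It is a simplified illustration, not URLParser's code: the function names load_rules and split_hostname are hypothetical, an exception rule is assumed to take priority over any other rule, the longest matching rule is assumed to win otherwise, and a hostname that matches no rule falls back to using its last label as the TLD.

```python
def load_rules(text):
    """Parse list content into (rules, exceptions), ignoring comments."""
    rules, exceptions = set(), set()
    for raw in text.splitlines():
        line = raw.strip().lower()
        if not line or line.startswith(("#", "//")):
            continue
        if line.startswith("!"):
            exceptions.add(line[1:])
        else:
            rules.add(line)
    return rules, exceptions


def split_hostname(hostname, rules, exceptions):
    """Split a hostname into TLD, domain and subdomain, preserving its case."""
    labels = hostname.split(".")
    lowered = [label.lower() for label in labels]
    n = len(labels)
    tld_len = None
    # Exception rules win: the TLD is the exception minus its leftmost label.
    for i in range(n):
        if ".".join(lowered[i:]) in exceptions:
            tld_len = n - i - 1
            break
    if tld_len is None:
        tld_len = 1  # assumed fallback when no rule matches
        # Scanning from the longest candidate makes the longest rule win.
        for i in range(n):
            exact = ".".join(lowered[i:])
            wildcard = ".".join(["*"] + lowered[i + 1:])
            if exact in rules or wildcard in rules:
                tld_len = n - i
                break
    tld = ".".join(labels[n - tld_len:])
    domain = ".".join(labels[n - tld_len - 1:]) if n > tld_len else ""
    subdomain = ".".join(labels[:n - tld_len - 1]) if n > tld_len + 1 else ""
    return {"url_tld": tld, "url_domain": domain, "url_subdomain": subdomain}


if __name__ == "__main__":
    custom = "// This is my custom list\npouet\n*.yata\n!coco.yata\n"
    rules, exceptions = load_rules(custom)
    for host in ("www.domain.pouet", "www.domain.cw.yata", "www.domain.coco.yata"):
        print(host, split_hostname(host, rules, exceptions))
```

Run against the custom list from the example, this sketch yields the same TLD and domain values as the three cases listed above.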

Is it fast?

These tests are only an indication of performance. They were run on a MacBook Pro over a sample dataset of proxy logs with Splunk 6.5.1.

URLParser (scripted lookup)

search url!=- url=* | head 200000 | eval list="mozilla|iana"| lookup urlparser_lookup url list

This search has completed and has returned 5,123 results by scanning 204,129 events in 81.6 seconds

URLParser (custom search command)

search url!=- url=* | head 200000 | urlparser listname="mozilla|iana"

This search has completed and has returned 5,123 results by scanning 204,129 events in 26.91 seconds

As a reference point for comparison, here are the results with UTBox:

search url!=- url=* | head 200000 | eval list="*" | lookup ut_parse_extended_lookup url list

This search has completed and has returned 5,123 results by scanning 204,129 events in 83.123 seconds

Troubleshooting

URLParser execution logs can be found under $SPLUNK_HOME/var/log/splunk/urlparser.log

History

  • v1.0, December 2016
    • First release

Release Notes

Version 1.0.0
Nov. 29, 2016
