SHA256 checksum (mail-parser-plus_101.tgz): 91dc54b32766fa70a02c85ebd1a879037caa1c7ce1bd5e32966272b9a658d4a5
SHA256 checksum (mail-parser-plus_10.tgz): 251cbedbf4dd969f9fa066cb5d0fd4c3498808751fc66dee758a3bd7e639d6f9

Mail-Parser Plus

This app is NOT supported by Splunk.
This app provides parsing and feature extraction for RFC 5322 compliant emails. It acts as a wrapper for the Python library mail-parser (https://github.com/SpamScope/mail-parser) and provides additional features that can be fed to an ML algorithm. Note that the attachment payload does not have to be in Splunk; only the attachment's headers are needed to extract attachment information such as content-type.

The intent of this app is to act as a wrapper for the Python library mail-parser (https://github.com/SpamScope/mail-parser) to parse an RFC 5322 compliant email in Splunk, and to extract other features from the email to feed to an ML algorithm. The app provides a single custom command.

Available at:
GitHub

Version: 1.0

Author: Nathan Worsham

Created as part of MSDS692 Data Science Practicum II at Regis University, 2018
See associated blog for detailed information on the project.

Description and Use-cases

This app provides parsing and feature extraction for RFC 5322 compliant emails. Note that the attachment payload does not have to be in Splunk; only the attachment's headers are needed to extract attachment information such as content-type.
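To illustrate why the attachment payload is not needed, here is a minimal sketch that reads attachment metadata from the MIME part headers alone. It uses Python's standard-library email package rather than mail-parser, so it is not the app's own code; the sample message, the attachment_info function and the output key names are assumptions made for illustration.

```python
from email import message_from_string

# Sample message (made up for illustration). The attachment part carries
# only its headers; the base64 payload has been dropped entirely.
RAW = """\
From: alice@example.com
To: bob@example.org
Subject: Report
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XYZ"

--XYZ
Content-Type: text/plain; charset="utf-8"

See attached.
--XYZ
Content-Type: application/pdf; name="report.pdf"
Content-Disposition: attachment; filename="report.pdf"
Content-Transfer-Encoding: base64

--XYZ--
"""


def attachment_info(raw):
    """Return filename and content type for every attachment part."""
    msg = message_from_string(raw)
    found = []
    for part in msg.walk():
        # Only parts explicitly marked as attachments are reported.
        if part.get_content_disposition() == "attachment":
            found.append({
                "filename": part.get_filename(),
                "content_type": part.get_content_type(),
            })
    return found


attachments = attachment_info(RAW)
print(attachments)
```

The content type and filename come entirely from the part headers, which is why an event that has had its attachment bodies stripped can still yield this information.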

How to use

Install

Follow the normal app installation process described at https://docs.splunk.com/Documentation/AddOns/released/Overview/AboutSplunkadd-ons. In short, download the app and install it from the Web UI, or extract the file into the $SPLUNK_HOME/etc/apps folder.

Custom Commands

mailparser

Description

A wrapper for the Python library mail-parser (https://github.com/SpamScope/mail-parser) to parse an RFC 5322 compliant email in Splunk, and to extract other features from the email to feed to an ML algorithm. Default settings: all options turned on, true/false values returned as 1/0, and the message read from the _raw field.

Syntax

* | mailparser [messagefield=<field>] [all_headers=<bool>] [adv_attrs=<bool>] [true_label=<1|true|True|T|yes|Yes|Y>]

Optional Arguments

messagefield
Syntax: messagefield=<field>
Description: The field containing the entire email. Normally this is _raw, which is the default if the option is not set.
Usage: Takes a single field name.
Default: _raw

all_headers
Syntax: all_headers=<bool>
Description: If false, returns only basic header information: To, From, Subject, Date. If true, all header information is parsed.
Usage: Boolean value. True or False; true or false, t or f, 0 or 1
Default: True
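
The difference between the two all_headers settings can be sketched as follows. This is an illustration built on Python's standard-library email package, not the app's implementation; the extract_headers function, the sample message and the shape of the returned dictionary are assumptions.

```python
from email import message_from_string

# The four basic headers named in the all_headers description.
BASIC_HEADERS = ("To", "From", "Subject", "Date")


def extract_headers(raw, all_headers=True):
    """Return header values keyed by header name (hypothetical helper)."""
    msg = message_from_string(raw)
    if all_headers:
        out = {}
        # Headers such as Received can repeat, so values are kept in lists.
        for name, value in msg.items():
            out.setdefault(name, []).append(value)
        return out
    # Basic mode: only the four headers above, when present.
    return {name: [msg[name]] for name in BASIC_HEADERS if msg[name] is not None}


RAW = (
    "Return-Path: <alice@example.com>\n"
    "From: Alice <alice@example.com>\n"
    "To: bob@example.org\n"
    "Subject: Hello\n"
    "Date: Wed, 15 Aug 2018 10:00:00 +0000\n"
    "X-Mailer: demo\n"
    "\n"
    "Body text.\n"
)

basic = extract_headers(RAW, all_headers=False)
full = extract_headers(RAW, all_headers=True)
print(sorted(basic))
print(sorted(full))
```

With all_headers set to false, headers such as Return-Path and X-Mailer are dropped, which keeps the result small when only the basic fields are of interest.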

adv_attrs
Syntax: adv_attrs=<bool>
Description: If true, extracts features such as from_tld, body_len, has_masq_link, etc. See below for complete list and description.
Usage: Boolean value. True or False; true or false, t or f, 0 or 1
Default: True

true_label
Syntax: true_label=<1|true|True|T|yes|Yes|Y>
Description: String value that determines how true and false values appear in the output. The default is 1 for true and 0 for false.
Usage: 1|true|True|T|yes|Yes|Y
Default: 1
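
As a rough sketch of how the boolean options and true_label could be interpreted, the helpers below accept the spellings listed in the Usage lines and render a flag with a chosen label. The function names are hypothetical, and only the 1/0 pairing is stated in this README; the other false counterparts (false, False, F, no, No, N) are assumptions, as is the case-insensitive matching.

```python
# Spellings accepted for boolean options, per the Usage lines above.
# Matching is case-insensitive here, which is an assumption.
TRUE_VALUES = {"true", "t", "1"}
FALSE_VALUES = {"false", "f", "0"}

# Hypothetical pairing of each true_label with a false counterpart.
# Only "1" -> "0" is confirmed by the README; the rest are assumed.
LABEL_PAIRS = {
    "1": "0",
    "true": "false",
    "True": "False",
    "T": "F",
    "yes": "no",
    "Yes": "No",
    "Y": "N",
}


def parse_bool(value):
    """Convert an option value such as 'T' or '0' to a Python bool."""
    text = str(value).strip().lower()
    if text in TRUE_VALUES:
        return True
    if text in FALSE_VALUES:
        return False
    raise ValueError(f"not a boolean option value: {value!r}")


def render_bool(flag, true_label="1"):
    """Render a bool using the label style selected by true_label."""
    if true_label not in LABEL_PAIRS:
        raise ValueError(f"unsupported true_label: {true_label!r}")
    return true_label if flag else LABEL_PAIRS[true_label]


print(parse_bool("T"), parse_bool("0"))
print(render_bool(True), render_bool(False), render_bool(False, "Yes"))
```

Numeric labels (the default) are usually the most convenient choice when the extracted fields are passed straight to an ML algorithm, since no further encoding step is needed.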

Extracted Features (using adv_attrs option)

from_tld: String - Top Level Domain of the sender
return_path_match_from: Boolean - If Return_Path matches From
body_len: Integer - Character count of the body
subject_len: Integer - Character count of the subject
link: String - Multi-value list of HTML <a> tags
num_link: Integer - Count of HTML <a> tags
num_uniq_link: Integer - Count of distinct HTML <a> tags
has_repeat_link: Boolean - If any href values from HTML <a> tags are repeated
has_unsubscribe_link: Boolean - If any text values from HTML <a> tags have the word unsubscribe within
num_email_link: Integer - Count of href values from HTML <a> tags that are mailto:
masq_link: String - Multi-value list of href values (actual target) from HTML <a> tags where the link text purports to be a URL but the target does not match the link text (masquerading link)
masq_link_tld: String - Multi-value list of Top Level Domains of masquerading link targets
has_masq_link: Boolean - If any HTML <a> tag has link text that purports to be a URL but a target that does not match the link text (masquerading link)
num_masq_link: Integer - Count (including duplicates) of HTML <a> tags whose link text purports to be a URL but whose target does not match the link text (masquerading link)
mail_text: String - Subject and Body text combined, usually without HTML formatting (from bs4)
has_html_content: Boolean - If content in body is set to html
has_javascript: Boolean - If script type in body is set to javascript
has_inline_img: Boolean - If content in body has an inline image
url: String - Multi-value list of URLs from body
has_url: Boolean - If URL exists in the body
num_url: Integer - Count of URLs in the body
num_uniq_url: Integer - Count of distinct URLs in the body
url_len: Integer - Multi-value list of the lengths of URLs from body
url_tld: String - Multi-value list (including duplicates) of Top Level Domains of URLs from body
url_uniq_tld: String - Multi-value list of distinct Top Level Domains of URLs from body
num_repeat_url: Integer - Count of repeating URLs in the body
email_addr: String - Multi-value list of email addresses from body
has_email_addr: Boolean - If an email address exists in the body
num_email_addr: Integer - Count of email addresses in the body
num_email_addr_url: Integer - Count of distinct email addresses in the body
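
To make the link-related features above more concrete, here is a minimal sketch that derives a few of them from an HTML body. It uses only the Python standard library rather than the parsers the app relies on, so it is an approximation. The LinkCollector class, the helper names, the sample HTML, and the specific rules (comparing hostnames to decide whether a link masquerades, and counting distinct href values for num_uniq_link) are all assumptions.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs for every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def looks_like_url(text):
    # Assumed rule for "link text purports to be a URL".
    return text.lower().startswith(("http://", "https://", "www."))


def host_of(url):
    if "://" not in url:
        url = "http://" + url
    return (urlsplit(url).hostname or "").lower()


def link_features(html):
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    hrefs = [href for href, _ in parser.links]
    # Assumed rule: a link masquerades when its text is URL-like and the
    # hostname in the text differs from the hostname of the real target.
    masq = [href for href, text in parser.links
            if looks_like_url(text) and host_of(text) != host_of(href)]
    return {
        "num_link": len(hrefs),
        "num_uniq_link": len(set(hrefs)),
        "has_repeat_link": len(set(hrefs)) < len(hrefs),
        "has_unsubscribe_link": any("unsubscribe" in text.lower()
                                    for _, text in parser.links),
        "num_email_link": sum(h.lower().startswith("mailto:") for h in hrefs),
        "masq_link": masq,
        "has_masq_link": bool(masq),
        "num_masq_link": len(masq),
    }


SAMPLE = (
    '<p><a href="http://evil.example.net/login">http://www.mybank.example.com</a> '
    '<a href="https://news.example.org/">Read more</a> '
    '<a href="https://news.example.org/">Read more</a> '
    '<a href="mailto:help@example.org">Contact</a> '
    '<a href="https://news.example.org/unsub">Unsubscribe</a></p>'
)

features = link_features(SAMPLE)
for name, value in features.items():
    print(name, value)
```

In the sample, the first link shows a bank URL as its text while pointing at a different host, which is the pattern the masq_link, has_masq_link and num_masq_link features are meant to capture.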

Support

Support is provided through Splunkbase (click Contact Developer), through Splunk Answers, or by submitting an issue on GitHub.

Documentation

This README file constitutes the documentation for the app and will be kept up to date on GitHub as well as on the Splunkbase page.

Release Notes

Version 1.0.1 (Sept. 14, 2018)
Version 1.0 (Aug. 15, 2018)