Retrieval Information from Unstructured Documents

A simple way to check whether certain information is present in a set of documents stored in a folder. Documents can be provided in any format (Office, PDF, OpenOffice, etc.). The app searches for, extracts, and indexes information such as:
• email addresses
• tax identification codes (default: Italian format)
• telephone numbers
• names / entities
• bank account numbers (default: Italian format)
• postal addresses (default: Italian format)
It uses a Python script to invoke the Apache Tika libraries, applies regex rules to identify the information, and sends only the matches to the Splunk HTTP Event Collector (HEC), so the rest of the document contents is never recorded in Splunk.
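The pipeline described above (extract text with Tika, match with regex, forward only the matches to HEC) can be sketched roughly as follows. This is a minimal illustration, not the app's actual script: the regex patterns, the function names, and the `document:pii` sourcetype are assumptions made for the example, and the Tika call relies on the third-party `tika` Python package being installed. Only three of the listed categories are covered.

```python
import json
import re
import urllib.request

# Illustrative patterns only (assumptions); the app's real regex rules
# are not shown in the listing.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # Italian tax code (codice fiscale), standard 16-character layout.
    "tax_code_it": re.compile(r"\b[A-Z]{6}\d{2}[A-EHLMPR-T]\d{2}[A-Z]\d{3}[A-Z]\b"),
    # Italian IBAN: IT + 2 check digits + 1 letter + 10 digits + 12 alphanumerics.
    "iban_it": re.compile(r"\bIT\d{2}[A-Z]\d{10}[0-9A-Z]{12}\b"),
}


def extract_text(path):
    """Return the plain text of a document using Apache Tika."""
    from tika import parser  # third-party package, imported lazily

    parsed = parser.from_file(path)
    return parsed.get("content") or ""


def find_matches(text):
    """Return the unique matches for each pattern, sorted for stable output."""
    return {name: sorted(set(rx.findall(text))) for name, rx in PATTERNS.items()}


def build_hec_payload(path, matches, sourcetype="document:pii"):
    """Build an HEC event that carries only the matches, not the document text."""
    return {
        "source": path,
        "sourcetype": sourcetype,  # hypothetical sourcetype name
        "event": {"file": path, "matches": matches},
    }


def send_to_hec(payload, hec_url, token):
    """POST one event to a Splunk HEC endpoint (.../services/collector/event)."""
    request = urllib.request.Request(
        hec_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Splunk {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

A typical run would call `extract_text` on each file in the folder, pass the result to `find_matches`, and send the output of `build_hec_payload` with `send_to_hec`. Because the payload is built from the matches alone, the full document text never leaves the script.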

Built by Fabio Saulli

Latest Version 1.0.1
August 5, 2024
Compatibility
Not Available
Platform Version: 9.4, 9.3, 9.2, 9.1, 9.0, 8.2, 8.1, 8.0
Rating

0

(0)

Support
Developer Supported app

Categories

Created By

Fabio Saulli

Type

app

Downloads

72

Resources
