Auto-Classify Record

The Auto-Classify Documents action lets you easily move documents into their corresponding Category, SubCategory and Name.

This action is part of our Indexing Automation toolkit. If you have records where all document types are grouped into one document, you can use this action to split them out into their proper document types.

Options

Commit Classify - Turn this ON to have the system split the document(s) into their respective Category, SubCategory and Name. Turn it OFF to have the system record the proposed classifications so a user can preview and change them before they are persisted.

Split Page Handling - When the system determines that it needs to split, you can have it either keep or discard the page where the split marker is found.

Classifications (Enterprise Feature)

Break Codes

Use these to tell the indexing engine to find classifications by OCR, Bar Code or both. Fill in the fields in this section that fit your needs. When a match is found, the system knows it needs to classify at that point.

You can use REGEX and LIKE values for matching. This allows you to work with patterns instead of exact value matches.

LIKE statements are NOT case sensitive. Here are the special characters you can use with some examples:

? = Any single character. Example: "SM?TH" will match "Smith" or "SMYTH".

* = Zero or more characters. Example: "SM*TH" will match "SMITH" or "SMYTH" or "SMabcdefgTH".

# = Any single digit. Example: "SMITH#" will match "SMITH1" or "SMITH8" but not "SMITH10" unless you do "SMITH##" or "SMITH#*".
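
The LIKE wildcards above can be sketched in Python by translating them into a regular expression. This is only an illustration, not the product's implementation: the names like_to_regex and like_match are hypothetical, and it assumes a LIKE value must match the whole word, which is what the "SMITH#" example suggests.

```python
import re


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a LIKE value into a compiled, case-insensitive regex."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")       # any single character
        elif ch == "*":
            parts.append(".*")      # zero or more characters
        elif ch == "#":
            parts.append(r"\d")     # any single digit
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts), re.IGNORECASE)


def like_match(pattern: str, word: str) -> bool:
    """Assumption: the LIKE value has to cover the entire word."""
    return like_to_regex(pattern).fullmatch(word) is not None


print(like_match("SM?TH", "Smith"))    # True
print(like_match("SMITH#", "SMITH10"))  # False
print(like_match("SMITH#*", "SMITH10"))  # True
```

Escaping every non-wildcard character keeps punctuation in a LIKE value (such as a period) from being read as regex syntax.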

REGEX means using Regular Expressions for pattern matching. The system uses the IsMatch() method to look for strings that match your pattern. To use REGEX in the system, you must prefix the pattern with "REGEX:" so the system knows you want to use REGEX logic. For instance:

"REGEX:\d{10}" will match any word that contains at least 10 consecutive digits

"REGEX:^\d{10}$" will match any word that consists of exactly 10 digits

"REGEX:[a-zA-Z0-9]" will match any word that contains at least one alphanumeric character

"REGEX:^[a-zA-Z0-9]{3,6}$" will match any word that is made up only of alphanumeric characters and is from 3 to 6 characters long

Use the Negation Codes to cancel a match. For instance, you may want to split on the word "INVOICE" but not when it is also on a page that contains "SUMMARY". In this case, enter "INVOICE" in the OCR, Barcode or Both fields and "SUMMARY" in the Negation Codes field.
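
The "REGEX:" prefix and the negation rule described above can be sketched together in Python. This is a rough model under stated assumptions, not the product's code: code_matches and should_split are hypothetical names, Python's re.search stands in for IsMatch() on the assumption that both succeed when the pattern matches anywhere in the word, and plain (non-REGEX) codes are assumed to match whole words without regard to case.

```python
import re

REGEX_PREFIX = "REGEX:"


def code_matches(code: str, word: str) -> bool:
    """Return True if one code matches one word from the page."""
    if code.startswith(REGEX_PREFIX):
        # Strip the prefix and search for the pattern anywhere in the word.
        return re.search(code[len(REGEX_PREFIX):], word) is not None
    # Assumption: plain codes are compared as whole words, ignoring case.
    return code.casefold() == word.casefold()


def any_match(codes, words) -> bool:
    """True if any code matches any word."""
    return any(code_matches(c, w) for c in codes for w in words)


def should_split(page_words, break_codes, negation_codes) -> bool:
    """A break code must match and no negation code may match."""
    return any_match(break_codes, page_words) and not any_match(
        negation_codes, page_words
    )


print(should_split(["Invoice", "Total"], ["INVOICE"], ["SUMMARY"]))    # True
print(should_split(["Invoice", "Summary"], ["INVOICE"], ["SUMMARY"]))  # False
```

Checking the negation codes against the same page words as the break codes mirrors the "INVOICE" and "SUMMARY" example: the page only triggers a split when the first is present and the second is absent.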

By Pages

If you know the pages of the classifications ahead of time, you can add them here and skip having to configure text to split on. This setting accepts single numbers, lists of numbers and ranges of numbers. Specify a list as "1,2,3" and a range as "1-3". Use [PAGECOUNT] to get the page count for the document being processed.
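
As an illustration of the page syntax above, here is a minimal Python sketch that expands a page setting into a list of page numbers. The function name parse_pages is hypothetical, and using [PAGECOUNT] as the end of a range (for example "5-[PAGECOUNT]") is an assumption; the product may be stricter or more lenient about what it accepts.

```python
def parse_pages(spec: str, page_count: int) -> list:
    """Expand a setting such as "1,5-[PAGECOUNT]" into page numbers."""
    # Substitute the token with the actual page count of the document.
    spec = spec.replace("[PAGECOUNT]", str(page_count))
    pages = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start, end = part.split("-", 1)
            pages.extend(range(int(start), int(end) + 1))  # inclusive range
        else:
            pages.append(int(part))
    return pages


print(parse_pages("1,2,3", 10))           # [1, 2, 3]
print(parse_pages("1-3", 10))             # [1, 2, 3]
print(parse_pages("1,5-[PAGECOUNT]", 7))  # [1, 5, 6, 7]
```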

Regions

You can configure the system to look for the values (see Break Codes above) only in specific regions of the page. This greatly reduces the chance of a false positive match. For instance, if you are looking for "Bill of Sale" anywhere on the page, then a comment in the middle of a document that says "check the bill of sale" could trigger the split. However, if you limit the region to the top middle area of the page, that comment should not cause a split.
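
The effect of limiting a match to a region can be sketched as follows. Everything here is hypothetical and for illustration only: the Word and Region types, the 0.0 to 1.0 page coordinates, and the phrase_in_region function are assumptions, not the way the product stores OCR results or regions.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # horizontal position, 0.0 (left edge) to 1.0 (right edge)
    y: float  # vertical position, 0.0 (top edge) to 1.0 (bottom edge)


@dataclass
class Region:
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, word: Word) -> bool:
        return (self.left <= word.x <= self.right
                and self.top <= word.y <= self.bottom)


def phrase_in_region(words, phrase: str, region: Region) -> bool:
    """Look for the phrase using only the words that fall inside the region."""
    text = " ".join(w.text for w in words if region.contains(w))
    return phrase.casefold() in text.casefold()


# A hypothetical "top middle" region: middle half of the width, top fifth of the page.
top_middle = Region(left=0.25, top=0.0, right=0.75, bottom=0.2)
title = [Word("Bill", 0.42, 0.05), Word("of", 0.50, 0.05), Word("Sale", 0.58, 0.05)]
comment = [Word("check", 0.30, 0.5), Word("the", 0.38, 0.5), Word("bill", 0.45, 0.5),
           Word("of", 0.52, 0.5), Word("sale", 0.60, 0.5)]

print(phrase_in_region(title + comment, "Bill of Sale", top_middle))  # True
print(phrase_in_region(comment, "Bill of Sale", top_middle))          # False
```

The comment in the middle of the page contains the same phrase, but because its words fall outside the region they are never considered for a match.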

Regions are only available when using break codes.