
FILE DEDUPLICATION SOFTWARE PDF
Deduplicating PDF Files (Emails) Using the AutoPortfolio™ Plug-in For Adobe® Acrobat®

What is Email/Document De-Duplication?

Emails are one of the most important types of litigation documents. It is often necessary to compile hundreds or even thousands of emails for a single court case. Typically, a significant number of these emails belong to email "threads" and are redundant, because email replies almost always include the content of the previous emails. It is sufficient to keep only the last email from each "thread" and discard the intermediate emails. The process of finding unique documents (emails) is often referred to as "de-duplication". Detecting and discarding redundant documents can greatly reduce the number of documents/emails that need to be prepared during the electronic discovery process.

Introduction

The AutoPortfolio™ plug-in provides functionality for de-duplication of PDF documents. These can be PDF files created from emails or from any other kind of text document. The process is specifically fine-tuned for handling emails. The emails need to be converted into PDF format in order to be used in the de-duplication. This allows both emails and their attachments to be used in the de-duplication process. The conversion into PDF format is provided by both Adobe® Acrobat® and the AutoPortfolio™ plug-in.

What is a Duplicate File?

Any PDF file whose text is either identical to, or fully contained in, the text of another PDF file is considered a duplicate. It is not possible to use this process for scanned PDF files that have not been run through text recognition (OCR). The de-duplication process also does not compare images. There are other types of processing available for finding duplicate pages where the comparison is performed "visually" without using the actual text. The de-duplication can also instantly detect files that are completely identical at the "binary level".

Email Handling

The algorithm is specialized for processing email text, so that it avoids comparing email headers that may differ while the email text is the same. This can happen when the same email is received from multiple recipients, or was emailed to a group of people and was received by the same person more than once. There can be multiple unique emails in a single email thread if the original email or any of the replies contain attachments.

Export email messages (or whole folders) from Microsoft® Outlook® (or any other email app) into the PDF Portfolio format. This is standard functionality provided by Adobe® Acrobat®. The output is a single PDF Portfolio file with the emails converted into PDF, while all attachments remain in their native file format. A PDF Portfolio is an archive of other files, not a regular PDF document.
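
To make the duplicate definition used in this document more concrete (text that is identical to, or fully contained in, the text of another file, plus a separate "binary level" check), here is a minimal Python sketch. It is not the AutoPortfolio™ plug-in's actual algorithm. It assumes the text has already been extracted from each PDF, and the function names (`normalize`, `find_duplicates`, `binary_duplicates`) and the whitespace and case normalization are hypothetical choices made only for illustration.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase (an assumed normalization)."""
    return re.sub(r"\s+", " ", text).strip().lower()


def find_duplicates(docs: dict[str, str]) -> dict[str, str]:
    """Map each duplicate document name to a kept document that contains its text.

    `docs` maps a document name to its already-extracted text.
    """
    norm = {name: normalize(text) for name, text in docs.items()}
    # Longest text first: a document can only be contained in one that is
    # at least as long, so it is enough to compare against kept documents.
    order = sorted(norm, key=lambda n: (-len(norm[n]), n))
    kept: list[str] = []
    duplicates: dict[str, str] = {}
    for name in order:
        if not norm[name]:
            # No text (for example a scan without text recognition): skip.
            continue
        cover = next((k for k in kept if norm[name] in norm[k]), None)
        if cover is None:
            kept.append(name)
        else:
            duplicates[name] = cover
    return duplicates


def binary_duplicates(blobs: dict[str, bytes]) -> dict[str, str]:
    """Map each byte-identical file to the first file (by name) with the same bytes."""
    first_seen: dict[str, str] = {}
    duplicates: dict[str, str] = {}
    for name in sorted(blobs):
        digest = hashlib.sha256(blobs[name]).hexdigest()
        if digest in first_seen:
            duplicates[name] = first_seen[digest]
        else:
            first_seen[digest] = name
    return duplicates


if __name__ == "__main__":
    # A three-message thread in which each reply repeats the earlier text.
    thread = {
        "msg1": "Can we meet on Monday?",
        "msg2": "Monday works for me.\nCan we meet on Monday?",
        "msg3": "Booked room 4.\nMonday works for me.\nCan we meet on Monday?",
    }
    print(find_duplicates(thread))
```

In this example only `msg3`, the last email of the thread, is kept, and `msg1` and `msg2` are reported as duplicates of it. Real emails would also need their headers and quote markers stripped before comparison, which this sketch does not attempt.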
