The Processors Module provides classes for loading, parsing, and transforming data, covering dialogues, documents, and URLs. Each class is described below together with the parameters it supports for customization.

Splits a Dialog object into separate Dialog objects, one per message, so that each message can be processed independently according to its role.

acceptedRoles: List of message roles to include in the split. If empty, all messages are accepted.
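
A minimal Python sketch of per-message splitting, assuming a Dialog is simply an ordered list of role-tagged messages. The Message and Dialog dataclasses and the split_by_message function are hypothetical stand-ins for illustration, not the module's actual API; accepted_roles plays the part of acceptedRoles.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    role: str  # e.g. "user" or "assistant"
    text: str


@dataclass
class Dialog:
    messages: list = field(default_factory=list)


def split_by_message(dialog, accepted_roles=None):
    """Return one single-message Dialog per message whose role is accepted.

    Mirrors acceptedRoles: an empty (or missing) role list accepts every message.
    """
    accepted = set(accepted_roles or [])
    return [
        Dialog([msg])
        for msg in dialog.messages
        if not accepted or msg.role in accepted
    ]


dialog = Dialog([Message("user", "Hi"), Message("assistant", "Hello"), Message("user", "Bye")])
print([d.messages[0].text for d in split_by_message(dialog, ["user"])])  # ['Hi', 'Bye']
```

Returning single-message Dialog objects keeps each message's role attached, so later steps can still filter or route by role.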

Splits a Dialog into separate Dialog objects, one per line of text, for line-by-line processing.

lastMessageCount: Number of most recent messages to consider for splitting.
acceptedRoles: List of message roles to consider for splitting. If empty, all roles are included.
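
The line splitter can be sketched the same way. This assumes lastMessageCount keeps only the N most recent messages and that the role filter is applied afterwards; the order of the two filters, treating None as "all messages", and skipping blank lines are assumptions. select_messages and split_by_line are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    role: str
    text: str


@dataclass
class Dialog:
    messages: list = field(default_factory=list)


def select_messages(dialog, last_message_count=None, accepted_roles=None):
    """Keep the last N messages (all if None), then keep accepted roles (all if empty)."""
    msgs = dialog.messages
    if last_message_count is not None:
        # msgs[-0:] would return everything, so treat 0 explicitly as "no messages".
        msgs = msgs[-last_message_count:] if last_message_count > 0 else []
    accepted = set(accepted_roles or [])
    return [m for m in msgs if not accepted or m.role in accepted]


def split_by_line(dialog, last_message_count=None, accepted_roles=None):
    """Return one single-message Dialog per non-blank line of the selected messages."""
    result = []
    for msg in select_messages(dialog, last_message_count, accepted_roles):
        for line in msg.text.splitlines():
            if line.strip():  # assumption: blank lines are skipped
                result.append(Dialog([Message(msg.role, line)]))
    return result


dialog = Dialog([Message("user", "a\nb"), Message("assistant", "c\n\nd")])
print([d.messages[0].text for d in split_by_line(dialog, last_message_count=1)])  # ['c', 'd']
```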

Splits a Dialog into separate Dialog objects, one per list item, so that each item can be processed individually.

lastMessageCount: Number of most recent messages to consider for splitting.
acceptedRoles: List of message roles to consider for splitting. If empty, all roles are included.
removeListMarks: Boolean indicating whether to remove list markers (e.g., bullets or numbers) from the beginning of each list item.
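
Below is a sketch of list-item splitting that treats a line starting with a bullet (-, *, +) or a number followed by "." or ")" as one list item. The marker regex, the rule that non-list lines are dropped, and all function names are assumptions; remove_list_marks mirrors removeListMarks.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Message:
    role: str
    text: str


@dataclass
class Dialog:
    messages: list = field(default_factory=list)


def select_messages(dialog, last_message_count=None, accepted_roles=None):
    """Keep the last N messages (all if None), then keep accepted roles (all if empty)."""
    msgs = dialog.messages
    if last_message_count is not None:
        msgs = msgs[-last_message_count:] if last_message_count > 0 else []
    accepted = set(accepted_roles or [])
    return [m for m in msgs if not accepted or m.role in accepted]


# A bullet ("-", "*", "+") or a number ("1." or "1)") followed by whitespace.
LIST_MARK = re.compile(r"^\s*(?:[-*+]|\d+[.)])\s+")


def split_by_list_item(dialog, last_message_count=None, accepted_roles=None,
                       remove_list_marks=True):
    """Return one single-message Dialog per list item found in the selected messages."""
    result = []
    for msg in select_messages(dialog, last_message_count, accepted_roles):
        for line in msg.text.splitlines():
            mark = LIST_MARK.match(line)
            if mark is None:
                continue  # assumption: lines that are not list items are dropped
            item = line[mark.end():] if remove_list_marks else line
            result.append(Dialog([Message(msg.role, item.strip())]))
    return result


dialog = Dialog([Message("assistant", "Steps:\n1. Mix\n2) Bake\n- Serve")])
print([d.messages[0].text for d in split_by_list_item(dialog)])  # ['Mix', 'Bake', 'Serve']
```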

Splits a Dialog based on a regular expression pattern, allowing custom text segmentation.

pattern: Regular expression used to split the text prompts.
lastMessageCount: Number of most recent messages to consider for splitting.
acceptedRoles: List of message roles to consider for splitting. If empty, all roles are included.
removePattern: Boolean indicating whether to remove the matched pattern from the beginning of each segment.
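
A sketch of pattern-based splitting, assuming each match of pattern starts a new segment and that any text before the first match forms its own segment. remove_pattern mirrors removePattern by cutting the matched text off the front of each segment; the name split_by_pattern and the choice to ignore zero-length matches are assumptions.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Message:
    role: str
    text: str


@dataclass
class Dialog:
    messages: list = field(default_factory=list)


def select_messages(dialog, last_message_count=None, accepted_roles=None):
    """Keep the last N messages (all if None), then keep accepted roles (all if empty)."""
    msgs = dialog.messages
    if last_message_count is not None:
        msgs = msgs[-last_message_count:] if last_message_count > 0 else []
    accepted = set(accepted_roles or [])
    return [m for m in msgs if not accepted or m.role in accepted]


def split_by_pattern(dialog, pattern, last_message_count=None, accepted_roles=None,
                     remove_pattern=False):
    """Start a new segment at every (non-empty) match of `pattern`."""
    regex = re.compile(pattern)
    result = []
    for msg in select_messages(dialog, last_message_count, accepted_roles):
        text = msg.text
        # Each cut is (segment start, index just after the match); empty matches are ignored.
        cuts = [(mt.start(), mt.end()) for mt in regex.finditer(text) if mt.end() > mt.start()]
        if not cuts or cuts[0][0] > 0:
            cuts.insert(0, (0, 0))  # text before the first match forms its own segment
        for i, (start, after_match) in enumerate(cuts):
            end = cuts[i + 1][0] if i + 1 < len(cuts) else len(text)
            segment = text[after_match if remove_pattern else start:end].strip()
            if segment:
                result.append(Dialog([Message(msg.role, segment)]))
    return result


dialog = Dialog([Message("user", "Intro Q1: alpha Q2: beta")])
print([d.messages[0].text for d in split_by_pattern(dialog, r"Q\d+:", remove_pattern=True)])
# ['Intro', 'alpha', 'beta']
```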

Loads the text of a DOCX file.

docxFile: Path to the DOCX file to be loaded.
convert_urls: Boolean indicating whether URLs in the text should be converted to URL prompts.
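
The module's DOCX loader is not shown here, so the following is an independent standard-library sketch: a .docx file is a ZIP archive whose body text lives in word/document.xml, with paragraphs in w:p elements and text runs in w:t elements. The URL-prompt format is not described in this document, so to_prompts returns hypothetical ("text", ...) and ("url", ...) tuples when convert_urls is true, using a deliberately simple http(s) regex.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET

W_URI = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + W_URI + "}"  # ElementTree's {namespace}tag notation
URL_RE = re.compile(r"https?://\S+")  # simple matcher, not a full URL grammar


def load_docx_text(docx_file):
    """Return the paragraphs of a .docx (path or binary file object), one per line."""
    with zipfile.ZipFile(docx_file) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W + "p"):
        text = "".join(run.text or "" for run in para.iter(W + "t"))
        if text:
            paragraphs.append(text)
    return "\n".join(paragraphs)


def to_prompts(text, convert_urls=False):
    """Hypothetical prompt list: ("text", str) items, plus ("url", str) if convert_urls."""
    if not convert_urls:
        return [("text", text)]
    prompts, pos = [], 0
    for match in URL_RE.finditer(text):
        if match.start() > pos:
            prompts.append(("text", text[pos:match.start()]))
        prompts.append(("url", match.group()))
        pos = match.end()
    if pos < len(text):
        prompts.append(("text", text[pos:]))
    return prompts


# Demo: a tiny in-memory .docx that contains only word/document.xml.
demo_xml = (
    f'<w:document xmlns:w="{W_URI}"><w:body>'
    "<w:p><w:r><w:t>Read https://example.com</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>"
    "</w:body></w:document>"
)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("word/document.xml", demo_xml)
text = load_docx_text(io.BytesIO(buf.getvalue()))
print(to_prompts(text, convert_urls=True))
```

A real document also contains styles, relationships, and content-type parts; only word/document.xml is needed to pull out the paragraph text.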

Loads the content of an XLSX file, supplied either as a file path or as a byte array.

xlsxFile: Path to the XLSX file to be loaded (optional if xlsxBytesArray is provided).
xlsxBytesArray: Byte array containing the XLSX file content (optional if xlsxFile is provided).
convert_urls: Boolean indicating whether URLs in the text should be converted to URL prompts.
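
A standard-library sketch of accepting either a path (as with xlsxFile) or raw bytes (as with xlsxBytesArray) and reading cell values. It assumes the first worksheet is stored at xl/worksheets/sheet1.xml (common, but a robust reader would resolve sheets through xl/workbook.xml and its relationships), prefers the bytes when both are given, and does not pad cells that are absent from the XML; open_xlsx and read_first_sheet are hypothetical names.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

S_URI = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
S = "{" + S_URI + "}"


def open_xlsx(xlsx_file=None, xlsx_bytes=None):
    """Open the workbook from raw bytes (preferred) or from a path."""
    if xlsx_bytes is not None:
        return zipfile.ZipFile(io.BytesIO(xlsx_bytes))
    if xlsx_file is not None:
        return zipfile.ZipFile(xlsx_file)
    raise ValueError("either xlsx_file or xlsx_bytes must be provided")


def read_first_sheet(xlsx_file=None, xlsx_bytes=None):
    """Return sheet1 as rows of cell strings; cells absent from the XML are not padded."""
    with open_xlsx(xlsx_file, xlsx_bytes) as zf:
        shared = []
        if "xl/sharedStrings.xml" in zf.namelist():
            sst = ET.fromstring(zf.read("xl/sharedStrings.xml"))
            shared = ["".join(t.text or "" for t in si.iter(S + "t")) for si in sst.iter(S + "si")]
        sheet = ET.fromstring(zf.read("xl/worksheets/sheet1.xml"))
    rows = []
    for row in sheet.iter(S + "row"):
        cells = []
        for cell in row.iter(S + "c"):
            kind = cell.get("t")
            if kind == "inlineStr":  # text stored directly inside the cell
                cells.append("".join(t.text or "" for t in cell.iter(S + "t")))
                continue
            v = cell.find(S + "v")
            value = v.text if v is not None and v.text else ""
            # t="s" means the value is an index into the shared-strings table.
            cells.append(shared[int(value)] if kind == "s" and value else value)
        rows.append(cells)
    return rows


# Demo: a minimal workbook built in memory and passed as bytes.
ns = f'xmlns="{S_URI}"'
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("xl/sharedStrings.xml", f"<sst {ns}><si><t>name</t></si><si><t>Ada</t></si></sst>")
    zf.writestr(
        "xl/worksheets/sheet1.xml",
        f'<worksheet {ns}><sheetData>'
        '<row r="1"><c r="A1" t="s"><v>0</v></c><c r="B1" t="inlineStr"><is><t>age</t></is></c></row>'
        '<row r="2"><c r="A2" t="s"><v>1</v></c><c r="B2"><v>36</v></c></row>'
        "</sheetData></worksheet>",
    )
print(read_first_sheet(xlsx_bytes=buf.getvalue()))  # [['name', 'age'], ['Ada', '36']]
```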

Loads text content from one or more URLs.

urlContainingContent: URL(s) from which the text content will be loaded.
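
A sketch of loading text from one or several URLs with urllib.request and html.parser. Accepting either a single string or a list mirrors the "URL(s)" wording; the visible-text extraction (skipping script and style contents), the 10-second timeout, and the injectable fetch argument (which lets the example run offline) are assumptions, not documented behavior. load_url_text and fetch_html are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class _TextExtractor(HTMLParser):
    """Collect visible text nodes, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def fetch_html(url):
    """Download a page and decode it using the declared charset (UTF-8 fallback)."""
    with urlopen(url, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def load_url_text(url_containing_content, fetch=fetch_html):
    """Accept one URL or a list of URLs and return the visible text of each page."""
    if isinstance(url_containing_content, str):
        urls = [url_containing_content]
    else:
        urls = list(url_containing_content)
    texts = []
    for url in urls:
        parser = _TextExtractor()
        parser.feed(fetch(url))
        parser.close()
        texts.append("\n".join(parser.parts))
    return texts


# Offline demo: a dictionary lookup stands in for real HTTP requests.
pages = {"https://example.com/a": "<style>p{}</style><p>Hello</p><p>World</p>"}
print(load_url_text("https://example.com/a", fetch=pages.get))  # ['Hello\nWorld']
```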

Splits content into separate Dialog objects by row or by cell.

splitMethod: Method used to split the content ('row' or 'cell').
lastMessageCount: Number of most recent messages to consider for splitting.
acceptedRoles: List of message roles to consider for splitting. If empty, all roles are included.
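
A sketch of row- versus cell-based splitting. This document does not say how tabular content is represented, so the example assumes each selected message holds one row per line with cells separated by tabs; split_table, the delimiter argument, and the ValueError for unknown methods are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    role: str
    text: str


@dataclass
class Dialog:
    messages: list = field(default_factory=list)


def select_messages(dialog, last_message_count=None, accepted_roles=None):
    """Keep the last N messages (all if None), then keep accepted roles (all if empty)."""
    msgs = dialog.messages
    if last_message_count is not None:
        msgs = msgs[-last_message_count:] if last_message_count > 0 else []
    accepted = set(accepted_roles or [])
    return [m for m in msgs if not accepted or m.role in accepted]


def split_table(dialog, split_method="row", last_message_count=None,
                accepted_roles=None, delimiter="\t"):
    """Return one Dialog per non-blank row, or per non-empty cell when split_method='cell'."""
    if split_method not in ("row", "cell"):
        raise ValueError("split_method must be 'row' or 'cell'")
    result = []
    for msg in select_messages(dialog, last_message_count, accepted_roles):
        for row in msg.text.splitlines():
            if not row.strip():
                continue
            pieces = [row] if split_method == "row" else row.split(delimiter)
            for piece in pieces:
                if piece.strip():
                    result.append(Dialog([Message(msg.role, piece.strip())]))
    return result


dialog = Dialog([Message("user", "name\tage\nAda\t36")])
print([d.messages[0].text for d in split_table(dialog, "cell")])  # ['name', 'age', 'Ada', '36']
```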

Downloads content to a local directory.

dirPath: Directory path where the content will be downloaded. If not provided, a default /downloads/ directory in the current working directory is used.

This documentation is a guide to using the Processors Module effectively, with an emphasis on the flexibility that each class's parameters provide.
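
The dirPath default described for the downloader can be sketched with pathlib: use the given directory, otherwise a downloads folder under the current working directory, creating it on first use. resolve_download_dir and save_download are hypothetical helpers, and the network download itself is left out.

```python
import tempfile
from pathlib import Path


def resolve_download_dir(dir_path=None):
    """Use dir_path if given, otherwise ./downloads/ under the current working directory."""
    return Path(dir_path) if dir_path else Path.cwd() / "downloads"


def save_download(filename, data, dir_path=None):
    """Write already-downloaded bytes into the resolved directory, creating it if needed."""
    target_dir = resolve_download_dir(dir_path)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / filename
    target.write_bytes(data)
    return target


# Demo in a throwaway directory so the working directory is left untouched.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_download("page.html", b"<p>Hi</p>", dir_path=tmp)
    print(saved.read_bytes())  # b'<p>Hi</p>'
```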