AAI_2025_Capstone_Chronicles_Combined
ResolveAI
● answer (text): Contains the agent's response. Entries lacking an answer were dropped to ensure completeness.
● type (categorical/string): Classifies the ticket into categories such as Incident, Request, Problem, or Change.
● queue (categorical/string): Identifies the support department handling the ticket (e.g., Billing, Technical Support, Customer Service, Product Support).
● priority (categorical/string): Indicates the urgency level, typically labeled Low, Medium, or High.
● language (categorical/string): Specifies the language in which the ticket was submitted.
● tag_1 to tag_8: Represent additional descriptors; these columns were consolidated into a single list-based column for streamlined analysis.

Data cleaning and preprocessing begin with an initial review of the raw text fields (subject, body, and answer) to identify inconsistencies and noise such as irregular casing, extraneous punctuation, and special characters (including emojis and formatting artifacts). The text is then converted to lowercase to standardize the content, and unwanted characters are removed so that the analysis focuses solely on meaningful words. Next, tokenization breaks the text into individual words, or tokens. As part of this step, common stopwords (words that typically add little meaning) are removed to reduce noise and improve the efficiency of subsequent modeling. After these cleaning steps, the text is transformed into sequences of integers using a custom tokenizer, and the sequences are padded to a consistent maximum length to create a