Using Machine Learning to Overcome Sentiment Analysis Challenges

Cornerstone Research

Overcoming Sentiment Analysis Challenges with Machine Learning

by: Cornerstone Research of Cornerstone Research - Reports

Sunday, July 30, 2023

When utilized by highly skilled data scientists to engineer a rightsized solution for a given application, sentiment analysis can be a highly effective tool in litigation.

What is sentiment analysis?

Sentiment analysis produces estimates of the attitude or tone present in a natural language excerpt. Often these estimates will fall along a single dimension, a continuum from purely negative to purely positive. A more complex form of sentiment analysis, known as emotional analysis, may generate estimates along multiple, more nuanced emotional dimensions like anger, joy, or embarrassment.

Sentiment analysis applications in litigation

Sentiment analysis provides valuable insights when applied in litigation. For example, the impact of alleged marketing misrepresentations may be measured by the change in public sentiment toward a product prior to and following the allegedly misleading marketing campaign. Similarly, analyses in defamation matters may measure public sentiment toward an entity before and after the allegedly defamatory statements appeared. Sentiment analysis can also provide an objective measure of the sentiment contained in the allegedly defamatory statements themselves. For matters concerning the quality or defectiveness of specific product features, sentiment analysis can provide an assessment of the average consumer’s perception of the quality of at-issue features relative to other product features or similar features of competing products.
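
The before-and-after comparison described above can be sketched in a few lines. This is a minimal illustration, not a litigation-ready method: the function name `sentiment_shift`, the posts, the scores, and the event date are all hypothetical, and the scores are assumed to come from some upstream sentiment model on a -1 to 1 scale. It also ignores pre-existing trends and other events that a real analysis would need to account for.

```python
from datetime import date
from statistics import mean

def sentiment_shift(scored_posts, event_date):
    """Return (mean before, mean after, difference) around event_date.

    scored_posts is a list of (date, score) pairs, where each score is
    assumed to have been produced by an upstream sentiment model.
    """
    before = [score for day, score in scored_posts if day < event_date]
    after = [score for day, score in scored_posts if day >= event_date]
    if not before or not after:
        raise ValueError("need at least one post on each side of the event date")
    mean_before, mean_after = mean(before), mean(after)
    return mean_before, mean_after, mean_after - mean_before

# Hypothetical scored posts about a product.
posts = [
    (date(2023, 1, 5), 0.6),
    (date(2023, 1, 20), 0.4),
    (date(2023, 2, 10), -0.2),
    (date(2023, 2, 25), -0.4),
]

# Hypothetical date on which the at-issue statements appeared.
before, after, shift = sentiment_shift(posts, date(2023, 2, 1))
print(f"before={before:.2f} after={after:.2f} shift={shift:.2f}")
```

A negative shift indicates that average sentiment fell after the event date. On its own, that difference does not establish that the event caused the change.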

Sentiment analysis data sources

Sentiment analysis can be applied to any text content, but user-generated content is the most abundant and commonly used. This content originates on social media sites, retail sites that host consumer reviews, and other forums for public discussion. Additional content, like marketing materials or internal communications, may also be evaluated using sentiment analysis. The smaller volume generally associated with these data types may limit the value of efficient programmatic sentiment analysis, but the objectivity provided by such an analysis can be beneficial.

While sentiment analysis is most commonly applied to text, methods also exist for detecting sentiment in audio and video. These methods supplement the content of subject statements with additional features, like tone of voice and facial movements. Extratextual information can increase the accuracy of the analysis, especially in the presence of linguistic complexities like sarcasm.

Sentiment analysis approaches

Several approaches to sentiment analysis exist, and the discipline remains an area of active research.

Lexicon

The simplest and oldest form of sentiment analysis is lexicon-based. In this approach, the researcher obtains or creates a list of terms associated with negative and positive sentiments. The researcher then identifies the number of positive and negative terms in each text. These counts may be aggregated into a single comprehensive score normalized by the length of the text or the number of relevant terms.
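
A minimal sketch of the counting approach described above follows. The two word lists are tiny, hypothetical stand-ins for a real curated lexicon, and normalizing by the number of matched terms is only one of the options the text mentions.

```python
import re

# Hypothetical miniature lexicon; real lexica contain thousands of terms.
POSITIVE = {"good", "great", "love", "excellent", "reliable"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "broken"}

def lexicon_score(text):
    """Count positive and negative terms and return (pos, neg, score).

    The score is (pos - neg) / (pos + neg), so it ranges from -1 to 1.
    Text with no lexicon terms scores 0.0.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    matched = pos + neg
    score = (pos - neg) / matched if matched else 0.0
    return pos, neg, score

pos, neg, score = lexicon_score(
    "Great screen and excellent camera, but terrible battery."
)
print(pos, neg, round(score, 3))
```

Dividing by `len(tokens)` instead of `matched` would normalize by text length, which dampens the score for long texts that contain few sentiment terms.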

Lexicon-based, with rules

Sentiment analyses based on a lexicon alone are simple to explain, but they have clear limitations. Specifically, purely lexicon-based approaches ignore context that can significantly change the interpretation of relevant terms. In response, approaches evolved to combine carefully curated lexica with additional rules that modify the sentiment according to the surrounding terms and style. These rules adjust for punctuation, capitalization, intensifiers (e.g., “very”), and negation.
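
The sketch below shows one way such rules might be layered on top of a lexicon. The term weights, the two-token lookback window, and the multipliers are all illustrative assumptions rather than values from any published tool.

```python
import re

# Hypothetical weighted lexicon and rule tables.
LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}
INTENSIFIERS = {"very": 1.5, "extremely": 2.0}
NEGATIONS = {"not", "never", "no"}

def rule_based_score(text):
    """Sum lexicon weights after applying simple context rules."""
    tokens = re.findall(r"[A-Za-z']+", text)
    total = 0.0
    for i, token in enumerate(tokens):
        word = token.lower()
        if word not in LEXICON:
            continue
        value = LEXICON[word]
        # Capitalization rule: an all-caps sentiment term is emphasized.
        if token.isupper() and len(token) > 1:
            value *= 1.5
        # Intensifier and negation rules: inspect the two preceding tokens.
        for previous in tokens[max(0, i - 2):i]:
            previous = previous.lower()
            if previous in INTENSIFIERS:
                value *= INTENSIFIERS[previous]
            elif previous in NEGATIONS:
                value *= -1.0
        total += value
    # Punctuation rule: each exclamation mark (capped at three) adds 10%.
    total *= 1 + 0.1 * min(text.count("!"), 3)
    return total

for sample in ["good", "not good", "very good", "not very good", "GOOD"]:
    print(sample, "->", rule_based_score(sample))
```

A fixed lookback window is a crude proxy for scope. It will misfire on sentences where the negation applies to a different word, which is one reason the approaches discussed later go further.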

Custom classifier models

General-purpose lexica may perform adequately in many different environments but fail in the presence of highly domain-specific language. Such language may be completely absent from general-purpose lexica, and certain terms may carry connotations opposite to their more typical uses. In these situations, a custom sentiment classifier model trained on language from a similar domain can learn these distinctions.

In the example below, slang terms like “swole” are missing from the general-purpose lexicon, and terms with positive connotations in the fitness space, like “ripped,” “shredded,” and “failure,” express negative sentiment more generally. By training a model on fitness-specific language, the custom classifier is able to properly identify the relevance and sentiment associated with these terms.
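
To make the idea of a custom classifier concrete, here is a minimal sketch that trains a multinomial naive Bayes model on a handful of made-up fitness reviews. The choice of naive Bayes, the class name `NaiveBayesSentiment`, and the training sentences are all assumptions for illustration; the text above does not specify a model type, and a real classifier would need far more labeled data.

```python
import math
from collections import Counter

class NaiveBayesSentiment:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of token counts
        self.doc_counts = Counter()  # label -> number of training texts
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = text.lower().split()
            self.word_counts.setdefault(label, Counter()).update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        # Tokens never seen in training carry no information and are skipped.
        tokens = [t for t in text.lower().split() if t in self.vocab]
        total_docs = sum(self.doc_counts.values())
        best_label, best_logp = None, -math.inf
        for label, counts in self.word_counts.items():
            logp = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for token in tokens:
                logp += math.log((counts[token] + 1) / denom)
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label

# Hypothetical fitness-domain training data.
train_texts = [
    "totally ripped after this program",
    "got shredded and swole",
    "trained to failure great pump",
    "swole and shredded love it",
    "boring routine no results",
    "injured my back bad program",
    "waste of money no gains",
    "bad form cues got injured",
]
train_labels = ["positive"] * 4 + ["negative"] * 4

model = NaiveBayesSentiment().fit(train_texts, train_labels)
print(model.predict("shredded and swole"))
print(model.predict("bad program no results"))
```

Because the model learns term associations from the labeled examples, words such as "shredded" and "failure" pick up the polarity they carry in this domain instead of the polarity a general-purpose lexicon would assign.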

Aspect-based sentiment with dependency parsing

Basic custom sentiment classifiers perform well on short, consistent text. In many real-world use cases, however, text subject to sentiment analysis contains multiple individual sentiments directed toward different aspects. For example, a product review of a cell phone might express a strong positive sentiment toward the camera and screen and a strong negative sentiment toward the battery and responsiveness. Analyzing such a review in its entirety might inappropriately result in an overall neutral sentiment, as the sentiment toward one feature offsets the opposite sentiment toward another.

Some aspect-based sentiment models resolve this issue by parsing and diagramming each text to identify the language most closely related to each aspect, then analyzing that language separately. Aspects may be tagged manually, or complementary models can extract aspects automatically.

In assessing only the most relevant language, these models can identify multiple potentially conflicting sentiments toward different aspects that an analysis of overall sentiment might combine and incorrectly consider neutral.
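
The sketch below illustrates the payoff of scoring aspects separately. Note that it does not perform dependency parsing: as a deliberately simplified stand-in, it splits the review into clauses and credits each clause's sentiment to the aspects named in that clause. The aspect list, lexicon, and review are hypothetical, and a production system would rely on a real syntactic parser.

```python
import re

# Hypothetical manually tagged aspects and a tiny sentiment lexicon.
ASPECTS = {"camera", "screen", "battery", "responsiveness"}
LEXICON = {"great": 1, "love": 1, "sharp": 1, "terrible": -1, "slow": -1}

def overall_sentiment(review):
    """Score the whole review at once by summing lexicon values."""
    tokens = re.findall(r"[a-z']+", review.lower())
    return sum(LEXICON.get(token, 0) for token in tokens)

def aspect_sentiment(review):
    """Score each aspect using only the clause in which it appears.

    Clause splitting is a crude proxy for dependency parsing.
    """
    clauses = re.split(r"[.;!?]|\bbut\b|\bhowever\b", review.lower())
    scores = {}
    for clause in clauses:
        tokens = re.findall(r"[a-z']+", clause)
        clause_score = sum(LEXICON.get(token, 0) for token in tokens)
        for token in tokens:
            if token in ASPECTS:
                scores[token] = scores.get(token, 0) + clause_score
    return scores

review = (
    "The camera and screen are great, "
    "but the battery and responsiveness are terrible."
)
print(overall_sentiment(review))
print(aspect_sentiment(review))
```

The whole-review score nets out to zero, while the per-aspect scores keep the positive and negative opinions apart. The clause heuristic breaks down when one clause mixes opinions about several aspects, which is the problem that parsing-based and attention-based models address.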

Aspect-based sentiment with attention

Rather than parse formal linguistic syntactical dependencies, recent state-of-the-art language models use attention to learn the strength of the relationships between terms. This attention component is fundamental to transformer-based large language models like Bidirectional Encoder Representations from Transformers (BERT) and Generative Pre-trained Transformers (GPT) derivatives. This structure’s capacity to efficiently remember complex dependencies over long sequences contributes to the unprecedented accuracy of these models.

Applying this attention concept to aspect-based sentiment analysis further enhances model accuracy. The matrix below depicts an example of attention between each term in a review.

As shown, by using attention, the model tracks relationships through several levels of noun-adjective and pronoun-antecedent pairs.
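
For readers who want to see the mechanism in miniature, the sketch below computes scaled dot-product attention weights, the calculation at the core of transformer attention. The two-dimensional token vectors are invented for illustration. In an actual model, queries and keys are separate learned projections of much larger embeddings, and many attention heads run in parallel.

```python
import math

def softmax(values):
    """Convert raw scores into weights that sum to 1."""
    peak = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(queries, keys):
    """Return one row of weights per query: softmax(q . k / sqrt(d))."""
    d = len(keys[0])
    rows = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            for k in keys
        ]
        rows.append(softmax(scores))
    return rows

# Hypothetical token vectors, used as both queries and keys.
tokens = ["battery", "terrible", "camera", "great"]
vectors = [[2.0, 0.0], [2.0, 0.5], [0.0, 2.0], [0.5, 2.0]]

weights = attention_weights(vectors, vectors)
for token, row in zip(tokens, weights):
    print(token, [round(w, 2) for w in row])
```

Each row shows how strongly one token attends to every token in the sequence. With these made-up vectors, "battery" places more weight on "terrible" than on "great", and "camera" does the reverse.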

Sentiment analysis in practice

With several sentiment analysis approaches available, each with its own advantages and disadvantages, selecting the best method requires careful consideration of the task at hand. The simplest models may not perform adequately on lengthy, complex text with specialized vocabulary. For simple texts, state-of-the-art models may require too much time to implement, be too costly to customize, and be too complex for lay audiences to comprehend and trust. An efficient and effective sentiment analysis requires a deep understanding of these different model mechanisms and their corresponding trade-offs. When utilized by highly skilled data scientists to engineer a rightsized solution for a given application, sentiment analysis can be a highly effective tool in litigation.