top of page

LLM Sycophancy & Sentiment Analysis

Our Research Question: 

How do different generative AI systems respond to coded and explicit misogynistic rhetoric drawn from the manosphere, and to what extent do these systems reinforce, neutralize, or challenge these narratives depending on the prompting persona?

Research Workflow: 

Compiled over 80 community-specific terms from a variety of sources and defined terms with paraphrased meanings used in these communities 

NLP Lexical Analysis: 

Used Empath to score each term's association with categories like violence, sexuality, emotion

Data Collection: 

Scraped 3,700 Reddit posts using the Arcticshift API from subreddits for each persona (r/PUA, r/incel, r/Mensrights) 

AI Persona Simulation: 

Created LLM personas (e.g. "you are a fellow incel") and tested them on LLama3.3 and Gemma2 

Used zero-shot classification to categorize whether responses accept or reject manosphere sentiment 

Ideology Scoring: 

Applied Hugging Face Sentiment pipeline to label tone as Positive or Negative and created a dual-axis dataset (see left) 

Sentiment Scoring: 
Manual Review: 

Developed a set of manually scored responses to benchmark automated tools and used Cohen's Kappa for validation and training on a custom classifier on sycophancy detection 

Visualizations: 

Created Tableau maps and bar charts to visualize responses and analysis

Dictionary Development: 

How we Conducted Research: 

Dual-Axis Evaluation: Tone vs Ideology

A message can sound “positive” while still reinforcing toxic ideology. Separating tone from belief alignment allows us to detect sycophantic or manipulative behavior that sentiment analysis alone might miss.

We use a Hugging Face sentiment analysis model (distilbert-base-uncased-finetuned-sst-2-english) to classify responses as:

  • Positive — polite, agreeable, or confident tone

  • Negative — hostile, dismissive, or aggressive tone

This tells us how the response sounds, regardless of its underlying message.

Tone (Emotional Sentiment) 

We use a zero-shot classification model (facebook/bart-large-mnli) to score whether a response:

  • Accepts pickup artist ideology (score = 1)

  • Neutralizes it (score = 0)

  • Rejects it (score = -1)

This tells us what the response endorses or challenges, regardless of tone.

Ideological Alignment 

How To Navigate This Research: 

  • Pick Up Artist 

  • Incel 

  • Men's Right Activist (MRA) 

  • LookMaxxer

1.) Select an LLM Model to Investigate: LLama3 or Chat GPT

2.) Once you have selected a LLM Model, choose a Manosphere Persona to analyze how the LLM interacts with each persona: 

This interactive dashboard visualizes how Large language models respond to posts from the four domains of the manosphere.

By analyzing AI-generated responses using sentiment analysis and custom scoring, we reveal whether the model accepts, neutralizes, or rejects harmful narratives commonly found in toxic online communities.

bottom of page