2026-04-11: PostGuard: The Hackathon-Winning AI That Stops Career-Ending Posts


From March 23rd to the 31st, the Computer Science Graduate Society (CSGS) at Old Dominion University hosted their Spring 2026 Hackathon. The competition brought together teams across master's and PhD categories to tackle different research topics, mainly in artificial intelligence (AI). Our team, the Attention Bros (Sandeep Kalari and Dominik Soós), chose to compete in Track 6: Privacy-Preserving AI, alongside four other great teams in the PhD category.

We are grateful to announce that we won first place in the PhD category with our project, PostGuard!

This was a fast-paced challenge completed over a single week. Despite the time constraints, we successfully engineered a novel architecture that balances AI utility with user privacy. In this blog post, we provide an overview of the problem we tackled, the Privacy Paradox and existing methods, our system architecture, and the quantitative findings that secured our victory.

For more details, you can explore our GitHub repository containing the code, detailed report, and the dataset used in our analysis.


Online Comments Have Lasting Consequences

Social media platforms serve as both a public forum and a digital newsstand. We started by looking at the problem: online comments can cause irreversible, real-world damage. In the heat of the moment, you post something, and before moderation can catch it, someone takes a screenshot. People lose their jobs over 280 characters posted online. 

Current moderation systems are 100% reactive, so they only act after the fact. We wanted to build a preventative system that warns you before you hit send.

The Privacy Paradox and Existing Methods

To give users a specific and actionable warning about how a post violates their employer's policies, the system needs to know their personal context, like their job role and employer. However, collecting and processing that data creates a massive surveillance and privacy risk.

When we looked at how existing research handles this problem, we found a significant gap. To prevent someone from posting something that will ruin their career, you have three standard options, all of which fail:

  • Content moderation is reactive. It doesn't warn the user; it just punishes them after the fact, and it's also not user-specific. 
  • Differential Privacy works great for aggregate data but is useless for individual, consequence-based warnings.
  • Text Anonymization frameworks like RUPTA are great at removing personally identifiable information (PII) from text, but they strip away the exact context the LLM needs to generate a personalized warning. 
We needed a system that is pre-posting, user-specific, and privacy-aware. That's why we built PostGuard.

| Approach | Reactive? | Pre-posting? | User-specific? | Privacy-aware? |
|---|---|---|---|---|
| Content Moderation | ✅ | ❌ | ❌ | ❌ |
| Differential Privacy | ❌ | ❌ | ❌ | ✅ |
| Text Anonymization (RUPTA) | ❌ | ✅ | ❌ | ➖ Partial |
| PostGuard (Ours) | ❌ | ✅ | ✅ | ✅ |

Building a Dataset

To rigorously test our system, standard benchmarks alone would not do, so we spent the first phase of the hackathon building a custom dataset grounded in reality:
  1. 15 Real Incident Cases: We pulled verified, real-world firings that were covered by major outlets. 
  2. 20-Article Vector Corpus: We embedded 15 signal articles and intentionally injected 5 noise articles to rigorously test our retrieval precision.
  3. Synthetic Personas: We generated 15 synthetic users with escalating post histories spanning from 2024 to 2026, mapping them one-to-one with real corporate policies. 
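For illustration, the incident cases and personas could be modeled as records like the ones below. The field names here are our sketch for this post, not the actual schema used in the repository:

```python
from dataclasses import dataclass, field

# Illustrative schema sketch; field names are assumptions, not the
# dataset's actual format.
@dataclass
class IncidentCase:
    case_id: str
    platform: str           # e.g. the platform the post appeared on
    post_text: str
    outcome: str            # e.g. "terminated"
    source_article_id: str  # links the case to the vector corpus

@dataclass
class SyntheticPersona:
    persona_id: str
    job_role: str
    employer: str
    policy_id: str  # mapped one-to-one with a real corporate policy
    post_history: list = field(default_factory=list)  # escalating drafts, 2024-2026
```

Each synthetic persona's `policy_id` is what lets the retrieval layer pull the matching corporate policy later in the pipeline.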

The Architecture

PostGuard intercepts risky posts before the user hits submit. To do this without leaking the user's data to the web, we built a four-stage pipeline:
  1. Risk Extraction: We use a lightweight LLM (Gemini Flash) to quickly extract risk factors from the draft and generate a targeted search query.
  2. RAG layer: We use an embedding model to search our custom vector database for relevant corporate policies and real-world firing precedents.
  3. Warning Generation: A secondary LLM (Gemini Pro) synthesizes the retrieved precedents and generates a customized, user-facing warning.
  4. RUPTA Evaluation: Finally, we run a dual-evaluation loop. A P-Evaluator scores the re-identification risk of the data we just processed, and a U-Evaluator scores the utility of the generated warning.
Figure 1. System architecture detailing the four stages of evaluation: (1) Initial comment ingestion, (2) Contextual policy retrieval via RAG, (3) Severity classification, and (4) Generation of the intent-preserving warning
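A minimal sketch of how the four stages chain together. Every stage below is a stub (keyword heuristics and hard-coded scores) standing in for the real Gemini Flash/Pro calls and the vector-database search, so the function bodies are illustrative, not our production code:

```python
def extract_risk(draft: str) -> dict:
    """Stage 1: a lightweight LLM extracts risk factors + a search query.
    (Stub: trivial keyword heuristic in place of Gemini Flash.)"""
    risky = [w for w in ("fired", "boss", "confidential") if w in draft.lower()]
    return {"risk_factors": risky, "query": " ".join(risky) or draft[:50]}

def retrieve_precedents(query: str, top_k: int = 3) -> list[str]:
    """Stage 2: RAG layer searches policies and firing precedents.
    (Stub: dict lookup in place of an embedding search.)"""
    corpus = {"confidential": "Policy 4.2: no disclosure of internal data."}
    return [doc for key, doc in corpus.items() if key in query][:top_k]

def generate_warning(draft: str, precedents: list[str]) -> str:
    """Stage 3: a secondary LLM synthesizes a user-facing warning."""
    if precedents:
        return f"Warning: this draft may violate {precedents[0]}"
    return "No policy risk detected."

def evaluate(draft: str, warning: str) -> dict:
    """Stage 4: RUPTA-style dual evaluation (stubbed scores)."""
    return {"p_score": 0.35, "u_score": 0.94}  # re-id risk, warning utility

def postguard(draft: str) -> dict:
    risks = extract_risk(draft)
    precedents = retrieve_precedents(risks["query"])
    warning = generate_warning(draft, precedents)
    return {"warning": warning, **evaluate(draft, warning)}
```

The key design point survives even in stub form: only the extracted risk factors and the generated query flow between stages, never more of the user's context than the selected privacy mode allows.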

Three Privacy Modes

The core of our privacy-preserving approach is user control. The system operates in three modes that dictate what data is forwarded through the pipeline. 

| Mode | Data Sent to System | Privacy | Utility |
|---|---|---|---|
| Anonymous | Comment text only. No role, no employer, no history. | High — poster nearly unidentifiable | Lower — generic warnings |
| Contextual | Comment + platform + job role. | Medium — role narrows the field | Medium — role-specific warnings |
| Full Profile | Comment + role + employer + recent history. | Low — nearly identifiable | High — employer-policy specific warnings |
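The gating logic is simple to sketch: each mode maps to an allow-list of fields, and only allowed fields ever leave the client. The field names below are illustrative:

```python
# Allow-list per privacy mode (field names are illustrative).
MODE_FIELDS = {
    "anonymous":  {"comment"},
    "contextual": {"comment", "platform", "job_role"},
    "full":       {"comment", "platform", "job_role", "employer", "history"},
}

def build_payload(profile: dict, mode: str) -> dict:
    """Forward only the fields the selected mode permits."""
    allowed = MODE_FIELDS[mode]
    return {k: v for k, v in profile.items() if k in allowed}

profile = {
    "comment": "My employer's Q3 numbers are a joke.",
    "platform": "X",
    "job_role": "financial analyst",
    "employer": "Acme Corp",
    "history": ["..."],
}
print(sorted(build_payload(profile, "contextual")))
# -> ['comment', 'job_role', 'platform']
```

Because filtering happens before anything is sent downstream, the pipeline physically cannot see data the user has not opted into.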

Evaluation Results

The evaluation of our moderation and warning system demonstrates a highly effective balance between accuracy, user intent preservation, and privacy. To understand the system's true performance, we analyzed it across four core dimensions: Privacy vs. Utility, RAG Retrieval Accuracy, Severity Classification, and Warning Quality. Here is a breakdown of the metrics we used and why they matter.

Privacy vs. Utility

Protecting user identity is just as important as providing accurate warnings. By calculating the Relative Utility Threat (RUT) score, adapted from Soonseok Kim's 2025 MDPI Electronics paper, "Quantitative Metrics for Balancing Privacy and Utility in Pseudonymized Big Data", we showed quantitatively that Contextual mode (RUT: 0.824) delivers higher AI utility than Full Profile mode, while exposing only 46% of the relative privacy risk.

| Mode | RUT Score | Utility | Re-id Risk | Interpretation |
|---|---|---|---|---|
| Anonymous | 0.908 | 88 | 0.05 | Excellent — high utility, almost no re-id risk. |
| Contextual | 0.824 | 94 | 0.35 | Best balance — recommended deployment threshold. |
| Full Profile | 0.652 | 92 | 0.75 | Utility gain does not justify the massive privacy cost. |

The RUT framework, which transforms various risk and utility metrics into a unified, probabilistic scale, validates that feeding the LLM maximum personal data ("Full Profile") yields diminishing returns. Contextual mode sits right at the optimal deployment threshold, giving the AI just enough context, like the job role and platform, to generate highly specific warnings without sacrificing anonymity.
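The deployment decision this supports can be sketched as a simple selection rule. To be clear, this is not the RUT formula itself, just the logic it motivates: cap the acceptable re-identification risk, then pick the highest-utility mode within that budget (scores taken from the table above):

```python
# Scores from our evaluation table; the selection rule is illustrative,
# not the RUT computation from Kim (2025).
modes = {
    "anonymous":    {"utility": 88, "reid_risk": 0.05},
    "contextual":   {"utility": 94, "reid_risk": 0.35},
    "full_profile": {"utility": 92, "reid_risk": 0.75},
}

def pick_mode(modes: dict, risk_budget: float) -> str:
    """Highest-utility mode whose re-id risk stays within budget."""
    eligible = {m: s for m, s in modes.items() if s["reid_risk"] <= risk_budget}
    return max(eligible, key=lambda m: eligible[m]["utility"])

print(pick_mode(modes, risk_budget=0.5))  # -> contextual
```

Note that even with an unlimited risk budget this rule still picks Contextual, since Full Profile scores lower on utility while tripling the exposure.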

RAG Retrieval Accuracy

To evaluate RAG Retrieval Accuracy, the system was tested on 15 real-world incident cases against a mixed corpus. The results demonstrate highly effective document sourcing, achieving a Hit Rate@1 score of 0.80, meaning that the correct article ranked first in 12 out of the 15 cases. Furthermore, the system also achieved perfect retrieval with a Hit Rate@3 score of 1.00, ensuring the relevant article was always surfaced within the top three results.

| Metric | Score | What it means |
|---|---|---|
| Hit Rate@1 | 0.80 | Correct article ranked first in 12/15 cases. |
| Hit Rate@3 | 1.00 | Correct article always in top 3 — perfect retrieval. |
| Mean Reciprocal Rank | 0.90 | Average rank position is very high. |

This strong performance is reinforced by a Mean Reciprocal Rank of 0.90, which confirms that the average rank position of the correct information remains consistently high across all queries. 
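All three metrics follow directly from the rank of the correct article for each query. The per-case split below is our reconstruction (the post does not list individual ranks); it is one split consistent with the reported numbers:

```python
def hit_rate_at_k(ranks: list[int], k: int) -> float:
    """Fraction of queries whose correct document appears in the top k."""
    return sum(r <= k for r in ranks) / len(ranks)

def mean_reciprocal_rank(ranks: list[int]) -> float:
    """Average of 1/rank of the correct document across queries."""
    return sum(1 / r for r in ranks) / len(ranks)

# Illustrative split: 12 cases at rank 1 and 3 cases at rank 2
# reproduce HR@1 = 0.80, HR@3 = 1.00, MRR = 0.90.
ranks = [1] * 12 + [2] * 3
print(hit_rate_at_k(ranks, 1))      # -> 0.8
print(hit_rate_at_k(ranks, 3))      # -> 1.0
print(mean_reciprocal_rank(ranks))  # -> 0.9
```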

Severity Classification

The system prioritizes user trust and accuracy in Severity Classification, which measures its binary classification performance for detecting high- and critical-severity violations. It achieved a perfect Precision score of 1.000, meaning zero false alarms: the system never wrongly warns about a safe comment.

| Metric | Score | Interpretation |
|---|---|---|
| Precision | 1.000 | Zero false alarms — the system never wrongly warns a safe comment. |
| Recall | 0.533 | 7 cases under-classified — the system is intentionally conservative. |
| F1 Score | 0.696 | Overall classification quality. |

The overall classification quality is represented by an F1 Score of 0.696. The Recall score of 0.533 reflects 7 cases that were under-classified. However, this is a deliberate design choice to make sure the system remains conservative rather than over-restrictive. 
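For reference, the reported scores follow from the confusion-matrix counts: 8 true positives, 0 false positives, and 7 false negatives across the 15 cases:

```python
def prf1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# 8 detected violations, 0 false alarms, 7 under-classified cases.
p, r, f1 = prf1(tp=8, fp=0, fn=7)
print(f"{p:.3f} {r:.3f} {f1:.3f}")  # -> 1.000 0.533 0.696
```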

Warning Quality

Traditional metrics like exact-match or BLEU scores fail to capture the nuance of rewritten text. For this reason, we used an "LLM-as-Judge" framework to score the qualitative aspects of the AI's output on a 5-point scale. This allowed us to measure subjective dimensions like Relevance, Policy Accuracy, Rewrite Safety, and Prevention Impact at scale.

| Dimension | Mean Score | What it measures |
|---|---|---|
| Relevance | 4.67 | Does the warning correctly identify the actual violation? |
| Policy Accuracy | 4.67 | Does it cite the correct policy or law for this specific case? |
| Rewrite Safety | 4.67 | Does the rewrite preserve intent while removing the risk? |
| Prevention Impact | 4.67 | Would this warning likely have prevented the real firing? |
| Overall | 4.67 | Holistic quality score |

The overall mean score of 4.67/5 confirms the system acts as a helpful, accurate coach: it rewrites drafts to preserve the user's original intent while effectively neutralizing the career risk.
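A sketch of what one judging pass looks like. The rubric dimensions come from our evaluation; the prompt template and the stubbed `call_judge` function are illustrative placeholders for a real judge-model API call:

```python
DIMENSIONS = ["relevance", "policy_accuracy", "rewrite_safety", "prevention_impact"]

# Illustrative prompt template; the actual wording is an assumption.
JUDGE_PROMPT = """You are grading an AI-generated posting warning.
Draft: {draft}
Warning: {warning}
Score each dimension from 1 (poor) to 5 (excellent): {dims}.
Reply with one integer per dimension."""

def call_judge(prompt: str) -> list[int]:
    # Stub: a real implementation sends `prompt` to a judge model
    # and parses the integer scores from its reply.
    return [5, 5, 4, 5]

def judge(draft: str, warning: str) -> dict:
    """Score one draft/warning pair on the rubric and average the result."""
    prompt = JUDGE_PROMPT.format(draft=draft, warning=warning,
                                 dims=", ".join(DIMENSIONS))
    scores = call_judge(prompt)
    result = dict(zip(DIMENSIONS, scores))
    result["overall"] = sum(scores) / len(scores)
    return result
```

Averaging the per-dimension scores across all test cases is what produces mean scores like the 4.67/5 reported above.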

Looking Forward

The internet doesn't have to be a trap door. With PostGuard, we proved that we can give users specific and potentially career-saving warnings without turning AI tools into surveillance machines.

We are incredibly grateful to the CSGS organizers for putting together such a challenging and rewarding event. Earning first place in the PhD category was the peak of a long, exhausting, and incredibly fun week of research. 

Thanks for reading, and watch what you comment!

~Dominik Soós (@DomSoos)
