2026-04-11: PostGuard: The Hackathon-Winning AI That Stops Career-Ending Posts
From March 23rd to the 31st, the Computer Science Graduate Society (CSGS) at Old Dominion University hosted their Spring 2026 Hackathon. The competition brought together teams across master's and PhD categories to tackle different research topics, mainly in artificial intelligence (AI). Our team, the Attention Bros (Sandeep Kalari and Dominik Soós), chose to compete in Track 6: Privacy-Preserving AI, alongside four other great teams in the PhD category.
We are proud to announce that we won first place in the PhD category with our project, PostGuard!
This was a fast-paced challenge completed over a single week. Despite the time constraints, we successfully engineered a novel architecture that balances AI utility with user privacy. In this blog post, we provide an overview of the problem we tackled, the Privacy Paradox and existing methods, our system architecture, and the quantitative findings that secured our victory.
For more details, you can explore our GitHub repository containing the code, detailed report, and the dataset used in our analysis.
Online Comments Have Lasting Consequences
Social media platforms serve as both a public forum and a digital newsstand. We started by looking at the problem: online comments can cause irreversible, real-world damage. In the heat of the moment, you post something, and before moderation can catch it, someone takes a screenshot. People lose their jobs over 280 characters posted online.
Current moderation systems are entirely reactive: they only act after the fact. We wanted to build a preventative system that warns you before you hit send.
The Privacy Paradox and Existing Methods
To give users a specific and actionable warning about how a post violates their employer's policies, the system needs to know their personal context, like their job role and employer. However, collecting and processing that data creates a massive surveillance and privacy risk.
When we looked at how existing research handles this problem, we found a significant gap. To prevent someone from posting something that will ruin their career, you have three standard options, all of which fail:
- Content moderation is reactive. It doesn't warn the user; it just punishes them after the fact, and it's also not user-specific.
- Differential Privacy works great for aggregate data but is useless for individual, consequence-based warnings.
- Text Anonymization frameworks like RUPTA are great at removing personally identifiable information (PII) from text, but they strip away the exact context the LLM needs to generate a personalized warning.
| Approach | Reactive? | Pre-posting? | User-specific? | Privacy-aware? |
|---|---|---|---|---|
| Content Moderation | ✅ | ❌ | ❌ | ❌ |
| Differential Privacy | ❌ | ❌ | ❌ | ✅ |
| Text Anonymization (RUPTA) | ❌ | ❌ | ➖ Partial | ✅ |
| PostGuard (Ours) | ❌ | ✅ | ✅ | ✅ |
Building a Dataset
- 15 Real Incident Cases: We pulled verified, real-world firings that were covered by major outlets.
- 20-Article Vector Corpus: We embedded 15 signal articles and intentionally injected 5 noise articles to rigorously test our retrieval precision.
- Synthetic Personas: We generated 15 synthetic users with escalating post histories spanning from 2024 to 2026, mapping them one-to-one with real corporate policies.
The Architecture
- Risk Extraction: We use a lightweight LLM (Gemini Flash) to quickly extract risk factors from the draft and generate a targeted search query.
- RAG layer: We use an embedding model to search our custom vector database for relevant corporate policies and real-world firing precedents.
- Warning Generation: A secondary LLM (Gemini Pro) synthesizes the retrieved precedents and generates a customized, user-facing warning.
- RUPTA Evaluation: Finally, we run a dual-evaluation loop. A P-Evaluator scores the re-identification risk of the data we just processed, and a U-Evaluator scores the utility of the generated warning.
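The four stages above can be sketched as a simple pipeline. This is an illustrative outline only: the function names, signatures, and placeholder logic are ours, not PostGuard's actual code, and the LLM calls are stubbed out.

```python
# Illustrative sketch of the four-stage PostGuard pipeline.
# All names and placeholder logic are hypothetical, not the real implementation.

def extract_risk(draft: str) -> dict:
    """Stage 1: a lightweight LLM (Gemini Flash in the real system)
    extracts risk factors and a search query from the draft."""
    return {"risk_factors": ["possible policy violation"], "query": draft[:80]}

def retrieve(query: str, corpus: list[tuple[str, str]]) -> list[str]:
    """Stage 2 (RAG layer): stand-in for vector search over policies
    and firing precedents, here a crude term-overlap match."""
    terms = set(query.lower().split())
    return [doc for _title, doc in corpus if terms & set(doc.lower().split())]

def generate_warning(risk: dict, precedents: list[str]) -> str:
    """Stage 3: a stronger LLM (Gemini Pro in the real system) would
    synthesize a personalized warning; a placeholder string here."""
    return f"Warning: {len(precedents)} similar incidents found."

def evaluate(warning: str, context: dict) -> tuple[float, float]:
    """Stage 4 (RUPTA-style dual loop): P-Evaluator scores re-id risk,
    U-Evaluator scores warning utility (fixed placeholders here)."""
    return 0.35, 0.94

def postguard(draft: str, corpus: list[tuple[str, str]], context: dict) -> str:
    risk = extract_risk(draft)
    precedents = retrieve(risk["query"], corpus)
    warning = generate_warning(risk, precedents)
    p_score, u_score = evaluate(warning, context)
    return warning
```

The key design point is the split between a fast, cheap model for extraction and a stronger model for synthesis, with the evaluation loop sitting outside both.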
Three Privacy Modes
| Mode | Data Sent to System | Privacy | Utility |
|---|---|---|---|
| Anonymous | Comment text only. No role, no employer, no history. | High — poster nearly unidentifiable | Lower — generic warnings |
| Contextual | Comment + platform + job role. | Medium — role narrows the field | Medium — role-specific warnings |
| Full Profile | Comment + role + employer + recent history. | Low — nearly identifiable | High — employer-policy specific warnings |
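The modes differ only in which fields ever leave the user's device. A minimal sketch of that gating, with field names that are illustrative rather than PostGuard's actual schema:

```python
# Sketch of privacy-mode gating: each mode strictly extends the
# previous one's payload. Field names are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class UserContext:
    comment: str
    platform: str = ""
    role: str = ""
    employer: str = ""
    history: list = field(default_factory=list)

def build_payload(ctx: UserContext, mode: str) -> dict:
    payload = {"comment": ctx.comment}       # every mode sends the draft text
    if mode in ("contextual", "full"):
        payload["platform"] = ctx.platform   # Contextual adds platform + role
        payload["role"] = ctx.role
    if mode == "full":
        payload["employer"] = ctx.employer   # Full Profile adds employer + history
        payload["history"] = ctx.history
    return payload
```

Because each mode is a superset of the one below it, the privacy/utility trade-off reduces to a single user-facing dial.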
Evaluation Results
Privacy vs. Utility
Protecting user identity is just as important as providing accurate warnings. By calculating the Relative Utility Threat (RUT) score, adapted from Soonseok Kim's 2025 MDPI Electronics paper, "Quantitative Metrics for Balancing Privacy and Utility in Pseudonymized Big Data," we showed that Contextual mode (RUT: 0.824) delivers higher AI utility than Full Profile mode while exposing only about 46% of its relative privacy risk (re-id risk 0.35 vs. 0.75).
| Mode | RUT Score | Utility | Re-id Risk | Interpretation |
|---|---|---|---|---|
| Anonymous | 0.908 | 88 | 0.05 | Excellent — high utility, almost no re-id risk. |
| Contextual | 0.824 | 94 | 0.35 | Best balance — recommended deployment threshold. |
| Full Profile | 0.652 | 92 | 0.75 | Utility gain does not justify the massive privacy cost. |
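The headline comparison between Contextual and Full Profile can be checked directly from the table's Utility and Re-id Risk columns (the RUT score itself is computed per the cited Kim 2025 metric, which we do not reproduce here):

```python
# Sanity check of the relative-risk claim, using the table's values.
modes = {
    "Anonymous":    {"utility": 88, "reid_risk": 0.05},
    "Contextual":   {"utility": 94, "reid_risk": 0.35},
    "Full Profile": {"utility": 92, "reid_risk": 0.75},
}

relative_risk = modes["Contextual"]["reid_risk"] / modes["Full Profile"]["reid_risk"]
utility_gain = modes["Contextual"]["utility"] - modes["Full Profile"]["utility"]
print(f"Contextual exposes {relative_risk:.1%} of Full Profile's re-id risk")
print(f"while scoring {utility_gain:+d} utility points over it")
```

That is, Full Profile mode pays more than double the privacy cost for strictly lower measured utility, which is why Contextual is the recommended deployment threshold.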
RAG Retrieval Accuracy
| Metric | Score | What it means |
|---|---|---|
| Hit Rate@1 | 0.80 | Correct article ranked first in 12/15 cases. |
| Hit Rate@3 | 1.00 | Correct article always in top 3 — perfect retrieval. |
| Mean Reciprocal Rank | 0.90 | Average rank position is very high. |
A Mean Reciprocal Rank of 0.90 confirms that even when the correct article misses the top spot, it is never far behind: combined with the perfect Hit Rate@3, the score is consistent with the three missed queries ranking the correct article second.
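All three retrieval metrics can be reconstructed from a single list of per-query ranks. The rank list below is the one consistent with the reported scores (12 queries at rank 1, and MRR = 0.90 exactly requires the remaining 3 at rank 2):

```python
# Standard retrieval metrics over the rank of the correct article
# for each of the 15 test queries.
def hit_rate(ranks: list[int], k: int) -> float:
    """Fraction of queries whose correct article appears in the top k."""
    return sum(r <= k for r in ranks) / len(ranks)

def mrr(ranks: list[int]) -> float:
    """Mean Reciprocal Rank: average of 1/rank across queries."""
    return sum(1 / r for r in ranks) / len(ranks)

ranks = [1] * 12 + [2] * 3   # rank of the correct article per query
print(hit_rate(ranks, 1))    # → 0.8
print(hit_rate(ranks, 3))    # → 1.0
print(mrr(ranks))            # → 0.9
```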
Severity Classification
| Metric | Score | Interpretation |
|---|---|---|
| Precision | 1.000 | Zero false alarms — the system never wrongly warns a safe comment. |
| Recall | 0.533 | 7 cases under-classified — the system is intentionally conservative. |
| F1 Score | 0.696 | Overall classification quality. |
The F1 Score of 0.696 summarizes overall classification quality. The Recall of 0.533 reflects 7 under-classified cases, but this is a deliberate design choice: paired with perfect Precision, the system errs toward under-warning rather than over-restricting users.
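The reported scores follow directly from the underlying counts: 15 positive cases, of which 8 were classified correctly (true positives), 7 were under-classified (false negatives), and no safe comment was ever flagged (zero false positives):

```python
# Recomputing the severity-classification metrics from raw counts.
def prf1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    precision = tp / (tp + fp)                            # no false alarms
    recall = tp / (tp + fn)                               # missed severe cases
    f1 = 2 * precision * recall / (precision + recall)    # harmonic mean
    return precision, recall, f1

p, r, f1 = prf1(tp=8, fp=0, fn=7)
print(f"{p:.3f} {r:.3f} {f1:.3f}")  # → 1.000 0.533 0.696
```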
Warning Quality
| Dimension | Mean Score | What it measures |
|---|---|---|
| Relevance | 4.67 | Does the warning correctly identify the actual violation? |
| Policy Accuracy | 4.67 | Does it cite the correct policy or law for this specific case? |
| Rewrite Safety | 4.67 | Does the rewrite preserve intent while removing the risk? |
| Prevention Impact | 4.67 | Would this warning likely have prevented the real firing? |
| Overall | 4.67 | Holistic quality score |