Hanna Shcharbakova

✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨

Hi👋, I’m Hanna Shcharbakova, a Senior AI Engineer at Pleias, working on open science and leading AI-enhanced digital education projects with a focus on synthetic data generation and small language model pretraining. I am interested in multilingual NLP, disinformation, responsible AI, and AI safety (especially evals & gradual disempowerment).

I hold a double MSc degree from the Erasmus Mundus European Masters Program in Language and Communication Technologies at University of Saarland 🇩🇪 and University of Lorraine 🇫🇷, with exchange studies at University of Groningen 🇳🇱.

During my master’s, I worked as an ML Engineer at Transcrime on EU-funded projects focusing on fake news detection, illicit firearms trafficking, and terrorist content identification online.

I obtained my BA in Fundamental and Computational Linguistics at HSE University, where I worked in two research labs: the Learner Corpora Lab and the Laboratory of Methods for Big Data Analysis.

News 📌

2026-02: 📜 Completed the Technical AI Safety Course from BlueDot Impact - deepening my expertise in AI safety!

2025-12: 🎉 Joined Pleias as a Senior AI Engineer! Working on open science projects with a focus on synthetic data and small language models, and leading AI-enhanced digital learning initiatives for Senegal.

2025-11: 📚 Completed the Introductory EA Program!

2025-08: 🎓 Invited to the EM LCT Summer School to share my experience of studying in the program with LCT students!

2025-08: 💥 Delighted to share that Cross-Lingual Fact Verification: Analyzing LLMs Performance Patterns Across Languages was accepted at RANLP 2025. See you in Bulgaria! 🇧🇬

2025-07: 🎓 Graduated from University of Saarland with an excellent grade of 1.5!

2025-07: ✈️ Attended ACL 2025 to present my research - incredible insights and networking opportunities!

2025-07: 🏆 Thrilled to receive a fully funded grant to attend the EEML Summer School and honored to receive the Best Poster Award!

2025-06: 🎤 Selected for an oral presentation at the ACL FEVER Workshop 2025 for When Scale Meets Diversity: Evaluating Language Models on Fine-Grained Multilingual Claim Verification. Achieved new state-of-the-art results with 57.7% macro-F1 using XLM-R - a 15.8% improvement over the previous best!

2025-05: 🏫 Selected to participate in the M2L Summer School!

2025-04: 🏆 Excited to share that our team achieved remarkable results in the SemEval Mu-SHROOM shared task, placing in the top 10 across 9 language tracks for hallucination detection, including 2nd place for Mandarin Chinese! Our paper was also accepted for publication!

2025-02: 🎓 Successfully defended my thesis at University of Saarland and received a 1.1 grade!

2025-02: 🚀 Completed a major milestone in the ALLIES project - our real-time terrorist content detection system is now successfully deployed. We presented the results at a conference in Spain!

2024-12: 🇳🇱 Completed my Dutch course at A2 level at University of Groningen!

2024-11: 📊 Started working on a SemEval shared task at University of Groningen!

2024-10: 🤝 Partnered with a startup to improve AI-generated text detection for the community!

2024-09: 🎓 Started Erasmus+ exchange at University of Groningen, Netherlands. Grateful for the Erasmus+ and Santander Scholarships supporting this academic journey.

2024-09: 🏫 Attended the Athens NLP Summer School - an amazing learning experience!