Publication Date
2025-5
First Advisor
Shinyoung Cho
Document Type
Honors Project
Degree Name
Bachelor of Arts
Department
Computer Science
Keywords
DNS tampering, censorship detection, anomaly detection, machine learning
Abstract
Domain Name System (DNS) manipulation is one of the most prevalent and effective censorship techniques due to its simplicity, lack of encryption, and ease of deployment. Various entities, including rogue DNS resolvers and DNS injectors, exploit vulnerabilities to restrict access to information. Detecting DNS tampering is challenging due to the dynamic nature of the Internet, evolving censorship tactics, and the absence of complete ground-truth data. Traditional rule-based heuristics are limited in both adaptability and scalability. To address these shortcomings, this study enhances DNS manipulation detection by employing a hybrid approach that integrates machine learning (ML) and rule-based heuristic analysis. Using 24 months of Web Connectivity data collected by the Open Observatory of Network Interference (OONI), this study develops scalable, globally generalized models capable of identifying censorship patterns. Unlike existing research that often focuses on single regions, our dataset adopts a global perspective to demonstrate the power of ML on an expansive scale. The proposed pipeline incorporates meticulous cleaning and processing to develop a curated dataset representative of more than 200 countries, used in both supervised and unsupervised models. The resulting models achieve high accuracy and strong generalization and identify DNS fingerprints with high confidence. These findings demonstrate the robust applicability of machine learning to global-scale DNS censorship detection. A hybrid signature discovery method that applies record-threshold heuristic analysis to the ML results effectively mitigates false positives, showcasing the power of combining ML with heuristic methods. A backward time-window analysis is also performed, starting with the most recent month and progressively adding earlier months to the training data. This analysis highlights how expanding the training data to account for temporal variation affects the accuracy of record detection across each test month.
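The backward time-window procedure described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names `backward_windows` and `evaluate_windows`, the "YYYY-MM" month labels, and the `fit` and `score` callables are all hypothetical, and the abstract does not state how training and test months are partitioned.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def backward_windows(months: Sequence[str]) -> List[List[str]]:
    """Build training windows that start at the most recent month and
    grow backward in time, one month per step.

    Assumes months are "YYYY-MM" strings, which sort chronologically.
    """
    ordered = sorted(months)
    return [ordered[-k:] for k in range(1, len(ordered) + 1)]


def evaluate_windows(
    train_months: Sequence[str],
    test_months: Sequence[str],
    fit: Callable[[List[str]], object],
    score: Callable[[object, str], float],
) -> Dict[Tuple[int, str], float]:
    """Train one model per backward window and score it on each test month.

    `fit` and `score` are placeholders (assumptions) for whatever training
    and evaluation routines are actually used.
    """
    results: Dict[Tuple[int, str], float] = {}
    for window in backward_windows(train_months):
        model = fit(window)
        for test_month in test_months:
            # Key: (number of training months, test month).
            results[(len(window), test_month)] = score(model, test_month)
    return results
```

Comparing scores across window sizes for a fixed test month would then show how adding earlier months changes detection accuracy.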
Rights
©2025 Larissa Savitsky. Access limited to the Smith College community and other researchers while on campus. Smith College community members also may access from off-campus using a Smith College log-in. Other off-campus researchers may request a copy through Interlibrary Loan for personal use.
Language
English
Recommended Citation
Savitsky, Larissa, "Global DNS Tampering Detection using Machine Learning and Heuristic Analysis" (2025). Honors Project, Smith College, Northampton, MA.
https://scholarworks.smith.edu/theses/2763
Comments
vi, 68 pages: color illustrations, charts. Includes bibliographical references (pages 54-59).