To access this work you must either be on the Smith College campus OR have valid Smith login credentials.

On Campus users: To access this work on campus, please select the Download button.

Off Campus users: To access this work from off campus, please select the Off-Campus button and enter your Smith username and password when prompted.

Non-Smith users: You may request this item through Interlibrary Loan at your own library.

Publication Date

2025-5

First Advisor

Shinyoung Cho

Document Type

Honors Project

Degree Name

Bachelor of Arts

Department

Computer Science

Keywords

DNS tampering, censorship detection, anomaly detection, machine learning

Abstract

Domain Name System (DNS) manipulation is one of the most prevalent and effective censorship techniques due to its simplicity, lack of encryption, and ease of deployment. Various entities, including rogue DNS resolvers and DNS injectors, exploit vulnerabilities to restrict access to information. Detecting DNS tampering is challenging due to the dynamic nature of the Internet, evolving censorship tactics, and the absence of complete ground-truth data. Traditional rule-based heuristics are limited in adaptability and scalability. To address these shortcomings, this study enhances DNS manipulation detection by employing a hybrid approach that integrates machine learning (ML) and rule-based heuristic analysis. Using 24 months of Web Connectivity data collected by the Open Observatory of Network Interference (OONI), this paper develops scalable, globally generalized models capable of identifying censorship patterns. Unlike existing research that often focuses on single regions, our dataset adopts a global perspective to demonstrate the power of ML on an expansive scale. The proposed pipeline incorporates meticulous cleaning and processing to develop a curated dataset representative of more than 200 countries, used in both supervised and unsupervised models. The resulting models achieve high accuracy and strong generalization and identify DNS fingerprints with high confidence. These findings demonstrate machine learning’s robust application to global-scale DNS censorship detection. A hybrid signature discovery step that incorporates record-threshold heuristic analysis of ML results effectively mitigates false positives, showcasing the power of combining ML with heuristic methods. A backward time-window analysis is performed, starting with the most recent month and progressively adding earlier months to the training data. This analysis highlights how increasing the data to account for temporal variation affects the accuracy of record detection across each test month.
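
The backward time window described in the abstract can be illustrated with a minimal sketch. It shows only how the training windows might be built, starting from the most recent month and reaching one month further back at each step. The function name `backward_windows` and the month labels are hypothetical, and the abstract does not specify how test months are chosen or how the models are trained, so none of that is shown here.

```python
from typing import Iterator, List, Sequence


def backward_windows(months: Sequence[str]) -> Iterator[List[str]]:
    """Yield growing training windows, most recent month first.

    `months` is assumed to be sorted from oldest to newest.
    """
    newest_first = list(reversed(months))
    for size in range(1, len(newest_first) + 1):
        # Each window keeps the most recent month and extends one month further back.
        yield newest_first[:size]


# Hypothetical month labels, for illustration only.
months = ["2024-01", "2024-02", "2024-03", "2024-04"]
windows = list(backward_windows(months))
for window in windows:
    print(window)
```

Each successive window could then be used to train a model that is evaluated on a held-out test month, which is one way to observe how added history affects accuracy.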

Rights

©2025 Larissa Savitsky. Access limited to the Smith College community and other researchers while on campus. Smith College community members also may access from off-campus using a Smith College log-in. Other off-campus researchers may request a copy through Interlibrary Loan for personal use.

Language

English

Comments

vi, 68 pages: color illustrations, charts. Includes bibliographical references (pages 54-59).
