REPORTS

Exploiting URL Parsers: The Good, Bad, and Inconsistent

January 11, 2022

The Uniform Resource Locator (URL) is integral to our lives online because we use it for surfing the web, accessing files, and joining video chats. If you click on a URL or type it into a browser, you’re requesting a resource hosted somewhere online. As a result, some devices such as our browsers, applications, and servers must receive our URL, parse it into its uniform resource identifier (URI) components (e.g. hostname, path, etc.) and fetch the requested resource.

The syntax of URLs is complex, and although different libraries can parse them accurately, it is plausible for the same URL to be parsed differently by different libraries. The confusion in URL parsing can cause unexpected behavior in the software (e.g. web application), and could be exploited by threat actors to cause denial-of-service conditions, information leaks, or possibly conduct remote code execution attacks.

In Team82’s joint research with Snyk, we examined 16 URL parsing libraries, written in a variety of programming languages, and noticed some inconsistencies with how each chooses to parse a given URL to its basic components. We categorized the types of inconsistencies into five categories, and searched for problematic code flows in web applications and open source libraries that exposed a number of vulnerabilities.

We learned that most of the eight vulnerabilities we found largely occurred for two reasons:

1. Multiple Parsers in Use: Whether by design or an oversight, developers sometimes use more than one URL parsing library in projects. Because some libraries may parse the same URL differently, vulnerabilities could be introduced into the code.

2. Specification Incompatibility: Different parsing libraries are written according to different RFCs or URL specifications, which creates inconsistencies by design. This also leads to vulnerabilities because developers may not be familiar with the differences between URL specifications and their implications (e.g. what should be  checked or sanitized).

Our research was partially based on previous work, including a presentation by Orange Tsai “A New Era of SSRF” and a comparison of WHATWG vs. RFC 3986 by cURL creator, Daniel Stenberg. We would like to thank them for their innovative research.

SHARE:
Price: FREE

About the Provider

Claroty
Claroty bridges the industrial cybersecurity gap between information technology (IT) and operational technology (OT) environments. Organizations with highly automated production sites and factories that face significant security and financial risk especially need to bridge this gap.

TOPICS

Log4j, URL, vulnerabilities