Sunday, December 8, 2013

Search engine reconnaissance - obtaining usernames from metadata

In my current role as a Senior Security Engineer I am often referred to as “Lead Penetration Tester” because I have the joy of attempting to infiltrate websites: bypassing or brute-forcing authentication mechanisms, obtaining usernames/passwords, discovering logic flaws, injecting SQL... essentially identifying and exploiting vulnerabilities. This is called Web Application Penetration Testing; it's what I do, targeting websites that my company owns, with explicit written permission to do so, while utilizing a four-step methodology: reconnaissance, mapping, discovery, and exploitation.

This post is going to cover ground in the area of reconnaissance, as taking the time to conduct proper and thorough reconnaissance can greatly enhance the likelihood of a fruitful penetration test. Specifically, this post discusses leveraging the work that search engines have already performed (spidering a site, caching content, and indexing pages in a queryable fashion) for the purpose of obtaining valid usernames for the in-scope application and organization/company. Note the distinction between application and organization usernames: application usernames can be completely different from the usernames of employees of an organization. Think of a personal email account (application account) contrasted with the Active Directory domain account (company account) of the system administrator working for the company that provides the email service: two separate accounts, two different ways of logging in.

Since this post is centered around search engine reconnaissance, and almost all internet search engines make use of a robot to spider the web, let's take a moment to mention the difference between a well-behaved web crawler (in this context synonymous with bot/search robot/web spider) and its opposite, without delving into the intricacies of the bot and the various policies that govern how it works, as this will come into play later on.

A well-behaved web crawler will reference the rules set forth by the administrator of a website, e.g. the robots.txt file (if it exists) and meta references in the HTML source code:
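To make this concrete, here is a minimal sketch of how a well-behaved crawler consults those rules, using Python's standard-library robots.txt parser. The rules and URLs below are hypothetical examples, not taken from any real site:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt as a site administrator might publish it
robots_txt = """\
User-agent: *
Disallow: /admin/
Disallow: /backups/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# A well-behaved bot checks before it crawls; a misbehaving bot skips this step
print(rp.can_fetch("googlebot", "http://example.com/index.html"))    # True
print(rp.can_fetch("googlebot", "http://example.com/admin/secret"))  # False
```

Note the irony: the very file meant to keep crawlers out of /admin/ and /backups/ also advertises to a human reader exactly which directories the administrator considers sensitive.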


A misbehaving web crawler will disregard the aforementioned wishes of the administrator, instead opting to download and parse everything it can from a given site. This is worth mentioning because of the content contained within a robots.txt file: directories that are at least interesting and, on the other end of the spectrum, potentially sensitive. To reiterate what this means: website administrators (or default application settings) are sometimes guilty of specifying, in the robots.txt file, the very files and folders that they DO NOT want to be public knowledge. A well-behaved spider like Googlebot will honor this; a misbehaving spider will do the opposite.

Using popular search engines to gather information during the reconnaissance phase of a web application penetration test is par for the course. Leveraging the built-in capabilities (search operators) of a search engine can prove very useful to narrow down results and home in on the specific information one is seeking. This information can be anything from determining which sites link to the target domain to identifying directory browsing and cached SSL content, including usernames and authenticated (logged-in) session attributes.

Identifying valid application users is almost always (in my experience) achievable via search engine reconnaissance or through a username-harvesting vulnerability in the web application. It is very difficult to present a web interface that can conveniently let a valid user know she has failed to log in, has successfully logged in or advanced in the multi-factor authentication (MFA) process, or has provided valid or invalid information in an attempt to reset a password, all in a cumulative fashion, and not confirm that the provided username is indeed a valid application user. Identifying valid users of the target web application is useful; identifying valid usernames of the target organization (which may require a different approach) can also prove useful for social engineering and SQL injection attacks, to name two.

Organizational usernames typically follow a naming convention such as first initial followed by last name “cjohnson” or first name dot last name “calvin.johnson”, and determining this is usually a doable task. Once again relying on the work search engines have already performed, it is time to couple that work with the knowledge that metadata is often found in documents that have been published to the web, and metadata will sometimes include organizational usernames. A penetration tester can discern not only the naming convention that the target organization has established, but also a plethora of usernames depending on certain circumstances such as the number of employees the target employs, the web footprint, and the availability of web-facing documentation (think PDFs). Take for instance the following search query (using example.com as a stand-in for the target domain): site:example.com ext:pdf
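Once the convention is known, generating candidate usernames from a list of employee names (gathered from the website, LinkedIn, etc.) is trivial. A quick sketch; the conventions listed are common ones and the names are made up:

```python
def candidate_usernames(first: str, last: str) -> list:
    """Generate candidate usernames from common corporate naming conventions."""
    first, last = first.lower(), last.lower()
    return [
        first[0] + last,      # first initial + last name:  cjohnson
        f"{first}.{last}",    # first dot last:             calvin.johnson
        f"{first}{last}",     # first + last concatenated:  calvinjohnson
        f"{first}_{last}",    # first underscore last:      calvin_johnson
        last + first[0],      # last name + first initial:  johnsonc
    ]

print(candidate_usernames("Calvin", "Johnson"))
```

Cross-referencing this list against usernames harvested from document metadata quickly confirms which convention the organization actually uses.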

The “site” operator tells the search engine to search only the specified domain, whereas the “ext” operator specifies to only return results for PDF documents; if there are any indexed PDFs on the site, the above query would return those results. Try replacing the domain with the name of your own organization's website and... if your organization's website has web-facing PDFs, and the PDFs are allowed to be indexed per the rules set forth by your website administrator, then you may see some results. If your company has web-facing PDFs, or other documents like .doc, .docx, .xls, .xlsx, etcetera, and a policy that does not allow its web content to be searched and indexed, try the same search query (altering the “ext” operator as needed) from a misbehaving search engine, one that does not honor robots.txt or the HTML meta tags, and compare results. Download the results and parse the metadata of the documents, looking for “author” for instance... notice any valid usernames?
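As a rough illustration of the metadata-parsing step, the snippet below pulls the /Author entry from a PDF's document information dictionary using only the standard library. This is a deliberately crude sketch: real PDFs may compress, encrypt, or encode this dictionary differently, and a proper PDF library is far more robust. The sample bytes are a hypothetical fragment, not a real document:

```python
import re

def pdf_author(raw: bytes):
    """Crude sketch: pull the /Author entry out of raw PDF bytes.
    Returns None if no plain-text /Author entry is found."""
    m = re.search(rb"/Author\s*\(([^)]*)\)", raw)
    return m.group(1).decode("latin-1") if m else None

# Hypothetical document information dictionary as it might appear in a PDF:
sample = b"<< /Title (Quarterly Report) /Author (cjohnson) /Creator (Word) >>"
print(pdf_author(sample))  # cjohnson
```

If the author field holds “cjohnson” rather than a full name, you have likely just learned both a valid username and the organization's naming convention in one shot.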

After manually performing the above queries, changing the extension operator, downloading files one by one, and parsing metadata, one ponders a method of automation. To that end I wrote a script that takes the target website as input and leverages search engines to determine if the target has documents (DOCs, PDFs, spreadsheets, etc.) on its website. Discovered documents are downloaded and subsequently have their metadata parsed for potentially interesting information such as usernames.
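The overall shape of such a script can be sketched as a couple of pure functions: one that builds the per-extension queries, and one that extracts document links from a results page. This is not the author's script; the search-engine URL format and HTML parsing are illustrative assumptions (real engines change their markup, rate-limit, and may forbid scraping in their terms of service):

```python
import re

# docx/xlsx listed before doc/xls so the regex alternation matches the longer form first
EXTENSIONS = ["pdf", "docx", "doc", "xlsx", "xls"]

def build_queries(domain: str) -> list:
    """One search query per document type, e.g. 'site:example.com ext:pdf'."""
    return [f"site:{domain} ext:{ext}" for ext in EXTENSIONS]

def extract_doc_links(html: str, domain: str) -> list:
    """Pull links to documents hosted on the target domain out of a results page."""
    pattern = rf"https?://(?:[\w.-]*\.)?{re.escape(domain)}/[^\"'<> ]+\.(?:{'|'.join(EXTENSIONS)})"
    return sorted(set(re.findall(pattern, html)))

queries = build_queries("example.com")
print(queries[0])  # site:example.com ext:pdf
```

The remaining steps (fetching each query's results page, downloading each discovered document, and feeding the bytes to a metadata parser like the /Author sketch earlier) are straightforward plumbing around these two functions.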

When performing search engine reconnaissance it is important to vet the results and understand what you are looking at as well as what you are looking for; “Adobe Photoshop”, for example, is not a valid username. Keep in mind that while the most popular search engines usually yield the most and best results, they also typically honor the robots.txt file, which can limit those results. That's all for now, lots more ground to cover...

Friday, October 25, 2013

SANS SEC 542: Web App Penetration Testing and Ethical Hacking

The SANS Mentor Program is coming to the Denver metro area with SECURITY 542: Web App Penetration Testing and Ethical Hacking.

Meeting once a week after work, you'll learn SANS Web App Penetration Testing and Ethical Hacking in our popular Mentor multi-week format, with time between classes to absorb and master the material. You also receive downloadable MP3 files of the full class being taught to enhance your studies.

Course Details:
SECURITY 542: Web App Penetration Testing and Ethical Hacking
Start Date: Thursday April 3rd, 6:00-8:00pm
Location: Aurora Colorado 
Registration and Tuition details:

Each week your local Mentor, Serge Borso, will highlight the key concepts you need to know and assist you with hands-on labs and exercises. From attack methodology to server-side discovery, you'll be learning the exploits and tools needed to protect your systems from attack. The class wraps up with a Capture the Flag event where students will be able to use the methodology and techniques explored during class to find and exploit the vulnerabilities within an intranet site. Each week you will be able to show off your knowledge the next day at the office!

Train Local and Save on the same material taught at SANS six-day conferences.

This could turn out to be an opportunity of a lifetime! Really, it's going to be awesome!

Thursday, October 17, 2013

Speaking of JavaScript

From an authorized penetration tester's perspective, I wanted to talk a little bit about cross-site scripting (XSS) and cross-site request forgery (XSRF or CSRF), the differences between the two, and show some examples of these web application vulnerabilities in action. Let's start with cross-site scripting and weave our way into cross-site request forgery later on...
Cross site scripting is an attack and it’s also commonly referenced as a vulnerability (as is XSRF) so let’s take a moment to clarify. When a web application is susceptible to XSS it’s said that the website (web application) has a XSS vulnerability. When an attacker injects a malicious script into the vulnerable website; that action is the XSS attack.
In order for XSS to exist there has to first be a vulnerability in the web server or web application that allows malicious users (attackers) to upload/inject code. The vulnerability itself is precipitated by a serious underlying issue: lack of input validation/sanitization. It's the failure of the web application to validate user-supplied input prior to rendering it for the end-user that causes the vulnerability in the first place.
One point that should be made clear is that while it's the web application that exhibits the XSS vulnerability, it’s the end-user of that web application that is the “target” of the attack i.e. the recipient of the malicious script mentioned above is the end-user. One last quick note before showing some examples; two things are being exploited in a successful XSS attack: one or more vulnerabilities on the website that exposes the XSS attack vector, and the trust the end-user/client has in that website to allow scripts to execute.
This is my browser, it’s also called a user-agent and I, as a consumer of what my browser renders, am referred to as the end-user.

I am on a fictional banking website which is being served from a VM on my internal testing network. This banking website is host to a vulnerable web application which should prove perfect for showing XSS examples. And now is the perfect time to mention that no packets are ever sent to a target without explicit authorization from the owners and those responsible for the target asset/network. During the reconnaissance phase of a web application penetration testing engagement, one of the first things I like to do is check the 404 handling: how graceful is it, and how exactly does the application or web server handle notifying a user when a requested resource is not available on the server? Will the user be redirected to the home page, will the requested resource be echoed back to the user in a message, will the server return a 503, or will something else happen?

In the case of this bank application, the requested resource is echoed back to the user. The resource "blahblah" does not exist on the server, and the bank website lets us know that by displaying a message to that effect. This can be tested for and reproduced by altering the end of the URL to a value that you are confident does not exist; remember, we are simply requesting a web page from the server.
To recap, I put in some pseudo random text at the end of the URL, pressed enter, and the web site echoed it back to me when the page rendered... let’s find out if the input is sanitized before being rendered as output. This can be done by requesting something like the infamous: <script>alert("xss");</script>
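The flaw, and its fix, come down to output encoding: escape user-supplied data before writing it into the page. A minimal sketch using Python's standard library; the 404 message template here is hypothetical, not the bank application's actual code:

```python
import html

def not_found_page(requested_path: str) -> str:
    # Unsafe version (the vulnerability): raw user input interpolated into HTML
    #   return f"<p>Resource {requested_path} was not found.</p>"
    # Safe version: encode HTML metacharacters before rendering
    return f"<p>Resource {html.escape(requested_path)} was not found.</p>"

payload = '<script>alert("xss");</script>'
print(not_found_page(payload))
```

With escaping in place, the payload is rendered as inert text (&lt;script&gt;...) rather than executed by the browser.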

The JavaScript pop-up lets us know that the web application is vulnerable to XSS. In order to exploit this, one could email the target victim the URL shown above in the screenshot, for example, and if:
  • The victim was on my internal testing network
  • Clicked on the link
  • Was allowing scripts to execute in her browser
  • And her browser chose to execute this specific script (keep in mind browsers have built-in protections for a lot of this stuff)
Then she would be presented with a JavaScript pop-up with the letters x-s-s.
A few things have to line-up for this attack to work, the pop-up box is just a quick way to identify the cross site scripting vulnerability; if these efforts get the XSS pop-up to execute then additional efforts can get malicious code to execute as well.
Using just a browser to find and validate a reflected XSS vulnerability did not require anything particularly noteworthy, a simple typo in the URL would cause similar results – unfiltered user data submitted via a GET request in the URL being echoed back to the user's browser. There is however a lot more to XSS, this is just the tip of the iceberg.

XSRF is more challenging for me to explain succinctly, which is motivation to write this. When a web application allows a browser to initiate a sensitive transaction without validating the end-user actually wanted to initiate said transaction, the web application is considered to be vulnerable to XSRF. Sticking with the bank theme, I will log back into my fictitious account and examine the wire transfer section to learn what options are available and how to interact with the form to transfer funds.
During the login process a session cookie was set on my machine which grants me authenticated access to the website, this cookie will be sent with every HTTPS request my browser makes to the website, whether I click a link or my browser is otherwise convinced into sending a request to the website.
 In viewing the wire transfer page I first observe what it looks like in a browser, then view the source code, then view the underlying HTTP traffic in an interception proxy; all the time looking for predictable parameters.

The wire transfer form consists of several fields that are required to be populated with data, such as the dollar amount of the transfer and the recipient account, routing, and account information... but there is no additional password prompt or hidden random token to further authenticate the transaction. Therein lies the issue: predictable form parameters coupled with a lack of transaction validation. This is how it works:
  • A malicious authorized banking user* examines the wire transfer form just as above and crafts a single webpage that will serve as the attack platform, written as such:

  • Next the attacker must wait until the target user logs into his online banking account and then prompt the user to visit the crafted webpage above
  • The crafted webpage uses JavaScript to submit a wire transfer request of $130,487 from the target/victim account to an account of the attacker's choosing via a hard-coded POST request
  • Because the request comes from the newly opened tab in the victim's browser, the banking site treats it as a legitimate wire transfer request submitted willfully by the user
  • The request however was made surreptitiously, unbeknownst to the victim
  • At this point the victim has lost $130,487, which is evident once the balance page is refreshed
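The missing defense called out above, a hidden random token, can be sketched quite simply: the server binds an unpredictable value to the session, embeds it in the legitimate form, and refuses any transaction that does not echo it back. A minimal illustration (function names and the session identifier are hypothetical, not any real banking API):

```python
import hashlib
import hmac
import secrets

SERVER_KEY = secrets.token_bytes(32)  # kept secret on the server, never sent to clients

def issue_csrf_token(session_id: str) -> str:
    """Token to embed in a hidden field of the wire-transfer form."""
    return hmac.new(SERVER_KEY, session_id.encode(), hashlib.sha256).hexdigest()

def verify_csrf_token(session_id: str, submitted: str) -> bool:
    """Reject the transaction unless the submitted token matches the session's."""
    expected = issue_csrf_token(session_id)
    return hmac.compare_digest(expected, submitted)

token = issue_csrf_token("session-abc123")
print(verify_csrf_token("session-abc123", token))           # True
print(verify_csrf_token("session-abc123", "forged-guess"))  # False
```

The attacker's crafted page cannot read the victim's copy of the form (same-origin policy), so it cannot supply a valid token, and the forged POST is rejected.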

This example used a specially crafted webpage that the victim had to be coaxed into browsing to. Other attack platforms include public message boards where user-provided image links are used to launch XSRF attacks on unsuspecting visitors of the site. Once again, many variables have to line up for this attack to work, but JavaScript being enabled is not necessarily one of them, which is a big difference when contrasted with XSS. One of the biggest distinctions between XSS and CSRF is that with XSS, malicious code is executed in the victim's browser, while with CSRF, the forged request is acted upon by the server. That's all for now, still lots of ground to cover... next time in less than 1000 words, hopefully.
* This could be any customer of the bank enrolled in online banking