The client has accepted our proposal, and we’re going to carry out the pentesting assignment together! The target will be the web application (owned by his healthtech company): example.com.
Now it’s time for us to get down to business.
Let’s return to Etienne’s metaphor, which draws a parallel between a penetration test and a bank robbery.
A bank’s main vault is where most of its assets are stored, so it’s usually heavily protected. So, before charging in full force, you’ll want to gather as much information as possible. The same applies to a penetration test. And what better way to gather initial information about your target than by googling it?
Harness the Full Power of Google
Google’s algorithm makes it easy for anyone to find answers to most of their questions, whether they’re a computer newbie or a seasoned expert.
Broadly speaking, we know that Google:
indexes the web by crawling every page its bots come across.
understands your question.
runs all that through the Google mill to come up with an answer.
But did you know that, based on the results of Googlebot’s crawling and indexing of websites, you could also use Google to:
find vulnerabilities?
harvest sensitive data?
Imagine you want to search for all the PDF documents indexed on the root-me.org site that contain the word pass (for password). You never know when a Root Me operating manual might have been inadvertently exposed on the internet and indexed!
Just type site:root-me.org ext:pdf "pass" in the Google search bar.
In this search request, we’re telling Google the following:
site:root-me.org: search only on the root-me.org domain and its subdomains
ext:pdf: search only PDF files
"pass": search PDF files containing the exact string pass
Note that you can also exclude certain results using the - character. For example, if we had wanted to search for all files except PDF files, we could have used -filetype:pdf (filetype and ext are aliases in Google’s language).
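If you build many of these search requests, it can help to assemble them programmatically. The following is a minimal Python sketch, not an official Google tool: the helper names build_dork and search_url are hypothetical, and it only composes the query string and the matching search URL so you can paste it into a browser. It sends no requests to Google.

```python
from urllib.parse import urlencode


def build_dork(site=None, ext=None, exact=None, exclude_ext=None):
    """Assemble a Google search string from a few common operators.

    Hypothetical helper: it covers only the operators discussed in
    this chapter (site:, ext:, "exact string" and -filetype:).
    """
    parts = []
    if site:
        parts.append(f"site:{site}")              # restrict to a domain and its subdomains
    if ext:
        parts.append(f"ext:{ext}")                # keep only this file extension
    if exclude_ext:
        parts.append(f"-filetype:{exclude_ext}")  # leading "-" excludes results
    if exact:
        parts.append(f'"{exact}"')                # quotes request the exact string
    return " ".join(parts)


def search_url(query):
    """Return a Google search URL with the query URL-encoded."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    query = build_dork(site="root-me.org", ext="pdf", exact="pass")
    print(query)
    print(search_url(query))
```

Calling build_dork(site="root-me.org", exclude_ext="pdf", exact="pass") instead produces the "everything except PDF files" variant described above.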
Okay, no luck this time, this isn’t how we’re going to compromise the root-me.org site. But don’t discount this technique in the future and the human errors that your targets may make!
Okay, but I’d still like to see what happens when you find a vulnerability this way. Isn’t there a way to see what the results would look like?
Ah! We suspected you’d be a little disappointed. Don’t worry, you’ll feel better once you’ve done this next little experiment!
Over to You!
Challenge
Here’s a message received from an anonymous source:
“Please help me! I think I remember an article written by someone called ‘Lebrun’ on the Root Me site, but my memory is so bad I can’t remember the exact subject of the article, only that it was a PDF.”
As a pentester-in-training, can you help?
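Solution
A query such as site:root-me.org ext:pdf "Lebrun" narrows the search to PDF files on the Root Me site that mention the author’s name.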
Use Other Data Sources
Google, whether through a normal search request or through Google Dorks, is just one way of searching publicly available sources of information.
To carry out your passive reconnaissance without touching the healthtech application directly, you need to search for as much information as possible:
Who are the employees of the software company that produces example.com, and what are their email addresses?
Would these email addresses have been included in a leaked database?
Does the company have a code repository on GitHub or GitLab?
Have any messages relating to the target application been posted on specialist forums?
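To make the first of these questions concrete, here is a minimal Python sketch that pulls the email addresses belonging to a target domain out of text you have already collected from public pages. The function name emails_for_domain and the sample text are assumptions made for illustration, and the regular expression is deliberately simple rather than a full implementation of the email address standard.

```python
import re


def emails_for_domain(text, domain):
    """Return sorted, de-duplicated addresses in `text` that belong to
    `domain` or one of its subdomains (hypothetical helper)."""
    pattern = re.compile(
        r"[A-Za-z0-9._%+-]+@"                  # local part
        r"(?:[A-Za-z0-9-]+\.)*"                # optional subdomains
        + re.escape(domain)
        + r"(?![A-Za-z0-9-]|\.[A-Za-z0-9])",   # the domain must end here
        re.IGNORECASE,
    )
    return sorted({match.lower() for match in pattern.findall(text)})


if __name__ == "__main__":
    # Made-up sample text standing in for pages gathered during reconnaissance.
    sample = (
        "Contact alice.martin@example.com or Bob@mail.example.com; "
        "unrelated: carol@other.org, trap: x@example.com.evil.org"
    )
    for address in emails_for_domain(sample, "example.com"):
        print(address)
```

The trailing lookahead is there so that an address on a lookalike host such as example.com.evil.org is not mistaken for one on the target domain.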
For example, here’s the start of theHarvester console output when you search on the root-me.org domain:
But we can take things a step further when looking for sources of information about the target. We can use services to:
look in database leaks to see whether the users we’ve identified have left their passwords unchanged.
retrieve data directly from leaks published on the dark web or on specialized parts of the deep web.
The authenticated portion of Root Me, for example, may be part of the deep web!
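As an illustration of the leak lookup mentioned above, the following Python sketch checks which of the addresses you have identified appear in a leak you are authorized to examine. It assumes, purely for the example, that the leak is a text file with one email:secret record per line; real leaks come in many formats. The function name exposed_accounts is hypothetical, and it only counts matching records without displaying the secrets.

```python
def exposed_accounts(identified_emails, leak_lines, sep=":"):
    """Map each identified address to the number of leak records that
    mention it. Assumes one "email<sep>secret" record per line."""
    wanted = {email.strip().lower() for email in identified_emails}
    hits = {}
    for line in leak_lines:
        email, found, _secret = line.strip().partition(sep)
        email = email.lower()
        # Skip malformed lines (no separator) and addresses out of scope.
        if found and email in wanted:
            hits[email] = hits.get(email, 0) + 1
    return hits


if __name__ == "__main__":
    # Made-up records standing in for an authorized leak sample.
    leak = [
        "alice@example.com:hunter2",
        "dave@other.org:qwerty",
        "ALICE@example.com:letmein",
        "malformed line",
    ]
    identified = ["alice@example.com", "bob@example.com"]
    print(exposed_accounts(identified, leak))
```

An address that shows up in the result is a candidate for a password-reuse check, within the limits agreed with the client.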
Let’s Recap!
It is possible to find security-sensitive information about an application or information system from publicly available sources indexed by Google.
You can find this type of information by using what are known as Google Dorks.
You may also find other sensitive information (such as login details) from sources that are not indexed but are still available, such as commit histories on GitHub or leaked databases.
In the next chapter, we’ll take a look at how we might expand the attack surface, or scope, provided we have the client’s permission.