Conduct a Web Penetration Test

10 hours
Medium

Last updated on 9/27/24

Research Information on the Target and Its Ecosystem

The client has accepted our proposal, and we’re going to carry out the pentesting assignment together! The target is the web application owned by the client’s healthtech company: example.com.

Now it’s time for us to get down to business.

Let’s return to Etienne’s metaphor, which draws a parallel between a penetration test and a bank robbery.

A bank’s main vault is where most of its assets are stored, so it’s usually heavily protected. So, before charging in full force, you’ll want to gather as much information as possible. The same applies to a penetration test. And what better way to gather initial information about your target than by googling it!

Harness the Full Power of Google

Google’s algorithm makes it easy for anyone to find answers to most of their questions, whether they’re a computer newbie or a seasoned expert.

Broadly speaking, we know that Google:

indexes the web by crawling every page its bots come across.
understands your question.
runs all that through the Google mill to come up with an answer.

But did you know that you could also use Google to find vulnerabilities and harvest sensitive data, all based on the results of the Googlebot’s crawling and indexing of websites?

Imagine you want to search for all the PDF documents indexed on the root-me.org site that contain the word pass (for password). You never know when a Root Me operating manual might have been inadvertently exposed on the internet and indexed!

Just type site:root-me.org ext:pdf "pass" in the Google search bar:

In this search request, we’re telling Google the following:

site:root-me.org : search only on the root-me.org domain and its subdomains
ext:pdf : search only PDF files
"pass" : search PDF files containing the exact string pass

Note that you can also exclude certain results using the - character. For example, to search for all files except PDF files, we could have used -filetype:pdf (filetype and ext are aliases in Google’s query syntax).
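If you end up writing many of these queries, it can help to assemble them programmatically. Here’s a minimal Python sketch that combines the operators described above into a query string and a search URL. The function names build_dork and search_url are our own, chosen for illustration; they don’t come from any library.

```python
from urllib.parse import urlencode


def build_dork(site, ext=None, exact=None, exclude_ext=None):
    """Assemble a Google search query from a few common operators."""
    parts = [f"site:{site}"]                      # restrict to a domain and its subdomains
    if ext:
        parts.append(f"ext:{ext}")                # keep only this file type
    if exclude_ext:
        parts.append(f"-filetype:{exclude_ext}")  # the leading "-" excludes results
    if exact:
        parts.append(f'"{exact}"')                # quotes ask for the exact string
    return " ".join(parts)


def search_url(query):
    """Return a URL that opens the query in a browser."""
    return "https://www.google.com/search?" + urlencode({"q": query})


query = build_dork("root-me.org", ext="pdf", exact="pass")
print(query)
print(search_url(query))
```

The sketch only builds the query; you still paste it into the search bar (or open the URL) yourself.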

Okay, no luck this time: this isn’t how we’re going to compromise the root-me.org site. But don’t discount this technique, or the human errors your targets may make, in the future!

Okay, but I’d still like to see what happens when you find a vulnerability this way. Isn’t there a way to see what the results would look like?

Ah! We suspected you’d be a little disappointed. Don’t worry, you’ll feel better once you’ve done this next little experiment!

Over to You!

Challenge

Here’s a message received from an anonymous source:

“Please help me! I think I remember an article written by someone called ‘Lebrun’ on the Root Me site, but my memory is so bad I can’t remember the exact subject of the article, only that it was a PDF.”

As a pentester-in-training, can you help?

Solution

Use Other Data Sources

Google, whether through a normal search request or through Google Dorks (advanced queries like the one above), is just one way of searching publicly available sources.

To carry out your passive reconnaissance without touching the healthtech application directly, you need to search for as much information as possible:

Who are the employees of the software company that produces example.com, and what are their email addresses?
Would these email addresses have been included in a leaked database?
Does the company have a code repository on GitHub or GitLab?
Have any messages relating to the target application been posted on specialist forums?

Open-source reconnaissance tools such as theHarvester can automate part of this collection. For example, here’s the start of theHarvester console output when you search on the root-me.org domain:

Start of theHarvester console output for the root-me.org domain
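Whatever tool or page the raw results come from, a common follow-up step is to pull out the email addresses that belong to the target’s domain. Here’s a rough Python sketch of that step. It works on any saved text and makes no assumption about a particular tool’s output format. The function name emails_for_domain and the sample addresses are invented for illustration.

```python
import re

# Deliberately simple pattern: good enough for triage, not a full address parser.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def emails_for_domain(text, domain):
    """Return sorted, de-duplicated addresses at `domain` or one of its subdomains."""
    domain = domain.lower()
    found = set()
    for match in EMAIL_RE.findall(text):
        address = match.lower()
        host = address.split("@", 1)[1]
        if host == domain or host.endswith("." + domain):
            found.add(address)
    return sorted(found)


# Made-up sample text; these addresses are not real.
sample = """
Contact: Alice.Martin@example.com, bob@mail.example.com
Unrelated: carol@another-site.test
Duplicate: alice.martin@example.com
"""
print(emails_for_domain(sample, "example.com"))
```

Lowercasing before de-duplication avoids counting the same mailbox twice when it appears with different capitalization.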

But we can take things a step further when looking for sources of information about the target. We can use services to:

check leaked databases for the users we’ve identified, in case they have left their leaked passwords unchanged.
retrieve data directly from leaks published on the dark web or on specialized parts of the deep web.

The deep web covers content that search engines don’t index. The authenticated portion of Root Me, for example, may be part of the deep web!
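To make the first of those checks concrete, here’s a minimal Python sketch that cross-references a list of identified addresses against the lines of a leak file you are authorized to use. It assumes, purely for illustration, that each line looks like email:secret; real dumps vary widely. The function name exposed_accounts and all the data are invented, and the sketch reports only which addresses appear, never the secrets themselves.

```python
def exposed_accounts(identified_emails, leak_lines):
    """Return the identified addresses that also appear in the leak lines.

    Assumes each leak line looks like "email:secret" (a hypothetical format).
    """
    targets = {email.strip().lower() for email in identified_emails}
    hits = set()
    for line in leak_lines:
        email, sep, _secret = line.strip().partition(":")
        if sep and email.lower() in targets:  # skip lines without a separator
            hits.add(email.lower())
    return sorted(hits)


# Invented data for illustration only.
identified = ["alice.martin@example.com", "bob@mail.example.com"]
leak = [
    "someone@another-site.test:hunter2",
    "Alice.Martin@example.com:not-a-real-password",
    "malformed line without separator",
]
print(exposed_accounts(identified, leak))
```

A hit doesn’t prove the password is still valid; it only tells you which accounts deserve a closer look within the scope agreed with the client.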

Let’s Recap!

It is possible to find security-sensitive information about an application or information system from publicly available sources indexed by Google.
You can find this type of information by using what are known as Google Dorks.
You may also find other sensitive information (such as login details) from sources that are not indexed but are still available, such as commit histories on GitHub or leaked databases.

In the next chapter, we’ll take a look at how we might expand the attack surface, or scope, provided we have the client’s permission.