Shane Hartman – CISSP, GCIA, GREM Suncoast Security Society
• Google Operators / Directives
• Ten Google searches
• Other Google searches and info gathering
• Automated Tools
• Protecting Yourself
Always use these characters without surrounding spaces!
• ( + ) force inclusion of something common
• ( - ) exclude a search term
• ( " ) use quotes around search phrases
• ( . ) a single-character wildcard
• ( * ) any word
• ( | ) boolean 'OR'
• Parentheses group queries: ("master card" | mastercard)
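The operators above compose into a single query string separated by spaces. A minimal sketch of building one programmatically; the helper name `build_query` is illustrative, not from the slides:

```python
# Minimal sketch: composing a Google query string from the operators above.
# The helper name build_query is illustrative, not part of any real API.
def build_query(*terms: str) -> str:
    """Join query terms with single spaces, exactly as typed into the search box."""
    return " ".join(terms)

# Exact phrase, exclusion, and grouped boolean OR, per the rules above:
q = build_query('intitle:"Index of"', "-ftp", '("master card" | mastercard)')
print(q)  # intitle:"Index of" -ftp ("master card" | mastercard)
```

Note that the operator characters themselves take no surrounding spaces; only whole terms are space-separated.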
Intitle
Allintext
Inurl
Site
Filetype
Link
Inanchor
Cache
Numrange
Daterange
Info
Related
Author
Group
Insubject
Msgid
Stocks
Define
Phonebook
Intitle searches for specific words within the title; Allintitle requires a whole series of words or a phrase to appear in the title.
Ex. intitle:"Index of" "backup files" returns results with "Index of" or "backup files"
Ex. allintitle:"Index of" "backup files" returns results with "Index of" and "backup files"
Allintitle is not as commonly used unless you are looking for something very specific.
Allintext locates a string (or strings) within the body of a web page. Ex. allintext:google security
Inurl restricts results to pages containing the query terms you specify in the URL.
Site searches only within the given domain. Google reads the domain from right to left.
Ex. site:apple.com reads com, then apple, to return results under apple.com
Ex1. site:store.apple.com reads com, apple, then store to return results under store.apple.com
Commonly used with other operators.
Filetype allows you to search for specific files based on type, e.g. doc, xls, pdf.
Ex. filetype:doc doc (searches for the word "doc" within .doc files)
Google indexes 13 main file types. According to filext.org there are over 8,000 known file types on the net. Filetype is commonly used in conjunction with other operators, such as site, to find files.
Allows you to search for pages that link to other pages. Ex. link:defcon.org
You can also search for links to deep links Ex. link:www.blackhat.com/html/blackpages/blackpages.html
When improperly put together, such as link:linux, Google will treat it as a regular search, although the results may not look normal.
Inanchor restricts the results to pages containing the query terms in the anchor text of links to the page. Ex. inanchor:smashingthestack
Cache displays Google's cached version of a web page instead of the current version of the page. Ex. cache:blackhat.com
Cache can have some unpredictable results; you might be better off doing a regular search and then accessing the cache from there. As an alternative, you can use archive.org and the Wayback Machine.
Numrange requires two parameters, a low and a high number.
Ex. numrange:12344-12346
Ex1. 12344..12346 (the .. shorthand needs no operator)
It has been suggested that this is one of the most dangerous searches, as it can be used to harvest phone numbers, credit card numbers, etc. In fact, Google's help doesn't even mention this directive. Be careful using it.
These operators (author, group, insubject, msgid) are for Google Groups. If used in a normal search, the operator is dropped and a regular search is performed. *The insubject operator sometimes works in a normal search, but the behavior is unpredictable.
ext:rdp rdp returns hits for Remote Desktop Protocol files; clicking a link opens the RDP client.
site:[domain root] intitle:index.of passwd -ftp finds password files. A .bash_history file also reveals good stuff.
http://*:*@www domain finds URLs that include an embedded username and password (user:password@www.domain.com).
intitle:"AXIS 240 Camera Server" intext:"server push" -help finds open AXIS 240 video cameras.
1. site
2. intitle:index.of
3. error | warning
4. login | logon
5. username | userid | employee.id | "your username is"
6. password | passcode | "your password is"
7. admin | administrator
8. -ext:html -ext:htm -ext:shtml -ext:asp -ext:php
9. inurl:temp | inurl:tmp | inurl:backup | inurl:bak
10. intranet | help.desk
Now that we have seen what can be done, here are some of the more interesting results.
These sites launch a VNC Java client so you can connect! Even if password protected, the client reveals the server name and port.
Print server administration, Google-style!
• GooScan
• Wikto
• Goolag
• Security Policy
• Blocking Crawlers - Robots.txt
• NOARCHIVE / NOSNIPPET
• Directory Listing
• FTP Log Files / Web Traffic Reports
• HTML / Code Comments
• Hidden Form Fields, Javascript
This isn’t Google’s fault. Google is very happy to remove references. See http://www.google.com/remove.html
Follow the webmaster advice found at http://www.google.com/webmasters/faq.html
Determine what you want out there and then write a security policy to match it. This can cover areas such as:
• Participation in forums and user groups
• Out-of-office emails
• Open employment opportunities
• Technologies in use
Robots.txt provides a list of instructions for web crawlers:
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /tmp/
Disallow: /private/
http://www.robotstxt.org/
http://www.mcanerin.com/EN/search-engine/robots-txt.asp
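You can sanity-check a robots.txt file before deploying it; Python's standard urllib.robotparser applies the same rules well-behaved crawlers follow. The rules below mirror the example above:

```python
from urllib.robotparser import RobotFileParser

# The rules below mirror the example robots.txt on this slide.
rules = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /tmp/
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())
print(rp.can_fetch("*", "/private/passwords.txt"))  # False
print(rp.can_fetch("*", "/index.html"))             # True
```

Keep in mind robots.txt is advisory only: it keeps legitimate search engines out, but a hostile crawler is free to ignore it, and the file itself advertises which directories you consider sensitive.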
Prevents caching of a web page / site. Done through meta tags:
<META NAME="ROBOTS" CONTENT="NOARCHIVE">
<META NAME="GOOGLEBOT" CONTENT="NOARCHIVE">
Prevents the small amount of text listed below the title from being collected. Done through meta tags:
<META NAME="ROBOTS" CONTENT="NOSNIPPET">
One side effect of NOSNIPPET is that the document will not be cached either.
If a server is missing its default page (index.htm, index.html, default.asp, default.aspx) and directory browsing is enabled, Google will read the file system in and make it searchable:
intitle:index.of ".<extension of file type>"
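Directory browsing can also simply be switched off at the server. For Apache, an illustrative configuration fragment (the directives are standard Apache; the path is made up):

```apache
# Disable automatic directory listings for this tree, so a missing
# index page returns 403 instead of exposing the file system.
# The path is illustrative.
<Directory "/var/www/html">
    Options -Indexes
</Directory>
```

With listings disabled, the intitle:index.of searches above have nothing to find, regardless of whether a default page exists.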
Lock these files/directories down with NOARCHIVE/NOSNIPPET. This keeps them out of Google's cache and keeps the Wayback Machine from archiving them.
So what is the big deal? Log and traffic reports give the attacker insight into what is going on with the site:
• How much traffic there is
• When the traffic is at its peak
• What files are being accessed
Comments make code more legible and easier to understand. The problem: those same comments are just as legible to anyone else who views the source, and Google indexes them.
Just like HTML comments, the following are all read by the bots for searching:
• Hidden form fields
• Javascript
• Includes
• Directory structure