search_google¶
A command line tool and module for Google API web and image search.
Tested for Python 2.6 and 3.5 using Anaconda 4.3.1.
search_google -h
search_google cats
search_google cats --searchType=image
Setup¶
- A CSE ID and a Google API developer key are required to use this package
- A Gmail account will also be required to create and access the ID and developer key
- When asked to sign in, use your Gmail account for access
Note: Instructions and links were written on May 20, 2017, and are subject to change depending on Google’s website and API.
Google Custom Search Engine¶
A Google Custom Search Engine (CSE) and a CSE ID can be set up with the following instructions:
- Go to the CSE Control Panel
- Click Add
- Enter a website in the box under Sites to search such as “www.google.com”
- Click Create
- Go back to the CSE Control Panel
- Select your created search engine
- Turn on Image search
- For Sites to search, select Search the entire web but emphasize included sites
- Under Sites to search, click checkbox next to Site
- Under Sites to search, click Delete
- Under Details, click Search engine ID
- Set cx by replacing “your_cse_id” with the Search engine ID
search_google -s cx="your_cse_id"
Google API¶
An API developer key for the Google Application Programming Interface (API) can be set up with the following instructions:
- Enable Google Custom Search Engine
- Go to Google API Console Credentials
- Click Create Credentials -> API Key
- Set build_developerKey by replacing “your_dev_key” with the API Key
search_google -s build_developerKey="your_dev_key"
Usage¶
For help in the console, use:
search_google -h
Please ensure that the Setup section was completed:
search_google -s cx="your_cse_id"
search_google -s build_developerKey="your_dev_key"
Web and Image Search¶
Perform a web search:
search_google cat
search_google "cat with hat"
Perform an image search:
search_google cat --searchType=image
search_google "cat with hat" --searchType=image
Search for 20 results:
search_google cat --n=20
search_google cat --searchType=image --n=20
Preview all 20 results:
search_google cat --n=20 --option_preview=20
search_google cat --searchType=image --n=20 --option_preview=20
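The Custom Search JSON API returns at most 10 results per request, so a request for 20 results has to be split across several calls. How search_google handles this internally is not shown here. The following is only a minimal sketch of one way to plan such requests, assuming the API's num and start parameters (start is 1-based). The helper name page_requests is hypothetical.

```python
def page_requests(n, page_size=10, first=1):
    """Split a request for n results into (start, num) pairs.

    Assumes the Custom Search JSON API limits: at most 10 results per
    request and a 1-based `start` index. Hypothetical helper, not part
    of search_google.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    pages = []
    start = first
    remaining = n
    while remaining > 0:
        num = min(page_size, remaining)
        pages.append((start, num))
        start += num
        remaining -= num
    return pages


# 20 results need two requests of 10 results each
print(page_requests(20))
```

Each (start, num) pair would then be merged into the cseargs of one request.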
Links and Metadata¶
Save metadata:
search_google cat --save_metadata=cat.json
search_google cat --searchType=image --save_metadata=cat_images.json
Save URL links:
search_google cat --save_links=cat.txt
search_google cat --searchType=image --save_links=cat_images.txt
Save links and metadata:
search_google cat --save_links=cat.txt --save_metadata=cat.json
search_google cat --searchType=image --save_links=cat_images.txt --save_metadata=cat_images.json
Default Arguments¶
Default arguments persist even after the console is closed. They let users customize the search_google command without passing a long list of arguments on every call.
View the defaults:
search_google -v
Increase number of search results previewed to 20:
search_google -s option_preview=20
Turn off preview of search results:
search_google -s option_silent=True
Set the searchType argument to default to image search:
search_google -s searchType=image
Set the fileType argument to default to jpg images:
search_google -s fileType=jpg
Set to save a text file named links.txt with search result links whenever used:
search_google -s save_links=links.txt
Remove default arguments:
search_google -r option_preview
search_google -r option_silent
search_google -r searchType
search_google -r fileType
search_google -r save_links
Reset the defaults:
search_google -d
After resetting the defaults, the developer key and CSE ID have to be set again:
search_google -s cx="your_cse_id"
search_google -s build_developerKey="your_dev_key"
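Persistent defaults of this kind are commonly kept in a small file that is read on every call. The sketch below illustrates the idea with a JSON file. The file location, the function names, and the rule that command line arguments override saved defaults are all assumptions made for illustration, not a description of how search_google stores its defaults.

```python
import json
import os
import tempfile


def load_defaults(path):
    """Return the saved defaults, or an empty dict if none exist yet."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)


def set_default(path, key, value):
    """Mimic `-s key=value`: store one default and write the file back."""
    defaults = load_defaults(path)
    defaults[key] = value
    with open(path, "w") as f:
        json.dump(defaults, f, indent=2)


def remove_default(path, key):
    """Mimic `-r key`: drop one default if it is present."""
    defaults = load_defaults(path)
    defaults.pop(key, None)
    with open(path, "w") as f:
        json.dump(defaults, f, indent=2)


def merge_args(defaults, cli_args):
    """Assumed rule: command line arguments override saved defaults."""
    merged = dict(defaults)
    merged.update(cli_args)
    return merged


# Hypothetical config location used only for this demonstration
config_path = os.path.join(tempfile.mkdtemp(), "defaults.json")
set_default(config_path, "searchType", "image")
set_default(config_path, "option_preview", 20)
remove_default(config_path, "option_preview")
print(merge_args(load_defaults(config_path), {"q": "cat"}))
```

Because the file outlives the process, the values are still present the next time the command runs.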
Additional Arguments¶
A number of optional arguments defined using -- are not shown when using search_google -h. These can be used with the same names as the arguments passed to Google’s CSE method:
search_google -a
For example, the index of the first result to return can be set by argument start, which is a named argument in Google’s CSE method:
search_google cat --start=2
search_google cat --lr=lang_en
search_google cat --searchType=image --imgType=photo
search_google cat --searchType=image --imgDominantColor=brown
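Passing arbitrary --name=value arguments through to the CSE method amounts to collecting them into a dictionary of named arguments. The following sketch shows one way this could be done. The helper parse_extra_args is hypothetical and is not the parser used by search_google.

```python
def parse_extra_args(argv):
    """Collect `--name=value` tokens into a dict of named arguments.

    Hypothetical helper: tokens that do not match `--name=value` are
    returned separately as positional arguments.
    """
    named = {}
    positional = []
    for token in argv:
        if token.startswith("--") and "=" in token:
            # Split on the first "=" only so values may contain "="
            name, value = token[2:].split("=", 1)
            named[name] = value
        else:
            positional.append(token)
    return positional, named


positional, named = parse_extra_args(
    ["cat", "--searchType=image", "--imgDominantColor=brown"]
)
print(positional)
print(named)
```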
Module Import¶
The search_google package may also be used as a Python module:
import search_google.api
# Define buildargs for cse api
buildargs = {
'serviceName': 'customsearch',
'version': 'v1',
'developerKey': 'your_api_key'
}
# Define cseargs for search
cseargs = {
'q': 'keyword query',
'cx': 'your_cse_id',
'num': 3
}
# Create a results object
results = search_google.api.results(buildargs, cseargs)
For more details on module usage, see the example in api.
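When the module is used in a script, it can be convenient to keep the developer key and CSE ID out of the source file. The sketch below builds the two dictionaries from environment variables. The variable names GOOGLE_DEV_KEY and GOOGLE_CSE_ID and the helper make_args are hypothetical. search_google does not read these variables itself.

```python
import os


def make_args(query, num=3, environ=os.environ):
    """Build (buildargs, cseargs) from environment variables.

    GOOGLE_DEV_KEY and GOOGLE_CSE_ID are hypothetical variable names
    chosen for this sketch.
    """
    required = ("GOOGLE_DEV_KEY", "GOOGLE_CSE_ID")
    missing = [name for name in required if name not in environ]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    buildargs = {
        "serviceName": "customsearch",
        "version": "v1",
        "developerKey": environ["GOOGLE_DEV_KEY"],
    }
    cseargs = {"q": query, "cx": environ["GOOGLE_CSE_ID"], "num": num}
    return buildargs, cseargs


# Demonstration with placeholder values instead of real credentials
demo_env = {"GOOGLE_DEV_KEY": "your_api_key", "GOOGLE_CSE_ID": "your_cse_id"}
buildargs, cseargs = make_args("keyword query", environ=demo_env)
print(buildargs)
print(cseargs)

# With real credentials, the dicts are passed on as in the example above:
# results = search_google.api.results(buildargs, cseargs)
```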
Modules¶
api¶
class api.results(buildargs={'serviceName': 'customsearch', 'version': 'v1'}, cseargs={'num': 3, 'fileType': 'png'})¶
Google Custom Search Engine (CSE) API results.
Uses the Google Custom Search Engine API to search webpages and images using queries.
- Args:
  - buildargs (dict): Named arguments for googleapiclient.discovery.build.
  - cseargs (dict): Named arguments for cse.list.
- Attributes:
  - metadata (dict): Object returned from cse.list.
  - buildargs (dict): Same as argument buildargs, kept for reference of inputs.
  - cseargs (dict): Same as argument cseargs, kept for reference of inputs.
- Examples:
# Import the api module for the results class
import search_google.api

# Define buildargs for cse api
buildargs = {
  "serviceName": "customsearch",
  "version": "v1",
  "developerKey": "your_api_key"
}

# Define cseargs for search
cseargs = {
  "q": "keyword query",
  "cx": "your_cse_id",
  "num": 3
}

# Create a results object
results = search_google.api.results(buildargs, cseargs)

# Preview the search results
results.preview()

# Obtain the url links from the search
# Links are inside results['items'] list
links = results.get_values('items', 'link')

# Obtain the url links from the search
links = results.links

# Save the search result metadata to a json file
results.save_metadata('metadata.json')

# Save the search result links to a text file
results.save_links('links.txt')

# Download the search results to a directory
results.download_links('downloads')
download_links(dir_path)¶
Download web pages or images from search result links.
- Args:
  - dir_path (str): Path of directory to save downloads of api.results.links
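Downloading a list of links into a directory requires a local file name for each link. The sketch below, written for Python 3, shows one possible naming scheme together with a download loop based on the standard library. The helpers local_path and download_all are hypothetical, and the naming scheme is an assumption rather than the one used by download_links.

```python
import os
from urllib.parse import urlparse
from urllib.request import urlretrieve


def local_path(dir_path, link, index):
    """Choose a file path inside dir_path for one link.

    Hypothetical naming scheme: use the last path segment of the URL and
    fall back to a numbered .html name when the URL has none.
    """
    name = os.path.basename(urlparse(link).path)
    if not name:
        name = "result_{}.html".format(index)
    return os.path.join(dir_path, name)


def download_all(links, dir_path):
    """Download every link into dir_path (requires network access)."""
    os.makedirs(dir_path, exist_ok=True)
    paths = []
    for index, link in enumerate(links):
        path = local_path(dir_path, link, index)
        urlretrieve(link, path)
        paths.append(path)
    return paths


# Only the naming step is demonstrated here, so no network is needed
print(local_path("downloads", "http://example.com/images/cat.png", 0))
print(local_path("downloads", "http://example.com/", 1))
```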
get_values(k, v)¶
Get a list of values from the key value metadata attribute.
- Args:
  - k (str): Key in api.results.metadata
  - v (str): Values from each item in the key of api.results.metadata
- Returns:
  - A list containing all the v values in the k key for the api.results.metadata attribute.
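The lookup described here can be pictured as a list comprehension over the metadata dictionary. The following stand-alone sketch uses made-up metadata in the general shape of a CSE response. Skipping items that lack the key v is an assumption, not documented behavior of api.results.get_values.

```python
def get_values(metadata, k, v):
    """Return the v value of every item in metadata[k].

    Stand-alone sketch: items lacking the key v are skipped here, which
    is an assumption rather than documented behavior.
    """
    return [item[v] for item in metadata.get(k, []) if v in item]


# Trimmed, made-up metadata in the shape returned by a CSE search
metadata = {
    "items": [
        {"link": "http://example.com/a", "snippet": "First result"},
        {"link": "http://example.com/b", "snippet": "Second result"},
    ]
}
print(get_values(metadata, "items", "link"))
```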
links¶
list of str: Web links to search results using api.results.get_values().
preview(n=10, k='items', kheader='displayLink', klink='link', kdescription='snippet')¶
Print a preview of the search results.
- Args:
  - n (int): Maximum number of search results to preview
  - k (str): Key in api.results.metadata to preview
  - kheader (str): Key in api.results.metadata[k] to use as the header
  - klink (str): Key in api.results.metadata[k] to use as the link if image search
  - kdescription (str): Key in api.results.metadata[k] to use as the description
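To make the role of each key concrete, the sketch below builds preview lines from a made-up metadata dictionary using the same parameter names. The output layout (header, link, description, blank line) is an assumption for illustration and may differ from what preview actually prints.

```python
def format_preview(metadata, n=10, k="items", kheader="displayLink",
                   klink="link", kdescription="snippet"):
    """Return preview lines for up to n results in metadata[k]."""
    lines = []
    for item in metadata.get(k, [])[:n]:
        lines.append(item.get(kheader, ""))
        lines.append(item.get(klink, ""))
        lines.append(item.get(kdescription, ""))
        lines.append("")  # blank line between results
    return lines


# Made-up metadata with the default keys
metadata = {
    "items": [
        {"displayLink": "example.com",
         "link": "http://example.com/a",
         "snippet": "First result"},
        {"displayLink": "example.org",
         "link": "http://example.org/b",
         "snippet": "Second result"},
    ]
}
print("\n".join(format_preview(metadata, n=1)))
```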
save_links(file_path)¶
Saves a text file of the search result links.
Each link is saved on a new line. An example is provided below:
http://www.google.ca
http://www.gmail.com
- Args:
  - file_path (str): Path to the text file to save links to.
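The one-link-per-line format is easy to produce and to read back. The sketch below writes a list of links to a temporary file and prints the result. The function is a stand-alone illustration that takes the links as an argument, unlike the method documented here.

```python
import os
import tempfile


def save_links(links, file_path):
    """Write each link on its own line."""
    with open(file_path, "w") as f:
        for link in links:
            f.write(link + "\n")


links = ["http://www.google.ca", "http://www.gmail.com"]
file_path = os.path.join(tempfile.mkdtemp(), "links.txt")
save_links(links, file_path)

# Read the file back to show the saved format
with open(file_path) as f:
    print(f.read())
```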
save_metadata(file_path)¶
Saves a json file of the search result metadata from api.results.metadata.
- Args:
  - file_path (str): Path to the json file to save metadata to.
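A saved metadata file can be reloaded later to recover the links without repeating the search. The sketch below shows a round trip with made-up metadata. Both functions are stand-alone illustrations with hypothetical names and signatures, not the package's own implementation.

```python
import json
import os
import tempfile


def save_metadata(metadata, file_path):
    """Write the metadata dict to file_path as indented JSON."""
    with open(file_path, "w") as f:
        json.dump(metadata, f, indent=2)


def load_links(file_path, k="items", v="link"):
    """Read a saved metadata file back and pull out the result links."""
    with open(file_path) as f:
        metadata = json.load(f)
    return [item[v] for item in metadata.get(k, []) if v in item]


# Made-up metadata in the general shape of a CSE response
metadata = {
    "items": [
        {"link": "http://example.com/a"},
        {"link": "http://example.com/b"},
    ]
}
file_path = os.path.join(tempfile.mkdtemp(), "metadata.json")
save_metadata(metadata, file_path)
print(load_links(file_path))
```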
cli¶
cli.run(argv=sys.argv)¶
Runs the search_google command line tool.
This function runs the search_google command line tool in a terminal. It is intended for use inside a Python file (.py) that is executed using python.
- Notes:
  - [q] reflects key q in the cseargs parameter for api.results
  - Optional arguments prefixed with build_ are keys in the buildargs parameter for api.results
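The routing described in these notes can be sketched as a split on the build_ prefix. The helper split_args is hypothetical. It also ignores tool-specific options such as save_links or option_preview, which a real command line tool would have to handle separately.

```python
def split_args(named, prefix="build_"):
    """Split named arguments into (buildargs, cseargs) by prefix.

    Hypothetical helper: keys starting with the prefix go to buildargs
    with the prefix removed, and everything else goes to cseargs.
    """
    buildargs = {}
    cseargs = {}
    for name, value in named.items():
        if name.startswith(prefix):
            buildargs[name[len(prefix):]] = value
        else:
            cseargs[name] = value
    return buildargs, cseargs


named = {
    "q": "google",
    "searchType": "image",
    "build_developerKey": "your_dev_key",
    "cx": "your_cx_id",
    "num": "1",
}
buildargs, cseargs = split_args(named)
print(buildargs)
print(cseargs)
```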
For distribution, this function must be defined in the following files:
# In 'search_google/search_google/__main__.py'
from .cli import run
run()

# In 'search_google/search_google.py'
from search_google.cli import run
if __name__ == '__main__':
  run()

# In 'search_google/__init__.py'
__entry_points__ = {'console_scripts': ['search_google=search_google.cli:run']}
Examples:
# Import search_google for the cli module
import search_google.cli

# Create command line arguments
argv = [
  'cli.py',
  'google',
  '--searchType=image',
  '--build_developerKey=your_dev_key',
  '--cx=your_cx_id',
  '--num=1'
]

# Run command line
search_google.cli.run(argv)