
Developing Bots with Selenium Python

for

Web Scraping, Test Engineering,
Data Mining, and Online Automation

Available at:
Amazon: Paperback
 
 
There isn't a website that you can't scrape, control, or automate with Selenium Python. These skills can be used to conduct Competitive Intelligence, Test Engineering, Web Scraping, Data Mining, or Market Analysis. Section One of this book provides background and configuration help. Section Two details ten bot projects that explore specific aspects of Selenium Python. Each project uses web pages developed specifically for this book. Additionally, each project has an associated video on YouTube that provides a Bot Demo and Code Walk-through. Section Three contains ten chapters of theory that explore everything from Legal Obligations to Bot Architectures, Handling Big Data, How Webdriver Works, and Fault Tolerance. All scripts are provided either online or in the book.
 
Book Features
  • Context for what you're doing,
  • Anecdotes from a botDev veteran,
  • Assistance with configuring your development environment,
  • Ten detailed bot projects,
  • Custom websites, where you can freely practice your bot development skills,
  • Each project has a YouTube video with a bot demo and code walk-through, and
  • A whole section of Bot Theory, ranging from legal implications to Selenium locators.
    Chapter List
    SECTION I   Introductions
    Chapter 01 Why Write a Bot?
    Why write a bot? Good question...! Organizations of all types, ranging from governments and start-ups to international news agencies, have asked me to write bots that gather market information, automate processes, manage inventory, analyze data, and buy inventory. Bots have physically taken me from Moscow to Silicon Valley. In other words, professional bot development can be a fine career choice.
    Chapter 02 Book Features
    In addition to the technical details of botDev, this book also provides context and theory behind the technologies. Included in the book are ten bot projects that range from solving CAPTCHAs to playing games. Each of these bot projects uses web pages designed specifically for you to practice on, so you don't have to worry about target web pages changing or about violating a site's terms of use. Each project is also accompanied by a YouTube Bot Demo and Code Walk-through, making the book a multimedia learning experience.
    Chapter 03 Configuration
    This chapter helps you configure your development environment so that it is ready for the projects in Section II.
    SECTION II   Projects
    Chapter 04 Project 01 Hello World...!
    In our first project, we develop a minimal Selenium Python program, primarily for the purposes of validating our development environment. Link to Bot #01 Demo and Code Walk-through on YouTube
    Chapter 05 Project 02 The Framework
    To make our work easier going forward, this chapter introduces a "Framework": a systematic way to ensure that our bots are correctly initialized, have access to bot-specific libraries and logging functions, and exit gracefully. Every later bot uses this framework. Link to Bot #02 Demo and Code Walk-through on YouTube
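As a rough illustration of the idea (not the book's actual framework), the sketch below wraps initialization, logging, and a graceful exit in a Python context manager. The names `bot_session`, `driver_factory`, and `FakeDriver` are hypothetical; in a real bot the factory would be something like `selenium.webdriver.Chrome`.

```python
import logging
from contextlib import contextmanager


@contextmanager
def bot_session(name, driver_factory):
    """Initialize logging, create a driver, and always shut it down."""
    log = logging.getLogger(name)
    log.info("starting %s", name)
    driver = driver_factory()  # assumption: returns an object with quit()
    try:
        yield driver, log
    except Exception:
        log.exception("%s failed", name)
        raise
    finally:
        driver.quit()  # graceful exit, even after an error
        log.info("finished %s", name)


class FakeDriver:
    """Hypothetical stand-in so the sketch runs without a browser."""

    def __init__(self):
        self.closed = False
        self.url = None

    def get(self, url):
        self.url = url

    def quit(self):
        self.closed = True
```

A bot body would then sit inside `with bot_session("bot02", FakeDriver) as (driver, log):`, so the driver is shut down whether the body finishes normally or raises.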
    Chapter 06 Project 03 Active Content
    One of the main attractions of Selenium is the way it manages active content, or web content that is downloaded from the webserver after the initial page load is complete. Link to Bot #03 Demo and Code Walk-through on YouTube
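Handling late-arriving content usually comes down to polling for a condition until it holds or a timeout expires. The helper below is a library-free sketch of that pattern, not code from the book; `wait_until` and its parameters are hypothetical names. Selenium provides the same idea through `WebDriverWait` and its `until()` method.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)
```

In a Selenium script, `condition` would typically be a small function that looks for the element the page adds after loading and returns it once it is present.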
    Chapter 07 Project 04 Procurement
    This project makes good use of a private "test" online store to train a bot to purchase two types of plastic cups from an industrial supply house. Bots like this could be tied into inventory and retail systems to automate a large amount of retail management. Link to Bot #04 Demo and Code Walk-through on YouTube
    Chapter 08 Project 05 Action Chains
    The Selenium developers created a very clever way to script intricate mouse functions, such as Drag and Drop, Hover, Right Click, and more, in a package called Action Chains. This project deploys Action Chains to control a variety of jQuery widgets. Link to Bot #05 Demo and Code Walk-through on YouTube
    Chapter 09 Project 06 Parsing and Aggregation
    This project simulates an experience that is very common to bot developers: collecting data from disparate sources, then parsing and combining all of it in a common CSV file for later transport. Link to Bot #06 Demo and Code Walk-through on YouTube
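The aggregation step might look something like the sketch below, which merges rows from several sources into one CSV with a shared header. It is an illustration under assumptions, not the book's script: the function name `aggregate`, the `source` column, and the sample field names are all hypothetical.

```python
import csv
import io


def aggregate(sources, fieldnames):
    """Merge rows from several sources into one CSV string."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=fieldnames,
        extrasaction="ignore",  # drop fields that are not in the shared header
        lineterminator="\n",
    )
    writer.writeheader()
    for source_name, rows in sources.items():
        for row in rows:
            # Normalize every value to stripped text before writing.
            cleaned = {key: str(value).strip() for key, value in row.items()}
            cleaned["source"] = source_name  # record where the row came from
            writer.writerow(cleaned)
    return buffer.getvalue()
```

Tagging each row with its source keeps the combined file traceable, which helps when one of the scraped sites later changes its layout.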
    Chapter 10 Project 07 Regression Test
    The most common use for Selenium is for Test Engineering. This project explores the methodology and usage for performing Regression Tests on a multi-page, test, corporate website. Link to Bot #07 Demo and Code Walk-through on YouTube
    Chapter 11 Project 08 Solving CAPTCHAs
    In Project 08, the reader learns the legitimate reasons for writing bots that need to solve CAPTCHAs. We'll also learn the techniques that are most often used for solving these challenges. Link to Bot #08 Demo and Code Walk-through on YouTube
    Chapter 12 Project 09 Bots that play games
    This bot autonomously plays Tic-Tac-Toe against an online game. In addition to game play, this project employs very explicit logging, which will be used extensively in the next project. Link to Bot #09 Demo and Code Walk-through on YouTube
    Chapter 13 Project 10 Headless Mode
    In our final project, we'll leverage all the work we did in the last project (and the excellent logging) to demonstrate how easy it is to run the game-playing bot without a visible browser window. We'll also unearth the advantages, and disadvantages, of running Selenium in headless mode. Link to Bot #10 Demo and Code Walk-through on YouTube
    SECTION III   Theory
    Chapter 14 Staying out of jail
    With freedom comes responsibility. It's important to acknowledge that the technologies organizations invest in to perform business functions are the same technologies and techniques that hackers use to cause harm. While this chapter contains NO legal advice, it does contain what the Author has learned and been exposed to in over 25 years of bot development.
    Chapter 15 DOM and JavaScript
    An understanding of Selenium requires an understanding of the DOM, or Document Object Model. This chapter explores the DOM, its relationship to JavaScript, and how it's used by Selenium.
    Chapter 16 Locators
    Selenium uses a variety of ways to locate web elements on a web page. This chapter explores them all, with special focus on XPath.
    Chapter 17 Webdriver
    At the core of Selenium is Webdriver. We'll look at how Webdriver works and how it can emulate virtually any modern browser and revision. This is particularly useful for test purposes.
    Chapter 18 Parsing Text and the semantic web
    This chapter describes what the Author learned about parsing over 25 years of writing bots. Special attention is placed on the libParse.py library.
    Chapter 20 Fault Tolerance
    A happy bot is one that doesn't make mistakes. Read this chapter to avoid making beginner mistakes.
    Chapter 21 Machines as humans
    There are very legitimate reasons for making bots that are as stealthy and nondescript as possible. This chapter discloses those reasons and ways to accomplish those goals.
    Chapter 22 Big data big headache
    The public keeps hearing about the amazing things that can be accomplished with big data. What we don't hear is what a pain it is to move large datasets. This chapter defines that pain and describes technologies to ease the agony.
    Chapter 23 Chrome Inspect
    One of the reasons this book focuses on the Chromedriver flavor of Webdriver is that Chrome Inspect is a very useful bot development tool. This chapter shows the benefits of using Chrome Inspect in botDev.
    Downloads (rev 1.0)
    This is where you can download an official set of Python libraries used in this book.

    If you'd like to stay informed of updates on a low-traffic list, please leave your email address below.
    NOTE: Downloads are provided in a single, zipped, collection of *.py files.

    Email address: (optional)
    Author Interviews
    Michael Schrenk is a software developer, author, and instructor. He specializes in Competitive Intelligence and in developing bots to perform intelligence gathering and automation. His first book, "Webbots, Spiders, & Screen Scrapers" (2007, No Starch Press, San Francisco), was called by Dr. Dobb's Journal "The definitive work on the subject." Michael uses the Internet in new and innovative (odd?) ways to provide competitive advantages for his clients in the US, Europe, and Asia. He also helps journalists use computers more effectively to conduct online research through automation and by describing where and how to find otherwise hidden online information. Mike is also a regular at DEF CON--the world's largest gathering of hackers--where he has presented nine times (including DEF CON China). No stranger to Europe, he's spent much time in both Moscow, Russia, and Madrid, Spain, both as a consultant and as a keynote speaker at security and software development conferences. He has also taught at The Centre for Investigative Journalism at City College, London, and for the VVOJ in Belgium and The Netherlands. He's been featured in The Christian Science Monitor, California Public Radio, Radiotelevisione italiana, BBC World Service, and many others. Mike has also written for Computer World Magazine, php-Architect, and Linux Pro Magazine. You can watch his Profile Video here!

    Contact
    We're based in the beautiful American Southwest, but operate globally.
    Please write to Mepso Media at the address below. Or, comment on any of the videos we post.
    Mepso Media LLC
    4952 S Rainbow Boulevard #702
    Las Vegas, NV 89118
    • Interview opportunities,
    • Permission requests & Review Copies,
    • Speaker/Consulting/Training inquiries