GuerrillaBrowser Beginner's Guide, Part 1



GuerrillaBrowser isn't hard to use, but it is very different from your big, conventional web browser.  It doesn't look like it, work like it, or even have the same goal.

Your big web browser is designed to help the websites you visit put on a show.  When you click on a link in your regular browser (and sometimes even when you don't), a lot of things can happen.  What are they?  Are they all safe?  Who knows.  The whole thing is sort of a black box.

GuerrillaBrowser is the opposite of a black box. It's the totally transparent way to connect to web servers.  You get to see (and initiate) everything it does.  You tell it what resources (URLs) to get, and you can see exactly what appears on your hard drive in response.

GuerrillaBrowser is just one component of a larger toolkit, some of which comes with the GuerrillaBrowser package and some of which came with your computer.  You may also have other programs you found elsewhere.  Let's see how that works.

GuerrillaBrowser needs a place to store downloaded files (the GB Cache).  If you used the SETUP.EXE program, it created a GB Cache for you at C:\Program Files\GuerrillaBrowser\$Cache.txt, but let's make a new one so we can have a clean start for this exercise.

First we'll create a new C:\GB\ folder.  You could use Windows Explorer or some aftermarket file manager to do it.  If you've never tried the Command Prompt that comes with Windows, the process looks something like this ("md" means make a directory, "cd" means change to a directory, and "dir" means list a directory's contents):


C:\Program Files\GuerrillaBrowser>md \GB

C:\Program Files\GuerrillaBrowser>cd \GB

C:\GB>dir
 Volume in drive C has no label.
 Volume Serial Number is 386B-9C6D

 Directory of C:\GB

02/26/2008  09:07 PM    <DIR>          ..
02/26/2008  09:07 PM    <DIR>          .
               0 File(s)              0 bytes
               2 Dir(s)  44,066,013,184 bytes free

C:\GB>
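
If you would rather script this step than type it, the same three commands can be approximated in Python. This is only a sketch: the function name make_cache_folder is made up for illustration and is not part of the GuerrillaBrowser package.

```python
from pathlib import Path


def make_cache_folder(path):
    """Create the folder (like "md") and return its contents (like "dir")."""
    folder = Path(path)
    # exist_ok=True: unlike "md", don't complain if the folder already exists.
    folder.mkdir(parents=True, exist_ok=True)
    # A brand-new folder has nothing in it, so this list starts out empty.
    return sorted(entry.name for entry in folder.iterdir())
```

Calling make_cache_folder(r"C:\GB") on a machine without that folder should create it and return an empty list, which corresponds to the "0 File(s)" line in the listing above.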

Right-click anywhere on GuerrillaBrowser's client area (white background) to see the pop-up menu (or use the menu key on the keyboard), and choose the "Open Cache..." command:


GB menu "Open Cache..."


Use the "Open Cache" dialog to locate the C:\GB\ folder we just created.  There won't be any $Cache.txt file in it yet, but click the "Open" button anyway and GB will start a new Cache there.  Notice the "4001" pop-up message, and the grayed-out "Cache Path : C:\GB\" line in the Document View.


"Open Cache" dialog


"Open Cache" dialog : C:\GB\ empty


"4001" message box


We're going to start with Single-Index Mode (the 'B' button on the toolbar is up).  Single-Index Mode uses the Cache Index spinner and URL edit box on the toolbar, while Batch Mode gets its information from the $Batch.txt file.  Batch Mode allows you to operate on up to 100 Cache Indexes at one go.

The "Grab HTML" command downloads whatever you've entered into the URL edit box on the toolbar and saves the server's response as $0.raw.  If the response is a web page (HTML), the "Scrub HTML" command can scan it for possible links, and it's those links we want to discover.  First we need to choose a Cache Index in the spinner ("00" in this example) and paste a complete URL into the edit box ("http://www.ibiblio.org/wm/paint/auth/durer/" is the address of a web page).  Then we can choose the "Grab HTML" command (the pop-up message gives you a last chance to change your mind):


GB menu "Grab HTML"


"Index 00" message box


While GuerrillaBrowser is executing the "Grab HTML" command, it switches to Results View and shows the Index and URL you gave it in gray, changing to black when the download completes:


GB results black (done)


Because we had the "Auto Scrub" menu item checked, the "Scrub HTML" command was automatically performed on the $0.raw file.  We can see a brand new "00" subfolder in the Cache, and some work files in it.
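
To make the "Grab HTML" step concrete, here is a rough Python sketch of the idea: fetch one URL and save the bytes that come back as $0.raw in a two-digit subfolder. The names fetch and grab_html are hypothetical, the Cache Index is passed as an integer, and this is not how GuerrillaBrowser itself is written. In particular, the sketch does not clean out files left over from an earlier grab.

```python
from pathlib import Path
from urllib.request import urlopen


def fetch(url):
    """Return the raw bytes the server sends back for this URL."""
    with urlopen(url) as response:
        return response.read()


def grab_html(cache_dir, index, url, fetcher=fetch):
    """Save the server's response as <cache>/<index>/$0.raw; return the path."""
    folder = Path(cache_dir) / f"{index:02d}"  # Cache Index 0 becomes "00"
    folder.mkdir(parents=True, exist_ok=True)
    raw_file = folder / "$0.raw"
    # Write the bytes exactly as received; nothing is interpreted or executed.
    raw_file.write_bytes(fetcher(url))
    return raw_file
```

The fetcher parameter is there so you can try the function without touching the network, for example by passing a function that returns a canned page.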

Using the Command Prompt:


C:\GB>dir
 Volume in drive C has no label.
 Volume Serial Number is 386B-9C6D

 Directory of C:\GB

02/26/2008  10:04 PM    <DIR>          ..
02/26/2008  10:04 PM    <DIR>          .
02/26/2008  10:04 PM    <DIR>          00
               0 File(s)              0 bytes
               3 Dir(s)  44,043,231,232 bytes free

C:\GB>dir 00
 Volume in drive C has no label.
 Volume Serial Number is 386B-9C6D

 Directory of C:\GB\00

02/26/2008  10:04 PM            13,213 $0.raw
02/26/2008  10:04 PM    <DIR>          ..
02/26/2008  10:04 PM             2,767 $4.map
02/26/2008  10:04 PM             1,018 $5T.txt
02/26/2008  10:04 PM               754 $6P.txt
02/26/2008  10:04 PM               682 $8O.txt
02/26/2008  10:04 PM    <DIR>          .
               5 File(s)         18,434 bytes
               2 Dir(s)  44,043,231,232 bytes free

C:\GB>

Using Windows Explorer:


C:\GB\00\ contains workfiles


The $4.map file associates links with their thumbnail images (if any) and description (if any).  You can view it in your text editor, but it's normally accessed by choosing its Cache Index in the spinner, followed by the "Open Map" command.  (See the "Quick-Start Tutorial" for an example.)

The Scrub also sorts links by type into the remaining work files: thumbs in $5T.txt, pictures in $6P.txt, movies in $7M.txt, and everything else in $8O.txt (the "other" links list).  The "Grab Thumbs", "Grab Pics", "Grab Movies", and "Grab List" commands ('T', 'P', 'M' and 'L' buttons on the toolbar) are hooked directly to the $5T.txt, $6P.txt, $7M.txt and $9L.txt lists.  The Scrub found no movie links on this particular web page (so there's no $7M.txt file for the "Grab Movies" command to use), and the $9L.txt file is one you create yourself (using the "Save List" command to save selections you make in the Map file in GuerrillaBrowser's Document View).
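
The sorting step can be pictured with a short Python sketch. Everything about the rules here is an assumption made for illustration: the extension lists are invented, treating every <img src> as a thumb is a guess, and the names LinkCollector and scrub are hypothetical. GuerrillaBrowser's real Scrub may decide differently.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

PICTURE_EXTS = (".jpg", ".jpeg", ".gif", ".png")  # assumed, not GB's real list
MOVIE_EXTS = (".mpg", ".mpeg", ".avi", ".mov")    # assumed, not GB's real list


class LinkCollector(HTMLParser):
    """Collect <a href> and <img src> targets as absolute URLs."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.hrefs = []
        self.img_srcs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.hrefs.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "img" and attrs.get("src"):
            self.img_srcs.append(urljoin(self.base_url, attrs["src"]))


def scrub(html_text, base_url):
    """Sort a page's links into thumbs, pictures, movies and others."""
    collector = LinkCollector(base_url)
    collector.feed(html_text)
    lists = {"$5T.txt": list(collector.img_srcs),
             "$6P.txt": [], "$7M.txt": [], "$8O.txt": []}
    for url in collector.hrefs:
        path = urlparse(url).path.lower()
        if path.endswith(PICTURE_EXTS):
            lists["$6P.txt"].append(url)
        elif path.endswith(MOVIE_EXTS):
            lists["$7M.txt"].append(url)
        else:
            lists["$8O.txt"].append(url)
    # Leave out lists that came up empty, as with the missing $7M.txt above.
    return {name: urls for name, urls in lists.items() if urls}
```

The function returns a dictionary keyed by work-file name, so writing each list to disk is a separate and deliberately visible step.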

For now, we're going to blow off downloading the thumbs, and just get all the pictures on the page.  We can see what their links are by looking at the $6P.txt list, using the Command Prompt:


C:\GB>type 00\$6P.txt
http://www.ibiblio.org/wm/paint/auth/durer/7sorrows.jpg
http://www.ibiblio.org/wm/paint/auth/durer/st-michel.jpg
http://www.ibiblio.org/wm/paint/auth/durer/paumgartner.jpg
http://www.ibiblio.org/wm/paint/auth/durer/hare.jpg
http://www.ibiblio.org/wm/paint/auth/durer/large-turf.jpg
http://www.ibiblio.org/wm/paint/auth/durer/magi.jpg
http://www.ibiblio.org/wm/paint/auth/durer/doctors.jpg
http://www.ibiblio.org/wm/paint/auth/durer/st-anne.jpg
http://www.ibiblio.org/wm/paint/auth/durer/self/self-26.jpg
http://www.ibiblio.org/wm/paint/auth/durer/adam-eve.jpg
http://www.ibiblio.org/wm/paint/auth/durer/adam-eve-1507.jpg
http://www.ibiblio.org/wm/paint/auth/durer/4holymen.jpg
http://www.ibiblio.org/wm/paint/auth/durer/portraits/father.jpg

C:\GB>
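
Because the list is plain text with one URL per line, any scripting language can read it too. The Python sketch below uses two hypothetical helpers, read_link_list and local_name. The second one guesses that a downloaded file is saved under the last segment of its URL; that matches the file names seen in this exercise, but treat it as an assumption rather than a documented rule.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def read_link_list(list_file):
    """Return the non-blank lines of a GB list file such as $6P.txt."""
    text = Path(list_file).read_text(encoding="ascii")
    return [line.strip() for line in text.splitlines() if line.strip()]


def local_name(url):
    """Return the last path segment of a URL.

    For example, a URL ending in /durer/self/self-26.jpg gives "self-26.jpg".
    """
    return PurePosixPath(urlparse(url).path).name
```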

Using Notepad:


C:\GB\00\$6P.txt contents


The procedure for the "Grab Pics" command is similar to what we did for "Grab HTML".  We're still using the Cache Index spinner, but instead of the single URL in the edit box, the command works from the $6P.txt list, which can contain dozens of links.


GB menu "Grab Pics"


"Index 00" message box


GB results gray (in progress)


GuerrillaBrowser has no built-in network monitor, but you probably already have some other program that'll do the job.  The ZoneAlarm firewall has two: one on its main window and one in the system tray.  Windows' "Connection Status" dialog shows bytes sent and received (right-click the connection you're using in Control Panel's Network properties and choose "Status").


LAN status dialog


Windows' "Task Manager" (Ctrl+Alt+Del) network activity graph shows these files coming down from the server:


Task Manager networking


You can see that the downloaded pictures have been added to the folder for Cache Index 00, using the Command Prompt:


C:\GB>dir 00
 Volume in drive C has no label.
 Volume Serial Number is 386B-9C6D

 Directory of C:\GB\00

02/26/2008  10:04 PM            13,213 $0.raw
02/26/2008  10:04 PM             2,767 $4.map
02/26/2008  10:04 PM             1,018 $5T.txt
02/26/2008  10:04 PM               754 $6P.txt
02/26/2008  10:04 PM               682 $8O.txt
02/26/2008  10:30 PM           202,158 7sorrows.jpg
02/26/2008  10:30 PM           334,648 st-michel.jpg
02/26/2008  10:31 PM           155,064 paumgartner.jpg
02/26/2008  10:31 PM           147,133 hare.jpg
02/26/2008  10:31 PM           185,503 large-turf.jpg
02/26/2008  10:31 PM            66,986 magi.jpg
02/26/2008  10:31 PM           151,542 doctors.jpg
02/26/2008  10:31 PM           167,331 st-anne.jpg
02/26/2008  10:31 PM           144,540 self-26.jpg
02/26/2008  10:31 PM           253,795 adam-eve.jpg
02/26/2008  10:31 PM           116,940 adam-eve-1507.jpg
02/26/2008  10:31 PM           121,609 4holymen.jpg
02/26/2008  10:31 PM    <DIR>          ..
02/26/2008  10:31 PM    <DIR>          .
02/26/2008  10:31 PM           137,980 father.jpg
              18 File(s)      2,203,663 bytes
               2 Dir(s)  44,024,991,744 bytes free

C:\GB>
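
If you want to double-check a grab without reading the listing by eye, a few lines of Python can compare the list against the folder. The function name missing_downloads is hypothetical, and the sketch assumes each file is saved under the last segment of its URL, as the listing above suggests.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def missing_downloads(index_folder, list_name="$6P.txt"):
    """Return the URLs from the list that have no matching file in the folder."""
    folder = Path(index_folder)
    missing = []
    for line in (folder / list_name).read_text(encoding="ascii").splitlines():
        url = line.strip()
        if not url:
            continue  # ignore blank lines
        # Assumption: each file is saved under the last segment of its URL.
        if not (folder / PurePosixPath(urlparse(url).path).name).is_file():
            missing.append(url)
    return missing
```

An empty result means every link in the list has a file with the expected name; it says nothing about whether the file contents are complete.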

Using Windows Explorer:


C:\GB\00\ contains downloaded files


You can use your file manager to move the pictures somewhere else or delete them.  If you reuse Cache Index 00 with the "Grab HTML" command, they'll be automatically deleted, so be sure to change the spinner next time if that's not what you want.  You could also store them indefinitely where they are.  If you do, the GuerrillaViewer companion program can use the same $6P.txt list to find them:


GV menu "Open List..."


"Open List" dialog select $6P.txt


Use the spacebar or little navigation buttons to walk through the list:


GV navigate the image list


GuerrillaBrowser keeps its current GB Cache in memory until you open a different one or quit the program.  To write it out to the $Cache.txt file immediately, use the "Write Cache" command:


GB menu "Write Cache"


Notice that the Cache Index and URL you gave to the "Grab HTML" command are saved in the $Cache.txt file:


C:\GB>dir
 Volume in drive C has no label.
 Volume Serial Number is 386B-9C6D

 Directory of C:\GB

02/26/2008  10:31 PM    <DIR>          00
02/26/2008  10:52 PM               543 $Cache.txt
02/26/2008  10:52 PM    <DIR>          ..
02/26/2008  10:52 PM    <DIR>          .
               1 File(s)            543 bytes
               3 Dir(s)  44,019,077,120 bytes free

C:\GB>type $Cache.txt
00 http://www.ibiblio.org/wm/paint/auth/durer/
01
02
03
04
05
06
07
08


C:\GB\ contains $Cache.txt
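
Judging only from the listing above, each line of $Cache.txt starts with a Cache Index, optionally followed by the URL stored for it. Under that assumption (the real file may hold more than this excerpt shows), a small Python sketch can turn the file into a lookup table. The name read_cache_index is hypothetical.

```python
from pathlib import Path


def read_cache_index(cache_file):
    """Map each Cache Index ("00", "01", ...) to its URL, or None if unused."""
    entries = {}
    for line in Path(cache_file).read_text(encoding="ascii").splitlines():
        parts = line.split(maxsplit=1)
        if not parts:
            continue  # skip blank lines
        index = parts[0]
        # A line holding only the index means nothing has been grabbed there.
        entries[index] = parts[1].strip() if len(parts) > 1 else None
    return entries
```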


If you create a number of GB Caches in one subtree of folders on your hard drive, you can use Windows Explorer's "Search" button to scan all of the $Cache.txt files for some link you used before, or all of the $8O.txt files for any available links to your favorite website, or whatever.  Storing your data in ASCII text files means you don't have to be limited to the GB suite programs when using it.
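
As one illustration of using the data outside the GB suite, the Python sketch below walks a folder tree and reports every line in every $Cache.txt file that contains a given piece of text. The function name find_in_caches and its parameters are made up for this example; pass file_name="$8O.txt" to search the "other" link lists instead.

```python
from pathlib import Path


def find_in_caches(root, needle, file_name="$Cache.txt"):
    """Yield (file, line number, line) for every line containing needle."""
    for path in sorted(Path(root).rglob(file_name)):
        # errors="replace" keeps one odd byte from stopping the whole search.
        text = path.read_text(encoding="ascii", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                yield path, number, line
```

For example, looping over find_in_caches(r"C:\GB", "ibiblio.org") and printing each result would show which Cache and which line mention that site.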

You don't have to download files sight unseen like we did in this exercise.  If you follow "Grab HTML" with the "Grab Thumbs" and "Open Map" commands, you can use GuerrillaBrowser's Document View to see all of the information it could find for each of the links on that web page, along with their URLs where you can plainly see them.  GuerrillaBrowser also has a "Thumb Scout" command that can find extra thumbs your big browser doesn't know about.

And notice that if we had 100 URLs to web pages like the one in this demonstration, Batch Mode would allow us to download more than a thousand pictures at one go.  Try getting your big web browser to do that!

No computer program can read your mind; it only knows what commands you give it.  You should pay particular attention to the state of the 'B' button, the Cache Index spinner (in Single-Index Mode), and the contents of the $Batch.txt file (in Batch Mode).  See the "Quick-Start Tutorial" and "User Guide" for additional examples, a command summary, troubleshooting tips, and more.