=head1 NAME

ThisProgram - does something good

=head1 SYNOPSIS

  use ThisProgram qw( doThis doThat );
  my $a = doThis("some string");
  doThat("other string");

=head1 DESCRIPTION

Here's where you describe in more detail what it does.

=head1 AUTHOR

your name here
@weather = ( "rain" , "sun" , "clouds" ); $w = \@weather;
$a = "hello";          # a string
$b = \$a;              # a reference to the string
print ref($a), "\n";   # try it and see.
print ref($b), "\n";   # ditto.
$numref = \1.2; $hashref = \%hash; $subref = \&someFunction; $refref = \$subref;
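To make the "try it and see" suggestion concrete, here is a small sketch that prints what ref() reports for each kind of reference made above. The subroutine some_function and the contents of %hash are stand-ins invented for the example.

```perl
use strict;
use warnings;

sub some_function { return 42 }    # hypothetical stand-in for someFunction

my %hash    = ( key => "value" );  # hypothetical contents
my $numref  = \1.2;
my $hashref = \%hash;
my $subref  = \&some_function;
my $refref  = \$subref;
my $aref    = [ 1, 2, 3 ];

# ref() returns a string naming the kind of thing a reference points to.
my @kinds = map { ref $_ } ( $numref, $hashref, $subref, $refref, $aref );
print join( " ", @kinds ), "\n";    # SCALAR HASH CODE REF ARRAY

# For something that is not a reference, ref() returns an empty string.
my $plain = ref("hello");
print "plain string gives '$plain'\n";
```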
@a = qw( one two three );
$aref = \@a;
print @$aref; # the array
print @{$aref}; # also the array
print @{"a"}; # even this is the array! (a "soft", or symbolic, reference. Under "use strict" this is a fatal error, not just a warning.)
print $$aref[0];   # first element of @a
print ${$aref}[0]; # a clearer way to write that.
print $aref->[0];  # using the arrow dereference. Note the [] array markers.
$a = [ "zero", "one" , "two" ]; print $a->[0]; # prints the word "zero".
$h = { jim => "mahoney@marlboro.edu",
brandt => "kurowski@marlboro.edu",
santa => "fat_guy@northpole.way.north.somewhere" };
print $h->{santa};
foreach $key ( sort keys %$h ) { print $key, "\n"; }
$students = {
bob => { fullname=>"Bob Smith" ,
pets=> ["cat", "dog", "mouse"] ,
},
jane => { fullname=>"Jane Doe",
pets=> ["cat", "python"] ,
}
};
This is a hash reference whose values are hash references,
whose values in turn are strings and array references.
An element can be reached with
$students->{jane}->{pets}->[0];
but this kind of construction is so common that the ->
between adjacent subscripts ([] or {}) is optional. Thus this
can be written as just
$students->{jane}{pets}[0];
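As a rough illustration of working with this structure, the sketch below walks the whole $students hash rather than picking out one element. The @report array is just a name made up for the example.

```perl
use strict;
use warnings;

my $students = {
    bob  => { fullname => "Bob Smith", pets => [ "cat", "dog", "mouse" ] },
    jane => { fullname => "Jane Doe",  pets => [ "cat", "python" ] },
};

my @report;    # hypothetical: one summary line per student
for my $id ( sort keys %$students ) {
    my $student = $students->{$id};       # inner hash reference
    my @pets    = @{ $student->{pets} };  # copy of the pets array
    push @report, sprintf( "%s has %d pets: %s",
        $student->{fullname}, scalar @pets, join( ", ", @pets ) );
}
print "$_\n" for @report;

# The short and long forms reach the same element.
print $students->{jane}{pets}[0], "\n";          # cat
print $students->{jane}->{pets}->[0], "\n";      # cat
```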
my @matrix;
for my $row ( 1..5 ) {
  for my $column ( 1..10 ) {
    $matrix[$row][$column] = function($row, $column);
  }
}
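Here is a runnable version of the same loop, assuming a simple multiplication table in place of function(), which the notes leave undefined. It also shows that Perl creates the inner arrays on demand, and that index 0 stays unused because the loops start at 1.

```perl
use strict;
use warnings;

my @matrix;
for my $row ( 1 .. 5 ) {
    for my $column ( 1 .. 10 ) {
        # Assumption: row * column stands in for function($row, $column).
        $matrix[$row][$column] = $row * $column;
    }
}

print $matrix[3][4], "\n";              # 12
print scalar(@matrix), "\n";            # 6, because index 0 exists but is unused
print scalar( @{ $matrix[1] } ), "\n";  # 11, for the same reason

# Print the table, skipping the unused index 0 in each row.
for my $row ( 1 .. 5 ) {
    print join( " ", map { sprintf "%3d", $_ } @{ $matrix[$row] }[ 1 .. 10 ] ), "\n";
}
```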
$my_join = sub { return join " | ", @_ };
@array = ( "Monday", "Tuesday" );
print &$my_join(@array); # Call the subroutine.
print $my_join->(@array); # Or call it this way.
sub give_five { return 5};
$a = \&give_five; # Note use of "&", which is required here.
print &$a(); # one way to call it.
print &{$a}(); # Or this.
print $a->(); # Or this.
$b = make_array();
sub make_array { # This would be bogus in some languages -
my @a = ( "one", "two" ); # you'd end up returning a pointer to
my $a_ptr = \@a; # something that went away when make_array did.
return $a_ptr;
}
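A short sketch of why this is safe in Perl: the array stays alive as long as something still refers to it, and each call creates a fresh one. The variable names $first and $second are made up for the example.

```perl
use strict;
use warnings;

sub make_array {
    my @a = ( "one", "two" );
    return \@a;    # @a survives because the caller still holds a reference
}

my $first  = make_array();
my $second = make_array();

push @$first, "three";    # changes only the first array

print "@$first\n";        # one two three
print "@$second\n";       # one two

# Comparing references numerically compares their addresses.
my $same = ( $first == $second ) ? "same" : "different";
print "$same\n";          # different
```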
$a = \$b;
$b = \$a; # In Perl, this makes a reference loop that will never be garbage collected.
# Various modules help deal with this; read the Camel book for details.
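One such module is Scalar::Util, whose weaken function turns a reference into one that does not keep its target alive. The sketch below uses a parent/child pair of hashes rather than the two scalars above, purely as an illustration. The hash keys are invented for the example.

```perl
use strict;
use warnings;
use Scalar::Util qw( weaken isweak );

# Two hashes that point at each other: a reference loop.
my $parent = { name => "parent" };
my $child  = { name => "child", parent => $parent };
$parent->{child} = $child;

# Weaken one side so the loop no longer keeps both alive.
weaken( $child->{parent} );

my $is_weak = isweak( $child->{parent} ) ? "weak" : "strong";
print "$is_weak\n";                     # weak
print $child->{parent}{name}, "\n";     # parent (still usable while it exists)

# Dropping the last strong reference frees the parent hash,
# and the weak reference is set to undef automatically.
undef $parent;
my $state = defined( $child->{parent} ) ? "still there" : "gone";
print "$state\n";                       # gone
```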
# The inner { } look like a code block. @_ is returned.
sub hash_em { { @_ } }
# Here the elements of @_ are put into a hash ref.
sub hash_em { return { @_ } }
# The unary + tells Perl an expression follows, so this behaves like the 2nd version.
sub hash_em { +{@_} }
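The two unambiguous forms can be checked side by side. The subroutine names below are changed only so that both can live in one file.

```perl
use strict;
use warnings;

sub hash_em_return { return { @_ } }    # explicit return
sub hash_em_plus   { +{ @_ } }          # unary plus

my $h1 = hash_em_return( jim => "Jim", jane => "Jane" );
my $h2 = hash_em_plus(   jim => "Jim", jane => "Jane" );

print ref($h1), " ", ref($h2), "\n";       # HASH HASH
print join( ",", sort keys %$h2 ), "\n";   # jane,jim
print $h1->{jim}, "\n";                    # Jim
```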
array           hash                subroutine
-------------   -----------------   ---------------------
$a = \@array;   $b = \%hash;        $c = \&subroutine;      # Making refs
$a = [1,2,3];   $b = {jim=>"Jim"};  $c = sub { };           # Anonymous refs
@$a;            %$b;                &$c;                    # Entire object
$a->[0];        $b->{jim};          $c->(@args);            # Dereferencing
$data = [ [ name, passwd, userid, fullname, group, ... ],
[ name, passwd, userid, fullname, group, ... ],
];
$data = [ { user=>$username, passwd=>$passwd, id=>$id, ... },
{ user=>$username, passwd=>$passwd, id=>$id, ... },
];
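To compare the two layouts, here is a sketch that builds both from the same records. The users, the field list in @fields, and the field values are all invented for the example. The array-of-arrays form is compact but makes you remember column positions, while the array-of-hashes form names each field.

```perl
use strict;
use warnings;

# Hypothetical field names and sample records.
my @fields = qw( user passwd id fullname group );
my $rows   = [
    [ "alice", "x", 1001, "Alice Example", "staff" ],
    [ "bob",   "x", 1002, "Bob Example",   "students" ],
];

# Convert each row (array ref) into a record (hash ref) keyed by field name.
my $records = [
    map {
        my $row = $_;
        my %record;
        @record{@fields} = @$row;    # hash slice: assign all fields at once
        \%record;
    } @$rows
];

print $rows->[1][3], "\n";            # Bob Example (must know column 3 is fullname)
print $records->[1]{fullname}, "\n";  # Bob Example (self-describing)

# Find a record by user name.
my ($alice) = grep { $_->{user} eq "alice" } @$records;
print $alice->{id}, "\n";             # 1001
```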
I suggest you use the LWP::Simple "get" routine to fetch the web pages; that's the tool that makes this a manageable task.
Write a Perl program that searches the web for a given string, starting at a given URL, and repeating the search recursively on the web pages mentioned on that page, down to a given depth. Have your program save all URLs of the files it visits and the matches it finds in a data structure of some kind, so that it can print a summary at the end of what it's done. (That makes this another data structure assignment, as well as a chance to have some fun with an internet-oriented program.)
This assignment has a bit more meat to it than the short exercises in the book. If it feels like too much, team up with someone else in the class and work in a group, or play with as much of it as you can.
The heart of the program will be a subroutine which will fetch a single web page, analyze it, and then call itself on each link it finds, if the search depth hasn't been reached.
Do include some POD documentation describing what your program does and how to run it.
The details I leave up to you. You should test this on a small set of HTML files of your own devising before running it out in the real world. Don't run it with the depth turned up too high. There are, in fact, politeness standards that real web robots like this should follow, though they're a bit outside the scope of this assignment. Some of the classes in the LWP CPAN library are designed to make robots like this. However, mastering that package will probably be more work than just doing one on your own as I've described here.
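One possible shape for the recursive subroutine is sketched below. It is only a starting point, not the required design. The names search_page, $fetch, and %pages are invented for the example, the link matching is a crude regular expression, and relative links are not resolved. The fetch routine is passed in as a code reference so the sketch can run against in-memory test pages. In a real run you could pass \&LWP::Simple::get instead, since get takes a URL and returns the page text or undef on failure.

```perl
use strict;
use warnings;

# Fetch one page, record its matches and links, then recurse into the links
# unless the maximum depth has been reached. Results accumulate in %$visited.
sub search_page {
    my ( $fetch, $url, $word, $depth, $max_depth, $visited ) = @_;
    return if $visited->{$url};          # never process the same page twice
    my $page = $fetch->($url);
    return unless defined $page;         # fetch failed; skip this page

    my @lines   = split /\n/, $page;
    my @matches = grep { $lines[ $_ - 1 ] =~ /\Q$word\E/i } 1 .. scalar @lines;
    my @links   = $page =~ /<a\s[^>]*href\s*=\s*"([^"]+)"/gi;   # crude link finder

    $visited->{$url} = { depth => $depth, lines => \@matches, links => \@links };

    if ( $depth < $max_depth ) {
        for my $link (@links) {
            search_page( $fetch, $link, $word, $depth + 1, $max_depth, $visited );
        }
    }
    return;
}

# Hypothetical in-memory "web" for testing without touching the network.
my %pages = (
    "a.html" => qq{<a href="b.html">B</a>\nrelativity here\n<a href="c.html">C</a>\n},
    "b.html" => qq{nothing\nspecial Relativity\n<a href="a.html">back</a>\n},
    "c.html" => qq{<a href="d.html">D</a>\n},
    "d.html" => qq{relativity\n},
);

my %visited;
search_page( sub { $pages{ $_[0] } }, "a.html", "relativity", 0, 1, \%visited );

for my $url ( sort keys %visited ) {
    my $info = $visited{$url};
    printf "depth %d: %s\n  found on lines: %s\n  links: %s\n",
        $info->{depth}, $url,
        join( ", ", @{ $info->{lines} } ) || "none",
        join( ", ", @{ $info->{links} } ) || "none";
}
```

Because already-visited URLs are skipped, the link from b.html back to a.html cannot cause an endless loop, and d.html is never fetched at a maximum depth of 1.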
Running the program might look something like this:
shell> ./stfw
OK, let's search the web.
Starting URL? http://www.marlboro.edu/~mahoney/courses/Physics.html
String to search for? relativity
Depth? 2
working...
Done. I found 32 places where "relativity" was mentioned.
Here's where I looked:
depth 0: http://www.marlboro.edu/~mahoney/courses/Physics.html
search word found on lines: .....
.....
10 html links found: a, b, c, d, e, f, ...
----- looking deeper at those links ---------------
depth 1: a
search word found on lines ...
2 html links found: A, B
---- looking deeper at those links
depth 2: A
search word not found.
depth 2: B
search word not found.
depth 1: b
...
Feel free to post questions about this to me
via email or on the wiki.