Perl Lecture Notes - May 18, 2002
- Old business - comments/discussion on first homework assignment
- XML - take 1 - getting started
- Regexps
- For really simple things you can write your own XML parser or transformer.
- Example 1
analyzing a file whose lines have the form <tag key="value">content</tag>
- Example 2
transforming simple HTML into (badly) formatted source
- But it's almost always easier not to try to re-invent the wheel.
So we turn to ...
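For reference, the do-it-yourself regexp approach of Example 1 might look like the following minimal sketch. The sample lines and the names @lines and @records are made up for illustration and are not the lecture's actual input file; the pattern assumes exactly one attribute per tag and no nested tags.

```perl
use strict;
use warnings;

# Hypothetical sample input (an assumption, not the lecture's file).
my @lines = (
    '<item id="1">first thing</item>',
    '<note lang="en">hello world</note>',
    'this line is not in tag form',
);

my @records;
for my $line (@lines) {
    # Match <tag key="value">content</tag>; \1 forces the closing
    # tag name to be the same as the opening one.
    if ($line =~ m{^\s*<(\w+)\s+(\w+)="([^"]*)">([^<]*)</\1>\s*$}) {
        push @records, { tag => $1, key => $2, value => $3, content => $4 };
    }
}

# Print one summary line per record that matched.
printf "%s: %s=%s, content '%s'\n", @{$_}{qw(tag key value content)}
    for @records;
```

Anything beyond this shape (several attributes, nesting, entities, comments) quickly outgrows a single pattern, which is the point of the next bullet.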
- CPAN
- There are many, many CPAN modules for dealing with XML.
- Here's a dynamic search for them all:
Find all XML CPAN modules
(117 different XML distribution packages as of today.)
- Some noteworthy ones are
- XML::Parser - many of the others are built on top of this.
An interface to the expat C library for XML processing.
Powerful but complicated.
- conversion
- DBIx::XML_RDB - convert DBI SQL database into XML form
- XML::CSV - convert CSV (comma-separated-value) files to XML
- XML::Clean - ensure that your HTML is XML compliant
- standard XML APIs
- XML::DOM - build a DOM Level 1 compliant document
- XML::SAX - SAX stream event handling. (Several variations.)
- XML::SAX::Machines - pipelines of SAX filters
- XML::XPath - a set of modules for parsing and evaluating XPath (i.e. node-search) statements
- perl-ish data structures as XML
- XML::Simple - perl object representation of XML tree
- XML::Twig - processing huge XML documents in tree mode.
"Combines an inventive Perlish interface with many of the features
found in the standard XML API's".
- XML::Grove - another tree-based object model for XML in perl
- Data::DumpXML - save and restore perl data structures as XML
- web utilities
- Apache::AxKit - transform XML to HTML on request based on XSLT or other rules
- SOAP::Lite - remote procedure calls
- Where to get an overview of all this?
Kip Hampton's Perl/XML column
- http://www.xml.com/pub/q/perlxml
(I particularly recommend the Quickstart, in three parts.)
- Here are some simple examples, straight out of Kip's columns.
To really see what's going on, you'll need to read
the documentation and/or tutorials for the various packages.
- Example 1: Converting from CSV to XML (in kip-hampton/quickstart_three/)
- Example 2: Parsing an XML file with XML::Simple
- in regexp/ directory
- input A - simple.xml
- source A - simple-simple.pl
- in kip-hampton/quickstart_one_two/
- input B - camelids.xml
- source B - xml-simple_read.pl
- Note that XML::Simple does not preserve the order of the XML tags,
since it stores them in a perl hash. It also doesn't distinguish
well between attributes (key="value") and the contents of the tag.
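Both caveats can be seen in a small sketch. The XML document here is invented for illustration (it is not simple.xml or camelids.xml), and the KeyAttr and ForceArray settings are just one reasonable choice; see the XML::Simple POD for the defaults.

```perl
use strict;
use warnings;
use XML::Simple;

# Hypothetical input: two attributes and two child elements.
my $xml = <<'XML';
<animal name="llama" legs="4">
  <habitat>Andes</habitat>
  <diet>grass</diet>
</animal>
XML

# KeyAttr => [] switches off XML::Simple's default folding on
# name/key/id; ForceArray => 0 leaves single children as scalars.
my $animal = XMLin($xml, KeyAttr => [], ForceArray => 0);

# Attributes and child elements are now ordinary keys of one hash,
# so neither their kind nor their original order can be recovered.
for my $key (sort keys %$animal) {
    print "$key => $animal->{$key}\n";
}
```

The keys are sorted only to make the output repeatable; the hash itself has no memory of document order.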
- Example 3: Parsing camelids.xml with XML::Twig
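As a rough idea of what the XML::Twig style looks like, here is a minimal sketch with a handler that fires once per element. The inline document and the element names (herd, animal, habitat) are assumptions made for illustration and do not reproduce the real camelids.xml.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical input, not the real camelids.xml.
my $xml = <<'XML';
<herd>
  <animal name="llama"><habitat>Andes</habitat></animal>
  <animal name="camel"><habitat>Gobi</habitat></animal>
</herd>
XML

my @seen;
my $twig = XML::Twig->new(
    twig_handlers => {
        # Called each time a complete <animal> element has been parsed.
        animal => sub {
            my ($t, $elt) = @_;
            push @seen, [ $elt->att('name'), $elt->first_child_text('habitat') ];
            $t->purge;    # release what has been parsed so far
        },
    },
);
$twig->parse($xml);

print "$_->[0] lives in $_->[1]\n" for @seen;
```

Because each handler runs as soon as its element is complete and then purges it, the whole document never has to sit in memory at once, and the elements are seen in document order.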
- Example 4: Darwin (if time allows)
- Resources / Documentation
- Assignment
- Browse some of Kip Hampton's online articles. If nothing particularly
jumps out at you, I recommend the "Perl XML Quickstart", which is in 3 parts.
- Read the POD documentation or a tutorial
for at least one of the CPAN modules that converts XML to and from
a perl data structure.
If nothing catches your eye, I suggest either XML::Simple or XML::Twig.
- Adapt one of the examples from Kip's articles or the tutorials for
use in one of your cgi scripts from last semester,
either as the persistent database (as we did before with disk files
and SQL databases) or as a set of configuration options. You can either
install the modules on your own box, or use them on bob where they should
be already installed.
Are we having fun yet?