php-anthology II

From: https://developers.slashdot.org/story/04/08/04/1516258/the-php-anthology---volume-ii-applications

Intro	Chapter 1 Access Control	Chapter 2 XML	Chapter 3 Alt Content Types
Chapter 4 Stats and Tracking	Chapter 5 Caching	Chapter 6 Dev Techniques	Chapter 7 Design Patterns


The PHP Anthology - Volume II, 'Applications'
Posted by timothy on Monday August 09, 2004 @03:37PM from the
sounds-like-frappe dept.
 
sympleko (Matthew Leingang) writes
"In Volume I of The PHP Anthology, Harry Fuecks showed some of the basic PHP
functionality to solve a few simple problems, including how to object-orient
your code, how to use PHP's hundreds of built-in functions, and how to use
well-developed existing classes, be they from PEAR or other sites. In Volume
II, he intends to 'blow your socks off by tackling some traditionally
complex problems with the same principles--to great effect.' It's summertime
and I'm sandals-only for the time being, so my socks remain safely in the
top drawer. But the volume is nonetheless exciting."
(Read on for the rest of Leingang's review, and check out last week's review
of Volume I.)

There are seven chapters in this volume, each dealing with real-world
problems. Many problems are those you've seen solved on sites you admire and
wondered "How did they do that?" Others are frameworks that allow your site
to run smoothly, with nobody getting accidentally logged out or having to
wait too long while your script gluttonously pulls the same data out of the
database for the Nth time. At the end, Fuecks goes back to the beginning, to
show how proper design and development can save you time when you start your
next project.



Chapter 1: Access Control
Authentication is the process by which users identify themselves. This is
difficult in HTTP, a stateless protocol in which the server handles one
request at a time and instantly forgets you. Luckily HTTP allows cookies,
which are bits of data the server hands to the client, which the client
then sends back on each later request. At first cookies were used only to
annoy ("Hello, Steve! You
have visited this page 3 times"), but a cookie can hold the ID of a session
record in a database, which contains any state-information that you like.

You can authenticate without sessions via HTTP server configuration, as long
as you like the dull dialog box the browser pops up when users enter a
restricted area. Oh, and you don't mind the fact that users won't be able to
"log out" without quitting their browser, nor can you force a logout after a
certain timeout value. Nor can you allow users to register themselves...
these are all existing, solved problems and the author shows some of the
best solutions. Common tasks like allowing users to change their passwords,
recover their passwords if (I mean when) they forget them, and arranging
users in groups to which you assign common permissions are also covered.
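
To make the forced-logout idea concrete, here is a minimal sketch of an
idle-timeout check. It is not the book's code: the function names, the
900-second timeout, and the array keys are all made up for illustration,
and a plain array stands in for PHP's $_SESSION so the logic can run
outside a web request.

```php
<?php
// Hypothetical helper: $session stands in for the $_SESSION superglobal.
const IDLE_TIMEOUT = 900; // seconds; an arbitrary value for illustration

function checkLogin(array &$session, int $now): bool
{
    if (!isset($session['user'], $session['last_seen'])) {
        return false; // never logged in
    }
    if ($now - $session['last_seen'] > IDLE_TIMEOUT) {
        $session = []; // forced logout: discard all state
        return false;
    }
    $session['last_seen'] = $now; // still active, so slide the window
    return true;
}

function logout(array &$session): void
{
    $session = []; // the "log out" link that plain HTTP auth cannot offer
}

$session = ['user' => 'steve', 'last_seen' => 1000];
var_dump(checkLogin($session, 1500)); // within the timeout
var_dump(checkLogin($session, 5000)); // idle too long, so logged out
```

In a real page you would call session_start() first and pass $_SESSION
and time() instead of the sample values.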

My favorite example from this chapter is the humans-only registration
application. Remember when online voting for the Major League Baseball
All-Star Game first started? Anyone who knew how to write a web client could
have automated a task to vote as many times as the server could handle, and
have his favorite players be the all-star team.* To bring it closer to home,
what if somebody decides to bog down your site by automatically registering
a huge number of times and filling up your database? You can keep these
things from happening by making users look at images which contain text but
are hard for computers to "read." PHP is in use at all stages of this game,
from writing the registration form's HTML to generating the obscured image
on-the-fly.
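
A rough sketch of the image half of that game, assuming the GD extension
is available. The function names, image size, and noise settings are my
own guesses at a reasonable minimum, not the book's implementation, and a
production version would distort the text far more aggressively.

```php
<?php
// Assumes the GD extension; all names here are illustrative.
function randomCode(int $length = 5): string
{
    // Leave out characters that are easy to confuse (0/O, 1/I).
    $alphabet = 'ABCDEFGHJKLMNPQRSTUVWXYZ23456789';
    $code = '';
    for ($i = 0; $i < $length; $i++) {
        $code .= $alphabet[random_int(0, strlen($alphabet) - 1)];
    }
    return $code;
}

function renderChallenge(string $code): string
{
    $img = imagecreatetruecolor(120, 40);
    $bg  = imagecolorallocate($img, 255, 255, 255);
    $fg  = imagecolorallocate($img, 40, 40, 40);
    imagefilledrectangle($img, 0, 0, 119, 39, $bg);
    // Noise lines make the text harder to pick out automatically.
    for ($i = 0; $i < 6; $i++) {
        imageline($img, random_int(0, 119), random_int(0, 39),
                  random_int(0, 119), random_int(0, 39), $fg);
    }
    imagestring($img, 5, 20, 12, $code, $fg);
    ob_start();
    imagepng($img); // writes the PNG bytes into the output buffer
    return ob_get_clean();
}

$code = randomCode();
$png  = extension_loaded('gd') ? renderChallenge($code) : null;
// In a real page: store $code in the session, send a
// "Content-Type: image/png" header, echo $png, and later compare the
// visitor's typed answer with the stored code.
```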



Chapter 2: XML
XML is a fact of life and, hype aside, is a great way to store and transmit
machine-readable data. One of the most visible applications is the thousands
of bloggers and news sites providing XML feeds of their headlines. You can
write portal sites that grab these headlines, parse them all and present
them on your site with links to the full text at the source.

There are two ways to parse XML: with events, or by using the Document
Object Model (DOM). The methodologies are similar to reading a plain-text
file line-by-line or all at once. Using events you can implement a
finite-state machine based on which tags and text come down the pike. Or you can
slurp the whole document into memory and find any part of it with ease. The
built-in library for the former is based on the popular Simple API for XML
(or SAX; don't you like those nested acronyms?), while the latter often uses
XPath to find the particular document nodes you want.
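
The two styles side by side, as a small sketch using PHP's bundled xml and
dom extensions (assumed to be enabled). The RSS fragment is invented for
the example, and the variable names are mine rather than the book's.

```php
<?php
// A tiny, made-up RSS fragment used only for this sketch.
$rss = <<<XML
<rss version="2.0"><channel>
  <item><title>First headline</title><link>http://example.com/1</link></item>
  <item><title>Second headline</title><link>http://example.com/2</link></item>
</channel></rss>
XML;

// Event-based parsing: a small state machine that collects text only
// while inside a <title> that belongs to an <item>.
$titles  = [];
$inItem  = false;
$inTitle = false;

$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0); // keep tag case
xml_set_element_handler(
    $parser,
    function ($p, $name, $attrs) use (&$titles, &$inItem, &$inTitle) {
        if ($name === 'item') {
            $inItem = true;
        } elseif ($name === 'title' && $inItem) {
            $inTitle = true;
            $titles[] = '';
        }
    },
    function ($p, $name) use (&$inItem, &$inTitle) {
        if ($name === 'item') {
            $inItem = false;
        } elseif ($name === 'title') {
            $inTitle = false;
        }
    }
);
xml_set_character_data_handler(
    $parser,
    function ($p, $text) use (&$titles, &$inTitle) {
        if ($inTitle) {
            $titles[count($titles) - 1] .= $text;
        }
    }
);
xml_parse($parser, $rss, true);

// DOM parsing: load everything, then ask XPath for the nodes you want.
$doc = new DOMDocument();
$doc->loadXML($rss);
$xpath = new DOMXPath($doc);
$domTitles = [];
foreach ($xpath->query('//item/title') as $node) {
    $domTitles[] = $node->textContent;
}

print_r($titles);
print_r($domTitles);
```

The event version needs three flags of bookkeeping to answer a question
the XPath version asks in one line, but it never holds more than the
current chunk in memory.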

The author shows how to parse RSS feeds with both SAX and the DOM, and how
to render a feed with DOM. Further, you can use Extensible Stylesheet
Language Transformations (known as XSLT) to transform XML -- whether it's to
XHTML for regular browser reading, WML (Wireless Markup Language) for
viewing on mobile phones, or even SQL to communicate with a database.

Another exciting XML application is in the area of web services, in which
agents (often but not necessarily web servers) communicate with each other
over an XML-based protocol built on top of HTTP. The two most popular
protocols are XML-RPC (the RPC stands for "Remote Procedure Call") and SOAP
(which used to stand for "Simple Object Access Protocol" but now is just a
name). Fast-changing information such as stock prices and weather is often
offered through web services, but they can also be used as an object API
between agents over the network. What's cool about using SOAP is you can
publish to clients exactly what services you offer and how they can call
them using the Web Services Description Language (WSDL).
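
To show how little there is to an XML-RPC call, here is a sketch that
builds the request body by hand with DOM. The method name
"weather.getTemperature" and its argument are invented, the helper only
handles string parameters, and a real project would more likely use an
existing XML-RPC library than roll its own.

```php
<?php
// Builds the XML body of an XML-RPC call. Names are illustrative only.
function buildXmlRpcRequest(string $method, array $stringParams): string
{
    $doc  = new DOMDocument('1.0', 'UTF-8');
    $call = $doc->appendChild($doc->createElement('methodCall'));
    $name = $call->appendChild($doc->createElement('methodName'));
    $name->appendChild($doc->createTextNode($method));
    $params = $call->appendChild($doc->createElement('params'));
    foreach ($stringParams as $value) {
        $param = $params->appendChild($doc->createElement('param'));
        $val   = $param->appendChild($doc->createElement('value'));
        $str   = $val->appendChild($doc->createElement('string'));
        $str->appendChild($doc->createTextNode($value)); // escapes & and <
    }
    return $doc->saveXML();
}

echo buildXmlRpcRequest('weather.getTemperature', ['Boston']);
// This document would be sent as the body of an HTTP POST with a
// "Content-Type: text/xml" header; the reply comes back as XML too.
```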



Chapter 3: Alternative Content Types
If you've ever printed out a web page that was designed for browser viewing,
you know the less-than-desired effect. The navigational elements, search
boxes, and banners, while necessary for the web page, are useless once a
static copy is printed. Furthermore, you need to extend your site to include
users with less-featured browsers, such as mobile phones.

Fortunately, PHP has been taught many languages. PDF is the standard for
print-quality documents, and there are several libraries (free and non-free)
which allow you to generate them. WML is the HTML of cell phone browsers, in
which screen space is at a premium and bandwidth scarce. SVG is an XML
application which allows vector-based images like PostScript does. The
coolest example, however, uses XUL (the XML User interface Language, not to
be confused with Zool) to make full GUI applications that you run through
Mozilla. This isn't useful for the outside world where you can't force your
users to use Mozilla (sigh), but works well for intranet applications that
run on a variety of platforms.

The author also brings up in this chapter an HTML SAX parser he has written.
You can process HTML pages chunk-by-chunk and extract the pieces you want. I
hadn't known about such a class until I read the book and I'm very excited I
know about it now. For sometimes it's necessary to parse a web page meant
for humans to read (perhaps to pretend to be a user and automate your
all-star voting), and most HTML pages won't validate as HTML, let alone XML.

A good point here is that a well-designed, tiered application will allow you
to swap out different presentation classes with little code rewrite.
Separating the tasks of extracting the data from the database and presenting
it to the user in a variety of formats is a common task that, when done right,
becomes subsequently easier.



Chapter 4: Stats and Tracking
Once your site is up and running, you'll be interested to know which parts
are the most active, and how much traffic you're getting. Into a dynamic
page you can obviously insert any logging mechanism, but a great place to
put it is inside your site's logo. PHP can send binary data as easily as
text. Why would you want to do this?

        The logo is usually on every page (or it should be). You don't have
to cut-and-paste code.
        You can serve the image, then use the flush command to send the
output on and do extra processing. This way logging doesn't get in the way
of page rendering.
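
Roughly what such a logging logo script could look like. This is my own
sketch, not the book's code: the function name, the file paths, and the
tab-separated log format are all assumptions.

```php
<?php
// Hypothetical logo.php, referenced as <img src="logo.php"> on every page.
function serveLogoAndLog(string $logoFile, string $logFile, array $server): void
{
    if (!headers_sent()) {
        header('Content-Type: image/png');
        header('Content-Length: ' . filesize($logoFile));
    }
    readfile($logoFile); // send the binary image data
    flush();             // push it toward the client before doing more work

    // The browser has its image; now record the hit.
    $line = sprintf(
        "%s\t%s\t%s\n",
        date('c'),
        $server['REMOTE_ADDR'] ?? '-',
        $server['HTTP_REFERER'] ?? '-' // usually the page embedding the logo
    );
    file_put_contents($logFile, $line, FILE_APPEND | LOCK_EX);
}

// In a web request this would be called as:
// serveLogoAndLog('logo.png', '/var/log/hits.log', $_SERVER);
```

Because the image is requested by the page that embeds it, the Referer
header typically tells you which page was viewed, when the browser sends
one.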

There are lots of packages available to collect and analyze data. The author
goes through phpOpenTracker which is quite rich in features. There are also
ways to collect data on what links users follow to leave your site, and to
keep requests from search engines from cluttering your log files.



Chapter 5: Caching
Another possible knock against PHP is that, while it's good to have dynamic
pages, some pages are unnecessarily so. It is a waste of server resources
to keep rendering the same page anew. There are different ways to conserve.

On the client side, you can use HTTP 1.1 headers like Cache-Control and
Expires to tell browsers when it's okay to store cached copies locally.
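
For instance, a page that is safe to reuse for an hour might send
something like the following. The helper name is made up; the header
names and the date format are standard HTTP.

```php
<?php
// Returns header lines that let a browser reuse its copy for $seconds.
function cacheHeaders(int $seconds, int $now): array
{
    return [
        'Cache-Control: max-age=' . $seconds,
        'Expires: ' . gmdate('D, d M Y H:i:s', $now + $seconds) . ' GMT',
    ];
}

foreach (cacheHeaders(3600, time()) as $line) {
    if (!headers_sent()) {
        header($line); // must happen before any page output
    }
}
```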

On the server side, as can be expected, you have a greater level of control.
You can use output buffering to delay sending of output to the browser, then
save a copy of the output locally. On subsequent requests, you can serve the
file rather than generate the HTML all over again. This can be implemented
on a chunk (or block) level, so that you can keep some parts
ultra-time-sensitive and others not so much. The package PEAR::Cache_Lite
can help with this.
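
The output-buffering trick fits in a dozen lines. The sketch below is
hand-rolled for illustration and is not the Cache_Lite API; the function
name, cache file location, and five-minute lifetime are assumptions.

```php
<?php
// Block-level cache: render once, then serve the saved copy until it
// is older than $ttl seconds.
function cachedBlock(string $cacheFile, int $ttl, callable $render): string
{
    if (is_file($cacheFile) && time() - filemtime($cacheFile) < $ttl) {
        return file_get_contents($cacheFile); // fresh enough: skip rendering
    }
    ob_start();             // capture everything the renderer echoes
    $render();
    $html = ob_get_clean();
    file_put_contents($cacheFile, $html, LOCK_EX);
    return $html;
}

$file = sys_get_temp_dir() . '/headlines.cache';
echo cachedBlock($file, 300, function () {
    echo '<ul><li>Generated at ' . date('H:i:s') . '</li></ul>';
});
```

Wrapping only the slow block this way leaves the rest of the page fully
dynamic, which is the chunk-level control described above.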



Chapter 6: Development Technique
The last two chapters were my favorites of the two-volume set. They are on a
higher level of abstraction than the features of PHP's library of functions,
or the previous five chapters on real-world solutions. After you've reached a
certain level of expertise in PHP coding, you begin to wonder about the
"right" way to do things. The author shows how to use Xdebug to find
bottlenecks in your code, as well as a few quick optimization tips (for
instance, design your flow control so that the first choice is the one most
often taken).

He then discusses the principles of N-tiered design. N is usually 5, but the
data layer (usually a database or file system) and presentation layer
(usually the browser) are most often handled outside of PHP, so you normally
have three levels to worry about:

        Data Access: Getting data from the outside world into your
application
        Application Logic: Doing whatever unique thing your application is
supposed to do
        Presentation Logic: Forming a response in a format acceptable to
your client

Keeping these layers separate and restricting them to communicating through
well-defined interfaces allows you maximum flexibility. If you need to
change databases (say you just got venture capital money and can afford
Oracle now), you can do so by changing only one layer. If you want to serve
different flavors of HTML, or different markup languages altogether, or
binary data, you can do so by only changing one layer. You can even strive
for maximum distributability by enabling your layers to "live" on physically
independent machines and communicate with XML-RPC or SOAP.
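
Here is one way the three PHP-side layers might look in miniature. Every
interface and class name is hypothetical, and an in-memory array stands in
for the database so the sketch runs on its own.

```php
<?php
interface HeadlineSource // data access
{
    public function fetch(): array;
}

interface HeadlineView // presentation logic
{
    public function render(array $headlines): string;
}

class ArraySource implements HeadlineSource
{
    public function fetch(): array
    {
        // A real implementation would query a database here.
        return ['First headline', 'Second headline'];
    }
}

class HtmlView implements HeadlineView
{
    public function render(array $headlines): string
    {
        $items = array_map(
            fn ($h) => '<li>' . htmlspecialchars($h) . '</li>',
            $headlines
        );
        return '<ul>' . implode('', $items) . '</ul>';
    }
}

class TextView implements HeadlineView
{
    public function render(array $headlines): string
    {
        return implode("\n", $headlines);
    }
}

// Application logic: depends only on the two interfaces.
function latestHeadlines(HeadlineSource $src, HeadlineView $view, int $max): string
{
    return $view->render(array_slice($src->fetch(), 0, $max));
}

echo latestHeadlines(new ArraySource(), new HtmlView(), 1), "\n";
echo latestHeadlines(new ArraySource(), new TextView(), 2), "\n";
```

Switching databases means writing another HeadlineSource; serving WML or
PDF means writing another HeadlineView. latestHeadlines() never changes.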

Documenting your code is essential. Anybody who's been programming for over
a year has gone back to code he or she's written and thought, "Now what the
heck was this supposed to do?" It's even more essential when you write
something and wish to distribute it for the benefit of others. You can
expect them to grok your code at an even lower rate since they didn't write
it the first time.

Luckily, scripted languages like PHP are excellent at parsing text files,
including PHP scripts themselves. Using well-defined documentation formats
akin to JavaDoc, you can embed documentation in your code inside comments,
and use tools like phpDocumentor to extract these documentation blocks and
format them as nice, cross-referenced HTML. In fact, writing doc blocks
before your code is a good way to think ahead about how you want your
classes and methods to work.
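
A doc block in the JavaDoc-like style looks like this. The function is a
throwaway example of mine; the reflection call at the end is just a quick
way to see that PHP itself can read the comment back, which is the hook
documentation tools build on.

```php
<?php
/**
 * Converts a temperature from Fahrenheit to Celsius.
 *
 * @param float $fahrenheit Temperature in degrees Fahrenheit.
 * @return float The same temperature in degrees Celsius.
 */
function toCelsius(float $fahrenheit): float
{
    return ($fahrenheit - 32) * 5 / 9;
}

// PHP can hand the raw comment back at run time.
$doc = (new ReflectionFunction('toCelsius'))->getDocComment();
echo $doc, "\n";
```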

Unit Testing, one of the most digestible dogmas of Extreme Programming, is
an awesome way to test your code for logic errors. You build up tiny test
cases (using mock objects to isolate the class you're testing) and build as
many as you like. Once you do this (PHPUnit and SimpleTest are two rich
frameworks), you keep your tests and each time you add features, you run
your tests to make sure you haven't added bugs as well.
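
To give the flavor without pulling in a framework, here is a bare-bones
sketch of a test that uses a mock object. PHPUnit and SimpleTest offer
much richer versions of the same idea; the classes below and the check()
helper are invented for this example.

```php
<?php
interface Mailer
{
    public function send(string $to, string $subject): void;
}

// The class under test: it validates an address and sends a welcome mail.
class Registration
{
    private Mailer $mailer;

    public function __construct(Mailer $mailer)
    {
        $this->mailer = $mailer;
    }

    public function register(string $email): bool
    {
        if (filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
            return false;
        }
        $this->mailer->send($email, 'Welcome');
        return true;
    }
}

// Mock object: records calls instead of sending real mail.
class MockMailer implements Mailer
{
    public array $sent = [];

    public function send(string $to, string $subject): void
    {
        $this->sent[] = [$to, $subject];
    }
}

function check(bool $condition, string $label): void
{
    echo ($condition ? 'PASS' : 'FAIL') . ": $label\n";
}

$mock = new MockMailer();
$reg  = new Registration($mock);
check($reg->register('steve@example.com') === true, 'valid address accepted');
check($mock->sent === [['steve@example.com', 'Welcome']], 'welcome mail sent');
check($reg->register('not-an-address') === false, 'invalid address rejected');
check(count($mock->sent) === 1, 'no mail sent for invalid address');
```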



Chapter 7: Design Patterns
Design Patterns is one of the modern classics in information technology.
After having done OOP for a while, you will inevitably get the feeling of
deja vu that you've solved a problem before. Not so concretely as "I need a
database abstraction layer," or "I need a templating system," but as in "I
need a way to create objects without specifying exactly what class they
belong to," or "I'm tired of writing so many if statements." Design patterns
are common object architectures which can be used to solve common (though
unique) problems.

Many design patterns are more suited to state-equipped applications with
GUIs, but there are plenty to assist the PHP coder. The Factory Method is a
pattern through which an object can create other objects of varying classes.
So instead of writing mysql_connect everywhere, then having to change every
occurrence of that function, you can abstract all database interaction to a
class, then instantiate a database connection through a class method of
another class: $db = MyApp::getDatabaseConnection(). This is useful when the
connection (not just the RDBMS, but the actual database) you want varies
depending on whether you are developing, testing, or going live with your
application. Factory methods are also a good way to avoid global
configuration variables.
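
A sketch of what could sit behind that call. Only the name
MyApp::getDatabaseConnection() comes from the text above; the stub
connection classes, the environment switch, and setEnvironment() are
assumptions standing in for real database drivers and configuration.

```php
<?php
interface DatabaseConnection
{
    public function describe(): string;
}

class DevConnection implements DatabaseConnection
{
    public function describe(): string
    {
        return 'dev database on localhost';
    }
}

class LiveConnection implements DatabaseConnection
{
    public function describe(): string
    {
        return 'live database cluster';
    }
}

class MyApp
{
    private static string $environment = 'dev';

    public static function setEnvironment(string $env): void
    {
        self::$environment = $env;
    }

    // Factory method: callers never name a concrete connection class.
    public static function getDatabaseConnection(): DatabaseConnection
    {
        switch (self::$environment) {
            case 'live':
                return new LiveConnection();
            default:
                return new DevConnection();
        }
    }
}

$db = MyApp::getDatabaseConnection();
echo $db->describe(), "\n";
```

The environment lives in one private static property rather than in a
global variable, which is the point about configuration made above.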

The Iterator Pattern and the Observer Pattern are two others mentioned in
this chapter. Iterators are used often in paging through database results.
Observers are used to let objects notify other objects of changes in their
state. This chapter will make you want to go read the whole Gang-of-Four
book if you haven't already.
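
Both patterns in miniature, assuming PHP 8 or later. The class names are
mine, an array stands in for a database result set, and the observer half
leans on the SplSubject and SplObserver interfaces that ship with PHP.

```php
<?php
// Iterator: page through rows without exposing how they are stored.
class ResultPage implements Iterator
{
    private array $rows;
    private int $pos = 0;

    public function __construct(array $allRows, int $page, int $perPage)
    {
        $this->rows = array_slice($allRows, ($page - 1) * $perPage, $perPage);
    }

    public function current(): mixed { return $this->rows[$this->pos]; }
    public function key(): mixed     { return $this->pos; }
    public function next(): void     { $this->pos++; }
    public function rewind(): void   { $this->pos = 0; }
    public function valid(): bool    { return isset($this->rows[$this->pos]); }
}

// Observer: the subject notifies whoever has attached itself.
class Account implements SplSubject
{
    public string $email = '';
    private array $observers = [];

    public function attach(SplObserver $o): void
    {
        $this->observers[] = $o;
    }

    public function detach(SplObserver $o): void
    {
        $this->observers = array_filter(
            $this->observers,
            fn ($x) => $x !== $o
        );
    }

    public function notify(): void
    {
        foreach ($this->observers as $o) {
            $o->update($this);
        }
    }

    public function changeEmail(string $email): void
    {
        $this->email = $email;
        $this->notify(); // state changed, so tell everyone who cares
    }
}

class AuditLog implements SplObserver
{
    public array $entries = [];

    public function update(SplSubject $subject): void
    {
        $this->entries[] = 'email is now ' . $subject->email;
    }
}

foreach (new ResultPage(['a', 'b', 'c', 'd', 'e'], 2, 2) as $row) {
    echo $row, "\n"; // second page of two rows each
}

$account = new Account();
$log     = new AuditLog();
$account->attach($log);
$account->changeEmail('steve@example.com');
print_r($log->entries);
```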

My biggest beef with the book is that this wasn't presented earlier on,
perhaps at the beginning of Volume II. As a climax, it leaves me flat,
wondering how the rest of the volume could have been derived from this very
cool concept. But most PHP books conclude with chapters on how to extend
PHP on the C level, or giant case studies involving massive code dumps, and
I'm often not satisfied with them. This is a nice philosophical note to go
out on. And there's something to be said for the argument that books like
these aren't written to be read cover-to-cover.

Appendices
The book closes with the same appendices as in Volume I. Since I don't know the
URL of my review of that volume, I'll just copy: You can read about which
configuration directives you're probably most interested in (the complete
list you can get on PHP's web site), some common security breaches, and how
to install PEAR, PHP's version of CPAN. My favorite appendix is the "Hosting
Provider Checklist," a great reference for evaluating whether
kewlhosting.com is going to give you the freedom and support you need to
make a great hosted web site.

This volume was informative, well-written, and inspirational in that it made
me want to go out and add cool and useful features to my web sites. Check it
out if you can.

*Not really (not that I tried or anything), but they've always been a little
bit smarter about it. You get my point, though. This did happen on an
ESPN.com Page 2 mascot popularity contest, but they noticed through request
headers that millions of votes were coming from the same place, and
invalidated all those votes.

In real life, Matthew Leingang is Preceptor in Mathematics at Harvard
University. He promises to review any book sent to him for free, and
sometimes actually does it. Both volumes of The PHP Anthology are available
from SitePoint. Slashdot welcomes readers' book reviews; to see your own
review here, carefully read the book review guidelines, then visit the
submission page.