Re: Images from database
WOResourceManager and, in general, WebObjects 4.0 certainly provide a tool
set by which it is possible to achieve greatly accelerated performance
when handling images sourced from the database or from the filesystem
(under the control of the application).
However, WO4/ResourceManager cannot compare to the performance possible
or the optimization flexibility available when using a traditional
WebServer.
Consider the difference between serving a static image vs. serving an
image through WebObjects (regardless of whether or not the URL to the
image changes). Clearly, if the URL isn't changing, the client side
caching opportunity exists, but is diminished in a surprising number of
ways (see optimization section below).
In a static environment, the webserver:
1. listens on port 80 for requests
2. receives a request containing a path
3. verifies that the path doesn't reference something magic (cgi-bin or
/WebObjects/)
4. maps the resource in the filesystem into memory
5. composes a response to the client
6. sends the response on its merry way
In a dynamic environment, that same request:
1-3 are the same, but 3 detects something magical
4. webserver dispatches to WebObjects adaptor (cgi or direct adaptor)
5. WO receives request and decodes
6. WO connects request to appropriate session
7. session handles request, etc...
8. WO generates response
9. response sent back to adaptor
10. adaptor hands response to webserver
11. webserver hands response back to client.
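The fork at step 3 can be sketched as a single routing decision. This is only a rough illustration in Python, not how any particular webserver implements it; the prefix list and the function name are assumptions made for the example.

```python
# Hypothetical prefixes a webserver might treat as "magic" at step 3.
DYNAMIC_PREFIXES = ("/cgi-bin/", "/WebObjects/")


def route(path):
    """Return "dynamic" if the path must be handed to an adaptor, else "static"."""
    if path.startswith(DYNAMIC_PREFIXES):
        return "dynamic"  # steps 4-11 of the dynamic list follow
    return "static"       # steps 4-6 of the static list follow
```

Everything that returns "static" stays inside the webserver; everything else pays for the adaptor round trip.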
In terms of Performance:
The static environment is extremely efficient. If a particular static
resource is frequently referenced, it is likely that it will already be in
memory when the webserver decides to map it from the filesystem into
memory-- under OS X Server, the system will reserve up to 256MB of RAM for
just this purpose when the system is configured as a server-- and, as
such, the response generation won't involve a disk hit at all.
By moving to a system where the server must dispatch the URL through an
adaptor to an application server, you immediately have the performance hit
of having to do inter-process communication. Worse, the communication
involves passing both the request AND the response between two address
spaces and will, of course, involve at least three context switches (and
that is assuming that the task scheduler on the system is running at 100%
efficiency given our rather simplistic model of request/response
handling-- not very likely!).
Optimization proves to be a much more interesting problem:
In the environment with images served as static webserver resources (i.e.
files from a filesystem), there are numerous opportunities for
optimization. If the load on the initial webserver is too great, we can
easily drop in an el-cheapo Linux/BSD based system to serve all static
resources and completely move that particular load off of our application
servers or primary web servers. If the load on our image server is too
great, the fix is trivial (and cheap; again-- we can get away with a free
OS running on cheap hardware and achieve really great performance)-- just
split the server across two boxes.
Likewise, we can also split the webservers away from the application
servers. This isn't really a static resource optimization, but by
minimizing the amount of binary traffic between our web servers and our
application servers, we can minimize the cost of the networking
infrastructure on our server farm. On a typical site, it isn't uncommon
to encounter something like a 3:1 to 5:1 (or higher for some of the
"spiffier" sites) byte ratio for graphics:HTML content.
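To put those ratios in terms of total traffic, here is a small worked calculation. The function name is made up for the example; the 3:1 and 5:1 figures are the ones quoted above.

```python
def graphics_fraction(ratio):
    """Fraction of all bytes that are graphics, for a graphics:HTML ratio of ratio:1."""
    return ratio / (ratio + 1)


for r in (3, 5):
    print(f"{r}:1 -> {graphics_fraction(r):.0%} of bytes are graphics")
# prints:
# 3:1 -> 75% of bytes are graphics
# 5:1 -> 83% of bytes are graphics
```

In other words, somewhere around three quarters or more of the bytes never need to touch an application server at all.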
What isn't so obvious is the number of performance opportunities that are
lost by serving the static content via the application servers. Some are
surprising:
Frequently, proxy servers and browsers will simply not cache any URL with
/cgi-bin/ in it. Particularly "smart" proxy servers won't cache URLs that
contain "/WebObjects" or any of a standard set of extensions/suffixes for
common app servers (.exe, for example). Why? The assumption is that the
content is dynamic and that it is likely that the expiration dates on the
content are wrong-- better err on the side of fresh data because the
performance bottleneck will never be traced to the proxy server because
it is far easier to blame the internet or the remote server...
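The kind of heuristic described above might look roughly like the following. This is a guess at the shape of the rule, not the logic of any real proxy; the marker lists and the function name are assumptions for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical markers a conservative proxy might treat as "dynamic content".
SUSPECT_PATH_PARTS = ("/cgi-bin/", "/WebObjects")
SUSPECT_SUFFIXES = (".exe",)


def proxy_would_cache(url):
    """Return False if the URL merely looks dynamic, whatever its headers say."""
    path = urlsplit(url).path
    if any(part in path for part in SUSPECT_PATH_PARTS):
        return False
    if path.lower().endswith(SUSPECT_SUFFIXES):
        return False
    return True
```

Note that the decision is made on the URL alone, so a perfectly cacheable image served through an app server URL loses out regardless of its expiration headers.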
It is also rare that a site upon which we would actually be worrying about
performance to this degree would only be running a single application
server. It is far more common to have numerous application instances
running on a single box or across a farm of boxes.
That basically eliminates any in-memory caching of BLOB-like data outside
of the filesystem's already extremely efficient means of doing so. With
EOF, we are already tangling with the possibility of having n copies of
any given entity in memory (n being the number of app servers).
Thankfully, MOST EO data is relatively small-- until you start sticking
lots of BLOBs in the database!
Memory is cheap (except for high end SPARC boxes)... but there are limits.
Combine that with the voracious appetite for memory of Java and the
slightly lesser, but still very significant, appetite of EO and one can
quickly exceed the memory limitations of a machine.
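A back-of-the-envelope sketch of how the n copies add up. The instance count, image count, and image size below are invented numbers for illustration, not measurements from any real site.

```python
def cached_blob_bytes(instances, images, bytes_per_image):
    """Worst case: every app instance ends up holding its own copy of every image."""
    return instances * images * bytes_per_image


# Assumed figures: 8 app instances, 2000 images, 50 KB each.
total = cached_blob_bytes(instances=8, images=2000, bytes_per_image=50 * 1024)
print(f"{total / (1024 * 1024):.1f} MB held across all instances")
```

The same images served from the filesystem would be cached once per machine by the OS, not once per instance.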
Another bottleneck is the network infrastructure on the server farm.
Again, since we are working at the magnitude where we are concerned about
performance, it is likely that we have the application servers running on
a different machine than the database servers.
As such, any BLOB data coming out of the database will have to live in two
different memory spaces (at least-- again, this will be multiplied as we
increase the number of application instances) and will have to be copied
across the wire. In the most optimal case for retrieving an image from
the database, the client library would pass the pages of memory containing
the image between the database server and the application server-- but
that would require that both are running on the same machine and that the
client library and EO both support this capability (do they?? I don't
know).
Even then, we still have to move the image data from the application
server to the web server. In this case, it is highly unlikely that it
will be a case of simply mapping pages between processes (assuming that is
even possible)-- the response typically has a header and the webserver
typically has some kind of buffered write for sending the response back to
the client.
---
While it is certainly possible to build relatively large scale sites using
a model that includes images served from the database, it requires careful
design and planning. As well, one must take into account what the planned
evolution of the site is going to be and how much $$$ and time will be
spent on maintenance and optimization as the site's traffic grows.
None of the problems documented above are specific to WebObjects. Every
application server based system out there will exhibit the same kinds of
behaviour; though, obviously, given the superior nature of EOF in
managing data caches, it is likely that other middleware will fare far
worse than WO in similar situations!
For WebObjects to achieve the level of performance exhibited by serving
static resources off of the simple combination of a webserver +
filesystem would effectively require that WebObjects be the webserver. A
seemingly attractive feature (and certainly useful in a development
environment; direct connect should work as much like a real web server as
possible-- the fact that static images don't work in direct connect mode
is just plain silly), but not really that useful.
It isn't hard to populate a filesystem with various random static
resources-- there are a myriad of tools available for doing this. As
such, instead of storing BLOBs containing images in the database, it is
equally as easy to simply store the paths to the images in the database--
leave the images in the filesystem and let the webserver do the task it
was originally intended to do (all this adaptor/cgi-bin stuff was just
a bunch of inefficient-- though very effective-- hacques on the original
specification of the web, anyway)!!
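A minimal sketch of the "store the path, not the BLOB" approach, using Python and an in-memory SQLite database purely for illustration. The table layout, the column names, and the image host are all hypothetical; in a WebObjects app the path would simply be another attribute on the EO.

```python
import sqlite3

# Hypothetical host for the static image server.
IMAGE_BASE_URL = "http://images.example.com"

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, image_path TEXT)"
)
# Only the path lives in the database; the image itself stays on disk.
conn.execute(
    "INSERT INTO product (name, image_path) VALUES (?, ?)",
    ("Widget", "/products/widget.gif"),
)


def image_url(conn, product_id):
    """Build a plain static URL from the stored path, or None if no such row."""
    row = conn.execute(
        "SELECT image_path FROM product WHERE id = ?", (product_id,)
    ).fetchone()
    if row is None:
        return None
    return IMAGE_BASE_URL + row[0]
```

The generated page then carries an ordinary image URL, and the request for the image itself never reaches the application server or the database.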
Your applications will perform better. More importantly, when the day
comes that you have the next runaway highly traffic'd site on the net, you
will have a relatively easy path to dividing the mechanisms that drive
your site across a farm of servers; stick the image servers here, the web
servers there, the app servers on those machines over there, the database
servers on yet other machines, while still minimizing the crosstalk
between them....
b.bum
On Tue, 8 Jun 1999, Cliff Tuel wrote:
> (long rant about database-served images deleted)
>
> Doesn't WOResourceManager in WO4 address most of those concerns? If
> you use WOImage with the key attribute, the images are cached on the
> server by WOResourceManager. It enables client-side caching (the
> URL's are identical when they should be), and since it's an
> application-wide cache, it's shared between sessions.
>
> One drawback to WOResourceManager is potential memory bloat. But you
> can manage that yourself with flushDataCache and
> removeDataForKey:session: (note the session argument is currently
> unimplemented).
>
> On a side-note, an interesting gotcha of WOResourceManager is what to
> use as the image's cache key. If your database images are changing,
> the key should be something like a modified date or a checksum. If
> you use the same key and the image changes, you risk inadvertent
> client-side caching.
>
> --
> Cliff Tuel -- ctuel@apple.com
> Enterprise Technical Support / Apple Computer, Inc.
>
>