by Ben Forta
Technical Evangelist
Allaire Corp.
This article originally appeared in the August issue of
ColdFusion Developer's Journal, published by SYS-CON Media.
We all know that locking is important. Most of us even
understand why locks are needed. But exactly where to use
a lock, which lock type to use and what code to put within
the lock remains confusing at best.
Part of the confusion stems from changes Allaire made
in ColdFusion 4.5 that in turn changed the recommendations
and suggested practices. Indeed, even my own recommendations
changed with that release (as many of you CFUG members are
quick to point out). And so, at the request of several of
you, and because I've helped contribute to the confusion,
I'll cover these topics in this column and try to set the
record straight.
Variables
Locks are used primarily with variables, so let's start
there. Variables are kind of virtual containers in memory,
containers that are used to store data. Look at the following
code:
<CFSET first_name="Ben">
The <CFSET> tag creates (or overwrites) a variable,
in this case a variable named first_name. first_name can
be thought of as a container located somewhere in the memory
of the computer, a container that now has the name "Ben"
in it. To access the data in the container you simply refer
to the container by name, like this:
<CFOUTPUT>#first_name#</CFOUTPUT>
In this example I used a simple variable. I could just
as easily have placed an array or list in that container, or
even data as complicated as an array of structures containing
arrays of structures, and so on.
Regardless of the type of data, one thing is consistent:
you refer to the container (the variable) by a unique name,
and that name provides access to the contents of the container
at the moment it's requested.
Understanding Threads
Before I go further, one other topic must be mentioned
briefly – threads. ColdFusion is a high-performance
application server; it's designed to process many requests
at once. It does this by running lots of concurrent tasks
within the application server, and each one handles a single
request at any given time. These tasks are known as "threads,"
and ColdFusion is a multithreaded application – in
other words, ColdFusion is designed to perform multiple
tasks concurrently. (There's actually much more to threads
than that, but this explanation is adequate for the issue
at hand.)
Simultaneous Variable Access
ColdFusion supports several different data scopes. "Scope"
defines the life span (persistence) and visibility of data.
Let's take a quick look at five scopes:
- VARIABLES: Used for data that needs no special
persistence. Data in the VARIABLES scope persists for
the duration of the processing of a request and is automatically
destroyed once the request has completed. The data is
visible only within the thread processing that request.
- SESSION: Used for session variables, for data
that relates to many requests that together make up a
session. Data in the SESSION scope persists until the
session times out. The data is visible to any thread that
processes requests for that session, and it's entirely
possible that multiple threads will process requests for
the same SESSION (even though a SESSION is mapped to a
single client).
- CLIENT: Also used for session variables, but
CLIENT is a little different from SESSION. Unlike SESSION
variables, CLIENT variables aren't stored in memory. Rather,
they're stored in a database (a database of your choice
or the Windows registry, which is also a database of sorts).
Data in the CLIENT scope persists until the client session
times out. The data is visible to any thread that processes
requests for that client session, and it's entirely possible
that multiple threads will process requests for a single
CLIENT session (even though a CLIENT session is mapped
to a single client).
- APPLICATION: Used for variables that are shared
across complete applications. Data in the APPLICATION
scope is visible to all threads processing requests for
that application.
- SERVER: Used for variables that are shared across
all applications running on the ColdFusion server. Data
in the SERVER scope is visible to every thread on the
server.
The variable first_name, created earlier, was created
in the VARIABLES scope. As explained above, this scope is
processed by a single thread, and as soon as that thread
has completed processing the request, the variable is destroyed.
As such, there is absolutely no way more than one request
could access the data in the VARIABLES.first_name container
at the same time.
But other scopes behave differently. The following code
creates a variable in the SESSION scope:
<CFSET SESSION.first_name="Ben">
As explained above, SESSION variables can indeed be processed
by multiple threads at once. If you use frames, if the user
hits the refresh button, if the underlying network makes
retries - there are lots of conditions that could cause
the same SESSION to be processed by more than one thread
at any given time.
This is where things get dangerous. Let's go back to our
container analogy. If you were to put data into a container
at the exact moment someone else was doing so, what would
happen? Both writes couldn't occur at the same time, so
something would get lost – or worse, the container
itself could become corrupted. If the <CFSET> statement
above were executed at the exact same time as another <CFSET>
statement that was updating the same variable, you'd likely
corrupt server memory. If you're lucky, you'll just throw
an error, but you could also negatively impact server operation
as a whole.
And it's not just SESSION variables that are affected.
APPLICATION and SERVER scope variables are even more susceptible
to this corruption as they're always shared. (CLIENT variables,
however, aren't susceptible as they're stored in a database;
the database handles concurrency issues itself.)
Using Locks
How do you get around this problem? The answer is to use a
lock. A lock does just that – it locks a block of code
(a block containing a <CFSET> statement, for example).
Going back to our container analogy, a lock acts as a guard
monitoring access to the container's contents. The guard's
job is to line up all access requests in the order they're
received, granting admission one request at a time, and only
after the previous access request has completed.
In other words, locks can arbitrate code execution across
multiple threads, pausing execution as needed. And yes,
this could slow down your application, but considering the
alternative it's a small price to pay. Accessing a variable
(writing or reading) while it's being written by another
thread is asking for trouble.
The next code snippet sets the same SESSION variable once
again, but this time locking it for the duration of the
update:
<CFLOCK SCOPE="SESSION" TYPE="EXCLUSIVE" TIMEOUT="10">
<CFSET SESSION.first_name="Ben">
</CFLOCK>
Locking is implemented using the <CFLOCK> tag, and
any code between the <CFLOCK> and </CFLOCK>
tags will be locked. The SCOPE attribute specifies the scope
to be locked. By specifying SESSION as the scope, we're instructing
ColdFusion to lock only the code execution for a particular
SESSION. We wouldn't want to lock all sessions as that would
cause other operations to pause unnecessarily (they wouldn't
be updating this SESSION anyway). The TYPE attribute specifies
the lock type. EXCLUSIVE means that no other operations
on the specified SCOPE will be allowed while the lock is
being processed. The TIMEOUT specifies the maximum time (in
seconds) that ColdFusion should wait when trying to acquire a lock.
If that timeout is reached before the lock can be acquired
(perhaps because other threads have the same scope locked),
the entire <CFLOCK> code block is skipped (and, by default,
an exception is thrown).
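A timeout doesn't have to end the request with an error page. The following is a minimal sketch, assuming the default behavior in which a lock timeout throws an exception, of catching that exception with <CFTRY> and <CFCATCH>. The message shown to the user is purely illustrative.

```cfml
<CFTRY>
  <CFLOCK SCOPE="SESSION" TYPE="EXCLUSIVE" TIMEOUT="10">
    <CFSET SESSION.first_name="Ben">
  </CFLOCK>
  <CFCATCH TYPE="Lock">
    <!--- Reached if the lock failed, for example because it
          couldn't be acquired within 10 seconds.
          The wording below is a hypothetical fallback. --->
    <CFOUTPUT>The server is busy. Please try again in a moment.</CFOUTPUT>
  </CFCATCH>
</CFTRY>
```

What you do inside the <CFCATCH> block (retry, log the failure, show a message) depends on your application.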
To lock the APPLICATION scope you'd simply specify SCOPE="APPLICATION".
Doing so would lock the APPLICATION scope so any other attempt
to access APPLICATION data would be paused. The same is
true for SERVER.
It's important to note that <CFLOCK> will do its
job only if all appropriate code is enclosed within <CFLOCK>
tags. If somewhere in the code you had a <CFSET> statement
that didn't use a <CFLOCK>, it could access the variable
even though it was locked. For locking to work, all accesses
must be managed by <CFLOCK> statements.
SCOPE vs. NAME
In the example above I used SCOPE="SESSION" to lock my <CFSET>
statement. Three scopes are supported: SESSION, APPLICATION
and SERVER. Specifying a SCOPE of SESSION locks any other
accesses for the same SESSION only. Specifying a SCOPE of
APPLICATION locks any other accesses for the same APPLICATION
(as named in the <CFAPPLICATION> tag, usually in APPLICATION.CFM).
Specifying a SCOPE of SERVER locks any other accesses to the
SERVER scope server-wide.
ColdFusion also supports locking by NAME. Using this method,
you provide a name to identify the activity performed in
the locked code, and only blocks locked with the same NAME
will wait on one another. Exactly what operations are locked within the
lock is entirely up to you. All <CFLOCK> does is ensure
that no two blocks of code with the same NAME are executed
at once. Using NAME gives you a greater level of control
over lock granularity, but with that control comes additional
risk. If you mistakenly use different names for two locks
that access the same data, you won't be locking at all.
The ability to lock code by scope was introduced in ColdFusion
4.5, and it's the preferred way to lock code that accesses
potentially shared variables.
Read-Only Locks
Locking is really an issue only when variables are being written
to. Going back to our container analogy, if multiple users
looked into the container at the same time to see what was
in it, no harm would be done. The same is true of read access
to shared variables.
Some languages support the use of constants, special variables
that are actually not variable at all as they can't be changed.
ColdFusion has no concept of constants, so CF developers
typically create variables in the APPLICATION scope (usually
in the APPLICATION.CFM file surrounded by a check for whether
they already exist) and are careful never to overwrite them. If an application
contained variables like this, variables that were never
updated (after initial creation), you wouldn't really need
to lock them at all. But you'd have to be 100% sure that
an update wouldn't occur, realizing that there's nothing
you can do programmatically to prevent that.
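As an illustration of the create-once approach described above, here is a rough sketch of what such an APPLICATION.CFM might contain. The application name, the variable name, and its value are all hypothetical, and using IsDefined() for the existence check is an assumption on my part.

```cfml
<!--- APPLICATION.CFM (sketch; all names below are hypothetical) --->
<CFAPPLICATION NAME="my_app">

<CFLOCK SCOPE="APPLICATION" TYPE="EXCLUSIVE" TIMEOUT="10">
  <!--- Create the pseudo-constant only if it doesn't exist yet --->
  <CFIF NOT IsDefined("APPLICATION.company_name")>
    <CFSET APPLICATION.company_name="Example Corp.">
  </CFIF>
</CFLOCK>
```

Because APPLICATION.CFM runs on every request, this sketch takes an exclusive lock each time, which has a cost of its own. It's meant to show the idea, not the fastest way to do it.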
What to do? Locking all read accesses (every time you
refer to #SESSION.first_name#, for example) with exclusive
locks imposes a significant performance loss, and the risk
may not justify that cost. So you could opt not to lock variable
reads.
But there's always the chance that someone will edit the
code, and the variable that was never supposed to be updated
... well, what if some new code now updated it?
To address this problem, ColdFusion supports an additional
lock type, READONLY. A READONLY lock doesn't actually lock
anything unless an EXCLUSIVE lock is being processed at
the same time. Only then will the READONLY lock pause until
the EXCLUSIVE lock has completed. In other words, READONLY
locks have no real performance hit associated with them.
They are essentially ignored until an EXCLUSIVE lock is
in effect.
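A read of the SESSION variable used earlier might be protected along these lines. This is a minimal sketch that assumes SESSION.first_name has already been set elsewhere. The value is copied into a local variable inside the lock, so the output itself happens after the lock has been released.

```cfml
<!--- Read the shared value under a READONLY lock --->
<CFLOCK SCOPE="SESSION" TYPE="READONLY" TIMEOUT="10">
  <CFSET first_name=SESSION.first_name>
</CFLOCK>

<!--- Use the local copy; no lock is needed here --->
<CFOUTPUT>Welcome back, #first_name#</CFOUTPUT>
```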
Other Operations Needing Locks
Variables aren't the only things that need locking. Any code
with potential concurrency issues should be locked. Examples
of this include:
- Accessing files with <CFFILE> if other processes
could be accessing the same data file
- Calling code that isn't multithread safe (some CFX tags,
for example)
- Connecting to remote sites via <CFHTTP> if those
sites don't allow concurrent connections
In all of these examples, locking the code block can avoid
concurrency problems. But instead of locking by SCOPE, these
operations should be locked by NAME.
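For instance, appending to a shared log file could be serialized with a NAME lock roughly as follows. The lock name and the file path are hypothetical. The only thing that matters is that every block of code touching that file uses exactly the same NAME.

```cfml
<!--- Only one request at a time may run a block locked with this NAME.
      "visitor_log_file" and the path are illustrative values. --->
<CFLOCK NAME="visitor_log_file" TYPE="EXCLUSIVE" TIMEOUT="30">
  <CFFILE ACTION="APPEND"
          FILE="c:\logs\visitors.log"
          OUTPUT="#Now()# #CGI.REMOTE_ADDR#">
</CFLOCK>
```

Note that this guards only against other ColdFusion code using the same lock name. It can't stop an unrelated process from touching the file.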
Locking Tips
Locking is important and must be used. But locking slows your
application, as already mentioned. Locks must be used carefully,
and they must never be overused. Here are some pointers to
keep in mind:
- Don't lock code unnecessarily, but don't create and
drop locks too frequently. It's a fine line to walk, but
if you find yourself needing to lock two variables with
some other lengthy processing in between them (that doesn't
need locking), you might be better off using two locks
so you don't keep locks active when they're not needed.
- If you find yourself having to perform complex operations
on locked variables (for example, complex string processing,
looping or WDDX decoding), consider making local (VARIABLES)
copies of the data and performing the processing on the
local copy, then using a lock only when saving the local
copy back to the shared variable.
- APPLICATION locks should be used sparingly as they typically
apply to lots of code. If you need to lock only part of
the APPLICATION scope, consider using the NAME attribute
instead of SCOPE. This will give you more granular control
over exactly what gets locked and when, which in turn
can prevent unnecessary locking (or unnecessarily long
locking times). Of course, as explained earlier, using
NAME comes with a risk. You must be sure that different
names aren't used for code that accesses the same data.
(The same thing applies to SERVER variables.)
- ColdFusion 4.5 supports automatic locking modes in which
ColdFusion locks variables for you. There's a bit of overhead
in using auto-locking, plus you'll lose the potential
performance gains that could be attained by more granular
locking. As a rule, if performance is an issue (and when
isn't it?), don't use these options. You'll be able to
squeeze a bit more performance out of your application
by doing it yourself.
- ColdFusion 4.5 also supports a mode called Full checking.
In this mode no locking occurs automatically. Instead,
ColdFusion throws an error if a lock isn't used, helping
you eliminate potentially missed locks. But this option
can't be used with NAME locks - it'll always throw errors
on those.
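The local-copy tip above can be sketched as follows. This assumes SESSION.recent_items (a hypothetical variable) already exists and holds a simple comma-delimited list. Simple values are copied by a plain <CFSET>, but structures are assigned by reference, so this exact approach wouldn't give you an independent copy of a structure.

```cfml
<!--- 1. Copy the shared value under a short READONLY lock --->
<CFLOCK SCOPE="SESSION" TYPE="READONLY" TIMEOUT="10">
  <CFSET local_items=SESSION.recent_items>
</CFLOCK>

<!--- 2. Do the slower processing on the local copy, no lock held --->
<CFSET local_items=ListAppend(local_items, "widget")>
<CFSET local_items=ListSort(local_items, "textnocase")>

<!--- 3. Save the result back under a short EXCLUSIVE lock --->
<CFLOCK SCOPE="SESSION" TYPE="EXCLUSIVE" TIMEOUT="10">
  <CFSET SESSION.recent_items=local_items>
</CFLOCK>
```

The trade-off is that another request could change SESSION.recent_items between steps 1 and 3, and that change would be overwritten. If that matters for your data, keep the whole read-modify-write sequence inside a single EXCLUSIVE lock instead.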
Conclusion
The common theme is that locks should be used, but they must
be used carefully. And careful use requires a good understanding
of what locks are, what they do and how your application should
use them.
Locking is an important ColdFusion feature, and one that
serious developers must use in their applications. Without
locking there's a very real risk that data corruption will
occur, and this can impact server stability.
Incorrect lock use, of course, can bring your application
to its knees. That fine line must be walked. Yes, there
are performance penalties involved, but every decision involves
some kind of trade-off.
What you do is your choice. My advice? Lock it or lose
it.
About the Author
Ben Forta is Allaire Corporation's product evangelist for
the ColdFusion product line. Ben has over 15 years of experience
in the computer industry and spent 6 years as part of the
development team responsible for creating ONTime, one of the
most successful calendar and group scheduling products, with
over one million users worldwide.
Ben is the author of the popular The ColdFusion 4.0 Web
Application Construction Kit (now in its third edition)
and the more recent Advanced ColdFusion 4.0 Application
Development (both published by Que). He co-authored the
official Allaire ColdFusion training course, writes a regular
column on ColdFusion development, and now spends a considerable
amount of time lecturing and speaking on ColdFusion and
Internet application development worldwide.
Born in London, England, and educated in London, New York,
and Los Angeles, Ben now lives in Oak Park, Michigan, with
his wife Marcy, and their four children. Ben welcomes your
email at ben@forta.com and invites you to visit his own
ColdFusion Web site.