When you take a look at http://svn.repoze.org/whatsitdoing/, you'll see profiling measurements for a good number of web frameworks, Pyramid included. "Lies and benchmarks" apply, but the relative "rankings" of the tested systems (as of their last testing) are as follows. A lower "numlines" is arguably better:
    Numlines  System
          22  Pyramid (1.0a9)
          34  Bottle (0.8.1)
          57  Django (1.02)
          96  Flask (0.6)
          96  Pylons (0.9.7rc4)
         250  Zope2 (2.10 via repoze.zope2)
         317  Grok (1.0a1)
         398  TurboGears 2 (2.0.3)
The "numlines" value above is the number of lines of profiler output generated by 1000 hits to a warm "hello world" application in each framework. This number was taken from the "Profile lines" data point in each "results.txt" of each subdirectory of the reporting data. This metric counts the number of lines output by the Python profiler after these 1000 hits. See What's Your Web Framework Doing Under the Hood for more context.
While a lower number doesn't necessarily always mean the system is faster or simpler on every axis, there is often a correlation between the number of profile lines output and the relative complexity and speed of the very core of the system being tested. Many of these tests were done more than a year ago, so benchmarks for the frameworks shown under test may show improvements in more recent versions, and the whatsitdoing tests are deliberately repeatable, so I hope other framework authors contribute more recent scores. But that's not the point of this blog post, because most of these frameworks are empirically simple and fast enough for production use, and their scoring on this particular benchmark is not particularly meaningful when developing a large scale application.
But still, why is Pyramid's "numlines" low? It has been the "best scoring" framework on this particular test for about a year now. It's not a "microframework", so how can it be doing less than the two microframeworks listed in the results table (Flask and Bottle)? It has more features than Pylons, so why is it doing less than Pylons? It arguably has many of the features of TurboGears, so why is it doing less than TG? For that matter, why is Django doing less than Pylons? Why is Flask doing more than Django? The terms "lightweight" and "microframework" don't necessarily correlate with better performance.
The points below describe why Pyramid executes very few function calls for a given request, which makes it fast.
When Pyramid starts up, it considers configuration statements made by a user. Most configuration statements made by users are about views. Pyramid's most important job in life is to figure out which view callable (if any) to execute for a given request. Views are callables meant to respond when a particular URL pattern or other request parameter is matched.
View callables in Pyramid can take many forms. A view might be a simple module-scope function. It might be a method of a class. It could be an instance. Regardless, when Pyramid considers a view callable at startup time, it does a good bit of sniffing at the callable and its associated configuration parameters. It does not sniff at the callable and the callable's configuration parameters during each request. Pyramid sniffs once at startup time and, if necessary, wraps the callable in one or more wrapper functions. For example, at startup, if a renderer has been specified, Pyramid wraps the view callable in a renderer function. If an authorization policy is in place, and the view is protected with a permission, it might wrap the view callable within a function that does authorization checking. If authorization debugging has been enabled, it might wrap the view callable within a function that emits debugging information when a request is encountered. If a view callable has a special kind of argument signature, it may wrap the callable within a mapper that maps how the view callable is called. And so on. These wrappers are executed as a pipeline. Here's the actual bit of Pyramid code that guides this magic:
    def __call__(self, view):
        return self.attr_wrapped_view(
            self.predicated_view(
                self.authdebug_view(
                    self.secured_view(
                        self.owrapped_view(
                            self.decorated_view(
                                self.rendered_view(
                                    self.mapped_view(view))))))))
"view" in the code above is the view callable specified by the user. "self" is an instance of the pyramid.config.ViewDeriver class which holds on to system and view-specific configuration. Each one of the methods called potentially wraps the user-specified view callable in some wrapping layer that performs a function. If a particular subsystem is not required to execute the view, the method simply returns the view unwrapped (for example, if no authorization checking is required because the user has not configured an authorization policy or protected the view with a permission, the call to self.secured_view above will just return the view it was passed).
Using a functional composition like this makes it possible to call the least amount of code required to execute any particular view. For example, instead of making the framework check over and over whether some view callable is executable due to security constraints when a request enters the system, we just make that check the responsibility of a view callable wrapper. The wrapper protects the view callable from inappropriate execution. If there's no authorization policy active, there's no view callable wrapper, and the view callable is executed more directly. If there is an authorization policy, and the view is protected by a permission, the authorization wrapper is executed first, which either executes the view function if authorization was permitted or it raises an exception indicating that permission was denied. In either case, the least amount of code is run for any particular circumstance.
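To make the idea concrete, here is a minimal sketch of startup-time wrapping. It is not Pyramid's code: the function names only echo the methods shown above, and the `permits` callable, the `Forbidden` exception, and the dictionary used as a request are assumptions made up for illustration.

```python
class Forbidden(Exception):
    """Hypothetical exception raised when permission is denied."""


def secured_view(view, permission=None, permits=None):
    # Decided once, at configuration time: with no policy or no permission
    # there is nothing to check, so the view is returned unwrapped.
    if permission is None or permits is None:
        return view

    def _secured(request):
        if not permits(request, permission):
            raise Forbidden(permission)
        return view(request)

    return _secured


def rendered_view(view, renderer=None):
    # Same pattern: only wrap when a renderer was actually configured.
    if renderer is None:
        return view

    def _rendered(request):
        return renderer(view(request))

    return _rendered


def derive_view(view, renderer=None, permission=None, permits=None):
    # Compose the wrappers; each one may or may not add a layer.
    return secured_view(rendered_view(view, renderer), permission, permits)


def hello(request):
    return {"greeting": "hello"}


# A view with no extra configuration comes back completely unwrapped.
plain = derive_view(hello)

# A view with a renderer and a permission gets exactly two wrappers.
protected = derive_view(
    hello,
    renderer=lambda value: "greeting=%s" % value["greeting"],
    permission="view",
    permits=lambda request, perm: perm in request.get("permissions", ()),
)
```

In this sketch `plain` is the very same function object as `hello`, so calling it costs one function call, while `protected` pays only for the two layers it was configured with.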
In general, Pyramid does work at startup time to avoid doing work at request time. Other such optimizations exist in the Pyramid codebase unrelated to view lookup, too. You might think this would make Pyramid startup "slow", but it's not, or not particularly anyway. Even the largest Pyramid application starts much faster than the smallest Zope or Grok application. The 110K-line application named Karl (a repoze.bfg app) starts in under a second. Pyramid is usable on Google App Engine, which has an execution model much like CGI. This optimization work is pretty trivial. It's not even particularly clever.
To determine whether a particular view callable should be invoked for the current request or not, Pyramid uses an optimization: the zope.component subsystem, which is written in C. zope.component provides Pyramid with a "registry" object that is sort of like a "superdictionary". A very common pattern of access employed by Pyramid can be worded in English something like "find me the best match for a view given the current context resource and the current request". zope.component can make this determination very quickly without the overhead of many Python function calls. zope.component was developed essentially to make this particular query as fast as possible.
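The shape of that query can be sketched in plain Python. This is only a toy: `ViewRegistry` and everything in it is hypothetical, it keys on classes rather than interfaces, and it is not how zope.component is implemented. It just illustrates "best match for a (context, request) pair" with a cache so repeated lookups are a single dictionary hit.

```python
class ViewRegistry:
    """Toy 'superdictionary'; a stand-in, not zope.component's API."""

    def __init__(self):
        self._views = {}   # (context class, request class) -> view
        self._cache = {}   # memoized results of earlier lookups

    def register(self, context_type, request_type, view):
        self._views[(context_type, request_type)] = view
        self._cache.clear()

    def lookup(self, context, request):
        key = (type(context), type(request))
        if key in self._cache:
            return self._cache[key]
        view = None
        # Walk from the most specific class to the least specific one,
        # so the closest registration wins.
        for ctx_cls in type(context).__mro__:
            for req_cls in type(request).__mro__:
                view = self._views.get((ctx_cls, req_cls))
                if view is not None:
                    break
            if view is not None:
                break
        self._cache[key] = view
        return view


class Resource: pass
class Document(Resource): pass
class Folder(Resource): pass
class Request: pass


def generic_view(context, request):
    return "generic"


def document_view(context, request):
    return "document"


registry = ViewRegistry()
registry.register(Resource, Request, generic_view)
registry.register(Document, Request, document_view)
```

Here a `Document` context finds `document_view`, a `Folder` falls back to `generic_view`, and an unrelated object finds nothing.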
The Pyramid "whatsitdoing" application replaces the WebOb "Response" object with a simpler variant that does a lot less. Calling this "cheating" is a bit of an overstatement, because Pyramid's design allows for this. It's even documented. It's only documented, and designed as possible, however, because we did systematic optimization very early and decided that tying Pyramid to a particular Response implementation was unnecessary. Not all frameworks have made such a decision. Pyramid never creates a global Response object. This is another design decision which doesn't hurt much in practice, and which makes it possible for specialized applications to get very high throughput. Again, not all frameworks have made such a decision.
The first run of the "whatsitdoing" hello world on repoze.bfg (Pyramid's ancestor) in 2008 produced about 100 lines of profiling output. We've slowly whittled down extraneous dynamism and inessential features to bring it up to its current level of performance.
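As a rough illustration of what a "simpler variant" might look like, here is a sketch of a bare response object. It is not the class used in the whatsitdoing application. It assumes the framework only needs the three WebOb-style attributes `status`, `headerlist` and `app_iter`; the `MinimalResponse` and `send` names are hypothetical.

```python
class MinimalResponse:
    """Hypothetical bare-bones response: three attributes, no behavior."""

    def __init__(self, body, content_type="text/plain"):
        self.status = "200 OK"
        self.headerlist = [
            ("Content-Type", content_type),
            ("Content-Length", str(len(body))),
        ]
        self.app_iter = [body]


def hello_view(request):
    # The view builds its own response; nothing global is consulted.
    return MinimalResponse(b"Hello world!")


def send(response, start_response):
    # What a WSGI-facing layer would roughly do with such an object.
    start_response(response.status, response.headerlist)
    return response.app_iter


# Stand-in for the server's start_response callable, recording its input.
calls = []
body = send(hello_view(None), lambda status, headers: calls.append((status, headers)))
```

Because the object has no properties, header parsing, or cookie handling, creating one costs a single constructor call.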
In any long-running process that executes the same code over and over again, do as much work as possible up front. An analogy can be drawn between the way that web frameworks optimize view lookup and execution and static vs. dynamic typing. The way Pyramid takes advantage of user-provided view configuration statements to exploit startup-time view wrapper optimizations is comparable to a compiler taking advantage of static type declarations. The "compiler" (Pyramid's view wrapping system) does only as much work as necessary, and never does unnecessary checking at request time. Zope, Grok, TG, and other systems do not seem to take as much advantage of the potential provided to them by user-provided configuration statements. Instead, they likely do more work than is strictly necessary.
Specialized C optimizations help, as does trusting well-optimized libraries. In our case, we use zope.component to avoid the function call overhead that a well-factored Python implementation of view-vs-request testing would imply.
Premature optimization may be the root of all evil, but deferring optimizing until "after 1.0" is the root of slow and painted into a corner. Optimizing while your framework is in its design phase is extremely enlightening, and can help you make early design decisions that will keep your system smaller and faster.