Debugging Jinja2 templates in Google App Engine

Jinja2 needs some Python modules that are forbidden in the Python sandbox provided by Google App Engine. Fortunately it’s fairly easy to whitelist them in the SDK’s local development server; there is currently no workaround for production.

import os

PRODUCTION = not os.environ.get('SERVER_SOFTWARE', '').startswith('Dev')
if not PRODUCTION:
    # enable jinja2 debugging info in GAE SDK
    from import sandbox
    sandbox._WHITE_LIST_C_MODULES += ['_ctypes', 'gestalt']

The original workaround does not work on the new development server, also known as devappserver2.


Default tuple values

In Python you can assign values to a number of variables using tuple unpacking:

foo, bar, baz = some_3_values_tuple

But if the tuple does not have exactly that number of values, the assignment fails with a ValueError:

>>> some_tuple = 1, 2
>>> foo, bar, baz = some_tuple
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: need more than 2 values to unpack

In that case, pad the missing values with None:

>>> foo, bar, baz = some_tuple + (None,) * (3 - len(some_tuple))
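If the same padding is needed in several places, the one-liner can be wrapped in a small helper. This is only a sketch: the name `pad_tuple` and its `fill` parameter are made up for illustration and are not part of the standard library.

```python
def pad_tuple(values, size, fill=None):
    """Return `values` as a tuple, padded on the right with `fill` up to `size` items."""
    values = tuple(values)
    # A negative multiplier yields an empty tuple, so longer inputs pass through unchanged.
    return values + (fill,) * (size - len(values))


foo, bar, baz = pad_tuple((1, 2), 3)
# foo == 1, bar == 2, baz is None
```

Note that a tuple with more than three values still makes the unpacking fail, because the helper only pads and never truncates.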

PS: I like one-liners that can replace otherwise needed if-elif-else logic 🙂

Embedded webapp mini profiler

If you’re an ASP.NET MVC developer you should already know about the MVC Mini Profiler released by Stack Overflow; if you don’t, go get it now.

Google App Engine developers have Appstats, a tool that does similar things but requires you to explicitly look for the profiling info.

Now GAE has a mini profiler too (this is old news, apparently 😉): an Appstats wrapper mini app that you can embed in your web app. Check out Google App Engine Mini Profiler and have stats in your face, all the time.

Create test directory tree with PowerShell

I needed to create a directory tree structure to test something, and I thought it might be the right time to try out PowerShell.

An hour ago I didn’t know PowerShell, and I still don’t. I’m pretty sure this is a noob script… Nonetheless it does what it’s meant to and is better than “hello world” for a first script 😉

function New-TestFS ([string]$Path = "c:\fstest", [int]$Depth = 1) {
	# stop recursing once the tree is deep enough
	if ($Depth -gt 10) {
		return
	}
	Write-Host "#### NEW DIR: $Path $Depth"
	New-Item -ItemType directory -path $Path -ErrorAction SilentlyContinue
	# create the subdirectories Dir900..Dir999, each with its own subtree
	900..999 | % {
		New-TestFS -Path ("$Path\Dir$_") ($Depth + 1)
	}
	Write-Host "#### NEW FILES: $Path $Depth"
	# fill the directory with a random number of empty files
	for ( $i=1; $i -le (Get-Random -Minimum 100 -Maximum 199); $i++ ) {
		Write-Host "FILE: $Path\File$i.txt"
		fsutil file createnew ("$Path\File$i.txt") 0
	}
}

Kudos to Jeff Wouters for his PowerShell example 🙂

Removing Models from Google App Engine at minimum cost

There come times when the data schema changes and you have to remove some models in bulk. AFAIK there is currently no “TRUNCATE TABLE MyKind” or “DROP TABLE MyKind” equivalent in the Google App Engine datastore, so you have to write an ordinary request handler to do the job or use MapReduce.

I wrote some utility code for background processing back on the very first versions of GAE, when no MapReduce was available, and I’m happy with it, but I would surely like to compare my own solution with a MapReduce-based one. Maybe some other time…

Regardless of which solution you use, you need to remember about indexes and how the write cost is calculated, because deleting counts as writing. Each write also updates the indexes, and this counts toward the datastore quotas, so you pay for each write (I mean delete): 2 Writes + 2 Writes per indexed property value + 1 Write per composite index value
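To get a feel for the numbers, the formula can be turned into a few lines of Python. This is only a back-of-the-envelope sketch: the function name `delete_cost` and the entity shape (5 indexed property values, 2 composite index values, 100,000 entities) are made up for illustration.

```python
def delete_cost(indexed_values, composite_values):
    """Write operations charged for deleting one entity, following the formula above."""
    return 2 + 2 * indexed_values + 1 * composite_values


# Hypothetical entity: 5 indexed property values and 2 composite index values.
per_entity = delete_cost(5, 2)                    # 2 + 10 + 2 = 14 writes
total = per_entity * 100000                       # 1,400,000 writes for 100,000 entities
without_composites = delete_cost(5, 0) * 100000   # 1,200,000 writes
```

In this made-up case, dropping the composite indexes before the purge saves 200,000 write operations.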

So it’s a good idea to remove all composite indexes on the model before you start deleting entities in bulk. Index removal is not immediate, so remember to check the indexes section in the App Engine console. Another thing you might do is disable indexing on all the model properties. I’m not sure, though, whether that affects the index update on delete; it probably does not (I need to check that some day), but there is no harm in disabling those indexes as well.

I use a simple push task queue handler:

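from google.appengine.ext import ndb

def purge_audit_logs(request):
    url = reverse('purge_audit_logs')
    q = AuditLog.query()
    seq, cursor, more = fetch_page(request, q, page_size=50, keys_only=True)
    # delete this page of keys before queueing the next one, so pages run one at a time
    ndb.delete_multi(seq)
    if more:
        schedule_next(request, url, cursor=cursor, queue_name="cleanup")
    return HttpResponse("OK")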

This does the purging serially, minimizing the cost and letting you do things slowly, but if you want a fast solution, use fan-out and a faster queue:

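from google.appengine.ext import ndb

def purge_audit_logs(request):
    url = reverse('purge_audit_logs')
    q = AuditLog.query()
    seq, cursor, more = fetch_page(request, q, page_size=50, keys_only=True)
    # fan-out: queue the next page first so it can run while this page is being deleted
    if more:
        # point queue_name at a queue configured with a higher rate for a faster purge
        schedule_next(request, url, cursor=cursor, queue_name="cleanup")
    ndb.delete_multi(seq)
    return HttpResponse("OK")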

And here are my utility functions for task queue handlers:

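from google.appengine.api.taskqueue import Task
from google.appengine.datastore.datastore_query import Cursor

def fetch_page(request, query, page_size=30, **query_options):
    # the cursor of the previous page arrives as a POST parameter of the task
    cursor = request.POST.get("cursor", None)
    if cursor:
        cursor = Cursor(urlsafe=cursor)
    col, cursor, more = query.fetch_page(page_size, start_cursor=cursor, **query_options)
    if cursor:
        # serialize the cursor so it can be passed on as a task parameter
        cursor = cursor.urlsafe()
    return col, cursor, more

def schedule_next(request, url, queue_name='deferred', cursor=None):
    # request is accepted to match the handler calls; this simplified version ignores it
    task = Task(countdown=0, url=url, params={'cursor': cursor})
    task.add(queue_name=queue_name)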

These are simplified; in reality they do much more, like naming tasks to prevent duplicate task scheduling for the same data. Tasks can occasionally run twice, so you’ll need to make them idempotent.
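One possible way to do the task naming, sketched here under assumptions rather than taken from my real code, is to derive a deterministic name from the handler URL and the page cursor and pass it as the `name` argument of `Task`. The helper name `task_name_for` and the `purge-` prefix are hypothetical; the only thing the sketch relies on is that task names may contain just letters, digits, underscores and hyphens.

```python
import hashlib
import re

# Task Queue names are limited to letters, digits, underscores and hyphens.
TASK_NAME_RE = re.compile(r"^[a-zA-Z0-9_-]{1,500}$")


def task_name_for(url, cursor):
    """Build a deterministic task name from the handler URL and the page cursor."""
    key = "%s|%s" % (url, cursor or "")
    # Hashing keeps the name short and free of characters a task name cannot contain.
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    return "purge-" + digest
```

Because the same URL and cursor always give the same name, scheduling the same page a second time should be rejected by the queue instead of creating a duplicate task. The handler still has to stay idempotent, since a task that was accepted once can be executed more than once.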