Default tuple values

In python you can assign values to a number of variables using tuple unpacking:

foo, bar, baz = some_3_values_tuple

But if the tuple does not have the exact number of values, the assignment fails with a ValueError:

>>> some_tuple = 1,2
>>> foo, bar, baz = some_tuple
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: need more than 2 values to unpack

In that case you can fill up the missing values with None:

>>> foo, bar, baz = some_tuple + (None,) * (3 - len(some_tuple))
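The same one-liner can be wrapped in a tiny helper; this is just a sketch, and `pad_unpack` is a name I made up:

```python
def pad_unpack(values, n, fill=None):
    """Pad a sequence with `fill` so it unpacks into exactly `n` names."""
    values = tuple(values)
    return values + (fill,) * (n - len(values))

foo, bar, baz = pad_unpack((1, 2), 3)  # baz gets the default None
```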

PS: I like one-liners that can stand in for otherwise-needed if-elif-else logic 🙂


Removing Models from Google App Engine at minimum cost

There come times when the data schema changes and you have to remove models by the bunch. AFAIK there is currently no "TRUNCATE TABLE MyKind" or "DROP TABLE MyKind" equivalent in the Google App Engine datastore, so you'll have to write an ordinary request handler to do the bidding, or use MapReduce.

I wrote some utility code to do background processing on the very first versions of GAE, when there was no MapReduce available, and I'm happy with it, but I would surely like to compare my own solution with a MapReduce-based one. Maybe some other time…

Regardless of which type of solution you use, you need to keep indexes in mind and remember how the write cost is calculated: deleting counts as writing. Each time you write something you also update its indexes, and this counts toward the datastore quotas, where you pay for each write (here, each delete): 2 Writes + 2 Writes per indexed property value + 1 Write per composite index value.
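As a back-of-the-envelope check, that billing formula can be computed directly; a small sketch (the function name is mine):

```python
def delete_write_ops(indexed_property_values, composite_index_values):
    # 2 writes + 2 per indexed property value + 1 per composite index value
    return 2 + 2 * indexed_property_values + 1 * composite_index_values

# e.g. an entity with 3 indexed property values and 2 composite index entries
ops = delete_write_ops(3, 2)  # 2 + 6 + 2 = 10 write ops per delete
```

So an entity with a handful of indexed properties easily costs several times the base 2 writes to delete, which is why trimming indexes first pays off.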

So it's a good thing to remove all composite indexes on the model before you start deleting in bulk; index removal is not immediate, so remember to check the indexes section in the App Engine console. Another thing you might do is disable indexing on all the model's properties. I'm not sure though whether that impacts the index update on delete; it probably will not (need to check that some day), but there is no harm in disabling those indexes either.
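Disabling indexing on a property is just a flag on the ndb model definition; a sketch of what that could look like (the AuditLog fields here are assumptions, not the real model):

```python
from google.appengine.ext import ndb

class AuditLog(ndb.Model):
    message = ndb.TextProperty()                        # TextProperty is never indexed
    user_id = ndb.StringProperty(indexed=False)         # skips per-property index entries
    created = ndb.DateTimeProperty(auto_now_add=True)   # still indexed by default
```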

I use a simple push task queue handler:

# assumed imports for this Django-on-GAE handler
from django.core.urlresolvers import reverse
from django.http import HttpResponse
from google.appengine.ext import ndb

def purge_audit_logs(request):
    url = reverse('purge_audit_logs')
    q = AuditLog.query()
    seq, cursor, more = fetch_page(request, q, page_size=50, keys_only=True)
    ndb.delete_multi(seq)  # delete this page of keys
    if more:
        schedule_next(request, url, cursor=cursor, queue_name="cleanup")
    return HttpResponse("OK")

This will do the purging serially, minimizing the cost and allowing you to do things slowly, but if you want a fast solution, use fan-out and a faster queue by scheduling the next task before deleting:

def purge_audit_logs(request):
    url = reverse('purge_audit_logs')
    q = AuditLog.query()
    seq, cursor, more = fetch_page(request, q, page_size=50, keys_only=True)
    if more:
        # schedule the next page first, so deletes run in parallel
        schedule_next(request, url, cursor=cursor, queue_name="cleanup")
    ndb.delete_multi(seq)
    return HttpResponse("OK")

And here are my utility functions for task queues handlers:

from google.appengine.datastore.datastore_query import Cursor

def fetch_page(request, query, page_size=30, **query_options):
    # resume from the cursor POSTed by the previous task, if any
    cursor = request.POST.get("cursor", None)
    if cursor:
        cursor = Cursor(urlsafe=cursor)
    col, cursor, more = query.fetch_page(page_size, start_cursor=cursor, **query_options)
    if cursor:
        cursor = cursor.urlsafe()  # make it serializable for the next task
    return col, cursor, more

from google.appengine.api.taskqueue import Queue, Task

def schedule_next(request, url, queue_name='deferred', cursor=None):
    # enqueue a follow-up task that POSTs the cursor back to the same handler
    task = Task(countdown=0, url=url, params={'cursor': cursor})
    Queue(queue_name).add(task)

These are simplified; in reality they do much more, like task naming to disallow scheduling a duplicate task for the same data. A task can occasionally run twice, so you'll need to make your handlers idempotent.
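One cheap way to get that task naming is to derive a deterministic name from the handler URL and the cursor; a sketch (the naming scheme is mine, not from my real code):

```python
import hashlib

def dedupe_task_name(url, cursor):
    # same url + cursor -> same name; task names may only contain
    # letters, digits, hyphens and underscores, so a hex digest is safe
    key = "%s:%s" % (url, cursor or "start")
    return "purge-" + hashlib.md5(key.encode("utf-8")).hexdigest()
```

You would then pass it as `Task(name=dedupe_task_name(url, cursor), ...)` and catch `taskqueue.TaskAlreadyExistsError` when adding, so re-scheduling the same page is rejected instead of duplicated.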