
Breaking the limits: Storing data bigger than 1 MB in Google App Engine’s Datastore

Google App Engine is a fantastic platform for hosting webapps, and a great resource for iOS developers who need an online component to their products. It’s hard to believe that the service is essentially free! I’m using it with The Cartographer, but I found myself coming up against a hard limit with the datastore.

You see, the datastore limits entities to 1 MB. I’m trying to store XML data in there, and sometimes that can exceed the 1 MB limit.

XML, being the verbose creature that it is, compresses very nicely, so it occurred to me that if I selectively compressed the larger blocks, I should be able to squeeze in underneath the limit quite easily. Sure enough, a 1.6 MB XML block compressed to about 200 KB.
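As a rough illustration of why this works, here is a minimal sketch that compresses a synthetic, repetitive XML document with zlib. The element names and sizes are made up for the example, and real-world ratios will depend on the data.

```python
import zlib

# Hypothetical XML payload: the same element structure repeated many times,
# which is typical of XML and is what makes it compress well.
rows = ['<point id="%d"><lat>%.4f</lat><lon>%.4f</lon></point>' % (i, i * 0.001, i * 0.002)
        for i in range(20000)]
xml = ('<points>' + ''.join(rows) + '</points>').encode('utf-8')

compressed = zlib.compress(xml)

print('original:   %d bytes' % len(xml))
print('compressed: %d bytes' % len(compressed))

# Decompression restores the original bytes exactly.
restored = zlib.decompress(compressed)
```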

App Engine makes it very easy to define custom properties on data models, so I’ve written a CompressibleTextProperty class that automatically compresses/decompresses properties above a certain size. This means that there’s no performance loss for entities that are small enough to fit easily, but still enables the storage of bigger blocks of content.

The alternative was to break entities up into several different database entities, but that sounded like much more work and much less elegant.

So here’s what I came up with. It’s used the same way the other Property types are used. Download it here: compressible_text_property.py

#!/usr/bin/env python
# encoding: utf-8
"""
compressible_text_property.py
 
A string property that will automatically be stored compressed if larger than a given length threshold
 
Created by Michael Tyson on 2011-01-07.
Copyright (c) 2011 A Tasty Pixel. All rights reserved.
 
BSD LICENSE
"""
 
from google.appengine.ext import db
from google.appengine.api import datastore_types
import zlib
Text = datastore_types.Text
 
LENGTH_THRESHOLD            = 500000 # Characters (compared against len() of the unicode value)
EXPECTED_ZLIB_HEADER        = u"x\x9c"
 
class CompressibleTextProperty(db.TextProperty):
    """A string property that will automatically be stored compressed if larger than a given length threshold
 
    This is designed to be used with textual properties that may exceed App Engine's 1MB entity size limit.
    Note that, if compressed, property will not be searchable.
    """
 
    def validate(self, value):
        """Validate text property; adapted from TextProperty.

        Returns:
          A valid value.

        Raises:
          BadValueError if property is not instance of 'Text'.
        """
        if value is not None and not isinstance(value, Text):
            try:
                value = db.Text(value)
            except TypeError as err:
                # BadValueError lives in the db module; it is not a builtin
                raise db.BadValueError('Property %s must be convertible '
                                       'to a Text instance (%s)' % (self.name, err))
        # super(db.TextProperty, ...) skips TextProperty.validate, whose checks are repeated here
        value = super(db.TextProperty, self).validate(value)
        if value is not None and not isinstance(value, Text):
            raise db.BadValueError('Property %s must be a Text instance' % self.name)
        return value
 
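    def get_value_for_datastore(self, model_instance):
        """For writing to the datastore: Performs compression if length is greater than the threshold"""
        value = super(CompressibleTextProperty, self).get_value_for_datastore(model_instance)
        if value is None:
            # Unset property: nothing to measure or compress
            return None
        if len(value) > LENGTH_THRESHOLD and not value.startswith(EXPECTED_ZLIB_HEADER):
            # Encode explicitly so non-ASCII text does not raise UnicodeEncodeError,
            # then map each compressed byte to one code point so it can be stored as Text
            value = unicode(zlib.compress(value.encode('utf-8')), 'ISO-8859-1')
        return Text(value)

    def make_value_from_datastore(self, value):
        """For reading from the datastore: Decompresses if compressed data detected"""
        if value is None:
            return None
        if value.startswith(EXPECTED_ZLIB_HEADER):
            # Reverse the byte-to-code-point mapping, decompress, and decode back to Text
            value = Text(zlib.decompress(value.encode('ISO-8859-1')).decode('utf-8'))
        return value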
 
    data_type = Text
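
The compress-on-write, detect-on-read logic can also be tried outside App Engine. The following is a minimal standalone sketch, not the property class itself: `pack` and `unpack` are hypothetical helper names, and it assumes the text is encoded as UTF-8 before compression.

```python
import zlib

LENGTH_THRESHOLD = 500000   # same threshold as the property above
ZLIB_HEADER = u"x\x9c"      # first two bytes of a default-level zlib stream


def pack(text):
    """Return the text to store: compressed if long, unchanged otherwise."""
    if len(text) > LENGTH_THRESHOLD and not text.startswith(ZLIB_HEADER):
        compressed = zlib.compress(text.encode('utf-8'))
        # ISO-8859-1 maps every byte value to exactly one code point, so the
        # compressed bytes survive being stored as text.
        return compressed.decode('ISO-8859-1')
    return text


def unpack(stored):
    """Return the original text, decompressing if the zlib header is present."""
    if stored.startswith(ZLIB_HEADER):
        return zlib.decompress(stored.encode('ISO-8859-1')).decode('utf-8')
    return stored


small = u"<a/>"
large = u"<item>caf\xe9</item>" * 50000   # 850,000 characters, above the threshold

stored_small = pack(small)   # stored unchanged
stored_large = pack(large)   # stored compressed
```

Detection relies entirely on the two-character header, so any stored text that happens to begin with those characters would be treated as compressed.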

2 Comments

  1. eric
    Posted January 7, 2012 at 10:11 am

    Hi,

    Thanks for sharing. Good post, but IMHO there is a small problem. When the text is smaller than the threshold, “make_value_from_datastore” always returns None, even though its exact value gets printed in the console. I don’t know why, but converting it to str did the trick, and now my code displays the proper value.

    Changed “return value” to “return str(value)”

    Without the “return str(value)” change, the code below prints “String value is None” even when the property holds a large value. If the value is compressed there is no problem, because zlib.decompress returns a “str” value.

    db class

    class CompressTestModel(db.Model):
        compressPTest = CompressibleTextProperty()

    test code

    ctm = CompressTestModel()
    ctm.compressPTest = "asdfasjfklasjdflkajsklfjaslkdfjlaksjdflkajsdfjads;fjaslkdjfwjlaskjd"
    ctm.put()

    ctm1get = db.get(str(ctm.key()))
    CTSUtil.ctsLogger(" string value is " + str(ctm1get.compressPTest))

  2. Posted March 6, 2013 at 8:12 pm

    I think bz2 does a better job at compressing data. I had a large input (almost 3 MB) and zlib wasn’t good enough for this case. So I changed “import zlib” to “import bz2 as zlib” and it worked great. It is a one-line change that may be valuable for some ;)