Google App Engine is a fantastic platform for hosting webapps, and a great resource for iOS developers who need an online component to their products. It’s hard to believe that the service is essentially free! I’m using it with The Cartographer, but I found myself coming up against a hard limit with the datastore.
You see, the datastore limits entities to 1 MB. I’m trying to store XML data in there, and sometimes that exceeds the limit.
XML, being the verbose creature that it is, compresses very nicely, so it occurred to me that if I selectively compress the larger blocks, I should be able to squeeze in underneath the limit quite easily. Sure enough, a 1.6 MB XML block compressed into about 200 KB.
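As a rough illustration of why this works, here is a minimal sketch that compresses a synthetic, repetitive XML document with zlib. The element names and values are made up for the example (they are not The Cartographer's real format), and the ratio you get will depend on your own data.

```python
import zlib

# Hypothetical XML payload: repeated elements stand in for real data.
rows = ['<point id="%d"><lat>%.4f</lat><lng>%.4f</lng></point>'
        % (i, i * 0.001, i * 0.002) for i in range(20000)]
xml = ('<points>' + ''.join(rows) + '</points>').encode('utf-8')

compressed = zlib.compress(xml)

print('original:   %d bytes' % len(xml))
print('compressed: %d bytes' % len(compressed))

# Compression is lossless: decompressing gives back the original bytes.
restored = zlib.decompress(compressed)
```

The repeated tag names are what make XML such a good candidate: the compressor only has to store each of them once.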
App Engine makes it very easy to define custom properties on data models, so I’ve written a CompressibleTextProperty class that automatically compresses and decompresses property values above a certain size. This means there’s no performance loss for entities that are small enough to fit easily, while bigger blocks of content can still be stored.
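Stripped of the App Engine plumbing, the idea can be sketched in plain Python. This is only an illustration under a few assumptions: `pack` and `unpack` are hypothetical helper names, the sketch works on UTF-8 bytes rather than the datastore's Text type, and it relies on default-level zlib output starting with the bytes 0x78 0x9c.

```python
import zlib

LENGTH_THRESHOLD = 500000      # compress anything larger than this many bytes
ZLIB_HEADER = b'\x78\x9c'      # first two bytes of a default-level zlib stream


def pack(text):
    """Return the bytes to store: compressed only if over the threshold."""
    raw = text.encode('utf-8')
    if len(raw) > LENGTH_THRESHOLD:
        return zlib.compress(raw)
    return raw


def unpack(stored):
    """Reverse pack(): decompress only if the zlib header is present."""
    if stored.startswith(ZLIB_HEADER):
        stored = zlib.decompress(stored)
    return stored.decode('utf-8')


small = '<a/>'
large = '<a>' + 'x' * 600000 + '</a>'

small_stored = pack(small)     # below the threshold, stored untouched
large_stored = pack(large)     # above the threshold, stored compressed
```

Small values never pay the cost of compression, and the reader does not need a separate flag to know what it is looking at, because the header tells it.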
The alternative was to break entities up into several different database entities, but this sounded like much more work, and sounded much less elegant.
So here’s what I came up with. It’s used the same way the other Property types are used. Download it here: compressible_text_property.py
#!/usr/bin/env python
# encoding: utf-8
"""
compressible_text_property.py

A string property that will automatically be stored compressed if larger than a given length threshold

Created by Michael Tyson on 2011-01-07.
Copyright (c) 2011 A Tasty Pixel. All rights reserved.

BSD LICENSE
"""

from google.appengine.ext import db
from google.appengine.api import datastore_types
import zlib

Text = datastore_types.Text

LENGTH_THRESHOLD = 500000 # Bytes
# First two bytes of a default-level zlib stream (0x78 0x9c)
EXPECTED_ZLIB_HEADER = u"x\x9c"

class CompressibleTextProperty(db.TextProperty):
    """A string property that will automatically be stored compressed if larger than a given length threshold

    This is designed to be used with textual properties that may exceed App Engine's 1MB entity size limit.

    Note that, if compressed, property will not be searchable.
    """

    def validate(self, value):
        """Validate text property; Nicked verbatim from TextProperty.

        Returns:
          A valid value.

        Raises:
          BadValueError if property is not instance of 'Text'.
        """
        if value is not None and not isinstance(value, Text):
            try:
                value = db.Text(value)
            except TypeError, err:
                raise db.BadValueError('Property %s must be convertible '
                                       'to a Text instance (%s)' % (self.name, err))
        value = super(db.TextProperty, self).validate(value)
        if value is not None and not isinstance(value, Text):
            raise db.BadValueError('Property %s must be a Text instance' % self.name)
        return value

    def get_value_for_datastore(self, model_instance):
        """For writing to the datastore: Performs compression if length is greater than the threshold"""
        value = super(CompressibleTextProperty, self).get_value_for_datastore(model_instance)
        # Nothing to compress (or wrap) if the property is unset
        if value is None:
            return None
        if len(value) > LENGTH_THRESHOLD and not value.startswith(EXPECTED_ZLIB_HEADER):
            # ISO-8859-1 maps each byte to one code point, so the compressed bytes survive as text
            value = unicode(zlib.compress(value), 'ISO-8859-1')
        return Text(value)

    def make_value_from_datastore(self, value):
        """For reading from the datastore: Decompresses if compressed data detected"""
        if value is None:
            return None
        if value.startswith(EXPECTED_ZLIB_HEADER):
            value = zlib.decompress(value.encode('ISO-8859-1'))
        return value

    data_type = Text
Hi,
Thanks for sharing. Good post, but IMHO there is a small problem. When the text size is smaller than the threshold, “make_value_from_datastore” always returns None, even though the exact value gets printed to the console. I don’t know why, but converting it to str did the trick, and now my code displays the proper value.
I changed “return value” to “return str(value)”.
Without the “return str(value)” change, the code below prints “string value is None” even though the property has a value. If the value is zlib-compressed there is no problem, because zlib.decompress returns a “str” value.
db class:
class CompressTestModel(db.Model):
    compressPTest = CompressibleTextProperty()
test code:
ctm = CompressTestModel()
ctm.compressPTest = "asdfasjfklasjdflkajsklfjaslkdfjlaksjdflkajsdfjads;fjaslkdjfwjlaskjd"
ctm.put()
ctm1get = db.get(str(ctm.key()))
CTSUtil.ctsLogger(" string value is " + str(ctm1get.compressPTest))
I think bz2 does a better job at compressing data. I had a large input (almost 3 MB) and zlib wasn’t good enough for this case, so I changed “import zlib” to “import bz2 as zlib” and it worked great. It is a one-line change that may be valuable for some ;)
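One thing worth checking when swapping compressors: the property decides whether to decompress by looking for the zlib header at the start of the stored value, and bz2 output starts with different bytes. A small sketch comparing the two, using only the standard library (the sample data is made up, and nothing here touches App Engine):

```python
import bz2
import zlib

data = b'<xml>' + b'<item>hello</item>' * 1000 + b'</xml>'

zlib_out = zlib.compress(data)
bz2_out = bz2.compress(data)

# Default-level zlib streams begin with the bytes 0x78 0x9c.
zlib_header = zlib_out[:2]
# bz2 streams begin with the magic "BZh", followed by a block-size digit.
bz2_header = bz2_out[:3]

print(repr(zlib_header))
print(repr(bz2_header))
```

So with “import bz2 as zlib”, the header check in the property would presumably also need to look for “BZh” instead of the zlib header, otherwise compressed values would come back from the datastore still compressed.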