Thoughts on Azure Functions_

April 17, 2020 @10:40

Introduction

Being sequestered in the house for the last month and a bit has given me (as I am sure it has most of us) an opportunity to go through the old ~/TODO list. One of the things that has been aging on there has been to finally explore "Serverless Computing" (whomever coined that phrase has forgotten the face of their father). When evaluating the various options available I decided to look at Azure Functions for a variety of reasons. Firstly of the big three, I find Microsoft the least distasteful. Their business model isn't 'harvest everyone's data and sell it while also sometimes doing other things', instead they are an old world corporation who seems to basically have a go-to-market strategy of exchange goods and services for money. Secondly when I first started looking into this they were the only provider to support Python which is my preferred language. I did also look at Cloudflare Workers briefly as running functions at the edge makes a lot more sense to me than running them in a central datacenter but the lack of Python support and the lack of a couple other features (more on that as I talk about requirements) meant I'd need to incorporate their technology with something else which isn't what I was looking to do.

[Screenshot: "It verks! It is verking!" (03/05/2020 @17:50)]

Goal

With a platform chosen, what exactly am I going to do with it? Well, that was the hardest part of the whole thing. My needs are pretty simple, as evidenced by the fact that the entire public website up to this point has been static files. Granted, they are generated by about 2200 lines of Python, but what ultimately gets served up to you, dear reader, is loaded directly off disk.

Well, sometimes I feel like putting up a link, or a thought, or a picture, and creating and committing a new markdown file is more work than is warranted for a cat picture or two. I refuse to go back to social media, so the obvious answer of re-opening my Twitter account or getting an Instagram is also not acceptable. So how about I make something like that myself? A silly little postbox that I can include on my website alongside this blog for quick and dirty little posts? That just might work.

And so Thoughts (previously called vociferate) was born (and after around 350 words of blog post a pun is uncovered).

Requirements

So with a goal in mind I set out to think about what I wanted and needed from this project if I was actually going to implement it beyond just playing around. Since the root of this whole thing was to learn and amuse myself, falling short of these would not stop me from going ahead, only from this ever seeing the light of day.

My data is my own

The idea here is that while I'll lean on the cloud platform to do all of this, I want to have things set up in such a way that I can come back periodically and pull all my data out onto my server. This way if I change cloud providers, or decide to self-host, I have all the stuff I created.

My thought on this was that periodically I can have a cron(8) job or something construct static HTML pages on my server based on the data in the cloud. While I have not written the script yet, the preparations for Apache to support this were easy.

        RewriteCond '%{REQUEST_URI}' '^/thoughts' [NC]
        RewriteCond /var/www/going-flying.com/%{REQUEST_URI} !-f
        RewriteRule '^/thoughts/(\d+)\.html' /thoughts/single.html [L]

That configuration fragment tells Apache to serve up single.html if the requested file does not exist on disk. Basically, if I have rendered a static version, use that; otherwise use a page that will fetch it from the cloud.
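A minimal sketch of what that cron(8) job might look like. To be clear, I have not written the real one; the API URL, output directory, and HTML template here are all placeholders:

```python
# Hypothetical cron(8) job sketch: pull every Thought out of the cloud
# and render it to a static HTML file that Apache can serve directly.
# The API URL, output path, template, and field names are assumptions.
import json
import pathlib
import urllib.request

API_URL = 'https://example.azurewebsites.net/api/get'  # placeholder
OUT_DIR = pathlib.Path('/var/www/going-flying.com/thoughts')

TEMPLATE = '<html><body><p>{message}</p></body></html>'


def render_thought(thought):
    '''Render a single Thought dict to an HTML string.'''
    return TEMPLATE.format(message=thought['message'])


def render_all():
    '''Fetch every Thought from the API and write {id}.html files.'''
    with urllib.request.urlopen(API_URL) as resp:
        thoughts = json.loads(resp.read())

    for thought in thoughts:
        out = OUT_DIR / f'{thought["id"]}.html'
        out.write_text(render_thought(thought))
```

Once those files exist on disk, the second RewriteCond fails to match and Apache serves the static copies instead of single.html.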

URLs MUST not change

How I plan to support this is intertwined with the previous point. In this day and age of release-early, release-often software and the dubious quality that comes along with it, it seems like people have forgotten that URLs are supposed to actually mean something. Add in the alarming rate of link rot and I am fiercely determined that, across all of my public-facing web presence, I will do what I can to make sure URLs remain accessible as long as the content they provide still exists.

URLs should make sense

Honestly, I'm pretty tired of every website being a pile of query strings and hash fragments. It seems like more modern web applications are starting to get better at this, but the dark times of 400-character URLs filled with '?' and '%' characters have scarred me and I refuse to contribute to that.

Posting should be easy

Yep, this is pretty self-explanatory, though because Apple remains profoundly anti-developer, posting is going to have to happen through a Web interface only. It would have been tremendously simpler to write a quick and dirty Swift application for my phone than to implement Microsoft's authentication mechanisms in pure JavaScript, but I'd rather do the latter than put up with Apple's draconian restrictions on what I can do with my own damn device.

There shall NOT be any public API endpoints listening on my server.

I mean, this is really the whole point of using serverless computing for this, right?

The Function side of the house

[Screenshot: Azure Functions CLI tools starting up]

It turns out that after fighting to get the tooling set up (most of the difficulty comes from the fact that I hate Visual Studio Code; if you like VS Code, getting started with Functions is super easy), the rest of the process is very easy. I ended up with four small functions to support this little project.

I talk about them (and include the current version as of this writing) below, but the most up-to-date copies are in my git repository.

Get

As you might expect, the most important aspect is getting at the information stored in the various datastores. There isn't much to talk about; this is what Azure calls an HTTP-triggered function, so it responds to an incoming HTTP request from the world and emits some sort of response. It essentially provides a dead stupid interface to a pair of Tables in Azure.

import json
import logging
import os
import azure.functions as func

from ..lib import ThoughtService

I have the actual Table Storage stuff in lib/ since it is used in a few places. You can see it here if you are interested.
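To make the handler below easier to follow, here is a hypothetical in-memory stand-in that sketches the interface the functions rely on (byId, count, since and all). The method names come from the handler code; everything else is an illustration, not the real class, which talks to Azure Table Storage:

```python
# Illustrative stand-in for the real ThoughtService in lib/.  Only the
# method names are taken from the handler code; the storage backend and
# everything else here is an assumption for demonstration purposes.
class ThoughtService:
    def __init__(self, thoughts=None):
        # Thoughts are keyed by id, which is a Unix timestamp string.
        self._thoughts = dict(thoughts or {})

    def all(self):
        '''Return every Thought, newest first.'''
        return sorted(
            self._thoughts.values(),
            key=lambda t: int(t['id']),
            reverse=True
        )

    def byId(self, rId):
        '''Return the single Thought with the given id.'''
        return self._thoughts[str(rId)]

    def count(self, n):
        '''Return the n newest Thoughts.'''
        return self.all()[:int(n)]

    def since(self, ts, n=None):
        '''Return Thoughts newer than the given timestamp.'''
        result = [t for t in self.all() if int(t['id']) > int(ts)]
        if n is not None:
            result = result[:int(n)]
        return result
```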

def main(req: func.HttpRequest, context: func.Context) -> func.HttpResponse:
    try:
        thoughtService = ThoughtService()
    except Exception as e:
        logging.exception(e)
        return func.HttpResponse(status_code=500)

    if req.params.get('id', None):
        rId = req.params.get('id')
        try:
            return func.HttpResponse(
                json.dumps(thoughtService.byId(rId))
            )

        except Exception as e:
            logging.exception(e)
            return func.HttpResponse(status_code=404)

    # count may be specified either alone or with since.
    nCnt = req.params.get('count', None)
    nSince = req.params.get('since', None)

    if nCnt is not None and nSince is None:
        try:
            return func.HttpResponse(
                json.dumps(thoughtService.count(nCnt))
            )
        except Exception as e:
            logging.exception(e)
            return func.HttpResponse(status_code=500)

    elif nSince is not None:
        try:
            return func.HttpResponse(
                json.dumps(thoughtService.since(nSince, nCnt))
            )
        except Exception as e:
            logging.exception(e)
            return func.HttpResponse(status_code=500)

    # This is the default case.
    try:
        return func.HttpResponse(json.dumps(thoughtService.all()))

    except Exception as e:
        logging.exception(e)
        return func.HttpResponse(status_code=500)

This part is pretty simple, I am just parsing the query string and returning Thoughts based on the filters that the client asked for.
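As a sketch of how a client drives it, the query parameters match the handler above, though the host name is a placeholder:

```python
# Hypothetical client for the get function.  The host name is a
# placeholder; the query parameters (id, count, since) come from the
# handler code above.
import json
import urllib.parse
import urllib.request

BASE = 'https://example.azurewebsites.net/api/get'  # placeholder


def build_url(**params):
    '''Construct the request URL for the given filters.'''
    if not params:
        return BASE
    return BASE + '?' + urllib.parse.urlencode(params)


def get_thoughts(**params):
    '''Fetch Thoughts: all of them, one by id, the n newest via
    count, or everything after a timestamp via since.'''
    with urllib.request.urlopen(build_url(**params)) as resp:
        return json.loads(resp.read())

# get_thoughts()                           -> every Thought
# get_thoughts(id='1587134400')            -> a single Thought
# get_thoughts(count=5)                    -> the 5 newest
# get_thoughts(since='1587000000', count=5)
```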

RSS Function

I really like RSS feeds. They are honestly my preferred way to look at things on the Internet. I have a bunch of software that exists to turn other websites into RSS feeds for my own consumption. My blog has an RSS feed (which I encourage you to use if you do not already) and so of course my Thoughts shall as well.

This is 99% the same code as the get function, but instead of emitting JSON it emits RSS. I probably could have implemented it in the same function and just added a query parameter to distinguish between JSON and XML, but honestly I like the clean URL of /api/rss vs /api/get, and I can't count on RSS readers sending something like Accept: application/rss+xml in the request.

import datetime
import logging
import os
import azure.functions as func

import PyRSS2Gen

from ..lib import ThoughtService

It is really nice that the Azure remote build supports requirements.txt. Simply adding PyRSS2Gen to it allowed this to build and run even though PyRSS2Gen is not installed by default.

def main(req: func.HttpRequest, context: func.Context) -> func.HttpResponse:
    baseUrl = 'https://www.going-flying.com'
    blobUrl = 'https://thoughtsassets.blob.core.windows.net/thumbs'

    try:
        thoughtService = ThoughtService()
    except Exception as e:
        logging.exception(e)
        return func.HttpResponse(status_code=500)

    # We will return 25 thoughts for the RSS feed, that seems
    # reasonable to start with.
    try:
        rss = PyRSS2Gen.RSS2(
            title='Thoughts from Matthew Ernisse',
            link='https://www.going-flying.com/thoughts/',
            description='Brief musings and found items from Matthew Ernisse',
            lastBuildDate=datetime.datetime.utcnow()
        )

        for thought in thoughtService.count(25):
            title = 'A brief thought from mernisse'
            pubDate = datetime.datetime.fromtimestamp(
                int(thought['id'])
            )
            link = f'{baseUrl}/thoughts/{thought["id"]}.html'
            guid = PyRSS2Gen.Guid(link)
            html = f'<div><span>{thought["message"]}</span>'

            if thought.get('attachment'):
                html += f'<img src="{blobUrl}/{thought["attachment"]}">'

            html += f'<img src="{baseUrl}/blog/images/p.gif">'
            html += '</div>'

            rss.items.append(PyRSS2Gen.RSSItem(
                title=title,
                pubDate=pubDate,
                description=html,
                link=link,
                guid=guid
            ))

        return func.HttpResponse(
            rss.to_xml(),
            mimetype='application/rss+xml',
            charset='utf-8'
        )

    except Exception as e:
        logging.exception(e)
        return func.HttpResponse(status_code=500)

Image processing pipeline

The deployment scripts on my website do several nice things that I wanted to replicate here. Firstly they strip EXIF tags from images. This is even more important in the case of Thoughts since I plan on posting from my mobile devices frequently and any images taken on them will most certainly have GPS data in them that I don't want to leak. Secondly they create intermediate sizes of images so my blog posts can use srcset to deliver the most bandwidth efficient size for your screen. While I don't think I need to go quite that far I do believe that creating a thumbnail version of the images will go a long way to keeping the transfer down (and my Azure bill low). Thankfully Azure Functions supports a blob trigger which allows a function to run when a blob is created or changed. I combined this with Table Storage (for long term state) and Queue Storage (to trigger the follow on job) to implement a rudimentary image processing pipeline.
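For a sense of how the bindings wire together, stripper's function.json would look roughly like this. This is a sketch: the 'assets' container and 'image-process-pipeline' queue names come from the code below, but the paths and connection name are assumptions:

```json
{
    "scriptFile": "__init__.py",
    "bindings": [
        {
            "name": "inBlob",
            "type": "blobTrigger",
            "direction": "in",
            "path": "assets/{name}",
            "connection": "AzureWebJobsStorage"
        },
        {
            "name": "outBlob",
            "type": "blob",
            "direction": "out",
            "path": "assets/{name}",
            "connection": "AzureWebJobsStorage"
        },
        {
            "name": "msg",
            "type": "queue",
            "direction": "out",
            "queueName": "image-process-pipeline",
            "connection": "AzureWebJobsStorage"
        }
    ]
}
```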

The first job, called stripper, is triggered by the blob trigger; it removes the EXIF data and stores the attachment name in a table. This lets it know not to re-trigger on the overwritten, now EXIF-less object, and it provides the size data to the ThoughtService class, which uses it for the get function.

import io
import logging
import os
import sys
import azure.functions as func
from PIL import Image

from ..lib import AttachmentService

AttachmentService is also over in lib.

def main(
    inBlob: func.InputStream,
    outBlob: func.Out[bytes],
    msg: func.Out[str]):
    try:
        logging.info(f'Loading {inBlob.name}')
        inImage = Image.open(inBlob)
    except Exception as e:
        logging.error(f'Exception loading {inBlob.name}')
        logging.exception(e)
        return

    mimeType = inImage.get_format_mimetype()
    try:
        attachSvc = AttachmentService()
        if attachSvc.is_processed(inBlob.name):
            logging.info(f'{inBlob.name} already processed.')
            return

    except Exception as e:
        logging.error('Failed to get Table Service')
        logging.exception(e)
        return

    # Strip the path so thumbnailer's function.json can use the
    # queueTrigger variable for the output binding.
    fn = os.path.basename(inBlob.name)
    if mimeType != 'image/jpeg':
        logging.info(f'{inBlob.name} not a JPEG.')
        attachSvc.mark_processed(
            inBlob.name,
            inImage.size[1],
            inImage.size[0]
        )
        msg.set(fn)
        return

    # Re-Save the input so that we strip the EXIF information.  Log it
    # in the attachmentTable so we do not loop on the blob change notice.
    inBytes = io.BytesIO()
    inImage.save(inBytes, format='JPEG')
    outBlob.set(inBytes.getvalue())
    logging.info(f'Stripped EXIF tags from {inBlob.name}.')
    attachSvc.mark_processed(
        inBlob.name,
        inImage.size[1],
        inImage.size[0]
    )
    msg.set(fn)

Once the data has been processed a message gets dropped in Queue Storage which thumbnailer uses as a trigger. This lets me use Pillow to create the thumbnailed version without any intervention on the client side. Sweet.

import io
import logging
import os
import sys
import azure.functions as func
from PIL import Image

# Maximum Width in Pixels for the Thumbnail Version.
MAXWIDTH = 180


def main(
    msg: func.QueueMessage,
    inBlob: func.InputStream,
    thumbBlob: func.Out[bytes]):
    ''' To execute the thumbnailer function, a message must be inserted
    into the 'image-process-pipeline' Queue containing the name of the
    blob to process.  The blob must be in the 'assets' Container.
    '''
    logging.info(f'thumbnailer triggered by {msg}')

    try:
        logging.info(f'Loading {inBlob.name}')
        inImage = Image.open(inBlob)
    except Exception as e:
        logging.error(f'Exception loading {inBlob.name}')
        logging.exception(e)
        return

    outType = 'JPEG'

    if inImage.get_format_mimetype() == 'image/png':
        outType = 'PNG'

    height = inImage.size[1]
    width = inImage.size[0]

    if width <= MAXWIDTH:
        logging.info(f'Input is {width}x{height}, no resize.')
        thumbBlob.set(inBlob)
        return

    h = MAXWIDTH * height / width
    inImage.thumbnail((MAXWIDTH, h), Image.LANCZOS)

    thumbBytes = io.BytesIO()
    inImage.save(thumbBytes, format=outType)
    thumbBlob.set(thumbBytes.getvalue())
    logging.info(f'Resized {inBlob.name} to {MAXWIDTH}x{h}')

The blob input simply gets a handle to the blob in question, which lets me mutate it and then write out to another container. Because I don't have easy control over the name attribute of the output binding (it has to be specified statically in the function's JSON description file), I simply save to a different container. I'd rather have had file.jpg and file-thumb.jpg or the like, but instead I have assets/file.jpg and thumbnails/file.jpg.

Conclusion

So that's it. Around 400 lines of Python (including the duplicates but excluding the license headers) and I've basically got the plumbing done. What I didn't really expect was the maze of twisty passages that getting the client side done was going to be. In my curmudgeonly fashion I had to write all the JavaScript myself, pointedly eschewing any libraries or frameworks that might help me out. First principles all the way. The upside is that all the work recently done on Web Components actually made a lot of the work bearable. I cringe every time I find myself with 30 or 40 lines worth of document.createElement('div'); in a script. With templates and custom elements, the display of all of my thoughts boils down to a list of <goingflying-thought-small> elements that happen to know what to do and how to look.

All told there are around 1500 lines of JavaScript encompassing the 'client' side of this. I wrote a bare-bones Azure Blob and Table library to allow me to post directly to my storage account without needing to write a set of functions. That is about 500 lines. The rest is to manage:

Funny, I think I learned more about modern web development practices than I did about serverless computing during this whole exercise. Though honestly, that is probably how it should be. While obviously serverless is an awful name, and my code is absolutely running on a Linux VM somewhere in Azure, I have had to spend exactly zero time thinking about that. My entire interaction, after setting up the Function App and storage services in the Azure portal and installing the development tools, has been to run func azure functionapp publish vociferate (until I put that in a Makefile). I spent the vast majority of my time iterating on the pile of JavaScript, CSS and HTML to make it look pretty.

If I had done this in a more traditional way it would have probably been the same amount of Python in a Flask app and a little less JavaScript (the storage API authentication in Azure is a bit... complex) but it would also be one more app on my side to take care of. One more Puppet manifest to write, one more... thing.

Am I going to go and start using Functions for all the things? No. Do I think serverless is going to fundamentally change application development and delivery? 🤣 No. But I am happy to have it as a tool in the bag and I can see innumerable use cases for a tool like this, especially integrating collections of services together.

Stay safe out there folks. Stay inside and look at the cloud. 🌩

Subscribe via RSS. Send me a comment.