Introduction
Being sequestered in the house for the last month and a bit has given me
(as I am sure it has most of us) an opportunity to go through the old
~/TODO
list. One of the things that has been aging on there has been
to finally explore "Serverless Computing" (whoever coined that phrase
has forgotten the face of their father). When evaluating the various
options available I decided to look at
Azure Functions
for a variety of reasons. Firstly, of the big three, I find Microsoft the
least distasteful. Their business model isn't 'harvest everyone's data and
sell it while also sometimes doing other things'; instead they are an old
world corporation whose go-to-market strategy seems to basically be
exchanging goods and services for money. Secondly, when I first started
looking into this they were the only provider to support
Python, which is my preferred language.
I did also look at Cloudflare Workers briefly, as running functions at the
edge makes a lot more sense to me than running them in a central datacenter,
but the lack of Python support and of a couple of other features (more
on that as I talk about requirements) meant I'd need to combine their
technology with something else, which isn't what I was looking to do.
[Embedded Thought: "It verks! It is verking!" posted 03/05/2020 @17:50]
Goal
With a platform chosen, what exactly am I going to do with it? Well, that was the hardest part of the whole thing. My needs are pretty simple, as evidenced by the fact that the entire public website up to this point has been static files. Granted, they are generated by about 2200 lines of Python, but what ultimately gets served up to you, dear reader, is loaded directly off disk.
Well, sometimes I feel like putting up a link, or a thought, or a picture, and creating and committing a new markdown file is more work than is warranted for a cat picture or two. I refuse to go back to social media, so the obvious answer of re-opening my Twitter account or getting an Instagram is also not acceptable. So how about I make something like that myself? A silly little postbox that I can include on my website alongside this blog for quick and dirty little posts? That just might work.
And so Thoughts (previously called vociferate) was born (and after around 350 words of blog post a pun is uncovered).
Requirements
So with a goal in mind I set out to think about what I wanted and needed from this project if I was actually going to implement it beyond just playing around. Since the root of this whole thing was to learn and amuse myself these would not stop me from going ahead, only from this ever seeing the light of day.
- My data MUST be my own.
- URLs MUST NOT change.
- URLs SHOULD make sense.
- Posting MUST be easy.
- There shall NOT be any public API endpoints listening on my server.
My data is my own
The idea here is that while I'll lean on the cloud platform to do all of this, I want to have things set up in such a way that I can come back periodically and pull all my data out onto my server. This way if I change cloud providers, or decide to self-host this, I have all the stuff I created.
My thought on this was that periodically I can have a cron(8) job or something construct static HTML pages on my server based on the data in the cloud. While I have not written the script yet, the preparations for Apache to support this were easy.
RewriteCond '%{REQUEST_URI}' '^/thoughts' [NC]
RewriteCond /var/www/going-flying.com/%{REQUEST_URI} !-f
RewriteRule '^/thoughts/(\d+)\.html' /thoughts/single.html [L]
That configuration fragment tells Apache to serve up single.html if the requested file does not exist on disk. Basically, if I have rendered a version then use that; otherwise use a page that will fetch it from the cloud.
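While the rendering script itself is still on the TODO list, a minimal sketch of what that cron(8) job might look like follows. It assumes the get function (shown later) is exposed at /api/get and returns a JSON list of objects with id and message fields; the real thing would presumably reuse the site's templates.
#!/usr/bin/env python3
# Hypothetical renderer: pull all Thoughts from the cloud and write
# static HTML files where Apache's RewriteCond looks for them.
import json
import pathlib
import urllib.request

API = 'https://www.going-flying.com/api/get'  # assumed endpoint URL
OUTDIR = pathlib.Path('/var/www/going-flying.com/thoughts')

with urllib.request.urlopen(API) as resp:
    thoughts = json.load(resp)

for thought in thoughts:
    # A real version would use the site's page template instead.
    page = f'<div><span>{thought["message"]}</span></div>'
    (OUTDIR / f'{thought["id"]}.html').write_text(page)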
URLs MUST NOT change
How I plan to support this is intertwined with the previous point, but in this day and age of release-early, release-often software and the dubious quality that comes along with it, it seems like people have forgotten that URLs are supposed to actually mean something. Add in the alarming rate of link rot and I am fiercely determined that across all of my public-facing web presence I will do what I can to make sure URLs remain accessible as long as the content they provide still exists.
URLs should make sense
Honestly, I'm pretty tired of every website being a pile of query strings and hash fragments. It seems like more modern web applications are starting to get better at this, but the dark times of 400-character URLs filled with '?' and '%' characters have scarred me and I refuse to contribute to that.
Posting should be easy
Yep, this is pretty self-explanatory, though because Apple remains profoundly anti-developer, posting is going to have to happen through a Web interface only. It would have been tremendously simpler for me to write a quick and dirty Swift application for my phone than to implement Microsoft's authentication mechanisms in pure JavaScript, but I'd rather do the latter than put up with Apple's draconian restrictions on what I can do with my own damn device.
There shall NOT be any public API endpoints listening on my server.
I mean, this is really the whole point of using serverless computing for this, right?
The Function side of the house
It turns out that after fighting with getting the tooling set up (most of the
difficulty comes from the fact that I hate Visual Studio Code; if you like
VS Code, getting started with Functions is
super easy),
the rest of the process is straightforward. I ended up with four small
functions to support this little project.
I talk about them below (and include the current version as of this writing), but the most up to date copies are in my git repository.
Get
As you might expect, the most important aspect is getting at the information stored in the various datastores. There isn't much to talk about; this is what Azure calls an HTTP triggered function, so it responds to an incoming HTTP request from the world and emits some sort of response. It essentially provides a dead stupid interface to a pair of Tables in Azure.
import json
import logging
import os
import azure.functions as func
from ..lib import ThoughtService
I have the actual Table Storage stuff in lib/
since it is used in a few
places. You can see it here
if you are interested.
def main(req: func.HttpRequest, context: func.Context) -> func.HttpResponse:
    try:
        thoughtService = ThoughtService()
    except Exception as e:
        logging.exception(e)
        return func.HttpResponse(status_code=500)

    if req.params.get('id', None):
        rId = req.params.get('id')
        try:
            return func.HttpResponse(
                json.dumps(thoughtService.byId(rId))
            )
        except Exception as e:
            logging.exception(e)
            return func.HttpResponse(status_code=404)

    # count may be specified either alone or with since.
    nCnt = req.params.get('count', None)
    nSince = req.params.get('since', None)

    if nCnt is not None and nSince is None:
        try:
            return func.HttpResponse(
                json.dumps(thoughtService.count(nCnt))
            )
        except Exception as e:
            logging.exception(e)
            return func.HttpResponse(status_code=500)
    elif nSince is not None:
        try:
            return func.HttpResponse(
                json.dumps(thoughtService.since(nSince, nCnt))
            )
        except Exception as e:
            logging.exception(e)
            return func.HttpResponse(status_code=500)

    # This is the default case.
    try:
        return func.HttpResponse(json.dumps(thoughtService.all()))
    except Exception as e:
        logging.exception(e)
        return func.HttpResponse(status_code=500)
This part is pretty simple: I am just parsing the query string and returning Thoughts based on the filters the client asked for.
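To make the query parameters concrete, here is how the three cases might be exercised from Python. The /api/get route and the id value are assumptions for illustration; the parameter names come straight from the code above.
import json
import urllib.request

BASE = 'https://www.going-flying.com/api/get'  # assumed route

def fetch(qs=''):
    # Issue a GET and decode the JSON body.
    with urllib.request.urlopen(BASE + qs) as resp:
        return json.load(resp)

everything = fetch()                         # default case: every Thought
one = fetch('?id=1588542600')                # a single Thought by id
latest = fetch('?count=5')                   # the five most recent
recent = fetch('?since=1588542600&count=5')  # up to five since a given id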
RSS Function
I really like RSS feeds. They are honestly my preferred way to look at things on the Internet. I have a bunch of software that exists to turn other websites into RSS feeds for my own consumption. My blog has an RSS feed (which I encourage you to use if you do not already) and so of course my Thoughts shall as well.
This is 99% the same code as the get function but instead of emitting JSON
it emits RSS. I probably could have implemented it in the same function and
just added a query parameter to distinguish between JSON and XML, but honestly
I like the clean URL of /api/rss
vs /api/get
and I can't count on RSS
readers sending something like Accept: application/rss+xml
in the request.
import datetime
import logging
import os
import azure.functions as func
import PyRSS2Gen
from ..lib import ThoughtService
It is really nice that the Azure remote build supports requirements.txt. Simply adding PyRSS2Gen to it allowed this to build and run even though PyRSS2Gen is not installed by default.
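For reference, the requirements.txt for all of this only needs a few entries; something like the following (unpinned here for illustration):
azure-functions
Pillow
PyRSS2Gen
With that out of the way, the function itself: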
def main(req: func.HttpRequest, context: func.Context) -> func.HttpResponse:
    baseUrl = 'https://www.going-flying.com'
    blobUrl = 'https://thoughtsassets.blob.core.windows.net/thumbs'

    try:
        thoughtService = ThoughtService()
    except Exception as e:
        logging.exception(e)
        return func.HttpResponse(status_code=500)

    # We will return 25 thoughts for the RSS feed, that seems
    # reasonable to start with.
    try:
        rss = PyRSS2Gen.RSS2(
            title='Thoughts from Matthew Ernisse',
            link='https://www.going-flying.com/thoughts/',
            description='Brief musings and found items from Matthew Ernisse',
            lastBuildDate=datetime.datetime.utcnow()
        )

        for thought in thoughtService.count(25):
            title = 'A brief thought from mernisse'
            pubDate = datetime.datetime.fromtimestamp(
                int(thought['id'])
            )
            link = f'{baseUrl}/thoughts/{thought["id"]}.html'
            guid = PyRSS2Gen.Guid(link)
            html = f'<div><span>{thought["message"]}</span>'

            if thought.get('attachment'):
                html += f'<img src="{blobUrl}/{thought["attachment"]}">'

            html += f'<img src="{baseUrl}/blog/images/p.gif">'
            html += '</div>'

            rss.items.append(PyRSS2Gen.RSSItem(
                title=title,
                pubDate=pubDate,
                description=html,
                link=link,
                guid=guid
            ))

        return func.HttpResponse(
            rss.to_xml(),
            mimetype='application/rss+xml',
            charset='utf-8'
        )
    except Exception as e:
        logging.exception(e)
        return func.HttpResponse(status_code=500)
Image processing pipeline
The deployment scripts on my website do several nice things that I wanted to replicate here. Firstly, they strip EXIF tags from images. This is even more important in the case of Thoughts since I plan on posting from my mobile devices frequently and any images taken on them will most certainly have GPS data in them that I don't want to leak. Secondly, they create intermediate sizes of images so my blog posts can use srcset to deliver the most bandwidth efficient size for your screen. While I don't think I need to go quite that far, I do believe that creating a thumbnail version of the images will go a long way to keeping the transfer down (and my Azure bill low).
Thankfully Azure Functions supports a blob trigger which allows a function to run when a blob is created or changed. I combined this with Table Storage (for long term state) and Queue Storage (to trigger the follow-on job) to implement a rudimentary image processing pipeline.
The first job, called stripper,
is triggered by the blob trigger; it
removes the EXIF data and stores the attachment name in a table. This lets
it know not to re-trigger on the overwritten, now EXIF-less object, and it
provides the size data over to the ThoughtService
class, which uses it for
the get
function.
import io
import logging
import os
import sys
import azure.functions as func
from PIL import Image
from ..lib import AttachmentService
AttachmentService
is also over in lib.
def main(
        inBlob: func.InputStream,
        outBlob: func.Out[bytes],
        msg: func.Out[str]):
    try:
        logging.info(f'Loading {inBlob.name}')
        inImage = Image.open(inBlob)
    except Exception as e:
        logging.error(f'Exception loading {inBlob.name}')
        logging.exception(e)
        return

    mimeType = inImage.get_format_mimetype()

    try:
        attachSvc = AttachmentService()
        if attachSvc.is_processed(inBlob.name):
            logging.info(f'{inBlob.name} already processed.')
            return
    except Exception as e:
        logging.error('Failed to get Table Service')
        logging.exception(e)
        return

    # Strip the path so thumbnailer's function.json can use the
    # queueTrigger variable for the output binding.
    fn = os.path.basename(inBlob.name)

    if mimeType != 'image/jpeg':
        logging.info(f'{inBlob.name} not a JPEG.')
        attachSvc.mark_processed(
            inBlob.name,
            inImage.size[1],
            inImage.size[0]
        )
        msg.set(fn)
        return

    # Re-Save the input so that we strip the EXIF information. Log it
    # in the attachmentTable so we do not loop on the blob change notice.
    inBytes = io.BytesIO()
    inImage.save(inBytes, format='JPEG')
    outBlob.set(inBytes.getvalue())
    logging.info(f'Stripped EXIF tags from {inBlob.name}.')

    attachSvc.mark_processed(
        inBlob.name,
        inImage.size[1],
        inImage.size[0]
    )
    msg.set(fn)
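For context, the triggers and bindings used by the Python above are declared in the function's function.json rather than in code. A sketch of what stripper's might look like follows; the 'assets' container and 'image-process-pipeline' queue names come from the thumbnailer's docstring below, but the connection setting name is an assumption on my part.
{
    "scriptFile": "__init__.py",
    "bindings": [
        {
            "name": "inBlob",
            "type": "blobTrigger",
            "direction": "in",
            "path": "assets/{name}",
            "connection": "AzureWebJobsStorage"
        },
        {
            "name": "outBlob",
            "type": "blob",
            "direction": "out",
            "path": "assets/{name}",
            "connection": "AzureWebJobsStorage"
        },
        {
            "name": "msg",
            "type": "queue",
            "direction": "out",
            "queueName": "image-process-pipeline",
            "connection": "AzureWebJobsStorage"
        }
    ]
}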
Once the data has been processed a message gets dropped in Queue Storage
which thumbnailer
uses as a trigger. This lets me use
Pillow to create the thumbnailed version without
any intervention on the client side. Sweet.
import io
import logging
import os
import sys
import azure.functions as func
from PIL import Image
# Maximum Width in Pixels for the Thumbnail Version.
MAXWIDTH = 180
def main(
        msg: func.QueueMessage,
        inBlob: func.InputStream,
        thumbBlob: func.Out[bytes]):
    ''' To execute the thumbnailer function, a message must be inserted
    into the 'image-process-pipeline' Queue containing the name of the
    blob to process. The blob must be in the 'assets' Container.
    '''
    logging.info(f'thumbnailer triggered by {msg}')

    try:
        logging.info(f'Loading {inBlob.name}')
        inImage = Image.open(inBlob)
    except Exception as e:
        logging.error(f'Exception loading {inBlob.name}')
        logging.exception(e)
        return

    outType = 'JPEG'
    if inImage.get_format_mimetype() == 'image/png':
        outType = 'PNG'

    height = inImage.size[1]
    width = inImage.size[0]

    if width <= MAXWIDTH:
        logging.info(f'Input is {width}x{height}, no resize.')
        thumbBlob.set(inBlob)
        return

    h = MAXWIDTH * height / width
    inImage.thumbnail((MAXWIDTH, h), Image.LANCZOS)
    thumbBytes = io.BytesIO()
    inImage.save(thumbBytes, format=outType)
    thumbBlob.set(thumbBytes.getvalue())
    logging.info(f'Resized {inBlob.name} to {MAXWIDTH}x{h}')
The blob input simply gets a handle to the blob in question, which lets me mutate it and then write out to another container. Because I don't have easy control over the name attribute of the output binding (it has to be specified statically in the function's JSON description file) I simply save to a different container. I'd rather have had file.jpg and file-thumb.jpg or the like, but instead I have assets/file.jpg and thumbnails/file.jpg.
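Concretely, the output binding's path is baked into thumbnailer's function.json at deploy time, and the only substitution available here is the queue message, which is why the thumbnail keeps the original filename but lands in a different container. A sketch of that binding (the container and connection names are assumed):
{
    "name": "thumbBlob",
    "type": "blob",
    "direction": "out",
    "path": "thumbnails/{queueTrigger}",
    "connection": "AzureWebJobsStorage"
}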
Conclusion
So that's it. Around 400 lines of Python (including the duplicates but
excluding the license headers) and I've basically got the plumbing done.
What I didn't really expect was the maze of twisty passages that getting
the client side done would turn out to be. In my curmudgeonly fashion I had
to write all the JavaScript myself, pointedly eschewing any libraries or
frameworks that might help me out. First principles all the way. The
upside is that all the work recently done with
Web Components
actually made a lot of the work bearable. I cringe every time I find myself
with 30 or 40 lines worth of document.createElement('div');
in a script.
With templates and custom elements the display of all of my thoughts boils down
to a list of <goingflying-thought-small>
elements that happen to know what to
do and how to look.
All told there are around 1500 lines of JavaScript encompassing the 'client' side of this. I wrote a bare-bones Azure Blob and Table library to allow me to post directly to my storage account without needing to write a set of functions; that is about 500 lines. The rest is there to manage:
- Posting interface
- Web Component behaviors
- Asynchronous fetch and load
- Styling an embedded Thought
Funny, I think I learned more about modern web development practices than
I did about serverless computing during this whole exercise. Though honestly
that is probably how it should be. While obviously serverless is an awful
name, and my code is absolutely running on a Linux VM somewhere in Azure, I
have had to spend exactly zero time thinking about that. My entire
interaction after setting up the Function App and storage services in the
Azure portal and then installing the development tools has been to run
func azure functionapp publish vociferate
(until I put that in a Makefile).
I spent the vast majority of my time iterating on the pile of JavaScript,
CSS and HTML to make it look pretty.
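For completeness, that Makefile is about as minimal as they come; something like:
# Wrap the publish command so I don't have to remember it.
# 'vociferate' is the Function App's name, as above.
publish:
	func azure functionapp publish vociferate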
If I had done this in a more traditional way it would have probably been the same amount of Python in a Flask app and a little less JavaScript (the storage API authentication in Azure is a bit... complex) but it would also be one more app on my side to take care of. One more Puppet manifest to write, one more... thing.
Am I going to go and start using Functions for all the things? No. Do I think serverless is going to fundamentally change application development and delivery? 🤣 No. But I am happy to have it as a tool in the bag and I can see innumerable use cases for a tool like this, especially integrating collections of services together.
Stay safe out there folks. Stay inside and look at the cloud. 🌩