DockerCon is going on today and tomorrow – and there's no better time to put out some great new tools for people using containers. This will be a 3-part post, timed intentionally with DockerCon. Why? Well – containers are indeed the buzz du jour, and while the ecosystem is still very vibrant (far from settled, with lots of battles being fought), DockerCon has become a critical time for the ecosystem to get together and collaborate.

Containers and persistence – what's the scoop?

In yesterday's "Post 1: Object storage" I said "A lot of strange new ideas for people used to NAS, SAN, and VVols :-)" Today, I'll be talking about the OTHER part of the persistence ecosystem for 12-factor/"Platform 3" apps that operate in the world of containers: the storage that supports transactional use cases, commonly sitting under some form of data fabric. Here, the NAS, SAN, VVols – heck, even the "plugins" and "VAAI" history of vSphere – is useful background. This 2nd post will focus on Docker Volumes, and what we're doing via the plugin model (via Flocker) and via the native Docker Volumes approach (via Project Rex-Ray and Project Dogged).
Trust me – an interesting read, and very germane for those wondering what the non-Object/HDFS world of persistence will look like in the container era. Read on!
For readers who are not familiar with the world of these new apps, some foundation is needed. Unlike traditional apps that are simply virtualized, the new apps are instead designed to be abstracted, to sever every possible dependency (even on things like "availability"), and to have scale-out application models. For those that live in infrastructure-land but are intellectually curious about application-land, there's good reading here. In particular, look at the section on "Backing Services" – this is really important:
In addition, see the section on config, which highlights that there needs to be a strict separation of code and config – so environment variables, or more commonly some form of YAML ("YAML Ain't Markup Language") config file, are used as inputs to the code. … This all leads to a container running code that is totally ephemeral.

Where is the world of persistence in this picture? Is it needed? Traditional use of containers was very different from VMs in this regard, because the answer to that question was, for the most part, "no". Containers were compute-only for all intents and purposes. They had limited networking, and VERY limited storage subsystems. They would have a small filesystem accessible to the container that used a union filesystem (which can be snapshotted). Using persistence that is external to the container was, well – somewhat verboten, because (done badly) it could create a binding that linked the code too tightly to the external service. Object stores are natural, because they are an "external backing service", accessed via a URL. Bindings were done over a listening port – so again, beautifully isolated.

But people started to push containers into places they hadn't been before, placing new demands on the networking and storage subsystems. An example in networking at DockerCon this year: during the hackathon there was an effort to add native support to the Docker Engine for working with VLANs. Our very own EMC{code} guru Clint Kitson was there with a group and did a PR to add native VLAN support to Docker, which you can read up on here. What I loved about the hackathon was that in 24 hrs they worked on it and did a ton of work to ultimately contribute to Docker itself, including proving it out and building a Docker branch here. What about examples in transactional storage?
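To make the config point concrete, here's a minimal sketch of 12-factor-style configuration, where the code locates its backing services through the environment rather than through anything baked into the image. The service names and URLs below are purely illustrative, not from any real deployment:

```shell
# 12-factor style: backing services are attached resources, located via
# config in the environment. The URLs below are hypothetical defaults.
DATABASE_URL="${DATABASE_URL:-postgres://db.internal:5432/app}"
OBJECT_STORE_URL="${OBJECT_STORE_URL:-https://objects.internal/bucket}"

# The code only ever sees opaque URLs – you swap the backing service by
# changing the environment, never by rebuilding the container image.
echo "database:     ${DATABASE_URL}"
echo "object store: ${OBJECT_STORE_URL}"
```

The point is the isolation: the container stays ephemeral and identical across environments, and only the config changes around it.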
People started to find all sorts of use cases where accessing some form of external, transactional persistence layer (low latency, high IOPS, tending towards smaller IO patterns) would be useful. Very often this storage would be shared across containers and would persist across container instantiations. Also often, it would be used to power some element of a data fabric (not uncommonly, NoSQL databases).
So why do this? Well, as you push containers into a place where they ARE running the database – and the database needs more than what local storage can give, or needs replicas, or any set of data services (dedupe/compression/encryption)… Again, there's a real argument that IF you need these, should you use a container or a kernel-mode VM? There are pros and cons, but that's outside the scope of this post. There are a couple of ways to approach providing this kind of storage to containers:
Last week, we announced our work on the plugin approach – working with Flocker – which you should check out here. Now, it's important to understand that this is REALLY moving fast, and is new. For crying out loud, plugins to the Docker Engine itself are new this week, arriving in the Docker 1.7 experimental release as part of the core Docker Engine. This is really cool. It means you can use Flocker with AWS EBS if your containers are running on EC2, and with EMC ScaleIO and XtremIO if they're not. The code is on EMC{code} here; you can get to the Flocker ScaleIO github repo here, and the XtremIO github repo here. Of course, for EMC ScaleIO and XtremIO customers this is gratis – a bonus of having the best transactional SDS and AFA on the market :-) For others, there's also a Cinder plugin that may work more generally. What I love about this is that it's an example of real customer involvement in innovation – in this case Swisscom, a cool shop to be sure – and here's the shared story.
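For a flavor of what this looks like from the Docker CLI, here's a sketch of the 1.7-experimental volume-driver syntax. The volume name "demo-vol" is hypothetical, and the command is only assembled and printed here – actually running it assumes a Docker 1.7 experimental host with the Flocker plugin installed:

```shell
# Assemble (but don't run) the docker invocation that attaches a
# Flocker-managed volume. "demo-vol" is a hypothetical volume name.
DRIVER="flocker"
VOLUME_NAME="demo-vol"
CMD="docker run --volume-driver=${DRIVER} -v ${VOLUME_NAME}:/data busybox sh"
echo "${CMD}"
# On a Docker 1.7 experimental host with the plugin installed:
# eval "${CMD}"
```

The `-v name:/path` form plus `--volume-driver` is what hands volume lifecycle over to the plugin instead of the local host filesystem.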
Want a better way to understand? Here's Flocker in action, first with ScaleIO and then with XtremIO.

What about the "native" route? Well, EMC has been contributing there also. It's not clear whether people will tend to prefer the plugin route or the native route – but experience with Linux, Cloud Foundry, Hadoop and other projects shows that core contribution is an important part of being part of the eventual answer. To some degree, the plugin approach and the native approach are competitive – in essence they are two different ways to get to the same point (in this case, presenting and consuming transactional persistent storage within containers, having the configuration persist as container restarts occur, and doing it in a way that doesn't break the "12-factor" rules of proper abstraction and configuration management).

Now, the Docker Engine has had a volume capability for a while – which is important because it enables persistence that bypasses the union filesystem and persists even if the Docker container is nuked. What there isn't is an equivalent to the SPBM/VVols idea in vSphere – nothing that provides behavior and management at the Docker host level. This is the purpose of Project Rex-Ray. You can read up on it here, and the github repo is here – and like all things EMC that are open-source "glue code", you can get it at EMC{code}.
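That "survives even if the container is nuked" property is the whole point of volumes, and it's easy to illustrate without Docker at all. Below, one temp directory stands in for the container's ephemeral union filesystem and another for a host volume – a toy model of the behavior, not the real Docker mechanics:

```shell
# Toy model: scratch dir = the container's union filesystem,
# volume dir = a Docker volume on the host.
SCRATCH_DIR=$(mktemp -d)   # ephemeral – dies with the "container"
VOLUME_DIR=$(mktemp -d)    # persistent – outlives the "container"

echo "app state" > "${SCRATCH_DIR}/state.txt"
echo "db records" > "${VOLUME_DIR}/records.txt"

rm -rf "${SCRATCH_DIR}"    # "docker rm" – the union filesystem is gone

if [ -f "${VOLUME_DIR}/records.txt" ]; then SURVIVED="yes"; else SURVIVED="no"; fi
echo "volume data survived: ${SURVIVED}"
rm -rf "${VOLUME_DIR}"
```

What Rex-Ray adds on top of this basic volume behavior is the management layer at the Docker host level that the post describes – the part the raw volume mechanism doesn't give you.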
As always, a demo is worth 1,000 words :-) [Demo: Project Rex-Ray] Cool, eh? Here are the takeaways, for me at least:
So – what do YOU think of this? Does transactional persistence need to strengthen in the container ecosystem? Which way – Native/Plugin?
