Hybrid vs. Multi Cloud: which one's right for you?

Unlike a multi-cloud model, in which different clouds are used for different tasks, the components of a hybrid cloud typically work together. As a result, data and processes tend to intermingle and intersect in a hybrid environment, while in a multi-cloud situation, usage typically remains in its “own” cloud’s silo. Eric Dynowski, CTO of ServerCentral Turing Group, details the future of hybrid and multi-cloud environments, highlighting how organizations are utilizing managed services to better their operations.

Episode Transcript:

INTRO: [00:00] Welcome to the Tech Deep Dive podcast, where we let our inner nerd come out and have fun getting into the weeds on all things tech. At Clarksys, we believe tech should make your life better, searching Google is a waste of time, and the right vendor is often one you haven’t heard of before. 

Max: [00.18] Hi I’m Max Clark and I’m talking with Eric Dynowski, who is the CTO at ServerCentral Turing Group. Eric, thanks for joining.

Eric: [00.24] No problem, glad to be here.

Max: [00.26] So Eric, obviously ServerCentral Turing Group, you came from Turing Group and are now the CTO for the combined entity. Give me a little background, let’s start with a little — What was ServerCentral, what was Turing Group, and what are you guys now as one company?

Eric: [00.42] Sure, sure. I’ll start with Turing Group, since it’s kind of my major background and partly why I’m here partly with ServerCentral… So, back in 2013, I was working in the financial services industry, working for a hedge fund, managing kind of the whole team that ran their global infrastructure, starting early and we built everything out for them and I was starting to get a little itchy, a little bit bored, and I decided to really take some risks and decide to start Turing Group. So, in 2013 my business partner and I started the company with a focus on providing technology solutions based on – at the time specifically – AWS, but very solutions-oriented, we believed that there was still a lot of companies that were moving to the cloud, or starting to move to the cloud, but really doing it in the wrong way. They weren’t embracing infrastructure as code, they weren’t embracing automation, they weren’t embracing APIs… It was still like, let’s boot a server, let’s install our software, and run it. It was still very datacenter centric, and so, we positioned ourselves as a company of experts that could help you leverage platforms like AWS as they were intended to. And so we challenged this idea that I think a lot of folks had, about you moving something into Amazon, it’ll automatically be redundant, it’ll automatically be backed up, and it’ll just autoscale, because that’s the way things work in Amazon, right? And the reality is none of those things are true, and you have to design for those things and incorporate them into your strategy for using those services. And so that’s what we did. When we built that company up, along the way, I had known, you know, I’ve known Jordan actually since he started ServerCentral in the late nineties, 1999, I think 2000, and in the various roles I had been in previously, I purchased colocation space from ServerCentral. 
And so we’d been in contact and chatting, and when I started a company, Jordan was kind of looped in on… And I should probably introduce Jordan so everybody knows, Jordan’s one of the co-founders of ServerCentral. You know, we’d been in contact and talking about what we’re all doing, and whatnot… And along the way, I had customers approaching me saying, “We love how you’re helping us in AWS, this is great, but we have other needs that span just beyond AWS, we need some help either with network backhaul, or colocation, or bare metal solutions, or things of that nature,” and as Turing Group, we didn’t have answers for those, and so my answer was, “Hey Jordan, I’ve got someone I think you can work with,” and you know, that’s probably a good segue into ServerCentral’s history. So, Jordan started ServerCentral in 2000, you know, doing primarily hosting, and then as his customer base grew and their needs grew, he evolved the company with Daniel Brosk, into a data center and colocation company initially. But in the same way that our customers were looking to – for lack of a better term – outsource things to AWS, their customers were looking to outsource more in-house functions, in terms of managing products and services, and so ServerCentral added on managed service capabilities, so things like managed backup, managed storage, managed firewalls, managed private clusters, managed public cloud customers – all based on VMware at the time, and things of that nature. And so, you know, I was kind of funneling some customers his way, and they were running into the opposite problem: they had customers that had signed agreements and had maybe, you know, fifty cabinets, two hundred kilowatts of power, a bunch of bandwidth and all kinds of great stuff, but they were really starting to embark on their cloud journeys or cloud initiatives, and their CIOs and CTOs were saying, “Hey, well… How does Amazon fit in? 
We have an initiative.” ServerCentral didn’t have an answer, didn’t have a way to continue to work with some of those customers in a way that was meaningful. So, over lunch one day we were chatting and we’re like, “I think we have the same problem on opposite ends,” and at one level we might be viewed as competitors, but if we kind of bring this together, we can offer a much more interesting solution to our customers, where we’re not working against each other, but we’re working with each other to build solutions. And so, in late 2018 we decided to bring all that together, so we became ServerCentral Turing Group, or SCTG for short, so that we could work with some of those larger, more sophisticated customers that had use cases that necessitated both solutions on top of AWS, but also solutions within the datacenter and on top of more sort of classic platforms. But also ServerCentral liked that we had sort of this infrastructure as code, automation-first approach to everything that we did. We believed in everything being completely repeatable, you never did anything manually, servers and instances should be ephemeral, you shouldn’t treat them like your kids and nurture them. You should bring them up for a reason, use it and then dispose of it, and be able to recreate it on the fly. So, we started incorporating a lot of that thinking also into a broader approach to how we did everything as a company. So, yeah – a lot there.

Max: [05.55] A lot there actually, I’m laughing about your comment about it’s in the cloud, it’s obviously redundant and more of an outage, and I’m just thinking, flashing back to like every US East outage in the past like, four years, it’s taken down massive swathes of the internet. I mean, redundancy in cloud’s complicated too, right? You’re not just talking about, you know, having multiple instances running, now you start talking about AZ redundancy, region redundancy, data replication, how does your application route, what delivers where, where does it live… I mean, that’s a whole can of worms as well that you start opening up when you start looking at these sorts of things. 

Eric: [06.30] Yeah, for sure. You know, what the public cloud provides is a set of building blocks and solutions that are distributed across multiple zones and regions, but you really have to factor that in to your overall design and thinking to actually be able to take advantage of it, and you have to be pretty thoughtful about it. It’s really easy to think that, yes, I’m running two instances across two regions, I’m fully and highly available, and you missed the fact that maybe you’re using one service somewhere under the hood that is a common single point of failure. In some ways it actually creates even more complexity because it gives you that false sense of security, right? Yeah, no, we store all of our data in S3 and we’re connecting to S3 just fine and that S3 has, you know, seventeen nines of availability, or whatever it is. And then US East goes down and you can’t even look at Amazon’s status page because they forgot about that single point of failure. Yeah, so it takes some strategy and really careful thinking to design your applications correctly. And the other piece of that too is that a lot of enterprises aren’t building their own applications, they’re buying them, and they have limited options in terms of, you know, redesigning those apps to take full advantage of what the public cloud platforms can do, if you’re in a position to control the architecture.
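
The point about a hidden single point of failure can be put in numbers. The following is a minimal sketch, not anything from SCTG or AWS: the availability figures are hypothetical, and it assumes failures are independent, which real outages often are not.

```python
from math import prod


def parallel(*availabilities: float) -> float:
    """Redundant copies: the group is up if at least one copy is up."""
    return 1.0 - prod(1.0 - a for a in availabilities)


def series(*availabilities: float) -> float:
    """A dependency chain: the system is up only if every link is up."""
    return prod(availabilities)


# Hypothetical figures, chosen only for illustration.
instance = 0.99          # one instance in one region
shared_service = 0.999   # a service both regions quietly depend on

two_regions = parallel(instance, instance)
with_shared_dependency = series(two_regions, shared_service)

print(f"two regions alone:       {two_regions:.4%}")
print(f"behind a shared service: {with_shared_dependency:.4%}")
```

Under these assumptions the redundant pair looks better than either instance alone, but the whole system can never be more available than the shared service it sits behind.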

Max: [07.59] One of my favorite examples — I mean, you know, redundancy in cloud is… It requires a lot of diligence, it requires a lot of energy and continual diligence towards this, and Netflix was very famous for releasing their platform through this Chaos Monkey years ago, which it literally, randomly turns things off inside of their AWS environment. I mean, you want to talk about really taking this to an extreme of being resilient and redundant, we’re just going to randomly log in and select something running and just go, “boom,” and disable it, and it does a system recovery. I would imagine you’re probably not advocating for that for a lot of your customers at this point but that is a pretty… That’s a pretty amazing goal to have, of saying, at any moment, anything can be disabled randomly and that application is going to work.

Eric: [08.43] Yeah, you know, I won’t say that we don’t advocate for it, I would say that we’re thoughtful about when we advocate for it, in the sense of where we’re at in the lifecycle of the solution that we’re building. It doesn’t really matter actually whether it’s in AWS or something that we’re building in our data centers or on top of our infrastructure, you know? If it’s in early phases of testing and we really want to test a lot of the functions, we’ll bring in some folks that weren’t involved in the design, that won’t make the same assumptions, and play Chaos Monkey, and then it also depends on what our clients are asking for. In some cases, we have had clients want us to test production systems like that, and we’ll do our best in those situations. But the other interesting thing about it is… The creation, I think, of public cloud and the adoption of infrastructure as code, and automation at scale has introduced a new point of failure, which didn’t exist to this same scale, historically, which is the engineer. If you think back to the US East outage of S3 that we were just talking about, that wasn’t a technical failure. There was no design flaw in the architecture of S3, such that, you know, it caused that outage. The outage was caused by an engineer making a mistake on one of the commands that they were running for performing maintenance, and what automation and infrastructure as code and large scale control of a massive fleet of infrastructure does, is it puts a lot of power to do something in the hands of one person. I cannot overemphasize like, the need now for good operational practices, process and control, everywhere from change control to peer review, because you know, you can accidentally take out entire environments with a single command or a single typo in a configuration file, whereas before, maybe you had to log in to a fleet of servers or something like that.
But, now, you know, we have one customer we managed some IoT devices for, and there’s thousands and thousands of them out in the field, and one mistyped command and we can brick them all. Yeah, so I think that, you know, the advent of this approach to infrastructure has created new challenges and new problems.

Max: [11.08] So I’m going to use a bingo word, which is cloud transformation or digital transformation or customer journey, right? They got lumped in together at some point, and at one point Amazon and public cloud were new, and now we’ve got a lot of competition at different sizes, between Amazon, Google and Azure, and… A company is looking at these things, okay I’m maintaining data centers, is this efficient? Or, I’m in the cloud, is this efficient? Or I’ve decided to move to the cloud — and it’s still surprising how wide the scale of where people actually still are or those in this transformation are. You know, let’s say I came to you… And I was still running — let’s say I still had some on-premise equipment, maybe I had some IaaS with a VMware cloud somewhere, and we walked into your — well, we wouldn’t walk into your office anymore, but we talked to you and said, “Hey, we want to move this into public cloud,” you know, walk me through what that experience is like, you know, with SCTG and how you take an organization through that process? You talked about it earlier, it’s not just like, “Okay, let’s just replicate everything, and just turn it on over here,” I mean, that’s not a good end state. 

Eric: [12.19] Right, right. It’s interesting that you ask that, because, you know, we’ve evolved our methodology and our process over the years. When we started Turing Group, I was a technologist, I really enjoyed building things and we would work with our customers and almost immediately jump to solutioning. Okay, this is what you’re trying to do? Perfect, we need an API gateway over here, and you know, we’ll use EBS for that, and we can use SQS over here, and within ten minutes of starting our engagement with our customers, we’re like whiteboarding. And that was a lot of fun, it was great. We built some pretty cool stuff, but what we learned pretty quickly is, as we moved into more complicated organizations, that were, you know, decentralized, we walked into organizations maybe that had attempted migrations and failed, that had twenty two Amazon accounts that they just discovered accidentally the other day they didn’t know they had – things like that. We realized that we couldn’t jump to solutioning right away. We had to really engage with the customer, and ask them what problem they were trying to solve, and you’d be surprised, many times, there was not an answer at the tip of their tongue. And, it’s like… Well, we’re trying to do this, and we’d ask one more question and they’re like… No, well our CTO said we had to do it, or something like that, right? And so what we learned over the years was that we really had to take some time upfront, to understand where our customers were at with their business, what the drivers were, how they arrived with the technology stack that they have today, and really what problem they were ultimately trying to solve. The answer in some cases is, well, we’re trying to save cost. Okay, fine, that might be possible but that’s going to have an influence on the design. No, we’re actually trying to solve a capacity problem, you know? We need five hundred servers, but we only need them for a month and we don’t want to buy them.
Or, we’re trying to avoid having to build all this stuff in house, we really want to just focus on our core capabilities and not worry about managing rented servers or something like that. So, we generally start most of our conversations now with the: why, what are you trying to achieve? And once we get to a core understanding of that, that usually shifts onto an assessment of some sort, meaning, you know, let’s really get a good sense of what you have, and I know we’re talking a very public cloud focus at the moment, but in that sense like, a lot of times customers would try to do a one to one mapping, right? I’ve got twenty seven servers on-prem, that’s twenty seven instances in AWS or Azure. That’s not the case, right? Or it’s that, well no I have twenty seven servers and each one has eight cores and sixteen gig of memory, if you do any sort of rudimentary analysis, you’ll probably find they’re highly underutilized, and so doing a one to one mapping that way, you’re going to spend way more money than you should. And then also if they’re trying to solve for redundancy or scale problems, again, we’re running into that challenge of them having assumptions of what the public cloud offers. So, I think it’s a very consultative approach initially, and we kind of help our customers in the sense of just walking through them — walking them through the entire process, you know; setting expectations about what kind of cost they could expect, setting expectations of what they can expect from a DR and data recovery perspective, setting expectations in terms of governance… We’ve got so many customers that just want to let people loose in the console, and then they’re shocked when they get a bill for two hundred grand, you know?
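
The "rudimentary analysis" of the twenty seven server example can be sketched in a few lines. The server and core counts come from the conversation; the 15% peak utilization and the 30% headroom are hypothetical assumptions, and the function name is made up for illustration.

```python
import math


def cores_needed(servers: int, cores_per_server: int,
                 peak_utilization: float, headroom: float = 0.3) -> int:
    """Cores required to carry the observed peak load plus some headroom."""
    busy_cores = servers * cores_per_server * peak_utilization
    return math.ceil(busy_cores * (1 + headroom))


# From the example: 27 servers with 8 cores each.
provisioned = 27 * 8

# Hypothetical measurement: the fleet peaks at 15% CPU utilization.
needed = cores_needed(27, 8, peak_utilization=0.15)

print(f"provisioned cores: {provisioned}")
print(f"cores actually needed at peak (with headroom): {needed}")
```

With those assumed numbers, a one to one mapping would buy roughly five times the compute the workload uses at peak. A real assessment would also look at memory, I/O, and licensing, which this sketch ignores.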

Max: [16.01] I’ve… Everybody I think has that horror story at this point. It just comes down to how significant it was. I have one customer who, you know, they didn’t realize it and they spun something up to test it and made a configuration change. It was sixty-five thousand dollars after two and a half weeks, and the capacity of… I mean, it’s awesome that you can spin up that kind of resource instantaneously and have that capacity, but then you get into the whole governance, I mean you mentioned that beforehand – governance becomes very important as well, who is doing what, why, when, how, where… You know? What’s the fiscal impact for it? 

Eric: [16.36] Yeah, yeah. I mean, it goes on and on, like we had a financial services firm approach us saying they wanted to move their — they had Docker-ized all their applications and they wanted to move everything to either Azure or AWS but keep their database on-prem. And we did some analysis of the traffic going in and out of their database, which they hadn’t really thought about, they were just like, “Well, we’ve got two ten-gig connections and low latency, it should be fine,” until you factor in the fact that Amazon’s going to charge you two and a half cents a gig.

Max: [17.14] Yeah, on that Direct Connect, yeah.

Eric: [17.16] Yeah, exactly! So, once we did the math, it was like, well… You’re, you know, compute’s going to cost you two, three grand a month, and your bandwidth is going to cost you eighty five. Are you sure this is still where you want to go? So again, another reason to do kind of that thoughtful planning and analysis approach before just jumping in. So, in short Max, we’d ask you why are you trying to do this? 
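
The math in this exchange is simple enough to check. A minimal sketch, using the two and a half cents per gigabyte quoted above and assuming the "eighty five" means roughly $85,000 a month; the function names are made up for illustration, and actual AWS pricing varies by region and service.

```python
def monthly_egress_cost(gb_per_month: float, usd_per_gb: float) -> float:
    """Data transfer charge for one month at a flat per-GB rate."""
    return gb_per_month * usd_per_gb


def implied_volume_gb(monthly_cost_usd: float, usd_per_gb: float) -> float:
    """Work backwards from a bill to the traffic volume behind it."""
    return monthly_cost_usd / usd_per_gb


RATE_USD_PER_GB = 0.025      # "two and a half cents a gig"
BANDWIDTH_BILL_USD = 85_000  # assumption: "eighty five" means $85k per month

volume_gb = implied_volume_gb(BANDWIDTH_BILL_USD, RATE_USD_PER_GB)

# Average sustained throughput over a 30-day month, in gigabits per second.
seconds_per_month = 30 * 24 * 3600
sustained_gbps = volume_gb * 8 / seconds_per_month

print(f"implied traffic: {volume_gb / 1_000_000:.1f} PB per month")
print(f"average throughput: {sustained_gbps:.1f} Gbps")
```

Under those assumptions the bill corresponds to a few petabytes a month of database traffic, which is the kind of volume that "two ten-gig connections" can carry without anyone noticing until the invoice arrives.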

Max: [17.43] You know, the bandwidth is an interesting point. It’s usually overlooked in this conversation, until you start getting the bills and you start digging into — I mean, Amazon bills are not easy to read. There’s the entire… There’s an entire industry just to help people read and understand their Amazon bills, but you know, egress costs are significant and they build very fast, and this also has an implication now as we see — I feel like there was a rush to the cloud, and this idea of, “Okay, the cloud will be cheaper, the cloud will be easier, the cloud will be this or the next thing.” And then now that we have organizations that started in the cloud, you know, a decade ago, and are looking at it a decade later and realizing what they’re actually spending, where they’re spending it on, do they have a predictable workload or a cycle or things that are actually reoccurring, and now we’re having conversations about, should it be in the cloud, or should it come back off the cloud? And that’s complicated as well, but there are advantages to having… Well, okay, we’ll use the marketing term, hybrid environments, when you have some in the cloud and some… So, how do you guys navigate that and what’s — Is this something you’re driving to customers? Like, “Hey, we’ve noticed you guys are running a static workload effectively, and you should really think about doing something different.” Are they driving the conversation or does it start with cost, “Hey, we’re spending a ridiculous amount of money, what do we do here?” I mean, what’s…

Eric: [19.07] It goes both ways. So… I think we probably have to first divide that into two different categories: why someone is approaching us, maybe a prospect, and what that conversation looks like versus existing customers that we have today that might be on AWS and looking to go on-prem or vice versa, or looking to go hybrid. So, for existing customers, we try to engage and meet with them, you know, at kind of an executive and strategic level, on a quarterly basis, just to have some conversations with how things are going. We look at their utilization both in the cloud and on prem, but if we start seeing things that look kind of wonky or kind of interesting or if we start seeing bills where we know they don’t need to be spending that kind of money on something… You know, we have one social media customer that we’ve been working with that has been using S3 to store petabytes of information, but then they use S3 as their origin source, and we are looking at these bills going like, “This is insane!” Like, you’re paying so much in egress fees, that like there’s got to be a better way to do this, right? And so, we’re able to have that conversation with them and say, “Well, first thing, at a minimum, if you put CloudFront in front of it you can reduce your costs by fifty percent, because transfer from S3 to CloudFront is free, and you’re only paying CloudFront fees.” So, we’re able to kind of come to them and sort of give them even just that little tidbit of information. Then we also say, “Well, let’s do that immediately, because that’s almost not even a technical – there’s almost no technical impact, and you can start seeing smaller bills,” but you know, we run our own on-prem, petabyte scale object storage solution, and our fees are nowhere near Amazon’s, and we can give you the same price point.
So, unless you’re doing something that’s Amazon specific with S3, like triggering Lambda functions or something like that, you know, we have a solution that’s on-prem that will perform to the same level if not better, if you’re in our region, and we can save you significantly — a significant amount of money from what you’re spending on AWS. On the flip side, you know, if we see customers are constantly pinging us saying, “Please add another terabyte of storage to our managed cluster,” or, “I need a new server, I need a new server, I need a new server,” like if that’s happening all the time, and then they’re having conversations with us to decom stuff, you know, before their contract’s up, and we’re just seeing like… God, they don’t have that predictable workload, we should probably dig in and understand what’s driving that variability in their infrastructure needs and consumption. In those cases, we might recommend saying — you know, you’re better off working in an environment where you can throw instances away on a daily basis, you know? We’ll have those recommendations. Or another example might be – that’s not really related to performance – was the company that we’re working with that has all the IoT products out there… In one case, we could have gone on-prem, and developed an IoT solution from scratch, or we could have put them in AWS or Azure and used the IoT service offerings that both of them have. In one case, the time to market would probably be eighteen months, in the other case time to market would be three months. That was a big thing for them, and so being able to jump on a platform that exists that we know to work, and just get going right away, was well worth any, you know, increase in costs that they might see because we can do that.

Max: [22.45] And that’s a really good, important point, right? What you’re talking about really is actually understanding the trade offs and spending money for velocity. 

Eric: [22.53] That’s right.

Max: [22.54] And you know, businesses look at this from a standpoint of, “How long does it take us to get into the market, and what does it actually cost us?” Are you allocating resources in equipment, in hosting costs, in people, in whatever it is, right? Ultimately, you’re investing in velocity, and I think that’s a really good example of where the cloud is great, because you can get into managed services and you can experiment and you don’t have to commit to something, and you can increase your velocity, but it requires you to take the next step which is at some point come back and look at it and reevaluate decisions and say, “Is this the right thing for us still?” And I see that a lot with… I mean, I don’t know how many services AWS has offhand, it’s too many to count… I mean, well somebody counts it —

Eric: [23.36] It’s well over two hundred.

Max: [23.37] Yeah, it’s ridiculous, and every week there’s more, right?

Eric: [23.40] I know, it’s terrible!

Max: [23.41] The problem that comes out of it now is you get into situations like Elastic Cloud — ElastiCache, or Elasticsearch… You know, there’s a half a dozen, dozen different ways of running Elasticsearch, not counting the products that are API compatible with Elasticsearch that can run directly against S3, and like each one of these has really significant price differentials for you, in what these things cost. I won’t pick on Elasticsearch but it’s true for everything now in Amazon, right? Run it this way, it costs this much; run it this way, it costs this much, you run it this way, it costs that much… And it all gives you the same thing but at different scales, so… That makes it interesting, right? You have to constantly be looking at it and talking with customers and saying, “What are you doing? Why are you doing it? Let’s do this differently, you’re in the cloud, cloud has changed, great, something new has come out and now it’s time for you to change.” 

Eric: [24.27] Yeah, I mean I’ll tell you the one thing that we’re finding that’s — the one common denominator is companies are less and less interested in managing… I don’t want to say infrastructure, because I feel like that’s one layer below what I want to say, but they’re less interested in managing core infrastructure related application services, meaning… They want to use Elasticsearch, they don’t want to install it, they don’t want to set it up, they don’t want to figure out how to shard it, they don’t want to figure out how to tune it, they just want to use it. They want to install Redis, they want to install Aerospike, they want to install Cassandra, they want to use all these different kinds of core components that make up an application… Maybe it’s ActiveMQ or whatever it might be. And so, where we kind of fit in is that we understand where those things fit in in an application stack, what it takes to run them and operate them, and where we’re different than say AWS or Azure, is that when your use cases start to move into the perimeter to the edges of the normal use cases, and those providers are no longer good for you, then we can give you interesting solutions. So, you know, it’s… You’re only going to take Redis so far in AWS before either you hit a performance limit or you hit budget limits, and everything that AWS does is geared towards that, like… What is the most common use scenario and what are the most common performance characteristics? Once you start to get outside of those boundaries, then those providers no longer are a good fit.

Max: [26.10] And these numbers, I mean the differential here, it’s almost shocking when you see them the first time because it’s so high — I mean… Kinesis, you know? Running Kinesis versus running different versions of managed Kafka, versus running your own and managing your own Kafka instances on top of EC2… We did a project for a customer and I think the differential from Kinesis to them running their own instances was like 5x. It was 5x more expensive for them to run Kinesis than run their own infrastructure. You know, if that bill was ten thousand dollars a month, maybe it’s not worth it for them to devote engineering time, but they were spending a quarter million dollars a month on Kinesis, and all of a sudden that’s… That’s a real, significant number. I mean, you’re talking about a substantial amount of salaries within the organization just in that one decision, about you know, this or that…

Eric: [27.01] Right, and you have to maintain the engineers to run and operate that stuff, and the expertise and again, if you’re in that one particular — if you fall within that norm range, then public cloud’s probably just fine, but if you start stepping outside of that, it doesn’t make sense.
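
The trade-off in this exchange, a managed service that costs 5x versus engineers you have to keep on staff, reduces to a small break-even calculation. A minimal sketch: the 5x ratio and the two monthly bills come from the conversation, while the $40,000 monthly engineering cost is a hypothetical figure picked only to make the comparison concrete.

```python
def self_run_net_savings(managed_monthly: float, cost_ratio: float,
                         engineering_monthly: float) -> float:
    """Net monthly savings from running it yourself.

    A negative result means the managed service is the cheaper option
    once the people needed to operate the self-run stack are counted.
    """
    self_run_infra = managed_monthly / cost_ratio
    return managed_monthly - self_run_infra - engineering_monthly


COST_RATIO = 5.0              # managed service was "like 5x" self-run
ENGINEERING_MONTHLY = 40_000  # hypothetical cost of the staff to operate it

for managed_bill in (10_000, 250_000):
    net = self_run_net_savings(managed_bill, COST_RATIO, ENGINEERING_MONTHLY)
    verdict = "self-run wins" if net > 0 else "stay managed"
    print(f"${managed_bill:>7,}/month managed: net {net:>+10,.0f} -> {verdict}")
```

With these assumptions the ten-thousand-dollar bill is not worth the engineering time, while the quarter-million-dollar bill clearly is, which matches the reasoning in the conversation.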

MID-ROLL: [27.19] Hi I’m Max Clark and you’re listening to the Tech Deep Dive podcast. At Clarksys we believe tech should make your life better, searching Google is a waste of time, and the right vendor is often one you haven’t heard of before. With thousands of negotiated contracts, Clarksys has helped hundreds of businesses source and implement the right tech at the right price. If you’re looking for a new vendor and want to have peace of mind knowing you’ve made the right decision, visit us at Clarksys.com to schedule an intro call.

Max: [27.46] So now we’re in a world of hybrid cloud, right? Really, it’s this idea that you’ve got some sort of physical infrastructure somewhere and what does that look like? And then of course multi-cloud, people look at multi-cloud — I have some differing opinions of multi-cloud but I’m kind of curious to hear what your opinion on this is. But when you look at hybrid cloud and physical infrastructure, I mean, going back to your point of companies don’t want to manage it, and they’re used to a consumption model, it becomes like, “Hey, I just want to run containers, I want to run Kubernetes somewhere,” how is ServerCentral addressing that, and what has become your… You know, at some point, you know, there’s customization but at some — at the same token, right, there is a certain… Let’s say a generic approach for most people fitting into eighty percent of the same solution… And how do you approach that?

Eric: [28.30] So first of all, I wish more companies were further along the containerization route, it does help move the conversation forward about multi-cloud. I used to think that multi-cloud was kind of a distraction, I used to think that hybrid cloud solutions were a distraction, that if you wanted to really realize the power of AWS, you have to use AWS to its fullest extent. You have to leverage CloudFormation, you have to use their load balancers, you have to use their APIs, you just have to buy into the whole thing, and that’s when you really can take advantage of what they have to offer. And if you folded Azure into that mix, what you’ve done is reduced both providers to the least common denominator, and you’re no longer getting the true value that each of those providers can add, and the special thing that they have. As we are — as we sort of engage with more enterprises, the notion of keeping or staying on a single cloud is not tenable, and that’s for a number of reasons. One is the services that either – that all the different providers provide aren’t identical; they’re not easily replicated across both sides, right? And larger enterprises are concerned about outages, you know, we’ve seen massive outages over at Google, and GCP problems when their routers went down, and we’ve seen significant companies go offline for it, you know? One of our customers, I think broke their streak of like, having – I don’t remember – five or six years without a single outage, and because of Google they went down! So, that’s a legitimate concern – actually, EU West in AWS today had issues, and so people want to distribute their risk. Also, I think from the hybrid piece, we have customers approaching us that have significant, stable workloads – we’re working with a social media company that has a need to probably… Probably close to five hundred servers live, all the time, just to serve their base number of requests coming in at any given time.
They might have a celebrity on their platform post something, that celebrity will have… We’ll call it three million followers, and when she posts something, it generates a massive amount of traffic to update the feeds of all those millions of followers, and under normal circumstances, you know, you might go a month where you don’t need the compute, don’t need the infrastructure to support that, until that celebrity posts something and then all of a sudden things go crazy and they only go crazy for… You know, six hours. So, scenarios like that where… You know, I can’t back up an extra five hundred servers in less than a minute, get them racked and ready to go, hand it off to you… Nobody wants to invest that kind of capital and sit on it when they don’t need it. So, that’s where things like hybrid solutions come in, where I can spin up five hundred instances in AWS in a matter of minutes, and have your workload augmented and have that capacity available for you. To your point about containerization, none of that stuff is achievable really without containerization, and you have to really also be thinking about your application and designing it in what I think is now being termed a cloud-native way, meaning that it’s very, very independent of underlying infrastructure or proprietary APIs. And so, kind of what’s going on in the world of containerization right now is the only way to achieve that.

Max: [32.16] And we’re seeing big plays, I mean… So, we have Kubernetes from Google, SUSE has just bought Rancher Labs for several hundred million dollars, I mean there was a feeding frenzy for that one… There’s expectations of – I mean, we’re going to see Snowflake IPO here pretty soon, that’s expected to be a big one, but HashiCorp as well, a lot of people don’t know HashiCorp unless you’re an infrastructure nerd, right? And then all of a sudden you’re like, what’s HashiCorp, and they’re going to have their expectations pretty high as well. I mean, that’s interesting you talk about social media in this elastic capacity, from the sense that… This goes back years, to the dot coms in the late nineties when we had this idea that like, every page had to be dynamically generated, and the reality was, well no, every page didn’t have to be dynamically generated, there were smart ways you could use simple caching, even if you were just caching for sixty seconds, it made massive differences to your infrastructure requirements. And you go, oh well, it can be five minutes, oh well if it’s five minutes it’s massive differences in your infrastructure, and that’s really true also now with public cloud and hybrid cloud, and private resources of… You know, the cost differential is so massively different, you know? It’s hard to really quantify that for people, they don’t believe you. Like, you’re going to spend 4x what you’re going to spend… AWS would cost you 4x what it would cost you to have a private environment and – no, that can’t be true – no, it’s really true! Amazon is going to cost you 4x, whatever… I have an environment that’s going to cost you, but what does your private environment need to look like and how do you maintain that, how do you size it, and now people don’t want to maintain data centers anymore, so how do you maintain that infrastructure and who does those sorts of things, that’s turned into a different conversation.
And now, managed service providers are filling that void again and coming back in, it’s a very interesting dynamic where, you know, Amazon was killing all of the MSPs offering managed services related to datacenters, but now all these data center companies are really back in the front again, because companies have realized like, wait – we have to go back! 
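[Editor's note: Max's point about caching a dynamically generated page for sixty seconds can be sketched in a few lines. This is a minimal illustration only, not anything either speaker built; the `TTLCache` class, its `renders` counter, and the injectable `clock` parameter are hypothetical names introduced here so the behavior can be checked without waiting a real minute.]

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Hypothetical helper: reuse a rendered page until it is ttl_seconds old."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so tests can control time
        self._entries: Dict[str, Tuple[float, str]] = {}
        self.renders = 0  # how many times the expensive render actually ran

    def get(self, key: str, render: Callable[[], str]) -> str:
        now = self.clock()
        entry = self._entries.get(key)
        # Serve the stored copy while it is younger than the TTL.
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]
        # Otherwise regenerate the page and remember when we did so.
        value = render()
        self.renders += 1
        self._entries[key] = (now, value)
        return value
```

[With a sixty-second TTL, a page requested thousands of times a minute is rendered roughly once a minute per key, which is the effect on backend load that Max describes.]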

Eric: [34.10] Yeah, I don’t think we’ve seen too many folks moving back unless they’re a really large player or they’re doing a lot of traffic. Those are the things that are really pushing customers back. What we’re seeing is there’s still a lot of mid-market – just the corporate world in general is still very much about the public cloud and not ready to get out, right? Companies that are running just regular servers, installing their commercial off-the-shelf applications, running databases, running analytics workloads, we’ll call them not-internet companies, right?

Max: [34.50] Yeah!

Eric: [34.51] You know, they get a lot of flexibility. What they can do is they can really empower their employees, in the sense of well, now I don’t have to call IT, I don’t have to get budget approval for servers, I don’t have to wait four weeks for them to arrive, I don’t have to wait for people to rack them and stack them, I can just get started on my project. And there’s a lot of value in that, and corporations are still taking advantage of it. So, I think that initiative is still happening. We have customers that are in the datacenter that are actively in the process of migrating to AWS and Azure, and the great part is we’re part of the conversation and we’re helping them. Where we’re seeing that push back, that move back from public cloud back into the datacenter is in the larger internet companies, anyone that’s pushing a lot of media, you know, video – that gets expensive, storage is expensive, and the transit stuff. I mean, that’s part of the reason why we built our own managed object storage solution. We’re hosting petabytes of data on it, because the companies that really need to use object storage to that degree, the public cloud providers just don’t scale in terms of cost.

Max: [35.57] I have a question here for you… Late nineties, if you were an internet company, there was a template from a VC firm, and the template said, in order for you to be a serious company, you had to run Oracle, you had to run WebLogic, you were running a Netscape web server, and you were running all of these things on Sun servers, and that was the… If you wanted to be a real internet company, you had to have this template. And it wasn’t until there were some companies that started really breaking that mould… You know, we say eToys, eToys had mod_perl with its custom caching layer, and NetApp appliances, and it started releasing information around what their traffic profile looked like and what their relative infrastructure requirement was and oh, by the way, it’s all open source, it doesn’t cost us anything to run this thing, and then it became, oh well, if they can run it, maybe we can run it now as well, and the mould kind of adjusted a little bit and became a little bit more adaptable and other companies could follow suit. I feel like that was the case as well with this conversation of cloud versus hybrid versus multi versus whatever, it was… Well, we have this mandate, we have to go to the cloud, there’s no, we have to be there because it’s where we have to be… And it feels like that’s loosened up a little bit, where it’s acceptable if you’re not on AWS and if you’re on Google or if you’re on Azure, and it’s acceptable if you’re on two at the same time, and it’s acceptable if you still have physical resources somewhere, you know… That has a strategic advantage for you.

Eric: [37.23] Sure… I definitely think it is acceptable. There’s probably areas of that though that are not. I think if I was looking at a company and they’re still running their own email platform, I’d go… Why are you doing that?

Max: [37.38] Unless they’re an email business, right? I mean…

Eric: [37.41] Unless they’re an email business, that’s true. But if they’re, you know, a few-hundred-person company and, you know, call it a hundred million dollars of revenue, I’d be like, why are you running Exchange servers?

Max: [37.53] As somebody who started out as an Exchange admin with an Exchange cert, I firmly support you on that, nobody should be running their mail servers at all.

Eric: [38.02] I think there’s still some areas within the corporate IT stack that I would look at and go, like why are you doing that, there’s way better solutions to it, and maybe it’s not necessarily public cloud but maybe a SaaS provider of some sort. I think also I would probably look at a stack and not necessarily think about it – you know, if I was a VC and I’m coming in to do my technology due diligence, I wouldn’t necessarily look at their solution in terms of… Is it on Azure or is it in AWS, or on-prem… I would probably be more focused on the operations aspects of it, and what choices did you make in terms of deployment, how do you do your deployments, what open source projects have you picked, what is at the core, how is it architected, is it containerized or not, how manageable is it, how scalable is it, and how much of a, you know, versus a Frankenstein solution, is probably where I would start digging. How supportable is it, how manageable is it and how scalable is it? Not necessarily what platform it is on.

Max: [39.12] The stats on Amazon – sorry, on Microsoft Azure, it’s like fifty percent, sixty percent of their compute is running Linux and not a Microsoft workload. That’s a pretty amazing stat, who would have thought?

Eric: [39.25] I’m a… I won’t lie, I went from, you know, Microsoft across the board, to I’m a huge fan of Satya and everything that he’s doing – he’s a very smart man.

Max: [39.35] One hundred percent. 

Eric: [39.36] Yeah, it’s… You know, they’re talking about adopting Rust as their next programming language, .NET Core running on Linux… We just helped a customer build – we wrote the backend API for a customer-facing portal, and the customer-facing portal is all written in .NET, containerized – which is great.

Max: [40.00] Five years ago you’d never imagine having that conversation, right?

Eric: [40.02] No, like five years ago I would have been like, oh, it’s .NET, we’re staying away.

Max: [40.08] So that’s actually an interesting thing that we haven’t talked about at all with this, because you know, SCTG is more than just… You’re not giving somebody advice around managed services or SRE functions or these kinds of DevOps roles, of hey, we’re going to help you figure out how to run this in Amazon or how to make sure Amazon’s running right, you’re actually writing code for people and helping people build platforms or modify their existing systems or figure out, hey, I’ve got this application that was running on premise, and now you want to move it, and you want to make it cloudy, and this is what we have to do with it.

Eric: [40.41] That’s right. 

Max: [40.42] Let’s talk about that a little bit, because I mean, I don’t want to gloss over this, this is a really big deal. 

Eric: [40.48] Yeah. You know, that’s a little bit of our – the Turing Group roots that come through, you know? If you remember when we talked about when we started that company, that we were effectively a group of consultants, right? We were technologists that had – that were really smart about how to design and build distributed solutions and do that within the public cloud, so… When I talk about how we – we first just jumped in and started talking about solutions without understanding the why, that trajectory led us to this place where the nature of our conversations with all of our customers changed. We weren’t talking about gigabytes and you know, how many terabytes of disk you need, we were talking about what problems are you trying to solve, right? Defining those problems and understanding them in a business context. And so, our conversations now with many of our customers are exactly that, and then when they sort of learn that, hey, we have an understanding around this stuff, they ask for help in building it, and help in designing it, and help in architecting it. And so, in many cases, we get involved either as the development team, as part of a development team, or as advisors to a development team. We also sort of act as advisors in many other different ways, so… You know, we had a university approach us recently and they had no idea how to pursue building, you know, a campus-wide fibre network, but we have a massive fibre backbone and are experts in networking, like why can’t we leverage that and help them achieve their goals, right? So, we’re sort of unique and special in that sort of way in that we can act as this technology solutions company advisors, but we also have our own infrastructure and capabilities to support those solutions. So, it’s different from like a traditional consulting company in that sense, and it’s also different from a managed services company in that sense. Sure, can I give you a sales order for a petabyte of object storage?
I can, but I want to make sure you’re going to succeed with it first.

Max: [42.55] So, since you went there, let’s talk about this. Rust, node.js, Ruby on Rails, .NET, PHP, ColdFusion, let’s put that out there as well, Python, Java… I mean, that’s okay, right? These are all still the different — I mean, popular languages out there, that’s a lot of skills to have. Is this stuff that you guys are dictating, saying hey, we’ve got this project and this should be in Rust, and this should be in node.js, and this should be in this, or are you finding that customers are saying, hey, we’re hearing the hot thing right now is — we want to run GraphQL with Vue, can you help us with that? 

Eric: [43.36] Right, right. We will comment on language if a customer has not yet decided in one particular language. You know, I think we’re not dogmatic in terms of tooling, I would say. I would say we’re dogmatic on architecture and philosophy and design choices. So, in the sense that, you know, could you make this work in node.js, you could. Could it be done in Go? It certainly could. Could it be done in C#? Yes. Are some of those better than others? Yes, they are. So, I think we’re very much dependent on the right tool for the job, but I think we take other aspects under consideration. One is, are we developing it, or is your team developing it? And what does it mean for your business to have to, I don’t know, let’s take Perl for example, to support and run a Perl application? Is Perl a bad language? You know, we can debate that all day long. Can you hire Perl developers? That’s not as debatable. 

Max: [44.39] And that’s not even, could you find people that know Perl, that could you find people that know Perl that are going to come and work for you, building Perl at this point? And I still use Perl almost every day, I mean, I just — old habits die hard with certain things, and it’s an easy language to do certain things in and I find myself there all the time. 

Eric: [44.56] Some languages are unavoidable in certain areas, so if we’re talking about probably the land of infrastructure automation, you’re either talking Python or Ruby, and that’s it, right? You’re not doing infrastructure automation in C#, right? And if you’re doing it in, you know, PHP, God help you. So, I think in certain contexts, there’s things like that – there’s also technical considerations, right, like if you are thinking like, well I really like some of the virtues of Elixir but we want to use Lambda, well there’s no Elixir runtime, so… You don’t have a choice, right? So I think, in all those areas, including you know, what kind of pub/sub or messaging bus should you use, we’ll have opinions, and we’ll help navigate those things, and we’re also happy to engage in research and leverage our connections to make sure that we’re making the right recommendations.

Max: [45.57] Environments went from physical servers to virtualization to public cloud, right, instances on public cloud, containerization now, of course. I mean, if you’re not containerized, you should get containerized as quickly as possible, disclaimer. And now there’s a pretty big push into, you know, serverless functions and function execution and whether that’s in the form of a Lambda, or these different CDNs that are offering code runtimes on the edge… And what I’ve kind of wondered about with this for a while, is not so much the… Is this important and is this going to have a big change with how people architect their applications, but with containerization, you have a certain amount of portability that’s very easy, and how much portability do you have with a serverless function. If you’re making the decision to go down this path with AWS with Lambda, you know… Even with all the efficiencies and neat things that you get out of it and things you don’t have to think about, right, because you’re not — even with Docker, now you’re talking about you have to have some sort of CI/CD pipeline that’s building an application, creating a container, pushing — I mean, you go serverless, I mean really it’s like you have nothing. There’s nothing you’re thinking about – is that in itself risky in terms of vendor lock-in and cloud lock-in and portability. 

Eric: [47.12] There is risk, but I think it’s different, and I have two points to bring up on that. The first point is, one of our customers, we actually built a fairly large, serverless application for them in AWS, on top of node.js as the core backend. So, it was API Gateway, it was Lambdas, it was some SNS, Dynamo, a handful of other things. And that customer was courting a prospect in the retail industry, and that prospect said, “We need in our contracts that you won’t use any platforms on top of Amazon, because we don’t like Jeff Bezos and we don’t want any money at all going towards him, so if you want to work with us, you can’t use AWS,” basically is what it came down to. You know, I felt really bad for our customer, because their whole thing was built on top of Lambda, which feels pretty proprietary, and they asked us – they paid us to do some analysis of what it would take to port that to Azure Functions. We were like, these guys are nuts, no way! Like, that’s going to be a complete and total re-write and for one customer, and like… You know, we did all our hand waving and then eventually said, okay, fine, we’ll go do the analysis. And the result was surprising. While there was refactoring needed, and there wasn’t pure API compatibility between various services, the code that needed the least amount of refactoring was the Lambda code, because ninety percent of the code within the function was straight up node.js, straight up JavaScript, nothing proprietary; you could actually copy it and run it locally with the node.js command line.
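[Editor's note: the portability Eric describes usually comes from keeping the business logic free of provider-specific calls and confining the cloud-facing part to a thin wrapper. The sketch below illustrates that split under stated assumptions. The application in the conversation was node.js; Python is used here only for illustration. `summarize_order` and the order fields are hypothetical, and the handler assumes an HTTP-proxy-style event whose `body` field is a JSON string.]

```python
import json
from typing import Any, Dict


def summarize_order(order: Dict[str, Any]) -> Dict[str, Any]:
    """Provider-neutral business logic: no cloud SDK imports, runs anywhere."""
    total = sum(item["price"] * item["quantity"] for item in order["items"])
    return {"order_id": order["id"], "total": total}


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Thin adapter: unpack the request body, call the core, wrap the reply.

    Only this function would need rewriting for another provider's
    function runtime; summarize_order would move over unchanged.
    """
    order = json.loads(event["body"])
    return {"statusCode": 200, "body": json.dumps(summarize_order(order))}
```

[Because `summarize_order` takes and returns plain dictionaries, it can be called from a local script or a unit test with no cloud account involved, which matches Eric's observation that most of the function code could be copied and run locally.]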

Max: [49.09] Yeah, I was going to say, so you’re talking about the code, the issue code being like DynamoDB code, versus —

Eric: [49.14] More of like, okay, well you know what, the pieces that talk to Dynamo need to get modified because Azure doesn’t have a Dynamo, or the pieces that talk SNS. In fact actually, those were easy, because you abstract most of that through abstraction layers, but a more challenging layer was authentication and IAM, because those two things vary significantly between the providers. So that was probably the most challenging piece, and then also we were using – in that case – the AWS IoT stack, we had to move to Azure IoT, so it was a little bit of work there, but we – the moral of the story is, we walked away thinking at the beginning, yeah, this is a total rewrite, to, I think we can do this in about twenty percent of the total time invested, and I think we used about eighty percent of the original code. So, it wasn’t – was it great? No. Was it the end of the world? Also no. It is possible. I think that’s one part of your question about vendor lock-in and risk and things like that, but I think the second piece that I like to think about and also why we exist, is, you know, you talked about we went from bare metal to virtualization, now everyone’s running on VMware, to the cloud and now containers and functions and all this other stuff. What the outcome of all of that has been, has been a complete and total explosion of complexity, because we have abstraction layer on top of abstraction layer on top of abstraction layer, and the developer that’s writing Lambda code doesn’t have any idea about how it’s actually running, and how it’s talking to the network and how it’s communicating, right? There’s so many levels of abstraction. And also, so many tools. If we think about Kubernetes, from you know, ingress utilities and processing, there’s… Kubernetes is an orchestration of, we’ll call it thirty different processes that all have to work together just right, right?
From different companies and different sources and what we’re seeing is that our – companies don’t want to understand that complexity, and they want to outsource it, they want to outsource it to companies like us, they don’t want to figure it out, they just want it to work, they just want to write their code, they want to hit deploy, and just make it all happen, right? And so, it’s just – and the level of expertise you have to keep in house, you have to have people that know all these different things and know them really well, and are nimble and are ready to evolve, like… If you watch the, you know, number of products on the – I think the Cloud Native Computing Foundation keeps a great chart of all the different technologies involved in developing cloud native applications. There’s two hundred things on there, right?

Max: [52.04] And so, I mean – okay, so there’s some strong personalities on the internet that are advocating for monolithic applications again, like this return to the majestic monolith. And you know, at the same time, you see people posting their services oriented architecture, what their actual application flow is, and we’ll use Lambda, right? We want functions, and there’s a flow chart, and there’s, you know, fifty, sixty things on this flow chart in order to do something. And, I mean, that comes back to your point, it’s interesting when you look at the complexity and the interactions and how do you maintain these applications and what are you actually doing with it, and that comes back – for me – into the… I mean, as you said, what are you trying to do and why are you trying to do it? If you’re just trying to send an email out, is this the right way for you to send an email out? This might not be the smartest plan, here.

Eric: [52.58] You know, I think Max… Again, I talk about this notion of being dogmatic about approaches in technology, and I think we always need to keep an open mind and understand what problem it is that we’re going after, and then choose the right tools, technologies and approach to get the job done. Secondly, I think complexity is like energy – it doesn’t ever go away, it just transforms, right? You can never destroy energy, we just move it from one state and one way of being to a different state or way of being, and so, maybe we had monolithic applications that were two point eight million lines, you know, good God, good luck trying to find where the bug is in that code, to, you know what, we don’t have any single app that’s more than a hundred and eighty lines of code today, good luck untangling the infrastructure and the transaction path between your functions to figure out where the problem happened. Now you’ve got to learn this highly complicated, distributed transaction debugging tool. What we’ve done is move the complexity from here, you know, to there. You know, so it’s… It’s – even in the Exchange example, we’ve moved the complexity from our Exchange admin sitting in our office and running the cluster in our datacenter to Microsoft’s.

Max: [54.13] Yeah.

Eric: [54.14] It’s a lot simpler for me today, but someone’s absorbing the complexity.

Max: [54.17] What’s really impressive about this for me, and it never gets old, I’m always in awe of this, is you went from the internet being very ancient and coming on the internet with modems and this evolution and SDSL coming out and people having broadband, and now phones with really good connectivity anywhere, and this explosion of devices and data… People used to call it web scale, and really talking about planet-wide civilization scale applications available at your fingertips, and that the ability for velocity, this idea of velocity, of very quickly being able to develop and roll things out, and produce things that can be consumed planet wide, is never… I mean, it never ceases to amaze me, I look at this with wonderment every time I think about it. But then I take a step back, I say great, so now you’ve built this thing, now you have to make sure it’s running, you have to keep it online, make sure you don’t bankrupt yourself in the process, you know, whoops – we had an S3 bucket directly connected to the internet and somebody’s downloaded all our medical transcription files and we just leaked all these patient records because, whoops, we didn’t configure. And that’s the same thing, where it’s – you talk about energy and complexity being shifted, so now it’s becoming very easy for an enterprise to use these tools, but it creates other complexity where they don’t have expertise per se, and how do you keep track of which of the hundred Amazon services is the right one for your particular application and oh, by the way, did you secure it? Do you have a cost governance model around it? Are you tagging it properly? Can you figure out what’s running, who turned it on, what it should be called, should it be turned off?
Like, you know, and I love all these tools that are out there that do nothing other than just scan your AWS infrastructure and say, hey, by the way, you’ve got all these EBS volumes that aren’t connected to anything that you’re being charged twenty-five thousand dollars a month for.
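[Editor's note: the kind of scan Max mentions can be sketched as a filter over volume records. This is a rough illustration, not any particular tool. It assumes records shaped like the entries EC2 returns when describing volumes (`VolumeId`, `Size` in GiB, `State`, `Attachments`), where an unattached volume reports the state `available`. The function names and the per-GiB price passed in are hypothetical; real pricing varies by volume type and region.]

```python
from typing import Any, Dict, List


def find_unattached_volumes(volumes: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return volume records that are not attached to any instance."""
    return [
        v for v in volumes
        if v.get("State") == "available" and not v.get("Attachments")
    ]


def estimated_monthly_cost(volumes: List[Dict[str, Any]], price_per_gib_month: float) -> float:
    """Rough storage-only estimate; the price is an assumption supplied by the caller."""
    return sum(v["Size"] for v in volumes) * price_per_gib_month


if __name__ == "__main__":
    # Sample data for illustration only; a real scan would fetch these from the provider's API.
    sample = [
        {"VolumeId": "vol-a", "Size": 100, "State": "available", "Attachments": []},
        {"VolumeId": "vol-b", "Size": 200, "State": "in-use",
         "Attachments": [{"InstanceId": "i-1"}]},
    ]
    orphans = find_unattached_volumes(sample)
    print([v["VolumeId"] for v in orphans], estimated_monthly_cost(orphans, 0.10))
```

[Running a check like this on a schedule, rather than as an occasional manual review, is one small instance of the continuous approach to governance that comes up later in the conversation.]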

Eric: [56.04] We just came full circle, Max. When we started the conversation, we talked about the idea that there’s a new kind of risk called the engineer, and the likelihood that they’re going to make a mistake is high because we have all this new complexity, so we can have a bucket that’s exposed or we can have, you know, run a single command that affects a massive amount of infrastructure and makes the wrong change, right? And then the second piece – so there’s all this complexity and there’s all this room for making mistakes, and when we go back to the conversation about when a VC is looking at your firm, should they judge you based on what platform you’re using or what technology you’re using… No, it’s about what are your operational practices, what do you have in place that reduces the chances that you will make mistakes, that reduces the chances that you’re going to fat finger something, that is a one-off change, that’s not a documented change, that there was a change control review process put in place, that there’s regular things that go on that ensure our infrastructure meets best practices, and it can’t be a manual human audit, it has to be continuous compliance. That’s the only way you can sort of manage that risk.

Max: [57.15] Well, Eric – I want to thank you very much for joining. We could spend a lot more time together, next time when we’re on I think we’re going to have to debate different orchestration platforms, maybe I’ll have you pick Chef and I’ll be Ansible, or we’ll do like Kubernetes versus Rancher, or we’ll do something fun.

Eric: [57.34] Well, that’s not going to end well. 

Max: [57.36] No, no, no, it’s not designed to end well, right? But Eric, thank you again. It’s a pleasure.

Eric: [57.42] Yeah, you’re welcome, any time – thank you Max.

OUTRO: [57.48] Thanks for joining the Tech Deep Dive podcast. At Clarksys we believe tech should make your life better, searching Google is a waste of time, and the right vendor is often one you haven’t heard of before. We can help you buy the right tech for your business, visit us at Clarksys.com to schedule an intro call.