

Learn more about SLAs at the next Content Delivery Summit.

Read the complete transcript of this clip:

Steve Strong: How do you start hitting some of these numbers? What approaches can you take? You can kind of ignore the "99% and below"--most people in this room probably aren't interested in those numbers. Frankly, they're easy. You've got so much time to respond to stuff that you can just have a single box sat in your cupboard doing the job.

Adrian Roe: I met an encoding vendor a year or so back whose name should definitely remain nameless, and I said to him, "What happens when an encoder fails? What do you do?" He said, "Well, what we do is we send an engineer into the room and they take the SDI cable out of that one and they put it into the replacement and they turn it back on again." They called it "orchestration by human being." Kubernetes it ain't.

Steve Strong: But it works, and if you can cope with that, it's a really good way to do it. At 99.9, your time starts getting more limited. Knowing what period it's measured over starts becoming very interesting. If you're measured per day, frankly it's still pretty easy though at 1.4 minutes. If your SLA is over the course of a month--and assuming you're not going wrong every single day--you've got 10 minutes to spare. So, 99.9, if you're very organized and you plan what you're going to do, you can probably still get away with a fairly simplistic approach to delivering that sort of number. It's not that hard. Once you hit 99.99, it starts becoming out of the realms of human control. You can't have orchestration by human beings at that point and there's simply not time to respond.
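The per-day figures quoted in the talk follow from simple arithmetic: the allowed downtime is the unavailable fraction multiplied by the length of the measurement window. The sketch below is illustrative only; the function name `downtime_budget_seconds` and the constant `DAY_SECONDS` are assumptions, not something from the talk.

```python
DAY_SECONDS = 24 * 60 * 60  # length of a one-day measurement window


def downtime_budget_seconds(availability_percent: float, window_seconds: float) -> float:
    """Return the seconds of downtime allowed inside one measurement window."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * window_seconds


# Print the daily budget for each availability level discussed in the talk.
for target in (99.0, 99.9, 99.99, 99.999):
    budget = downtime_budget_seconds(target, DAY_SECONDS)
    print(f"{target}% measured per day: {budget:.3f} s ({budget / 60:.2f} min)")
```

Measured per day, 99.9% works out to about 86.4 seconds (the "1.4 minutes" above) and 99.999% to about 0.864 seconds. Passing a longer `window_seconds` shows how strongly the measurement period changes the budget.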

You've got to have multiple systems live the whole time. You've got to have things active, ready to take over when stuff goes wrong. At 99.99, you might get away with something like an n+1 model where you've got a bunch of live systems and a couple of hot spares that are sat there running. If one of the live ones goes wrong, you re-commission one of the hot spares and you're up and running quickly enough. But you're certainly in a world where you're going to have to have some form of distributed system. You've got multiple computers, you've got multiple things ready to go. The ability to deploy dynamically, the ability to spot the errors automatically and recover from them automatically.
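The n+1 idea described here (live systems plus running hot spares, with automatic detection and recovery) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the speakers' implementation: the class `EncoderPool`, heartbeat-based failure detection, and all node and job names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class EncoderPool:
    """Hypothetical n+1 pool: each live node runs one job, spares run idle."""
    jobs: dict[str, str] = field(default_factory=dict)              # job -> node
    hot_spares: list[str] = field(default_factory=list)             # already running
    last_heartbeat: dict[str, float] = field(default_factory=dict)  # node -> timestamp

    def failed_nodes(self, now: float, timeout: float) -> list[str]:
        """Nodes carrying a job whose last heartbeat is older than `timeout` seconds."""
        return [
            node for node in self.jobs.values()
            if now - self.last_heartbeat.get(node, float("-inf")) > timeout
        ]

    def recover(self, now: float, timeout: float) -> list[str]:
        """Move jobs off failed nodes onto hot spares; return jobs left unplaced."""
        failed = set(self.failed_nodes(now, timeout))
        unplaced = []
        for job, node in self.jobs.items():
            if node not in failed:
                continue
            if self.hot_spares:
                # Re-commission a spare that is already up and running.
                self.jobs[job] = self.hot_spares.pop(0)
            else:
                unplaced.append(job)
        return unplaced


pool = EncoderPool(
    jobs={"channel-1": "node-a", "channel-2": "node-b"},
    hot_spares=["spare-1"],
    last_heartbeat={"node-a": 100.0, "node-b": 95.0, "spare-1": 100.0},
)
unplaced = pool.recover(now=100.5, timeout=1.0)
print(pool.jobs, unplaced)
```

In this example `node-b` has been silent for more than the timeout, so `channel-2` moves to `spare-1` with no person involved. A real system would also need to replenish the spare pool and guard against false positives, which this sketch leaves out.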

Adrian Roe: You get some interesting cultural challenges there. We had one customer, we were deploying an n+1 solution for them for that sort of level of SLA, and they said, "But when we're showing customers around our data center, how do we tell them which is their encoder?" And we said, "Well, you don't because there isn't a 'their encoder.'" Because as soon as there's a "their encoder," when that one dies, what do you do? It's going to have to move somewhere else and in order to have it move a running job--move from one server that's just died to another server and have that happen in the course of a fraction of a second or maybe a second or two--ain't no human being involved in that. At which point the whole notion of this piece of infrastructure is delivering this part of my server just doesn't make sense anymore. And, I kid you not, what we ended up having to do for them was we came up with this concept of "preferred encoder" that, if things were all normal and we had lots and lots of green lights everywhere, then this would be the BBC's encoder and you could show them 'round your data center and say, "That's the one." They had a label machine and everything.
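One way to read the "preferred encoder" idea is as a placement rule: use the customer's labelled machine when everything is green, and fall back to any other healthy machine otherwise. The sketch below is a guess at that rule, not the actual id3as scheduler; `choose_encoder` and the node names are hypothetical.

```python
from typing import Optional


def choose_encoder(preferred: str, healthy: list[str], busy: set[str]) -> Optional[str]:
    """Pick the preferred node if it is healthy and free, else any healthy free node."""
    free = [node for node in healthy if node not in busy]
    if preferred in free:
        return preferred
    # The preferred node is down or occupied: the job runs elsewhere.
    return free[0] if free else None


# All green lights: the job lands on the labelled machine.
print(choose_encoder("encoder-7", ["encoder-3", "encoder-7"], busy=set()))
# Preferred machine has died: the job lands on whatever else is healthy.
print(choose_encoder("encoder-7", ["encoder-3"], busy=set()))
```

The preference is only a hint. Availability comes from the fallback branch, which is why the label on the box cannot be a guarantee.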

Steve Strong: We suggested they just stick the labels on it. They actually really wanted it to be real. At 99.999, you've got to have multiple live systems. You've got to have A/B systems running the whole time. You really have no choice. It's almost impossible. Again, depending on where your SLA is measured. But if you shoot me a download, that's on the daily level, the reality is you've got 0.86 seconds to respond. And that's not just to respond; that's to detect and respond. So you've got to spot that something's gone wrong, fix it, and have the service back up and running before your 0.864 seconds are up. That's very little time to do anything, particularly on wide area networks and so on. You've got ping times measured in hundreds of milliseconds. You've only got 860 milliseconds to do it. Well, that's not long. So you really do need A/B systems. You've got to have multiple systems live delivering the service the whole time.
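With A/B systems both paths produce the output all the time, so a failure on one path needs no detect-and-repair step inside the 0.864-second budget; the consumer simply takes each piece from whichever path has it. The sketch below illustrates that under assumed details: segments keyed by sequence number, and the function names `pick_segment` and `assemble`, are hypothetical.

```python
from typing import Optional


def pick_segment(seq: int, path_a: dict[int, bytes], path_b: dict[int, bytes]) -> Optional[bytes]:
    """Return segment `seq` from whichever live path has it, trying A first."""
    if seq in path_a:
        return path_a[seq]
    return path_b.get(seq)


def assemble(first: int, last: int,
             path_a: dict[int, bytes], path_b: dict[int, bytes]) -> list[Optional[bytes]]:
    """Build the output for sequence numbers first..last from both live paths."""
    return [pick_segment(seq, path_a, path_b) for seq in range(first, last + 1)]


# Path A drops segment 3; path B was live the whole time and fills the gap.
path_a = {1: b"a1", 2: b"a2", 4: b"a4"}
path_b = {1: b"b1", 2: b"b2", 3: b"b3", 4: b"b4"}
print(assemble(1, 4, path_a, path_b))
```

A `None` entry would only appear if both paths lost the same segment, which is the case A/B designs try to make unlikely by keeping the two paths independent.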


What Does 'High Availability' Mean in Live Streaming?

id3as Directors Steve Strong and Adrian Roe explain that high-availability content delivery isn't so much about meeting SLAs with overall uptime as not having your streams go down at critical moments.
