Category Archives for "testRTC tips"

How do WebRTC Media Servers Behave on Packet Loss?

Differently from each other.

Whenever I see people comparing WebRTC media servers, they tend to focus on scale:

– How many sessions can you cram in parallel?

– How many streams can you serve from a single machine?

– How much bitrate can you pump out?

All of these are very important questions – they end up in your sizing calculation, which then goes into the pricing model for your service. Oh, and we did cover this a bit here when talking about handling WebRTC browsers synchronization at scale.

Now that our new version is taking shape (still in staging, so if you want access – ping us), it is time to play a bit with a few new toys we’ve added for our beloved community of sadists (you may know them as test engineers, but the good ones are sadists – they like inflicting pain upon digital products and services).

What I am talking about here is a combination of two script commands we have:

  1. rtcEvent() – place a vertical event in the graphs
  2. rtcSetNetworkProfile() – change network profiles in runtime

You’ll see how it looks in a second.

What Does Packet Loss Do?

Packet loss is bad.

You don’t control it. And it can happen at any time. Come and go as it pleases.

The moment you have packet loss, there will be some degradation in the quality of the media. Lost packets mean lost data, which means something can’t be played back. It might be minor. It might be important.

Next thing that happens? WebRTC (or most other VoIP products for that matter) will start lowering bitrates. Why? Because it assumes there’s congestion on the network, and it is trying to play nice with everyone.

But what happens once that packet loss is gone? Do things go back to normal? And if they do, then how fast will that happen?

My Experiment

I decided to devise a simple enough experiment to get some answers here. I chose the following steps:

  1. Connect to a service
  2. Run for a full minute
  3. Set packet loss to 10% for a full minute
  4. Go back to normal – no packet loss
  5. Wait two minutes

That’s it. What I am interested in is less what happens during the second minute and more what happens in the last two minutes, and how that differs from what we have in the first minute of the session.

In general, I decided to place 5 users in the same session, to get that media server working a bit. And I also decided to focus on the SFU kind.

The services I tinkered with are:

  1. AppRTC, just as a baseline for this exercise
  2. Janus, an open source media framework, that can act as an SFU
  3. Jitsi Videobridge, an open source SFU
  4. mediasoup, a relatively new open source SFU
  5. SwitchRTC, a commercial SFU
  6. appear.in, a service that recently added its own self-developed SFU (in beta at the moment)

If you are looking for Kurento or other SFUs – they weren’t included not because I didn’t want to, but because there was no readily available installation out there that I could just use.

I’ll be happy to add more SFUs to the comparison, so give us a shout out if you want to run such an analysis.

Let the fun begin.

AppRTC – My Favorite Baseline

For our baseline, I decided to use AppRTC.

This time, I had to use only 2 browsers, as AppRTC doesn’t support any group calling capabilities.

What it does do is offer the vanilla WebRTC experience.

I started with writing a simple script to fit my needs:

var roomUrl = process.env.RTC_SERVICE_URL + "testRTC" + process.env.RTC_SESSION_IDX + '?vsc=VP8';

var agentType = Number(process.env.RTC_IN_SESSION_ID);
var recuperationTime = 60; // in seconds

client
   .rtcInfo(roomUrl)
   .rtcProgress('open ' + roomUrl)
   .url(roomUrl)
   .waitForElementVisible('body', 60000)
   .pause(2000)
   .click('#confirm-join-button')
   .waitForElementVisible('#videos', 20000)
// Minute 1
   .pause(recuperationTime * 500)
   .rtcScreenshot('Phase 1')
   .rtcProgress('Phase 1')
   .pause(recuperationTime * 500);

// Minute 2
if (agentType === 1) {
   client
       .rtcEvent('10% Packet Loss start', 'global')
       .rtcSetNetworkProfile('custom', 'packet loss', 10, 'both', 'both'); // 10% packet loss
}

client
   .pause(recuperationTime * 500)
   .rtcScreenshot('Phase 2')
   .rtcProgress('Phase 2')
   .pause(recuperationTime * 500);

if (agentType === 1) {
   client
       .rtcSetNetworkProfile('') // back to pristine network conditions
       .rtcEvent('10% Packet Loss End', 'global');
}

// Minute 3-4
client
   .pause(recuperationTime * 1000)
   .rtcScreenshot('Phase 3')
   .rtcProgress('Phase 3')
   .pause(recuperationTime * 1000);

A few things to note here:

  1. All test scripts on this post can be found on our GitHub account. The easiest way to use them is to import them into your testRTC account
  2. I decided to force VP8 here. VP9 is a bit erratic in its bitrate so I wanted to go for VP8 – hence the addition of ‘?vsc=VP8’ in the first line of this script (check out all of AppRTC’s parameters here)
  3. When the first minute is up, the first probe in each session will generate a global rtcEvent and set packet loss in both directions to 10% (look at the first if (agentType === 1) block in the script)
  4. After an additional minute is over, the first probe in each session will generate another global rtcEvent and remove all packet loss and network constraints that might have been used (look at the second if (agentType === 1) block in the script)
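The pattern in notes 3 and 4 can be folded into a small helper. This is only a sketch: withPacketLossWindow() is a hypothetical name of mine, not a testRTC command, and it relies only on the rtcEvent(), rtcSetNetworkProfile(), rtcScreenshot(), rtcProgress() and pause() calls that already appear in the script, with the same arguments.

```javascript
// Hypothetical helper (not part of testRTC): run one phase of the test with
// packet loss switched on by a single "leader" probe, then switch it off.
function withPacketLossWindow(client, isLeader, lossPercent, seconds, label) {
  if (isLeader) {
    client
      .rtcEvent(lossPercent + '% Packet Loss start', 'global')
      .rtcSetNetworkProfile('custom', 'packet loss', lossPercent, 'both', 'both');
  }

  // Every probe waits out the window, taking a screenshot halfway through.
  client
    .pause(seconds * 500)
    .rtcScreenshot(label)
    .rtcProgress(label)
    .pause(seconds * 500);

  if (isLeader) {
    client
      .rtcSetNetworkProfile('') // back to pristine network conditions
      .rtcEvent(lossPercent + '% Packet Loss End', 'global');
  }
  return client;
}
```

Inside a script, the second minute would then read withPacketLossWindow(client, agentType === 1, 10, recuperationTime, 'Phase 2').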

Running that using testRTC yields these results once you drill into one of these sessions:

Above you see two things:

  1. The green vertical lines – these are the result of the rtcEvent() calls
  2. The blue and red bars, showing incoming and outgoing packet loss percentage, which averages at 10%

Above you see the video bitrate graph, with the two vertical event lines on it.

Notice how the outgoing bitrate tries going up in the beginning and then drops from 2.5mbps to 1mbps in 60 seconds?

The other thing that interests me is the time it takes for WebRTC/AppRTC to get back to 2.5mbps. And that’s somewhere in the range of 15-20 seconds.
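One way to get a feel for that recovery time is the loss-based rule described in the Google Congestion Control draft (draft-ietf-rmcat-gcc): above 10% reported loss the sender cuts its estimate by half the loss fraction, below 2% it raises it by 5%, and in between it holds. The sketch below is a back-of-the-envelope model, not what Chrome actually runs. The function names are made up, and the real implementation also has a delay-based controller and different constants that this ignores.

```javascript
// Loss-based rate update as described in the GCC draft (simplified).
// lossFraction is between 0 and 1; the result is in the same unit as rateKbps.
function nextRateKbps(rateKbps, lossFraction) {
  if (lossFraction > 0.10) {
    return rateKbps * (1 - 0.5 * lossFraction); // heavy loss: back off
  }
  if (lossFraction < 0.02) {
    return rateKbps * 1.05; // clean network: probe upwards by 5%
  }
  return rateKbps; // 2-10% loss: hold
}

// Count how many loss-free updates it takes to climb from one rate to another.
function updatesToRecover(fromKbps, toKbps) {
  var rate = fromKbps;
  var updates = 0;
  while (rate < toKbps) {
    rate = nextRateKbps(rate, 0);
    updates += 1;
  }
  return updates;
}

console.log(updatesToRecover(1000, 2500)); // 19
```

If you assume roughly one update per second (my assumption, not something measured here), 19 updates lands in the same ballpark as the 15-20 seconds seen in the graph.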

Oh, and because I know you’ll be interested in this – also remember this screenshot of the video average delay we had:

Before we move on to the media servers – remember that what I tried doing with AppRTC is provide a baseline. And the baseline here is “picture perfect”. I didn’t really expect any of the SFUs that I’ve used to be able to match AppRTC with its metrics.

Janus

Janus is an open source media server created and maintained by Meetecho.

They have an online demo running that supports a simple video room.

So we just hooked our script on top of that to get the results we needed. We aimed for 5 browsers in a single room – which will be the norm from now on in this article.

The Janus demo effectively has a single shared room, so I ended up with a J3rry user in there, though he seemed harmless, with no camera or bitrate in my session.

You can see above that the bitrates are rather low – around 140 kbps for each video stream coming into this room. And that’s even before I started adding packet loss.

During packet loss and after it, we “lost” two participants. Here’s a screenshot taken a minute after I stopped packet loss altogether:

The graphs in testRTC show a grim picture:

Janus reports packet losses at longer intervals than WebRTC does, which is why we see the spikes on the outgoing reporting that go up to 50% and more. The weird thing is the two incoming channels that show around 10% of packet loss as well – more about this later.

Here’s what video bitrates look like for some of the streams (one outgoing and two incoming):

No change even though we have packet loss.

And here’s what happens in the two other incoming streams:

Apparently, these two incoming streams are the ones showing packet loss from the start. They somehow decided to drop to 0 the moment we cranked up the artificial packet loss from 0 to 10% – but never recuperated from it.

Looking at the average delay for the video…

Things can’t be good, but it seems like this has nothing to do with my packet loss shenanigans.

It might be Janus and it might just be the demo machine. If I could, I’d reboot it and start all over again.

Jitsi

For me the Jitsi Videobridge is where I go first to run demos and tests on an SFU with testRTC:

  • It is out there
  • It is easy to automate
  • And I am a creature of habit…

To run our test here, we’ve directed 5 of our probes into a single room on the Jitsi meet online service/demo.

After a few attempts, I decided it would be better to disable simulcast, using this suffix on the URL: ‘#config.disableSimulcast=true’. I didn’t do it because simulcast is a bad thing, but because it made analyzing the results much harder for what I had in mind.

If we look at the packet loss graph, it will tell a similar story to what we’ve seen so far:

While there are some packet losses outside of the one minute killzone I created, they are negligible (or at least sporadic). Those negative values you see for packet losses in the red color? They are reports of the browser’s outgoing stream from the machine we induced packet loss on. This is most probably related to a Chrome bug (HT to Philipp Hancke).

I’ve split the video bitrate graphs here into two graphs – the outgoing one and the incoming ones since they tell two separate stories.

This one caught me by surprise – the outgoing bitrate shows no signs of a change due to packet loss. I wonder what Jitsi is doing (or not doing) to have packet loss ignored in such a way. So I decided to look at it from the receiving end of one of the other four browsers in the same session:

Bitrate drops to 0 for a duration of almost a full minute before coming back up.

Back to the browser with the trashed network, let’s see what happens to the incoming video streams:

Things drop down from around 2mbps to almost 0 on all incoming channels, taking around 40-60 seconds to get back to normal.

One last glance before we move on – check out video average delay:

Jitsi had a hard time recuperating from that packet loss.

It should be noted that I’ve played around with Jitsi before their recent updates – especially the ones including adaptivity.

Mediasoup

mediasoup is a rather new player in the open source SFU space. It is built in C++ as a Node.js module. After a quick Twitter chat, Iñaki Baz Castillo was kind enough to configure it to my needs (specifically, allowing for more bandwidth on the online demo).

Starting as always with packet loss:

The graph seems fine. Percentages are low because of the way packet losses are reported back from the media server. Probably some FEC / retransmissions are involved as well (this would be the case with many of the media servers out there).

Looking at the video bitrate, we see an interesting picture:

There’s a hiccup in the outgoing bitrate (the red line), but that for some reason takes place close to the end of the 60 seconds packet loss window.

There’s also a reduction in incoming bitrate for one of the video streams. It starts around 20 seconds into the packet loss zone, but it doesn’t recover even when we remove the packet losses.

Video delay is also a bit problematic:

It starts off nicely, goes up when packet losses start and never recuperates.

SwitchRTC

Moving on from open source to commercial, there’s SwitchRTC.

It started by me asking for a 2mbps bitrate limit. Now, the way this was set up and without simulcast, it meant the browser is going to need to encode 2mbps and decode 4 streams of 2mbps each. This turned out to be a bit too much for the way we configure our machines (and frankly – probably too much for almost any use case you plan on deploying when it comes to assuming what your typical customer may have).

The end result of it was graphs that went all over the place – each stream and each browser tried hard to compete on resources that were limited, and it wasn’t really nice.

So we dialed back down to 1mbps bitrate limit.

As always, let’s first look at the packet loss graph:

Two things here to note:

  1. One of the incoming video streams has packet losses outside the packet loss zone. Not unheard of, but a bit off the charts compared to others. I think that is due to the data centers used by SwitchRTC for this demo
  2. There’s negative packet losses on the outgoing video stream. This is due to the way SwitchRTC handles packet loss reporting (or more likely filtering packet loss reporting)

For bitrate, I took two screenshots. One for the incoming video streams and one for the outgoing video stream.

On the incoming streams we see an interesting phenomenon.

When packet loss starts, bitrate picks up, most likely to overcome the packet loss. It makes sense, since we didn’t limit bitrates, so that seems like the correct strategy. Would be interesting to see what will happen if we limit bitrate as well.

The second thing is that we have one of the incoming streams dropping down to almost zero and then picking up again. This is the same stream that shows high packet losses. I wonder what causes that.

The graph above shows the outgoing video stream. This is almost textbook behavior for the outgoing video. Once it notices there are issues, it starts increasing bitrate to compensate, and when that fails – it drops down slowly. It is similar to what you see with AppRTC, though not as smooth.

appear.in

appear.in have a beta SFU, which Philipp Hancke was kind enough to let me use.

Now, appear.in isn’t a media server or a component you can use in your own service – it is a full service, which makes this comparison a bit unfair – checking demos and comparing them to a commercial service.

But then I wanted to check this one out, as it isn’t based on any external framework – it was developed in house at appear.in.

The results are interesting.

Packet loss graph looks rather nice, if a tad low in the percentage:

This shows how far appear.in goes in gauging and polishing the way they make use of network resources.

Video bitrate stays at the 600kbps vicinity – not showing any real effects from my additional packet loss:

Best part though is that the video delay graph doesn’t look erratic:

I am not sure how to compare these results to the rest. I will need more time to check this out – time that I just didn’t have available for this experiment of mine. I will leave it for some future tinkering.

Summing things up

Different media servers will act differently. Especially when putting them under different network conditions.

What I wanted to show here is how you can use testRTC to goof around with whatever setting you want. Here are a few other ideas:

  1. Drop the network down to 0 bitrate. Wait a bit. Put it back up. Did media return? How quickly did it come up again?
  2. Limit bitrates to different levels. Check if your media server adapts things like resolutions and other interesting parameters to fit the needs
  3. Go down to 50 or 100 kbps. Does video persist or is the media server shutting it down in favor of audio?
  4. Limit bitrate and add a bit of packet loss at the same time (this would be closest to real life). See what happens then – how will the media server behave?
  5. Do the above while adding some load on the server. Does it start fidgeting or is it handling this nicely?

A few things to remember here:

This isn’t an apples to apples comparison

I haven’t taken each and every media server and installed it on my own on the same server configuration. I just used the online demos each of these vendors had. At times, asking for assistance and a bit of configuration from the vendor.

What was different:

  • The server(s) the media server was installed on
  • The configuration of the server, especially what max bitrate it allows

What was similar:

  • I tried disabling simulcast in all servers. I assume that’s a bad thing to do, but I wanted a level playing field on that front
  • The browser used. It was the same for all tests. This includes their version, the machine they were installed on, the network they used, their geographical location – everything
  • The scenario itself. I essentially executed the same scenario over and over again in front of different media servers

Where do we go from here?

Media servers are hard to develop. They are hard to tweak and optimize. And they are hard when it comes to making sizing decisions with them.

They are also pretty good. Most of the ones shown here are running in production services with live customers.

When you go tomorrow to pick the media server for your own project. Or when you want to plan how to size capacities per machine. Or if you want to check your media server in real life scenarios – we’ve got your back.

Check us out. I am sure we can be of help to you.

The 4 Techniques of Monitoring WebRTC Services

I remember that first time our servers went down after we had a couple of paying customers.

We got a call from a customer once. The only thing he wanted was to use our monitoring service. Since I knew him before, and knew he wasn’t interested in our monitoring – I asked him why.

I got something similar to this answer:

“We have monitoring on everything. We monitor the machine’s CPU, memory, storage. We look at the network. We collect metrics from our apps and monitor these as well. But yesterday we had a downtime of our service and we didn’t know it until a customer complained.”

Which brings me to the point – with WebRTC, it is extremely important to use end-to-end monitoring. It is also extremely important that this monitoring thingy you are putting in place knows a thing or two about WebRTC, otherwise, how will you know if the customer is really getting that video call or just looking at a blank screen?

Great. So now that we know we have a problem what’s the solution?

Luckily (or not?), there’s more than one way to handle monitoring WebRTC services. I like characterizing the solution based on 2 parameters, making for a nice set of quadrants to visualize it:

I’ll be using the terms active and passive here to describe the probing technique in a way that might be somewhat confusing to some, but for me this works.

Active monitoring is a system which actively generates traffic in the monitored product, using the generated traffic and the product’s behavior to determine its health.

Passive monitoring is a system which passively collects metrics off the different product components, determining from that the product’s health.

The exact definition/architecture of what is Cloud / SaaS versus what is on premise ends up depending, for me, on which probing technique you refer to – active or passive monitoring. Let’s see how they compare (and along the way explain what cloud and on premise is in each case).

#1 – Active Monitoring (Cloud / SaaS)

Active monitoring is for us the most popular monitoring service that our customers subscribe to.

The way such a monitor works?

  • It has a specific scenario it executes
  • It runs it at a given frequency
  • It validates a certain set of expectations, deciding if there were any failures requiring raising an alert
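The three bullets above boil down to a loop: run the scenario, compare what was measured to what was expected, alert on failure. A minimal sketch of the middle step follows. Everything in it is hypothetical: the metric names, the shape of an expectation and checkExpectations() itself illustrate the idea and are not testRTC's API.

```javascript
// Hypothetical expectation check for one monitor run.
// metrics: numbers collected from the run, e.g. { videoInKbps: 850 }
// expectations: list of { metric, min } and/or { metric, max } bounds
function checkExpectations(metrics, expectations) {
  var failures = [];
  expectations.forEach(function (exp) {
    var value = metrics[exp.metric];
    if (typeof value !== 'number') {
      failures.push(exp.metric + ' was not reported');
    } else if (exp.min !== undefined && value < exp.min) {
      failures.push(exp.metric + ' = ' + value + ', expected at least ' + exp.min);
    } else if (exp.max !== undefined && value > exp.max) {
      failures.push(exp.metric + ' = ' + value + ', expected at most ' + exp.max);
    }
  });
  return { ok: failures.length === 0, failures: failures };
}

// Example run: video arrived, but packet loss is above the allowed ceiling.
var result = checkExpectations(
  { videoInKbps: 850, packetLossPercent: 4 },
  [
    { metric: 'videoInKbps', min: 200 },
    { metric: 'packetLossPercent', max: 2 }
  ]
);
console.log(result.ok ? 'healthy' : 'ALERT: ' + result.failures.join('; '));
```

Treating a metric that was never reported as a failure is a deliberate choice in this sketch: a session that never connects tends to produce no numbers at all rather than bad ones.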

The WebRTC monitoring frequency pyramid above shows the various frequencies such a monitor can employ.

A daily monitor is akin to a ping – a healthcheck placed on a demo system for example; while a 1-minute monitor is mission critical – it is there to find issues and alert about them as soon as possible and before your customers notice them.

The cloud part of the active monitor is about the machines used to run your service. You deploy them in the cloud, probably on a managed monitoring service (we’ve got one for you). It means less setup hassle and also the ability to decide the geographical location of these machines.

Why use active monitoring?

  1. When your service runs at specific hours of the day. Contact centers for example, or doctor appointments. They tend to have their own “opening hours”, but what happens when the system breaks outside of opening hours? When do you get notified about it? When the first customer complains at the beginning of the shift? Or 5 hours earlier when you get an alert from an active monitoring system? In order to get alerts ahead of time here, you need a “non-user” to join the session
  2. When the failure occurs before WebRTC altogether. Sure you have a great way to monitor calls that happen to interact with the WebRTC APIs. But what if the service failure occurs earlier? Like a connection error between your web server and the directory service? An active monitor that runs end-to-end can find and pinpoint such issues
  3. Consistency. Passive monitors show the experience of your users. But they can’t reproduce the same settings to show you if and how you improved – and it is devilishly hard to decide if the problem is a user problem or a service problem. An active monitor can be configured to run in very specific network configurations – over and over again. Its results can be compared in certain timeframes to show the objective degradation or improvement of the service
  4. Zero instrumentation. Nothing needs to change in your service to accommodate for active monitoring. The active probes that will interact with your service accommodate themselves to whatever you are doing today

Not all is rosy here though. To set up a good active monitor you need to plan a use case that fits nicely. One in which the UI of your service is predictable and simple enough to automate. A couple of times I’ve seen monitors report service failures that were really caused by inconsistencies in the UI – things that humans would be comfortable with but automation would not be.

#2 – Active Monitoring (On premise)

An On premise active monitoring solution is similar to a cloud based active monitoring solution with one minor difference: the probes that are used are deployed “on premise” as opposed to “in the cloud”.

What does it mean exactly?

For an education service, where teachers and students can be anywhere, a cloud based approach works great. It actually mimics how the service is used “live”. So having the probes deployed strategically across the globe in different locations makes a lot of sense.

But for a contact center for example, where the agent sits inside the office, you sometimes want to have a monitor on site – a machine dedicated to monitoring also the network constraints that your agents feel – placing the machine within the same subnet on your local LAN.

So, the difference between Cloud and On premise Active Monitoring in WebRTC?

To sum things up – you deploy the probes on premise or in the cloud, but collecting and analysis can happen in both approaches in the cloud. Oh, and obviously, you can also end up deploying some probes on premise and others in the cloud (especially for a call center scenario).

The advantage of the on premise approach is that you get closer to real life scenarios with it for the use cases where you can place your users at a given location.

The main disadvantage is that this is usually a bit more expensive and time consuming to set up and maintain (there’s less of an option to use economies of scale fairy dust for it).

#3 – Passive Monitoring (Cloud / SaaS)

With passive monitoring, there are no real probes. We treat each and every user who interacts with the WebRTC service as a “probe for hire”, available if and when he decides to interact with the service.

In its Cloud variant, the data pulled off the device gets shipped to the cloud, to a third party service that aggregates and analyzes the metrics available in WebRTC (usually by means of getStats() calls).
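For a flavor of what such a collector computes, here is a sketch that turns two getStats() snapshots of an inbound stream into a packet loss percentage. packetsLost and packetsReceived are standard fields of the inbound-rtp stats. The function name and the idea of sampling at a fixed interval are my own assumptions, and no particular monitoring product is implied.

```javascript
// Packet loss percentage between two samples of the same inbound-rtp stats.
// prev and curr are plain objects with cumulative packetsLost / packetsReceived.
function lossPercent(prev, curr) {
  var lost = curr.packetsLost - prev.packetsLost;
  var received = curr.packetsReceived - prev.packetsReceived;
  var expected = lost + received;
  if (expected <= 0) {
    return 0; // nothing was expected in this interval, so nothing to report
  }
  return (lost / expected) * 100;
}

// In a browser the samples would come from RTCPeerConnection.getStats():
//   const report = await pc.getStats();
//   report.forEach(function (s) { if (s.type === 'inbound-rtp') { /* sample s */ } });
console.log(lossPercent(
  { packetsLost: 10, packetsReceived: 990 },
  { packetsLost: 60, packetsReceived: 1440 }
));
```

In the example, 50 packets were lost out of 500 expected in the interval, which works out to 10%.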

The advantage of this approach is that it gives you the data and analysis on your real users’ interactions. You can’t get any closer to reality than that. It is also easy to set up and get started with.

There are certain disadvantages though:

  1. Uptime. There is no indication of uptime here. If no users call the doctor before 8am, then you get no data for the time the system is idle – and no visibility towards its health
  2. Predictability. A session may experience failures or issues that relate to the user’s device or network. You will definitely want to optimize your service as much as possible for such cases as well, but it will be hard to check for objective trends of the service’s quality in such a way
  3. Privacy. You send the metrics about your service’s real live traffic to a third party, who can easily discern the size of your operation
  4. Instrumentation. You need to modify your product’s code to integrate with a passive monitoring solution. This will typically be a minor hindrance, but will be there

#4 – Passive Monitoring (On premise)

In many cases, people end up using homegrown passive monitoring systems.

What they do is collect data off the devices and then aggregate and analyze it in their own backend monitoring system. Terms like Elasticsearch and Kibana and Graylog get thrown into the air – or god forbid – Big Data.

The biggest advantage here? You collect, get and analyze exactly what you want to. Oh – and you can also easily enrich that information you collect with your business logic and other metrics unrelated to WebRTC. In many cases, this is the reason I’ve seen vendors foregoing the cloud based passive monitoring approach – the need for enrichment and wider analysis.

The big disadvantage here is probably time and material. Putting such an operation in place can be time consuming and expensive. It requires developers to work on your monitoring infrastructure which no one sees at the end of the day instead of having them focus on your core product’s offering and features.

We’re in the process of running a pilot with an on premise passive monitoring product. If you want to learn more, just contact us.

Which shall it be?

Passive or active. Cloud or on premise.

If you are serious with what you are doing, and want to run it as a business – a viable commercial service – then you will need monitoring.

I urge you not to be happy enough with web based monitoring solutions and also go for an end-to-end type of a monitoring service that understands WebRTC.

3 Synchronization techniques to test WebRTC at scale

Testing WebRTC is hard enough when you need to automate a single test scenario with two people in it, so doing things at scale means lots more headache.

We’ve noticed that in the past several months, as more developers have started using our service to understand the capacity they can load on a single server. And as we do with all of our customers, we assisted them in setting up the scripts properly – it is still early days for us, so we make it a point to learn from these interactions.

What we immediately noticed is that while our existing mechanisms for synchronization can be used – they should be used slightly differently, because at scale the problems are also different.

How do you synchronize with testRTC?

There are two main mechanisms in testRTC to synchronize tests, and we use them together.

What we do is think of a test run as a collection of sessions. Each session has its own group of agents/browsers who make up that session. And inside each such session group – you can share values across the agents.

So if we want to try and do a test run for our WebRTC service similar to the above – 4 video conference calls of 5 browsers in each call, we configure it the following way in testRTC:

While this is all nice and peachy, let’s assume that in order to initiate a video conference, we need someone in each group of 5 browsers to be the first to do *something*. It can be setting up the conference, getting a random URL – whatever.

This is why we’ve added the session values mechanism. With it, one agent (=browser) inside the session, can share a specific value with all other agents in his session – and agents can wait to receive such a value and act upon it.

Here’s what it looks like for a testRTC agent to announce it logged in and is ready to accept an incoming call for example:

We decided arbitrarily to call our session key “readyForCall”, and we used an arbitrary value of “ignore” just because.

On the ‘receiving’ end here, we use the following code:

So now we have the second browser in the session waiting to get a value for “readyForCall”, and in this simple case, ignore the value and click the “.call” button in the UI.
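As a sketch of the pattern (not necessarily the exact script we run), the two sides can be written as below. It assumes the testRTC commands rtcSetSessionValue(key, value) and rtcWaitForSessionValue(key, callback, timeoutInMs); treat those signatures, the 30 second timeout and the two helper names as assumptions to check against the testRTC documentation. The “readyForCall” key, the “ignore” value and the “.call” button come from the text above.

```javascript
// Callee side: announce to the rest of the session that we are ready.
function announceReady(client) {
  return client.rtcSetSessionValue('readyForCall', 'ignore');
}

// Caller side: block until the callee is ready, then place the call.
function callWhenReady(client) {
  return client.rtcWaitForSessionValue('readyForCall', function (value) {
    // The value itself is ignored; its arrival is the signal.
    client.click('.call');
  }, 30000);
}
```

In a script, the first agent would run announceReady(client) once it has logged in, and the second agent would run callWhenReady(client).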

This technique is something we use all the time in most of the scripts these days to get agents to synchronize their actions properly.

How do we scale a WebRTC test up?

The neat thing about these session values is that they get signaled around only within the same session. So if we plan and write our test script properly, we can build a single simple session where browsers interact with each other, and then scale it up by increasing the size of the session to what we want and the number of concurrent agents in the test run.

With our video conferencing service, we start with a 3-way session, using 3 agents. We designate agent #1 in the session as our “leader”, who must be the first to login and setup the session. Once done, he sends the URL as a session value to the other agents in the session.

The moment we want to scale that test up, we can grow the session size to 5, 10, 20, 100 or more. And when we want to check multiple video conferences in parallel, we can just grow the number of concurrent agents in the test run but leave the session size smaller.

A typical configuration for several test runs of scale tests will look like this:

  1. Start with 5 agents in a single session
  2. Then run 10 agents in 2 sessions (5 agents per session)
  3. End with 200 agents in 10 sessions (20 agents per session)
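The arithmetic behind these configurations is simple but worth writing down, since it is easy to mix up the two knobs (concurrent agents and session size). The helper below is hypothetical. It only checks that the number of concurrent agents divides evenly into sessions of the chosen size, which is a restriction of this sketch and not a statement about how testRTC behaves.

```javascript
// Hypothetical helper: how many sessions does a test run produce?
function sessionCount(totalAgents, sessionSize) {
  if (totalAgents % sessionSize !== 0) {
    throw new Error(totalAgents + ' agents do not split into sessions of ' + sessionSize);
  }
  return totalAgents / sessionSize;
}

// The three runs from the list above: [total agents, agents per session]
[[5, 5], [10, 5], [200, 20]].forEach(function (run) {
  console.log(run[0] + ' agents -> ' + sessionCount(run[0], run[1]) + ' session(s)');
});
```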

What will usually go wrong as we scale our WebRTC scenario?

Loads of things. Mainly… load.

We’ve seen servers that break down due to poor network connection. Or maxed out CPU. Or I/O as they store logs (or media recordings) to the disk. And bad implementations and configurations. You name it.

There are, though, a few issues that seem to plague most (all?) WebRTC based services out there. And the main one of them is that they hate a horde logging in at roughly the same time.

That just kills them.

You take 20 browsers. Point them all to the same URL, in order to join the same session, and you get them to try it out all together in the span of less than a second. And things fall down in pieces.

I am not sure why, but I have my own suspicions and ideas here (something to do with the way RTCPeerConnection is used to maintain these media streams and how the SFUs manage it internally in their own crazy state machine). Now, for the most part, customers don’t care. Because this usually won’t happen in real life. And if it does – the user will hit F5 to refresh his browser and the world will get back to normalcy for him. So it gets lower priority.

Which leads us again to synchronization issues. How can we almost un-synchronize browsers and have them NOT join together, or at least have them join “slower”?

We’ve devised a few techniques that we are using with our customers, so we wanted to share them here. I’ll call them our 3 synchronization techniques for testing WebRTC at scale.

Here they are.

#1 – Real-users-join-randomly

This is as obvious as it gets.

If we have 10 users that need to enter the same session, then in real-life they won’t be joining at the exact same time. Our browsers do. So what do you do? You randomize having them join.

For 3 browsers, we have them all join “at the same time”, we just spread it around a bit – just like in the illustration below, where you can see in the red lines where each browser decided to join:

Here’s how we usually achieve that in testRTC:
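A minimal sketch of the idea, assuming a plain pause() before opening the room is enough. The helper names and the 5 second spread used in the usage note are illustrative choices, not testRTC defaults.

```javascript
// Pick a random whole-millisecond delay between 1 and maxSpreadMs (inclusive),
// so agents do not join in lockstep and the pause is never zero-length.
function randomJoinDelayMs(maxSpreadMs) {
  return Math.floor(Math.random() * maxSpreadMs) + 1;
}

// Each agent waits its own random delay, then opens the room.
function joinRandomly(client, roomUrl, maxSpreadMs) {
  return client
    .pause(randomJoinDelayMs(maxSpreadMs))
    .url(roomUrl);
}
```

Calling joinRandomly(client, roomUrl, 5000) in every agent spreads the joins over roughly five seconds.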

#2 – Pace-them-into-the-service technique

Random doesn’t always cut it for everyone. This becomes an issue when you have 100 or more browsers you want to load the server with. I am not sure why that is, as it has nothing to do with how testRTC operates (how do I know this? Using the same test on something like AppRTC with no pacing works perfectly well), but again – developers are usually too busy to look at these issues in most of the scenarios that we’ve seen.

The workaround is to have these browsers “walk in” to the room roughly one after the other, at a given interval.

Something like this:

Here, we pace the browsers to join at 300-millisecond intervals from one another. The script for it will be similar to this:

This is a rather easy method we use a lot, but sometimes it doesn’t fit. This occurs when timing can get jumbled due to network and other backend shenanigans of services.
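The pacing idea boils down to a fixed delay per browser, derived from its position in the session. A minimal sketch in plain Python (not a testRTC script), where the function name and the zero-based browser index are assumptions and the 300 milliseconds come from the text above:

```python
def paced_join_delay_ms(browser_index, interval_ms=300):
    """Delay before joining: browser 0 joins at once, each next one interval_ms later."""
    return browser_index * interval_ms

schedule = [paced_join_delay_ms(i) for i in range(5)]
print(schedule)  # [0, 300, 600, 900, 1200]
```

Because the delay is computed up front, it only spaces out the moment each browser starts joining. It cannot account for how long a login or page load takes afterwards.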

#3 – One-after-the-other technique

Which is why we use this one-after-the-other technique.

This one is slightly more difficult to implement, so we use it only when necessary. Which is when the delay we wish to create doesn’t sit at the beginning of the test, but rather after some asynchronous action needs to take place – like logging in, or waiting for one of the browsers to create the actual meeting place.

The idea here is that we let each browser join only after another one in the list has already joined. We create a kind of a dependency between them using the testRTC synchronization commands. This is what we are trying to achieve here:

So we don’t really care how much time each browser takes to finish his action – we just want to make sure they join in an orderly fashion.

Usually we do that from the last browser in the session down to the first. There are three reasons why:

  1. It looks a lot smarter – like we know what we’re doing – so my ego demands it
  2. It makes it easier to scale a session up, since we’re counting the numbers down to zero
  3. We can stop in the middle easily, if we have different types of browsers in the same session

Here’s what the code for it looks like:

Here, what happens is this:

  • agentType holds the index number of the running browser inside the session
  • sessionSize holds the number of browsers in a single session
  • If we are not the last browser in the session, then we wait until the next browser tells us he is ready (line 8). When he does, we join (line 12) and then we tell the previous browser in line that we are ready (line 16)
  • If we are the last browser, we just join and tell the previous one that we’re ready
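The same dependency chain can be simulated outside of testRTC to see the ordering at work. This is only a sketch: it uses Python threads and `threading.Event` as a stand-in for the testRTC synchronization commands, assumes a zero-based browser index, and its line numbers do not match the ones mentioned above:

```python
import threading

def run_session(session_size):
    """Simulate browsers joining one after the other, highest index first."""
    ready = [threading.Event() for _ in range(session_size)]
    join_order = []

    def browser(agent_index):
        is_last = agent_index == session_size - 1
        if not is_last:
            # Wait until the next browser in line signals that it has joined.
            ready[agent_index + 1].wait()
        join_order.append(agent_index)  # stand-in for actually joining the room
        ready[agent_index].set()        # tell the previous browser we are ready

    threads = [threading.Thread(target=browser, args=(i,)) for i in range(session_size)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return join_order

print(run_session(4))  # [3, 2, 1, 0]
```

However long any single browser takes, the join order stays the same, which is the whole point of the technique.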

A bit more complex, so we save it for when it is really necessary.

What’s next?

Here’s what we’ve learned:

  1. We use session and session values for synchronization and scale purposes
    1. We split a test run into groups of browsers, each designated to its own session
    2. Inside a session, we can give different roles to different browsers
    3. This enables us to pick and choose the size of a session and the size of a test run easily
  2. In most cases, large sessions don’t like browsers joining all at once – it breaks most services out there (and somehow, developers are fine with it)
  3. There are different ways to get testRTC to mimic real life when needed. Different techniques support different scenarios

If you are planning on stress testing your WebRTC service – and you probably will be at some point in time – then come check us out. Here are a few of the questions we can answer for you:

  • How many users can I cram into a single session/room/conference without degrading quality?
  • How many users can a single media server of mine support?
  • How many parallel sessions/rooms/conferences can a single media server of mine support?
  • What happens when my service needs to scale horizontally? Is there any degradation for the users?

Partial list, but a good starting point. See you in our service!

We already have users. Why should we monitor WebRTC?

That’s actually a great question. We get this every once in a while, so I wanted to touch on it here.

Our WebRTC monitor? It is like Pingdom, just more complex and end-to-end – we make sure that if a user tries to access your system he will end up getting media and not just a web page. And that distinction is important. Pingdom only states if a given IP address is responsive or that a URL returns a response. We go the extra mile of connecting a media session.

Back to the original question:

We already have users. Why should we monitor WebRTC?

That’s how the question goes. The service is up and running. People are using the service. We’ve even connected the servers we run to Nagios. Or New Relic. Or someone else. You know what? We even collect the WebRTC statistics on our live production system and closely monitor how real users perceive our service. Why do we need another monitor?

For the same reason you have Nagios and New Relic in there in the first place. If you have customers and they are happy and you check their stats – why invest in monitoring the servers directly?

Here are 4 reasons I heard directly from our customers:

#1 – We want to know before our customers complain

While the best predictor of a customer issue is a customer complaining, I am not sure this is the best approach.

Many vendors prefer knowing about a problem before their customers do. This gives them time to try and solve the issue – or at least give them the ability to tell the customer that they know about it and are trying to resolve it already.

A good example here is the launch of a new browser version. These upgrades tend to break things or change behavior in one way or another. You can put a monitor in front of the latest stable Chrome version – or better yet – the beta version. That way, you can catch more issues in advance and be prepared for them.

I’d say this is the main reason why customers subscribe to our WebRTC monitoring service. They may take different routes in what exactly it is that they monitor (locations, different deployments, different frequencies, different user profiles), but they all want to monitor their service.

#2 – Everything was up and running, but calls didn’t connect

We recently had a company approach us due to downtime that their service experienced. What they said was that the application monitoring they had in place indicated the service was running perfectly well, but the service was effectively down for their customers.

You see, knowing that your server’s CPU is below 80% and memory looks fine doesn’t really indicate that if someone tries to communicate he will succeed. This type of monitoring is necessary but not sufficient.

By using the testRTC monitor, you can rest assured that your service is up and running. Why? Because we access your service from a real browser just like a user would, and we go through all the hoops of your service to get to that media. And once we do? We can validate that the media also meets your criteria – like specific bitrates or packet loss target thresholds.
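To give a feel for what such criteria amount to, here is a minimal sketch in plain Python. The function, the field names and the threshold values are all hypothetical, and this is not how testRTC expresses its expectations:

```python
def check_media_criteria(stats, min_bitrate_kbps, max_packet_loss_pct):
    """Return a list of human-readable failures; an empty list means the run passed."""
    failures = []
    if stats["bitrate_kbps"] < min_bitrate_kbps:
        failures.append(
            f"bitrate {stats['bitrate_kbps']} kbps is below {min_bitrate_kbps} kbps"
        )
    if stats["packet_loss_pct"] > max_packet_loss_pct:
        failures.append(
            f"packet loss {stats['packet_loss_pct']}% is above {max_packet_loss_pct}%"
        )
    return failures

# Made-up numbers for one monitor run.
run_stats = {"bitrate_kbps": 450, "packet_loss_pct": 4.0}
for failure in check_media_criteria(run_stats, min_bitrate_kbps=500, max_packet_loss_pct=2.0):
    print("FAILED:", failure)
```

A run that connects but misses either threshold counts as a failed probe, which is what separates this from a plain uptime check.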

#3 – Uptime and network quality aren’t the same thing

If what you do is collect network statistics on connected calls by looking at WebRTC stats, then think again. It might be necessary and important, but it is not enough either.

The assumption behind such an activity is that the service is up and running, and the only thing missing is knowing the network quality by reviewing the quality experienced by users. But is that a good predictor of the next failure? If you only monitor successful calls in the system, then how would you know that calls are failing because they can’t even begin to connect?

With the testRTC monitor this is something that is checked each and every time. Making sure that users can get connected. Sure. We’ll check the quality and your criteria of it. But first and foremost, we’ll make sure that sessions that need to connect – get connected.

#4 – How do you baseline a service performance?

When you try and use the statistics collected from your live audience, it tells you how well this audience is experiencing your service, but does it tell you if/how you can improve it?

One of the toughest things to do with WebRTC is to spec out the server. You’ve built a service to scale. You’ve put in place scale out mechanisms on one of the biggest cloud providers. You know for certain that whenever you’ll need more capacity, your system will automagically grow and do so economically. Or will it?

Can you see from real users what will be the experience of the next user to join? Is he experiencing packet losses because of his own network (running from a smartphone in a basement connected over 2G) or is it because your server has a bad network connection of its own towards the internet and it is leaking packets?

Part of what our WebRTC monitor does is add predictability into the process. Whenever the monitor is scheduled to run, it will run as flawlessly as it did last time, over the same type of connection and network conditions you’ve asked it to. So if you see packet losses – you know it has to be something on your end.

This enables the creation of a baseline of your service performance, and putting hard criteria in place against that baseline, so whenever a testRTC WebRTC probe pokes at your service and comes back unsatisfied – you will know about it, and you will be able to take actionable measures to solve it.
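One simple way to think about a baseline is to compare the latest probe result against the spread of earlier runs made under the same conditions. The sketch below is an assumption about how that could be done, not a description of the testRTC monitor; the function name, the tolerance and the sample values are invented:

```python
from statistics import mean, pstdev

def deviates_from_baseline(history, latest, tolerance=3.0):
    """Flag the latest value when it is more than `tolerance` standard deviations off the mean."""
    baseline = mean(history)
    allowed = tolerance * pstdev(history)
    return abs(latest - baseline) > allowed

# Made-up packet loss percentages from earlier monitor runs.
packet_loss_history = [0.1, 0.2, 0.1, 0.2]
print(deviates_from_baseline(packet_loss_history, latest=0.2))
print(deviates_from_baseline(packet_loss_history, latest=2.5))
```

Since the probe runs over the same kind of connection every time, a value that falls outside the usual spread points at the service rather than at the user's network.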

How do you monitor WebRTC?

WebRTC is still rather new and nascent, so there’s little in the way of best practices and experience that got collected around it.

Here’s a screenshot I took just now from our demo account, where we run a few sample monitors in front of AppRTC and Talky (for no good reason other than the fact that we can):

WebRTC monitor sample screenshot

I am really interested in understanding how you monitor your service. What is it that you do to make sure it is up and running for your customers?

Executing a WebRTC test that scales

There’s a growing trend from the companies that come to testRTC in recent months, and it has to do with the focus of what they are looking for.

Most are less interested in how testRTC can be used for functional testing – things like coverage of scenarios and finding edge cases and automating tests for them. What people are interested in now, when they want to run a WebRTC test scenario, is how to scale it.

Customers typically approach stress in WebRTC tests along two slightly different vectors: they either focus on testing how their WebRTC service can handle multiple sessions in parallel or they focus on testing how their WebRTC service can increase the number of users in a single session.

Let’s review the meaning of each of these alternatives.

#1 – WebRTC test that scales to a large number of sessions

I decided to put things on a simple graph. The X axis denotes the number of sessions we’re going to focus on while the Y axis is all about the number of users in a single session.

In this case, where we want to test WebRTC for a large number of sessions, we will have this focus:

Scale a WebRTC test by the number of sessions

So we have a WebRTC service to test. It has a single user in a session (a contact center agent receiving calls from PSTN for example) or two users in a session (one person talking to another across browsers).

In such a case, vendors are usually concerned about stressing their servers – checking if they can fit their intended capacity.

When this is done, there are three different things that can be tested for scale:

  1. The signaling server
    • How well does it behave while increasing capacity? How is its connection to the database? Does it slow down as connections accumulate? Does it leak memory?
    • Usually, stress testing a signaling server is better done with other tools. Ones that have a lower cost per connection than testRTC and don’t really require a full browser per connection
    • That said, oftentimes, you may as well want to throw in a few “real” users using testRTC on top of a tool that loads your signaling connections separately – just to make sure there’s nothing that kills your service when media is added into the mix on top of the signaling
    • You also need to think about the third component below – how do you test your TURN server?
  2. The media server
    • These crop up in 1:1 tests when there’s a need to record the session or to enforce a given route. I’ve seen many of these recently, mainly in the healthcare and education markets
    • For single users, this usually means the gateway that connects the user to other networks is what we want to test, and there it will usually include a media server of sorts for media transcoding
    • In such a case, there’s no getting away from the fact that scale is in the low 10’s or 100’s of browsers and real ones are needed. It is also where we see a lot of interest in testRTC and its capabilities
  3. The TURN server
    • Anywhere between 5-20% of the calls will end up being relayed via a TURN server – and there’s nothing you can do about it
    • If you put up your own TURN servers – how confident are you in your setup and its ability to scale nicely as your service grows?
    • One way to find out is to place real browsers in front of your service, but doing so in a way that forces the browsers to negotiate via TURN. This can be achieved by changing the configuration of your client, filtering ICE candidates and doing SDP munging. A better way would be to enforce network rules on the machine running the browser and actually test your service in different network conditions
    • And yes. testRTC allows you to do just that
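To make the ICE candidate filtering option more concrete: a candidate line carries its type after the `typ` keyword (`host`, `srflx`, `prflx` or `relay`), so forcing TURN means dropping everything that is not `relay`. A minimal sketch in Python, with made-up candidate lines using documentation IP ranges and hypothetical function names:

```python
def is_relay_candidate(candidate_line):
    """True when the ICE candidate line is of type relay (allocated on a TURN server)."""
    fields = candidate_line.split()
    if "typ" not in fields:
        return False
    type_position = fields.index("typ") + 1
    return type_position < len(fields) and fields[type_position] == "relay"

def keep_relay_only(candidate_lines):
    """Drop host and reflexive candidates so media can only flow through TURN."""
    return [line for line in candidate_lines if is_relay_candidate(line)]

candidates = [
    "candidate:1 1 udp 2122260223 192.168.1.10 54321 typ host",
    "candidate:2 1 udp 1686052607 203.0.113.7 54321 typ srflx raddr 192.168.1.10 rport 54321",
    "candidate:3 1 udp 41885439 198.51.100.5 61000 typ relay raddr 203.0.113.7 rport 54321",
]
print(keep_relay_only(candidates))
```

In a browser the same filtering would happen in the client code that handles candidates, which is exactly why it means changing the application rather than the network.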

#2 – WebRTC test that accommodates a large group of users in a single session

The other type of use case we see a lot from our customers is the one that wants to answer the question “how many users can I cram into a single session without considerably degrading the quality?”

Scale a WebRTC test by the number of users per session

Many look to run such tests at around 10-20 concurrent browsers, either in MCU or SFU models (see this post on the differences between the multiparty WebRTC technologies).

What happens next is usually a single session where browsers are added one on top of the other to check for scale. Here, the main purpose of a test is validating the media server and not much else.

The scenario is rather simple:

  • Try 1:1. Record the results
  • Go for 4 users. Record the results
  • Expand to 10 users. Record the results
  • Rinse and repeat

Now go back to the recorded results and see if the media got degraded:

  • Was latency introduced?
  • Do we see more packet losses?
  • Do bitrates go down the more browsers we add?
  • Is the bitrate stable or fluctuating all over the chart?
  • Is the degradation linear or exponential?

These types of questions are indicators of problems in the WebRTC product’s infrastructure (be it network connections, CPU, storage or software).
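One way to look at the recorded results is to compute how much bitrate is lost per added user between consecutive session sizes. If that figure keeps growing, the degradation is worse than linear. The sketch below is only an illustration: the function names are hypothetical and the bitrates are invented numbers, not measurements:

```python
def per_user_drops(results):
    """results maps session size -> average bitrate (kbps); returns the drop per added user."""
    sizes = sorted(results)
    drops = []
    for small, large in zip(sizes, sizes[1:]):
        drop = (results[small] - results[large]) / (large - small)
        drops.append((small, large, drop))
    return drops

def degradation_accelerates(results):
    """True when the per-user drop grows as the session gets larger."""
    rates = [drop for _, _, drop in per_user_drops(results)]
    return any(later > earlier for earlier, later in zip(rates, rates[1:]))

# Invented average bitrates for 1:1, 4 users and 10 users.
recorded = {2: 1200, 4: 1100, 10: 500}
print(per_user_drops(recorded))
print(degradation_accelerates(recorded))
```

The same comparison works for latency or packet loss, as long as the sign is flipped so that a bigger number means worse.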

#3 – Test WebRTC at scale

And then you can try to accommodate for both these needs. And you should – scale the size of the sessions at the same time that you scale the number of sessions.

Scale a WebRTC test by the number of sessions and by the number of users in them

Here what we’re trying to do is everything at the same time.

We want to be able to place multiple users in the same session but spread our browsers across sessions.

How about running 100 browsers, split across 10 different sessions, where each session accommodates 10 browsers? This is where our customers are headed next after they tested their WebRTC multiparty service for a single session capacity.

Why is WebRTC test scaling so hard?

When you scale test WebRTC infrastructure, you end up needing lots of bandwidth and processing power. Remember that each user is a full browser (why that is necessary see here). Running 2 or 4 of these may be simple, but running 20 or more becomes quite a challenge:

  • You can no longer place them all in a single machine, so you need to start distributing them – across machines, across data centers
  • You need to take care of both downlink and uplink network speeds – this isn’t easy to achieve at scale
  • You need to synchronize across your small army of browsers so they hit the server at roughly the right time for it all to work
  • Oh – and you need the WebRTC test environment to be stable, so that when issues occur, it will more often than not be due to an issue in the tested product and not in your test environment itself

testRTC, users and sessions

There are many ways to do multiple users in a single session:

  • All join the same URL or room, given the same level of access
  • A chair hosting a large conference, where control and access is asymmetric
  • A broadcaster and a large number of viewers
  • A few people in a discussion with a large number of viewers

Each of these scales differently and requires a slightly different treatment.

What we did at testRTC was introduce the notion of #session into the mix. When you indicate #session, the test will automatically wrap itself around that notion – splitting the number of concurrent users you want into sessions at the size you state by #session.
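The arithmetic behind such a split is simple enough to sketch. This is an illustration in plain Python of dividing concurrent users into fixed-size sessions, not the actual testRTC implementation; the function name and the zero-based numbering are assumptions:

```python
def assign_sessions(concurrent_users, session_size):
    """Map each browser to (session index, index inside that session)."""
    return [
        (browser // session_size, browser % session_size)
        for browser in range(concurrent_users)
    ]

# 100 browsers at 10 per session end up in 10 sessions.
assignments = assign_sessions(concurrent_users=100, session_size=10)
session_count = len({session for session, _ in assignments})
print(session_count)  # 10
```

The index inside the session is what lets different browsers take different roles, such as one broadcaster and many viewers.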

Want to see it in action? Check out our latest tutorial videos on how to scale WebRTC tests in testRTC, by using the notion of a session: