Tag Archives for "WebRTC"

Monitoring WebRTC applications using testRTC

This time, I want to do a quick recap of a webinar we hosted last week. It dealt with monitoring WebRTC applications, and as usual, we took the approach of doing a demo.

For this one, I prepared for over a month. I created 3 separate monitors, running on AppRTC, appear.in and Jitsi Meet.

Why did I pick these 3 services?

  1. They are all public, and don’t throttle or cap you in how they work (a lot of demos out there do that)
  2. They are quite different from one another, yet similar at the same time (not sure if that means anything, but it feels that way)
  3. AppRTC is used by many to build their first application, and it is Google’s “hello world” application for WebRTC (it is also unstable, which makes it a lot of fun to monitor)
  4. appear.in and Jitsi are widely known and quite popular. They are also treated as real services (where AppRTC is merely a demo)

What was the scenario I used?

  1. Create a meeting URL (either ad-hoc or predefined)
  2. Join the URL
  3. Wait for 2 full minutes so we have data to chew on
  4. Run the above every 15 minutes for a period of a bit over a month
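
In testRTC terms, that scenario boils down to a short script (testRTC scripts are Nightwatch-based JavaScript). Here’s a minimal sketch – the RTC_SERVICE_URL environment variable and the selector used are assumptions that depend on the service being monitored:

```javascript
// Minimal sketch of the monitor scenario above. RTC_SERVICE_URL is assumed
// to carry the meeting URL configured for the monitor.
client
  .url(process.env.RTC_SERVICE_URL)      // join the meeting URL
  .waitForElementVisible('body', 30000)  // wait for the page (and call) to load
  .pause(120000);                        // stay in the call for 2 full minutes
```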

Connecting the dots

I wanted to show a bit more than what you see in the testRTC dashboard. Partly out of curiosity, but also because many of our monitoring customers do just that – feed the results they get from testRTC’s monitor runs into their own applications.

So I took this approach:

I’ve created a Google Sheet to collect all the results.

In Zapier, I then created a webhook that collects the results into that Google Sheet.

In testRTC, I’ve configured a webhook to connect to Zapier.
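
If you’d rather collect the results without Zapier, a tiny HTTP endpoint does the same job. Here’s a minimal sketch in Node/Express – the payload field names are assumptions, since the actual body is whatever testRTC posts to the configured webhook URL:

```javascript
// Minimal webhook receiver standing in for the Zapier step.
// The field names (testName, status, reason) are illustrative only.
const express = require('express');
const app = express();
app.use(express.json());

app.post('/testrtc-webhook', (req, res) => {
  const run = req.body;
  console.log(run.testName, run.status, run.reason); // or append to a sheet/DB
  res.sendStatus(200);
});

app.listen(3000);
```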

10,000 monitor runs later… there was enough data to chew on.

Initial insights

Here’s how the data looks on the testRTC dashboard:

The first row? AppRTC. It fails every once in a while, for 2+ hours at a time.

Jitsi had a few hiccups. appear.in was almost flawless.

Here’s what I got when I created a quick pivot table from it in my Google Sheet:

AppRTC was down 6% of the time…
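
The pivot is essentially a count of failed runs per service. If you pulled the same rows out of the sheet, the calculation would look something like this (the row shape here is illustrative):

```javascript
// Derive a downtime percentage per service from the collected monitor runs.
// Each row is assumed to look like { service: 'AppRTC', status: 'failure' }.
function downtimePercent(rows, service) {
  const runs = rows.filter(r => r.service === service);
  const failed = runs.filter(r => r.status === 'failure').length;
  return (100 * failed / runs.length).toFixed(1);
}

// downtimePercent(allRows, 'AppRTC') -> roughly '6.0' for the month above
```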

Media quality

We now have our own unique ranking system for WebRTC media quality in test results.

I showed that during the webinar as well, checking how the various services fared in maintaining stable quality throughout the month. Interestingly, they behaved rather differently from one another.

Watch the webinar to learn more about it.

The webinar and demo

Most of the webinar was a long demo session. You can view it all here:

You can open up your own testRTC account and play with our service a bit during an evaluation period.

Our next webinar – CI/CD

Lots of the work our customers do involves automation. They write scripts to automate testing, and then they want to automate running these scripts when needed. This usually means running them in a nightly build, on code check-in, etc.

testRTC caters to that through a simple-to-use API, which is what I’ll be demoing in the next webinar. And I won’t be doing it alone – I’ll be joined by Gustavo Garcia of Houseparty. Gustavo was one of our first customers to make use of our APIs in such a way.

If you want – register now

How to test network behavior in testRTC?

Earlier this week, we hosted our first webinar of 2019, something we hope to do a lot more of (once a month, if we can keep it up). This time, we focused on the network behavior of SFU media servers.

One of the things we’ve seen with our customers is that SFUs differ a lot in how they behave. You might not see much of a difference when the network is just fine, but when things get tough, that’s when it gets noticed. This is why we decided to dedicate our first webinar of the year to this topic.

There was another reason, and that’s the fact that testRTC is built to cater exactly to these situations, where controlling and configuring network conditions is something you want to do. We’ve built 4 main capabilities into testRTC to give you that:

#1 – Location of the probes

With testRTC, you can decide where you want the probes in your test to launch from.

You can use multiple locations for the same test, and we’re spread wider than what you see in the default UI (we give more locations and better granularity for enterprise customers, based on their needs).

Here’s how it looks when you configure and launch a test:

In the above scenario, I decided to use probes coming from West US and Europe locations.

Here’s how I spread a 16-browser test in yesterday’s webinar:

This allows you to test your service from different locations and see how well you’ve got your infrastructure laid out across the world to meet the needs of your customers.

It also brings us to the next two capabilities, since I also configured different networks and firewalls there:

#2 – Configuration of the probe’s network

Need to check over WiFi? 3G? 4G? Want to add some packet loss to simulate a bad 4G network connection? How about ADSL?

We’ve got that pre-configured and ready in a drop down for you.

I showed how this plays out when using various services online.

#3 – Configuration of the probe’s firewall

You can also force all media to be relayed via TURN servers by blocking UDP traffic or even block everything that isn’t port 443.

This immediately gives you 3 things:

  1. Know if you’ve got TURN configured properly
  2. The ability to stress test your TURN servers
  3. See what happens when media gets routed over TCP (it is ugly)
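
On the client side, a quick way to confirm that the relay actually kicked in (item 1 above) is to inspect the selected ICE candidate pair with the standard getStats() API:

```javascript
// Returns true if the selected local candidate is a TURN relay.
// Uses the standard WebRTC getStats() API (recent Chrome/Firefox).
async function isRelayed(pc) {
  const stats = await pc.getStats();
  let pair = null;
  stats.forEach(report => {
    if (report.type === 'transport' && report.selectedCandidatePairId) {
      pair = stats.get(report.selectedCandidatePairId);
    }
  });
  if (!pair) return false;
  const local = stats.get(pair.localCandidateId);
  return !!local && local.candidateType === 'relay';
}
```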

#4 – Dynamically controlling the probe’s network conditions

Sometimes what you want is to dynamically change network conditions. The team at Jitsi dabbled with that when they looked at Zoom (I’ve written about it on BlogGeek.me).

We do that using a script command in testRTC called .rtcSetNetworkProfile(), which I used during the webinar – what I did was this:

  • Have multiple users join the same room
  • They all stay in the room for 120 seconds
  • The first user gets throttled down to 400kbps on his network after 30 seconds
  • That lasts for 20 seconds, and then he goes back to “normal”
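
Sketched as a testRTC script, that sequence looks roughly like the following. The exact arguments of .rtcSetNetworkProfile() and the RTC_IN_SESSION_ID environment variable are assumptions here – check the testRTC documentation for the precise signatures:

```javascript
// Everyone runs normally for the first 30 seconds.
client.pause(30000);

if (process.env.RTC_IN_SESSION_ID === '1') {
  client
    .rtcSetNetworkProfile('custom', 'bandwidth', 400000) // throttle user #1 to 400kbps
    .pause(20000)
    .rtcSetNetworkProfile(''); // back to "normal"
} else {
  client.pause(20000);
}

// Stay in the room for the remainder of the 120 seconds.
client.pause(70000);
```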

It looks something like this when seen from one of the other users’ graphs:

The red line represents the outgoing bitrate, which is just fine – that leg runs in front of the SFU, and there’s no disturbance on its network. The blue line – the incoming bitrate – drops down to almost zero, and takes some time to recuperate.

The webinar and demo

Most of the webinar was a long demo session. You can view it all here:

You can open up your own testRTC account and play with our service a bit during an evaluation period.

Our next webinar – monitoring

Here’s a kicker – I started working on our next webinar about a month ago. It has to do with monitoring and the things we can do there. I’ve even had 3 monitors running for a month now, just for that purpose:

That first one with the reds in it? That’s AppRTC… and it failed right at the time we did our webinar on network testing. I had planned to use it to show some things, so I reverted to showing results of test runs from a day earlier.

Anyway, monitoring is what our next webinar is about.

I am going to show you how to set it up and how to connect it to third party services – in this case, Zapier and Google Sheets, where more analysis will take place.

If you want – register now

[Webinar Recording] Creating a Kickass WebRTC Monitor

A few weeks ago, we hosted a webinar on creating an active monitoring system for your WebRTC application. Obviously, we used testRTC for that.

We went through the following topics:

  • Why is WebRTC monitoring different from VoIP or Web monitoring? (because it is a bit of both)
  • What do we mean when we say active monitoring? (checking for uptime and service quality in a predictable and reproducible fashion, and without violating user privacy or data compliance)
  • How to actually write and configure a monitor in testRTC, and then connect it to Slack for alerts (done as a live demo on our platform)
  • When and for what scenarios to use testRTC (there are quite a few that we see customers aiming for)

The recording is now available on YouTube:

If you are looking to improve the stability, quality and uptime of your WebRTC application, then we’re here to help you. Contact us to learn more.

WebRTC Application Monitoring: Do you Wipe or Wash?

UPDATE: Recording of this webinar can be found here.

If you are running an application then you are most probably monitoring it already.

You’ve got New Relic, Datadog or some other cloud service or on premise monitoring setup handling your APM (Application Performance Management).

What does that mean exactly with WebRTC?

If we do the math, you’ve got the following servers to worry about:

  • STUN/TURN servers, deployed in one or more (probably more) data centers
  • Signaling server, at least one. Maybe more when you scale the service up
  • Web server, where you actually host your application and its HTML pages
  • Media servers, optionally deployed to handle recording or group calls (look at our Kurento sizing article for some examples)
  • Database, while you might not have this, most services do, so that’s another set of headaches
  • Load balancers, distributed memory data grids (think Redis), etc.

Lots and lots of servers in that backend of yours. I like to think of them as moving parts: every additional server you add and every new type of server you introduce is another moving part. Another system that can fail. Another system that needs to be maintained and monitored.

WebRTC is a very generous technology when it comes to the variety of servers it needs to run in production.

Assuming you’re doing application monitoring on these servers, you are collecting all the machine characteristics: CPU use, bandwidth, memory, storage. For the various servers you can go further and collect specific application metrics.

Is that enough? Aren’t you missing something?

Here are 4 quick stories we’ve heard in the last year.

#1 – That Video Chat Feature? It Is Broken

We’re still figuring out this whole embeddable communications trend. The idea of companies taking WebRTC and shoving voice and video calling capabilities into an existing product and workflow. It can be a project management tool, doctor visitations, a meeting scheduler, etc.

In some cases, the interactions via WebRTC are an experiment of sorts. A decision to attempt embedding communications directly into the existing product, instead of having users find their own way to communicate (phone calls and Skype were the most common alternatives).

Treated as experiments, such integrations sometimes fell out of focus, and the development teams rushed to handle other tasks within the core product, as so often happens.

In one such case, the company used a CPaaS vendor to get that capability integrated with their service, so they didn’t think much about monitoring it.

At least not until they found out one day that their video meetings feature had been malfunctioning for over two weeks (!). Customers tried using it, failed, and just moved on, until someone complained loudly enough.

The problem ended up being the use of a deprecated CPaaS SDK that should have been upgraded and wasn’t.

#2 – But Our Service is Working. Just not the Web Calling Part

In many cases, there’s an existing communication product that does most of its “dealings” over PSTN and regular phone numbers. Then one day, someone decides to add browser dialing. The next thing you know, you’ve got a core communications product with a new WebRTC-based feature in there.

Things are great and calls are being made. Until one day a customer calls to complain. He had embedded a call button on his website, but people stopped calling him from the site. This went on for a couple of days while he tweaked his business and tried to figure out what was wrong – until he found out that the click-to-call button on the website just didn’t work anymore.

Again, all the monitoring and health check metrics were fine, but the integration point of WebRTC to the rest of the system was somewhat lost.

The challenge here was that this got caught by a customer who was paying for the service. What the company wanted at that point was to make sure this wouldn’t repeat itself. They wanted to know about their integration issues before their customers do.

#3 – Where’s My Database When I Need it?

Here’s another one. A customer of ours has a hosted unified communications service that runs from the browser. You log in with your credentials, see a contact list, and can dial anyone or receive calls right inside the browser.

They decided to create a monitor with us that runs at a low frequency, doing the exact same thing: two people log in, one calls and the other answers, and the monitor checks that there’s audio and video and all is well.

One time they contacted us complaining that our monitor was failing while they knew their system was up and running. So we opened up a failed monitor run, looked at the screenshot we collect automatically upon failure, and saw an error on the screen – the browser just couldn’t fetch the user’s address book after logging in.

This had nothing to do with WebRTC. It was a faulty connection to the database, but it ended up killing the service. They got that pinpointed and resolved after a couple of iterations. For them, it was all about the end-to-end experience and making sure it works properly.

#4 – The Doctor Won’t See You Now

Healthcare is another interesting area for us. We’ve got customers in this space doing both testing and monitoring. The interesting thing about healthcare is that doctor visitations aren’t a 24/7 thing. For that particular customer, it was a 3-hour daily shift.

The service was operating outside of the normal working hours of the doctor’s office, with the idea of offering patients a way to get a doctor during the evening hours.

With a service running only part of the day, the company wanted to be certain that the service was up and running properly – and to know about any issues as early as possible, so they could be resolved before the doctors started their shift.

End-to-End Monitoring to the Rescue

In all of these cases, the servers were up and running. The machines were humming along, but the service itself was broken. Why? Because application metrics tell a story, but not the whole story. For that, you need end-to-end monitoring. You need a way to run a real session through the system to validate that all of its pieces – all of its moving parts – are working well TOGETHER.

Next week, we will be hosting a webinar. In this webinar, we will show step by step how you can create a killer monitor for your own WebRTC application.

Oh – and we won’t focus only on working/not-working scenarios. We will show you how to catch quality degradation issues in your service.

I’ll be doing it live, giving some tips and spending time explaining how our customers use our WebRTC monitoring service today – what types of problems they are solving with it.

Join me:

Creating a Kickass WebRTC Monitor Using testRTC (the recording can be found here)


Advanced Testing: Manipulating getUserMedia and Available Devices

Philipp Hancke is not new here on our blog. He assisted us when we wrote the series on webrtc-internals. He is also not squeamish about writing his own testing environment and sharing the love. This time, he wanted to share a piece of code that takes device availability test automation in WebRTC to a new level.

Obviously… we said yes.

We don’t have that implemented in testRTC yet, but if you are interested – just give us a shout out and we’ll prioritize it.

Both Chrome and Firefox have quite powerful mechanisms for automating getUserMedia with fake devices and skipping the permission prompt.

In Chrome, this is controlled by the use-fake-device-for-media-stream and use-fake-ui-for-media-stream command line flags, while Firefox offers a preference, media.navigator.streams.fake. See the webdriver.js helper in this repository for the gory details of how to use this with Selenium.
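
For reference, wiring those flags up with selenium-webdriver looks roughly like this sketch of what such a helper does:

```javascript
// Launch Chrome with fake devices and no permission prompt; the Firefox
// preference achieves the same there. (Run inside an async function.)
const { Builder } = require('selenium-webdriver');
const chrome = require('selenium-webdriver/chrome');
const firefox = require('selenium-webdriver/firefox');

const chromeOptions = new chrome.Options().addArguments(
  'use-fake-device-for-media-stream',
  'use-fake-ui-for-media-stream'
);
const firefoxOptions = new firefox.Options()
  .setPreference('media.navigator.streams.fake', true);

const driver = await new Builder()
  .forBrowser('chrome') // or 'firefox'
  .setChromeOptions(chromeOptions)
  .setFirefoxOptions(firefoxOptions)
  .build();
```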

However, there are some scenarios which are not testable this way:

  • getUserMedia returning an error
  • restricting the list of available devices

While most of these are typically handled by unit tests, sometimes it is nice to test the complete user experience for a couple of use-cases:

  • test the behaviour of a client with only a microphone
  • test the behaviour of a client with only a camera
  • test the behaviour of a client with neither camera nor microphone
  • combine those tests with screen sharing which in some cases replaces the video track on appear.in
  • test audio-only clients interoperating with audio-video ones. The test matrix becomes pretty big at some point.

Those tests are particularly important because as developers we tend to do some manual testing on our own machines which tend to be equipped with both devices. Automated tests running on a continuous integration server help a lot to prevent regressions.

Manipulating APIs with an extension

In order to manipulate both APIs, I wrote a Chrome extension (which magically works in Firefox and Edge, because both support WebExtensions) that makes them controllable.

An extension can inject JavaScript into the page on page load as a content script. This has been used by the webrtc-externals extension described on webrtchacks to wrap the whole RTCPeerConnection API.

In our case, the content script replaces the getUserMedia and enumerateDevices functions with wrappers that can be modified at runtime. For example, the enumerateDevices wrapper calls the original function and then uses JavaScript to modify the result before returning it to the caller:
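
Conceptually, the wrapper looks like the sketch below. The sessionStorage flag name used here is made up for illustration – the real one lives in the extension’s source:

```javascript
// Content-script sketch: wrap enumerateDevices and filter its result based on
// a sessionStorage flag. 'removeDeviceKinds' is an illustrative flag name.
const origEnumerateDevices =
    navigator.mediaDevices.enumerateDevices.bind(navigator.mediaDevices);
navigator.mediaDevices.enumerateDevices = function() {
  return origEnumerateDevices().then(devices => {
    const remove = JSON.parse(sessionStorage.removeDeviceKinds || '[]');
    return devices.filter(device => !remove.includes(device.kind));
  });
};
```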

The full extension can be found on GitHub. The behaviour is dynamic and can be controlled via sessionStorage flags. With Selenium, one would typically navigate to a page in the same domain, execute a small script to set the sessionStorage flags as desired, and then navigate to the page that is to be tested.
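
That Selenium flow, sketched out (the flag name and URLs are illustrative):

```javascript
// Navigate to a same-domain page, set the flag, then load the page under test.
await driver.get('https://appear.in/');
await driver.executeScript(
    "sessionStorage.removeDeviceKinds = JSON.stringify(['audioinput']);");
await driver.get('https://appear.in/some-room');
```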

We will walk through two examples now:

Use-case: Have getUserMedia return an error and change it at runtime

Let’s say we want to test the case where a user has denied permission. For appear.in, this leads to a dialog that attempts to guide the user through the browser UX for changing that.

The full test can be found here. Like most Selenium tests, it consists of a series of simple and straightforward steps:

  • build a Selenium WebDriver instance that allows permissions and loads the extension
  • go to the appear.in homepage
  • set the extension’s sessionStorage flag to make getUserMedia fail with a NotAllowedError (i.e. the user has denied permission), as well as an appear.in-specific localStorage property that says the visitor is returning – this ensures we go into the flow we want to test and not into the “getUserMedia primer” that is shown to first-time users.
  • join an appear.in room by loading the URL directly.
  • the next step would typically be asserting the presence of certain DOM elements guiding the user to change the denied permission. This is omitted here, as those elements change rather frequently, and is replaced with a three-second sleep which allows for a visual inspection. It should look like this:
  • the sessionStorage flag is deleted
  • this eventually leads to the user entering the room and video showing up. We do some magic here in order to avoid having to ask the user to refresh the page.

Watch a video of this test running below:


Incidentally, that dialog had an “enter anyway” button which, due to the lack of testing, was not visible for quite some time without anyone noticing, because the visual regression tests could not reach this stage. Now that is possible.

Restricting the list of available devices

The fake devices in both Chrome and Firefox return a stream with exactly the properties you ask for, and they always succeed (in Chrome there is a way to make them always fail, too). In the real world, you need to deal with users who don’t have a microphone or a camera attached to their machine. A call to getUserMedia would then fail with a NotFoundError (note the recent change in Chrome 64, or simply use adapter.js and write spec-compliant code today).
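
Spec-compliant handling of that case is straightforward – a minimal example:

```javascript
// Handle machines without the requested devices gracefully.
// (Run inside an async function.)
try {
  const stream = await navigator.mediaDevices.getUserMedia({
    audio: true,
    video: true
  });
  // ... attach the stream to the page
} catch (err) {
  if (err.name === 'NotFoundError') {
    // No camera or microphone found - e.g. retry with audio only.
  }
}
```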

The common way to avoid this is to figure out what is available by enumerating the list of devices with enumerateDevices – for instance, by pasting something like this into the JavaScript console:
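
```javascript
// List all media devices the browser knows about (standard API).
navigator.mediaDevices.enumerateDevices().then(devices => {
  devices.forEach(d => console.log(d.kind, d.label, d.deviceId));
});
```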


When you run this together with the fake device flag you’ll notice that it provides two fake microphones and one fake camera device:

When the extension is loaded (which for manual testing can be done via chrome://extensions; see above for the Selenium way to do it), one can manipulate that list:
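
For example, hiding the audio inputs (again, using the illustrative flag name from the wrapper sketch above):

```javascript
// Make the wrapped enumerateDevices drop all audio input devices.
sessionStorage.removeDeviceKinds = JSON.stringify(['audioinput']);
```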

Paste the enumerateDevices snippet into the console again, and the audio devices no longer show up:

At appear.in we used this to replace a couple of audio-only and video-only tests that used feature flags in the application code with more realistic behaviour. The extension allows a much cleaner separation between the frontend logic and the test logic.

Summary

Using a tiny WebExtension, we could easily extend the already powerful WebRTC testing capabilities of the browsers and cover more advanced test scenarios. With this approach, it would even be possible to simulate events like the user unplugging the microphone in the middle of a call.