
VoIP Network Tests in the era of WebRTC

Not sure what got to me last week, but I wanted to see what type of network testing for VoIP exists out there. This took me down memory lane to what felt like the wild west of the 90’s world wide web.

You can do that yourself! Just search for “voip network test” on Google and check what the tests look like. They come in exactly two shapes and sizes:

  1. A generic speed test
  2. Download a test app

Neither of these methods is good. They are either inaccurate or full of friction.

The ones hosting these network tests are UCaaS vendors, trying to entice customers to come their way. The idea is, you run a test, and they nicely ask you how many phone lines you’d like a quote for…

So what’s wrong with that?

1. Generic speed tests aren’t indicative of ability to conduct VoIP calls

Most of the solutions I found out there were just generic speed tests: embedding a third-party network test page, or going to the length of installing your own speed-testing server. That’s fine, but does it actually answer the question the user wants answered?

Here’s an interesting example where bandwidth speeds are GREAT but support for VoIP or WebRTC – not so much:

Great bandwidth, but no UDP available – a potential for bad VoIP call quality

I used one of our Google Cloud machines to try this out. It passes the speed test beautifully. What does that say about the VoIP quality I’ll get on it? Not much.

For that same device on the same network, I am getting blocked over UDP. VoIP is conducted over UDP to maintain low latency and to handle packet losses (which happen on any network at one point or another).

This isn’t limited only to wholesale blocking of UDP traffic. Other aspects such as the use of a VPN, throttling of UDP, introduction of latency, access to the media devices – all these are going to affect the user’s experience and in many cases his ability to use your VoIP service.
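Here’s a minimal sketch of how you could check the UDP part from the browser itself – force relay-only ICE against your own TURN server over UDP and see whether a relay candidate is ever gathered. The TURN URL and credentials below are placeholders, not a real endpoint:

```typescript
// Probe whether UDP towards our TURN server is usable at all.
// turn.example.com and the credentials are placeholders for your own infrastructure.
async function canRelayOverUdp(): Promise<boolean> {
  const pc = new RTCPeerConnection({
    iceServers: [{
      urls: 'turn:turn.example.com:3478?transport=udp',
      username: 'user',
      credential: 'secret',
    }],
    iceTransportPolicy: 'relay', // only relayed candidates are of interest here
  });
  pc.createDataChannel('probe'); // we need at least one m-line to trigger ICE gathering

  const gotRelay = new Promise<boolean>((resolve) => {
    const timer = setTimeout(() => resolve(false), 5000); // give up after 5 seconds
    pc.onicecandidate = (e) => {
      if (e.candidate && e.candidate.type === 'relay') {
        clearTimeout(timer);
        resolve(true); // a relay allocation over UDP succeeded
      }
    };
  });

  await pc.setLocalDescription(await pc.createOffer());
  const result = await gotRelay;
  pc.close();
  return result;
}
```

If this comes back false while a generic speed test passes with flying colors, you are looking at exactly the situation described above.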

👉 Relying only on a generic speed test is useless at best and misleading at worst.

2. Downloading test apps is not what you expect to do in 2021

In some cases, speed test services ask you to download and install an application.

There’s added friction right there. What if the user doesn’t have permission to install applications on his device? What if he is running on Linux? What if the user isn’t technically savvy?

I tried out one of these so-called downloadable speed tests.

I clicked the “Start test” button. After some 10 seconds of waiting, it downloaded an executable to my machine. No further prompts or explanations were given.

That brought up the Windows 10 installation screen, with a name different from that of the vendor whose site I was on.

Deciding to install, I clicked again, only to be prompted by another installation window.

Next clicks? EULA, Opt-in, Folder selection, Finish

So… I had to agree to an EULA, actively remove an opt-in, select the folder to install to (there was a default there), was reminded that it is now running in the background (WHY? For what purpose?), and then clicked Finish.

It got me results, but at what cost and at what friction level for the end user?

In this specific case, all of this happened before I even made a decision to use that service provider. And I had to:

  • Click on 6 buttons to get there
  • Sign a legal document (EULA)
  • Opt out from something (so it won’t leave ghosts on my machine)
  • Remember to go and delete what was downloaded

And then there’s the challenge of the multiple popups and screen focus changes that took place throughout the experience.

The results might be accurate and useful, but there are better ways.

👉 Having a downloadable, installable test adds friction and limits usability for your users.

What to look for in a VoIP network test?

There’s a dichotomy between the available solutions out there: they are either simple to use and grossly inaccurate, or they are accurate and complex to use.

Then there’s the fact that they answer only a single question – is there enough bandwidth? – and pay far less attention to other network aspects like firewall and VPN configurations.

From our own discussions with clients and users, here’s what we learned in the last two years about what VoIP network tests should look like:

  • Simple to use
    • Simple for the end user to start the test
    • Simple for the support/IT person to see the results
    • Simple to read and understand the results
  • Specific to your infrastructure
    • A generic test is great, but isn’t accurate
    • Something that tests the network needs to test your infrastructure directly. If that’s impossible, then the best possible approximation to it
  • Supports your workflow
    • Ability to collect data you need about the user
    • Easily see the results on your end, to assist the client
    • Customizable to your business processes and use cases

Check qualityRTC

For the past two years or so we’ve been down this rabbit hole of VoIP network testing at testRTC. We designed and built a service to tackle this problem and, with a lot of help from our customers, we’ve improved on it (and still are) to the point where it is today:

A simple-to-use, customizable solution that fits your infrastructure and workflow.

Within minutes, the user will know if his network is good enough for your service, and your support will have all the data points it needs to assist your user in case of connectivity issues.

Check out our friction-free solution, and don’t forget to schedule a demo!

Testing large scale WebRTC events on LiveSwitch Cloud

If you are developing WebRTC applications that target large scale events – think hundreds of users in a single “room” – then you should continue reading.

LiveSwitch Cloud by Frozen Mountain is a modern CPaaS offering focused around video communications. Naturally it makes use of WebRTC and relies on the long heritage and capabilities of Frozen Mountain in this space. Frozen Mountain has transitioned from a vendor specializing in SDKs and media servers you can host on your own to also providing a managed cloud service. In essence, dogfooding their technology.

One of the strong markets that Frozen Mountain operates in is the entertainment industry, where large scale online virtual events are becoming the norm. One such recent testRTC client used our WebRTC stress testing capabilities to validate their scenario prior to a large event.

This client’s scenario included segmenting the audience of a live event into groups of 25 viewers that could easily be monitored by producers in a studio control room and displayed to performers as a virtual audience that they could see, hear, and interact with during the event. We settled on 36 such segments, totalling 900 viewers in this WebRTC stress test.
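To give a feel for the structure, here is an illustrative mapping from probe to segment – the numbers come from the scenario above, but the function names are mine, not the client’s actual test code:

```typescript
// Illustrative only: map each of the 900 probes onto one of 36 segments of 25 viewers.
const VIEWERS_PER_SEGMENT = 25;
const SEGMENTS = 36;
const TOTAL_PROBES = VIEWERS_PER_SEGMENT * SEGMENTS; // 25 * 36 = 900 viewers

function segmentForProbe(probeIndex: number): number {
  // probeIndex runs 0..899; each consecutive block of 25 probes forms one segment
  return Math.floor(probeIndex / VIEWERS_PER_SEGMENT);
}
```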

Here is a sample test run from the work done:

The graph above shows the 900 WebRTC probes that were used in one of these tests. The blue line denotes the average incoming bitrate over time of the main event as seen by each of the viewers. The red line is the outgoing bitrate. Since these viewers are used to convey an atmosphere to the event, there was no need to have them stream high bitrates – having 900 of them meant a lot of pixels in aggregate even at their low bitrate. You can see how the incoming bitrate stabilizes at around 2 Mbps for all the viewers.

This graph shows, for each of the 900 WebRTC browser probes, the average bitrate it had throughout the test. It is a slightly different view of the same data, meant to find outliers.

There are only slight variations across a few of the probes, which indicates a stable system overall.
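For reference, the per-viewer numbers behind graphs like these can be derived from WebRTC’s own statistics API. A minimal sketch (browser-side, assuming you already hold the viewer’s RTCPeerConnection):

```typescript
// Sample inbound-rtp bytesReceived twice and derive the average incoming bitrate.
async function inboundBitrateKbps(pc: RTCPeerConnection, windowMs = 10_000): Promise<number> {
  const bytesIn = async (): Promise<number> => {
    let total = 0;
    (await pc.getStats()).forEach((report) => {
      if (report.type === 'inbound-rtp') {
        total += report.bytesReceived ?? 0;
      }
    });
    return total;
  };

  const before = await bytesIn();
  await new Promise((resolve) => setTimeout(resolve, windowMs));
  const after = await bytesIn();
  return ((after - before) * 8) / windowMs; // bits per millisecond == kilobits per second
}
```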

What was great about this one is the additional work Frozen Mountain did on their end: the viewers were split into segments that had to be filled randomly, as they would be in real life – each user joining in at their own pace, as opposed to packing the segments one after the other like automatons.

The above animation was created by Frozen Mountain to illustrate the audience. Each square is a user, and each segment/pool has 25 users in it. You can see how the 900 probes from testRTC randomly fill out the audience to capacity.
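The join pattern itself is easy to approximate in code. A hypothetical sketch of the idea – shuffle the probe order and add a random delay per probe, so pools fill up organically rather than in lockstep (the actual joining is left to a caller-supplied function, since that part is application specific):

```typescript
// Fisher-Yates shuffle so probes don't join in index order.
function shuffled<T>(items: T[]): T[] {
  const a = [...items];
  for (let i = a.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [a[i], a[j]] = [a[j], a[i]];
  }
  return a;
}

// Ramp up the audience with per-probe random jitter instead of joining all at once.
async function rampUpAudience(
  probeIds: number[],
  join: (id: number) => Promise<void>, // caller-supplied: how one probe joins the event
  maxJitterMs = 60_000,
): Promise<void> {
  await Promise.all(
    shuffled(probeIds).map(async (id) => {
      await new Promise((resolve) => setTimeout(resolve, Math.random() * maxJitterMs));
      await join(id);
    }),
  );
}
```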

Testing for live WebRTC events at scale

We are seeing a different approach to testing recently.

As we are shifting from nice-to-have and proof-of-concept projects to production systems, there is a bigger need to thoroughly test the performance and scale of WebRTC applications. This is doubly true for large events – ones that are broadcast live to audiences. Such events take place in two different industries: entertainment and enterprise.

Within the entertainment industry, it is about working around the pandemic – being able to bring audiences back to the stadiums and theatre halls, albeit remotely. With enterprises it is a lot about virtual town halls, sales kickoffs and corporate team building where everyone is sheltered at home.

In both these industries the cost of a mistake is high since there is no second chance. You can’t really rerun that same match or reschedule that town hall. Especially not with so many people and planning involved to make this event happen.

End-to-end stress testing is an important milestone here. While media server frameworks and CPaaS vendors do their own testing, such solutions need to be tested end-to-end for scale. Bottlenecks can occur anywhere in the system and the only real way to find these bottlenecks is through rigorous stress testing.

Being able to create a test environment quickly and scale it to full capacity is paramount for the success of the platform used for such events, and it is where a lot of our efforts have been going in recent months, as we see more vendors approaching us to help them with these challenges.

What we did on our end was solve some bottlenecks in our infrastructure that “held us back” and had limited us to assisting our clients with only up to 2,000 probes in a single test. We can now do more, and with greater flexibility.

WebRTC Application Monitoring: Do you Wipe or Wash?

UPDATE: Recording of this webinar can be found here.

If you are running an application then you are most probably monitoring it already.

You’ve got New Relic, Datadog or some other cloud service or on premise monitoring setup handling your APM (Application Performance Management).

What does that mean exactly with WebRTC?

If we do the math, you’ve got the following servers to worry about:

  • STUN/TURN servers, deployed in one or more (probably more) data centers
  • Signaling server, at least one. Maybe more when you scale the service up
  • Web server, where you actually host your application and its HTML pages
  • Media servers, optionally, you’ll have media servers to handle recording or group calls (look at our Kurento sizing article for some examples)
  • Database, while you might not have this, most services do, so that’s another set of headaches
  • Load balancers, distributed in-memory data grids (think Redis), etc.

Lots and lots of servers in that backend of yours. I like to think of them as moving parts. Every additional server that you add. Every new type of server you introduce. It adds a moving part. Another system that can fail. Another system that needs to be maintained and monitored.

WebRTC is a very generous technology when it comes to the variety of servers it needs to run in production.

Assuming you’re doing application monitoring on these servers, you are collecting all machine characteristics. CPU use, bandwidth, memory, storage. For the various servers you can go further and collect specific application metrics.

Is that enough? Aren’t you missing something?

Here are 4 quick stories we’ve heard in the last year.

#1 – That Video Chat Feature? It Is Broken

We’re still figuring out this whole embeddable communications trend. The idea of companies taking WebRTC and shoving voice and video calling capabilities into an existing product and workflow. It can be project management tools, doctor visitations, meeting scheduler, etc.

In some cases, the interactions via WebRTC are an experiment of sorts – a decision to attempt embedding communications directly into the existing product instead of having users figure out on their own how to communicate (phone calls and Skype were the most common alternatives).

Treated as an experiment, such integrations sometimes fell out of focus, and the development teams rushed to handle other tasks within the core product, as so often happens.

In one such case, the company used a CPaaS vendor to get that capability integrated with their service, so they didn’t think much about monitoring it.

At least not until they found out one day that their video meetings feature was malfunctioning for over two weeks (!). Customers tried using it and failed and just moved on, until someone complained loud enough.

The problem ended up being the use of a deprecated CPaaS SDK that had to be upgraded and wasn’t.

#2 – But Our Service is Working. Just not the Web Calling Part

In many cases, there’s an existing communication product that does most of its “dealings” over PSTN and regular phone numbers. Then one day, someone decides to add browser dialing. Next thing that happens, you’ve got a core product doing communications with a new WebRTC-based feature in there.

Things are great and calls are being made – until one day a customer calls to complain. He embedded a call button on his website, but people stopped calling him from the site. This went on for a couple of days while he tweaked his business and tried to figure out what was wrong, until he found out that the click-to-call button on the website just didn’t work anymore.

Again, all the monitoring and health check metrics were fine, but the integration point of WebRTC to the rest of the system was somewhat lost.

The challenge here was that this got caught by a customer who was paying for the service. What the company wanted to do at that point was to make sure this doesn’t repeat itself. They wanted to know about their integration issues before their customers do.

#3 – Where’s My Database When I Need it?

Here’s another one. A customer of ours has this hosted unified communications service that runs from the browser. You login with your credentials, see a contacts list and can dial anyone or receive calls right inside the browser.

They decided to create a monitor with us that runs at a low frequency doing the exact same thing: two people logging in, one calls and the other answers. Checking that there’s audio and video and all is well.

One time they contacted us complaining that our monitor was failing while they knew their system was up and running. So we opened up a failed monitor run, looked at the screenshot we collect automatically upon failure and saw an error on the screen – the browser just couldn’t get the user’s address book after logging in.

This had nothing to do with WebRTC. It was a faulty connection to the database, but it ended up killing the service. They got that pinpointed and resolved after a couple of iterations. For them, it was all about the end-to-end experience and making sure it works properly.

#4 – The Doctor Won’t See You Now

Healthcare is another interesting area for us. We’ve got customers in this space doing both testing and monitoring. The interesting thing about healthcare is that doctor visitations aren’t a 24/7 thing. For that particular customer it was a 3-hour shift each day.

The service was operating outside of the normal working hours of the doctor’s office, with the idea of offering patients a way to get a doctor during the evening hours.

With a service running only part of the day, the company wanted to be certain that the service is up and running properly – and know about it as early on as possible to be able to resolve any issues prior to the doctors starting their shift.

End-to-End Monitoring to the Rescue

In all of these cases, the servers were up and running. The machines were humming along, but the service itself was broken. Why? Because application metrics tell a story, but not the whole story. For that, you need end-to-end monitoring. You need a way to run a real session through the system to validate that all of its pieces – all of its moving parts – are working well TOGETHER.
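What does one small building block of such a monitor look like? Here is an illustrative sketch – with a placeholder selector rather than your real app – of asserting that remote media is actually being rendered, not just that a page loaded:

```typescript
// Check that the remote <video> element keeps advancing, i.e. media is really flowing.
// 'video#remote' is a placeholder selector - replace it with your application's element.
async function remoteVideoIsAlive(selector = 'video#remote', sampleMs = 3000): Promise<boolean> {
  const video = document.querySelector<HTMLVideoElement>(selector);
  if (!video || video.readyState < 2) {
    return false; // element missing, or no decodable media arrived yet
  }
  const before = video.currentTime;
  await new Promise((resolve) => setTimeout(resolve, sampleMs));
  return video.currentTime > before; // playback moved forward => the end-to-end path works
}
```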

Next week, we will be hosting a webinar. In this webinar, we will show step by step how you can create a killer monitor for your own WebRTC application.

Oh – and we won’t only focus on working/not working type of scenarios. We will show you how to catch quality degradation issues of your service.

I’ll be doing it live, giving some tips and spending time explaining how our customers use our WebRTC monitoring service today – what types of problems they are solving with it.

Join me:

Creating a Kickass WebRTC Monitor Using testRTC
recording can be found here

 

Do Browser Vendors Care About Your WebRTC Testing?

It is 2017 and it seems that browser vendors are starting to think of all of us WebRTC developers and testers. Well… not all the browser vendors… and not all the time – but I’ll take what I am given.

I remember years ago when I managed the development of a VoIP stack, we decided to rewrite our whole test application from scratch. We switched from the horrible “native” Windows and Unix UI frameworks to a cross platform one – Tcl/Tk (yes. I know. I am old). We also took the time to redesign our UI, trying to make it easier for us and our developers to test the APIs of the VoIP stack. These were the good ol’ days of manual testing – automation wasn’t even a concept for us.

This change brought with it a world of pain to me. I had almost daily fights with the test manager, who had her team file bugs that from my perspective were UI issues and not the product’s issues. While that was true, fixing these bugs and adding more tooling for our testing team ended up making our product better and more developer-friendly – an important factor for a product used by developers.

Things aren’t much different in WebRTC-land and browsers these days.

If I had to guess, here’s what I’d say is happening:

  • Developers are the main customers of WebRTC and the implementation of WebRTC in browsers
  • Browser vendors are working hard on getting WebRTC to work, but at times neglected this minor issue of empowering developers with their testing needs
  • Testing tools provided by browsers specifically for WebRTC are second class citizens when it comes to… well… almost everything else in the browser

The First 5 Years

Up until now, Chrome was the most accommodating browser out there when it came to us being able to adopt it and automate it for our own needs. It was never easy even with Chrome, but it is working, so it is hard to complain.

Chrome gives us out of the box the following set of capabilities:

  1. Support for Selenium and WebDriver, which allows us to automate it properly (for most versions, most of the time, when things don’t suddenly break on us). Firefox has similar capabilities
  2. The webrtc-internals Chrome tab with all of its goodness and data
  3. Ability to easily replace raw inputs of camera and microphone with media files (even if at times this capability is buggy) – see the sketch right after this list
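To make points 1 and 3 concrete, here is a sketch of how that usually looks with the Node selenium-webdriver package – the media file paths are placeholders, and Chrome expects y4m video and wav audio for them:

```typescript
import { Builder } from 'selenium-webdriver';
import * as chrome from 'selenium-webdriver/chrome';

// Launch Chrome with fake media devices, feeding files instead of a real camera/mic.
async function launchChromeWithFakeMedia() {
  const options = new chrome.Options().addArguments(
    '--use-fake-device-for-media-stream',               // synthetic camera and microphone
    '--use-fake-ui-for-media-stream',                    // auto-accept getUserMedia prompts
    '--use-file-for-fake-video-capture=/tmp/input.y4m',  // placeholder video file
    '--use-file-for-fake-audio-capture=/tmp/input.wav',  // placeholder audio file
  );
  return new Builder().forBrowser('chrome').setChromeOptions(options).build();
}
```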

We’ve had our share of Chrome bugs that we had to file or star to get specific features to work. Some of it got solved, while others are still open. That’s life I guess – you win some and you lose some.

Firefox was not that fun, to say the least. We’ve been struggling for a long time with it, trying to get it to behave with Selenium inside a Docker container. The end result never got beyond 5 frames per second. Somehow, the combination of technologies we’ve been using didn’t work, and we never got Mozilla’s attention to take a look at it – it may well be our own ignorance of how and where to nag the Mozilla team to get that attention 🙂

Edge? Had nothing – or at least not close to the level that Chrome and Firefox have on offer. We will get there. Eventually.

This has been the status quo for quite some time. Perhaps the whole 5 years of WebRTC’s existence.

But now things are changing.

And they are becoming rather interesting.

Mozilla Wiresharking

Mozilla introduced last month the ability to log RTP headers in Firefox WebRTC sessions.

While Chrome had something similar for quite some time, Firefox took this a step further:

“Bug 1343640 adds support in Firefox version 55 to log the RTP header plus the first five bytes of the payload unencrypted. RTCP will be logged in full and unencrypted.”

The best thing though? Mozilla also shared a script that can convert these logs to PCAP files, making them readable in Wireshark – a popular open source tool for analyzing network traffic.

The end result? You can now analyze with more clarity what goes on the network and how the browser behaves – especially if you don’t have a media server in the middle (or if you haven’t invested in tools that enable you to analyze it already).

This isn’t a first for Mozilla. It seems that lately, they have been sharing some useful information and pieces of code on their new Advancing WebRTC blog – a definite resource you should be following if you aren’t already.

Edge Does BrowserStack

Microsoft has been on a very positive streak lately. For over a year now, most of the Microsoft announcements are actually furthering the cause of their customers and developers without creating closed gardens – something that I find refreshing.

When it comes to WebRTC, Microsoft recently released a new version of Edge (still in beta) that is interoperable with Chrome and Firefox – on the codec level. While that was a rather expected move, the one we saw last week was quite surprising and interesting.

An Edge testing partnership with BrowserStack: If you want to test your web app on the Edge browser, you can now use BrowserStack for free to do that (there are a few free plans there for it).

How does WebRTC come into play here? As an enabler of a new feature that got introduced there:

See how that Edge window inside a Chrome app running on a Mac looks?

Guess what – BrowserStack are using WebRTC to enable this screen casting feature. While the original Microsoft announcement removed any trace of WebRTC from it, you can still find that over the web (here, here and here for example). For the geeks, we have a webrtc-internals dump!

The feature is called “Live Testing” at BrowserStack and offers the ability to run a cloud machine running Windows 10 and the Edge browser – and have that machine stream its virtual screen to your local machine – all assuming the local browser you are using for it all supports WebRTC.

In a way, this is a replacement of VNC (which is what we use at testRTC to offer this capability).
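To be clear, I have no visibility into BrowserStack’s actual code, but the general technique is simple enough to sketch: capture the screen as a media stream and attach it to a peer connection. In current browsers that is navigator.mediaDevices.getDisplayMedia(); in 2017-era Chrome the capture side still required an extension.

```typescript
// Illustrative sketch of screen casting over WebRTC (not BrowserStack's actual code):
// grab the screen as a MediaStream and hand its tracks to an RTCPeerConnection.
async function castScreen(pc: RTCPeerConnection): Promise<void> {
  const stream = await navigator.mediaDevices.getDisplayMedia({ video: true });
  stream.getTracks().forEach((track) => pc.addTrack(track, stream));
  // Offer/answer exchange and ICE signaling are application specific and omitted here.
}
```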

Is this coming from Microsoft? From BrowserStack?

I don’t really think it matters. It shows how WebRTC is getting used in new ways and how browser vendors are a major part of this change.

Will Google even care?

Google has been running along with WebRTC, practically on their own.

Yes. Mozilla with Firefox was there from the beginning. Microsoft is joining with Edge. Apple is slowly being dragged into it, if you follow the rumor mill.

But Google has been setting the tone through the initial acquisitions it made and the ongoing investment in it – both in engineering and in marketing. The end result of Google’s investments (not only in WebRTC but in everything HTML5 related)? Desktop browser market share dominance.

With these new toys that other browser vendors are giving us developers and testers – may that be something to reconsider and revisit? We are the early adopters of browsers, and we usually pick and choose the ones that offer us the greater power and enable us to speed our development efforts.

I wonder if Google will answer in turn with its own new tools and initiatives or continue in their current trajectory.

Should we expect better tooling?

Yes. Definitely.

WebRTC is hard to develop compared to other HTML5 technologies and it is a lot harder to test. Test automation frameworks and commercial offerings tend to focus on the easier problems of browser testing and they often neglect WebRTC, which is where we try to fill in these gaps.

I, for one, would appreciate a few more trinkets from browser vendors that we could adopt and use at testRTC.

Shocking – most apps are dealing with production bugs

I am not sure about you, but I get bored easily when people tell me a bug costs more in production than it does earlier on in the development lifecycle. It sounds correct, but usually it comes with product managers and sales people throwing out $$$ amounts trying to make a point of it.

Being in a company offering testing and monitoring puts me in an awkward position when I am supposed to actually use such tactics with customers. And I hate it. So I try to stick to facts. Real, hard facts. This is why I found ClusterHQ’s recent survey about application testing so interesting. I do know this information comes in part from a company selling testing products. I am also aware that this is a rather small survey and that it doesn’t come from academia (a place where real products don’t get made). But it still resonates with me.

I liked the questions ClusterHQ asked, and wanted to share here two of these:

Cost of bugs in production

I guess there are no surprises here, besides maybe the people who think finding bugs in development is expensive.

There are two issues that make production bugs really expensive:

  1. If customers find them, then it means you have someone complaining already. This can lead to churn – customers leaving your service. If it is something critical that affects a large number of your customers, then you’re screwed already
  2. To fix a bug in production takes time. You usually want to recreate it in one of your internal environments, then fix it, then test the whole damn application again to see that nothing else broke and then upgrade production again. This time eats resources – development, testing and management

It always happens. There is no way to really get away from bugs in production – the question, though, is the frequency with which they occur. Which brings us to the next question in this survey.

How often are bugs found in production

The frequency with which bugs are found in production. At these high rates, I wonder how things ever get solved.

With agile development, you can argue that these are non-issues. You are going to fix things on a daily or weekly basis, so anything found in production gets squashed away rather fast and without too much of a hassle.

I am no expert in agile, but from looking at how WebRTC products are built, I think there are three areas where this approach is going to come back and bite you:

#1 – WebRTC relies on the browser

If your WebRTC product runs in a browser, then you relinquish some of your control to the browser vendors. And they are known to release their versions quite frequently (once every 6-8 weeks in an automated upgrade process). When it comes to WebRTC, they do tend to make changes that affect behavior and these may end up breaking your service.

How do you make sure you are not caught surprised by this? Do you test with the beta browser versions that are available? Do you make it a point to test this continuously or do you limit yourself to testing only when you release a new version?
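One low-effort way to avoid the surprise is to run the same test suite against the Beta (or Dev/Canary) channel as well. A sketch with the Node selenium-webdriver package – the binary path is an assumption for a typical Linux install, so adjust it for your environment:

```typescript
import { Builder } from 'selenium-webdriver';
import * as chrome from 'selenium-webdriver/chrome';

// Point the exact same WebRTC tests at a Chrome Beta binary instead of stable.
async function launchChromeBeta() {
  const options = new chrome.Options()
    .setChromeBinaryPath('/usr/bin/google-chrome-beta') // assumed install location
    .addArguments('--use-fake-device-for-media-stream');
  return new Builder().forBrowser('chrome').setChromeOptions(options).build();
}
```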

#2 – More often than not, you rely on 3rd party frameworks (open source or commercial)

You use Kurento? Some other open source framework? Maybe a commercial product that acts as a media server or a gateway? A signaling framework you found on github? Or maybe it is a CPaaS vendor you opted for who takes care of all communications for you.

Guess what – these things also need testing. Especially since you are using them in your own special way. I’ve been there, so I know it is hard to test every possible use case and every different way an API can be called. So things fall between the cracks.

When dealing with such issues and finding them in YOUR production – how long will it take that framework or product to be fixed so you can roll it out to your customers? Will it be at your development speeds or someone else’s?

#3 – Stress and Scale is devilishly hard to get right

Whenever someone starts using our service to test at scale, things break down. It can be minor things, like the fact that most services aren’t designed or built to get 10 people into the same session at the exact same moment (something that is hard to test by hand, so testers fall back on having users refresh their browsers). But it extends to serious issues, like degradation in bitrates and an increase in packet losses the more people you throw at the service.
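For the “10 people at the exact same moment” case, the hard part is synchronization, not scale. A minimal sketch of the barrier idea – the prepare/join callbacks are placeholders for whatever your test actually drives:

```typescript
// Get N test users ready first, then release all of them into the session at once.
async function joinSimultaneously(
  userIds: string[],
  prepare: (id: string) => Promise<void>, // placeholder: open the page, log in, reach the lobby
  join: (id: string) => Promise<void>,    // placeholder: actually enter the session
): Promise<void> {
  await Promise.all(userIds.map(prepare)); // everyone reaches the starting line
  await Promise.all(userIds.map(join));    // then everyone joins in the same instant
}
```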

Finding these issues is one thing. Fixing them… that’s another. Fixing large scale bugs is tough. It is tough because you need a way to reproduce them AND you need to find the culprit causing them.

If you don’t have a good way to reproduce large scale tests, then how are you supposed to be able to fix them?

What’s next?

Whether you end up using testRTC or not, I leave for you to decide. We do have a product that takes care of many of the challenges when you test WebRTC products. So I invite you to try us out.

If you don’t – just do me a favor and take testing your product more seriously. When we work through evaluations, we almost always find bugs in production, and usually more than one. And that’s just from the single basic script we start with. It is time to look at WebRTC as more than a hobby.

Have you found serious bugs in production that you could have found and fixed if you tested WebRTC during development?