
Monitoring WebRTC applications using testRTC

This time, I want to do a quick recap of a webinar we hosted last week. It dealt with monitoring WebRTC applications, and as usual, we took the approach of doing a demo.

For this one, I prepared for over a month. I created 3 separate monitors, running on AppRTC, appear.in and Jitsi Meet.

Why did I pick these 3 services?

  1. They are all public, and don’t throttle or cap you in how they work (a lot of demos out there do that)
  2. They are quite different from one another, yet similar at the same time (not sure if that means anything, but it feels that way)
  3. AppRTC is used by many to build their first application, and it is Google’s “hello world” application for WebRTC (it is also unstable, which makes it a lot of fun to monitor)
  4. appear.in and Jitsi are widely known and quite popular. They are also treated as real services (whereas AppRTC is merely a demo)

What was the scenario I used?

  1. Create a meeting URL (either ad-hoc or predefined)
  2. Join the URL
  3. Wait for 2 full minutes so we have data to chew on
  4. Run the above every 15 minutes for a period of a bit over a month
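
For those curious, here’s roughly how that scenario translates into a testRTC script. This is a minimal sketch in the Nightwatch-style syntax our scripts use – the selector, the commented-out join click and the RTC_SERVICE_URL variable are assumptions you’d adapt per service (AppRTC, appear.in and Jitsi Meet each have slightly different join flows):

```javascript
// Minimal monitor scenario sketch (selectors and variable names are assumptions)
client
  .url(process.env.RTC_SERVICE_URL)        // the meeting URL configured for the monitor
  .waitForElementVisible('body', 10000)    // make sure the page actually loaded
  // a real script would click the service's join button here, e.g.:
  // .click('#join-button')
  .pause(120 * 1000);                      // stay in the session for 2 full minutes
```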

Connecting the dots

I wanted to show a bit more than what you see in the testRTC dashboard. Partly out of curiosity, but also because many of our monitoring customers do just that – connect the results they get from testRTC’s monitor runs to their own applications.

So I took this approach:

  1. I created a Google Sheet to collect all the results
  2. In Zapier, I created a webhook that collects the results into that Google Sheet
  3. In testRTC, I configured a webhook to connect to Zapier

10,000 monitor runs later… there was enough data to chew on.

Initial insights

Here’s how the data looks on the testRTC dashboard:

The first row? AppRTC. It fails every once in a while, for 2+ hours at a time.

Jitsi had a few hiccups. appear.in was almost flawless.

Here’s what I got when I created a quick pivot table from it in my Google Sheet:

AppRTC was down 6% of the time…

Media quality

We now have our own unique ranking system for WebRTC media quality in test results.

I showed that during the webinar as well, checking how the various services fared at maintaining stable quality throughout the month. Interestingly, they behaved rather differently from one another.

Watch the webinar to learn more about it.

The webinar and demo

Most of the webinar was a long demo session. You can view it all here:

You can open up your own testRTC account and play with our service a bit while evaluating it.

Our next webinar – CI/CD

Much of the work our customers do involves automation. They write scripts to automate testing and then they want to automate running these scripts when needed. This usually means running them in a nightly build, on code check-in, etc.

testRTC caters to that through a simple-to-use API, which is what I’ll be demoing in the next webinar. And I won’t be doing it alone – I’ll be joined by Gustavo Garcia of Houseparty. Gustavo was one of our first customers to make use of our APIs in such a way.

If you want – register now

Testing Firefox has just become easier (and other additions in testRTC)

We pushed a new release of our testRTC service last month. This one has a lot of small polishes along with one large addition – support for Firefox.

I’d like to list some of the things you’ll be able to find in this new release.

Firefox

When we set out to build testRTC, we knew we would need to support multiple browsers. We started off with Chrome (just like most companies building applications with WebRTC), and from there drilled down into more features, beefing up our execution, automation and analysis capabilities.

We tried adding Firefox about two years ago (and failed). This time, we’re taking it in “baby steps”. This first release of Firefox support brings solid audio support along with rudimentary video support. We aren’t pushing our own video content but rather generating it ad-hoc, which results in lower effective bitrates than we can otherwise reach.

The challenge with Firefox lies in the fact that it has no fake media support the way Chrome does – there is no simple way to have it take in media files directly instead of the camera. We could theoretically create virtual camera drivers and work our way from there, but that’s exactly where we decided to stop. We wanted to ship something usable before making this a bigger adventure (which was our mistake in the past).

Where will you find Firefox? In the profile planning section under the test editor:

When you run the tests, you might notice that we alternate the colors of the video instead of pushing real video into it. Here’s how it looks running Jitsi between Firefox and Chrome:

That’s a screenshot we’ve taken inside the test. That cyan color is what we push as the video source from Firefox. This will be improved over time.

On the audio side you can see the metrics properly:

If you need Firefox, then you can now start using testRTC to automate your WebRTC testing on Firefox.

How we count minutes

Up until now, our per-minute pricing for tests was built around the notion of a minimum length of 10 minutes per test. If you wanted a test with 4 probes (that’s 4 browsers) running concurrently, we calculated it as 4*10=40 minutes, even if the test duration was only 3 minutes.

That has now changed. We now calculate the length of tests without any specific minimum. The only things we do are:

  1. Length is rounded up to the nearest minute. If your test is 2:30 minutes long, we count it as 3 minutes
  2. We add our overhead for test initiation and teardown to the test length. Teardown includes uploading results to our servers and analyzing them. It doesn’t add much for smaller tests, but it can add a few minutes on larger tests (see the rough sketch below)
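
To make the new accounting concrete, here’s a rough sketch of the math. The function name is made up, and treating the overhead as multiplying across probes is my assumption – only the rounding rule and the 4-probe example come from the text above:

```javascript
// Rough sketch of the per-minute accounting (overhead handling is an assumption)
function billedMinutes(probes, testDurationSec, overheadMin) {
  const roundedMin = Math.ceil(testDurationSec / 60); // a 2:30 test counts as 3 minutes
  return probes * (roundedMin + overheadMin);
}

// Old scheme: 4 probes * 10-minute minimum = 40 minutes, even for a 3-minute test
// New scheme, assuming ~1 minute of setup/teardown overhead:
console.log(billedMinutes(4, 3 * 60, 1)); // 16
```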

End result? You can run more tests with the minutes allotted to your account.

This change is automatic across all our existing customers – there’s nothing you need to do to get it.

Monitoring tweaks

We’ve added two new capabilities to monitoring, following requests from our customers.

#1 – Automated run counter

At times, you’ll want to alternate the information you use in a test based on when it runs.

One example is using multiple users to log in to a service. If you run a high-frequency monitor, one that executes a test every 2-5 minutes, using the same user won’t be the right thing to do:

  • You might still be inside the first session when the next monitor run starts a couple of minutes later
  • Your service might keep session information around for longer (webinar services tend to do that, waiting ten or more minutes for the instructor to rejoin the same session after leaving)
  • If a monitor fails, it might leave that user in a transient state until some internal timeout kicks in

For these cases, we tend to suggest that clients use multiple users and alternate between them as they run the monitors.

Another example is when you want each round of execution to touch a different part of your infrastructure – alternating across your data centers, machines, etc.

Up until today, we used to do this with Firebase as an external database that keeps track of which user was last used – we even have that in our knowledge base.

While it works well, our purpose is to make the scripts you write shorter and easier to maintain, so we added a new (and simple) environment variable to our tests called RTC_RUN_COUNT. The only thing it does is return the value of a counter indicating how many times the test has been executed – either as a test or as a monitor.

It is now easy to alternate users by calculating RTC_RUN_COUNT modulo the number of users you created, as in the sketch below.
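
Here’s a minimal sketch of that rotation. The user pool, the login selectors and the process.env access are assumptions for illustration – RTC_RUN_COUNT itself is the new variable described above:

```javascript
// Rotate between a pool of test users based on how many times the monitor has run
var users = [
  { email: 'monitor1@example.com', password: 'secret1' },
  { email: 'monitor2@example.com', password: 'secret2' },
  { email: 'monitor3@example.com', password: 'secret3' }
];

var runCount = parseInt(process.env.RTC_RUN_COUNT, 10) || 0;
var user = users[runCount % users.length]; // modulo picks a different user on each run

client
  .setValue('#email', user.email)       // selectors are placeholders for your login form
  .setValue('#password', user.password)
  .click('#login');
```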

You can learn more about RTC_RUN_COUNT and our other environment variables in our knowledge base.

#2 – Additional information

We had a customer recently who wanted to know, for every run of a monitor, specific parameters of that run – in his case, which part of his infrastructure got used during the execution.

He could have used rtcInfo(), but then he’d need to dig into the logs to find that information, which would take him too long. He needed it while the monitors were running, in order to quickly pinpoint the source of failures on his end.

We listened, and added a new script command – rtcSetAdditionalInfo(). Whatever you place in that command at runtime gets stored and “bubbled up” – to the top of the test run results page as well as to the test results webhook. This means that if you connect the monitor to your own monitoring dashboards for the service, you can insert that specific information there, making it easily accessible to your DevOps teams.
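
Here’s a minimal sketch of how that might look inside a script. The data center value and the way it is derived are made up for illustration – rtcInfo() and rtcSetAdditionalInfo() are the commands discussed here:

```javascript
// Surface a run-specific detail so it bubbles up to the results page and the webhook
var dataCenter = 'eu-west-1'; // e.g. derived from the room URL or a call to your backend

client
  .rtcInfo('connected via ' + dataCenter)            // still written to the logs
  .rtcSetAdditionalInfo('dataCenter=' + dataCenter); // shown at the top of the results and in the webhook
```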

Onwards

We will be looking for bugs (and fixing them) around our Firefox implementation, and we’re already hard at work on a totally new product and on some great new analysis features for our test results views.

If you are looking for a solid, managed testing and monitoring solution for your WebRTC application, then try us out.

Monitoring WebRTC apps just got a lot more powerful

As we head into 2019, I noticed that we haven’t published much around here. We doubled down on helping our customers (and doing some case studies with them) and on polishing our service.

In the recent round of updates, we added 3 very powerful capabilities to testRTC that can be used in both monitoring and testing, but make a lot of sense for our monitoring customers. How do I know that? Because the requests for these features came from our customers.

Here’s what got added in this round:

1. HAR files support

HAR stands for HTTP Archive. It is a file format supported by browsers and certain viewer apps. When your web application gets loaded by a browser, all network activity gets logged by the browser and can be collected into a HAR file that can later be retrieved and viewed.

Our focus has always been WebRTC, so collecting network traffic information that isn’t directly WebRTC wasn’t on our minds. This changed once customers approached us asking for assistance with sporadic failures that were hard to reproduce and hard to debug.

In one case, a customer knew there was a 502 failure thanks to the failure screenshot we generate, but it wasn’t easy to tell which of his servers and services was causing it. Since the failure was sporadic and inconsistent, he couldn’t get to the bottom of it. With the HAR files we can collect in his monitor, the moment this happens again he will have all the network traces for that 502, making it easier to catch.

Here’s how to enable it on your tests/monitors:

Go to the test editor and add the term #har-file to the run options.

 

Once that’s in place, the next time the test/monitor runs it will create a new file, which can be found under the Logs tab of the test results for each probe:

We don’t handle visualization of HAR files for the moment, but you can download the file and load it into a visual tool.

I use netlog-viewer.

Here’s what I got for appr.tc:

2. Retry mechanism

There are times when tests just fail for no good reason. This is doubly true when automating web UI, where minor timing differences may cause problems, or where user behavior is simply different from that of an automated machine. A good example is a person who can’t log in – usually, he will simply retry.

When running a monitor, you don’t want these nagging failures to bog you down. What you are most interested in isn’t bug squashing (at least not for most) – it is uptime and quality of service. Towards that goal, we’ve added another run option – #try

If you add this run option to your monitor, with a number next to it, that monitor will retry the test a few more times before reporting a failure. #try:3, for example, will run the same script up to 3 times – retrying twice – before reporting a failure.

What you’ll get in your monitor might be something similar to this:

The test reports a success, and the reason field indicates the times it got retried.

3. Scoring of monitor runs

We’ve started to add a scoring system to our tests. This feature is still open only to select customers (want to join in on the fun? Contact us)

This scoring system rates a test on a scale of 0-10 based on the media metrics collected. We decided not to go for the traditional MOS scoring of 1-5 for various reasons:

  1. MOS scoring is usually done for voice, and we want to score video
  2. We score the whole test and not only a single channel
  3. MOS is rather subjective, and while we are too, we didn’t want to get into the conversation of “is 3.2 a good result or a bad one?”

The idea behind our scores is not to look at the value as good or bad (we can’t tell either) but rather to look at the difference in value across probes or across runs.

Two examples of where it is useful:

  1. You want to run a large stress test. Baseline it with 1-2 probes. See the score value. Now run with 100 or 1000 probes. Check the score value. Did it drop?
  2. You are running a monitor. Did today’s runs fare better than yesterday’s runs? Worse? The same?

What we did in this release was add the score value to the webhook. This means you can now run your monitors, collect the media quality scores we create, and then trendline them in your own monitoring service – Splunk, Elasticsearch, Datadog, whatever.

Here’s how the webhook looks now:

The rank field in the webhook indicates the media score of this session. In this case, it is an AppRTC test that was forced to run on simulated 3G and poor 4G networks for the users.
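
If you want to trendline these scores yourself, a small receiver on your end is all it takes. Here’s a hedged sketch – only the rank field is confirmed above; the rest of the payload shape and the forwarding step are assumptions to adapt to your own monitoring stack:

```javascript
// Minimal webhook receiver that extracts the media score and hands it to your dashboards
const http = require('http');

http.createServer((req, res) => {
  let body = '';
  req.on('data', (chunk) => (body += chunk));
  req.on('end', () => {
    const result = JSON.parse(body);
    const score = result.rank; // the 0-10 media score of this monitor run
    console.log('monitor run score:', score);
    // forwardToDashboard('webrtc.monitor.score', score); // hypothetical helper for Splunk/Datadog/etc.
    res.end('ok');
  });
}).listen(3000);
```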

As with any release, a lot more got squeezed in. These are just the things I wanted to share here this time.

If you are interested in a monitoring service that provides predictable, synthetic WebRTC clients to run against your service, checking for uptime and quality – check us out.

Using testRTC for WebRTC-PSTN testing and monitoring

When we started out a couple of years ago, we began receiving requests from contact center vendors to support scenarios that involve both WebRTC and PSTN.

Most of these were customers calling from a regular phone to an agent sitting in front of his browser and accepting the call using WebRTC. Or the opposite – contact center agents dialing out from their browser towards a regular phone.

That being the case, we thought it was high time we took care of that and gave a better, more thorough explanation of how to get it done. So we partnered with Twilio on this one, took their contact center reference application from GitHub, and wrote the test scripts in testRTC to automate it.

Along the way, we made use of Twilio to accept calls and dial out, dabbled with AWS Lambda, and more.

It was a fun project, and Twilio were kind enough to share our story on their own blog.

If you are trying to test or monitor your contact center, and you need to handle scenarios that require PSTN automation intertwined with WebRTC, then this is mandatory reading for you:

Automate Your Twilio Contact Center Testing with testRTC

And if you need help in getting that done, just ping us.

[Webinar Recording] Creating a Kickass WebRTC Monitor

A few weeks ago, we hosted a webinar on creating an active monitoring system for your WebRTC application. Obviously, we used testRTC for that.

We went through the following topics:

  • Why is WebRTC monitoring different from VoIP or web monitoring? (because it is a bit of both)
  • What do we mean when we say active monitoring? (checking for uptime and service quality in a predictable and reproducible fashion, and without violating user privacy or data compliance)
  • How to actually write and configure a monitor in testRTC, and then connect it to Slack for alerts (we did that as a live demo on our platform)
  • When and for what scenarios to use testRTC (there are quite a few that we see customers aiming for)

The recording is now available on YouTube:

If you are looking to improve the stability, quality and uptime of your WebRTC application, then we’re here to help. Contact us to learn more.
