
How can watchRTC improve your WebRTC service operations?

watchRTC is our most recent addition to the testRTC product portfolio. It is a passive monitoring service that collects event information and metrics from WebRTC clients, then analyzes, aggregates and visualizes them for you. It is a powerful WebRTC monitoring and troubleshooting platform, meant to help you improve and optimize your service delivery.

Learn more about watchRTC

It’s interesting how you can start building something with an idea of how your users will utilize it, only to find out that what you’ve built has many other uses as well.

This is exactly where I am finding myself with watchRTC. Now, about a year after we announced its private beta, I thought it would be a good opportunity to look at the benefits our customers are deriving out of it. The best way for me to think is by writing things down, so here are my thoughts at the moment:

What is watchRTC and how does it work?

watchRTC collects WebRTC related telemetry data from end users, making it available for analysis in real time and in aggregate.

For this to work, you need to integrate the watchRTC SDK into your application. This is straightforward integration work that takes an hour or less. Then the SDK can collect relevant WebRTC data in the background, while using as little CPU and network resources as possible.
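
To give a sense of the integration effort, here is a minimal sketch of what wiring the SDK into a web application might look like. The package name and option names are assumptions for illustration – check the watchRTC documentation for the actual API:

    // Hypothetical integration sketch – verify names against the watchRTC docs.
    import watchRTC from "@testrtc/watchrtc-sdk";

    // Initialize once, before any RTCPeerConnection is created, so the SDK
    // can hook into WebRTC and collect telemetry in the background.
    watchRTC.init({
      rtcApiKey: "YOUR_API_KEY",   // identifies your watchRTC account
      rtcRoomId: "daily-standup",  // groups peers into a single session (room)
      rtcPeerId: "alice",          // identifies this specific user (peer)
    });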

On the server side, we have a cloud service ready to collect this telemetry data. The data is made available in real time for our watchRTC Live feature. Once the session completes and the room closes, the collected data gets further analyzed and aggregated.

Here are 3 objectives we set out to solve, and 6 more we find ourselves helping with:

#1 – Bird’s eye view of your WebRTC operations

This is the basic thing you want from a WebRTC passive monitoring solution. It collects data from all WebRTC clients, aggregates it and shows it on nice dashboards:

The result offers powerful overall insights into your users and how they are interacting with your service.

#2 – Drilldown for debugging and troubleshooting WebRTC issues

watchRTC was built on the heels of other testRTC services. This means we came into this domain with some great tooling for debugging and troubleshooting automated tests.

With automated testing, the mindset is to collect anything and everything you can lay your hands on, making it as detailed as possible for your users. Oh – and be sure to make it simple to review and quick to use.

We took that mindset to watchRTC with a minor difference – some limits on what we collect and how. While we’re running inside your application, we don’t want to interfere with what it needs to do.

What we ended up with is the short video above.

From a history view of all rooms (sessions) you can drill down to the room level and from there to the peer (user) level and finally from there to the detailed WebRTC analytics domain if and when needed.

In each layer we immediately highlight the important statistics and bubble up important notifications. The data is shown on interactive graphs, which makes debugging a lot simpler than any other means.

#3 – Monitoring WebRTC at scale

Then there’s the monitoring piece. Obvious considering this is a monitoring service.

Here the intent is to bubble up discrepancies and abnormal behavior to the IT people.

We are doing that by letting you define the thresholds of various metric values and then bubbling up notifications when such thresholds are reached.

Now that we’re past the obvious, here are 6 more things our clients are doing with watchRTC that we didn’t think of when we started off:

#4 – Application data enrichment and insights

There’s WebRTC session data that watchRTC collects automatically, and then there’s the application related metadata that is needed to make more sense out of the WebRTC metrics that are collected.

This additional data comes in different shapes and sizes, and with each release we add more at our clients’ request:

  • Share identifiers between the application and watchRTC, and quickly switch from one to the other across monitoring dashboards
  • Add application specific events to the session’s timeline
  • Map the names of incoming channels to other specific peers in a session
  • Designate different peers with different custom keys

The extra data is useful for later troubleshooting, when you need to understand who the users involved are and what actions they have taken in your application.
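
Here is a rough sketch of what such enrichment calls can look like from the application side. The method names below are assumptions modeled on the capabilities listed above, not the official SDK reference:

    // Illustrative only – method names are assumptions, not the official API.

    // Designate this peer with custom keys (e.g., customer account and plan)
    watchRTC.addKeys({ customer: "acme-corp", plan: "enterprise" });

    // Add an application-specific event to the session's timeline
    watchRTC.addEvent({ name: "screen-share-started", parameters: { source: "window" } });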

#5 – Deriving business intelligence

Once we got going, we started receiving requests to add more insights.

We already collect data and process it to show aggregate information. So why not provide filters on top of that aggregate information?

Starting with the basics, we let people investigate the information based on dates and then added custom keys aggregation.

Now? We’re full on with high level metrics – from browsers and operating systems, to score values, bitrates and packet loss. Slice and dice the metrics however you see fit to figure out trends within your own custom population filters.

On top of it all, we’re getting ready to bulk export the data to our clients’ external BI systems – some want to be able to build their own queries, dashboards and enrichment.

#6 – Rating, billing and reporting

Interestingly, once people started using the dashboards, they wanted to be able to use them in front of their own customers.

Not all vendors collect their own metrics for rating purposes. Being able to use our REST API to retrieve highlights, based on the filtering capabilities we offer, enables exactly that. For example, you can use a custom key to denote your largest customers, and then track their usage of your service through our APIs.

Download information as PDFs with relevant graphs or call our API to get it in JSON format.
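
As a sketch of what such an API call might look like (the endpoint path, query parameters and response fields here are assumptions, not the documented API):

    // Hypothetical REST call – endpoint and field names are illustrative.
    const res = await fetch(
      "https://api.testrtc.com/v1/watchrtc/usage?key=customer&value=acme-corp",
      { headers: { Authorization: "Bearer " + process.env.WATCHRTC_API_KEY } }
    );
    const usage = await res.json(); // e.g., minutes, session counts, quality scores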

#7 – Optimization of media servers and client code

For developers, one huge attraction of watchRTC is the ability to optimize their infrastructure and code – including the media servers and the client code.

By using watchRTC, they can deploy fixes and optimizations and check “in the wild” how these affect performance for their users.

Since watchRTC collects every conceivable WebRTC metric, optimization work can be done across a wide range of areas and vectors, as the collected results capture the metrics needed to judge the usefulness of each optimization.

#8 – A/B testing

With watchRTC you can A/B test things. This goes for optimizations as well as many other angles.

You can now think about and treat your WebRTC infrastructure as a marketer would. By creating a custom key and marking different users with it, based on your own logic, you can A/B test the results to see what’s working and what isn’t.

It is an extension of optimizing media servers, just at a whole new level of sophistication.
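
A minimal sketch of what that marking can look like in practice, assuming an addKeys()-style SDK call and your own bucketing logic (both are illustrative):

    // Bucket each user into a variant based on your own logic...
    const userId = 12345; // comes from your application
    const variant = userId % 2 === 0 ? "new-bitrate-logic" : "control";

    // ...and mark the peer so you can filter aggregate results by variant.
    watchRTC.addKeys({ experiment: variant }); // method name is an assumption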

#9 – Manual testing

If you remember, our origin is in testing, and testing is used by developers.

These same developers already use our stress and regression testing capabilities. But as any user relying on test automation will tell you – there are times when manual testing is still needed (and with WebRTC that happens quite a lot).

The challenge with manual testing and WebRTC is data collection. A tester decides to file a bug. Where do they get all of the information they need? Did they remember to keep their chrome://webrtc-internals tab open and download the file on time? How long does it take them to file that bug and collect everything?

Well, when you integrate with watchRTC, all of that manual labor goes away. Now the tester only needs to explain the steps that caused the bug and add a link to the relevant session in watchRTC. The developer will have all the logs already there.

With watchRTC, you can tighten your development cycles and save expensive time for your development team.

watchRTC – run your WebRTC deployment at the speed of thought

One thing we aren’t compromising on with watchRTC is speed and responsiveness. We work with developers and IT people who don’t have the time to sit and wait for dashboards to load and update. That’s why we’ve made it a point for our UI to be snappy and interactive in the extreme.

From the aggregated bird’s eye dashboard, to filtering, searching and drilling down to the single peer level – you’ll find watchRTC a powerful and lightning fast tool. Something that can offer you answers the moment you think of the questions.

If you’re new to testRTC and would like to find out more we would love to speak with you. Please send us a brief message, and we will be in contact with you shortly.

Understanding a call center agent’s network in a WFH world

As we settle into 2022, it seems like call center agents may continue in their WFH (work from home) mode even beyond the pandemic. This will be done either part time or full time, for some agents or for all of them.
The reasons for that are wide and varied, but that’s probably a topic for another time. This time, I’d like to discuss what we are going to do moving forward, to ensure that those reaching out to your call center get the best call quality possible, even when your agents are working from home.

The shift of the call center agent to WFH

Since the pandemic started, those who are able to work remotely have been directed to do so. That includes call center agents – the people who answer the phone when we want to complain, book, order, cancel or do a myriad of other activities with businesses.
The whole environment and architecture of the call center has changed due to the new world we live in today.

In the past, this used to be the call center:

The call center PBX, the network connections to the agents, the agent’s environment (room), computer and phone have all been in our control and in our office facilities.

Now? It looks more like this for an on premise call center:

With an on premise call center and work from home agents, we’re likely to deploy an SBC (Session Border Controller) and/or a VPN to connect the agents back to the office. It adds more moving parts, and burdens the internet connection of the office, but it is the fastest patch that can be employed and it might be the only available solution if you can’t or don’t want to run your call center in the cloud.

Or this for a cloud call center:

In a cloud call center, the agents connect directly to the cloud from their home office.
Just like the on premise call center, the cloud solution ends up with some new challenges. Mainly – the loss of control:

  • We don’t control the network quality of the agent
  • The environment of the agent is out of our control
  • It is likely that the device and peripherals of the agent are still in our control. But that’s not always the case either

And, even with our best intentions in asking the agents to be on ethernet, on a good network and in a quiet environment, they can struggle with doing it well enough.

Can you hear me now?

With work from home call center agents our main challenge becomes controlling their home environment and network.

At home, agents will have noise around them. Kids playing, family members watching television, the neighbors renovating (I had my share of this one during the pandemic), or traffic noises from the street. By using better headsets and noise suppression these can be improved and even solved.

The network is the bigger headache though. Many of your agents are likely to be non-technical in nature. How do they configure their home network? Which ISP are they using and with which communication bundle? How are they even connected to the network – via wifi or ethernet? How far are they from the wifi access point? Who else is using their home network and how? How is their network configured?
The answers to the questions above are going to affect the network quality and resulting audibility of their calls.
Since we can’t control their network, we at least want to understand it properly to be able to make intelligent decisions, such as routing calls to agents that have better networks and environments or to assist our agents in improving their network and environment.

Assessing a WFH call center agent’s environment

They say that knowing is half the battle. In order to solve a call quality problem you should start from understanding what is causing it, and that comes from understanding the network and environment of your agent.
There’s no specific, single solution or problem here, which is why the process usually takes a lot of back and forth interactions between the agent and the IT/support helping them out remotely.

What are the things that you’d like and need answers to?

  • What machine, operating system and browser is the agent using?
  • Are they using a headset? Is it a bluetooth one? Is it the one provided to them for this purpose?
  • Where is the agent located exactly? What ISP are they connected through?
  • Is the agent using a VPN? Are they behind a firewall? Has someone configured the agent’s DNS servers inappropriately? (you’ll be surprised)
  • Are their calls directed to the correct call center in a region nearby?
  • Can their calls flow over UDP or are they forced over TCP?
  • Are all of your applications needed by the agent available and reachable?
  • What does the agent’s network look like? Is it fiber? ADSL? Something else? Is their uplink accommodating enough for calling services?
  • How much VoIP traffic can their network handle?
  • When the agent connects to the PBX, what call quality do we measure?
  • Is their network clean, or noisy with packet loss and jitter?
  • What’s the latency like?

Getting answers to these questions quickly and accurately reduces the handling time of such issues. This is what our clients use our qualityRTC product for – to get the data they need as fast as possible to help them resolve issues sooner.

What’s your workflow?

Each call center has its own nuances – different infrastructure to test and different locations.
You have your own workflow and support process to tackle issues. Do you empower agents with self service, or keep close tabs on when and how network tests are conducted?
Some would rather have agents test their network daily at the beginning of the shift, while others want that to take place only when issues arise.
Large call centers usually need access to the data for BI purposes. Others want to map all their call center agents’ status once in a while – just to understand where they stand.

We’ve built qualityRTC with the help of the biggest call center providers out there, so we’ve got you covered no matter your workflow. qualityRTC is flexible to the level you’ll need, helping you reduce the support strain of WFH agents and letting you focus on what really matters – your customers.

If you want to really up your game in WebRTC diagnostics – for either voice or video scenarios – with Twilio, some other CPaaS vendor or with your own infrastructure – let us know. We can help using our qualityRTC network testing solution.

 

Here’s why Intermedia turned to testRTC to proactively monitor their AnyMeeting web conferencing service

“testRTC enables us to monitor meeting performance from start to finish, with a focus on media quality. We get the best of everything with this platform.”

As 2019 came to a close, I had a chance to sit and chat with Ryan Cartmell, Director of Production System Administration at Intermedia®. Ryan manages a team responsible for monitoring and maintaining the production environment of Intermedia. His top priority is maintaining uptime and availability of Intermedia’s services.

In 2017, Intermedia acquired AnyMeeting®, a web conferencing platform based on WebRTC. Since then, Ryan and his team have been working towards building up the tools and experience needed to give them visibility into media quality and meeting performance.

Initially, these tools took care of two levels of monitoring:

  1. System resource performance monitoring was done by collecting and looking at server metrics
  2. Application level monitoring was incorporated by collecting certain application specific metrics, aggregating them in an in-house monitoring platform as well as using a cloud APM vendor

This approach gave a decent view of the service and its performance, but it had its limits.

What you don’t know you don’t know

The way such monitoring systems work is by collecting metrics, data and logs from the various components in the system, as well as from the running applications. Once you have that data, you can configure alerts and dashboards based on what you know you are looking for.

If an issue was found in the AnyMeeting service, Ryan’s team would try to understand how the issue can be deduced from the logs and available information, creating new alerts based on that input. Doing so would ensure that the next time the same issue occurred, it would be caught and dealt with properly.

The challenge here is that you don’t know what you don’t know. You first need a problem to occur and someone to complain in order to figure out a rule to alert for it happening again. And you can never really reach complete coverage over potential failures.

This kept the Intermedia Operations team in a reactive position. What they wanted and needed was a way to proactively run and test the system to catch any issues in their environment.

Proactively handling WebRTC issues using testRTC


“With testRTC we are now able to get ahead of issues and not wait for customers to report issues.”

testRTC is an active monitoring test engine that allows Intermedia to proactively test its services. This gives Intermedia visibility into both the performance and the overall availability of the AnyMeeting platform.

Intermedia deployed multiple testRTC monitors, making sure its data centers are probed multiple times an hour. The monitors are active checks that create real web conferences on AnyMeeting, validating that a session works as expected from start to finish – from logging in, through communicating via voice and video, to screen sharing. If things go awry or expected media thresholds aren’t met, alerts are issued, and Ryan’s team can investigate.

A screenshot from one of the testRTC monitors for the AnyMeeting service

These new testRTC monitors mapped and validated the complete user experience of the AnyMeeting service, something that wasn’t directly monitored before.

Since its implementation, testRTC has helped Intermedia identify various issues in the AnyMeeting system – valuable information that Intermedia, as a market leader in unified communications, uses in its efforts to continually improve the performance and quality of its services.

The data that testRTC collects, including the screenshots it takes, makes it a lot easier to track down issues. Before testRTC, using performance metrics alone, it was really difficult to understand the overall impact on an end user. Now it is part of the analysis process.

Working with testRTC

Since starting to work with testRTC and installing its monitors, Intermedia has found the following advantages:

  1. The flexibility of the testRTC platform – this enables Intermedia to test all elements of the web platform service it offers
  2. Top tier support – the testRTC team was there to assist with any issues and questions Intermedia had
  3. Level of expertise – the ability of testRTC to help Intermedia work through issues that testRTC exposes

For Ryan, testRTC gives a level of comfort knowing that tests are regularly being performed. And if a technical challenge does arise, the data available from testRTC enables Ryan and his team to triage the issue a lot more easily than they used to.

Intermedia and AnyMeeting are trademarks or registered trademarks of Intermedia.net, Inc. in the United States and/or other countries.


Network monitoring: 8 benefits of active monitoring in WebRTC

Call it WebRTC active monitoring or WebRTC synthetic monitoring, the concept is rather simple. What you are trying to do is run a scenario from real browsers the same way a user would. Why? So you can see (=track and monitor) your WebRTC application the way your customers do.

You know pingdom? It is a service that pings your website every couple of seconds. If it fails to get a response – you receive an email telling you that your website is down. A simple and straightforward solution. There are many similar services out there, and they work beautifully – if all you’re after is answering the question “is my website still up?”

This, though, is different from asking the question “is my website working properly?”

How do you go about monitoring a website for that? You dig one or two levels deeper – specifically, by deploying probes that load your webpages and look for indications that these pages are fresh and not erroneous. Why? Because a ping test of a website can be happy with this kind of a result:

That’s Google Calendar being down a few weeks back. I am not sure that a ping test would notice that, as a page does load.

The path to synthetic/active monitoring

What would an IT person do? Add more metrics that he can track. CPU use, memory use, network traffic. And then add more metrics from the application: page views, open sessions, etc.

These metrics are prone to two problems:

  1. Seasonality changes their behavior. Think weekend or holiday traffic versus regular days, or opening hours versus night time
  2. The lights might be on but there’s nobody home. All looks fine, but somehow a user is unable to login or get connected to a certain service due to a broken connection between two internal systems. Since monitoring is done on low level metrics, such cases might be missed

The next step for our IT person would be to have a probe act like a user going through the system. Such probes conduct synthetic monitoring: they behave like real users in order to verify the system’s behavior end to end.

The same applies to WebRTC applications as well.
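
Here is what a minimal (non-WebRTC) synthetic probe can look like, sketched with Puppeteer; the URL and selectors are placeholders:

    // Minimal synthetic probe sketch – URL and selectors are placeholders.
    const puppeteer = require("puppeteer");

    (async () => {
      const browser = await puppeteer.launch();
      const page = await browser.newPage();

      // Walk through the system the way a real user would
      await page.goto("https://example.com/login");
      await page.type("#email", "probe@example.com");
      await page.type("#password", process.env.PROBE_PASSWORD);
      await page.click("#login-button");

      // Fail if the expected post-login element never appears
      await page.waitForSelector("#dashboard", { timeout: 10000 });

      await browser.close();
    })();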

8 benefits of active monitoring in WebRTC

Once you automate such a scenario and run it frequently, you can gain insights and understanding that you just can’t get in any other way.

Here are 8 benefits that got customers like Vidyo to use testRTC for monitoring their WebRTC cloud deployment:

#1 – Predictability and Objectivity

When you run an active monitor you are in control. You know where the probes are coming from, what the performance of the machines they use is, and what their network conditions are. And if you don’t, then running that active monitor through the same scenario a couple of times will create the baseline you need.

With that information, you can now run the scenario as an active monitor, and if all goes well the results will be consistent. The moment something changes – there’s a pretty high level of confidence that something changed in your WebRTC deployment. That’s predictability.

Because the metrics collected and the results analyzed are based on machine automation, you also gain objectivity. While it will be hard to say how bad a jitter value of 120 is versus 100, it is easy to say that if you had a jitter value of 100 for a few months and it has now changed to 120 in the monitor you are running, then things changed for the worse, and it would be wise to check why.

#2 – End-to-End

When we deploy a monitor with a new client of ours at testRTC, there’s almost always a learning period of a month or two. During that time, we assist the client in fine tuning and tweaking the script written for the monitor.

Common things we need to do are slowing down button clicks or adding retries in certain strategic places (like login procedures). Why? Because production WebRTC services sometimes return a 502 when people try to login, connect or start sessions. Real users would simply refresh the page by pressing F5, or retry clicking a button.

In some cases, our clients would go about hunting these bugs and fix them. In others, we’d build these retry mechanisms into the script used by the monitor.
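
The retry logic itself is usually a small wrapper of this kind (a generic sketch, not the testRTC scripting API):

    // Generic retry helper – retries an action the way a patient user would.
    async function withRetries(action, attempts = 3, delayMs = 2000) {
      for (let i = 1; i <= attempts; i++) {
        try {
          return await action(); // e.g., the login flow
        } catch (err) {
          if (i === attempts) throw err; // out of retries – report the failure
          await new Promise((resolve) => setTimeout(resolve, delayMs));
        }
      }
    }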

The thing is, though, that when a WebRTC session fails, it can fail long before it even starts. Or it can work nicely, but screen sharing fails. Or screen sharing will work but PSTN dial-in won’t. Being able to define the most important WebRTC scenarios and synthetically monitor them gives you an end-to-end solution.

#3 – Be the first to know

You need to be the first to know when there is an issue. That issue can be with the login, directory service, session initialization, media quality or any other problem that might arise.

If you are operating a contact center, then calls take place at certain times of the day (office hours). Running a monitor before the Monday morning shift starts gives you more time to catch and resolve issues before they affect anyone.

If you have millions of calls taking place a day on your system, then this might not be an issue for you – or more likely, your users would complain at the same time your service monitoring notices a failure. In such a case, other reasons, such as predictability, make the stronger case for synthetic monitoring. This is doubly so since predictable probes creating synthetic sessions should produce consistent outcomes, as opposed to real users, where you lack any control over their machine, location and network.

#4 – Simplicity

There’s something to be said about simple approaches for complex problems.

When users can’t connect to your service, do you know why that is? If they complain about quality, is it because of their device, network connection or your service? How do you even go about analyzing this?

WebRTC synthetic monitoring reduces a lot of the variables and brings predictability into the process. What you end up collecting, and how you serve it to the IT person in charge, is also quite important – there are so many metrics and parameters to look at with WebRTC that many don’t find their way around them.

What we’re razor focused on in testRTC is making the analysis process as simple as possible for our clients – letting them glean the insights they need with the least amount of effort on their part. Our upcoming release continues that trend and is already being trialed by a few of our clients.

#5 – Debuggability

The monitor failed or alerted at an issue. Great. Now what? How do you make that alert an actionable one?

With passive monitoring of live users, there’s very little you can do in a lot of cases. Quality is a subjective thing that is affected quite a lot by the user’s own device and network. Move a meter or two farther away from your current position while in a call, and your Wifi connection might become unusable. In my house, using Wifi in the bedroom is quite the challenge. The living room and my home office? They’re guaranteed to give high network quality. At least up to the carrier. My desktop has its good days and bad days, depending on the number of Chrome tabs opened and the number of days since the last reboot.

If you run a synthetic monitor for WebRTC, then there are quite a few things at your disposal. Here are some that we’ve implemented in testRTC for our clients:

  • Collect all possible data, so developers can look at logs and figure out the issues. This includes WebRTC metrics, browser console logs, network events log, browser performance data and screenshots
  • Visualize the scenario and the metrics collected, keeping it simple at first glance with high level graphs and aggregations while enabling drill down to the minute details
  • Automate thresholds on metrics, to make sure tests warn or fail on the use cases and conditions that matter to you (see the sketch after this list)
  • Grab a screenshot at the time of failure, so you can see the moment the scenario fails
  • Execute the scenario again, so you can see the failure (since the scenario and probes are predictable, there’s a high likelihood the failure will occur again)
  • Join a running synthetic session via VNC, so you can see for yourself how the session progresses
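
For the thresholds item above, testRTC scripts can assert expectations on the collected metrics with rtcSetTestExpectation(). A sketch – the exact metric expressions here are illustrative and depend on your scenario:

    // Warn or fail the run when metrics cross your thresholds.
    client
      .rtcSetTestExpectation("audio.in > 0", "no incoming audio")
      .rtcSetTestExpectation("video.packetloss.in < 5", "high incoming packet loss");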

#6 – No instrumentation

Synthetic monitoring requires no instrumentation of your service.

Since you end up using real browsers, running real scenarios, the only thing you’ll probably need is to create dedicated users for running the monitor – and that’s about it.

There’s no code you’ll need to inject into your service. No js file to include. No SDK to compile into your app.

That means it is faster to deploy to production than the alternatives, and the potential effect on your service is non-existent, since you’ve changed nothing in your code.

#7 – Privacy

A synthetic monitor collects synthetic metrics. It doesn’t sit on your live users, so there’s no live user data collection taking place. There’s also no real indication of the size of your deployment, the trajectory and growth of your service or anything similar associated with it.

We’ve seen reluctance from clients to share such data with cloud based services. This mostly stems from legal issues, such as where the data gets collected and stored, but also from the business concern of entrusting a third party with the day-to-day communications that take place. In many cases, companies are happier having this part of the operation take place in-house.

With an active monitor, the only data collected and analyzed is the data generated by the browser of the active monitor itself and no one else. The users used by the active monitor are dummy users created for that purpose only.

#8 – Fixed investment

Talking about predictability… as your service grows, a WebRTC active monitor keeps acting in the same manner. This means your investment in running the monitor won’t change either. That is never the case with a passive monitor, where pricing is based on the size of the user base as well as the amount of traffic.

That means you can budget and plan ahead for longer periods of time at relatively low investment.

When will you need to grow your investment? When you want to deepen your analysis. This is done by deploying more monitors (to run from more geographic locations or to hit different data centers of your service), increasing the frequency of the monitors (to get alerted on issues earlier) or when you beef up monitors (by adding more probes to test larger video group calls for example).

testRTC’s active monitoring

If you are in need of better visibility of your WebRTC application, then by all means – explore passive monitoring and deploy it. But also check how active monitoring can improve your day-to-day operations and in the end, improve uptime and media quality for your users.

We’re here to help, so contact us for a demo.

Monitoring WebRTC applications using testRTC

This time, I want to do a quick recap of a webinar we hosted last week. It dealt with monitoring WebRTC applications, and as usual, we took the approach of doing a demo.

For this one, I’ve prepared for over a month. I’ve created 3 separate monitors, running on AppRTC, appear.in and Jitsi Meet.

Why did I pick these 3 services?

  1. They are all public, and don’t throttle or cap you in how they work (a lot of demos out there do that)
  2. They are quite different from one another, yet similar at the same time (not sure if that means anything, but it feels that way)
  3. AppRTC is used by many to build their first application, and it is Google’s “hello world” application for WebRTC (it is also unstable, which makes it a lot of fun to monitor)
  4. appear.in and Jitsi are widely known and quite popular. They are also treated as real services (where AppRTC is merely a demo)

What was the scenario I used?

  1. Create a meeting URL (either ad-hoc or predefined)
  2. Join the URL
  3. Wait for 2 full minutes so we have data to chew on
  4. Run the above every 15 minutes for a period of a bit over a month

Connecting the dots

I wanted to show a bit more than what you see in the testRTC dashboard. Partly out of curiosity, but also because many of our monitoring customers do just that – connect the results they get from testRTC’s monitor runs to their own applications.

So I took this approach:

I’ve created a Google Sheet to collect all the results.

In Zapier, I then created a Webhook that collected the results into that Google Sheet.

In testRTC, I’ve configured a webhook to connect to Zapier.
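
If you’d rather skip Zapier, a small receiver of your own can do the same job. A sketch of such an endpoint – the payload field names are assumptions, so check the webhook documentation for the actual ones:

    // Minimal webhook receiver sketch (Express). Field names are assumptions.
    const express = require("express");
    const app = express();
    app.use(express.json());

    app.post("/testrtc-webhook", (req, res) => {
      const { testName, status, runTime } = req.body;      // assumed fields
      console.log(`${testName}: ${status} (${runTime})`);  // replace with your storage
      res.sendStatus(200); // acknowledge so the webhook isn't retried
    });

    app.listen(3000);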

10,000 monitor runs later… there was enough data to chew on.

Initial insights

Here’s how the data looks on the testRTC dashboard:

The first row? AppRTC. It fails every once in a while, for 2+ hours at a time.

Jitsi had a few hiccups. appear.in was almost flawless.

Here’s what happened when I created a quick pivot table of it on my Google Sheet:

AppRTC was down 6% of the time…

Media quality

We now have our own unique ranking system for WebRTC media quality in test results.

I showed that during the webinar as well, checking how the various services fared at maintaining stable quality throughout the month. Interestingly, they behaved rather differently from one another.

Watch the webinar to learn more about it.

The webinar and demo

Most of the webinar was a long demo session. You can view it all here:

You can open up your own testRTC account and play with our service a bit under evaluation.

Our next webinar – CI/CD

Lots of the work our customers do involves automation. They write scripts to automate testing, and then they want to automate running those scripts when needed. This usually means running them in a nightly build, on code check-in, etc.

testRTC caters to that through a simple to use API, which is what I’ll be demoing in the next webinar. And I won’t be doing it alone – I’ll be joined by Gustavo Garcia of Houseparty. Gustavo was one of our first customers to make use of our APIs in such a way.

Testing Firefox has just become easier (and other additions in testRTC)

We pushed a new release of our testRTC service last month. This one has a lot of small polishes along with one large addition – support for Firefox.

I’d like to list some of the things you’ll be able to find in this new release.

Firefox

When we set out to build testRTC, we knew we would need to support multiple browsers. We started off with Chrome (just like most companies building applications with WebRTC), and gradually drilled down into more features, beefing up our execution, automation and analysis capabilities.

We tried adding Firefox about two years ago (and failed). This time, we’re taking it in “baby steps”. This first release of Firefox brings with it solid audio support and rudimentary video support. We aren’t pushing our own video content but rather generating it ad-hoc, which results in lower effective bitrates than we can otherwise reach.

The challenge with Firefox lies in the fact that it has no fake media support the way Chrome does – there is no simple way to have it take in media files directly instead of the camera. We could theoretically create virtual camera drivers and work our way from there, but that’s exactly where we decided to stop. We wanted to ship something usable before making this a bigger adventure (which was our mistake in the past).

Where will you find Firefox? In the profile planning section under the test editor:

When you run the tests, you might notice that we alternate the colors of the video instead of pushing real video into it. Here’s how it looks running Jitsi between Firefox and Chrome:

That’s a screenshot we’ve taken inside the test. That cyan color is what we push as the video source from Firefox. This will be improved over time.

On the audio side you can see the metrics properly:

If you need Firefox, then you can now start using testRTC to automate your WebRTC testing on Firefox.

How we count minutes

Up until now, our per minute pricing for tests was built around the notion of a minimum length per test of 10 minutes. If you wanted a test with 4 probes (that’s 4 browsers) concurrently, we calculated it as 4*10=40 minutes even if the test duration was only 3 minutes.

That has now changed. We now calculate the length of tests without any specific minimum. The only things we do are:

  1. Length is rounded up towards the nearest minute. If you had a test that is 2:30 minutes long, we count it as 3 minutes
  2. We add to the test length our overhead of initiation for the test and teardown. Teardown includes uploading results to our servers and analyzing them. It doesn’t add much for smaller tests, but it can add a few minutes on the larger tests
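
As a worked example of these two rules (the one minute of overhead below is made up for illustration, and whether overhead applies per probe or per test is glossed over here):

    // Worked example of the counting rules above. Overhead value is illustrative.
    function billedMinutes(probes, testSeconds, overheadMinutes) {
      const rounded = Math.ceil(testSeconds / 60);  // rule 1: round up to the nearest minute
      return probes * (rounded + overheadMinutes);  // rule 2: add initiation/teardown overhead
    }

    // A 2:30 test with 4 probes and an assumed 1 minute of overhead:
    // ceil(150 / 60) = 3, so 4 * (3 + 1) = 16 minutes.
    // Under the old 10-minute minimum, this same test counted as 4 * 10 = 40.
    console.log(billedMinutes(4, 150, 1)); // 16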

End result? You can run more tests with the minutes allotted to your account.

This change is automatic across all our existing customers – there’s nothing you need to do to get it.

Monitoring tweaks

We’ve added two new capabilities to monitoring, due to requests of our customers.

#1 – Automated run counter

At times, you’ll want to alternate the information you use in a test based on when it runs.

One example is using multiple users to login to a service. If you run a high frequency monitor, which executes a test every 2-5 minutes, using the same user won’t be the right thing to do:

  • You might end up not leaving the first session when running the next monitor a couple of minutes later
  • Your service might keep session information around for longer (webinars tend to do that, waiting ten or more minutes for the instructor to rejoin the same session after they leave)
  • If a monitor fails, it might cause a transient state for that user until some internal timeout

For these, we tend to suggest that clients use multiple users and alternate between them as they run the monitors.

Another example is when you want in each round of execution to touch a different part of your infrastructure – alternating across your data centers, machines, etc.

Up until today, we used to do this with Firebase as an external database source that knows which user was last used – we even have that in our knowledge base.

While it works well, our purpose is to make the scripts you write shorter and easier to maintain, so we added a new (and simple) environment variable to our tests called RTC_RUN_COUNT. The only thing it does is return the value of an iterator indicating how many times the test has been executed – either as a test or as a monitor.

It is now easy to use: calculate the modulo of RTC_RUN_COUNT with the number of users you created, as the sketch below shows.
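
A minimal sketch, assuming RTC_RUN_COUNT is read through process.env like other test variables (the user list is your own):

    // Alternate between pre-created users, round-robin, across monitor runs.
    const users = ["monitor-user-1", "monitor-user-2", "monitor-user-3"];
    const runCount = parseInt(process.env.RTC_RUN_COUNT || "0", 10);
    const user = users[runCount % users.length]; // cycles through the list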

You can learn more about RTC_RUN_COUNT and our other environment variables in our knowledge base.

#2 – Additional information

We recently had a customer who wanted to know, within every run of a monitor, specific parameters of that run – in his case, the part of his infrastructure that got used during the execution.

He could have used rtcInfo(), but then he’d need to dig into the logs to find that information, which would take too long. He needed it while the monitors were running, in order to quickly pinpoint the source of failures on his end.

We listened, and added a new script command – rtcSetAdditionalInfo(). Whatever you place in that command during runtime gets stored and “bubbled up” – to the top of test run results pages as well as to the test results webhook. This means that if you connect the monitor to your own monitoring dashboards for the service, you can insert that specific information there, making it easily accessible to your DevOps teams.
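
A short sketch of how this can look inside a monitor script. How you discover which part of the infrastructure served the session is application-specific; reading it off a page element here is purely illustrative:

    // Bubble up run-specific details to the results page and webhook.
    client.getText("#datacenter-label", function (result) {
      client.rtcSetAdditionalInfo("data center: " + result.value);
    });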

Onwards

We will be looking for bugs (and fixing them) around our Firefox implementation, and we’re already hard at work on a totally new product and on some great new analysis features for our test results views.

If you are looking for a solid, managed testing and monitoring solution for your WebRTC application, then try us out.

Monitoring WebRTC apps just got a lot more powerful

As we head into 2019, I noticed that we haven’t published much around here. We doubled down on helping our customers (and doing some case studies with them) and on polishing our service.

In the recent round of updates, we added 3 very powerful capabilities to testRTC that can be used in both monitoring and testing, but make a lot of sense for our monitoring customers. How do I know that? Because the requests for these features came from our customers.

Here’s what got added in this round:

1. HAR files support

HAR stands for HTTP Archive. It is a file format that browsers and certain viewer apps support. When your web application gets loaded by a browser, all network activity gets logged by the browser and can be collected into a HAR file that can later be retrieved and viewed.

Our focus has always been WebRTC, so collecting network traffic information that isn’t directly WebRTC wasn’t on our minds. This changed once customers approached us asking for assistance with sporadic failures that were hard to reproduce and hard to debug.

In one case, a customer knew there was a 502 failure, thanks to the failure screenshot we generate, but it wasn’t that easy to know which of his servers and services was causing it. Since the failure was sporadic and inconsistent, he couldn’t get to the bottom of it. With the HAR files we can collect in his monitor, the moment this happens again he will have all the network traces for that 502, making it easier to catch.

Here’s how to enable it on your tests/monitors:

Go to the test editor, and add to the run options the term #har-file

 

Once there and the test/monitor runs next, it will create a new file that can be found under the Logs tab of the test results for each probe:

We don’t handle visualization of HAR files for the moment, but you can download the file and load it into a visualization tool.

I use netlog-viewer.

Here’s what I got for appr.tc:

2. Retry mechanism

There are times when tests just fail for no good reason. This is doubly true when automating web UIs, where minor timing differences may cause problems, or where user behavior is simply different from that of an automated machine. A good example is a person who couldn’t login – usually, they will simply retry.

When running a monitor, you don’t want these nagging failures to bog you down. What you are most interested in isn’t bug squashing (at least not for everyone); it is uptime and quality of service. Towards that goal, we’ve added another run option – #try

If you add this run option to your monitor, with a number next to it, that monitor will retry the test a few more times before reporting a failure. #try:3, for example, will retry the same script up to two more times before reporting a failure.

What you’ll get in your monitor might be something similar to this:

The test reports a success, and the reason indicates a few times where it got retried.

3. Scoring of monitor runs

We’ve started to add a scoring system to our tests. This feature is still open only to select customers (want to join in on the fun? Contact us)

This scoring system places a test, based on the media metrics collected, on a scale of 0-10. We decided not to go for the traditional MOS scoring of 1-5 for various reasons:

  1. MOS scoring is usually done for voice, and we want to score video
  2. We score whole tests and not only a single channel
  3. MOS is rather subjective, and while our score is subjective too, we didn’t want to get into the conversation of “is 3.2 a good result or a bad result?”

The idea behind our scores is not to look at the value as good or bad (we can’t tell either) but rather look at the difference between the value across probes or across runs.

Two examples of where it is useful:

  1. You want to run a large stress test. Baseline it with 1-2 probes. See the score value. Now run with 100 or 1000 probes. Check the score value. Did it drop?
  2. You are running a monitor. Did today’s runs fare better than yesterday’s runs? Worse? The same?

What we did in this release was add the score value to the webhook. This means you can now run your monitors and collect the media quality scores we create and then trendline them in your own monitoring service – splunk, elastic search, datadog, whatever.

Here’s how the webhook looks now:
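
A hypothetical payload, just to illustrate the shape – apart from the rank field described below, the field names and values are assumptions:

    {
      "test": "AppRTC monitor",
      "status": "completed",
      "rank": "7.2"
    }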

The rank field in the webhook indicates the media score of this session. In this case, it is an AppRTC test that was forced to run on simulated 3G and poor 4G networks for the users.

As with any release, a lot more got squeezed into the release. These are just the ones I wanted to share here this time.

If you are interested in a monitoring service that provides predictable synthetic WebRTC clients to run against your service, checking for uptime and quality – check us out.

Using testRTC for WebRTC-PSTN testing and monitoring

When we started out a couple of years ago, we began receiving requests from contact center vendors to support scenarios that involve both WebRTC and PSTN.

Most of these were customers calling from a regular phone to an agent sitting in front of his browser and accepting the call using WebRTC. Or the opposite – contact center agents dialing out from their browser towards a regular phone.

That being the case, we thought it was high time we took care of that and gave a better, more thorough explanation of how to get it done. So we partnered with Twilio on this one, took their reference application of a contact center from GitHub, and wrote the test scripts in testRTC to automate it.

Along the way, we made use of Twilio to accept and dial out calls, dabbled with AWS Lambda, and more.

It was a fun project, and Twilio was kind enough to share our story on their own blog.

If you are trying to test or monitor your contact center, and you need to handle scenarios that require PSTN automation intertwined with WebRTC, then this is mandatory reading for you:

Automate Your Twilio Contact Center Testing with testRTC

And if you need help in getting that done, just ping us.

[Webinar Recording] Creating a Kickass WebRTC Monitor

A few weeks ago, we hosted a webinar on creating an active monitoring system for your WebRTC application. Obviously, we used testRTC for that.

We went through the following topics:

  • Why is WebRTC monitoring different than VoIP or Web monitoring? (that’s because it is a bit of both)
  • What do we mean when we say active monitoring? (checking for uptime and service quality in a predictable and reproducible fashion, and without violating user privacy or data compliance)
  • How to actually write and configure a monitor in testRTC. And then connect it to Slack for its alerts (did that as a live demo on our platform)
  • When and for what scenarios to use testRTC (there are quite a few that we see customers aiming for)

The recording is now available on YouTube:

If you are looking to improve the stability, quality and uptime of your WebRTC application, then we’re here to help you. Contact us to learn more

WebRTC Application Monitoring: Do you Wipe or Wash?

UPDATE: Recording of this webinar can be found here.

If you are running an application then you are most probably monitoring it already.

You’ve got New Relic, Datadog or some other cloud service or on premise monitoring setup handling your APM (Application Performance Management).

What does that mean exactly with WebRTC?

If we do the math, you’ve got the following servers to worry about:

  • STUN/TURN servers, deployed in one or more (probably more) data centers
  • Signaling server, at least one. Maybe more when you scale the service up
  • Web server, where you actually host your application and its HTML pages
  • Media servers, optionally, you’ll have media servers to handle recording or group calls (look at our Kurento sizing article for some examples)
  • Database, while you might not have this, most services do, so that’s another set of headaches
  • Load balancers, a distributed memory datagrid (think Redis), etc.

Lots and lots of servers in that backend of yours. I like to think of them as moving parts. Every additional server that you add. Every new type of server you introduce. It adds a moving part. Another system that can fail. Another system that needs to be maintained and monitored.

WebRTC is a very generous technology when it comes to the variety of servers it needs to run in production.

Assuming you’re doing application monitoring on these servers, you are collecting all machine characteristics. CPU use, bandwidth, memory, storage. For the various servers you can go further and collect specific application metrics.

Is that enough? Aren’t you missing something?

Here are 4 quick stories we’ve heard in the last year.

#1 – That Video Chat Feature? It Is Broken

We’re still figuring out this whole embeddable communications trend. The idea of companies taking WebRTC and shoving voice and video calling capabilities into an existing product and workflow. It can be project management tools, doctor visitations, meeting scheduler, etc.

In some cases, the interactions via WebRTC are an experiment of sorts. A decision to attempt embedding communications directly into the existing product, instead of having users find their own way to communicate (phone calls and Skype were the most common alternatives).

Treated as an experiment, such integrations sometimes fell somewhat out of focus, as the development teams rushed to handle other tasks within the core product, as so often happens.

In one such case, the company used a CPaaS vendor to get that capability integrated with their service, so they didn’t think much about monitoring it.

At least not until they found out one day that their video meetings feature was malfunctioning for over two weeks (!). Customers tried using it and failed and just moved on, until someone complained loud enough.

The problem ended up being the use of a deprecated CPaaS SDK that had to be upgraded and wasn’t.

#2 – But Our Service is Working. Just not the Web Calling Part

In many cases, there’s an existing communication product that does most of its “dealings” over PSTN and regular phone numbers. Then one day, someone decides to add browser dialing. Next thing that happens, you’ve got a core product doing communications with a new WebRTC-based feature in there.

Things are great and calls are being made. Until one day a customer calls to complain. He embedded a call button on his website, but people stopped calling him from the site. This went on for a couple of days while he tried tweaking his business and figuring out what was wrong – until he found out that the click-to-call button on the website just didn’t work anymore.

Again, all the monitoring and health check metrics were fine, but the integration point of WebRTC to the rest of the system was somewhat lost.

The challenge here was that this got caught by a customer who was paying for the service. What the company wanted to do at that point is to make sure this doesn’t repeat itself. They wanted to know about their integration issues before their customers do.

#3 – Where’s My Database When I Need it?

Here’s another one. A customer of ours has this hosted unified communications service that runs from the browser. You login with your credentials, see a contacts list and can dial anyone or receive calls right inside the browser.

They decided to create a monitor with us that runs at a low frequency doing the exact same thing: two people logging in, one calls and the other answers. Checking that there’s audio and video and all is well.

One time they contacted us complaining that our monitor was failing while they knew their system was up and running. So we opened up a failed monitor run, looked at the screenshot we collect automatically upon failure, and saw an error on the screen – the browser just couldn’t get the address book of the user after logging in.

This had nothing to do with WebRTC. It was a faulty connection to the database, but it ended up killing the service. They got that pinpointed and resolved after a couple of iterations. For them, it was all about the end-to-end experience and making sure it works properly.

#4 – The Doctor Won’t See You Now

Healthcare is another interesting area for us. We’ve got customers in this space doing both testing and monitoring. The interesting thing about healthcare is that doctor visitations aren’t a 24/7 thing. For that particular customer it was a 3-hour day shift.

The service was operating outside of the normal working hours of the doctor’s office, with the idea of offering patients a way to get a doctor during the evening hours.

With a service running only part of the day, the company wanted to be certain that the service is up and running properly – and know about it as early on as possible to be able to resolve any issues prior to the doctors starting their shift.

End-to-End Monitoring to the Rescue

In all of these cases, the servers were up and running. The machines were humming along, but the service itself was broken. Why? Because application metrics tell a story, but not the whole story. For that, you need end-to-end monitoring. You need a way to run a real session through the system to validate that all of its pieces – all of its moving parts – are working well TOGETHER.

Next week, we will be hosting a webinar. In this webinar, we will show step by step how you can create a killer monitor for your own WebRTC application.

Oh – and we won’t only focus on working/not working type of scenarios. We will show you how to catch quality degradation issues of your service.

I’ll be doing it live, giving some tips and spending time explaining how our customers use our WebRTC monitoring service today – what types of problems are they solving with it.

Join me:

Creating a Kickass WebRTC Monitor Using testRTC
The recording can be found here.