By akademiotoelektronik, 19/03/2023
How Bedrock Streaming Monitors 6play and Salto
Bedrock Streaming was born within the M6 group and is presented as the television group's technology subsidiary. In March 2020, M6 decided to open up the entity's capital. The German group RTL (itself majority-owned by Bertelsmann) then acquired half of Bedrock Streaming, before other investors joined in.
The Bedrock Streaming technical team, based in Lyon, brings together nearly 300 employees.
"Our job is to develop streaming platforms for television groups, that is to say infrastructure for AVOD applications (free video-on-demand platforms that include advertising, editor's note) and SVOD applications (where access to content is paid, editor's note)," says Olivier Mansour, deputy CTO of Bedrock Streaming.
The customers of this joint venture between M6 and RTL are mainly European. Bedrock Streaming operates two major video services in France: 6play, owned by M6, and Salto, the joint venture between TF1, France Télévisions and M6. Bedrock Streaming also has customers in Belgium (RTL Belgium), Croatia (RTL Croatia), Hungary (RTL Hungary) and the Netherlands (Videoland).
"Our goal is to provide local players with technology comparable to that of providers such as Netflix, Amazon, Hulu, Disney and others, so that they can compete in the big leagues on the technical side," says Olivier Mansour.
These video platforms must be robust. Bedrock Streaming serves no fewer than 50 million users in total. The platforms can host on-demand video content and catch-up programmes, as well as live TV streams.
Having a good architecture is important, but you still have to be able to monitor it. "Often, our customers come to us after trying to develop a streaming platform themselves: they are looking for stability, and that is our promise," says the deputy CTO.
Bedrock Streaming's challenges
Before "playing in the big leagues" and taking the name Bedrock Streaming, the subsidiary operated in 2008 the M6 Replay platform, hosted in M6's Paris data center. "At the time, the issue for the M6 group was ensuring economic profitability. We did a lot of things ourselves," he recalls. "For monitoring, we had deployed an ELK stack coupled with Grafana and StatsD."
This technology stack was used to monitor both the back ends and the front ends.
In the early 2010s, this approach was seen as innovative. "We had already put what we now call observability at the heart of our development practices," he says. "Our doctrine was: 'everything that is not observable does not exist'."
In addition, the monitoring tools on the market were both expensive and based on complex pricing models, according to the deputy CTO. "We couldn't see how the economics would balance. We were not yet in a public cloud, but our infrastructure was already virtualized and managed with, among others, VMware hypervisors. At the time of M6 Replay, pricing depended on the number of servers and the number of CPUs. We couldn't really understand how it worked," he explains.
Bedrock itself moved from a B2C business model to a B2B2C one. "At that point, everything changed. There were 20 developers in total operating M6 Replay. We had to multiply the headcount to meet our growth targets," says Olivier Mansour. "We also understood that our existing stack was not going to keep up. As we added customers, even though our platform continued to behave well during major events, our monitoring would fall over."
In addition, the growing number of customers and the rise of M6 Replay, which in the meantime had become 6play, required a move to the cloud. This migration to AWS, coupled with a switch to Kubernetes, started in 2017 and is extensively documented in a book, "Le Plan Copenhague", written by Pascal Martin, principal engineer at Bedrock Streaming.
As early as 2019, the IT team turned to the question of monitoring. "Beyond commercial practices and purely technical requirements, the choice of our cloud provider was driven by the fact that we had to grow from 20 to 300 developers. Much the same reasoning influenced the choice of a new monitoring platform," says Olivier Mansour.
Bedrock Streaming evaluated solutions from Elastic, Datadog, Splunk and New Relic. "There was more appetite for Splunk and Datadog among the ops staff, while the back-end developers preferred Elastic, but all these systems met our needs, which were quite limited at the time," says the deputy CTO. So how did they decide? "We weighed the options against our ambitious growth target. We chose a tool that is relatively well known among job applicants and that has the widest possible functional coverage."
But this evaluation was not done in a day. "We spent a long time looking for a company that was genuinely available to support us," says Olivier Mansour.
New Relic was finally chosen in 2020. "This work seems long because, at the same time, we were onboarding our new customers and developing our multi-tenant, multi-instance system," explains the technical manager. "But as soon as we launched the POC, New Relic quickly understood our approach, which was to explore their platform in depth. They knew how to support us."
Bedrock Streaming also wanted to assess the cost of implementing the observability solution, "which is not straightforward," says Olivier Mansour.
"In the instance that runs Salto, there are 75 different microservices. We had to assess the effort needed to connect the New Relic agent to each of them. What helped us a lot is that we use Terraform for infrastructure as code and that New Relic offers ways to set up its agent from this tool," explains the deputy CTO. "From the start, New Relic engineers taught us good practices for doing so."
A deployment driven by infrastructure as code
Once the method had been absorbed, Bedrock Streaming set up an internal skeleton system that could quickly be extended to all its projects. "Once the right tests have been done, 'all' the teams have to do is apply this skeleton, review their project properly and deploy it."
"At Bedrock, a team is made up of 6 to 8 people and manages three to four microservices. After placing the agent in the Terraform code, the team deploys its own dashboards and alerts to monitor the health of the components in production. It is not very different from what we had historically, but practices were not consistent," says the manager.
Exhaustive instrumentation, of the kind Bedrock Streaming wanted, takes time. "It took us six to nine months to deploy New Relic on all back-end resources," says Olivier Mansour.
"In just under a year, we have practically finished monitoring, but we have not finished logging. That is also because we did not want to slow down ongoing projects," he says.
"I nevertheless talked with our management to secure a sufficient level of observability in anticipation of peaks on our streaming platforms, especially during certain sporting events. To run these load tests, we needed full observability of our cloud platform. From now on, we can follow the requests of all the microservices, from the front end to the storage system."
In this respect, the deputy CTO sees the New Relic agent as an advantage.
"Once installed, the agent reports data on response times, communication latency between microservices and the performance of our databases. If we observe degraded response times, we can quickly tell whether we have reached the limits of a managed service and therefore reconfigure it," says Olivier Mansour.
The observability platform's capabilities do not stop at response-time analysis. The deputy CTO appreciates being able to monitor errors in the code.
"We get a lot of information about errors. For example, when we look at an incident for which New Relic has reported a range of errors, the platform can tell us where the faults come from in the code or in our Kubernetes architecture," he explains.
Thus, the Bedrock Streaming technical team began "by using the simplest tool, without artificial-intelligence analysis. We obtained significant gains on our network infrastructure without doing anything other than installing the SDK," says Olivier Mansour.
Then New Relic's APM gave the teams a layer of standardized information. "Because we have tagged the information, we can determine overall health. If one of our customers reports a problem, we quickly know whether it is specific to one instance or whether the slowdown is widespread. We did not have this ability before," he says.
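As a rough illustration of that reasoning, and not a description of Bedrock Streaming's actual tooling or of any New Relic API, the Python sketch below takes one response-time figure per tagged instance and reports whether a slowdown is confined to some instances or widespread. The function name, the instance labels and the 800 ms threshold are all assumptions made for the example.

```python
def diagnose(p95_ms_by_instance: dict[str, float], threshold_ms: float) -> str:
    """Classify a slowdown from per-instance p95 response times.

    The input mapping (instance tag -> p95 in milliseconds) is hypothetical;
    in practice it would come from whatever the monitoring tool exports.
    """
    if not p95_ms_by_instance:
        return "no data"
    # Instances whose p95 response time exceeds the acceptable threshold.
    slow = sorted(
        name for name, p95 in p95_ms_by_instance.items() if p95 > threshold_ms
    )
    if not slow:
        return "healthy"
    if len(slow) == len(p95_ms_by_instance):
        return "widespread slowdown"
    return "specific to: " + ", ".join(slow)


if __name__ == "__main__":
    sample = {"instance-a": 220.0, "instance-b": 1450.0, "instance-c": 240.0}
    print(diagnose(sample, threshold_ms=800.0))  # prints "specific to: instance-b"
```

The point of tagging is visible in the input shape: without a per-instance key, the three figures would collapse into one average and the faulty instance would be hidden.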
Anticipating the connection wall caused by Top Chef
The objective, let us remember, is to keep video-on-demand applications running in production. "We try to estimate the number of users on our platforms and compare it with our autoscaling system. We want this scaling to be based on good metrics and closely tied to usage. New Relic lets us aggregate metrics such as the number of video starts, the number of page-navigation calls, and so on. We can easily compare this information with the number of active Kubernetes pods to find out whether we are provisioning too much or not enough."
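The comparison described in this quote can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the metric names, the idea of summing them into a single load figure and the two thresholds (200 and 1,000 requests per pod per minute) are invented for the example and do not come from Bedrock Streaming.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageSample:
    """One observation window; all field names are hypothetical."""
    video_starts_per_min: int
    page_calls_per_min: int
    active_pods: int


def load_per_pod(sample: UsageSample) -> float:
    """Usage-driven load carried by each active pod."""
    total = sample.video_starts_per_min + sample.page_calls_per_min
    # Guard against a zero pod count so the ratio is always defined.
    return total / max(sample.active_pods, 1)


def scaling_verdict(sample: UsageSample, low: float = 200.0, high: float = 1000.0) -> str:
    """Say whether the pod count looks too low, too high or reasonable.

    The low/high bounds are assumptions for this sketch; real bounds would
    come from load tests on the service in question.
    """
    load = load_per_pod(sample)
    if load > high:
        return "under-provisioned"
    if load < low:
        return "over-provisioned"
    return "ok"


if __name__ == "__main__":
    evening_peak = UsageSample(video_starts_per_min=6000, page_calls_per_min=6000, active_pods=10)
    print(scaling_verdict(evening_peak))  # 1200 requests per pod: "under-provisioned"
```

Expressing the check as load per pod rather than raw CPU is one way to tie scaling decisions to usage, which is the property the quote asks for.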
But the autoscaling system is not as reactive as its name suggests. "For example, when the authentication microservice is under heavy load, we know to pre-warm the services that display the video catalog," says Olivier Mansour.
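Pre-warming of this kind can be pictured as a floor placed under the autoscaler: the replica count is never allowed to drop below a minimum that is raised when an upstream signal surges (here, authentication traffic) or during a time window known to be busy. The Python sketch below is a minimal illustration under those assumptions. The class, the function names and every number are hypothetical, and it does not call Kubernetes or any real autoscaler.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass(frozen=True)
class BusyWindow:
    """A recurring weekly window with a guaranteed minimum (hypothetical)."""
    weekday: int        # 0 = Monday ... 6 = Sunday
    start: time
    end: time
    min_replicas: int


def replica_floor(
    now: datetime,
    windows: list[BusyWindow],
    upstream_rps: float,
    surge_rps: float,
    surge_floor: int,
    default_floor: int,
) -> int:
    """Lowest replica count allowed right now for the downstream service."""
    floors = [default_floor]
    # Scheduled pre-warming: a known weekly peak raises the floor in advance.
    floors += [
        w.min_replicas
        for w in windows
        if w.weekday == now.weekday() and w.start <= now.time() < w.end
    ]
    # Signal-based pre-warming: a surge on an upstream service (for example
    # authentication) announces traffic that is about to reach this one.
    if upstream_rps >= surge_rps:
        floors.append(surge_floor)
    return max(floors)


def target_replicas(autoscaler_target: int, floor: int) -> int:
    """The autoscaler still decides, but never below the pre-warmed floor."""
    return max(autoscaler_target, floor)


if __name__ == "__main__":
    # Example window: Wednesday evening, with a floor of 40 replicas.
    windows = [BusyWindow(weekday=2, start=time(20, 0), end=time(21, 30), min_replicas=40)]
    floor = replica_floor(datetime(2023, 3, 15, 20, 15), windows,
                          upstream_rps=10.0, surge_rps=500.0,
                          surge_floor=20, default_floor=2)
    print(target_replicas(autoscaler_target=5, floor=floor))  # prints 40
```

Keeping the floor separate from the autoscaler's own target means reactive scaling still applies above the floor, while predictable peaks no longer depend on how fast it reacts.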
"On 6play, a good part of the year, Wednesday at 8:30 p.m., it's Top Chef time.People finish their meals before connecting.This is what we call the connection wall.With each of our customers, we are paying our platform according to the seasonality of their traffic in order to preproduce resources in addition to our less reactive autoscaling system when the connection wall is sudden, "he said.
Bedrock Streaming also strives to use graceful degradation and circuit breaker mechanisms to ensure continuity of service for end users. In practice, this may mean the absence of a progress bar on a video, or even the inability to resume a series partway through, but it will not prevent the core operation of the VOD and SVOD applications.
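To make the two mechanisms concrete, here is a minimal circuit breaker with a fallback, written in Python. It is a generic sketch of the pattern and not Bedrock Streaming's implementation: the class, the failure threshold, the reset delay and the `fetch_resume_position` function are all hypothetical. After a set number of consecutive failures the breaker stops calling the slow service for a while and returns a degraded answer instead.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Skip calls to a failing dependency for a while and serve a fallback."""

    def __init__(
        self,
        max_failures: int = 3,
        reset_after: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after      # seconds the circuit stays open
        self.clock = clock                  # injectable so it can be tested
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, func: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Open: do not touch the struggling service at all.
                return fallback()
            # Half-open: allow one trial call; a single failure reopens.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()               # graceful degradation
        self.failures = 0
        return result


def fetch_resume_position() -> Optional[int]:
    """Hypothetical call to a progress service, simulated here as down."""
    raise TimeoutError("progress service is slow")


if __name__ == "__main__":
    breaker = CircuitBreaker(max_failures=2, reset_after=10.0)
    position = breaker.call(fetch_resume_position, lambda: None)
    # position is None: the player would start the episode from the beginning
    # instead of failing, which is the degraded behavior described above.
    print(position)
```

The difficulty mentioned in the article shows up in the two parameters: set the failure threshold too low and the feature disappears on a brief hiccup, set the reset delay too short and the struggling service is hit again before it recovers.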
"These two mechanisms are difficult to optimize," warns Olivier Mansour."You have to understand very precisely what is going on in the streaming platform: which microservice is slowed down?For what ?How ?For how long ?The sequel to New Relic helps us a lot in this ”.
Monitoring is not enough: the teams affected by a possible deterioration of services must also be alerted.
"What took quite a long time, and what is never really finished, is the alerting part," says Olivier Mansour. "It requires the team in charge of the microservice to ask itself the right questions: what is the acceptable level of performance? What are the impacts on the products?"
It was not the priority when the observability suite was adopted. "Our existing system was very well configured, so at first we did not activate alerts in New Relic, in line with the advice of their staff," says the deputy CTO.
Bedrock Streaming then set up two levels of alerts: one for day-to-day monitoring of the performance of the streaming platforms, the other for critical problems. The first sends warnings to Slack channels. "It has become a very reassuring part of our continuous deployment to production. Having New Relic has allowed us to increase that pace," explains Olivier Mansour.
The second notifies on-call staff via PagerDuty and helps meet SLAs with customers, according to the deputy CTO.
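The two-level scheme can be sketched as a small routing function. The Python below is illustrative only: the error-rate thresholds are invented, and the two callbacks are placeholders standing in for a chat notification and an on-call page. No real Slack, PagerDuty or New Relic API is called.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Severity(Enum):
    WARNING = "warning"      # day-to-day performance monitoring
    CRITICAL = "critical"    # problems that threaten the SLA


@dataclass(frozen=True)
class Alert:
    service: str
    severity: Severity
    message: str


def classify(error_rate: float, warn_at: float, critical_at: float) -> Optional[Severity]:
    """Map a measured error rate to a severity, or None if all is well.

    The thresholds are the answers to the team's question "what is the
    acceptable level of performance?" and are assumptions in this sketch.
    """
    if error_rate >= critical_at:
        return Severity.CRITICAL
    if error_rate >= warn_at:
        return Severity.WARNING
    return None


def route(
    alert: Alert,
    notify_chat: Callable[[Alert], None],
    page_on_call: Callable[[Alert], None],
) -> str:
    """Send warnings to the team channel and critical alerts to on-call staff."""
    if alert.severity is Severity.CRITICAL:
        page_on_call(alert)
        return "on-call"
    notify_chat(alert)
    return "chat"


if __name__ == "__main__":
    severity = classify(error_rate=0.07, warn_at=0.01, critical_at=0.05)
    if severity is not None:
        alert = Alert("catalog", severity, "error rate above threshold")
        # print() stands in for both delivery channels in this example.
        print(route(alert, notify_chat=print, page_on_call=print))
```

Separating classification from routing keeps the thresholds, which each team must choose for its own microservices, apart from the delivery channels, which are shared.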
A vendor "pushed to its limits"
None of this would have been possible as quickly without the vendor's assistance, according to Olivier Mansour.
"In general, at Bedrock Streaming, when we use a service provider, we quickly push it to its limits. With New Relic, we found bugs in the agents that monitor our applications, but we were given access to the code even when certain portions were not yet open source, and we were able to propose patches that were accepted," he says with satisfaction.
Bedrock Streaming still uses a lot of PHP. "We were able to discuss it with New Relic's engineers. They listened and were quick to respond to our needs," he notes.
Technical support is not the only benefit of this assistance. The company also wanted help with training. "We have contacts in France. With them, we have free remote sessions that our 300 developers can join to ask their questions," says the deputy CTO, who appreciates this arrangement.
The deployment also went well because Bedrock Streaming appointed a point of contact responsible for exchanges with New Relic. "This person knows our back end very well and was able to guide both the vendor's support and the teams."
This also made it possible to select the right components to meet developers' needs.
"I think we use half of New Relic's features. The offering is very comprehensive. That is actually one of the reasons why the start of the project was a bit long," notes Olivier Mansour. "When we connected our first microservice to New Relic in the test environment, we were lost: there was a lot of information. The support helped us a lot to see more clearly."
Although the suite has proven its usefulness, New Relic has not reached every layer at Bedrock Streaming. The infrastructure teams use another tool. Today, 70 people in charge of back-end environments use New Relic, as do some of the teams responsible for the front end and mobile.
"In total, 150 developers use the solution. We also have non-technical people who use the suite. These guest users access dashboards, which is enough for our product teams to measure the use of the features available on our streaming platforms," says Olivier Mansour.
"We are moving forward step by step, keeping a close eye on the cost of the solution: we pay for the license and for the number of gigabytes ingested. The New Relic teams are proactive in helping us control this cost," he adds.
The next steps in New Relic adoption at Bedrock Streaming concern the monitoring of mobile applications, the ongoing dashboarding effort and an evaluation by the team members responsible for infrastructure.