Reliable and scalable Kafka-based framework for optical network telemetry

Sgambelluri A.; Pacini A.; Paolucci F.; Castoldi P.; Valcarenghi L.
2021-01-01

Abstract

Telemetry data acquisition is becoming crucial for efficiently detecting and promptly reacting to network status changes, such as failures. Streaming telemetry data to many collectors may be hindered by scalability issues, delaying failure detection and localization. Efficient mechanisms for managing the massive telemetry traffic generated by network devices can pave the way for novel procedures that speed up failure detection and thus minimize response time. This paper proposes a novel Kafka-based monitoring framework that leverages the telemetry service. The framework exploits the built-in scalability and reliability of Kafka to go beyond traditional monitoring systems. It enables continuous monitoring of optical system data and distributes the data to a large number of consumers as simple compressed text messages. Moreover, it keeps a limited history of the monitored data, which eases, for example, root cause failure analysis. The implemented monitoring platform is experimentally validated, considering the disaggregated paradigm, in terms of functional assessment, scalability, resiliency, and end-to-end message latency. The results show that the framework is highly scalable, supporting up to around 4000 messages per second (and potentially more) with low CPU load, and achieves an end-to-end (i.e., producer-to-consumer) latency of about 50 ms. Moreover, the architecture can withstand the failure of a core component of the monitoring framework without losing any messages.
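The abstract combines three mechanisms: telemetry samples are published as compressed text messages, many consumers read the same stream, and a limited history is retained for later analysis. The Python sketch below models these ideas in memory, without a real Kafka broker, to show why they fit together. It is an illustrative assumption, not the authors' implementation: the names `BoundedTopic`, `Consumer`, `encode_sample`, and `end_to_end_latency_ms`, the JSON plus gzip encoding, and the sample fields are all hypothetical, and a count-based cap stands in for Kafka's time- or size-based retention (`retention.ms`, `retention.bytes`). Broker replication, which underlies the resiliency result, is not modeled.

```python
import gzip
import json
import time
from collections import deque
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


def encode_sample(sample: dict) -> bytes:
    """Serialize a telemetry sample as JSON text, then gzip-compress it."""
    return gzip.compress(json.dumps(sample).encode("utf-8"))


def decode_sample(payload: bytes) -> dict:
    """Decompress and parse a payload produced by encode_sample."""
    return json.loads(gzip.decompress(payload).decode("utf-8"))


def end_to_end_latency_ms(sample: dict, received_at: Optional[float] = None) -> float:
    """Producer-to-consumer latency in milliseconds.

    Assumes the producer stamped `ts` (seconds since the epoch) and that the
    producer and consumer clocks are synchronized.
    """
    now = time.time() if received_at is None else received_at
    return (now - sample["ts"]) * 1000.0


class BoundedTopic:
    """Hypothetical in-memory stand-in for a single Kafka topic partition.

    Each record gets a monotonically increasing offset. Only the newest
    `retention` records are kept (a count-based simplification of Kafka's
    time- or size-based retention); older offsets can no longer be read.
    """

    def __init__(self, retention: int) -> None:
        self._log: deque = deque(maxlen=retention)
        self._next_offset = 0

    def append(self, payload: bytes) -> int:
        offset = self._next_offset
        self._log.append((offset, payload))
        self._next_offset += 1
        return offset

    def earliest_offset(self) -> int:
        return self._log[0][0] if self._log else self._next_offset

    def read_from(self, offset: int) -> List[Tuple[int, bytes]]:
        """Return retained (offset, payload) pairs at or after `offset`."""
        start = max(offset, self.earliest_offset())
        return [(o, p) for (o, p) in self._log if o >= start]


@dataclass
class Consumer:
    """Hypothetical consumer that tracks its own read position.

    Reading does not delete records, so any number of consumers can follow
    the same topic independently.
    """

    name: str
    position: int = 0
    received: List[dict] = field(default_factory=list)

    def poll(self, topic: BoundedTopic) -> List[dict]:
        for offset, payload in topic.read_from(self.position):
            self.received.append(decode_sample(payload))
            self.position = offset + 1
        return self.received


if __name__ == "__main__":
    topic = BoundedTopic(retention=3)
    for seq in range(5):
        topic.append(encode_sample({"device": "transponder-1", "seq": seq, "ts": time.time()}))
    alarm_handler = Consumer("alarm-handler")
    analytics = Consumer("analytics")
    # Both consumers see only offsets 2..4: offsets 0 and 1 fell out of retention.
    print([s["seq"] for s in alarm_handler.poll(topic)])  # [2, 3, 4]
    print([s["seq"] for s in analytics.poll(topic)])      # [2, 3, 4]
    print(round(end_to_end_latency_ms(alarm_handler.received[-1]), 3))
```

Because each consumer keeps its own offset, adding a consumer does not affect the others, and a consumer that starts late can still replay whatever the retention window holds, which is what makes a bounded history useful for root cause analysis. The latency helper only yields meaningful values if producer and consumer clocks are synchronized (for example via NTP or PTP).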
Files in this item:

pub1_jocn-13-10-E42.pdf (not available)
Type: Post-print/Accepted manuscript
License: Not public
Size: 4.43 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11382/539623
Warning! The data displayed have not been validated by the university.

Citations
  • Scopus: 31