HTTP/2 traffic Dataset
Dataset Overview
Context:
This dataset was constructed for training and testing our classification tool, H2Classifier. The classification method is presented in the paper "Transparent and Service-Agnostic Monitoring of Encrypted Web Traffic", published in IEEE Transactions on Network and Service Management (TNSM) in 2019.
Description:
Data 1:
The dataset provides pcap captures of page loads requested with different keywords on different services. The captured traffic is protected by HTTPS (TLS + HTTP/2). Here we consider 5 services: Amazon, Instagram, Google, Google Images, Google Maps. The dataset is split into 3 parts:
- A: For each of the 5 services -> 2000 keywords, more than 12 traces for each keyword (in the paper: 2 data_h2_2000, monitored keywords)
- B: For each of 4 services (all except Instagram) -> more than 20,000 different keywords with one trace each
- C: For each of 4 services (all except Instagram) -> 500 different keywords, 60 traces for each keyword
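As a quick sanity check on a downloaded trace, the sketch below counts the packet records in a capture using only the Python standard library. It assumes the files use the classic libpcap format (24-byte global header, 16-byte record headers); if the captures are stored as pcapng instead, the function raises an error and a dedicated reader is needed. The function name `count_packets` is illustrative and not part of H2Classifier.

```python
import struct
from typing import BinaryIO

# Magic numbers of the classic libpcap format
# (microsecond and nanosecond timestamp variants).
PCAP_MAGICS = {0xA1B2C3D4, 0xA1B23C4D}


def count_packets(stream: BinaryIO) -> int:
    """Count packet records in a classic pcap stream (not pcapng)."""
    header = stream.read(24)  # fixed-size global header
    if len(header) < 24:
        raise ValueError("file too short to be a pcap capture")
    # The magic number reveals the byte order used by the capture tool.
    for endian in ("<", ">"):
        if struct.unpack(endian + "I", header[:4])[0] in PCAP_MAGICS:
            break
    else:
        raise ValueError("not a classic pcap file (possibly pcapng)")

    count = 0
    while True:
        record = stream.read(16)  # ts_sec, ts_frac, incl_len, orig_len
        if len(record) < 16:
            break  # end of file (or truncated record header)
        _, _, incl_len, _ = struct.unpack(endian + "IIII", record)
        if len(stream.read(incl_len)) < incl_len:
            break  # truncated packet body: do not count it
        count += 1
    return count
```

To use it, open a trace in binary mode, for example `with open(path, "rb") as f: print(count_packets(f))`, where `path` points to one of the pcap files.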
Data 2:
The dataset provides pcap captures of page loads requested with different keywords on different services. Additionally, each capture is accompanied by a screenshot and the HTML code of the page. The captured traffic is protected by HTTPS (TLS + HTTP/2). The dataset is split into 2 parts:
- A: Test-of-time. 4 services: Amazon, Instagram, Google, Google Images. Over 121 days: 4 captures of 500 different keywords on the 4 services. 968,000 page loads captured (around 1.8 TB)
- B: Test-of-space. 3096 services. For each service: 20 keywords captured 20 times each. 1,238,400 page loads captured (around 6 TB)
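The totals for Data 2 can be checked with a few lines of arithmetic. The sketch below assumes that part A means 4 captures per day of each keyword on each service, which is the reading that reproduces the stated total, and that 1 TB is 10^12 bytes. The average sizes it derives are rough estimates that include whatever is counted in the stated volumes, not measured values.

```python
# Part A (test-of-time): days x captures per day x keywords x services.
# "4 captures per day" is an assumed reading of the description.
part_a_pages = 121 * 4 * 500 * 4

# Part B (test-of-space): services x keywords x repetitions.
part_b_pages = 3096 * 20 * 20

TB = 10**12  # assumption: decimal terabytes

# Rough average volume per captured page load, in megabytes.
part_a_avg_mb = 1.8 * TB / part_a_pages / 10**6
part_b_avg_mb = 6 * TB / part_b_pages / 10**6

print(f"Part A: {part_a_pages:,} page loads, ~{part_a_avg_mb:.2f} MB each")
print(f"Part B: {part_b_pages:,} page loads, ~{part_b_avg_mb:.2f} MB each")
```

Both products match the figures given above (968,000 and 1,238,400), and the averages come out at roughly 2 MB and 5 MB per page load, which is useful when planning storage for a partial download.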
License:
- Use of the datasets above for research or other purposes is subject to the "Creative Commons Attribution-ShareAlike 4.0" license (http://creativecommons.org/licenses/by-sa/4.0/).
- Please make sure to cite the dataset:
Pierre-Olivier Brissaud, Jérôme François, Isabelle Chrisment, Thibault Cholez, Olivier Bettan:
Transparent and Service-Agnostic Monitoring of Encrypted Web Traffic.
IEEE Trans. Network and Service Management 16(3): 842-856 (2019)
@article{tnsmBrissaud2019,
author = {Pierre{-}Olivier Brissaud and J{\'{e}}r{\^{o}}me Fran{\c{c}}ois and Isabelle Chrisment and Thibault Cholez and Olivier Bettan},
title = {Transparent and Service-Agnostic Monitoring of Encrypted Web Traffic},
journal = {{IEEE} Trans. Network and Service Management},
volume = {16},
number = {3},
pages = {842--856},
year = {2019},
url = {https://doi.org/10.1109/TNSM.2019.2933155},
doi = {10.1109/TNSM.2019.2933155},
timestamp = {Mon, 23 Sep 2019 17:26:32 +0200} }
For more information please contact:
Pierre-Olivier Brissaud
pierre-olivier.brissaud(at)inria.fr
Or
Jérôme François
jerome.francois(at)inria.fr