HTTP/2 traffic Dataset
Dataset Overview

Context:

This dataset was constructed for training and testing our classification tool, H2Classifier. The classification approach is presented in the paper "Transparent and Service-Agnostic Monitoring of Encrypted Web Traffic", published in the TNSM journal in 2019.

Description:

Data 1:

The dataset provides pcap captures of page loads requested with different keywords on different services. The captured traffic is protected by HTTPS (TLS + HTTP/2).
Here we consider 5 services: Amazon, Instagram, Google, Google Images, Google Maps.
The dataset is split into 3 parts:
  • A: For each of the 5 services -> 2000 keywords, more than 12 traces for each keyword (in the paper: 2 data_h2_2000, monitored keywords)
  • B: For each of 4 services (all except Instagram) -> more than 20,000 different keywords with one trace each
  • C: For each of 4 services (all except Instagram) -> 500 different keywords, 60 traces for each keyword
Note that we only provide the TCP traffic captured on port 443.
The packets on port 8000 are generated to mark the boundaries between the different loaded pages.
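
The role of the port 8000 packets can be illustrated with a minimal Python sketch that cuts one capture into per-page groups of port 443 packets. It is not the authors' tooling and rests on several assumptions: the file is in classic pcap format (not pcapng), frames are Ethernet/IPv4 without VLAN tags, and any run of packets on port 8000 marks the boundary between two page loads. The function names read_pcap, tcp_ports and split_page_loads are hypothetical.

```python
import struct
from typing import Iterable, Iterator, List, Optional, Tuple

HTTPS_PORT = 443    # port of the captured web traffic, per the note above
MARKER_PORT = 8000  # port of the page-delimiting packets, per the note above

Packet = Tuple[float, bytes]  # (timestamp in seconds, raw frame bytes)


def read_pcap(data: bytes) -> Iterator[Packet]:
    """Yield (timestamp, frame) records from classic pcap bytes (not pcapng)."""
    magic = data[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"
    else:
        raise ValueError("not a classic microsecond-resolution pcap file")
    offset = 24  # skip the 24-byte global header
    while offset + 16 <= len(data):
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack_from(
            endian + "IIII", data, offset)
        offset += 16
        yield ts_sec + ts_usec / 1e6, data[offset:offset + incl_len]
        offset += incl_len


def tcp_ports(frame: bytes) -> Optional[Tuple[int, int]]:
    """Return (src_port, dst_port) for an Ethernet/IPv4/TCP frame, else None."""
    if len(frame) < 34 or frame[12:14] != b"\x08\x00":  # not IPv4 over Ethernet
        return None
    ihl = (frame[14] & 0x0F) * 4  # IPv4 header length in bytes
    if frame[23] != 6 or len(frame) < 14 + ihl + 4:  # protocol 6 is TCP
        return None
    src, dst = struct.unpack_from("!HH", frame, 14 + ihl)
    return src, dst


def split_page_loads(packets: Iterable[Packet]) -> List[List[Packet]]:
    """Group port 443 packets into page loads separated by port 8000 packets."""
    loads: List[List[Packet]] = []
    current: List[Packet] = []
    for ts, frame in packets:
        ports = tcp_ports(frame)
        if ports is None:
            continue  # ignore anything that is not IPv4/TCP
        if MARKER_PORT in ports:
            # Assumption: a marker packet closes the page load in progress.
            if current:
                loads.append(current)
                current = []
        elif HTTPS_PORT in ports:
            current.append((ts, frame))
    if current:
        loads.append(current)
    return loads
```

Given the bytes of one capture file in a variable data, split_page_loads(read_pcap(data)) would return one list of (timestamp, frame) pairs per page load. A capture that contains no port 8000 packets comes back as a single group.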

Data located: here

Data 2:

The dataset provides pcap captures of page loads requested with different keywords on different services. Additionally, each capture is accompanied by a screenshot and the HTML code of the page. The captured traffic is protected by HTTPS (TLS + HTTP/2).
The dataset is split into 2 parts:
  • A: Test-of-time
    4 services: Amazon, Instagram, Google, Google Images.
    Over 121 days: 4 captures per day of 500 different keywords on each of the 4 services.
    968,000 page loads captured (around 1.8 TB)
  • B: Test-of-space.
    3096 services.
    For each service: 20 keywords captured 20 times each.
    1,238,400 page loads captured (around 6 TB)
Note that we only provide the TCP traffic captured on port 443.
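
As a quick sanity check, the two totals above can be reproduced from the per-part figures. The short Python sketch below assumes that the Test-of-time figures multiply as days x captures x keywords x services, and the Test-of-space figures as services x keywords x repetitions. The variable names are illustrative only.

```python
# Test-of-time (part A): days x captures x keywords x services
test_of_time = 121 * 4 * 500 * 4

# Test-of-space (part B): services x keywords x repetitions per keyword
test_of_space = 3096 * 20 * 20

print(test_of_time)   # 968000
print(test_of_space)  # 1238400
```

Both products match the totals stated for parts A and B.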

Data available on demand.

License:

  • Use of the datasets above for research or other purposes is subject to the "Creative Commons Attribution-ShareAlike 4.0 license" (http://creativecommons.org/licenses/by-sa/4.0/).
  • Please make sure to cite the dataset:

    Pierre-Olivier Brissaud, Jérôme François, Isabelle Chrisment, Thibault Cholez, Olivier Bettan:
    Transparent and Service-Agnostic Monitoring of Encrypted Web Traffic.
    IEEE Trans. Network and Service Management 16(3): 842-856 (2019)

    @article{tnsmBrissaud2019,
    author = {Pierre{-}Olivier Brissaud and J{\'{e}}r{\^{o}}me Franc{\c{c}}ois and Isabelle Chrisment and Thibault Cholez and Olivier Bettan},
    title = {Transparent and Service-Agnostic Monitoring of Encrypted Web Traffic},
    journal = {{IEEE} Trans. Network and Service Management},
    volume = {16},
    number = {3},
    pages = {842--856},
    year = {2019},
    url = {https://doi.org/10.1109/TNSM.2019.2933155},
    doi = {10.1109/TNSM.2019.2933155},
    timestamp = {Mon, 23 Sep 2019 17:26:32 +0200} }

Contact:

For more information please contact:

Pierre-Olivier Brissaud
pierre-olivier.brissaud(at)inria.fr

Or

Jérôme François
jerome.francois(at)inria.fr