Open HTTPS Dataset
Dataset Overview
The dataset contains full raw HTTPS PCAP files captured while crawling the top 779 accessed HTTPS websites. The crawl was run daily, twice per day, using the Google Chrome and Mozilla Firefox web browsers.
- The dataset was built by crawling HTTPS websites over two weeks in September 2016.
- The full PCAP files are provided with complete HTTPS payloads.
- The scanning process was automated using a local machine and a pre-configured remote proxy to dump all packets with port number 443 (HTTPS port).
- The Chrome part of the dataset includes 250,185 HTTPS flows related to 7977 services/websites.
- The Firefox part of the dataset includes 237,127 HTTPS flows related to 7322 services/websites.
- The pkt2flow tool (https://github.com/caesar0301/pkt2flow) can be used to extract flows from the PCAP files.
- Use of this dataset for research or other purposes is subject to the "Creative Commons Attribution-ShareAlike 4.0" license (http://creativecommons.org/licenses/by-sa/4.0/).
- Please make sure to cite the dataset:
@misc{shbair2016,
author = {Wazen Shbair and Thibault Cholez and Jerome Francois and Isabelle Chrisment},
title = {HTTPS Websites Dataset},
howpublished = {\url{http://betternet.lhs.loria.fr/datasets/https/}},
year = {2016}
}
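For a quick look at a capture before running pkt2flow, the sketch below groups packets on TCP port 443 into bidirectional flows and counts the packets in each. It is a simplified stand-in, not pkt2flow's own algorithm, and it rests on several assumptions not stated in this description: the files are in the classic PCAP format (not pcapng), the link layer is Ethernet, and only IPv4 traffic is considered. The function names are hypothetical and use only the Python standard library.

```python
import struct
from collections import Counter


def iter_packets(data):
    """Yield raw frames from a classic (non-pcapng) PCAP byte string."""
    magic = struct.unpack("<I", data[:4])[0]
    if magic == 0xA1B2C3D4:
        endian = "<"   # file written little-endian
    elif magic == 0xD4C3B2A1:
        endian = ">"   # file written big-endian
    else:
        raise ValueError("not a classic microsecond-resolution PCAP file")
    offset = 24  # skip the 24-byte global header
    while offset + 16 <= len(data):
        # Per-packet record header: ts_sec, ts_usec, captured length, original length
        _sec, _usec, incl_len, _orig_len = struct.unpack(
            endian + "IIII", data[offset:offset + 16])
        offset += 16
        yield data[offset:offset + incl_len]
        offset += incl_len


def https_flow_key(frame, port=443):
    """Return a direction-independent flow key, or None if not TCP/443 over IPv4."""
    # Assumption: Ethernet II framing, so the EtherType sits at bytes 12-13.
    if len(frame) < 34 or frame[12:14] != b"\x08\x00":
        return None
    ihl = (frame[14] & 0x0F) * 4      # IPv4 header length in bytes
    if frame[23] != 6:                # IP protocol 6 is TCP
        return None
    tcp = 14 + ihl
    if len(frame) < tcp + 4:
        return None
    src_port, dst_port = struct.unpack("!HH", frame[tcp:tcp + 4])
    if port not in (src_port, dst_port):
        return None
    a = (frame[26:30], src_port)      # (source IP bytes, source port)
    b = (frame[30:34], dst_port)      # (destination IP bytes, destination port)
    # Sorting the two endpoints makes both directions map to the same key.
    return tuple(sorted((a, b)))


def count_https_flows(data):
    """Map each bidirectional TCP/443 flow key to its packet count."""
    counts = Counter()
    for frame in iter_packets(data):
        key = https_flow_key(frame)
        if key is not None:
            counts[key] += 1
    return counts
```

To try it, read one capture file in binary mode and pass its bytes to count_https_flows; the length of the result is the number of distinct flows under the definition above. The sketch loads the whole file into memory and never splits a flow on timeouts or connection teardown, so its counts should not be expected to match the flow totals listed above.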
For more information please contact:
Thibault Cholez
thibault.cholez(at)loria.fr
Wazen Shbair
shbair.wazen(at)loria.fr