SciLaD: A Large-Scale, Transparent, Reproducible Dataset for Natural Scientific Language Processing
Scilons Project