A Procedure for Building Reduced, Reliable Training Datasets from Real-World Data