* streamk example and performance tuning
* one missing file

Co-authored-by: Haicheng Wu <haichengw@nvidia.com>