A package I implemented and used during my internships at MetLife and InChange. It covers techniques such as web crawling, text analysis, and Spark machine learning. I wrote it in draft form for convenience of use, so it is a bit confusing in places, but useful.
Feel free to use and fork it. Thanks.