Hyperlearn

2-2000x faster ML algos, 50% less memory usage, works on all hardware - new and old.


Hyperlearn is under construction! A brand new stable package will be uploaded sometime in 2022! Stay tuned!

Moonshot Website
Documentation
50 Page Modern Big Data Algorithms PDF


+ Microsoft, UW, UC Berkeley, Greece, NVIDIA

+ NASA + Facebook's Pytorch, Scipy, Cupy, NVIDIA, UNSW


HyperLearn is written entirely in PyTorch, NoGil Numba, NumPy, Pandas, SciPy & LAPACK, C++, C, Python, Cython and Assembly, and (mostly) mirrors Scikit-Learn.
HyperLearn also has statistical inference measures embedded, and is called with the same syntax as Scikit-Learn.
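To illustrate what "the same syntax as Scikit-Learn" means, here is a minimal sketch of the fit/predict convention with one inference measure (coefficient standard errors) attached. The class name `LinearRegressionSketch` and its attributes are hypothetical, not HyperLearn's API; the sketch assumes NumPy and a full-rank design matrix.

```python
import numpy as np

class LinearRegressionSketch:
    """Hypothetical sklearn-style estimator: fit() learns, predict() applies.

    Not HyperLearn's class; it only illustrates the fit/predict convention
    plus one embedded inference measure (coefficient standard errors).
    """

    def fit(self, X, y):
        X = np.asarray(X, dtype=np.float64)
        y = np.asarray(y, dtype=np.float64)
        n, p = X.shape
        Xc = np.column_stack([np.ones(n), X])  # prepend an intercept column
        beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
        resid = y - Xc @ beta
        sigma2 = resid @ resid / (n - p - 1)  # unbiased residual variance
        # Standard errors: square roots of the diagonal of sigma^2 (Xc^T Xc)^{-1}
        self.se_ = np.sqrt(sigma2 * np.diag(np.linalg.inv(Xc.T @ Xc)))
        self.intercept_, self.coef_ = beta[0], beta[1:]
        return self  # returning self allows est.fit(X, y).predict(X)

    def predict(self, X):
        return np.asarray(X, dtype=np.float64) @ self.coef_ + self.intercept_
```

Because `fit` returns `self`, calls chain as in Scikit-Learn: `LinearRegressionSketch().fit(X, y).predict(X_new)`.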

Some key current achievements of HyperLearn:

  • 70% less time to fit Least Squares / Linear Regression than sklearn + 50% less memory usage
  • 50% less time to fit Non Negative Matrix Factorization than sklearn due to new parallelized algo
  • 40% faster full Euclidean / Cosine distance algorithms
  • 50% less time for LSMR iterative least squares
  • New Reconstruction SVD - use SVD to impute missing data! Has .fit AND .transform. Approx 30% better than mean imputation
  • 50% faster Sparse Matrix operations - parallelized
  • RandomizedSVD is now 20-30% faster
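The faster distance algorithms above are not spelled out here, but a common way to speed up full Euclidean and cosine distances is to rewrite them so the bulk of the work is a single matrix multiply. The sketch below assumes NumPy and inputs with no all-zero rows (for cosine); the function names are hypothetical, and this is not necessarily HyperLearn's exact method.

```python
import numpy as np

def pairwise_sq_euclidean(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Squared Euclidean distances between rows of X (n x d) and Y (m x d).

    Uses ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y, so most of the work is one
    matrix multiply (BLAS) instead of an n x m x d broadcast.
    """
    x_sq = np.einsum("ij,ij->i", X, X)[:, None]  # n x 1 squared row norms
    y_sq = np.einsum("ij,ij->i", Y, Y)[None, :]  # 1 x m squared row norms
    D = x_sq + y_sq - 2.0 * (X @ Y.T)
    # Floating-point cancellation can leave tiny negatives; clamp them to 0
    np.maximum(D, 0.0, out=D)
    return D

def pairwise_cosine_similarity(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Cosine similarity between rows of X and Y (assumes no all-zero rows)."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    return Xn @ Yn.T  # one matrix multiply after normalizing rows
```

The trade-off is numerical: the expanded form can lose precision for nearly identical rows, which is why the result is clamped at zero.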

Around mid-2022, Hyperlearn will evolve into GreenAI, which aims to incorporate:

  • New Paratrooper optimizer - fastest SGD variant combining Lookahead, Learning Rate Range Finder, and more!
  • 30% faster Matrix Multiplication on CPUs
  • Software Support for brain floating point (bfloat16) on nearly all hardware
  • Easy compilation on old and new CPU hardware (x86, ARM)
  • 100x faster regular expressions
  • 50% faster and 50% less memory usage for assembly kernel accelerated methods
  • Fast and parallelized New York Times scraper
  • Fast and parallelized NYSE Announcements scraper
  • Fast and parallelized FRED scraper
  • Fast and parallelized Yahoo Finance scraper
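Software bfloat16 support can work on any CPU that has float32, because a bfloat16 value is simply the top 16 bits of a float32. Below is a minimal NumPy sketch of that conversion with round-to-nearest-even, assuming finite inputs (NaN is not special-cased); the function names are hypothetical, not HyperLearn's.

```python
import numpy as np

def to_bfloat16_bits(x: np.ndarray) -> np.ndarray:
    """Round float32 values to bfloat16, returning the 16-bit patterns (uint16).

    Uses round-to-nearest-even on the 16 discarded mantissa bits.
    NaN inputs are not special-cased in this sketch.
    """
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    # Adding 0x7FFF plus the lowest kept bit rounds ties to the even neighbour
    lsb = (bits >> np.uint32(16)) & np.uint32(1)
    rounded = bits + np.uint32(0x7FFF) + lsb
    return (rounded >> np.uint32(16)).astype(np.uint16)

def from_bfloat16_bits(b: np.ndarray) -> np.ndarray:
    """Expand bfloat16 bit patterns back to float32 by zero-filling the low 16 bits."""
    return (np.asarray(b, dtype=np.uint16).astype(np.uint32) << np.uint32(16)).view(np.float32)
```

Storing data as `uint16` halves memory; arithmetic still happens in float32 after expansion, and the relative rounding error is at most 2^-8.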

I also published a mini 50-page book titled "Modern Big Data Algorithms".

Modern Big Data Algorithms PDF

Comparison of Speed / Memory

Algorithm | n | p | Time (s), Sklearn | Time (s), Hyperlearn | RAM (MB), Sklearn | RAM (MB), Hyperlearn | Notes
QDA (Quadratic Discriminant Analysis) | 1,000,000 | 100 | 54.2 | 22.25 | 2,700 | 1,200 | Now parallelized
LinearRegression | 1,000,000 | 100 | 5.81 | 0.381 | 700 | 10 | Guaranteed stable & fast

Time (s) is fit + predict time. RAM (MB) = max(RAM(fit), RAM(predict)).

I've also added some preliminary results for N = 5000, P = 6000


Help is really needed! Message me!


Key Methodologies and Aims

1. Embarrassingly Parallel For Loops

2. 50%+ Faster, 50%+ Leaner

3. Why is Statsmodels sometimes unbearably slow?

4. Deep Learning Drop In Modules with PyTorch

5. 20%+ Less Code, Cleaner Clearer Code

6. Accessing Old and Exciting New Algorithms


1. Embarrassingly Parallel For Loops

  • Including Memory Sharing, Memory Management
  • CUDA Parallelism through PyTorch & Numba
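An embarrassingly parallel loop splits the work into independent chunks that share the same input and output buffers, so nothing is copied between workers. The sketch below uses Python's standard `concurrent.futures` threads rather than Numba or PyTorch, and `parallel_column_norms` is a hypothetical name; any real speedup depends on the NumPy kernel releasing the GIL, so treat it as a structural example, not a benchmark.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def parallel_column_norms(X: np.ndarray, n_jobs: int = 4) -> np.ndarray:
    """Compute the L2 norm of every column of X in independent chunks.

    Threads share X and the preallocated output (memory sharing), and each
    worker writes only its own slice, so no locking is needed.
    """
    n_cols = X.shape[1]
    out = np.empty(n_cols, dtype=np.float64)
    # Split column indices into roughly equal contiguous chunks
    bounds = np.linspace(0, n_cols, n_jobs + 1).astype(int)

    def work(lo: int, hi: int) -> None:
        out[lo:hi] = np.linalg.norm(X[:, lo:hi], axis=0)

    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        futures = [pool.submit(work, lo, hi) for lo, hi in zip(bounds[:-1], bounds[1:])]
        for f in futures:
            f.result()  # re-raise any exception raised inside a worker
    return out
```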

2. 50%+ Faster, 50%+ Leaner

3. Why is Statsmodels sometimes unbearably slow?

  • Confidence, Prediction Intervals, Hypothesis Tests & Goodness of Fit tests for linear models are optimized.
  • Using Einstein Notation & Hadamard Products where possible.
  • Computing only what is necessary (e.g. the diagonal of a matrix rather than the entire matrix).
  • Fixing the flaws of Statsmodels on notation, speed, memory issues and storage of variables.
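As an example of computing only a diagonal: leverages, the diagonal of the hat matrix H = X (X^T X)^-1 X^T used in prediction intervals and outlier diagnostics, can be obtained with Einstein notation and a row-wise Hadamard product without ever forming the n x n matrix. A NumPy sketch assuming X has full column rank; `hat_diagonal` is a hypothetical name.

```python
import numpy as np

def hat_diagonal(X: np.ndarray) -> np.ndarray:
    """Diagonal of the hat matrix H = X (X^T X)^{-1} X^T without forming H.

    diag(H)_i = x_i^T (X^T X)^{-1} x_i, computed row-wise with einsum,
    which needs O(n p) extra memory instead of O(n^2).
    """
    XtX_inv = np.linalg.inv(X.T @ X)  # p x p; assumes full column rank
    XA = X @ XtX_inv                  # n x p
    # Row-wise dot product: sum_j XA[i, j] * X[i, j]
    return np.einsum("ij,ij->i", XA, X)
```

Equivalently, `(XA * X).sum(axis=1)` is the explicit Hadamard-product form. The leverages always sum to p, the number of columns, which makes a quick sanity check.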

4. Deep Learning Drop In Modules with PyTorch

  • Using PyTorch to create Scikit-Learn-like drop-in replacements.

5. 20%+ Less Code, Cleaner Clearer Code

  • Using Decorators & Functions where possible.
  • Intuitive mid-level function names such as isTensor and isIterable.
  • Handles parallelism easily through hyperlearn.multiprocessing.
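A hypothetical sketch of the decorator-plus-helper style: `isIterable` borrows a name from the list above, but its body is only a guess, and `as_float_array` and `column_means` are invented for illustration. It assumes NumPy. The point is that input checking lives in one decorator instead of being repeated in every function.

```python
import functools
import numpy as np

def isIterable(x) -> bool:
    """Hypothetical helper: True if x can be iterated over (strings excluded)."""
    if isinstance(x, (str, bytes)):
        return False
    try:
        iter(x)
        return True
    except TypeError:
        return False

def as_float_array(func):
    """Hypothetical decorator: convert the first argument to a 2-D float64
    array and reject NaN/inf before the wrapped function runs."""
    @functools.wraps(func)
    def wrapper(X, *args, **kwargs):
        X = np.asarray(X, dtype=np.float64)
        if X.ndim == 1:
            X = X.reshape(-1, 1)  # treat a 1-D input as a single column
        if not np.isfinite(X).all():
            raise ValueError("X contains NaN or infinity")
        return func(X, *args, **kwargs)
    return wrapper

@as_float_array
def column_means(X):
    # X is guaranteed to be a finite 2-D float array here
    return X.mean(axis=0)
```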

6. Accessing Old and Exciting New Algorithms

  • Matrix Completion algorithms - Non Negative Least Squares, NNMF
  • Batch Similarity Latent Dirichlet Allocation (BS-LDA)
  • Correlation Regression
  • Feasible Generalized Least Squares (FGLS)
  • Outlier Tolerant Regression
  • Multidimensional Spline Regression
  • Generalized MICE (any model drop in replacement)
  • Using Uber's Pyro for Bayesian Deep Learning
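For the NNMF entry, a useful baseline is the classic Lee and Seung multiplicative update for the Frobenius loss. This is the textbook algorithm, not HyperLearn's parallelized variant; the function name and defaults are hypothetical, and it assumes NumPy.

```python
import numpy as np

def nmf_multiplicative(V, rank, n_iter=200, eps=1e-10, seed=0):
    """Factor a nonnegative matrix V (m x n) as W @ H with W, H >= 0.

    Lee & Seung multiplicative updates for ||V - W H||_F^2.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a nonnegative ratio, so W and H stay >= 0
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Each update is a few dense matrix multiplies, which is what makes this family of algorithms straightforward to parallelize.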

Goals & Development Schedule

Hyperlearn will be revamped in the following months to become Moonshot GreenAI, with more than 150 additional optimized algorithms! Stay tuned!!
Also, you made it this far! If you want to join Moonshot, complete the secret quiz!
Join Moonshot!


Extra License Terms

  1. The Apache 2.0 license is adopted.

Key Metrics

Overview
Name and owner: unslothai/hyperlearn
Primary language: Jupyter Notebook
Programming languages: Python (number of languages: 6)
License: Apache License 2.0

Owner activity
Created: 2018-08-27 16:00:47
Last pushed: 2024-11-19 02:09:54
Last commit: 2024-11-18 18:09:54
Releases: 0

User engagement
Stars: 2.2k
Watchers: 94
Forks: 143
Commits: 264
Issues: 22
Open issues: 1
Pull requests: 10
Open pull requests: 1
Closed pull requests: 4