Have any "successor libraries" emerged, as Cam suggested? #414
UPDATE: pymc-marketing will become the new successor to this library. I know this post is nearly a year old, but I would be happy to collaborate with others on a successor library built in PyMC. I've recently started working on a CLV project and already foresee the time-based splitting of calibration and holdout data as a considerable limitation. Random and/or stratified sampling, so that calibration and holdout data are similarly distributed, would be my priority. The built-in statistical functions of PyMC would also lend themselves well to this project, and model training can be distributed across GPUs, dramatically reducing training time. I'm still proceeding with lifetimes as-is for the beta release of my CLV project, so I won't have much time to dedicate to a successor library until Mar 2022, but if anyone is interested, please respond to this issue. |
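To make the stratified sampling idea above concrete, here is a minimal sketch in plain Python. It is not part of lifetimes or any successor library: the function name `stratified_customer_split`, the three frequency buckets, and the 20% holdout fraction are all assumptions chosen for illustration.

```python
import random
from collections import defaultdict


def stratified_customer_split(frequency_by_customer, holdout_fraction=0.2, seed=0):
    """Split customer IDs into calibration and holdout lists, stratified by purchase frequency."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for customer_id, frequency in frequency_by_customer.items():
        # Hypothetical buckets: no repeat purchases, 1-2 repeats, 3 or more repeats.
        bucket = 0 if frequency == 0 else 1 if frequency <= 2 else 2
        strata[bucket].append(customer_id)

    calibration, holdout = [], []
    for bucket in sorted(strata):
        ids = sorted(strata[bucket])  # sort first so the shuffle is reproducible
        rng.shuffle(ids)
        n_holdout = round(len(ids) * holdout_fraction)
        holdout.extend(ids[:n_holdout])
        calibration.extend(ids[n_holdout:])
    return calibration, holdout


# Toy data: 100 customers with repeat-purchase counts cycling through 0..4.
frequencies = {f"c{i}": i % 5 for i in range(100)}
calibration_ids, holdout_ids = stratified_customer_split(frequencies)
```

Because each bucket is sampled separately, the holdout set keeps roughly the same mix of one-time and repeat buyers as the calibration set, which a purely time-based split does not guarantee.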
@ColtAllen feel free to contact me |
@ColtAllen, I am personally more interested in a TensorFlow Probability-based successor, having not worked with Pyro much, but I would be interested in assisting and seeing where there may be overlap. |
I have been working, albeit slowly, on building a successor on Dask instead of Pandas. I see the challenge of doing CLV on millions of users and not being able to fit things in memory. The idea of Pyro sounds very compelling. How would you like to organize the project? |
@shgidi @gpyga @RodrigoRivera Want to plan a Zoom call to discuss this further? I’m in the Denver area, Mountain Standard Time (UTC-7:00). I have a draft prepared of the details I want to discuss, but I’ll provide an overview here and address your comments.
Pyro is to PyTorch what TFProb is to TensorFlow. If this project takes off, then support for both libraries would be a great direction to go. I personally prefer Pyro because open-source software is only as good as its supporting documentation. I started working with TFProb back in 2017 when it was still called Edward, but have since moved away from it because the vague yet verbose documentation, which even has a few broken links, created considerable friction in my projects: https://www.tensorflow.org/probability/overview
The documentation for Pyro, on the other hand, is among the best I’ve ever seen for an open-source library: https://docs.pyro.ai/en/stable/
Both packages are also relatively low-level. Base TF can be cumbersome to work with, whereas PyTorch was expressly written to have a syntax similar to NumPy: https://pytorch.org/tutorials/beginner/basics/tensorqs_tutorial.html
Speaking of NumPy: https://examples.dask.org/array.html
Dask is basically a distributed drop-in replacement for numpy and would be an excellent alternative for the RFM aggregations. My current project has over 88 million transactions, so my team had to create a separate RFM feature store just to use lifetimes.
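For readers unfamiliar with the RFM aggregations mentioned above, the following is a rough standard-library sketch of the per-customer summary. It assumes the convention that frequency counts repeat purchase dates, recency is the time from first to last purchase, and T is the time from first purchase to the end of the observation period, all in days. The function name `rfm_summary` and the toy transactions are hypothetical; a Dask or pandas version would express the same logic as a group-by on the customer ID.

```python
from collections import defaultdict
from datetime import date


def rfm_summary(transactions, observation_end):
    """Aggregate (customer_id, purchase_date) pairs into frequency, recency and T (in days)."""
    dates_by_customer = defaultdict(set)
    for customer_id, purchase_date in transactions:
        if purchase_date <= observation_end:
            # A set collapses several purchases on the same day into one.
            dates_by_customer[customer_id].add(purchase_date)

    summary = {}
    for customer_id, dates in dates_by_customer.items():
        first, last = min(dates), max(dates)
        summary[customer_id] = {
            "frequency": len(dates) - 1,              # repeat purchase dates only
            "recency": (last - first).days,           # age at the last purchase
            "T": (observation_end - first).days,      # age at the end of observation
        }
    return summary


transactions = [
    ("a", date(2022, 1, 1)),
    ("a", date(2022, 1, 15)),
    ("a", date(2022, 1, 15)),  # same-day duplicate, counted once
    ("a", date(2022, 2, 1)),
    ("b", date(2022, 1, 20)),
]
summary = rfm_summary(transactions, observation_end=date(2022, 3, 1))
```

The aggregation touches each transaction once and keeps only one small record per customer, which is why it is a natural candidate for an out-of-core or distributed group-by when the transaction log does not fit in memory.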
In the Zoom call, I want to address and attain common agreement in the following areas:
I’ve reviewed the GitHub issues for lifetimes in detail, and I’m sure we each have our own lists of problems to bring up, but let’s not confuse issues with features we’d like to see added. I like the OKR approach for setting goals (qualitative Objectives and measurable Key Results), but I’m not married to the methodology by any means. A good objective would be to make lifetimes the premier open-source library for stochastic RFM and CLV modeling. The number of models supported, reductions in training times and in the rate of convergence errors, and growth in the number of GitHub Stars and Watches are all ways we can measure this. Lastly, the documentation for lifetimes is quite good, but I want to review the contributor’s guide in particular, make any desired changes, and ensure we’re all in alignment before going full-speed ahead with code development, because it will make PRs go much more smoothly in the future. After these preliminaries are out of the way, we can put a task list together and set up GitHub Project pages for each. Looking forward to working with you all! |
@ColtAllen I would also be interested in collaborating on a successor library, and would love to join an upcoming call (if the kickoff you mentioned hasn't happened yet)! We use |
Absolutely. I am in Central European Time. Should we aim to have a call in the second or third week of March? |
@RodrigoRivera Awesome! How about either March 13th or 20th for the Zoom meeting? Due to time zone differences, I see this happening around noontime for those in the Americas, and in the evening for those in Europe. @deepyaman Hope you can join! I've been looking at the |
March 13 works for me personally! @ColtAllen I’ve used |
@deepyaman Great! I'll let @RodrigoRivera pick the time since this will be happening at the very end of his day, and I'll post the Zoom link here for anyone to join. Also, I have little interest in integrating If I had to pick another language to incorporate into https://github.com/facebook/prophet/blob/main/python/stan/unix/prophet.stan |
@deepyaman @RodrigoRivera @gpyga @shgidi I’m pushing back this Zoom call because I’ve sent collaboration invites to others and want to give them the opportunity to join as well. If I don’t hear from any of them by St. Patrick’s Day, we can go forward with meeting on 20-Mar or any other Sunday you prefer. I’ve been reviewing the choices of backend for a successor library, and now believe
Lastly, I've forked this repo and have invited you all to be collaborators: https://github.com/ColtAllen/lifetimes I haven’t done much yet aside from update the README, but I’ll be adding some new research paper links and making other minor documentation changes here shortly. |
I appreciate the invitation to join the call and provide advice, but I don't think I would add much! I would like to express my excitement about a successor library being built with probabilistic programming tools - that was a future vision of mine for these RFM techniques. Best of luck, folks! |
Zoom call is scheduled for Sunday, 27-Mar at 10 AM Mountain Daylight Time (GMT-6:00). I've been receiving messages from other interested parties on LinkedIn, so I'm delaying the Zoom call by one more week to give others the chance to discover this discussion and join. I've already started working on an MCMC implementation of the Beta-Geo model. MCMC has challenges of its own, but according to this paper it has far fewer convergence issues than the current MLE approach, which will solve a lot of problems people have with using this library: Worth the effort? Comparison of different MCMC algorithms for estimating the Pareto/NBD model
Join Zoom Meeting
Meeting ID: 819 3822 1716 |
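As a rough illustration of what an MCMC fit of a Beta-Geo model involves, here is a minimal standard-library sketch. It is not the implementation discussed in this thread. It assumes the BG/NBD individual-level likelihood with parameters (r, alpha, a, b) and data (x, t_x, T) per customer, a random-walk Metropolis sampler on the log-parameters, and a Normal(0, 2) prior on each log-parameter. The prior, step size, starting point, and toy data are assumptions picked only to keep the example small.

```python
import math
import random


def log_beta(p, q):
    """Log of the Beta function."""
    return math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)


def bgnbd_log_likelihood(params, customers):
    """Sum of BG/NBD log-likelihoods over customers given as (x, t_x, T) tuples."""
    r, alpha, a, b = params
    total = 0.0
    for x, t_x, T in customers:
        common = math.lgamma(r + x) - math.lgamma(r) + r * math.log(alpha)
        # Term for a customer who is still active at time T.
        log_a1 = common + log_beta(a, b + x) - log_beta(a, b) - (r + x) * math.log(alpha + T)
        if x > 0:
            # Term for a customer who dropped out right after the last purchase at t_x.
            log_a2 = (common + log_beta(a + 1, b + x - 1) - log_beta(a, b)
                      - (r + x) * math.log(alpha + t_x))
            hi, lo = max(log_a1, log_a2), min(log_a1, log_a2)
            total += hi + math.log1p(math.exp(lo - hi))  # stable log(exp(a1) + exp(a2))
        else:
            total += log_a1
    return total


def log_posterior(log_params, customers, prior_sd=2.0):
    """Unnormalised log-posterior in log-parameter space (assumed Normal(0, prior_sd) priors)."""
    params = [math.exp(v) for v in log_params]
    log_prior = sum(-0.5 * (v / prior_sd) ** 2 for v in log_params)
    return bgnbd_log_likelihood(params, customers) + log_prior


def metropolis(customers, n_samples=2000, step=0.1, seed=0):
    """Random-walk Metropolis on log(r), log(alpha), log(a), log(b)."""
    rng = random.Random(seed)
    current = [0.0, 0.0, 0.0, 0.0]  # start at r = alpha = a = b = 1
    current_lp = log_posterior(current, customers)
    samples = []
    for _ in range(n_samples):
        proposal = [v + rng.gauss(0.0, step) for v in current]
        proposal_lp = log_posterior(proposal, customers)
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        samples.append([math.exp(v) for v in current])
    return samples


# Toy data: (frequency x, recency t_x, customer age T), all in the same time unit.
customers = [(2, 30.0, 40.0), (0, 0.0, 40.0), (5, 38.0, 39.0), (1, 5.0, 35.0)]
samples = metropolis(customers)
```

A probabilistic programming library would replace the hand-written sampler with a gradient-based one such as NUTS and add convergence diagnostics; the sketch only shows that the likelihood itself is compact and that priors keep the parameters away from the degenerate regions where maximum likelihood optimisers tend to fail.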
Thanks @deepyaman, @juanitorduz, and everyone else for attending the Zoom call today. Here's a summary of what we discussed: Identified Library Issues
Development Priorities
Future Additions
Future work will continue in the fork I've created: |
An alpha release of the successor library - rebranded as |
The |
Second beta release of the |
Third beta release of |
I've decided to merge efforts with the PyMC Labs team and work on the pymc-marketing project, which will become the premier solution for CLV modeling going forward. BTYD has been a solo project of mine ever since I forked this library, but this is now a community effort! @CamDavidsonPilon, please update the README to reflect this, thank you. |
Neat! Looks like a fun project! |