Saul Shanabrook

👋 Hi, my name is Saul Shanabrook. 👋

💝 Welcome to my website! 💝
🔗 Here lies a collection of "internet links." 🔗
🗂 I have helpfully arranged them into categories. 🗂
🍾 I hope you enjoy! 🍾

📄 Oh also, if you are looking for my resume, here it is 📄

💌 Contact 💌

Feel free to reach out to me via email, Google Meet, Twitter, Mastodon, or GitHub.

💐 Nice Things 💐

Plants for a Future

a database of edible plants

Help Yourself

nonprofit helping create public access food forests in western MA

East Bay Permanent Real Estate Cooperative

an inspiring model of using community control to prevent gentrification and create affordable resident controlled housing

Radical Homeownership Part 2

a v. fun video on some great alternative options on community stewardship

egg: e-graphs good

a cool tool to help build replacement systems without having to worry about rule order 😱.

🏡 Things I have worked on 🏡

egglog-python

a library to use e-graphs in Python for building expressive DSLs and optimizing code

Valley Housing Cooperative

a project with some friends to find a place to live, do fun things, and try something out

Plant Friends

an iOS app I started to help people become better friends with plants near them

metadsl

a library to use pattern matching and type analysis to build safe DSLs in Python, in order to allow scientific computing libraries to better collaborate and share key abstractions.

python-code-data

provides a friendly isomorphic representation of Python's bytecode objects

jupyterlab

an open source data science IDE in your browser. I was a core maintainer for a while and helped on a variety of extensions as well

lineapy

a Python code analysis tool, which helps productionize data science code by building a DAG of Python code

🔄 Links to Links 🔄

blog posts

my new blog posts on GitHub Discussions

old blog posts

my old blog posts on my previous statically generated website

🎭 Talks 🎭

March 21st, 2024: Optimizing Scikit-Learn with Egglog and Numba

Now that I have this great e-graph library in Python, what extra mechanisms do I need to make it useful in existing Python code?

This talk will go through a few of the techniques developed and also point to how bringing in use cases from scientific Python can help drive further theoretical research.

EGRAPHS Community - Lightning Talks

November 3rd, 2023: egglog: e-graphs in Python

PyData NYC '23 Lightning Talk

August 1st, 2023: egglog: E-Graphs in Python

The PyData ecosystem is home to one of the largest and most successful open source communities. It's both where most newcomers to data science start and also where cutting edge research takes place. It has been able to support the diverse needs of its users through its decentralized nature, promoting creativity and collaboration.

As the size of data has increased and our compute has moved off of our single CPUs, the nature of libraries has evolved. Whereas in the past client code would generally call out to fast pre-compiled libraries (SciPy, NumPy, etc.), now it often works via calls to a variety of distributed, out-of-core, and specialized compilation and computation backends (PyTorch, Dask, Numba, Ibis, etc.). This means a growing number of libraries do not eagerly execute a computation in the CPython interpreter, but instead optimize and translate it to some other target.

At a high level, we can see this ecosystem as a large decentralized, embedded, domain-specific compiler, translating from high-level user expressions to different low-level primitives. This calls for an exploration of tooling to help enable this translation of programs between different representations, to facilitate the efficient use of code across this distributed ecosystem.

One approach to automating this translation among different representations is the rewriting technique called “equality saturation.” This allows us to construct a data structure of equivalent programs (an ‘e-graph’), and then search that space for a functionally-equivalent program that has desirable characteristics such as improved performance or memory efficiency. Building this translation tooling once can enhance sharing and collaboration between the libraries which use it.
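The idea in the paragraph above can be illustrated with a small sketch. It is a toy, not the egglog API and not a real e-graph: it naively enumerates every whole expression reachable through a few hand-written rewrite rules, then picks the smallest one. A real e-graph stores the same set far more compactly by sharing equivalent sub-expressions. All names here (`rewrite_root`, `saturate`, `best`) and the rules themselves are assumptions made for illustration.

```python
from typing import Iterator, Union

# An expression is a variable name, an integer, or a tuple (op, left, right).
Expr = Union[str, int, tuple]


def rewrite_root(e: Expr) -> Iterator[Expr]:
    """Yield expressions equal to `e`, applying one rule at the root."""
    if not isinstance(e, tuple):
        return
    op, left, right = e
    if op == "*" and right == 2:
        yield ("<<", left, 1)  # x * 2 -> x << 1
    if op == "/" and isinstance(left, tuple) and left[0] == "*":
        # (x * y) / z -> x * (y / z)
        yield ("*", left[1], ("/", left[2], right))
    if op == "/" and isinstance(left, int) and left == right and left != 0:
        yield 1  # c / c -> 1 for a nonzero constant c
    if op == "*" and right == 1:
        yield left  # x * 1 -> x


def rewrite_anywhere(e: Expr) -> Iterator[Expr]:
    """Apply one rule at the root or inside either child."""
    yield from rewrite_root(e)
    if isinstance(e, tuple):
        op, left, right = e
        for new_left in rewrite_anywhere(left):
            yield (op, new_left, right)
        for new_right in rewrite_anywhere(right):
            yield (op, left, new_right)


def size(e: Expr) -> int:
    """Cost function: the number of nodes in the expression tree."""
    if isinstance(e, tuple):
        return 1 + size(e[1]) + size(e[2])
    return 1


def saturate(start: Expr, max_size: int = 9) -> set:
    """Collect every expression reachable from `start` via the rules.

    Rule order does not matter: all rules are tried on every expression
    until nothing new appears. `max_size` is a safety bound on growth.
    """
    seen = {start}
    frontier = [start]
    while frontier:
        current = frontier.pop()
        for new in rewrite_anywhere(current):
            if new not in seen and size(new) <= max_size:
                seen.add(new)
                frontier.append(new)
    return seen


def best(start: Expr) -> Expr:
    """Extract the cheapest expression from the saturated set."""
    return min(saturate(start), key=size)


if __name__ == "__main__":
    program = ("/", ("*", "a", 2), 2)  # (a * 2) / 2
    print(best(program))  # prints: a
```

Note that a greedy rewriter that applied `x * 2 -> x << 1` first would get stuck at `(a << 1) / 2`. Keeping every equivalent form around is what lets the search still find `a`.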

In this talk, Saul Shanabrook goes over how e-graphs work, how they were developed, and different ways they can be used in the PyData ecosystem. Saul also surveys the egglog library, which is one specific tool for using e-graphs in Python.

OpenTeams Technical Talk

April 28, 2020: Using Altair, Ibis, and Vega for interactive exploration of OmniSci

Altair is a lovely tool that lets you build up complex interactive charts in Python. Ibis is also a lovely tool that lets you use a Pandas-like API to compose SQL expressions in OmniSci and other backends. By tying them together you can use the familiar syntax of Pandas, combined with the expressive power of Vega and Vega-Lite, to visualize large amounts of data stored in OmniSci. This talk will walk through a number of examples of using this pipeline and then go through how it works.

The OmniSci summer sessions

December 8, 2019: metadsl: separating API from execution

metadsl is a Python framework for writing APIs that are detached from how they are executed. With it we can write framework-agnostic definitions of concepts like "arrays" and compile them to backends like TensorFlow or LLVM. In this talk, we will use metadsl to build high performance scientific computing libraries.

PyData Austin 2019

November 4, 2019: Same API, Different Execution

Can the Python data science and scientific computing ecosystem remain in the hands of community open source projects? Or will increasingly complex performance and hardware requirements leave room only for vertically integrated, corporate-sponsored projects?

PyData New York 2019

November 17, 2018: uarray - Efficient and Generic Array Computation

Efficient array computing is required to continue advances in fields like IoT and AI. We demonstrate a system, uarray, that does array computation generically and targets different backends. We rely on a Mathematics of Arrays, a theory of shapes and indexing, to reduce array expressions. As a result, temporary arrays and unneeded calculations are eliminated, leading to minimal memory and CPU usage.

PyData Washington DC 2018