
Getting started with large-scale ETL jobs using Dask and AWS EMR
EMR is AWS’s distributed data platform, which we can interact with and submit jobs to from a JupyterLab notebook running on our local machine.
exploratory-data-analysis dask aws notebook
Getting Oriented in the RAPIDS Distributed ML Ecosystem, ETL
This blog post, the first of two exploring this emerging ecosystem, is an introduction to distributed ETL using the dask, cudf, and dask_cudf APIs.
exploratory-data-analysis gpu rapids article
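A core idea behind distributed ETL in dask (and dask_cudf, which mirrors it on GPU DataFrames) is that data is split into partitions, each partition is aggregated independently, and the partial results are combined. The stdlib-only sketch below shows that pattern for a grouped mean. The helper names (`partition`, `chunk_agg`, `combine`) are hypothetical and are not part of any library. In Dask DataFrame terms it corresponds roughly to `ddf.groupby("key")["value"].mean()`, although Dask's real implementation builds a task graph and is more elaborate.

```python
from collections import defaultdict

def partition(rows, n_parts):
    """Hypothetical helper: split rows into contiguous, roughly equal chunks."""
    size = -(-len(rows) // n_parts)  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]

def chunk_agg(part):
    """Per-partition step: partial (sum, count) per key, computable in parallel."""
    acc = defaultdict(lambda: [0.0, 0])
    for key, value in part:
        acc[key][0] += value
        acc[key][1] += 1
    return dict(acc)

def combine(partials):
    """Merge the partial (sum, count) pairs, then finalize the mean per key."""
    total = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for key, (s, c) in partial.items():
            total[key][0] += s
            total[key][1] += c
    return {key: s / c for key, (s, c) in total.items()}

rows = [("a", 1.0), ("b", 2.0), ("a", 3.0), ("c", 4.0), ("b", 6.0), ("a", 5.0)]
parts = partition(rows, 3)
result = combine(chunk_agg(p) for p in parts)
print(result)  # {'a': 3.0, 'b': 4.0, 'c': 4.0}
```

Carrying (sum, count) instead of per-partition means is what makes the combine step exact: averaging partition means would be wrong whenever partitions hold different numbers of rows per key.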
STUMPY: A Powerful and Scalable Python Library for Time Series
STUMPY is a powerful and scalable Python library for computing a Matrix Profile, which can be used for a variety of time series data mining tasks.
time-series anomaly-detection pattern-matching matrix-profile
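To make "Matrix Profile" concrete: for every length-`m` subsequence of a series, it records the z-normalized Euclidean distance to that subsequence's nearest neighbour elsewhere in the series, ignoring trivial matches that overlap it. The brute-force sketch below is only a definition-level illustration and not STUMPY's algorithm. STUMPY computes the same quantity far faster through calls such as `stumpy.stump(T, m)`. The function names here are hypothetical, and the `ceil(m / 4)` exclusion zone is an assumption that loosely follows STUMPY's documented default.

```python
import math

def znorm_dist(x, y):
    """z-normalized Euclidean distance; assumes neither window is constant."""
    m = len(x)
    mx, my = sum(x) / m, sum(y) / m
    sx = math.sqrt(sum((v - mx) ** 2 for v in x) / m)
    sy = math.sqrt(sum((v - my) ** 2 for v in y) / m)
    return math.sqrt(sum(((a - mx) / sx - (b - my) / sy) ** 2 for a, b in zip(x, y)))

def naive_matrix_profile(ts, m):
    """Hypothetical O(n^2 * m) matrix profile: (nearest distance, nearest index)."""
    n = len(ts) - m + 1
    excl = math.ceil(m / 4)  # assumed exclusion zone to skip trivial self-matches
    subs = [ts[i:i + m] for i in range(n)]
    profile, index = [], []
    for i in range(n):
        best, best_j = math.inf, -1
        for j in range(n):
            if abs(i - j) <= excl:
                continue
            d = znorm_dist(subs[i], subs[j])
            if d < best:
                best, best_j = d, j
        profile.append(best)
        index.append(best_j)
    return profile, index

# The motif [0, 1, 3, 2] occurs at positions 0 and 8.
ts = [0, 1, 3, 2, 0, 5, 7, 1, 0, 1, 3, 2, 4, 8]
profile, index = naive_matrix_profile(ts, 4)
print(profile[0], index[0])  # 0.0 8
```

Low profile values mark repeated patterns (motifs), and high values mark unusual subsequences (discords). This is why the same structure supports both pattern matching and anomaly detection.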
Dask
A flexible library for parallel computing in Python.
parallel-computing pandas numpy python
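Dask represents work as a task graph. In its classic graph specification this is a plain dict that maps keys to values, where a task is a tuple whose first element is a callable and whose remaining arguments may refer to other keys. The toy evaluator below (`get` and `_eval` are hypothetical names, not Dask's scheduler) resolves such a graph recursively. It is meant only to show the data structure that Dask's real schedulers execute in parallel.

```python
from operator import add, mul

def get(graph, key, cache=None):
    """Hypothetical toy scheduler: evaluate `key` in a Dask-style graph dict."""
    if cache is None:
        cache = {}
    if key not in cache:
        cache[key] = _eval(graph, graph[key], cache)
    return cache[key]

def _eval(graph, value, cache):
    # A task is a tuple whose first element is callable; its args are evaluated first.
    if isinstance(value, tuple) and value and callable(value[0]):
        func, *args = value
        return func(*(_eval(graph, a, cache) for a in args))
    # A string that names another key is a dependency.
    if isinstance(value, str) and value in graph:
        return get(graph, value, cache)
    return value  # any other value is a literal

graph = {
    "x": 1,
    "y": 2,
    "z": (add, "x", "y"),  # z = x + y
    "w": (mul, "z", 10),   # w = z * 10
}
print(get(graph, "w"))  # 30
```

Because "x" and "y" do not depend on each other, a parallel scheduler could compute them concurrently. Dask's collections (arrays, DataFrames, bags) generate graphs like this automatically.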
🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark.
dask cudf pyspark exploratory-data-analysis
Large SVDs - Dask + CuPy + Zarr + Genomics
Using Dask to perform Singular Value Decomposition on large datasets
singular-value-decomposition dask genomics zarr
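For tall-and-skinny matrices (many rows, such as samples or variants, and relatively few columns), an exact SVD can be computed blockwise with a tall-and-skinny QR (TSQR). As I understand its documentation, `dask.array.linalg.svd` takes this route for arrays chunked only along rows. The NumPy sketch below is a single-machine illustration of that idea, not Dask's code. `tsqr_svd` is a hypothetical name, and each row block stands in for a chunk that a Dask worker would process.

```python
import numpy as np

def tsqr_svd(a, n_blocks):
    """Hypothetical sketch: SVD of a tall-and-skinny matrix via blockwise QR."""
    k = a.shape[1]
    blocks = np.array_split(a, n_blocks, axis=0)
    assert all(b.shape[0] >= k for b in blocks), "each block needs >= k rows"
    # Step 1: independent reduced QR per row block (parallel across chunks).
    qrs = [np.linalg.qr(b) for b in blocks]
    # Step 2: stack the small k x k R factors and factor them again.
    q2, r = np.linalg.qr(np.vstack([r_i for _, r_i in qrs]))
    # Step 3: SVD of the small k x k matrix R.
    u_r, s, vt = np.linalg.svd(r)
    # Step 4: global Q = blockdiag(Q_i) @ Q2, then U = Q @ U_r.
    q = np.vstack([q_i @ q2[i * k:(i + 1) * k] for i, (q_i, _) in enumerate(qrs)])
    return q @ u_r, s, vt

rng = np.random.default_rng(0)
a = rng.standard_normal((1000, 5))
u, s, vt = tsqr_svd(a, 4)
print(np.allclose(s, np.linalg.svd(a, compute_uv=False)))  # True
```

Only the small k x k factors travel between steps, so the large matrix never has to sit in a single worker's memory. For matrices that are large in both dimensions, Dask also provides an approximate `dask.array.linalg.svd_compressed`.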