Photo by Alex Litvin on Unsplash

We’re very excited to announce that we’re publishing our previously private repository of data science best practices. It’s a GitHub repo created to help the Global Business Services (GBS) team work with our clients to produce consistent, scalable, and performant data science solutions.

The best practices documented here represent years of IBM expertise, and the repository acts to preserve knowledge gained while IBM data scientists built actual implementations. It’s a resource you can consult immediately as you need to plan your own data science projects, and you can use it to understand what an IBM data scientist needs to add…


Photo by Marc-Olivier Jodoin on Unsplash

Training machine learning (ML) models can take a long time depending on your dataset and available hardware, and that keeps you from experimenting quickly. That can be a problem for any data scientist on a deadline, and at the very least it’s a pain to sit and wait for results. Snap ML is an exciting library that helps address that pain. As a drop-in replacement for scikit-learn, it’s particularly easy to use. …


This report we’ve written with O’Reilly Media, “Operationalizing AI: How to Accelerate and Scale Across People, Processes, and Platforms,” contains our thoughts on how to help organizations understand what does and does not work in practice when building predictive solutions. If you’re on a data science team tasked with building predictive services, this is a brief report for you, with essential principles on how to organize your teams and tooling to accelerate and scale your work.

As the subtitle says, it’s a review of “How to Accelerate and Scale Across People, Processes, and Platforms.” It…


An image showing data inputs, necessary steps in the machine learning process, and a visualization of a neural network.
AutoML Pipelines

IBM Research gave a workshop at NeurIPS 2020 covering AutoAI and automated machine learning (AutoML) together. It offered a comprehensive view of the differences between the two terms, along with context on their backgrounds and future trends.

First, what is AutoAI? We can start with IBM Researcher Lisa Amini’s abstract from the presentation:

“Automated Machine Learning (Auto ML) refers broadly to technologies that automate the process of generating, evaluating, and selecting an ML pipeline optimized for a specific dataset. Techniques tackle both traditional ML pipelines with data pre-processing, feature engineering & selection, algorithm selection and hyper…


Wikimedia Commons

Machine learning is a technique, and data science is the practice. That’s a familiar narrative, and it helps ground leaders new to the field in an understanding of what they’re trying to execute. The view from the field nowadays, though, is that the greatest challenge facing the enterprise comes less from enabling the practice and more from how enterprises operationalize it. What do I mean by this?

In theory, machine learning is a technique for cleaning data, choosing algorithms, and training models. The output of those models is a prediction, and those predictions are valuable in improving business processes across…

Will Roberts
