The Downside of Data Science

The dark side of big data

In her book, Cathy O’Neil discusses several examples of models gone wrong. These models are used in fields such as:

Education: Some school districts are now evaluating teacher performance based on algorithms that factor in student test scores vs. expected scores, and are using these models over other factors such as observed reviews. When teachers question the models, they are prohibited from knowing exactly what factors go into them.

Lending: Many loan decisions are being made based on models that factor in variables such as ZIP code. Failure to get a loan can result in lost access to education and resources, thereby perpetuating the cycle of poverty.

Hiring: Some companies are now using algorithms that predict work performance based on personality traits such as extraversion, agreeableness, conscientiousness, neuroticism, and openness to ideas. There are several problems with this approach. Among them: 1) the models never incorporate employees’ actual performance, so they get no feedback on their accuracy; and 2) research shows that personality tests are poor predictors of performance.

These are just a few examples that Cathy uses in her book, but they illustrate some of the downsides of big data. Organizations typically don’t intend to use data in bad ways; however, it can often be difficult to distinguish between “good uses” and “bad uses” of data.

So what do you do when you are sitting down and trying to make use of the treasure trove of data you have in front of you? Here are some key things I think we all should consider.

1. Be aware of what data is going into your model

You might have a whole lot of data at your disposal, but is it right to use all of it? In my view, it’s important to consider your ultimate outcome and work backwards from there. Do you want certain pieces of data to factor into that decision? For example, if you are a student loan provider, do you want ZIP code information to factor heavily into your model? If your ultimate goal is profit, perhaps you’d say yes. But if your goal is to provide access to education, your answer might be different. There’s some data out there that, even if I had access to it, I’d feel icky about using in most models.
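One way to make that choice explicit is to decide up front which fields a model is allowed to see. The Python sketch below is only an illustration: the field names and the `select_model_inputs` helper are hypothetical, and which fields belong on the excluded list is a policy decision, not something the code can settle.

```python
# Hypothetical field names, for illustration only. Which fields to exclude
# is a policy decision made by people, not by the code.
EXCLUDED_FIELDS = {"zip_code"}


def select_model_inputs(record, excluded=EXCLUDED_FIELDS):
    """Return a copy of the record without the excluded fields."""
    return {key: value for key, value in record.items() if key not in excluded}


# A made-up applicant record.
applicant = {"zip_code": "02134", "income": 52000, "credit_history_years": 6}
model_inputs = select_model_inputs(applicant)
print(sorted(model_inputs))
```

Keeping the excluded list in one named place means anyone reviewing the model can see what was deliberately left out.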

2. Gain feedback on your model

One issue Cathy raised with the personality tests used for hiring is that the companies using them get no feedback on the model. Were the people they hired actually better than those they didn’t? There’s really no way for them to know.

Some models are bad. Some get stale. Just because you have data, it doesn’t mean you have to use it. Make sure you understand the success of the outcomes of your model, and incorporate that feedback to refine the model moving forward.
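As a rough sketch of what gaining feedback can look like in practice, the Python below compares what a model predicted with what actually happened. The record layout and the `prediction_accuracy` function are assumptions made for this example, and the history is made-up data.

```python
def prediction_accuracy(records):
    """Share of records whose prediction matched the observed outcome.

    Records whose outcome is not known yet (None) are skipped.
    """
    scored = [r for r in records if r["actual"] is not None]
    if not scored:
        return None  # no feedback available yet
    hits = sum(1 for r in scored if r["predicted"] == r["actual"])
    return hits / len(scored)


# Made-up history: each entry pairs a prediction with the observed result.
history = [
    {"predicted": True, "actual": True},
    {"predicted": True, "actual": False},
    {"predicted": False, "actual": False},
    {"predicted": True, "actual": None},  # outcome not observed yet
]
print(prediction_accuracy(history))
```

If this number is never computed, or if outcomes are never recorded in the first place, there is no way to tell whether the model is working or has gone stale.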

3. Be transparent

If you’re fired from your job, you naturally want to know why. However, as Cathy wrote, teachers who were let go because they performed poorly according to an algorithm don’t get that answer. When the teachers she spoke to asked to see the factors that went into the model they were being judged by, their school districts could not tell them. And the company that developed the model would not divulge that information.

From my perspective, if you’re going to use the data, be upfront about it. “Here is what I am judging you on and this is why.” Then perhaps you can have an actual discussion about the results. That leads to my last point…

4. Use the data as a discussion point, not necessarily as a decision-maker

It’s true that good data leads to better decisions. But I would argue that the data should guide your decision, not be the ultimate arbiter. Take, for example, a scenario that many of our healthcare customers face. These customers use our software to measure physician performance by examining data such as diagnosis-related group (DRG), cost, average length of stay, mortality, readmissions, and much more, and then comparing those measures to benchmarks.

When I have talked to customers about how they use the data, the best use cases involve hospitals using it as a starting point to spark a conversation with physicians about why certain anomalies are occurring. The data ends up serving not as a judgment but as a reference point. From there, physicians and administrators can have productive discussions rather than point fingers.
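To illustrate the idea of data as a reference point, here is a minimal Python sketch that flags measures for discussion rather than issuing a verdict. It is not how our software works; the `flag_for_discussion` function, the metric names, the 10% tolerance, and all the numbers are made up for this example.

```python
def flag_for_discussion(observed, benchmarks, tolerance=0.10):
    """List metrics whose relative deviation from benchmark exceeds the tolerance.

    The result is a list of questions to raise, not a judgment.
    """
    flags = []
    for metric, value in observed.items():
        benchmark = benchmarks.get(metric)
        if not benchmark:
            continue  # no usable benchmark, so nothing to compare against
        deviation = (value - benchmark) / benchmark
        if abs(deviation) > tolerance:
            flags.append((metric, round(deviation, 2)))
    return flags


# Made-up numbers for one physician and the matching benchmarks.
observed = {"avg_length_of_stay_days": 5.4, "readmission_rate": 0.12}
benchmarks = {"avg_length_of_stay_days": 4.5, "readmission_rate": 0.125}
print(flag_for_discussion(observed, benchmarks))
```

The output only says which measures are worth asking about. The explanation, such as an unusually complex patient mix, still has to come from the conversation.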

These customers also use the “diving” capability of our software (hence, the Diver name) to explore the data and answer questions that arise. Here, the data provided by the model serves more as a question than as the actual answer.

A big thanks to Cathy O’Neil for her interesting presentation and book – it gave me a lot to think about! What am I missing here? Any other thoughts on how we as data experts can combat bad uses of data? Let me know in the comments below.

Author
Kathy Sucich

Kathy Sucich is vice president of marketing at Dimensional Insight, where she is responsible for the company’s branding, events, content, and communications strategy.

Kathy started her career in television news writing and producing. She then worked at a public relations agency and as a freelance writer before joining Dimensional Insight in 2013.

Kathy has presented at several events in both healthcare and wine & spirits, including the Healthcare Innovation Summit, the Healthcare IT Marketing & PR Conference (HITMC), and the WSWA Women’s Leadership Council (WLC) Conference. Kathy is a Fellow in the American College of Healthcare Executives (ACHE) and a member of Women of the Vine & Spirits.

She is also host of the Smarter Healthcare Podcast (www.smarthcpodcast.com) and is a member of the Forbes Communications Council.

Kathy holds a bachelor’s degree from Dartmouth College and an MBA from Boston University.
