Kaggle Case Study – Charles Cadman
Kaggle Newcomers Place in Top Ten Percent of Competition
When Charles Cadman, FSA, MAAA, CERA, Ph.D., decided to participate in his first Kaggle challenge, he wasn’t sure what to expect. He figured his academic background in pure mathematics would help. What he didn’t anticipate was that his team of Kaggle newcomers would place in the top ten percent of the “Understanding the Amazon From Space” challenge, ranking 84th out of 938 teams.
Kaggle competitions are analytics and data science challenges in which competitors build models to solve real-world machine learning problems posed by organizations ranging from the Department of Homeland Security to Intel. This year, the Society of Actuaries (SOA) created the Kaggle Involvement Program to incentivize actuaries to participate in Kaggle competitions.
Although everyone on his team was an actuary, they were all novices in this area of research: image-based machine learning. Their challenge focused on using satellite data to track the human footprint in the Amazon rainforest.
In this competition, Planet, an Earth-imaging satellite company, and its Brazilian partner SCCON challenged Kagglers to label satellite image chips with atmospheric conditions and various classes of land cover and land use. The hope was that the resulting algorithms would help people better understand where, how and why deforestation is happening all over the world, and how to respond when they see it in action.
Cadman said the team worked well together because everyone brought a slightly different skill set to the table. They were also accustomed to expressing dissent, which the actuarial profession encourages as a way to get better results.
“We pointed out flaws in each other’s thinking. There were no hard feelings; we knew we needed each other’s help to learn,” said Cadman.
Cadman’s team turned to artificial neural networks (ANNs) to tackle the challenge of image recognition. “There’s lots of data in an image,” said Cadman. “Neural networks offer a way of applying non-linearity to your model, which gives you a much larger range of possibilities.”
But Cadman’s team went a step further, using pre-trained neural networks that could already recognize image features such as color blobs and textures. One of Cadman’s teammates bought a GPU (graphics processing unit) and used it to run the pre-trained networks. Because GPUs are built to process large amounts of image data quickly for tasks such as gaming, a GPU was an ideal tool for their scenario, and Cadman said the pre-trained networks produced many predictions. To help the networks recognize all the features present in each image, the team also altered the pictures by rotating them, flipping them along an axis and more, all ways of giving the networks more information to use.
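The rotations and flips described above can be sketched in Python, the language Cadman mentions learning. This is a minimal illustration, not the team’s code: it assumes each image is a NumPy array of shape (height, width, channels), and the function name `dihedral_variants` is hypothetical. It produces the eight combinations of 90-degree rotations and mirror flips of a single image.

```python
import numpy as np


def dihedral_variants(image: np.ndarray) -> list[np.ndarray]:
    """Return the 8 rotations/flips of an (H, W, C) image.

    Hypothetical helper for illustration: 4 rotations by multiples of
    90 degrees, each with and without a left-right mirror flip.
    """
    variants = []
    for k in range(4):
        # Rotate by k * 90 degrees in the height/width plane.
        rotated = np.rot90(image, k=k, axes=(0, 1))
        variants.append(rotated)
        # Mirror the rotated image left-right (flips the width axis).
        variants.append(np.fliplr(rotated))
    return variants


# Example: a tiny 2x2 single-channel "image" with distinct pixel values.
img = np.arange(4).reshape(2, 2, 1)
augmented = dihedral_variants(img)
print(len(augmented))  # 8
```

Each variant shows the network the same scene from a different orientation, which suits satellite imagery, where there is no natural “up.”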
In total, Cadman’s team used six pre-trained ANNs. They found that the more data they had, the better they placed on the public leaderboard, which lets teams track how they’re doing during a Kaggle competition.
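The article does not say how the six networks’ predictions were combined. One common approach, shown here purely as an assumed sketch, is to average each model’s per-label probabilities and then apply a cutoff, since each image could carry several labels at once. The function `ensemble_labels`, the 0.5 threshold and the sample probabilities are all hypothetical.

```python
import numpy as np


def ensemble_labels(model_probs: list[np.ndarray], threshold: float = 0.5) -> np.ndarray:
    """Average per-label probabilities from several models and threshold them.

    Hypothetical helper for illustration.
    model_probs: one array per model, each of shape (n_images, n_labels),
    holding that model's predicted probability for every label.
    Returns a boolean array of shape (n_images, n_labels).
    """
    stacked = np.stack(model_probs)    # shape: (n_models, n_images, n_labels)
    mean_probs = stacked.mean(axis=0)  # simple unweighted average across models
    return mean_probs >= threshold     # multi-label: each label decided independently


# Made-up outputs from six models for 1 image and 2 labels.
probs = [
    np.array([[0.9, 0.2]]),
    np.array([[0.8, 0.4]]),
    np.array([[0.7, 0.6]]),
    np.array([[0.6, 0.3]]),
    np.array([[0.9, 0.7]]),
    np.array([[0.5, 0.5]]),
]
result = ensemble_labels(probs)
print(result)
```

Here the first label averages about 0.73 and is kept, while the second averages 0.45 and is dropped. A plain average treats all six models equally; weighting stronger models more heavily, or tuning a separate threshold for each label, are natural variations.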
“It was adding that sixth network, adding to the pool of data, that pushed us to the top,” said Cadman.
He added that the way the team handled data had a big impact on its placement, especially when it came to sharing data among teammates. The team communicated well, focused on good programming practices and asked analytical questions about the data they received from one another to ensure it stayed clean.
Cadman’s main advice for actuaries who may be considering a Kaggle competition is to find a team they work well with. “Start small, and don’t get too disappointed if things don’t work out. There’s a lot to learn if you’ve never done this kind of work before,” said Cadman.
But the learning process was also the most fun part of the competition for Cadman. “It was all very challenging,” he said. “We got a little bit lucky—the three of us have decided to keep doing competitions, even though so far this is the only one we’ve done. I really enjoyed learning new skills, like coding in Python for the first time.”