A data lake is an “operational data store with a much grander mission of storing all the categories of corporate data for a wide variety of data distribution use cases,” as Jared Hillam, vice president of emerging technologies at Intricity, explained in a video.
The Role of Data Scientists in a Data Lake
The key is figuring out what to do with the data in that large lake. Bozman said that’s where data scientists can be most beneficial. They come in to look at all that data and figure out what’s valuable and usable. She explained that data scientists know how to cross-pollinate different types of data sets to produce impactful results.
This is often done using open source tools like Python and R to query or collect disparate data, then build models that fit into workflows, according to Curt Savoie, smart cities data analyst for IDC, a global market intelligence firm, and former chief analytics officer for the Mass. Department of Revenue.
In an interview, he said that using open source can bring about change and break down organizational silos in government, a shift that poses big challenges as well as new opportunities.
“We’ve got smart parking and smart lighting and lots of sensors being deployed for environmental and noise pollution,” Savoie said. “We’re starting to see all these pieces and that’s great, but as cities invest in the aftermath – we now have 10 different platform ecosystems across the city and no way to combine them really well.”
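To make the idea of combining data from separate platforms concrete, here is a minimal Python sketch. It assumes two hypothetical exports, one from a parking-sensor platform and one from a noise-monitoring platform, that happen to share a district name as a common key. The field names, districts and figures are invented for illustration, and real city systems rarely line up this neatly, which is a large part of the problem Savoie describes.

```python
from collections import defaultdict

# Hypothetical records exported from two separate city platforms.
# Field names and values are illustrative only.
parking_events = [
    {"district": "Downtown", "occupied_hours": 120},
    {"district": "Harbor", "occupied_hours": 45},
    {"district": "Downtown", "occupied_hours": 80},
]
noise_readings = [
    {"district": "Downtown", "avg_db": 68.5},
    {"district": "Harbor", "avg_db": 55.0},
    {"district": "Uptown", "avg_db": 60.2},
]

def combine_by_district(parking, noise):
    """Sum parking hours per district, then attach the noise reading
    for the same district (an inner join on the shared key)."""
    hours = defaultdict(int)
    for row in parking:
        hours[row["district"]] += row["occupied_hours"]

    # Index the noise readings by district for constant-time lookups.
    noise_by_district = {row["district"]: row["avg_db"] for row in noise}

    combined = []
    for district in sorted(hours):
        # Keep only districts that appear in both exports.
        if district in noise_by_district:
            combined.append({
                "district": district,
                "occupied_hours": hours[district],
                "avg_db": noise_by_district[district],
            })
    return combined

for row in combine_by_district(parking_events, noise_readings):
    print(row)
```

With these sample records, Downtown and Harbor appear in the combined output, while Uptown is dropped because the parking export has no rows for it. The inner join is a design choice for the sketch; an outer join would keep every district and leave gaps where one platform has no data.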
Securing the Data, and Staying in Privacy Lanes
Savoie said most data is managed with on-premises solutions.
“What we are seeing often is purpose specific data lakes or warehouses built around a singular domain or research set of questions,” he said. “This is less often because of technical issues and more due to siloed work and sometimes legislation and privacy concerns.”
That’s because a lot of data collected by cities is regulated by laws like the Health Insurance Portability and Accountability Act and the Family Educational Rights and Privacy Act (which applies to school districts). The data needs to be audited and secured. Some data needs to be anonymized, though that doesn’t necessarily take away from its value, Savoie said.
“When we want to know how many people got measles in the United States, we don’t need to know their names in order for that data to still be useful,” Bozman said.
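Bozman’s point can be sketched in a few lines of Python: aggregate the records into counts and never carry the names forward. The case records below are hypothetical, and the `min_count` threshold is an illustrative assumption. Real de-identification typically involves more than this, for example suppressing very small counts that could point back to specific individuals, which is what the optional threshold imitates.

```python
from collections import Counter

# Hypothetical case records; names are included only to show they are dropped.
case_records = [
    {"name": "Patient A", "state": "MA", "diagnosis": "measles"},
    {"name": "Patient B", "state": "NY", "diagnosis": "measles"},
    {"name": "Patient C", "state": "MA", "diagnosis": "measles"},
    {"name": "Patient D", "state": "MA", "diagnosis": "flu"},
]

def measles_counts_by_state(records, min_count=1):
    """Count measles cases per state without keeping any names.

    States whose count falls below min_count are left out, a simple
    (hypothetical) form of small-count suppression.
    """
    counts = Counter(r["state"] for r in records if r["diagnosis"] == "measles")
    return {state: n for state, n in counts.items() if n >= min_count}

print(measles_counts_by_state(case_records))
print(measles_counts_by_state(case_records, min_count=2))
```

The output still answers the public health question (how many cases, and where) while containing no names at all; raising `min_count` trades some detail for extra protection in places with very few cases.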
Cities can also struggle with what to actually do with the data, especially when it comes from different platforms. According to Savoie, this is the big issue.
“You can be more agile and bring data in and out when you need it. Data lakes are monoliths, you only build a data lake when you know what the question is,” said Julie Pierce, director of data and digital at the Food Standards Agency (FSA), a UK governmental body that administers food safety. She has seen “people spend millions building something and then they are not sure what questions to ask.”
Where Data Lakes Are Working
Some smart cities and states are combining their data lakes to tackle significant problems. Massachusetts is mapping opioid overdose deaths through the Chapter 55 Project, which used 10 different data sets from five different agencies to create a comprehensive look at the state’s opioid crisis.
St. Petersburg, Fla., is making data from its Codes Compliance Assistance Department available on a dashboard that maps areas with high numbers of code violations and identifies regions of blight, all through the city’s open data and accountability platform, StPeteStat. Cincinnati partnered with the University of Chicago’s Data Science for Social Good program to use historical EMS data to create a predictive analytic model that projects where and when EMS events are more likely to occur.
Tapping Into Citizen Hackers
Municipalities already stretched for time and money are partnering with white-hat or civic hackers, who want to do good by tapping into publicly accessible government data to create applications that benefit cities.
Code for America, for example, is a nonprofit working in municipalities both large and small. In 2018, it helped with projects like building the GetCalFresh platform, which operates in 36 of California’s 58 counties. It’s a mobile-based approach that lets people sign up for food stamps through an easier, more intuitive system, and it has helped 140,000 people.
Code for America also piloted a program to deliver integrated safety net benefits to citizens in Colorado, Michigan, Vermont, Louisiana and Alaska. The group sponsors community fellowships that pay for mid-career professionals to work on civic hacking projects.
In addition, local governments partner with area universities to explore the potential of data lakes. The Boston Area Research Initiative is a partnership among Harvard, Northeastern and the city of Boston. Philadelphia has Actionable Intelligence for Social Policy, which links the University of Pennsylvania with state and local governments to develop integrated data systems.
Making Data Public
In order for civic hacking projects and academic partnerships to work, Savoie said, municipalities need to make data publicly available. That’s already the case in big cities like New York, Chicago, Philadelphia and San Francisco, he said.
“You’re starting to see smaller and mid-sized cities find their way,” Savoie said. He cited the Jersey City Data project, which shares public data sets on everything from public safety to infrastructure, and Open Raleigh, a stable platform for the city’s evolving community services.
Moving applications, data storage and software to on- and off-premises cloud platforms is helping small and mid-sized cities along the way, letting those with limited budgets afford to try new things and manage costs as needs change. These new data and application technologies are also shifting the role of municipal CIOs away from simply keeping equipment up and running and toward more of a leadership and decision-making role.
As some of these projects show, smart city leaders can create new, more personalized services, using data in ways that can make life better for everyone. Now, if only that data could find us a parking spot at the post office.
Jen A. Miller is a writer, author and ultramarathoner living in New Jersey. You can find her on Twitter @byjenamiller.
© 2019 Nutanix, Inc. All rights reserved.