Government data is fuel for job creation, says Commerce Department CDO

Ian J. Kalin in November 2014 giving a Got Your Six Storyteller talk.

 Image: Got Your Six/YouTube

The US has been collecting and publishing nautical data since the 19th century, providing navigators with better maps of the oceans that they sailed and then steamed across. Today, government agencies publish data about labor, energy, health, transit, telecommunications, criminal justice, and just about everything else that can be measured, managed, performed, or regulated by state entities.

The 12 bureaus that make up the US Department of Commerce are among the most important collectors and publishers of data in the nation, and thus on the planet. While the US Census Bureau has been an international leader in publishing its data online for use and reuse across multiple platforms, the other 11 are still figuring out how to approach making data into a national strategic asset.

In Washington, DC, Commerce Secretary Penny Pritzker hired Ian J. Kalin in March 2015 to be the agency's first chief data officer (CDO), tasking him with improving the quantity and quality of data available to the public he serves. Our interview with Kalin, a former Presidential Innovation Fellow, director of open data at software vendor Socrata, energy entrepreneur, and US Navy veteran, follows, lightly edited for length and clarity.

If someone asks you to explain what a CDO is and does, what do you say?

Ian J. Kalin: My job is to help create jobs with information. That's what I do. Information helps people. Data is one way to talk about it. There's a lot of great data from the government that can help people create jobs and services. My job is to ensure that there is a great quantity and quality of that information so that they can create those fantastic products.

What have you done so far in the role at Commerce?

Ian J. Kalin: Listened to my customers. The Commerce Department is an interesting and fantastic organization when it comes to data. There are whole bureaus, as you know, that constitute the major objectives for the Commerce Department. Those 12 bureaus are diverse, as the Secretary eloquently talked about, going from outer space to the deepest depths of the oceans. The missions, therefore, are also very diverse, and the data reflects that. So, in my first weeks, I'm on a listening tour. What are our products? Who's consuming them? How do you know that those are our customers? What are the customer pain points? How are we responding to them?

Ian J. Kalin

 Image: Ian J. Kalin/Twitter

Of course, when it comes to government information, it's not just a product focus, it's the "people, processes, and tools." Who are the internal customers of these data products or services or tools? How is that servicing folks? What's the customer satisfaction rate?

On my first trip with the Secretary, I was announced at the 2015 South by Southwest Festival. Of course, at "SouthBy," there's a great opportunity to meet the customers -- bold, unfiltered customers -- of your products, and let them tell you what's working and what's not working. And by the way, that's not just the American people, that's also other government employees who are fantastic public servants but also are struggling with the ability to achieve their own goals.

A week in, I'm listening, I'm meeting with folks, I have a fantastic partner in crime, Lynn Overmann, who is the deputy CDO, and it is an absolute privilege to work with her.

Which products created with Commerce data would you hold up as exemplars of other products that you'd like to see come into the world?

Ian J. Kalin: That's actually one of the most common questions that I'm asking people: who's building stuff with good data? From my personal experience, I'll say this: if you google "population of California" -- or use Bing, Yahoo, or whatever your preferred service provider is -- you don't just get the websites. You get what they call a "fact bar," or an actual answer to the question, and there's a little link to the Census on the website.

That's pretty cool. I like that it's helping people not just find websites but get actual answers to their questions. I don't know what the biggest and best are around the world, but I'm actively collecting them.

A whole lot of singles, more than a home run, would be to share more in a centralized location of where specific companies are building products and services on Commerce data. I think we should just make it easier for folks to learn how this stuff is being used. It could be a blog. There are lots of ways to do that.

When I get around to publishing that, I can't help but think about my own experience. I came out of the Navy, and I had a really hard time trying to make it in the energy sector. I relied on government data for my company at that time. If we didn't have the Census and Bureau of Labor Statistics producer price indices for certain manufactured goods, we would never have been able to build some of our most successful products.

I've done it. I've been the consumer. I've tried to iterate with open data and make something of real value. I was really appreciative of the wholesale stuff that is generated that I turned into a retail product with some of the fantastic people at that company that advanced the cleantech sector. We supported about $35 billion of construction of power plants with that, and we wouldn't have been able to do that without government data. I empathize with people that are trying to do that same thing, and I want to know more about who that is.

The Obama administration has been asking for data scientists, and techies in general, to enter public service, and has been having some recent successes on that count.

Will you be working with DJ Patil, the White House's first data scientist?

Ian J. Kalin: I hope to work with him. As far as I know, there is no very clear organizational requirement, from a person-to-person perspective, but the man's brilliant. I think he's extraordinarily accomplished; he knows about the user perspective; he's worked in government; he's worked in the private sector, worked with stats. Anyone who comes into this role would make a terrible decision not to try, actively, to seek out his counsel and guidance, formally and informally.

I believe, additionally, that DJ does bring data science principles and capabilities to this larger federal team that are relatively rare. I think a lot of the other executives, administrators, stewards, and custodians of data tend to have either a non-coding or non-tech management background, or say more of the traditional coder or software developer background. To include that new type of skillset is itself fantastic. I will say very honestly, I am not in the same data scientist bucket that DJ is. I would hope to bring a different kind of skillset to this organization and that work.

The press release announcing your role said that you're going to pull together a new platform for data. Doesn't that already exist at Data.gov?

Ian J. Kalin: I think the "at Commerce" was probably left out of that sentence. We do have a data.json file [as required by an executive order -Ed.], and there is a central catalog of Commerce data sets on a CKAN architecture, which is being pushed to the Data.gov folks. There's more to be done there.

I'm sure you also know about the FOIA request that's outstanding for the data inventory. We are publicly being shown on Data.gov to not have a comprehensive inventory of our data sets. I think that was the intent of that sentence [in the release], to get us "out of the red," so to speak, in terms of the latent delivery. The answer is no: building another Data.gov is not a great idea. It already exists, and it's working pretty well.

The Obama administration has also been urging people to put open government data to use since 2009. What have you learned from your different tours over that time that's going to inform your work? What will change as a result of those experiences?

Ian J. Kalin: I will confess that it does feel a bit early for me to have a great deal of confidence in the how. I have some ideas.

When I was at a small energy startup company called PowerAdvocate, we absolutely used Census data, Department of Labor statistics data, and, to some extent, NOAA data, to build our own product that drove new businesses. I built products on top of this information, I was a consumer of it, and boy did I have opinions about it, as anyone should! If you fly, you have opinions about the airline, too.

I think that consumer perspective is driving a lot of my theories, my questions about what should be done. Until I can meet the people and get a sense of the basic tools, it's going to be really hard for me to come up with any basic recommendation. There are certainly areas that I'm prioritizing.

I think that the Commerce Department has very clearly led in some areas. As stated very clearly in the work of Data.gov and various organizations, some of the inventory, the products, and applications, some of the data collection and dissemination programs, are perhaps not where they could be, or where the Commerce Department has already said we want to be.

I consider my responsibility to be helping the bureaus and the people here to achieve those goals. I don't know what they're going to need, though, in terms of processes or tools. Ultimately, I should be accountable for that delivery. To the extent that it fails, you should hold me accountable. If it succeeds, it's going to be a team effort, with the support of the great people who have made that possible.

You see this stuff in a bunch of different organizations. Commerce is kind of a funny one. You have the foundations of bureaus like the Census, or NOAA, or the Patent Bureau. In the case of two of those three, you have a foundation back to Article 1 of the Constitution. We've been doing this data thing long before it was cool. In many ways, it's actually defined the concept of what a library is, at its most basic definition, which is what open data is all about: information condensed into a central area so that it can empower folks to achieve more than they could on their own.

That's what open data is now. It's just a digital version, which translates to a change in scale, which translates to a change in kind. Census and Patent, in particular, have been doing this a while. I may take advantage or build any type of program on top of that foundation, asking questions like how are we doing on the data inventory? How are we doing on the data services? How are we doing at understanding, from a central perspective, who are our customers? What are the products that they're using the most right now? How can we understand and engage those customers in a way that can deliver better, higher quality products at a certain time?

I think that a lot of the challenge will be, because Commerce is by definition decentralized and has such diverse bureaus, that there will be a "hub and spoke"-like problem, as in any organization of this size. So, what is the central value that can be delivered? How do you take a success from one bureau and show it to another? How do you manage security and privacy controls that will absolutely be essential but will probably be done a little bit differently in any of the different bureaus as much as different agencies?

Those are the challenges I expect on the way in. My early discussions with individuals have validated some of my theories, but there's still a lot more that I need to learn.

People talk about data in different ways depending upon how they are using it or would like it to be used, from accountability to services to economic outcomes. Industry buys a lot of data from government. Commerce has a lot of data that it sells. How are you thinking about signals from revenue as a demand signal?

Ian J. Kalin: It's an interesting question and, frankly, one that doesn't come up very often to me, and I've been with this open data game for a little while. I have to share a personal reflection, which has absolutely nothing to do with my current role with the Commerce Department.

About a year and a half ago, I was invited to give a presentation to an open data incubator and hackathon -- they had different words for it, but it was that type of event. I was in Paris, and there was an international panel of open data folks from different countries. I was talking about how, from my own research, the French government charges for the majority of the data that we would consider to normally be open. I simply asked it as a question to the audience -- about 100 people in the room -- wouldn't it be interesting if France started to give it away for free, similar to the American experience or the British experience with Data.gov and data.gov.uk?

The whole crowd instantly became angry at me, and not just because of whatever cultural translation. The developers in the room were basically telling me, in unison, don't you dare advocate for the French government to give this data away for free. I, as an international student of such things, asked why are you all so angry at me?

They said because if you give it away for free, the quality will go down, and we rely on the quality so much. "I'd even be willing to pay more for it," said 50 people, seemingly in unison, just to ensure that this information doesn't go away, because if the fuel goes away, our products will be impacted. If it's going to cost me more to get the data, I'd rather pay for it. I have to confess that was a lesson for me. It was an assumption that I had wrong. I learned a lot in that one meeting about some of the international differences for such a product.

Translating that personal experience down to my current role, I would say that it really depends upon the customer's needs and the quality of the product. I'm not sure if there is a bureau-level stronger guidance about "free vs. paid" for individual data products. I think it should be defined by the ecosystem of that product.

I consider data to be a fuel. Generally speaking, in my opinion, the government tends to excel at its comparative advantage of wholesale data generation. There are some fantastic exceptions to that -- I used to work at the Department of Energy, and they have some of the best retail-level simulators in the world. I look at some of the products out of the Census Bureau -- things like American FactFinder or the American Community Survey -- different examples of wholesale vs. retail.

I think when we do retail that it should be done very, very well. It really depends upon what stage you're on in the ecosystem. When you take oil out of the earth, it costs differently than when you pump it out at the gas station. I think data can probably flow the same way. It should be cheaper at the point of wholesale generation, but as data goes through its own lifecycle -- generation, refinement, liberation, integration into products and services, which then itself generates more data -- at each step there should be a value chain improvement.

Just like in business, if I refine something or clean something or add value to it, there's a monetary result from that value added. I guess I'm not offended by that if that exists with data as well.

What I think is a unique obligation, though, from the government perspective, is one of equity. Some of the information is easier to access if you have certain tools vs. another, if you have a computer or if you don't.

I think there's also a philosophy, which I care about personally, about how the data probably doesn't belong to the government: it belongs to the taxpayers. We have a responsibility to be good custodians of that data, to be responsible stewards, to protect privacy and confidentiality. Ultimately, it is a service that we provide. For the American taxpayers paying for the data, there should be a way and a process for the people to have the data brought back to them. That's a combination of my personal opinion and experiences with the current systems. Whether or not any bureau chooses to continue charging for a data product or not, I couldn't say. I would need to learn more about that specific instance.

Let's take as a shared premise that there's a lot more data in the government, much of which can be unlocked for public consumption, that hasn't been digitized or published. What's your strategy for liberating that data? And how will you approach the challenge of government bureaucracies where workers may view data as their possession, not the public's?

Ian J. Kalin: I agree that it is not always in the best interest of the reward structure of the original custodian to want to publish more data. I don't agree with the reason that you shared. I don't think it's because of "information is power" or "it's my data, versus your data." I think it has to do more with basic incentive structure. To put anything from one place to another is work.

If I'm doing new work, am I going to get paid more? Am I going to get promoted? Am I going to get an award for serving the American people in a better way? That kind of stuff is really hard sometimes in the federal government, in any government -- I've worked in city and state governments, and it's the same problem. It's the market factor and the individual reward structures for the employees.

When I, in a hypothetical scenario, go to a good American servant and say "hey, I need you to put data in a place that it's never been before," they're probably going to have to work longer. There's probably going to be some concern about whether I'm really allowed to do it and that I'm going to need to be able to describe the data, because no one is going to understand it if I just publish it with all these acronyms. Am I going to get in trouble from somebody who maybe hasn't seen this data before? That's a real disincentive, in a lot of ways.

I think too many data policy leaders forget about what it's really like at the ground level. When you're the data librarian or custodian or whatever, and you have to publish that first time, that really needs to be understood. In many ways, these people are my customers, not just the American people that are consumers, but the publishers. How easy is it to publish information? How seamless is it? Am I actually going to get rewarded for doing more things to return this information to people that deserve it? Does it advance the goals of the organization? I've been in those shoes. I feel like it's too often forgotten what the experience is like when we come up with new rules about how data is supposed to be shared.

The second thing I'll say, in terms of how am I going to organize this is: the truth is, there are actually fantastic examples within Census. Even in my first week, I can think about data being shared between bureaus, for the public good.

I saw a demonstration this morning from NOAA combining information from the Census to get a number of models and tools to define how the aquatic commerce system -- anything that's related to coastal sectors, think about fishing and marine tourism -- is impacted by weather, in a good, old-fashioned data mashup.

I think about some of the labor models, in terms of how certain project developments could be done in certain areas. There's a great model out of BEA [the Bureau of Economic Analysis] to help measure the labor impact. That's achieved because of a data integration and mashup effort between Census, NOAA and, in this case, BEA. There is a really fascinating report from the Bureau of Industry and Security over controlled commodities -- things like certain fuels or avionics equipment. Anything that goes out of the United States, there's a shipping report associated with that, and in order to get that information, there's a direct coordination with not just the Census but also the Department of Homeland Security.

These data integrations are happening right now between bureaus and agencies, and there are structures to support them. What I think is going to be an opportunity is to take those success stories, take those lessons learned, and then amplify them throughout the organization so that we can get a more comprehensive view of who the customers are throughout the entire department.

Let's dwell a bit more on the metaphor of "data as the new oil," since it's a popular one these days: can we compare an infinitely replicable digital commodity to a natural resource that exists in the physical world?

Ian J. Kalin: If I could argue the analogy, just for fun, data may be infinitely usable, in terms of its accessibility on the internet, so to speak, but in the technology tools to perform the ETL [Extract, Transform, and Load], the insights that will allow good data cleansing to be performed, the stewardship and obligation to ensure the data set that is part of a certain standard maintains the standard, when the standard revises, there's still labor, there's still a scarcity that prevents data quality from being instantaneous.

That principle, that reality that you don't have enough people who know how to do this stuff and make it better, I think drives data to be a type of scarce resource, or at least quality data, and brings it closer to the analogy of fuel.

What do you think of service level agreements (SLAs) for government APIs that would guarantee quality? A freemium model, where raw, dirty data is free and clean data has a cost?

Ian J. Kalin: The way I'd answer that is to point to a highly successful experiment at NOAA. They have an innovative RFI [request for information] turned into an RFP [request for proposal] to determine the right type of public-private partnership that would empower NOAA to liberate so much more data than they could on their own. I don't know how they're ultimately going to award or achieve that, it's in the middle of a process, but what I love is that they're having a bold and transparent conversation with technology providers and other very innovative firms to even understand how they should answer that question. I love it more because of the execution style than the actual answer. They know that they don't have all the answers about the right way to do that, but they're engaging folks in, I think, a very fair process to determine how they should start to think about it.

I do think that part of the innovation around data often has nothing to do with data. It's what processes exist and what tools am I allowed to use. Am I allowed to use GitHub or not? These are still questions that are being asked throughout all parts of the federal government. That's not really a data question -- that's a governance question. You need to figure out those things first before you can even get to how you can release information in a better way. Part of that is the processes and partnerships within individual companies.

We're the Commerce Department. The Secretary said, very clearly, that she feels an obligation to be the voice of business in America. You can't be the voice of business unless you're speaking to businesses. Data businesses, I should hope, are knocking down our doors to help us figure out better ways to use our information. If that leads to an SLA-type partnership or forum or council, I don't know, I guess I'd be open to it as a concept, but only if it supports the overall objectives of the organization. If it helps us create American jobs and improve our competitiveness abroad, and does so in a fair way, then yes, let's talk about it. If the Googles or Amazon Web Services of the world, or smaller organizations that may have even more fantastic innovations, can help us achieve those goals for the American people, I think it's an obligation for us to listen to those ideas and figure out a way to advance and integrate them in some way.

Are you doing anything to try to understand who is consuming your APIs and what impact that use is having?

Ian J. Kalin: We have a few fantastic developer portals. I know, even from the first week, that there are some brilliant folks in some of the bureaus who have a very, very good sense of how the data is being used. Two that come to mind instantaneously are some of the APIs out of Census and Trade, but there are many more than that. Those folks, when I ask them who their customers were, went into great detail based upon the API token registration to define how the information is being consumed, and how they're preparing for the next stage of their developer portals. They're looking to see how they can use that information to improve upon it.

I do know that there are some folks who have an API-first strategy for open data, which is great. I think that we could benefit from taking those lessons and seeing how we could expand them to other parts of Commerce. In general, if you go to Commerce.gov and play with some of the developer tools, I think it is fair to say that it is a differentiated customer experience. I do think, just as there are opportunities to pull together all the data sets, there are opportunities for improvements to pull together a platform for APIs as well.

How would you like your progress to be measured between now and January 2017, or whenever this tour of public service ends? What metrics should we use to assess impact?

Ian J. Kalin: Frankly, this came up a lot in my interviews. I don't mind saying this, but you know the speed of government. Any good government management should ask the person like me the question of "let's assume that you only have this hypothetical end date of January 2017. What can you accomplish in that time?" How much can you accomplish in any federal agency with big ideas in that period of time? I've worked for big corporations, and some of the same problems exist: how much can you really get done if you at least behave like you have an end point? I'm obsessed with that question. I've been driving even my wife crazy with how to even define this stuff.

I'm inspired by two specific examples. One is the citizen dashboard for Edmonton, Alberta, and the other is StateStat for recently departed Maryland Governor Martin O'Malley. Both of those government administrations have really good dashboards. Good, highly specific, data-driven, business-owned. They have scorecards that show the underlying information behind them, with live updates, which show if I'm doing badly as much as I'm doing well. They were transparent about it, they were open about the failures as much as about the wins. That's pretty rare. I love the way that those two governments did that.

I guess I hope to achieve something like that in this federal role, to come up with very clear measurable goals and definitions of success, and publish it in some way for the public to consume, and then to measure myself against that real data. I have a hunch about what those metrics should be, but I'm not going to know for sure until I talk to more people about what's really possible here at Commerce.
