[PIG] Ad Fontes: Enabling Data Accessibility and Utilization for California Water Management

Shwetha Rao @shwetharao | Patrick Atwater @patwater

Ever heard of the Renaissance imperative “Ad fontes”? It means “to the source,” and that’s exactly what we’re doing with this project—getting to the heart of California’s water data challenges.

Climate change has been increasing water scarcity across the globe. Regional and national governments around the world collect information from urban and agricultural water providers to inform and enforce policies to manage that increasingly scarce resource. Basic information about how much water is being used where, when and to what purpose is shockingly sparse given the criticality of that challenge. Even affluent, “developed” parts of the world like California have an astonishingly complex and confusing array of various reports providing that information. This project team has worked on digital public goods broadly and California water data specifically for almost two decades. This project will pilot new water data reporting protocols using a “t-shaped” approach that capitalizes on the project team’s deep local experience along with broad global perspective and connections.

1. Local depth: the Crescenta Valley Water District (CVWD) is an “anytown California” suburban water retailer that operates squarely in the middle of the bell curve. The project team has a unique in with management at the district and the opportunity to deploy live experiments. The items below represent potential vectors for developing new protocols.

A. Generalizable Protocols for Reporting Water Data:
One approach involves developing standardized protocols for reporting water data, such as the monthly SAFER report, which plays a pivotal role in urban water conservation and drought adaptation efforts. Currently, this process entails significant interdepartmental coordination and manual data entry, and it lacks standardization across retailers, all of which present opportunities for automation and generalizable protocols. By integrating data directly from existing systems, such as the billing system for demand data and the Supervisory Control and Data Acquisition (SCADA) system for supply data, we can streamline reporting processes and improve data accuracy and timeliness.
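
As a rough illustration of what pulling demand from billing and supply from SCADA could look like, here is a minimal Python sketch. The record types, field names (`usage_ccf`, `produced_gallons`, and so on) and the output layout are assumptions made for this example, not CVWD's actual schemas or the SAFER report format.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date

GALLONS_PER_CCF = 748.052  # 1 CCF = 100 cubic feet of water


@dataclass(frozen=True)
class BillingRecord:
    """Hypothetical export row from a utility billing system (demand side)."""
    account_id: str
    customer_class: str
    read_date: date
    usage_ccf: float


@dataclass(frozen=True)
class ScadaReading:
    """Hypothetical daily production total from a SCADA historian (supply side)."""
    source_id: str
    reading_date: date
    produced_gallons: float


def monthly_summary(billing, scada, year, month):
    """Roll billing and SCADA records up into one monthly supply/demand summary."""
    demand_by_class = defaultdict(float)
    for rec in billing:
        if (rec.read_date.year, rec.read_date.month) == (year, month):
            # Billing is assumed to be in CCF; convert to gallons for reporting.
            demand_by_class[rec.customer_class] += rec.usage_ccf * GALLONS_PER_CCF

    total_supply = sum(
        r.produced_gallons
        for r in scada
        if (r.reading_date.year, r.reading_date.month) == (year, month)
    )
    return {
        "period": f"{year:04d}-{month:02d}",
        "demand_gallons_by_class": dict(demand_by_class),
        "total_demand_gallons": sum(demand_by_class.values()),
        "total_supply_gallons": total_supply,
    }
```

A real pipeline would also need to handle billing cycles that straddle month boundaries, which this sketch ignores by keying on the meter read date.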

B. Formalizing and Automating Water Quality Data Protocols:
Another avenue for protocol development focuses on enhancing protocols for monitoring and reporting water quality data. Currently, CVWD employs a combination of pen-and-paper field logs, SCADA system data, field analyzers, and third-party laboratory tests, resulting in time-intensive and error-prone data collection processes. By formalizing and automating these protocols, we can centralize data streams, develop automated workflows, and reduce reliance on manual data entry. This initiative aligns with CVWD’s ongoing water quality data transformation efforts and contributes to broader regional and global water data protocol improvement initiatives.
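
To make the idea of centralizing data streams a bit more concrete, the sketch below merges readings from several sources into one time-ordered list and flags values outside a configured range. Everything here is hypothetical: the `QualityReading` fields, the source labels and the limits passed in are placeholders for illustration, not CVWD's data model or any regulatory thresholds.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class QualityReading:
    """One water quality measurement in a common shape, whatever its origin."""
    site_id: str
    parameter: str
    value: float
    unit: str
    taken_at: datetime
    source: str  # e.g. "field_log", "scada", "analyzer", "lab" (illustrative labels)


def consolidate(streams, limits):
    """Merge several reading streams and flag out-of-range values.

    streams: iterable of iterables of QualityReading
    limits:  dict mapping parameter name -> (low, high); placeholder bounds
    Returns (merged, flagged), both lists ordered by timestamp.
    """
    merged = sorted(
        (reading for stream in streams for reading in stream),
        key=lambda r: (r.taken_at, r.site_id, r.parameter),
    )
    flagged = []
    for reading in merged:
        bounds = limits.get(reading.parameter)
        if bounds is None:
            continue  # no configured range for this parameter, nothing to check
        low, high = bounds
        if not (low <= reading.value <= high):
            flagged.append(reading)
    return merged, flagged
```

Keeping the `source` on every record means a flagged value can be traced back to the field log, analyzer or lab result it came from.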

2. Global breadth: other utilities in California and other regions like Australia, Israel, Denmark, India and the UK have implemented water data modernization efforts. Our project team is connected to many of those initiatives and plans to utilize comparative research to inform local experimentation and vice versa.

A. The experiments to develop a generalized protocol for water data reporting, beginning with Crescenta Valley, will be rooted in extensive global research. Shwetha’s deep involvement in the evolution and evangelism of an open and decentralized protocol called beckn protocol will guide our exploration. We’ll assess whether the beckn protocol or other existing protocols can serve as reference points for developing a new protocol for water data sharing. Additionally, we’ll engage with utilities worldwide, leveraging existing connections to facilitate discussions and gather insights for our endeavor.

B. That global research will be coupled with a deep understanding of the CA water data reporting landscape, which is well described in the CA Water Data Consortium Study. Patrick has been deeply involved in that water data movement for a decade. For example, he helped launch the California Data Collaborative (theCaDC.org), including a protocol-based approach to water rates that won California’s “moonshot” award at the annual state-sponsored water data challenge. That water rate project is still live, and it could be revisited through a CVWD pilot and written up in a blog post reflecting on what worked, what didn’t and lessons learned from that protocol, as a way to test interest and energy in other water networks across the globe. See: Open-Water-Rate-Specification/full_utility_rates at master · California-Data-Collaborative/Open-Water-Rate-Specification · GitHub

Questions and Answers

What is the existing target protocol you are hoping to improve or enhance? Eg: hand-washing, traffic system, connector standards, carbon trading.

California law, regulation and industry standard practices require a variety of regularly reported water supply, demand, quality and other critical data. This project will pilot and experiment with a new protocol to help streamline water data reporting, utilization and accessibility.

What is the core idea or insight about potential improvement you want to pursue?

Over the past half decade, there has been broad consensus amongst California’s two state water agencies, policymakers and industry leaders about the need for and importance of improving water data infrastructure. State reporting is still incredibly complex, time consuming and confusing to even the most exquisitely trained industry insiders. Previous efforts to streamline reporting have focused on trying to harmonize the data categories that are sent to the state. This project proposes to dig deeper into the underlying systems of record and propose standardized data specifications that meet state reporting requirements. Those systems of record will include, for example, utility billing systems on the demand side and SCADA (Supervisory Control and Data Acquisition) on the supply side, to improve urban water supply and demand reporting. There are also opportunities to streamline water quality, groundwater and really the entire spectrum of water data: Pioneering Spirit | The next frontier of the CaDC

What is your discovery methodology for investigating the current state of the target protocol? Eg: field observation, expert interviews, historical data analysis, failure event analysis

  1. Live Implementation and Experimentation: Utilizing the unique management buy-in at a local water utility where part of the project team works. In addition, as previously mentioned, Patrick helped launch a water data collaborative (theCaDC.org) that would be a useful resource in testing any protocol refinements at other water agencies.
  2. Historical Analysis: Research about the existing protocols around the world to streamline water data reporting and sharing.
  3. Expert Interviews: A network like apolitical.co might be helpful in finding folks who are in the trenches of those reporting mechanics and interviewing them about which other water data reporting protocols are out there.
  4. Community Immersion: Involvement in working group meetings of organizations such as the California Water Data Consortium (CWDC) to understand how water data sharing protocols are being shaped today.
  5. Gap Analysis: Understand how the existing data reporting processes can be standardized for California.

In what form will you prototype your improvement idea? Eg: Code, reference design implementation, draft proposal shared with experts for feedback, A/B test of ideas with a test audience, prototype hardware, etc.

To prototype our improvement idea, we will conduct a series of experiments aimed at streamlining state reporting requirements. Crescenta Valley will serve as the testing ground for these experiments.

Our core module will involve developing a Standard Operating Protocol (SOP) for CVWD to submit monthly reports to regulatory bodies such as the State Water Resources Control Board (SWRCB) for the SAFER report and the Division of Drinking Water (DDW) for water quality reports, among others. These experiments will include the development and testing of Python scripts designed for transforming data for public reporting purposes. Feedback on our experiments will be actively sought from experts identified as judges for the project, ensuring that our prototype undergoes rigorous evaluation and refinement. Ultimately, the prototype aims to develop a robust and standardized protocol for sharing water data in Crescenta Valley, paving the way to expand this protocol statewide in California.
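
As a hint of what one such transformation script might look like, the sketch below serializes per-class monthly demand into a flat CSV. The column names in `REPORT_COLUMNS` and the supplier name are invented for illustration; the actual state report layouts would need to be taken from the agencies' own templates.

```python
import csv
import io

# Hypothetical column layout, not an official state reporting template.
REPORT_COLUMNS = ["supplier_name", "reporting_month", "customer_class", "demand_gallons"]


def to_report_csv(supplier_name, reporting_month, demand_by_class):
    """Render {customer_class: gallons} as CSV text, one row per customer class."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=REPORT_COLUMNS, lineterminator="\n")
    writer.writeheader()
    for customer_class in sorted(demand_by_class):  # stable row order across runs
        writer.writerow({
            "supplier_name": supplier_name,
            "reporting_month": reporting_month,
            "customer_class": customer_class,
            "demand_gallons": round(demand_by_class[customer_class]),
        })
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_report_csv(
        "Example Water District",
        "2024-03",
        {"residential": 11220.78, "commercial": 1200.4},
    ))
```

Sorting the rows and fixing the line terminator keeps the output deterministic, which makes it easy to diff one month's generated file against a manually prepared submission during a pilot.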

How will you field-test your improvement idea? Eg: run a restricted pilot at an event, simulation, workshop, etc.

Patrick works at a CA urban water utility (CVWD), which we can use as our lab. He also previously helped launch a network of water utilities (theCaDC.org) committed to modernizing data management.

Who will be able to judge the quality of your output? Ideally name a few suitable judges.

The project team has past professional relationships with the following key California water data stakeholders.

  1. Meredith Lee, Western States Big Data Hub – a convener and coalition builder in applied academic circles relevant to water data
  2. Greg Gearheart, SWRCB – Deputy Director of the Office of Information Management and Analysis – a key operational implementer of AB 1755 at one of the major state agencies, informally known as an open data revolutionary
  3. Newsha Ajami, Lawrence Berkeley National Lab – a key applied water and natural resources researcher
  4. David Harris, CNRA – a key architect of AB 1755 implementation

How will you publish and evangelize your improvement idea? Eg: Submit proposal to a standards body, publish open-source code, produce and release a software development kit etc.

  1. Our project team will publish a series of blog posts documenting the results of the experiments described above and share those posts with water data colleagues, new and old. In addition, we will share those posts with the California Data Collaborative (CaDC) management and ask that they be included in the CaDC newsletter, which goes out to over a thousand leading California water managers and data practitioners.
  2. The Python scripts created for water data reporting will be shared on a public GitHub repository to ensure transparency and accessibility.
  3. At the end of the summer, our project team will submit a proposal to the California Water Data Consortium to help accelerate the creation of protocols for data sharing, documentation, quality control, and public access of water data under the Open and Transparent Water Data Act. Additionally, CVWD reports submitted to the state will ultimately become available on the state water data platform specified by AB 1755.

What is the success vision for your idea?

The dream scenario is that this work catalyzes change so that local water utilities only have to report a single piece of information (for instance, the amount of water consumed in a particular month in their service area) once. The path to achieve that dream involves utilizing the existing legislative mandate laid out by AB 1755. The project team’s experiments and documentation will be shared to help the state water agencies continue to make progress toward the shared goals articulated in the AB 1755 Strategic Plan under the Open and Transparent Water Data Act, namely that data are: (1) sufficient, (2) accessible, (3) useful, and (4) used to inform water management in California.


Love the idea (perhaps unsurprisingly given my background). In my mind, there are really a couple pieces to this problem.

The first is having a standard format: some sort of water accounting standard with a standardized set of data fields, standardized customer class types, values normalized using commonly accepted rules, etc.

Then you need adapters to get source data into the standard format. The dream would be for billing systems to do this work on their end and publish compliant data via APIs.

Lastly, you need to either get the state or other regulators to accept the standard data as-is, or you need adapters from the standard format into the formats required by various state data systems.
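
A rough sketch of how those three pieces could fit together, in Python: a standard record, an adapter from a source system into it, and an adapter out to a regulator's layout. The class codes, field names and output columns are all made up for illustration; none of them come from a real billing system or state data system.

```python
from dataclasses import dataclass

# Hypothetical mapping from utility-specific class codes to standard class types.
STANDARD_CLASSES = {
    "RES": "residential",
    "SFR": "residential",
    "COM": "commercial",
    "IRR": "irrigation",
}


@dataclass(frozen=True)
class StandardUsage:
    """Piece 1: the standard format every adapter reads from or writes to."""
    utility_id: str
    period: str          # "YYYY-MM"
    customer_class: str  # one of the standardized class types, or "other"
    volume_gallons: float


def from_billing_row(row, utility_id):
    """Piece 2: source adapter, from one billing export row to the standard format."""
    code = row["class_code"].strip().upper()
    return StandardUsage(
        utility_id=utility_id,
        period=row["bill_month"],
        customer_class=STANDARD_CLASSES.get(code, "other"),
        volume_gallons=float(row["gallons"]),
    )


def to_regulator_row(record):
    """Piece 3: target adapter, from the standard format to one regulator's layout."""
    return {
        "Supplier": record.utility_id,
        "Month": record.period,
        "Sector": record.customer_class.title(),
        "Volume (gal)": round(record.volume_gallons),
    }
```

With N source systems and M state data systems this needs N + M adapters rather than N × M direct conversions, which is the main argument for putting a standard format in the middle.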


Seems like this idea could be very useful, and there’s a clear institutional need for it + recognition of that need. Kicking the tires a bit:

Agree with @ctull about the pieces and importance of standardization. It seems like adoption is really the key outcome to drive towards. Suppose you’re successful here. Is there a pathway to getting another CA town to fold in? Are there existing “protocols” (overloading the term a bit) re: fields and format you can adopt to avoid the xkcd: Standards problem?

You’ve listed a few use cases, e.g., billing, supply and demand recording, quality recording. Does one stand out now as the piece that would really drive adoption/utility to the utilities? To my ignorant eyes billing seems like maybe the most immediately useful, but also maybe the most already-covered-somehow. Supply and demand might be a nice middle ground (maybe).

Finally, and maybe most off the mark, is there a standard dashboard you can replicate and extend to show the value of your approach?


Not saying it’s exactly the correct model to follow, but in the geospatial world we have the Open Geospatial Consortium (OGC) as a governing body that works globally to standardize geospatial data formats and access: https://www.ogc.org/. What I have found working on new cloud-optimized data formats is that, as much as we want them to come from community-led discussions, it’s difficult to move pieces forward without some sort of governance structure. I think the OGC provides this, although it also comes with bureaucracy and a pay-to-play model.

If you think of using this call not only to discuss creating these standardized formats, but also as a place to discuss and propose a governance structure that could advocate for the work to develop in an open-source, FAIR model, that would be powerful.


This idea appeals to me on a number of levels.

First, and probably most obvious, it would help provide for water arbitrage between utilities to deal with local shortages and surpluses. Provided there’s a physical connection. Without that standard data set, even if there were a physical connection, it would probably be very difficult to make water transfers.

Next, one of the challenges facing utilities of all stripes is the looming “Silver Tsunami”: the wave of retirements coming up in the next few years. These are senior personnel who have the deep knowledge needed to run utility systems. Standardized data protocols will help utilities train new workers and would increase worker mobility between utilities. This would be one less thing employees moving from one utility to another would need to learn. This standardization would also make it possible for colleges and universities to offer courses on managing water utility data and better prepare new workers for employment in the sector.

Third, it seems like data standardization would create new markets for information services to utilities by reducing barriers to entry.

I almost left out my own personal interest as an academic researcher in economics and policy. Standardizing data this way would make it far easier to perform impact studies and similar analyses.
