Structured data in discovery

Handling Structured Data in Discovery

Much of the focus of electronic discovery in recent years has centred on preserving and obtaining text messages and workplace collaboration content. And yet, there are other key sources of electronically stored information (ESI) that are often overlooked and may be more significant for claims and defences than digital-age communications. Structured data is one of those sources.

Found in a variety of repositories that are broadly characterized as “databases,” structured data presents unique preservation, collection, and production difficulties. Indeed, obtaining structured data in discovery requires far more than a broadly worded request to produce “documents” of the kind that might ordinarily suffice for relevant communications, Microsoft Office materials, PDF files, or other unstructured information.

This article examines key issues surrounding the discovery of relevant structured data in civil litigation. In particular, we provide an overview of structured data that counsel may encounter in discovery. In addition, we explore key issues affecting discovery such as understanding the nature of the database housing the structured data, determining how to obtain ESI from a structured data repository, and seeking a reasonably usable production format for structured data. We conclude by discussing recommended practices—particularly the use of experts—for handling structured data discovery.

What is structured data?

Structured data, as the name implies, is data that is stored in a standardized format for ease of access and analysis. Excel spreadsheets and SQL databases are two examples of structured data formats in common use today.

Within structured data sources, schemas (blueprints that describe the structure of a database) are used to organize the data into records and strictly define elements (or fields) by specifying the names, types, and lengths of the fields and their relationship to each other. New data being entered must conform to that defined structure. Applications like Microsoft Excel and Access, and query languages like Structured Query Language (SQL), are then used to access the data for management and analysis.

Structured data may come as prepackaged “off the shelf” database systems, such as online applications like Salesforce and HubSpot. Some of these systems allow users to customize the look and feel of their content when displayed on a screen or exported in a formatted report, but behind the scenes, the content is maintained in the structured format established in the schema for that system. Rather than relying on commercially available database systems, some organizations may choose to create their own structured data repositories, colloquially referred to as “bespoke” databases.
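To make the idea of a schema concrete, the following is a minimal sketch using Python's built-in sqlite3 module. The two tables, their column names, and the length limits are hypothetical and chosen only for illustration; a real system's schema will differ and should be confirmed with the people who maintain it.

```python
import sqlite3

# Hypothetical two-table schema; every table and column name here is an
# illustrative assumption, not taken from any real system.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces relationships only when this is on

conn.execute("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL CHECK (length(name) <= 50),
        state_code  TEXT NOT NULL CHECK (length(state_code) = 2)
    )
""")
conn.execute("""
    CREATE TABLE invoice (
        invoice_id   INTEGER PRIMARY KEY,
        customer_id  INTEGER NOT NULL REFERENCES customer(customer_id),
        amount_cents INTEGER NOT NULL CHECK (typeof(amount_cents) = 'integer'),
        issued_on    TEXT NOT NULL
    )
""")


def try_insert(sql, params):
    """Return True if the row was accepted, False if the schema rejected it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


# A row that conforms to the schema is stored.
accepted = try_insert("INSERT INTO customer VALUES (?, ?, ?)", (1, "Acme Ltd", "CA"))
# A state code longer than two characters violates the field definition.
too_long = try_insert("INSERT INTO customer VALUES (?, ?, ?)", (2, "Globex", "California"))
# An invoice pointing at a customer that does not exist violates the relationship.
orphan = try_insert("INSERT INTO invoice VALUES (?, ?, ?, ?)", (10, 99, 125000, "2023-04-01"))

customer_rows = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
```

In this sketch only the first row is stored. The other two are refused because they break a field definition or a relationship between tables, which is the sense in which new data "must conform" to the schema.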

In contrast, unstructured data does not have a standardized format. It is freeform in nature and can take on many different formats. Images, audio and video media, and text-based data such as emails and articles are a few examples.

Organizations use structured data repositories because they make information easier to access and manage, scale well, support indexing for faster searching and filtering, and simplify the storage of large amounts of data. Of course, there are disadvantages to structured data systems, including strict limitations on what type of data can be stored in a particular source. If, for example, a user attempts to load data that does not adhere to the schema, the system will typically reject it, and in some systems the data can be corrupted or lost.

What are the key discovery issues with structured data?

Many issues can arise in discovery involving structured data. Some of the most important are the following.

Understanding the nature of the database

Understanding the nature of the repository housing relevant structured data is a key initial issue. A client is unlikely to be able to formulate proper search queries or obtain information in a reasonably usable format if counsel cannot grasp the type of database at issue and related questions about how data is generated, maintained, and accessed.

Obtaining ESI from a structured data repository

Another key discovery issue is obtaining usable ESI from a structured data repository. Given the specialized and dynamic nature of structured data and the many types of databases, each with its own schema, in which data could be stored, requesting parties should work closely with responding parties to formulate measured queries that target responsive information and are not unduly burdensome. Courts will examine the burdens of structured data discovery and may reject or modify demands that are not proportional to the needs of a particular case.
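As an illustration of what a "measured" query might look like in practice, here is a hedged sketch in Python with sqlite3. The invoice table, its columns, the customer number, and the date range are all hypothetical. The point is only that an agreed query can name specific fields, a specific subject, and a specific period rather than asking for the whole database.

```python
import sqlite3

# Hypothetical sample data standing in for a responding party's database.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE invoice (
        invoice_id     INTEGER PRIMARY KEY,
        customer_id    INTEGER,
        amount_cents   INTEGER,
        issued_on      TEXT,   -- ISO dates (YYYY-MM-DD) sort correctly as text
        internal_notes TEXT
    )
""")
conn.executemany(
    "INSERT INTO invoice VALUES (?, ?, ?, ?, ?)",
    [
        (1, 7, 120000, "2021-03-15", "n/a"),
        (2, 7, 45000, "2022-06-30", "n/a"),
        (3, 8, 99000, "2022-07-01", "n/a"),
        (4, 7, 30000, "2023-01-10", "n/a"),
    ],
)

# The agreed query selects only the negotiated fields (internal_notes is
# left out) and is limited to one customer and one date range.
AGREED_QUERY = """
    SELECT invoice_id, customer_id, amount_cents, issued_on
    FROM invoice
    WHERE customer_id = ?
      AND issued_on BETWEEN ? AND ?
    ORDER BY issued_on
"""


def run_agreed_query(connection, customer_id, start_date, end_date):
    """Run the negotiated query with parameters supplied separately from the SQL text."""
    return connection.execute(AGREED_QUERY, (customer_id, start_date, end_date)).fetchall()


rows = run_agreed_query(conn, 7, "2022-01-01", "2022-12-31")
```

Keeping the query text fixed and passing the customer and dates as parameters means the parties can agree on the exact wording once and rerun it with different values if the scope is later narrowed or widened.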

Production format

Getting a reasonably usable format for structured data production is an essential aspect of database discovery. If the production is in a format that limits a requesting party’s ability to comprehend the information or access related details or records, courts—taking into account burdens and other factors—should consider ordering the production in a more reasonably usable format. In some cases, the structured data may be stored in a standard, commercially available database program that permits exporting the raw data from one system and importing that data easily into another system. In other cases, the data may not be so easily transferred from one system to another or the cost of doing so may be disproportionate to the needs of the case. In certain situations, the only reasonable option may be to agree on queries that can be run or reports generated and produced. Blair is exemplary on this point, where the court was able to rely on expert testimony in ordering a native format production to ensure the requesting parties had ready access to underlying data relevant to their claims.
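Where exporting raw data is feasible, the export step can be sketched roughly as follows. This is an assumption-laden illustration, not a description of any particular product's export feature: the table and the function name export_query_to_csv are hypothetical, and the SHA-256 digest is simply one common way for both sides to confirm they are looking at the same file.

```python
import csv
import hashlib
import io
import sqlite3


def export_query_to_csv(connection, sql, params, out):
    """Write query results to `out` as CSV with a header row; return the row count."""
    cursor = connection.execute(sql, params)
    writer = csv.writer(out)
    # cursor.description holds one entry per result column; item 0 is its name.
    writer.writerow([column[0] for column in cursor.description])
    row_count = 0
    for row in cursor:
        writer.writerow(row)
        row_count += 1
    return row_count


# Hypothetical sample data standing in for the source system.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE invoice (
        invoice_id   INTEGER PRIMARY KEY,
        customer_id  INTEGER,
        amount_cents INTEGER,
        issued_on    TEXT
    )
""")
conn.executemany(
    "INSERT INTO invoice VALUES (?, ?, ?, ?)",
    [(1, 7, 120000, "2021-03-15"), (2, 7, 45000, "2022-06-30")],
)

# An in-memory buffer is used here; a real export would write to a file.
buffer = io.StringIO(newline="")
exported = export_query_to_csv(conn, "SELECT * FROM invoice ORDER BY invoice_id", (), buffer)
csv_text = buffer.getvalue()

# Recording the row count and a hash lets the receiving party verify completeness.
digest = hashlib.sha256(csv_text.encode("utf-8")).hexdigest()
```

A delimited file with column headers can usually be loaded into another database or a spreadsheet, which is one reason such formats are often proposed as reasonably usable. Whether that holds in a given case depends on the source system and on what relationships between tables need to survive the export.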

Recommended practices for handling structured data discovery

These discovery issues underscore the need to adopt certain practices for handling structured data discovery. Three practices in particular can aid parties on these issues.

  1. Use experts. Parties should consider engaging experts to help with structured data discovery. Structured data experts can help fashion reasonable search queries, determine an appropriate export format for production, and educate the court through written or oral expert testimony.
  2. Meet and confer. Parties should meet and confer on issues regarding the discovery of structured data and involve their respective experts in their discussions. Topics on which the parties may confer include exploring the contents and structure of the databases at issue, discussing the fields and information that are pertinent to the requesting party’s inquiry, reaching an agreement on a set of queries to be made for discoverable information, and determining an appropriate production format. The parties may also consider whether the responding party should have an obligation to explain codes, abbreviations, or other information necessary to ensure that the produced information is reasonably usable.
  3. Use ESI protocols. After their meeting and conferring, the parties should consider memorializing in an ESI protocol the provisions on which they have agreed for handling structured data. Using an agreed-upon protocol could very well ameliorate future disputes over structured data. Even with an ESI protocol addressing structured data, parties should nonetheless be willing to address issues that could arise during discovery since even the most thorough protocols may not address every conceivable issue with structured data discovery.
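The point above about explaining codes and abbreviations can be illustrated with a small sketch. The status codes, field names, and the decode_rows helper below are hypothetical. The idea is simply that a short lookup table (sometimes called a data dictionary) supplied alongside the production lets the requesting party read coded fields, and flags any code the dictionary does not cover.

```python
# Hypothetical data dictionary a responding party might supply with a production.
STATUS_CODES = {"A": "Active", "S": "Suspended", "C": "Closed"}


def decode_rows(rows, field, dictionary):
    """Return copies of the rows with a readable meaning added for a coded field."""
    decoded = []
    for row in rows:
        copy = dict(row)  # leave the produced record itself unmodified
        code = copy[field]
        # Codes missing from the dictionary are flagged rather than guessed at.
        copy[field + "_meaning"] = dictionary.get(code, "UNDOCUMENTED CODE")
        decoded.append(copy)
    return decoded


# Hypothetical produced records; "X" is deliberately absent from the dictionary.
produced = [
    {"account_id": 1, "status": "A"},
    {"account_id": 2, "status": "X"},
]
decoded = decode_rows(produced, "status", STATUS_CODES)
```

A flagged code such as the second record's is the kind of gap the parties could raise at a further meet and confer.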

Follow the link to review the full “Handling Structured Data in Discovery” article with relevant US case information.


Philip Favro

Phil Favro is a leading expert on issues relating to the discovery of electronically stored information. Phil serves as a court-appointed special master, expert witness, and trusted advisor to law firms and organizations on matters involving electronic discovery and ESI. He is a nationally recognized scholar on electronic discovery, with courts and academic journals citing his articles.

Gregg Parker

Gregg Parker possesses more than 20 years of hands-on experience and success in IT leadership, including 10+ years in eDiscovery operations and in-depth knowledge of the use of Relativity and associated tools to conduct eDiscovery operations. He has technical expertise and legal qualifications with a track record of leading diverse teams of professionals to new levels of success in a variety of disciplines, including information technology, legal, and humanitarian operations.

Disclaimer: The views and opinions expressed in this article do not necessarily reflect the official policy or position of Novum Learning or Legal Practice Intelligence (LPI). While every attempt has been made to ensure that the information in this article has been obtained from reliable sources, neither Novum Learning or LPI nor the author is responsible for any errors or omissions, or for the results obtained from the use of this information, as the content published here is for information purposes only. The article does not constitute a comprehensive or complete statement of the matters discussed or the law relating thereto and does not constitute professional and/or financial advice.