LIDI Project To Lower Legal Data Barriers, Help NLP Training

A new project to increase access to legal data, called the Legal Innovation Data Institute (LIDI), has launched in Canada.

‘LIDI lowers the barriers, changes the equation, and expands the circle of innovation in Canadian legal data beyond the small and closed group of legal publishers that currently possess extensive primary law collections (i.e. judgments, legislation and regulations),’ the group said.

And when they say ‘the small and closed group of legal publishers’, you know who they are talking about…

LIDI will operate as ‘a steward of public, but sensitive court and tribunal rulings and other legal data’. In short, it will make it easier for lawyers and other interested parties, including tech companies that may need legal data to train NLP machine learning models, to access and make use of a broad collection of legal information.

In terms of legal technology, the group added: ‘There is a direct correlation between the accessibility of legal data and… legal data innovation. If the barriers to access are too high, the effort deemed too great, and the payoff deemed too little, then the opportunity for advancement is lost.’

This point is especially important for training machine learning systems based on NLP (natural language processing). Without enough relevant data to build up its accuracy, an NLP system cannot return useful results. But if case law is buried behind an expensive paywall, or comes with strict licence rules, this effectively shuts out all but the richest companies.

The lack of access to comprehensive and well-organised court data has also been a source of discussion in the UK, where a variety of private deals between courts and large companies, and patchy data access rules, have put smaller legal tech companies at a disadvantage when it comes to developing their own dispute-related NLP tools. There has even been an open consultation on the issue.

Colin Lachance, founder and Executive Director of the project, told Artificial Lawyer: ‘API and bulk access to legal data compilations is foundational to legal innovation. Sharing content in this way has been a personal mission for years, so I’m thrilled to have finally sorted out a model and attracted a sufficient base of supporters to make it a reality.’

Lachance has most recently been vLex’s general manager for North America, and the global legal research company is backing the project along with several others such as Justia, which provides free case law information.

The group explained its goal like this: ‘Imagine trying to write a story with only a partial alphabet. Or trying to write a song with access to just a few notes. It’s the same with legal data – the more you can access, the more you can do. Through a unique member and collaborator model, LIDI lowers legal data access barriers in Canada and facilitates innovation on an unprecedented scale.’

LIDI will be not-for-profit and ‘takes inspiration from the global free access to law community’. That said, most parties will need membership to access the data, although the group is also looking for sponsorship for those who cannot pay for it. They added: ‘Where full membership [could be] out-of-reach for some groups, individuals and startups, in these circumstances, LIDI would still like to enable short-term access to these innovators and researchers…with [sponsor] help’.

All well and good, but what is going into this initially Canadian legal data trust?

The base collection is supplied by Compass, the successor to the Maritime Law Book (a Canadian publisher of semi-official provincial and national case law reporter series) and the source of the Canadian case law in the vLex legal databases.

Also, the initial content licensed into the LIDI Data Trust includes nearly all judgments published by 43 Canadian courts.

In addition, the collection includes nearly 200,000 case law headnotes and over 580,000 topic digests ordered according to a 150-topic Key Number System. The content is available to members and collaborators as bulk data and through highly extensible and customisable APIs using ThinkData Works’ Namara platform.

The project will also be collaborating with public and private sector entities in the following areas:

1) protection of personal privacy, including co-development with Private AI of intelligent de-identification machine learning models that differentiate between justice system participants and private citizens engaged as parties or witnesses;

2) data clean-up, normalization and enrichment;

3) development of free public legal apps; and

4) advancing French language access to justice through extending to French materials the innovations and legal artificial intelligence models developed for English content. (AL Note: around 20% of Canadians speak French as their first language.)

The other main members of the project are:

  • Dr. Randy Goebel – co-founder of the Alberta Machine Intelligence Institute and Principal Researcher of the Amii xAI (explainable) Lab
  • Meredith Brown – justice system innovation expert and partner with Calibrate Solutions, and former Innovation Office Executive Director within the Ontario Ministry of the Attorney General and past Chief Legal Officer to three Deputy Attorneys General
  • Noel Corriveau – ‘responsible AI’ expert and legal counsel with INQ Data Law, and former Special Advisor on Artificial Intelligence to the Chief Information Officer of Canada within the Treasury Board of Canada Secretariat
  • Sarah Glassmeyer – American Bar Association Center for Innovation project specialist and legal counsel to the ABA Standing Committee on the Delivery of Legal Services, a “lawyer, librarian and technologist” and former Research Fellow of the Harvard Library Innovation Lab
  • Cory Janssen – CEO of AltaML, one of Canada’s largest “pure-play” machine learning companies building transformational software applications.