A new method for analyzing legal texts has been proposed.

The legal field urgently needs fast, accurate analysis of large numbers of legal documents, court rulings, and legislative acts. Traditional analysis methods often prove insufficiently effective, which creates demand for modern technological solutions. In particular, the TF-IDF (term frequency-inverse document frequency) method is an effective tool for identifying keywords and concepts and can serve as the basis for constructing decision trees. Researchers at the Moscow Technical University of Communications and Informatics (MTUCI) have proposed this approach for the analysis of legal texts.

A decision tree is a machine learning method, represented as a tree-like structure where each node signifies a question or test regarding a specific property of the data, each branch corresponds to a possible answer to that question, and each leaf of the tree represents a prediction or decision. Constructing a decision tree based on the TF-IDF algorithm allows for the consideration of word importance, highlighting key terms while filtering out frequently occurring words. This approach facilitates the handling of text data, enhances the interpretability of results, and has minimal preprocessing requirements, making it suitable for categorization and thematic analysis tasks.
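To make the weighting idea concrete, here is a minimal pure-Python sketch of TF-IDF. The article does not say which TF-IDF variant the researchers used, so the formulas here (term frequency as count divided by document length, inverse document frequency as the natural log of N/df), the whitespace tokenizer, and the toy corpus are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Toy corpus, invented for illustration only.
docs = [
    "the lease contract was terminated",
    "the loan contract was repaid",
    "the court divided the inheritance",
]
scores = tf_idf(docs)
```

In this toy corpus, "the" occurs in every document and so receives a weight of zero, while "lease", which occurs in only one document, outweighs "contract", which occurs in two.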

At MTUCI, a new methodology for applying decision trees based on the TF-IDF method to natural language analysis in civil law tasks has been developed by Elena Aleksandrovna Skorodumova, Associate Professor of the Department of TViPM and Candidate of Physical and Mathematical Sciences, and Dianna Zakharieva, an MTUCI student.

During the research, they collected a dataset from the web resource https://sudact.ru/ and analyzed it in detail, focusing on identifying the relevant chapters and articles of the civil code.

“As part of the information collection, 12 civil law cases were extracted, which were subsequently studied and analyzed in detail. The extracted verdicts were processed to highlight the reasoning parts of the claims, which were later incorporated into the developed program for further investigation. Ultimately, the program generated a list of chapters and articles from the civil and family codes, with each associated with a numerical value reflecting the degree of correspondence between the reasoning part of the claim and the content of a specific chapter and article. The matching and similarity assessment procedure was conducted for each chapter and article separately,” notes Elena Aleksandrovna.

The researchers emphasize that before analyzing how well individual articles match, the relevant chapters must first be identified by their position in a list sorted in descending order of the relevance metric.

“The decision tree was formed in several stages. Initially, TF-IDF values were calculated for the codes, followed by the sections of those codes. Subsequent stages involved calculating TF-IDF for subdivisions and, finally, for chapters. The TF-IDF values obtained at each level of the hierarchy were multiplied together. The resulting list then underwent an ordering process, arranging the elements in descending order of values. This allowed for the identification of those chapters that most accurately correspond to the claim,” explained Dianna Zakharieva.
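The staged procedure described in the quote can be sketched as a walk over a hierarchy that multiplies the scores along each path and sorts the results. This is only an illustration under assumptions: the `Node` class, the `rank_chapters` function, the node names, and every score are hypothetical placeholders, not the researchers' code or data. In practice each score would come from comparing the claim text with the text of that code, section, subdivision, or chapter.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    score: float                      # relevance of this node to the claim
    children: list = field(default_factory=list)


def rank_chapters(roots):
    """Multiply scores along each root-to-leaf path and sort descending."""
    ranked = []

    def walk(node, product, path):
        product *= node.score
        path = path + [node.name]
        if not node.children:         # a leaf stands for a chapter
            ranked.append((" / ".join(path), product))
        for child in node.children:
            walk(child, product, path)

    for root in roots:
        walk(root, 1.0, [])
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical hierarchy: code -> section -> subdivision -> chapter.
tree = [
    Node("Code A", 0.6, [
        Node("Section I", 0.5, [
            Node("Subdivision 1", 0.8, [
                Node("Chapter 1", 0.9),
                Node("Chapter 2", 0.2),
            ]),
        ]),
        Node("Section II", 0.1, [
            Node("Subdivision 1", 0.5, [Node("Chapter 3", 0.9)]),
        ]),
    ]),
    Node("Code B", 0.2, [
        Node("Section I", 0.9, [
            Node("Subdivision 1", 0.9, [Node("Chapter 1", 0.9)]),
        ]),
    ]),
]

ranked = rank_chapters(tree)
for path, value in ranked:
    print(f"{value:.4f}  {path}")
```

Because the scores are multiplied, a chapter that matches the claim well can still rank low when the code or section above it matches poorly, as with "Chapter 3" in this toy tree.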

Constructing a decision tree based on the TF-IDF algorithm to find relevant chapters revealed two factors that limit model quality: low efficiency on large volumes of text and no consideration of context. The analysis of similarity between articles and claims showed that the relevant articles fell in the first half of the list sorted in descending order of the metric.

It was established that using a decision tree based on the TF-IDF algorithm effectively filters out the most irrelevant articles and chapters. In other words, this method can eliminate approximately half of the chapters, and within each relevant chapter, it can also discard about half of the articles based on their degree of correspondence.
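The two-stage filtering described above might look like the following sketch. The cutoff at exactly half (rounded up) is an assumption taken from the "approximately half" finding, and the function names, chapter and article labels, and scores are hypothetical values chosen only to show the mechanics.

```python
import math


def keep_top_half(scored):
    """Keep the better-scoring half (rounded up) of (name, score) pairs."""
    ordered = sorted(scored, key=lambda item: item[1], reverse=True)
    return ordered[:math.ceil(len(ordered) / 2)]


def shortlist(chapter_scores, article_scores):
    """Prune chapters first, then prune articles inside each kept chapter."""
    result = {}
    for chapter, _ in keep_top_half(list(chapter_scores.items())):
        articles = list(article_scores.get(chapter, {}).items())
        result[chapter] = [name for name, _ in keep_top_half(articles)]
    return result


# Hypothetical correspondence scores between one claim and the code text.
chapter_scores = {
    "Chapter A": 0.30,
    "Chapter B": 0.05,
    "Chapter C": 0.20,
    "Chapter D": 0.01,
}
article_scores = {
    "Chapter A": {"Art. 1": 0.40, "Art. 2": 0.10, "Art. 3": 0.30, "Art. 4": 0.05},
    "Chapter B": {"Art. 7": 0.90},
    "Chapter C": {"Art. 5": 0.20, "Art. 6": 0.60},
}

candidates = shortlist(chapter_scores, article_scores)
```

With these toy numbers, only two of the four chapters survive, and within them three of the six articles, which leaves a much shorter list for a lawyer to review by hand.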

The researchers are confident that the new method has potential for further development. They plan to conduct additional studies and adapt the methodology for broader application in various contexts, which will open new horizons for effective text analysis in the field of law.

This material was prepared based on the article “Application of a Decision Tree Based on the TF-IDF Method for Natural Language Analysis in Civil Law Tasks,” published in the collection of works “Technologies of the Information Society” (XVIII International Industry Scientific and Technical Conference).