Lucas Broszies
This study evaluated 10 summaries derived from news articles found on the BBC website. First, each summary's retention ratio was calculated by dividing its word count by that of the corresponding original text. Next, the summaries were assessed using the ROUGE-1 metric, which comprises the ROUGE-1 precision score, the ROUGE-1 recall score, and the ROUGE-1 F1 score, the harmonic mean of the other two. These were calculated by tokenising the summaries and the original texts and quantifying the unigram overlap between them. Specifically, the ROUGE-1 precision score measures the number of words in the summary that are relevant (i.e. also present in the original text) relative to the summary's total length, whereas the ROUGE-1 recall score measures the proportion of words in the original text that also appear in the summary. The ROUGE-1 F1 score is obtained by dividing twice the product of the precision and recall scores by their sum. After this, the cosine similarity between each summary and its corresponding text was calculated by, again, tokenising the texts and then representing each as a vector in the multidimensional space that arises from treating each token as a dimension. The dot product of the two vectors was then divided by the product of their magnitudes. This yielded a decimal value representing the overall semantic similarity of the generated summary and the article. Overall, we concluded that ChatGPT's summary quality declines with longer texts, which showed higher precision but lower summary retention, cosine similarity, and other ROUGE-1 scores, while shorter texts performed better overall. This shows the impact that input text length has on the result.
This suggests that the model used in the study may need further refinement for handling longer documents, particularly in terms of retaining key information and maintaining semantic similarity.
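The three measures described above can be sketched in a few lines of Python. This is a minimal illustration only: it assumes a simple lowercase regular-expression tokeniser and raw term-count vectors, neither of which is specified by the study, and the function names (`tokenise`, `retention_ratio`, `rouge1`, `cosine_similarity`) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenise(text):
    # Assumption: lowercase alphanumeric word tokens; the study's
    # actual tokeniser is not specified.
    return re.findall(r"[a-z0-9']+", text.lower())


def retention_ratio(summary, article):
    # Word count of the summary divided by that of the original text.
    article_len = len(tokenise(article))
    return len(tokenise(summary)) / article_len if article_len else 0.0


def rouge1(summary, article):
    # Unigram counts for both texts.
    s, a = Counter(tokenise(summary)), Counter(tokenise(article))
    # Clipped overlap: each token counts at most as often as it
    # appears in both texts.
    overlap = sum((s & a).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(s.values())  # relative to summary length
    recall = overlap / sum(a.values())     # relative to original length
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def cosine_similarity(summary, article):
    # Each distinct token is one dimension; values are raw counts.
    s, a = Counter(tokenise(summary)), Counter(tokenise(article))
    dot = sum(count * a[token] for token, count in s.items())
    norm_s = math.sqrt(sum(c * c for c in s.values()))
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    return dot / (norm_s * norm_a) if norm_s and norm_a else 0.0


if __name__ == "__main__":
    # Toy inputs for illustration, not data from the study.
    article = "The cat sat on the mat"
    summary = "The cat sat"
    print(retention_ratio(summary, article))
    print(rouge1(summary, article))
    print(cosine_similarity(summary, article))
```

On the toy pair above, every summary word occurs in the original, so precision is 1.0 while recall is only 0.5. This illustrates how a short summary can score high on precision and low on recall at the same time.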
ChatGPT, Text Summarisation, ROUGE-1, Cosine Similarity, Large Language Models, Natural Language Processing, AI Evaluation, Summary Quality, Semantic Similarity, Information Overload, Text Retention, Automated Summarisation.
Lucas Broszies, Independent Researcher, Schiller-Gymnasium Berlin, Germany.
Broszies, L. (2026). Summarised: An Analysis and Evaluation of ChatGPT as a Text-Summarisation Tool in an Era of Boundless Information. J Digi Assets Monetary Res. 1(1), 01-15.