Open-source artificial intelligence

[1] These attributes extend to each of the system's components, including datasets, code, and model parameters, promoting a collaborative and transparent approach to AI development.

[3] Only the owning company or organization can modify or distribute a closed-source artificial intelligence system, prioritizing control and protection of intellectual property over external contributions and transparency.

[19] Open-source AI has evolved significantly over the past few decades, with contributions from various academic institutions, research labs, tech companies, and independent developers.

[28] Scikit-learn became one of the most widely used libraries for machine learning due to its ease of use and robust functionality, providing implementations of common algorithms like regression, classification, and clustering.

[32][33] These frameworks allowed researchers and developers to build and train sophisticated neural networks for tasks like image recognition, natural language processing (NLP), and autonomous driving.

[34][35] During this time, AI models like Google's BERT (2018) for natural language processing and OpenAI's GPT series (2018–present) for text generation also became widely available in open-source form.

Companies and research organizations began to release large-scale pre-trained models to the public, which led to a boom in both commercial and academic applications of AI.

Notably, Hugging Face, a company focused on NLP, became a hub for the development and distribution of state-of-the-art AI models, including open-source versions of transformers like GPT-2 and BERT.

[44] As of October 2024, the foundation comprised 77 member companies from North America, Europe, and Asia, and hosted 67 open-source software (OSS) projects contributed by a diverse array of organizations, including Silicon Valley giants such as Nvidia, Amazon, Intel, and Microsoft.

[47] Upon its inception, the foundation formed a governing board comprising representatives from its initial members: AMD, Amazon Web Services, Google Cloud, Hugging Face, IBM, Intel, Meta, Microsoft, and NVIDIA.

[53][54] As a result, frameworks for responsible AI development and the creation of guidelines for documenting ethical considerations, such as the Model Card concept introduced by Google, have gained popularity, though studies show the continued need for their adoption to avoid unintended negative outcomes.

[59] TensorFlow, initially developed by Google, supports large-scale ML models, especially in production environments requiring scalability, such as healthcare, finance, and retail.

[60] PyTorch, favored for its flexibility and ease of use, has been particularly popular in research and academia, supporting everything from basic ML models to advanced deep learning applications, and is now also widely used in industry.

[61] Open-source AI has played a crucial role in the development and adoption of large language models (LLMs), transforming text generation and comprehension capabilities.

[62] These open-source LLMs have democratized access to advanced language technologies, enabling developers to create applications such as personalized assistants, legal document analysis, and educational tools without relying on proprietary systems.

Hugging Face's MarianMT is a prominent example: it supports a wide range of language pairs, making it a valuable tool for translation and global communication.

[68] OpenCV provides a comprehensive set of functions that can support real-time computer vision applications, such as image recognition, motion tracking, and facial detection.

[68][69] The library includes a range of pre-trained models and utilities for handling common tasks, making OpenCV a valuable resource for both beginners and experts in the field.

Beyond OpenCV, other open-source computer vision models like YOLO (You Only Look Once) and Detectron2 offer specialized frameworks for object detection, classification, and segmentation, contributing to advancements in applications like security, autonomous vehicles, and medical imaging.

[72] This shift from convolutional operations to attention mechanisms enables ViT models to achieve state-of-the-art accuracy in image classification and other tasks, pushing the boundaries of computer vision applications.
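The attention mechanism referred to here can be illustrated with a minimal sketch. The example below is not the ViT implementation; it assumes a toy image already split into four patches, each embedded as a three-dimensional vector, and it uses the patch vectors directly as queries, keys, and values (a real model applies learned projections first). The names `softmax`, `self_attention`, and `patches` are illustrative only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(patches):
    """Scaled dot-product self-attention over a list of patch vectors.

    Assumption for illustration: queries, keys, and values are the patch
    embeddings themselves, with no learned projection matrices.
    """
    dim = len(patches[0])
    outputs = []
    for query in patches:
        # Similarity of this patch to every patch, scaled by sqrt(dim).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in patches
        ]
        weights = softmax(scores)
        # Each output is a weighted average of all patch vectors.
        outputs.append(
            [sum(w * p[i] for w, p in zip(weights, patches)) for i in range(dim)]
        )
    return outputs


# Toy "image": four patches, each already embedded as a 3-dimensional vector.
patches = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
mixed = self_attention(patches)
```

Unlike a convolution, which combines only neighboring pixels, every output vector here draws on all patches at once, weighted by similarity, so distant regions of an image can influence one another in a single step.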

[77] Open-source libraries like TensorFlow and PyTorch have been applied extensively in medical imaging for tasks such as tumor detection, improving the speed and accuracy of diagnostic processes.

[78][77] Additionally, OpenChem, an open-source library specifically geared toward chemistry and biology applications, enables the development of predictive models for drug discovery, helping researchers identify potential compounds for treatment.

[51] Chinese researchers used an earlier version of Llama to develop tools like ChatBIT, optimized for military intelligence and decision-making, prompting Meta to expand its partnerships with U.S. contractors to ensure the technology could be used strategically for national security.

By making AI tools freely available, open-source platforms empower individuals, research institutions, and companies to contribute, adapt, and innovate on top of existing technologies.

[31][24] Beyond enhancements directly within ML and deep learning, this collaboration can lead to faster advancements in the products of AI, as shared knowledge and expertise are pooled together.

[85] With contributions from a broad spectrum of perspectives, open-source AI has the potential to create fairer, more accountable, and more impactful technologies that better serve global communities.

These frameworks, often products of independent studies and interdisciplinary collaborations, are frequently adapted and shared across platforms like GitHub and Hugging Face to encourage community-driven enhancements.

[93] These issues are compounded by AI documentation practices, which often lack actionable guidance and only briefly outline ethical risks without providing concrete solutions.

[91] This lack of interpretability can hinder accountability, making it difficult to identify why a model made a particular decision or to ensure it operates fairly across diverse groups.

A video about the importance of transparency of AI in medicine