Artificial Intelligence Glossarium: 1000 terms

Bidirectional language model (Двунаправленная языковая модель) – A language model that determines the probability that a given token is present at a given location in an excerpt of text based on the preceding and following text.

Big data (Большие данные) is a term for sets of digital data whose large size, rate of increase or complexity requires significant computing power for processing and special software tools for analysis and presentation in the form of human-perceptible results.

Big O notation (Запись Big O notation) – A mathematical notation that describes the limiting behavior of a function when the argument tends towards a particular value or infinity. It is a member of a family of notations invented by Paul Bachmann, Edmund Landau, and others, collectively called Bachmann – Landau notation or asymptotic notation [[81 - Big O notation [Электронный ресурс] // upread.ru URL: https://upread.ru/art.php?id=659 (дата обращения: 04.02.2022)]].
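
For the case where the argument tends to infinity, the usual formal statement can be sketched as follows. This is the standard textbook formulation rather than a quotation from the cited source, and the symbols M and x_0 are introduced here only for illustration.

```latex
f(x) = O\bigl(g(x)\bigr) \ \text{as } x \to \infty
\iff
\exists\, M > 0,\ \exists\, x_0 \ \text{such that}\
|f(x)| \le M\,|g(x)| \ \text{for all } x \ge x_0 .
```

For example, 3x^2 + 5x = O(x^2): for x >= 5 we have 5x <= x^2, so 3x^2 + 5x <= 4x^2, and the definition holds with M = 4 and x_0 = 5.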

Bigram (Биграмм) – An N-gram in which N=2.
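
A minimal sketch of extracting bigrams from a token sequence, assuming the text has already been split into tokens; the function name `bigrams` is hypothetical.

```python
def bigrams(tokens):
    """Return the list of adjacent token pairs (N-grams with N=2)."""
    return list(zip(tokens, tokens[1:]))


words = "the quick brown fox".split()
print(bigrams(words))
# [('the', 'quick'), ('quick', 'brown'), ('brown', 'fox')]
```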

Binary choice regression model (Регрессионная модель бинарного выбора) is a regression model in which the dependent variable is dichotomous or binary. The dependent variable can take only two values, which may indicate, for example, whether an observation belongs to a particular group.

Binary classification (Двоичная, бинарная или дихотомическая классификация) — A type of classification task that outputs one of two mutually exclusive classes. For example, a machine learning model that evaluates email messages and outputs either “spam” or “not spam” is a binary classifier.

Binary format (Двоичный формат) – Any file format in which information is encoded in some format other than a standard character-encoding scheme. A file written in binary format contains information that is not displayable as characters. Software capable of understanding the particular binary format method of encoding information must be used to interpret the information in a binary-formatted file. Binary formats are often used to store more information in less space than is possible in a character-format file. They can also be searched and analyzed more quickly by appropriate software. A file written in binary format could store the number “7” as a binary number (instead of as a character) in as little as 3 bits (i.e., 111), but would more typically use 4 bits (i.e., 0111). Binary formats are not normally portable, however. Software program files are written in binary format. Examples of numeric data files distributed in binary format include the IBM-binary versions of the Center for Research in Security Prices files and the U.S. Department of Commerce’s National Trade Data Bank on CD-ROM. The International Monetary Fund distributes International Financial Statistics in a mixed-character format and binary (packed-decimal) format. SAS and SPSS store their system files in binary format. [[82 - Binary format [Электронный ресурс] www.umich.edu URL: https://www.icpsr.umich.edu/web/ICPSR/cms/2042#B (https://www.icpsr.umich.edu/web/ICPSR/cms/2042#B) (дата обращения: 07.07.2022)]]

Binary number (Двоичное число) – A number written in binary notation, which uses only zeros and ones. Example: the decimal number 7 is written as 111 in binary notation. [[83 - Binary number [Электронный ресурс] www.umich.edu URL: https://www.icpsr.umich.edu/web/ICPSR/cms/2042#B (https://www.icpsr.umich.edu/web/ICPSR/cms/2042#B) (дата обращения: 07.07.2022)]]
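
The conversion in the example can be sketched with repeated division by 2. The helper name `to_binary` is hypothetical; Python's built-in `int(text, 2)` is used for the reverse direction.

```python
def to_binary(n):
    """Convert a non-negative integer to its string of binary digits."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % 2))  # the remainder is the next lowest bit
        n //= 2
    return "".join(reversed(digits))


print(to_binary(7))   # 111
print(int("111", 2))  # 7
```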

Binary tree (Бинарное дерево) – A tree data structure in which each node has at most two children, which are referred to as the left child and the right child. A recursive definition using just set theory notions is that a (non-empty) binary tree is a tuple (L, S, R), where L and R are binary trees or the empty set and S is a singleton set. Some authors allow the binary tree to be the empty set as well. [[84 - Binary tree [Электронный ресурс] // habr.com URL: https://habr.com/ru/post/267855/ (https://habr.com/ru/post/267855/) (дата обращения: 31.01.2022)]]
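
One possible way to represent such a tree in code is shown below, where `None` plays the role of the empty tree. The class name `Node` and the helpers `height` and `in_order` are illustrative choices, not part of the definition.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """A binary tree node: a value plus an optional left and right child."""
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def height(node):
    """Number of nodes on the longest root-to-leaf path; 0 for an empty tree."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))


def in_order(node):
    """Visit the left subtree, then the node itself, then the right subtree."""
    if node is None:
        return []
    return in_order(node.left) + [node.value] + in_order(node.right)


tree = Node(2, left=Node(1), right=Node(3))
print(height(tree))    # 2
print(in_order(tree))  # [1, 2, 3]
```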

Binning (Биннинг) is the process of combining charge from neighboring pixels in a CCD during readout. This process is performed prior to digitization in the CCD chip using dedicated serial and parallel register control. The two main benefits of binning are improved signal-to-noise ratio (SNR) and the ability to increase frame rates, albeit at the cost of reduced spatial resolution.
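
On a real CCD the charge is combined on the chip before digitization, so the following is only a numerical illustration of the arithmetic, applied to an image that has already been read out. It assumes 2×2 binning on a list-of-lists image; the function name `bin_2x2` is hypothetical.

```python
def bin_2x2(image):
    """Sum each 2x2 block of pixel values into one output pixel.

    A trailing odd row or column, if any, is dropped.
    """
    rows, cols = len(image), len(image[0])
    return [
        [
            image[r][c] + image[r][c + 1] + image[r + 1][c] + image[r + 1][c + 1]
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
print(bin_2x2(image))  # [[14, 22], [46, 54]]
```

The 4×4 input becomes a 2×2 output: each output pixel carries four times the signal, at half the spatial resolution in each direction.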

Bioconservatism (Биоконсерватизм) (a portmanteau of biology and conservatism) is a stance of hesitancy and skepticism regarding radical technological advances, especially those that seek to modify or enhance the human condition. Bioconservatism is characterized by a belief that technological trends in today’s society risk compromising human dignity, and by opposition to movements and technologies including transhumanism, human genetic modification, “strong” artificial intelligence, and the technological singularity. Many bioconservatives also oppose the use of technologies such as life extension and preimplantation genetic screening [[85 - Bioconservatism [Электронный ресурс] //en.wikipedia.org URL: https://en.wikipedia.org/wiki/Bioconservatism (https://en.wikipedia.org/wiki/Bioconservatism) (дата обращения: 07.07.2022)],[86 - .Bioconservatism [Электронный ресурс] www.wise-geek.com URL: https://www.wise-geek.com/what-is-bioconservatism.htm (https://www.wise-geek.com/what-is-bioconservatism.htm) (дата обращения: 07.07.2022)]].

Biometrics (Биометрия) is a system for recognizing people based on one or more physical or behavioral traits.

Black box (Чёрный ящик) – A description of some deep learning systems: they take an input and provide an output, but the calculations that occur in between are not easy for humans to interpret.

Blackboard system (Системы, использующие принцип классной доски) – An artificial intelligence approach based on the blackboard architectural model, where a common knowledge base, the “blackboard”, is iteratively updated by a diverse group of specialist knowledge sources, starting with a problem specification and ending with a solution. Each knowledge source updates the blackboard with a partial solution when its internal constraints match the blackboard state. In this way, the specialists work together to solve the problem.

BLEU (Bilingual Evaluation Understudy) (Алгоритм BLEU) – A score between 0.0 and 1.0, inclusive, indicating the quality of a translation between two human languages (for example, between English and Russian). A BLEU score of 1.0 indicates a perfect translation; a BLEU score of 0.0 indicates a terrible translation.

Blockchain (Блокчейн) is a set of algorithms and protocols for the decentralized storage and processing of transactions, structured as a sequence of linked blocks that cannot subsequently be changed.

Boltzmann machine (Also stochastic Hopfield network with hidden units) (Машина Больцмана) – A type of stochastic recurrent neural network and Markov random field. Boltzmann machines can be seen as the stochastic, generative counterpart of Hopfield networks [[87 - Boltzmann machine [Электронный ресурс] // dic.academic.ru URL: https://dic.academic.ru/dic.nsf/ruwiki/1828062 (дата обращения: 04.02.2022)]].

Boolean neural network (Булевая нейронная сеть) – An artificial neural network approach that consists only of Boolean neurons (AND, OR, NOT). Such an approach reduces memory use and computation time. It can be implemented on programmable circuits such as an FPGA (field-programmable gate array), a type of integrated circuit.

Boolean satisfiability problem (Also propositional satisfiability problem; abbreviated SATISFIABILITY or SAT) (Проблема булевой выполнимости) – The problem of determining whether there exists an interpretation that satisfies a given Boolean formula. In other words, it asks whether the variables of a given Boolean formula can be consistently replaced by the values TRUE or FALSE in such a way that the formula evaluates to TRUE. If this is the case, the formula is called satisfiable. On the other hand, if no such assignment exists, the function expressed by the formula is FALSE for all possible variable assignments and the formula is unsatisfiable [[88 - Boolean satisfiability problem A. de Carvalho M.C. Fairhurst D.L. Bisset, An integrated Boolean neural network for pattern classification. Pattern Recognition Letters Volume 15, Issue 8, August 1994, Pages 807—813 (дата обращения: 10.02.2022)]].
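
The definition can be illustrated with a naive checker that tries every TRUE/FALSE assignment. This is a sketch for small formulas only, since the number of assignments doubles with each variable; practical SAT solvers use far more sophisticated methods. The function name `find_satisfying_assignment` and the representation of a formula as a Python callable are assumptions made for this example.

```python
from itertools import product


def find_satisfying_assignment(formula, variables):
    """Return an assignment that makes the formula TRUE, or None if there is none."""
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if formula(assignment):
            return assignment
    return None


# (a OR b) AND (NOT a OR NOT b): satisfiable
xor_like = lambda v: (v["a"] or v["b"]) and (not v["a"] or not v["b"])
# a AND NOT a: unsatisfiable
contradiction = lambda v: v["a"] and not v["a"]

print(find_satisfying_assignment(xor_like, ["a", "b"]))   # {'a': False, 'b': True}
print(find_satisfying_assignment(contradiction, ["a"]))   # None
```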

Boosting (Бустинг) – A machine learning ensemble meta-algorithm used primarily to reduce bias, and also variance, in supervised learning, and a family of machine learning algorithms that convert weak learners into strong ones.

Bounding Box (Ограничивающая рамка) – Commonly used in image or video tagging, this is an imaginary box drawn on visual information. The contents of the box are labeled to help a model recognize it as a distinct type of object.

Brain technology (Also self-learning know-how system) (Мозговая технология) – A technology that employs the latest findings in neuroscience. The term was first introduced by the Artificial Intelligence Laboratory in Zurich, Switzerland, in the context of the ROBOY project. Brain Technology can be employed in robots, know-how management systems and any other application with self-learning capabilities. In particular, Brain Technology applications allow the visualization of the underlying learning architecture often coined as “know-how maps”.

Brain – computer interface (BCI, Интерфейс мозг-компьютер), sometimes called a brain – machine interface (BMI), is a direct communication pathway between the brain’s electrical activity and an external device, most commonly a computer or robotic limb. Research on brain – computer interfaces was begun in the 1970s by Jacques Vidal at the University of California, Los Angeles (UCLA) under a grant from the National Science Foundation, followed by a contract from DARPA. Vidal’s 1973 paper marks the first appearance of the expression brain – computer interface in scientific literature [[89 - Brain – computer interface [Электронный ресурс] //en.wikipedia.org URL: https://en.wikipedia.org/wiki/Brain%E2%80%93computer_interface (https://en.wikipedia.org/wiki/Brain%E2%80%93computer_interface) (дата обращения: 07.07.2022)]].

Brain-inspired computing (Мозгоподобные вычисления) – Computation performed on brain-like structures, or computation that uses the operating principles of the brain (see also neurocomputing, neuromorphic engineering).

Branching factor (коэффициент ветвления дерева) – In computing, tree data structures, and game theory, the number of children at each node, the outdegree. If this value is not uniform, an average branching factor can be calculated.
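
A minimal sketch of computing an average branching factor, assuming the tree is given as a dictionary that maps each node to its list of children. Averaging over only the nodes that have children is one common convention and is an assumption here; averaging over all nodes gives a different number.

```python
def average_branching_factor(tree):
    """Average number of children over the nodes that have at least one child."""
    child_counts = [len(children) for children in tree.values() if children]
    if not child_counts:
        return 0.0
    return sum(child_counts) / len(child_counts)


# A has 3 children, B has 1, and C, D, E are leaves.
tree = {"A": ["B", "C", "D"], "B": ["E"], "C": [], "D": [], "E": []}
print(average_branching_factor(tree))  # 2.0
```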

Broadband (Широкополосный доступ) refers to various high-capacity transmission technologies that transmit data, voice, and video across long distances and at high speeds. Common mediums of transmission include coaxial cables, fiber optic cables, and radio waves. [[90 - Broadband [Электронный ресурс] www.investopedia.com URL: https://www.investopedia.com/terms/b/broadband.asp (https://www.investopedia.com/terms/b/broadband.asp) (дата обращения: 07.07.2022)]]

Brute-force search (Also exhaustive search or generate and test) (Полный перебор) – A very general problem-solving technique and algorithmic paradigm that consists of systematically enumerating all possible candidates for the solution and checking whether each candidate satisfies the problem’s statement.
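
As an illustration, the sketch below applies generate-and-test to the subset-sum problem: it enumerates every subset of the input, smallest first, and checks each one against the target. The problem choice and the function name `subset_sum_brute_force` are assumptions made for this example.

```python
from itertools import combinations


def subset_sum_brute_force(numbers, target):
    """Return the first subset (as a tuple) whose sum equals target, else None."""
    for size in range(len(numbers) + 1):
        for candidate in combinations(numbers, size):  # generate
            if sum(candidate) == target:               # test
                return candidate
    return None


print(subset_sum_brute_force([3, 34, 4, 12, 5, 2], 9))    # (4, 5)
print(subset_sum_brute_force([3, 34, 4, 12, 5, 2], 100))  # None
```

With n input numbers there are 2^n candidate subsets, which is why this approach is practical only for small inputs.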

Bucketing (Разделение на сегменты) – Converting a (usually continuous) feature into multiple binary features called buckets or bins, typically based on value range.
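
A minimal sketch of bucketing by value range, assuming sorted bucket boundaries and a one-hot output. The function name `bucketize` and the age boundaries are hypothetical.

```python
from bisect import bisect_right


def bucketize(value, boundaries):
    """Return a one-hot list with len(boundaries) + 1 buckets.

    Bucket i holds values v with boundaries[i-1] <= v < boundaries[i].
    """
    index = bisect_right(boundaries, value)
    return [1 if i == index else 0 for i in range(len(boundaries) + 1)]


age_boundaries = [18, 35, 65]  # buckets: <18, 18-34, 35-64, 65+
print(bucketize(42, age_boundaries))  # [0, 0, 1, 0]
print(bucketize(5, age_boundaries))   # [1, 0, 0, 0]
```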

Byte (Байт) – Eight bits. A byte is simply a chunk of 8 ones and zeros. For example: 01000001 is a byte. A computer often works with groups of bits rather than individual bits and the smallest group of bits that a computer usually works with is a byte. A byte is equal to one column in a file written in character format. [[91 - Byte [Электронный ресурс] www.umich.edu URL: https://www.icpsr.umich.edu/web/ICPSR/cms/2042#B (https://www.icpsr.umich.edu/web/ICPSR/cms/2042#B) (дата обращения: 07.07.2022)]]
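
The example byte can be inspected with a few built-in Python calls. Interpreting the value as an ASCII character is an assumption added here to show the link to character format.

```python
bits = "01000001"

value = int(bits, 2)         # read the eight bits as a base-2 number
print(value)                 # 65
print(chr(value))            # A  (65 is the ASCII code for "A")
print(format(value, "08b"))  # 01000001

encoded = "A".encode("ascii")
print(len(encoded))          # 1  (one character occupies one byte in ASCII)
```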

“C”

Caffe – Short for Convolutional Architecture for Fast Feature Embedding, an open-source deep learning framework developed by Berkeley AI Research. It supports many different deep learning architectures and GPU-accelerated computation kernels.

Calibration layer (Калибровочный слой) – A post-prediction adjustment, typically to account for prediction bias. The adjusted predictions and probabilities should match the distribution of an observed set of labels.

Candidate generation (Генерация кандидатов) — The initial set of recommendations chosen by a recommendation system. [[92 - Candidate generation [Электронный ресурс] // developers.google.com URL: https://developers.google.com/machine-learning/recommendation/overview/candidate-generation (дата обращения: 10.01.2022)]].

Candidate sampling (Выборка кандидатов) – A training-time optimization in which a probability is calculated for all the positive labels, using, for example, softmax, but only for a random sample of negative labels. For example, if we have an example labeled beagle and dog, candidate sampling computes the predicted probabilities and corresponding loss terms for the beagle and dog class outputs in addition to a random subset of the remaining classes (cat, lollipop, fence). The idea is that the negative classes can learn from less frequent negative reinforcement as long as positive classes always get proper positive reinforcement, and this is indeed observed empirically. The motivation for candidate sampling is a computational efficiency win from not computing predictions for all negatives.
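
The selection step alone can be sketched as follows: keep every positive class and draw a random sample of the negatives. This shows only which classes would have their outputs and loss terms computed, not the loss itself; the function name `sample_candidates` and the class list are assumptions made for this example.

```python
import random


def sample_candidates(all_classes, positive_classes, num_negatives, rng):
    """Return all positive classes plus a random sample of negative classes."""
    negatives = [c for c in all_classes if c not in positive_classes]
    sampled_negatives = rng.sample(negatives, num_negatives)
    return list(positive_classes) + sampled_negatives


classes = ["beagle", "dog", "cat", "lollipop", "fence", "car", "tree"]
positives = ["beagle", "dog"]
rng = random.Random(0)  # seeded only to make the sketch repeatable

candidates = sample_candidates(classes, positives, num_negatives=3, rng=rng)
print(candidates)  # the two positives, followed by three sampled negatives
```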

Canonical Formats (Канонические форматы) – In information technology, canonicalization is the process of making something [conform] with some specification… and is in an approved format. Canonicalization may sometimes mean generating canonical data from noncanonical data. Canonical formats are widely supported and considered to be optimal for long-term preservation. [[93 - Canonical Formats [Электронный ресурс] www.umich.edu URL: https://www.icpsr.umich.edu/web/ICPSR/cms/2042#C (https://www.icpsr.umich.edu/web/ICPSR/cms/2042#C) (дата обращения: 07.07.2022)]]

Capsule neural network (CapsNet) (Капсульная нейронная сеть) – A machine learning system that is a type of artificial neural network (ANN) that can be used to better model hierarchical relationships. [[94 - Capsule neural network [Электронный ресурс] // ru.what-this.com URL: https://ru.what-this.com/7202531/1/kapsulnaya-neyronnaya-set.html (https://ru.what-this.com/7202531/1/kapsulnaya-neyronnaya-set.html) (дата обращения: 07.02.2022)]] The approach is an attempt to more closely mimic biological neural organization [[95 - Capsule neural network [Электронный ресурс] // neurohive.io URL: https://neurohive.io/ru/osnovy-data-science/kapsulnaja-nejronnaja-set-capsnet/ (https://neurohive.io/ru/osnovy-data-science/kapsulnaja-nejronnaja-set-capsnet/) (дата обращения: 08.02.2022)]]

Case-Based Reasoning (CBR) (Рассуждения по прецедентам) – A way to solve a new problem by using solutions to similar problems. It has been formalized as a process consisting of case retrieval, solution reuse, solution revision, and case retention [[96 - Case-Based Reasoning [Электронный ресурс] www.telusinternational.com URL: https://www.telusinternational.com/articles/50-beginner-ai-terms-you-should-know (дата обращения 15.01.2022)]].

Categorical data (Категориальные данные) — Features having a discrete set of possible values. For example, consider a categorical feature named house style, which has a discrete set of three possible values: Tudor, ranch, colonial. By representing house style as categorical data, the model can learn the separate impacts of Tudor, ranch, and colonial on house price. Sometimes, values in the discrete set are mutually exclusive, and only one value can be applied to a given example. For example, a car maker categorical feature would probably permit only a single value (Toyota) per example. Other times, more than one value may be applicable. A single car could be painted more than one different color, so a car color categorical feature would likely permit a single example to have multiple values (for example, red and white). Categorical features are sometimes called discrete features. Contrast with numerical data [[97 - Categorical data [Электронный ресурс] // machinelearningmastery.ru URL: https://www.machinelearningmastery.ru/understanding-feature-engineering-part-2-categorical-data-f54324193e63/ (дата обращения: 03.03.2022)]].
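
One common way to feed categorical values to a model is to encode them as 0/1 vectors over a fixed vocabulary. The sketch below covers both cases described above, a single value per example and multiple values per example; the function name `multi_hot` and the vocabularies are assumptions made for this example.

```python
def multi_hot(values, vocabulary):
    """Encode a set of categorical values as a 0/1 vector over the vocabulary."""
    return [1 if item in values else 0 for item in vocabulary]


house_styles = ["Tudor", "ranch", "colonial"]
car_colors = ["red", "white", "blue"]

# Mutually exclusive values: exactly one position is set (one-hot).
print(multi_hot({"ranch"}, house_styles))       # [0, 1, 0]
# Several values may apply: more than one position is set (multi-hot).
print(multi_hot({"red", "white"}, car_colors))  # [1, 1, 0]
```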

Center for Technological Competence (Центр технологических компетенций) is an organization that owns research results, tools for conducting fundamental research, and platform solutions that market participants can use as a basis for creating applied solutions (products). A Center for Technological Competence can be a separate organization or part of an applied-technology holding company.

Central Processing Unit (CPU) (Центральный процессор) is a von Neumann cyclic processor designed to execute complex computer programs.

Centralized control (Централизованное управление) is a process in which control signals are generated in a single control center and transmitted from it to numerous control objects.

Centroid (Центроид) – The center of a cluster as determined by a k-means or k-median algorithm. For instance, if k is 3, then the k-means or k-median algorithm finds 3 centroids.

Centroid-based clustering (Кластеризация на основе центроида) – A category of clustering algorithms that organizes data into nonhierarchical clusters. k-means is the most widely used centroid-based clustering algorithm. Contrast with hierarchical clustering algorithms.
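
A minimal sketch of k-means with k = 2, written in plain Python to show how centroids arise: each point is assigned to its nearest centroid, then each centroid is moved to the mean of its assigned points. The function names, the fixed iteration count, and the hand-picked starting centroids are assumptions; library implementations add smarter initialization and convergence checks.

```python
def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, centroids, iterations=10):
    """Alternate the assignment and update steps; return the final centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: squared_distance(p, centroids[i]),
            )
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            mean_point(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids


points = [(0, 0), (0, 2), (10, 10), (10, 12)]
print(k_means(points, centroids=[(0, 0), (10, 10)]))
# [(0.0, 1.0), (10.0, 11.0)]
```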

Character format (Формат символов) – Any file format in which information is encoded as characters using only a standard character-encoding scheme. A file written in “character format” contains only those bytes that are prescribed in the encoding scheme as corresponding to the characters in the scheme (e.g., alphabetic and numeric characters, punctuation marks, and spaces). [[98 - Character format [Электронный ресурс] www.umich.edu URL: https://www.icpsr.umich.edu/web/ICPSR/cms/2042#C (https://www.icpsr.umich.edu/web/ICPSR/cms/2042#C) (дата обращения: 07.07.2022)]]

Chatbot (Чат-бот) is a software application designed to simulate human conversation with users via text or speech. Also referred to as virtual agents, interactive agents, digital assistants, or conversational AI, chatbots are often integrated into applications, websites, or messaging platforms to provide support to users without the use of live human agents. Chatbots originally started out by offering users simple menus of choices, and then evolved to react to particular keywords. “But humans are very inventive in their use of language,” says Forrester’s McKeon-White. Someone looking for a password reset might say they’ve forgotten their access code, or are having problems getting into their account. “There are a lot of different ways to say the same thing,” he says. This is where AI comes in. Natural language processing is a subset of machine learning that enables a system to understand the meaning of written or even spoken language, even where there is a lot of variation in the phrasing. To succeed, a chatbot that relies on AI or machine learning needs first to be trained using a data set. In general, the bigger the training data set, and the narrower the domain, the more accurate and helpful a chatbot will be [[99 - Сhatbot [Электронный ресурс] www.cio.com URL: https://www.cio.com/article/189347/what-is-a-chatbot-simulating-human-conversation-for-service.html (https://www.cio.com/article/189347/what-is-a-chatbot-simulating-human-conversation-for-service.html) (дата обращения: 07.07.2022)]].

Checkpoint (Контрольная точка) — Data that captures the state of the variables of a model at a particular time. Checkpoints enable exporting model weights, as well as performing training across multiple sessions. Checkpoints also enable training to continue past errors (for example, job preemption). Note that the graph itself is not included in a checkpoint.

Chip (Чип) – An electronic microcircuit of arbitrary complexity, made on a semiconductor substrate and either placed in a non-separable package or left unpackaged when it is part of a microassembly.

Class (Класс) — One of a set of enumerated target values for a label. For example, in a binary classification model that detects spam, the two classes are spam and not spam. In a multi-class classification model that identifies dog breeds, the classes would be poodle, beagle, pug, and so on.

Classification (Классификация) – A supervised learning task in which an algorithm assigns test data to specific categories, such as separating apples from oranges. In practice, classification algorithms can be used, for example, to sort spam into a folder separate from your inbox. Linear classifiers, support vector machines, decision trees, and random forests are all common types of classification algorithms.