Commonly used terms in Data and Analytics

Posted adelaide

General terms

Analytics as a Service (AaaS) The provision of analytics through Web-delivered technologies. These solutions offer businesses an alternative to developing internal hardware setups to perform business analytics.

Artificial Intelligence (AI) The theory and development of computer systems able to perform tasks normally requiring human intelligence, such as visual perception and speech recognition.

Big data Large data sets which cannot be analysed using standard analytical techniques. We normally evaluate big data across four dimensions: volume, variety, velocity and veracity.

Business intelligence (BI) The set of techniques and tools for the transformation of raw data into meaningful and useful information for business analysis purposes.

Cloud computing A model for delivering information technology services in which resources are retrieved from the internet through Web-based tools and applications, rather than a direct connection to a server.

Data Analytics Data analytics is the collecting, organising and examining of large volumes of data with the aim of discovering useful insights, suggesting conclusions, and supporting decision-making.

Data as a Service (DaaS) Data that can be provided on demand to the user, regardless of geographic or organisational separation of provider and consumer.

Database A collection of information that is organised so that it can be easily accessed, managed, and updated.

Internet of Things The concept of connecting any device, or component of a device, to the internet (and/or to each other).

Visualisation A visual abstraction of data designed for the purpose of deriving meaning or communicating information more effectively.

Mergers & Acquisitions (M&A) A general term that refers to the consolidation of companies or assets.

Data types

Byte A unit of digital information that most commonly consists of eight bits.

  • 1 gigabyte = 1024 megabytes
  • 1 terabyte = 1024 gigabytes
  • 1 petabyte = 1024 terabytes
  • 1 exabyte = 1024 petabytes
  • 1 zettabyte = 1024 exabytes
  • 1 yottabyte = 1024 zettabytes
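The multiples above can be turned into a small conversion helper. This is a minimal sketch that assumes the 1024-based (binary) convention used in the list; the names `UNITS`, `to_bytes` and `convert` are illustrative only.

```python
# Units in ascending order; each step is a factor of 1024 (binary convention,
# as in the list above).
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte", "yottabyte"]


def to_bytes(value, unit):
    """Convert a value expressed in `unit` to bytes."""
    return value * 1024 ** UNITS.index(unit)


def convert(value, from_unit, to_unit):
    """Convert a value from one unit to another."""
    return to_bytes(value, from_unit) / 1024 ** UNITS.index(to_unit)


print(to_bytes(1, "gigabyte"))             # 1073741824
print(convert(1, "terabyte", "gigabyte"))  # 1024.0
```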

Open-source data Data that is freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control.

Proprietary data Data that is owned by an individual or organisation, which is deemed important enough so that it gives competitive advantage to that individual or organisation. This data can be protected under copyright laws or patents.

Semi-structured data Data that is not structured by a formal data model, but provides other means of describing the data and hierarchies.

Structured data Refers to data that is identifiable as it is organised in structure-like rows and columns. The data resides in fixed fields within a record or file or the data is tagged correctly and can be accurately identified.

Unstructured data Refers to information that does not have a predefined data model or is not organised in a predefined manner. Examples include emails, SMS, video, audio, PDFs and social media.

Data management concepts

Data governance The practice of organising and implementing policies, procedures and standards for the effective use of an organisation’s data.

Data integration Data integration involves combining data residing in different sources and providing users with a unified view of this data.

Data quality Represents the reliability and effectiveness of data to serve its purpose in a given context.

Data warehouse A central repository of integrated data, from one or more disparate sources, which stores current and historical data.

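Extract, transform, load (ETL) A process used in data warehousing to prepare data for use in reporting or analytics: data is extracted from source systems, transformed into a consistent format, and loaded into the target repository.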
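As a rough illustration of ETL, the sketch below reads a small CSV export, cleans it, and writes it to an in-memory SQLite table. The sample data, the `orders` table and the cleaning rules are all assumptions made for the example, not part of any particular warehouse product.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this would come from a source system.
RAW_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,n/a
3,Carol,7.25
"""


def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: tidy names, convert types, drop rows that fail validation."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        name = row["customer"].strip().title()
        clean.append((int(row["order_id"]), name, amount))
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute(
    "SELECT order_id, customer, amount FROM orders ORDER BY order_id"
).fetchall()
print(loaded)  # [(1, 'Alice', 10.5), (3, 'Carol', 7.25)]
```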

In-memory Data that is loaded into memory (random access memory (RAM) or flash memory) instead of being read from hard discs. This speeds up access, so IT resources spend less development time on data modelling, query analysis, cube building and table design.

Online analytical processing (OLAP) OLAP tools enable users to analyse multidimensional data interactively from multiple perspectives. OLAP consists of three basic analytical operations: consolidation (roll-up), drill-down, and slicing and dicing.
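The three OLAP operations can be sketched on a tiny in-memory data set. The fact rows, the dimension names and the helper functions `aggregate` and `select` are hypothetical; real OLAP tools work on pre-built cubes rather than Python lists.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, region, product, sales)
FACTS = [
    (2023, "North", "Widget", 100),
    (2023, "North", "Gadget", 50),
    (2023, "South", "Widget", 70),
    (2024, "North", "Widget", 120),
    (2024, "South", "Gadget", 90),
]
DIMS = {"year": 0, "region": 1, "product": 2}


def aggregate(rows, by):
    """Sum sales grouped by the named dimensions."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[DIMS[d]] for d in by)
        totals[key] += row[3]
    return dict(totals)


def select(rows, **fixed):
    """Keep rows matching the fixed dimension values.

    Fixing one dimension is a slice; fixing several is a dice.
    """
    return [r for r in rows if all(r[DIMS[d]] == v for d, v in fixed.items())]


by_year = aggregate(FACTS, ["year"])                    # consolidation
by_year_region = aggregate(FACTS, ["year", "region"])   # drill-down
north_2023 = select(FACTS, year=2023, region="North")   # dice

print(by_year)          # {(2023,): 220, (2024,): 210}
print(len(north_2023))  # 2
```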

Online transaction processing (OLTP) Refers to a class of information systems that facilitate and manage transaction-oriented applications, typically for data entry and retrieval transaction processing.

Structured query language (SQL) A programming language for managing data held in a relational database management system.
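To give a feel for SQL, the sketch below runs a few statements against an in-memory SQLite database through Python's built-in `sqlite3` module. The `customers` table and its rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Define a table and insert some rows (hypothetical data).
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    INSERT INTO customers (name, city) VALUES
        ('Ada', 'Adelaide'), ('Ben', 'Sydney'), ('Cy', 'Adelaide');
""")

# Update one row, using placeholders rather than string formatting.
conn.execute("UPDATE customers SET city = ? WHERE name = ?", ("Perth", "Ben"))

# Query: number of customers per city.
rows = conn.execute(
    "SELECT city, COUNT(*) FROM customers GROUP BY city ORDER BY city"
).fetchall()
print(rows)  # [('Adelaide', 2), ('Perth', 1)]
```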

Data management technology

Hadoop An open-source framework built to enable the processing and storage of big data across a distributed file system. Essentially, it accomplishes two tasks: massive data storage and faster processing.

Microsoft SQL Server Microsoft SQL Server is a relational database management system whose primary function is to store and retrieve data as requested by other software applications.

MongoDB A document-oriented database built on an architecture of collections and documents, instead of the tables and rows used in relational databases. Documents comprise sets of key-value pairs (KVPs) and are the basic unit of data in MongoDB.

Neo4j A graph database: a type of database that uses graph structures with nodes, edges and properties to represent and store data, and supports semantic queries over them.

SAP HANA SAP HANA is an in-memory, column oriented, relational database management system, developed and marketed by SAP.

Data visualisation tools

Microsoft Power BI A collection of online services and features that enables users to find and visualise data, share discoveries and collaborate in intuitive ways.

QlikView A tool that supports the creation of visualisations that effectively organise, communicate and share analysis with clients. It offers more advanced functionality when compared with Tableau.

Tableau A visualisation tool that supports the creation of dashboards and interactive visualisations that we use to effectively organise, communicate and share analysis with clients.

TIBCO Spotfire Analytics software designed for data exploration. It enables users to discover and depict critical insights in data.

Analytical approaches

A/B testing An experiment whereby two versions (A and B) are compared. They are identical except for one variation that might affect a user’s behaviour. Version A might be the currently used version (control), while Version B is modified in some respect (treatment).
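One common way to judge whether Version B really differs from Version A is a two-proportion z-test on conversion rates. The sketch below is one possible approach, not the only one; the visitor and conversion counts are invented for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: 2000 visitors per version.
z, p_value = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=250, n_b=2000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference between control and treatment is unlikely to be due to chance alone; the threshold (often 0.05) should be chosen before the experiment is run.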

Data discovery A business intelligence architecture which allows users to explore data for hidden patterns and trends. It focuses on dynamic, easy-to-use reports, whereas traditional business intelligence reports are static.

Descriptive analytics Summarises what happened in a given situation or scenario. Examples include number of posts, mentions, followers, page views, comments and likes.
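A minimal sketch of descriptive analytics using Python's standard `statistics` module follows. The daily page-view figures are hypothetical.

```python
import statistics

# Hypothetical page views for one week.
daily_page_views = [120, 135, 150, 110, 400, 130, 125]

summary = {
    "days": len(daily_page_views),
    "total": sum(daily_page_views),
    "mean": statistics.mean(daily_page_views),
    "median": statistics.median(daily_page_views),
    "max": max(daily_page_views),
}
print(summary)
```

The median (130) is much lower than the mean here because one unusually busy day pulls the mean upwards, which is why descriptive summaries often report both.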

Optimisation Finding the alternative with the most cost-effective or highest achievable performance under the given constraints, by maximising desired factors and minimising undesired ones.

Predictive analytics Uses statistical functions on one or more data sets to predict trends or future events.

Prescriptive analytics Recommends one or more courses of action and shows the likely outcome of each decision.

Analytical techniques

Cluster analysis The task of grouping a set of objects in such a way that objects in the same group (cluster) are more similar, in some sense or another, to each other than to those in other groups (clusters).
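There are many clustering algorithms; the sketch below uses k-means on one-dimensional data as one simple example. The data points and the starting centroids are assumptions chosen for illustration, and the function name `k_means_1d` is hypothetical.

```python
def k_means_1d(points, centroids, iterations=10):
    """Group 1-D points around centroids by repeated assign-and-update steps."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


points = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centroids, clusters = k_means_1d(points, centroids=[1.0, 2.0])
print(centroids)  # [1.5, 10.0]
print(clusters)   # [[1.0, 1.5, 2.0], [9.0, 10.0, 11.0]]
```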

Comparative analysis A step-by-step procedure of comparisons and calculations to detect patterns within very large data sets.

Decision tree analysis A decision support tool that uses a tree-like graph of decisions and their possible consequences including chance event outcomes, resource costs and utility.
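A decision tree of this kind can be evaluated by working back from the outcomes: chance nodes take the probability-weighted average of their branches, and decision nodes pick the option with the best value after costs. The tree below (a launch-or-hold choice with made-up costs, probabilities and payoffs) and the dictionary layout are purely illustrative.

```python
def expected_value(node):
    """Return (value, chosen option) for a node; option is None unless a decision."""
    kind = node["type"]
    if kind == "outcome":
        return node["value"], None
    if kind == "chance":
        # Probability-weighted average of the branch values.
        value = sum(p * expected_value(child)[0]
                    for p, child in node["branches"])
        return value, None
    if kind == "decision":
        # Value of each option net of its resource cost; pick the best.
        scored = {name: expected_value(opt["node"])[0] - opt["cost"]
                  for name, opt in node["options"].items()}
        best = max(scored, key=scored.get)
        return scored[best], best
    raise ValueError(f"unknown node type: {kind}")


# Hypothetical tree: launch a product (cost 50, uncertain payoff) or hold.
tree = {
    "type": "decision",
    "options": {
        "launch": {
            "cost": 50,
            "node": {
                "type": "chance",
                "branches": [
                    (0.5, {"type": "outcome", "value": 200}),
                    (0.5, {"type": "outcome", "value": 0}),
                ],
            },
        },
        "hold": {"cost": 0, "node": {"type": "outcome", "value": 40}},
    },
}

value, choice = expected_value(tree)
print(choice, value)  # launch 50.0
```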

Factor analysis Used to analyse large numbers of dependent variables to detect certain aspects of the independent variables (factors) affecting those dependent variables.

Machine learning A type of artificial intelligence which provides computers with the ability to learn without being explicitly programmed.

Multivariate analysis The observation and analysis of more than one statistical outcome variable at a time.

Regression analysis A statistical process for estimating relationships between a dependent variable and one or more independent variables.
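The simplest case, one independent variable fitted by ordinary least squares, can be sketched in a few lines. The data points are invented and lie exactly on a line so the result is easy to check; `fit_line` is an illustrative name.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums of squares.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [1, 2, 3, 4]   # independent variable
ys = [3, 5, 7, 9]   # dependent variable (here exactly y = 2x + 1)
slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 2.0 1.0
```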

Segmentation analysis Divides a broad category into subsets that have, or are perceived to have, common features, needs, interests or priorities.

Sentiment analysis The process of identifying and categorising opinions expressed in a piece of text to determine whether the writer’s attitude towards a topic or issue is positive, negative or neutral.
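A very rough way to illustrate sentiment analysis is to count words from small positive and negative word lists. The word lists below are tiny and hypothetical; production systems typically use much larger lexicons or trained models and handle negation, sarcasm and context.

```python
import re

# Tiny illustrative lexicons (assumptions, not a real sentiment dictionary).
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "slow"}


def sentiment(text):
    """Classify text as positive, negative or neutral by counting lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentiment("I love this great product"))           # positive
print(sentiment("Terrible support and slow delivery"))  # negative
print(sentiment("The parcel arrived on Tuesday"))       # neutral
```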

Simulation The imitation of the operation of a real-world process or system over time. It requires a model that represents the key characteristics or behaviours of the selected physical or abstract system or process.

Time series analysis Comprises methods for analysing time series data to extract meaningful statistics and other characteristics of the data.
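One basic time series method is the simple moving average, which smooths short-term fluctuations to show the underlying trend. The sketch below is one example among many techniques; the series values and window size are hypothetical.

```python
def moving_average(series, window):
    """Return the simple moving average over a trailing window."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]


monthly_sales = [10, 12, 14, 16, 18]  # hypothetical observations
print(moving_average(monthly_sales, window=3))  # [12.0, 14.0, 16.0]
```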

Analytics technology

MATLAB An abbreviation of the words matrix and laboratory. It is a computing environment which allows matrix manipulations, plotting of functions and data and implementation of algorithms.

Python An open-source general purpose programming language that can be used for everything from building web applications and enterprise programs to performing analysis on large amounts of data.

R A software environment for statistical computing and graphics. It provides a wide variety of statistical and graphical techniques, and is highly extensible.

River Logic A modelling and analytics platform that leverages diagnostic, predictive and prescriptive analytics to conduct what-if and optimisation analysis.

SAS A leader in advanced analytics and business intelligence software. It offers a range of data management and analytics solutions.

SIMUL8 A tool that allows users to create a computer simulation, which takes into account existing constraints, capacities and other factors affecting the total performance of production.

SPSS Statistical Package for Social Sciences (SPSS) is a software package used for statistical analysis.

STATA An abbreviation of the words statistics and data. It is a statistical software package and its capabilities include data management, statistical analysis and regression analysis.
