dc.description.abstract | The amount and variety of available data are growing at an accelerating pace. Data now come in many formats and from many sources, including audio, video, computer logs, text, satellite data, purchase transactions, sensors and social networking sites. This has created large, often unstructured data sets that may be available in real time. At the same time, new data science tools and techniques for analysing these data sets and maximizing the value of these newer data types, and of other data sources, are constantly being developed. Such data can no longer be managed using spreadsheets, and open-source programming languages such as R and Python are becoming widely used by national statistics offices. Large data sets require distributed processing, and sophisticated techniques, such as machine learning and natural language processing, are needed to extract value from them. National Statistics Institutes (NSIs) around the world have been working to understand the opportunities and challenges that big data and data science offer for enhancing and supplementing more traditional statistical processes and outputs. This paper outlines what data science is; examples of its use from the UK, Canada and Rwanda; common challenges; and how NSIs can get started. | en |