General
Archives in the Digital Age
Archiving has become an increasingly complex process. The challenge is no longer how to store the data but how to store it intelligently, in order to exploit it over time, while maintaining its integrity and authenticity.
Digital technologies bring about major transformations, not only in terms of the types of documents that are transferred to and stored in archives, in the behaviors and practices of the humanities and social sciences (digital humanities), but also in terms of the volume of data and the technological capacity for managing and preserving archives (Big Data). Archives in the Digital Age focuses on the impact of these various digital transformations on archives, and examines how the right to memory and the information of future generations is confronted with the right to be forgotten, a digital prerogative that guarantees individuals their private lives and freedoms.

ABDERRAZAK MKADMI holds a PhD in Information and Communication Sciences from the University of Paris 8, France, and is a Research Professor at the Higher Institute of Documentation (Manouba University, Tunisia).

Preface
Introduction
CHAPTER 1. DIGITAL ARCHIVES: ELEMENTS OF DEFINITION
1.1. Key concepts of digital archives
1.1.1. Archives
1.1.2. Archive management
1.1.3. Archival management tools
1.1.4. Digital archives
1.2. Electronic Records Management
1.2.1. ERM: elements of definition
1.2.2. ERM: implementation steps
1.3. Records management
1.3.1. Structure of standard 15489
1.3.2. Content of the standard
1.3.3. Design and implementation of an RM project according to the standard
1.3.4. MoReq: the added value of RM
1.4. EDRMS: merging ERM and RM
1.5. ECM: the overall data management strategy
1.6. Conclusion
CHAPTER 2. DIGITAL ARCHIVING: METHODS AND STRATEGIES
2.1. Introduction
2.2. Digital archiving: elements of definition
2.3. Digital archiving: the essential standards
2.3.1. NF Z 42-013/ISO 14641 standard
2.3.2. NF 461: electronic archiving system
2.3.3. OAIS (ISO 14721): Open Archival Information System
2.3.4. ISO 19005 (PDF/A)
2.3.5. ISO 30300, ISO 30301 and ISO 30302 series of standards
2.3.6. ISO 23081
2.4. Methodology for setting up a digital archiving process
2.4.1. Qualifying and classifying information
2.4.2. Classification scheme
2.4.3. Retention schedule or retention standard
2.4.4. Metadata
2.4.5. Archiving processes and procedures
2.5. Archiving of audiovisual documents
2.5.1. Definition of audiovisual archives
2.5.2. Treatment of audiovisual archives
2.5.3. Migration of audiovisual documents
2.5.4. Digital archiving of audiovisual documents
2.6. Email archiving
2.6.1. Email archiving and legislation
2.6.2. Why archive emails?
2.7. Conclusion
CHAPTER 3. ARCHIVES IN THE AGE OF DIGITAL HUMANITIES
3.1. Introduction
3.2. History of the digital humanities
3.2.1. “Literary and Linguistic Computing”: 1940–1980
3.2.2. “Humanities computing”: 1980–1994
3.2.3. “Digital humanities”: since 1994
3.3. Definitions of the digital humanities
3.4. Archives in the age of the digital humanities
3.4.1. Digital archive platforms
3.4.2. Software managing digital archives
3.4.3. Digital humanities at the heart of long-term preservation
3.4.4. Digital humanities and the liberation of the humanities: access and accessibility
3.5. Conclusion
CHAPTER 4. DIGITAL ARCHIVING AND BIG DATA
4.1. Introduction
4.2. Definition of Big Data
4.3. Big Data issues
4.4. Big Data: challenges and areas of application
4.5. Data archiving in the age of Big Data
4.5.1. Management and archiving of Big Data
4.5.2. Big Data technologies and tools
4.5.3. Blockchain, the future of digital archiving of Big Data
4.6. Conclusion
CHAPTER 5. PRESERVATION OF ARCHIVES VERSUS THE RIGHT TO BE FORGOTTEN
5.1. Introduction
5.2. Forgetting
5.3. The right to be forgotten
5.3.1. Limits to the right to be forgotten
5.3.2. European Directive on the protection of personal data
5.3.3. General Data Protection Regulation
5.3.4. The right to dereferencing: common criteria
5.4. Effectiveness of the right to be forgotten
5.4.1. Technical challenge of the effectiveness of the right to be forgotten
5.4.2. Legal challenge of the effectiveness of the right to be forgotten
5.5. The right to digital oblivion: a controversial subject
5.6. Public archives versus the right to be forgotten
5.6.1. Archives: exemptions from the right to be forgotten
5.6.2. Online publication of archives and finding aids containing personal data
5.6.3. Private digital archives and the right to be forgotten
5.6.4. Web archiving and the right to be forgotten
5.7. Google and the right to be forgotten
5.8. Conclusion
Conclusion
List of Acronyms
References
Index
Data Center Handbook
Written by 59 experts and reviewed by a seasoned technical advisory board, the Data Center Handbook is a thoroughly revised, one-stop resource that clearly explains the fundamentals, advanced technologies, and best practices used in planning, designing, building and operating a mission-critical, energy-efficient, sustainable data center. This handbook, in its second edition, covers the anatomy, ecosystem and taxonomy of data centers that enable the Internet of Things and artificial intelligence ecosystems, and encompasses the following:

SECTION 1: DATA CENTER OVERVIEW AND STRATEGIC PLANNING
* Megatrends, the IoT, artificial intelligence, 5G network, cloud and edge computing
* Strategic planning forces, location plan, and capacity planning
* Green design & construction guidelines and best practices
* Energy demand, conservation, and sustainability strategies
* Data center financial analysis & risk management

SECTION 2: DATA CENTER TECHNOLOGIES
* Software-defined environment
* Computing, storage, network resource management
* Wireless sensor networks in data centers
* ASHRAE data center guidelines
* Data center telecommunication cabling, BICSI and TIA 942
* Rack-level and server-level cooling
* Corrosion and contamination control
* Energy saving technologies and server design
* Microgrid and data centers

SECTION 3: DATA CENTER DESIGN & CONSTRUCTION
* Data center site selection
* Architecture design: rack floor plan and facility layout
* Mechanical design and cooling technologies
* Electrical design and UPS
* Fire protection
* Structural design
* Reliability engineering
* Computational fluid dynamics
* Project management

SECTION 4: DATA CENTER OPERATIONS TECHNOLOGIES
* Benchmarking metrics and assessment
* Data center infrastructure management
* Data center air management
* Disaster recovery and business continuity management

The Data Center Handbook: Plan, Design, Build, and Operations of a Smart Data Center belongs on the bookshelf of any professional who works in, with, or around a data center.

HWAIYU GENG P.E. (Palo Alto, California, USA) is the founder and managing director of AmicaResearch.org, which promotes the green planning, design, building and operation of high-tech projects. He has over four decades of planning, engineering and management experience, having worked with Westinghouse, Applied Materials, Hewlett Packard, Intel and Juniper Networks. He is a frequent speaker at international conferences. Mr. Geng, a patent holder, is also the editor/author of the IoT and Data Analytics Handbook, Manufacturing Engineering Handbook (2nd edition), and Semiconductor Manufacturing Handbook (2nd edition).

Contributors
Chapter 1: Sustainable Data Center Strategic Planning, Design, Construction, and Operations with Emerging Technologies
Chapter 2: Global Data Center Energy Demand and Strategies to Conserve Energy
Chapter 3: Energy and Sustainability in Data Centers
Chapter 4: Data Center Architecture and Infrastructure
Chapter 5: Cloud and Edge Computing
Chapter 6: Financial Analysis, ROI and TCO
Chapter 7: Managing Data Center Risk
Chapter 8: Software Defined Environment
Chapter 9: Computing, Storage, Networking Resource Management in Data Centers
Chapter 10: Wireless Sensor Networks to Improve Energy Efficiency in Data Centers
Chapter 11: ASHRAE Standards & Practices for Data Centers
Chapter 12: Data Center Telecommunications Cabling and TIA Standards
Chapter 13: Air Side Economizer Technologies
Chapter 14: Rack-Level Cooling and Server-Level Cooling
Chapter 15: Corrosion (Contamination) Control for Mission Critical Facilities
Chapter 16: Rack PDU for Green Data Centers
Chapter 17: Fiber Cabling Fundamentals, Installation and Maintenance
Chapter 18: Design of Energy Efficiency IT Equipment
Chapter 19: Energy Saving Technologies of Servers in Data Centers
Chapter 20: Cyber-Security and Data Centers
Chapter 21: Consideration of Microgrids for Data Centers
Chapter 22: Data Center Site Search and Selection
Chapter 23: Architecture: Data Center Rack Floor Plan and Facility Layout Design
Chapter 24: Mechanical Design in Data Centers
Chapter 25: Data Center Electrical Design
Chapter 26: Electrical: Uninterruptible Power Supply System
Chapter 27: Structural Design in Data Centers: Natural Disaster Resilience
Chapter 28: Fire Protection and Life Safety Design in Data Centers
Chapter 29: Reliability Engineering for Data Centers Infrastructures
Chapter 30: Computational Fluid Dynamics for Data Centers
Chapter 31: Data Center Project Management
Chapter 32: Data Center Benchmark Metrics
Chapter 33: Data Center Infrastructure Management
Chapter 34: Data Center Air Management
Chapter 35: Energy Efficiency Assessment of Data Centers using Measurement and Management Technology
Chapter 36: Drive Data Center Management and Build Better AI with IT Devices as Sensors
Chapter 37: Preparing Data Centers for Natural Disasters and Pandemics
Index
Understanding Infrastructure Edge Computing
A COMPREHENSIVE REVIEW OF THE KEY EMERGING TECHNOLOGIES THAT WILL DIRECTLY IMPACT AREAS OF COMPUTER TECHNOLOGY OVER THE NEXT FIVE YEARS

Infrastructure edge computing is the model of data center and network infrastructure deployment which distributes a large number of physically small data centers around an area to deliver better performance and to enable new economical applications. It is vital for those operating at business or technical levels to be positioned to capitalize on the changes that will occur as a result of infrastructure edge computing.
This book provides a thorough understanding of the growth of internet infrastructure from its inception to the emergence of infrastructure edge computing. Author Alex Marcham, an acknowledged leader in the field who coined the term ‘infrastructure edge computing,’ presents an accessible, accurate, and expansive view of the next generation of internet infrastructure. The book features illustrative examples of 5G mobile cellular networks, city-scale AI systems, self-driving cars, drones, industrial robots, and more: technologies that increase efficiency, save time and money, and improve safety.

Covering state-of-the-art topics, this timely and authoritative book:
* Presents a clear and accurate survey of the key emerging technologies that will impact data centers, 5G networks, artificial intelligence and cyber-physical systems, and other areas of computer technology
* Explores how and why Internet infrastructure has evolved to where it stands today and where it needs to be in the near future
* Covers a wide range of topics including distributed application workload operation, infrastructure and application security, and related technologies such as multi-access edge computing (MEC) and fog computing
* Provides numerous use cases and examples of real-world applications which depend upon underlying edge infrastructure

Written for Information Technology practitioners, computer technology practitioners, and students, Understanding Infrastructure Edge Computing is essential reading for those looking to benefit from the coming changes in computer technology.

ALEX MARCHAM has worked in infrastructure edge computing since the shaping of the market and the establishment of its terminology and key concepts, at numerous companies and open source projects that have led its development. Alex has been involved with most elements of infrastructure design and deployment as well as the architecture and development of the key use cases for this tier of Internet infrastructure.

Preface
About the Author
Acknowledgements
1 INTRODUCTION
2 WHAT IS EDGE COMPUTING?
2.1 Overview
2.2 Defining the Terminology
2.3 Where Is the Edge?
2.3.1 A Tale of Many Edges
2.3.2 Infrastructure Edge
2.3.3 Device Edge
2.4 A Brief History
2.4.1 Third Act of the Internet
2.4.2 Network Regionalisation
2.4.3 CDNs and Early Examples
2.5 Why Edge Computing?
2.5.1 Latency
2.5.2 Data Gravity
2.5.3 Data Velocity
2.5.4 Transport Cost
2.5.5 Locality
2.6 Basic Edge Computing Operation
2.7 Summary
References
3 INTRODUCTION TO NETWORK TECHNOLOGY
3.1 Overview
3.2 Structure of the Internet
3.2.1 1970s
3.2.2 1990s
3.2.3 2010s
3.2.4 2020s
3.2.5 Change over Time
3.3 The OSI Model
3.3.1 Layer 1
3.3.2 Layer 2
3.3.3 Layer 3
3.3.4 Layer 4
3.3.5 Layers 5, 6, and 7
3.4 Ethernet
3.5 IPv4 and IPv6
3.6 Routing and Switching
3.6.1 Routing
3.6.2 Routing Protocols
3.6.3 Routing Process
3.7 LAN, MAN, and WAN
3.8 Interconnection and Exchange
3.9 Fronthaul, Backhaul, and Midhaul
3.10 Last Mile or Access Networks
3.11 Network Transport and Transit
3.12 Serve Transit Fail (STF) Metric
3.13 Summary
References
4 INTRODUCTION TO DATA CENTRE TECHNOLOGY
4.1 Overview
4.2 Physical Size and Design
4.3 Cooling and Power Efficiency
4.4 Airflow Design
4.5 Power Distribution
4.6 Redundancy and Resiliency
4.7 Environmental Control
4.8 Data Centre Network Design
4.9 Information Technology (IT) Equipment Capacity
4.10 Data Centre Operation
4.10.1 Notification
4.10.2 Security
4.10.3 Equipment Deployment
4.10.4 Service Offerings
4.10.5 Managed Colocation
4.11 Data Centre Deployment
4.11.1 Deployment Costing
4.11.2 Brownfield and Greenfield Sites
4.11.3 Other Factors
4.12 Summary
References
5 INFRASTRUCTURE EDGE COMPUTING NETWORKS
5.1 Overview
5.2 Network Connectivity and Coverage Area
5.3 Network Topology
5.3.1 Full Mesh
5.3.2 Partial Mesh
5.3.3 Hub and Spoke
5.3.4 Ring
5.3.5 Tree
5.3.6 Optimal Topology
5.3.7 Inter-area Connectivity
5.4 Transmission Medium
5.4.1 Fibre
5.4.2 Copper
5.4.3 Wireless
5.5 Scaling and Tiered Network Architecture
5.6 Other Considerations
5.7 Summary
6 INFRASTRUCTURE EDGE DATA CENTRES
6.1 Overview
6.2 Physical Size and Design
6.2.1 Defining an Infrastructure Edge Data Centre
6.2.2 Size Categories
6.3 Heating and Cooling
6.4 Airflow Design
6.4.1 Traditional Designs
6.4.2 Non-traditional Designs
6.5 Power Distribution
6.6 Redundancy and Resiliency
6.6.1 Electrical Power Delivery and Generation
6.6.2 Network Connectivity
6.6.3 Cooling Systems
6.6.4 Market Design
6.6.5 Redundancy Certification
6.6.6 Software Service Resiliency
6.6.7 Physical Redundancy
6.6.8 System Resiliency Example
6.7 Environmental Control
6.8 Data Centre Network Design
6.9 Information Technology (IT) Equipment Capacity
6.9.1 Operational Headroom
6.10 Data Centre Operation
6.10.1 Site Automation
6.10.2 Single or Multi-tenant
6.10.3 Neutral Host
6.10.4 Network Operations Centre (NOC)
6.11 Brownfield and Greenfield Sites
6.12 Summary
7 INTERCONNECTION AND EDGE EXCHANGE
7.1 Overview
7.2 Access or Last Mile Network Interconnection
7.3 Backhaul and Midhaul Network Interconnection
7.4 Internet Exchange
7.5 Edge Exchange
7.6 Interconnection Network Technology
7.6.1 5G Networks
7.6.2 4G Networks
7.6.3 Cable Networks
7.6.4 Fibre Networks
7.6.5 Other Networks
7.6.6 Meet Me Room (MMR)
7.6.7 Cross Connection
7.6.8 Virtual Cross Connection
7.6.9 Interconnection as a Resource
7.7 Peering
7.8 Cloud On-ramps
7.9 Beneficial Impact
7.9.1 Latency
7.9.2 Data Transport Cost
7.9.3 Platform Benefit
7.10 Alternatives to Interconnection
7.11 Business Arrangements
7.12 Summary
8 INFRASTRUCTURE EDGE COMPUTING DEPLOYMENT
8.1 Overview
8.2 Physical Facilities
8.3 Site Locations
8.3.1 kW per km²
8.3.2 Customer Facility Selection
8.3.3 Site Characteristics
8.4 Coverage Areas
8.5 Points of Interest
8.6 Codes and Regulations
8.7 Summary
9 COMPUTING SYSTEMS AT THE INFRASTRUCTURE EDGE
9.1 Overview
9.2 What Is Suitable?
9.3 Equipment Hardening
9.4 Rack Densification
9.4.1 Heterogeneous Servers
9.4.2 Processor Densification
9.4.3 Supporting Equipment
9.5 Parallel Accelerators
9.5.1 Field Programmable Gate Arrays (FPGAs)
9.5.2 Tensor Processing Units (TPUs)
9.5.3 Graphics Processing Units (GPUs)
9.5.4 Smart Network Interface Cards (NICs)
9.5.5 Cryptographic Accelerators
9.5.6 Other Accelerators
9.5.7 FPGA, TPU, or GPU?
9.6 Ideal Infrastructure
9.6.1 Network Compute Utilisation
9.7 Adapting Legacy Infrastructure
9.8 Summary
References
10 MULTI-TIER DEVICE, DATA CENTRE, AND NETWORK RESOURCES
10.1 Overview
10.2 Multi-tier Resources
10.3 Multi-tier Applications
10.4 Core to Edge Applications
10.5 Edge to Core Applications
10.6 Infrastructure Edge and Device Edge Interoperation
10.7 Summary
11 DISTRIBUTED APPLICATION WORKLOAD OPERATION
11.1 Overview
11.2 Microservices
11.3 Redundancy and Resiliency
11.4 Multi-site Operation
11.5 Workload Orchestration
11.5.1 Processing Requirements
11.5.2 Data Storage Requirements
11.5.3 Network Performance Requirements
11.5.4 Application Workload Cost Profile
11.5.5 Redundancy and Resiliency Requirements
11.5.6 Resource Marketplaces
11.5.7 Workload Requirement Declaration
11.6 Infrastructure Visibility
11.7 Summary
12 INFRASTRUCTURE AND APPLICATION SECURITY
12.1 Overview
12.2 Threat Modelling
12.3 Physical Security
12.4 Logical Security
12.5 Common Security Issues
12.5.1 Staff
12.5.2 Visitors
12.5.3 Network Attacks
12.6 Application Security
12.7 Security Policy
12.8 Summary
13 RELATED TECHNOLOGIES
13.1 Overview
13.2 Multi-access Edge Computing (MEC)
13.3 Internet of Things (IoT) and Industrial Internet of Things (IIoT)
13.4 Fog and Mist Computing
13.5 Summary
Reference
14 USE CASE EXAMPLE: 5G
14.1 Overview
14.2 What Is 5G?
14.2.1 5G New Radio (NR)
14.2.2 5G Core Network (CN)
14.3 5G at the Infrastructure Edge
14.3.1 Benefits
14.3.2 Architecture
14.3.3 Considerations
14.4 Summary
15 USE CASE EXAMPLE: DISTRIBUTED AI
15.1 Overview
15.2 What Is AI?
15.2.1 Machine Learning (ML)
15.2.2 Deep Learning (DL)
15.3 AI at the Infrastructure Edge
15.3.1 Benefits
15.3.2 Architecture
15.3.3 Considerations
15.4 Summary
16 USE CASE EXAMPLE: CYBER-PHYSICAL SYSTEMS
16.1 Overview
16.2 What Are Cyber-physical Systems?
16.2.1 Autonomous Vehicles
16.2.2 Drones
16.2.3 Robotics
16.2.4 Other Use Cases
16.3 Cyber-physical Systems at the Infrastructure Edge
16.3.1 Benefits
16.3.2 Architecture
16.3.3 Considerations
16.4 Summary
Reference
17 USE CASE EXAMPLE: PUBLIC OR PRIVATE CLOUD
17.1 Overview
17.2 What Is Cloud Computing?
17.2.1 Public Clouds
17.2.2 Private Clouds
17.2.3 Hybrid Clouds
17.2.4 Edge Cloud
17.3 Cloud Computing at the Infrastructure Edge
17.3.1 Benefits
17.3.2 Architecture
17.3.3 Considerations
17.4 Summary
18 OTHER INFRASTRUCTURE EDGE COMPUTING USE CASES
18.1 Overview
18.2 Near Premises Services
18.3 Video Surveillance
18.4 SD-WAN
18.5 Security Services
18.6 Video Conferencing
18.7 Content Delivery
18.8 Other Use Cases
18.9 Summary
19 END TO END: AN INFRASTRUCTURE EDGE PROJECT EXAMPLE
19.1 Overview
19.2 Defining Requirements
19.2.1 Deciding on a Use Case
19.2.2 Determining Deployment Locations
19.2.3 Identifying Required Equipment
19.2.4 Choosing an Infrastructure Edge Computing Network Operator
19.2.5 Regional or National Data Centres
19.3 Success Criteria
19.4 Comparing Costs
19.5 Alternative Options
19.6 Initial Deployment
19.7 Ongoing Operation
19.7.1 SLA Breaches
19.8 Project Conclusion
19.9 Summary
20 THE FUTURE OF INFRASTRUCTURE EDGE COMPUTING
20.1 Overview
20.2 Today and Tomorrow
20.3 The Next Five Years
20.4 The Next 10 Years
20.5 Summary
21 CONCLUSION
Appendix A: Acronyms and Abbreviations
Index
Inside the World of Computing
Computers and the Internet are an undeniable and inextricable part of our daily lives. This book is for those who wish to better understand how this came to be. It explores the technological bases of computers, networks, software and data management, leading to the development of four “pillars” on which the essential applications that have a strong impact on individuals and society are based: embedded systems, Artificial Intelligence, the Internet, and image processing and vision.
We will travel to the heart of major application areas: robotics, virtual reality, health, mobility, energy, the factory of the future, not forgetting the major questions that this “digitization” can raise. This book is the author’s testimony after fifty years spent in environments that are very open to new technologies. It offers perspectives on the evolution of the digital world that we live in.

JEAN-LOÏC DELHAYE has a PhD in Artificial Intelligence. He directed the Centre National Universitaire Sud de Calcul, France, before piloting partnerships and the valorization of research at the Centre Inria Rennes–Bretagne Atlantique, France. He has also been very active in national and European collaborations on high performance computing.

Foreword
Jean-Pierre BANÂTRE
Preface
Acknowledgments
CHAPTER 1. FROM THE CALCULATOR TO THE SUPERCOMPUTER
1.1. Introduction
1.2. Some important concepts
1.2.1. Information and data
1.2.2. Binary system
1.2.3. Coding
1.2.4. Algorithm
1.2.5. Program
1.3. Towards automation of calculations
1.3.1. Slide rule
1.3.2. The Pascaline
1.3.3. The Jacquard loom
1.3.4. Babbage’s machine
1.3.5. The first desktop calculators
1.3.6. Hollerith’s machine
1.4. The first programmable computers
1.4.1. Konrad Zuse’s machines
1.4.2. Colossus
1.4.3. ENIAC
1.5. Generations of computers
1.5.1. First generation: the transition to electronics
1.5.2. Second generation: the era of the transistor
1.5.3. Third generation: the era of integrated circuits
1.5.4. Fourth generation: the era of microprocessors
1.6. Supercomputers
1.6.1. Some fields of use
1.6.2. History of supercomputers
1.6.3. Towards exaflops
1.7. What about the future?
1.7.1. An energy and ecological challenge
1.7.2. Revolutions in sight?
CHAPTER 2. COMPUTER NETWORKS AND THEIR APPLICATIONS
2.1. Introduction
2.2. A long history
2.3. Computer network infrastructure
2.3.1. Geographic coverage: from PAN to WAN
2.3.2. Communication media
2.3.3. Interconnection equipment and topologies
2.3.4. Two other characteristics of computer networks
2.3.5. Quality of service
2.4. Communication protocols and the Internet
2.4.1. The first protocols
2.4.2. The OSI model
2.4.3. The history of the Internet
2.4.4. The TCP/IP protocol
2.4.5. IP addressing
2.4.6. Management and use of the Internet
2.4.7. Evolving technologies
2.4.8. What future?
2.5. Applications
2.5.1. The World Wide Web
2.5.2. Cloud computing
2.5.3. The Internet of Things
2.5.4. Ubiquitous computing and spontaneous networks
2.6. Networks and security
2.6.1. Vulnerabilities
2.6.2. The protection of a network
2.6.3. Message encryption
2.6.4. Checking its security
CHAPTER 3. SOFTWARE
3.1. Introduction
3.2. From algorithm to computer program
3.2.1. Programs and subprograms
3.2.2. Programming languages
3.3. Basic languages and operating systems
3.3.1. Basic languages
3.3.2. Operating system functions
3.3.3. A bit of history
3.3.4. Universal operating systems
3.3.5. Targeted operating systems
3.4. “High-level” programming and applications
3.4.1. Imperative languages
3.4.2. Functional languages
3.4.3. Object programming
3.4.4. Other programming languages
3.4.5. The most used languages
3.5. Software development
3.5.1. Software categories
3.5.2. Software quality
3.5.3. Development methods
3.5.4. Software engineering
3.6. Software verification and validation
3.6.1. Errors with sometimes tragic consequences
3.6.2. Software testing
3.6.3. Formal methods
3.6.4. Software certification
3.7. Legal protection and distribution of software
3.7.1. Legal protection of software
3.7.2. Licenses
3.7.3. Free software and open source
3.8. The software market
CHAPTER 4. DATA: FROM BINARY ELEMENT TO INTELLIGENCE
4.1. Introduction
4.2. Data and information
4.2.1. Digitization of data
4.2.2. Data compression
4.3. The structuring of data towards information
4.3.1. Structured data
4.3.2. Semi-structured data and the Web
4.4. Files and their formats
4.5. Databases
4.5.1. The main characteristics
4.5.2. DBMS models
4.5.3. Database design
4.5.4. Enterprise resource planning (ERP) systems
4.5.5. Other types of databases
4.5.6. Data protection in a DB
4.6. Intelligence and Big Data
4.7. Data ownership and Open Data
4.7.1. Personal data
4.7.2. Opening up public data: Open Data
CHAPTER 5. TECHNOLOGY BUILDING BLOCKS
5.1. Embedded systems
5.1.1. Specific architectures
5.1.2. Some fields of use
5.2. Artificial intelligence (AI)
5.2.1. A bit of history
5.2.2. Intelligence or statistics?
5.2.3. Important work around automatic learning
5.2.4. A multiplication of applications
5.2.5. The challenges of AI
5.2.6. What about intelligence?
5.3. The Internet
5.3.1. Mobility
5.3.2. Social networks
5.3.3. The Internet of Things
5.3.4. The Cloud
5.3.5. Blockchain
5.3.6. Vulnerabilities
5.4. Image processing and vision
5.4.1. A bit of history
5.4.2. Image sources and their uses
5.4.3. The digital image
5.4.4. Image storage and compression
5.4.5. Computing and images
5.4.6. Some applications
5.5. Conclusion
CHAPTER 6. SOME AREAS OF APPLICATION
6.1. Robots
6.1.1. A bit of history
6.1.2. Fields of use regarding robots today
6.1.3. Communication in the world of robots
6.1.4. Fear of robots
6.1.5. Challenges for researchers
6.2. Virtual reality and augmented reality
6.2.1. A bit of history
6.2.2. Hardware configurations of virtual reality
6.2.3. Fields of use of virtual reality
6.2.4. Augmented reality
6.3. Health
6.3.1. Health informatics
6.3.2. Information technology at the service of our health
6.4. The connected (and soon autonomous?) car
6.4.1. Levels of autonomy
6.4.2. Challenges associated with the autonomous car
6.4.3. Advantages and disadvantages of the autonomous car
6.5. The smart city
6.5.1. Smart energy
6.5.2. Smart buildings
6.5.3. Smart infrastructure
6.5.4. Smart governance
6.5.5. Dangers
6.6. Smart mobility
6.7. The factory of the future
6.7.1. Technologies
6.7.2. Issues
6.7.3. The place of the human
CHAPTER 7. SOCIETAL ISSUES
7.1. Security
7.1.1. Specific characteristics
7.1.2. Some great threats
7.1.3. Acting to protect oneself
7.2. The respect of private life
7.2.1. Our personal data
7.2.2. Uses of our data
7.2.3. What about the future?
7.3. Influence on social life
7.3.1. The development of social ties
7.3.2. Citizen participation
7.3.3. The socialization of knowledge
7.4. Dangers to democracy
7.4.1. The liberation of speech
7.4.2. Private life under surveillance
7.4.3. Job insecurity
7.4.4. The power of the big Internet firms
7.5. The digital divide
7.5.1. From division to exclusion
7.5.2. Digital technology and education
7.6. Mastering the use of artificial intelligence
7.7. The intelligent prosthesis and the bionic man
7.8. Transhumanism
7.9. What kind of society for tomorrow?
Bibliography
Index
Chance, Calculation and Life
Chance, Calculation and Life brings together 16 original papers from the colloquium of the same name, organized by the International Cultural Center of Cerisy in 2019. From mathematics to the humanities and biology, there are many concepts and questions related to chance. What are the different types of chance? Does chance correspond to a lack of knowledge about the causes of events, or is there a truly intrinsic and irreducible chance? Does chance preside over our decisions? Does it govern evolution? Is it at the origin of life? What part do chance and necessity play in biology? This book answers these fundamental questions by bringing together the clear and richly documented contributions of mathematicians, physicists, biologists and philosophers who make this book an incomparable tool for work and reflection.

THIERRY GAUDIN is an engineer at MINES ParisTech and holds a doctorate in Information Sciences and Communication from Paris Nanterre University, France. He is a widely renowned expert in innovation policy and has worked with the OECD, the European Commission and the World Bank.
MARIE-CHRISTINE MAUREL is Professor at Sorbonne University and a researcher at the Institute of Systematics, Evolution, Biodiversity, MNHN, Paris, France.
JEAN-CHARLES POMEROL is Professor Emeritus at Sorbonne University, France. He is a specialist in Decision Support Systems and former project leader for information technology in the Engineering Sciences Department at the CNRS. He was formerly in charge of the Artificial Intelligence laboratory at UPMC, Paris, as well as the President of UPMC between 2006 and 2011.

Preface
Thierry GAUDIN, Marie-Christine MAUREL, Jean-Charles POMEROL
Introduction
Thierry GAUDIN, Marie-Christine MAUREL, Jean-Charles POMEROL
PART 1. RANDOMNESS IN ALL OF ITS ASPECTS
CHAPTER 1. CLASSICAL, QUANTUM AND BIOLOGICAL RANDOMNESS AS RELATIVE UNPREDICTABILITY
Cristian S. CALUDE and Giuseppe LONGO
1.1. Introduction
1.1.1. Brief historical overview
1.1.2. Preliminary remarks
1.2. Randomness in classical dynamics
1.3. Quantum randomness
1.4. Randomness in biology
1.5. Random sequences: a theory invariant approach
1.6. Classical and quantum randomness revisited
1.6.1. Classical versus algorithmic randomness
1.6.2. Quantum versus algorithmic randomness
1.7. Conclusion and opening: toward a proper biological randomness
1.8. Acknowledgments
1.9. References
CHAPTER 2. IN THE NAME OF CHANCE
Gilles PAGÈS
2.1. The birth of probabilities and games of chance
2.1.1. Solutions
2.1.2. To what end?
2.2. A very brief history of probabilities
2.3. Chance? What chance?
2.4. Prospective possibility
2.4.1. LLN + CLT + ENIAC = MC
2.4.2. Generating chance through numbers
2.4.3. Going back the other way
2.4.4. Prospective possibility as master of the world?
2.5. Appendix: Congruent generators, can prospective chance be periodic?
2.5.1. A little modulo n arithmetic
2.5.2. From erratic arithmetic to algorithmic randomness
2.5.3. And, the winner is... Mersenne Twister 623
2.6. References
CHAPTER 3. CHANCE IN A FEW LANGUAGES
Clarisse HERRENSCHMIDT
3.1. Classical Sanskrit
3.2. Persian and Arabic
3.3. Ancient Greek
3.4. Russian
3.5. Latin
3.6. French
3.7. English
3.8. Dice, chance and the symbolic world
3.9. References
CHAPTER 4. THE COLLECTIVE DETERMINISM OF QUANTUM RANDOMNESS
François VANNUCCI
4.1. True or false chance
4.2. Chance sneaks into uncertainty
4.3. The world of the infinitely small
4.4. A more figurative example
4.5. Einstein’s act of resistance
4.6. Schrödinger’s cat to neutrino oscillations
4.7. Chance versus the anthropic principle
4.8. And luck in life?
4.9. Chance and freedom
CHAPTER 5. WAVE-PARTICLE CHAOS TO THE STABILITY OF LIVING
Stéphane DOUADY
5.1. Introduction
5.2. The chaos of the wave-particle
5.3. The stability of living things
5.4. Conclusion
5.5. Acknowledgments
5.6. References
CHAPTER 6. CHANCE IN COSMOLOGY: RANDOM AND TURBULENT CREATION OF MULTIPLE COSMOS
Michel CASSÉ
6.1. Is quantum cosmology oxymoronic?
6.2. Between two realities – at the entrance and exit – is virtuality
6.3. Who will sing the metamorphoses of this high vacuum?
6.4. Loop lament
6.5. The quantum vacuum exists, Casimir has met it
6.6. The generosity of the quantum vacuum
6.7. Landscapes
6.8. The good works of Inflation
6.9. Sub species aeternitatis
6.10. The smiling vacuum
CHAPTER 7. THE CHANCE IN DECISION: WHEN NEURONS FLIP A COIN
Mathias PESSIGLIONE
7.1. A very subjective utility
7.2. A minimum rationality
7.3. There is noise in the choices
7.4. On the volatility of parameters
7.5. When the brain wears rose-tinted glasses
7.6. The neurons that take a vote
7.7. The will to move an index finger
7.8. Free will in debate
7.9. The virtue of chance
7.10. References
CHAPTER 8. TO HAVE A SENSE OF LIFE: A POETIC RECONNAISSANCE
Georges AMAR
8.1. References
CHAPTER 9. DIVINE CHANCE
Bertrand VERGELY
9.1. Thinking by chance
9.2. Chance, need: why choose?
9.3. When chance is not chance
9.4. When chance comes from elsewhere
CHAPTER 10. CHANCE AND THE CREATIVE PROCESS
Ivan MAGRIN-CHAGNOLLEAU
10.1. Introduction
10.2. Chance
10.3. Creation
10.4. Chance in the artistic creative process
10.5. An art of the present moment
10.6. Conclusion
10.7. References
PART 2. RANDOMNESS, BIOLOGY AND EVOLUTION
CHAPTER 11. EPIGENETICS, DNA AND CHROMATIN DYNAMICS: WHERE IS THE CHANCE AND WHERE IS THE NECESSITY?
David SITBON and Jonathan B. WEITZMAN
11.1. Introduction
11.2. Random combinations
11.3. Random alterations
11.4. Beyond the gene
11.5. Epigenetic variation
11.6. Concluding remarks
11.7. Acknowledgments
11.8. References
CHAPTER 12. WHEN ACQUIRED CHARACTERISTICS BECOME HERITABLE: THE LESSON OF GENOMES
Bernard DUJON
12.1. Introduction
12.2. Horizontal genetic exchange in prokaryotes
12.3. Two specificities of eukaryotes theoretically oppose horizontal gene transfer
12.4. Criteria for genomic analysis
12.5. Abundance of horizontal transfers in unicellular eukaryotes
12.6. Remarkable horizontal genetic transfers in pluricellular eukaryotes
12.7. Main mechanisms of horizontal genetic transfers
12.8. Introgressions and limits to the concept of species
12.9. Conclusion
12.10. References
CHAPTER 13. THE EVOLUTIONARY TRAJECTORIES OF ORGANISMS ARE NOT STOCHASTIC
Philippe GRANDCOLAS
13.1. Evolution and stochasticity: a few metaphors
13.2. The Gouldian metaphor of the “replay” of evolution
13.3. The replay of evolution: what happened
13.4. Evolutionary replay experiments
13.5. Phylogenies versus experiments
13.6. Stochasticity, evolution and extinction
13.7. Conclusion
13.8. References
CHAPTER 14. EVOLUTION IN THE FACE OF CHANCE
Amaury LAMBERT
14.1. Introduction
14.2. Waddington and the concept of canalization
14.3. A stochastic model of Darwinian evolution
14.3.1. Redundancy and neutral networks
14.3.2. A toy model
14.3.3. Mutation-selection algorithm
14.4. Numerical results
14.4.1. Canalization
14.4.2. Target selection
14.4.3. Neighborhood selection
14.5. Discussion
14.6. Acknowledgments
CHAPTER 15. CHANCE, CONTINGENCY AND THE ORIGINS OF LIFE: SOME HISTORICAL ISSUES
Antonio LAZCANO
15.1. Acknowledgments
15.2. References
CHAPTER 16. CHANCE, COMPLEXITY AND THE IDEA OF A UNIVERSAL ETHICS
Jean-Paul DELAHAYE
16.1. Cosmic evolution and advances in computation
16.2. Two notions of complexity
16.3. Biological computations
16.4. Energy and emergy
16.5. What we hold onto
16.6. Noah knew this already!
16.7. Create, protect and collect
16.8. An ethics of organized complexity
16.9. Not so easy
16.10. References
List of Authors
Index
PostgreSQL Query Optimization
Write optimized queries. This book helps you write queries that perform fast and deliver results on time. You will learn that query optimization is not a dark art practiced by a small, secretive cabal of sorcerers. Any motivated professional can learn to write efficient queries from the get-go and capably optimize existing queries. You will learn to look at the process of writing a query from the database engine’s point of view, and know how to think like the database optimizer.
The book begins with a discussion of what a performant system is and progresses to measuring performance and setting performance goals. It introduces different classes of queries and optimization techniques suitable to each, such as the use of indexes and specific join algorithms. You will learn to read and understand query execution plans along with techniques for influencing those plans for better performance. The book also covers advanced topics such as the use of functions and procedures, dynamic SQL, and generated queries. All of these techniques are then used together to produce performant applications, avoiding the pitfalls of object-relational mappers.
WHAT YOU WILL LEARN
* Identify optimization goals in OLTP and OLAP systems
* Read and understand PostgreSQL execution plans
* Distinguish between short queries and long queries
* Choose the right optimization technique for each query type
* Identify indexes that will improve query performance
* Optimize full table scans
* Avoid the pitfalls of object-relational mapping systems
* Optimize the entire application rather than just database queries
WHO THIS BOOK IS FOR
IT professionals working in PostgreSQL who want to develop performant and scalable applications, anyone whose job title contains the words “database developer” or “database administrator” or who is a backend developer charged with programming database calls, and system architects involved in the overall design of application systems running against a PostgreSQL database
HENRIETTA DOMBROVSKAYA is a database researcher and developer with over 35 years of academic and industrial experience. She holds a PhD in computer science from the University of Saint Petersburg, Russia. At present, she is Associate Director of Databases at Braviant Holdings, Chicago, Illinois. She is an active member of the PostgreSQL community, a frequent speaker at the PostgreSQL conference, and a local organizer of the Chicago PostgreSQL User Group. Her research interests are tightly coupled with practice and are focused on developing efficient interactions between applications and databases. She is a winner of the “Technologist of the Year” 2019 award of the Illinois Technology Association.
BORIS NOVIKOV is currently a professor in the Department of Informatics at National Research University Higher School of Economics in Saint Petersburg, Russia. He graduated from Leningrad University’s School of Mathematics and Mechanics. He worked for Saint Petersburg University for a number of years and moved to his current position in January 2019. His research interests are in a broad area of information management and include several aspects of design, development, and tuning of databases, applications, and database management systems. He also has interests in distributed scalable systems for stream processing and analytics.
ANNA BAILLIEKOVA is Senior Data Engineer at Zendesk. Previously, she built ETL pipelines, data warehouse resources, and reporting tools as a team lead on the Division Operations team at Epic. She has also held analyst roles on a variety of political campaigns and at Greenberg Quinlan Rosner Research. She received her undergraduate degree cum laude with College Honors in political science and computer science from Knox College in Galesburg, Illinois.
1. Why Optimize?
2. Theory - Yes, We Need It!
3. Even More Theory: Algorithms
4. Understanding Execution Plans
5. Short Queries and Indexes
6. Long Queries and Full Scans
7. Long Queries: Additional Techniques
8. Optimizing Data Modification
9. Design Matters
10. Application Development and Performance
11. Functions
12. Dynamic SQL
13. Avoiding the Pitfalls of Object-Relational Mapping
14. More Complex Filtering and Search
15. Ultimate Optimization Algorithm
16. Conclusion
Pointers in C Programming
Gain a better understanding of pointers, from the basics of how pointers function at the machine level, to using them for a variety of common and advanced scenarios. This short contemporary guide book on pointers in C programming provides a resource for professionals and advanced students needing in-depth hands-on coverage of pointer basics and advanced features. It includes the latest versions of the C language, C20, C17, and C14.
You’ll see how pointers are used to provide vital C features, such as strings, arrays, higher-order functions and polymorphic data structures. Along the way, you’ll cover how pointers can optimize a program to run faster or use less memory than it would otherwise.
There are plenty of code examples in the book to emulate and adapt to meet your specific needs.
WHAT YOU WILL LEARN
* Work effectively with pointers in your C programming
* Learn how to effectively manage dynamic memory
* Program with strings and arrays
* Create recursive data structures
* Implement function pointers
WHO THIS BOOK IS FOR
Intermediate to advanced level professional programmers, software developers, and advanced students or researchers. Prior experience with C programming is expected.
Thomas Mailund is an associate professor in bioinformatics at Aarhus University, Denmark. He has a background in math and computer science, including experience programming and teaching in the C and R programming languages. For the last decade, his main focus has been on genetics and evolutionary studies, particularly comparative genomics, speciation, and gene flow between emerging species.
1. Pointers and the random access memory model
2. Memory management
3. Strings and arrays
4. Recursive data structures
5. Function pointers
Responsible Data Science
EXPLORE THE MOST SERIOUS PREVALENT ETHICAL ISSUES IN DATA SCIENCE WITH THIS INSIGHTFUL NEW RESOURCE
The increasing popularity of data science has resulted in numerous well-publicized cases of bias, injustice, and discrimination. The widespread deployment of “Black box” algorithms that are difficult or impossible to understand and explain, even for their developers, is a primary source of these unanticipated harms, making modern techniques and methods for manipulating large data sets seem sinister, even dangerous. When put in the hands of authoritarian governments, these algorithms have enabled suppression of political dissent and persecution of minorities. To prevent these harms, data scientists everywhere must come to understand how the algorithms that they build and deploy may harm certain groups or be unfair.
Responsible Data Science delivers a comprehensive, practical treatment of how to implement data science solutions in an even-handed and ethical manner that minimizes the risk of undue harm to vulnerable members of society. Both data science practitioners and managers of analytics teams will learn how to:
* Improve model transparency, even for black box models
* Diagnose bias and unfairness within models using multiple metrics
* Audit projects to ensure fairness and minimize the possibility of unintended harm
Perfect for data science practitioners, Responsible Data Science will also earn a spot on the bookshelves of technically inclined managers, software developers, and statisticians.
GRANT FLEMING is a Data Scientist at Elder Research Inc.
His professional focus is on machine learning for social science applications, model interpretability, civic technology, and building software tools for reproducible data science.
PETER BRUCE is the Senior Learning Officer at Elder Research, Inc., author of several best-selling texts on data science, and Founder of the Institute for Statistics Education at Statistics.com, an Elder Research Company.
Introduction
PART I MOTIVATION FOR ETHICAL DATA SCIENCE AND BACKGROUND KNOWLEDGE
CHAPTER 1 RESPONSIBLE DATA SCIENCE
The Optum Disaster
Jekyll and Hyde
Eugenics
Galton, Pearson, and Fisher
Ties between Eugenics and Statistics
Ethical Problems in Data Science Today
Predictive Models
From Explaining to Predicting
Predictive Modeling
Setting the Stage for Ethical Issues to Arise
Classic Statistical Models
Black-Box Methods
Important Concepts in Predictive Modeling
Feature Selection
Model-Centric vs. Data-Centric Models
Holdout Sample and Cross-Validation
Overfitting
Unsupervised Learning
The Ethical Challenge of Black Boxes
Two Opposing Forces
Pressure for More Powerful AI
Public Resistance and Anxiety
Summary
CHAPTER 2 BACKGROUND: MODELING AND THE BLACK-BOX ALGORITHM
Assessing Model Performance
Predicting Class Membership
The Rare Class Problem
Lift and Gains
Area Under the Curve
AUC vs. Lift (Gains)
Predicting Numeric Values
Goodness-of-Fit
Holdout Sets and Cross-Validation
Optimization and Loss Functions
Intrinsically Interpretable Models vs. Black-Box Models
Ethical Challenges with Interpretable Models
Black-Box Models
Ensembles
Nearest Neighbors
Clustering
Association Rules
Collaborative Filters
Artificial Neural Nets and Deep Neural Nets
Problems with Black-Box Predictive Models
Problems with Unsupervised Algorithms
Summary
CHAPTER 3 THE WAYS AI GOES WRONG, AND THE LEGAL IMPLICATIONS
AI and Intentional Consequences by Design
Deepfakes
Supporting State Surveillance and Suppression
Behavioral Manipulation
Automated Testing to Fine-Tune Targeting
AI and Unintended Consequences
Healthcare
Finance
Law Enforcement
Technology
The Legal and Regulatory Landscape around AI
Ignorance Is No Defense: AI in the Context of Existing Law and Policy
A Finger in the Dam: Data Rights, Data Privacy, and Consumer Protection Regulations
Trends in Emerging Law and Policy Related to AI
Summary
PART II THE ETHICAL DATA SCIENCE PROCESS
CHAPTER 4 THE RESPONSIBLE DATA SCIENCE FRAMEWORK
Why We Keep Building Harmful AI
Misguided Need for Cutting-Edge Models
Excessive Focus on Predictive Performance
Ease of Access and the Curse of Simplicity
The Common Cause
The Face Thieves
An Anatomy of Modeling Harms
The World: Context Matters for Modeling
The Data: Representation Is Everything
The Model: Garbage In, Danger Out
Model Interpretability: Human Understanding for Superhuman Models
Efforts Toward a More Responsible Data Science
Principles Are the Focus
Nonmaleficence
Fairness
Transparency
Accountability
Privacy
Bridging the Gap Between Principles and Practice with the Responsible Data Science (RDS) Framework
Justification
Compilation
Preparation
Modeling
Auditing
Summary
CHAPTER 5 MODEL INTERPRETABILITY: THE WHAT AND THE WHY
The Sexist Résumé Screener
The Necessity of Model Interpretability
Connections Between Predictive Performance and Interpretability
Uniting (High) Model Performance and Model Interpretability
Categories of Interpretability Methods
Global Methods
Local Methods
Real-World Successes of Interpretability Methods
Facilitating Debugging and Audit
Leveraging the Improved Performance of Black-Box Models
Acquiring New Knowledge
Addressing Critiques of Interpretability Methods
Explanations Generated by Interpretability Methods Are Not Robust
Explanations Generated by Interpretability Methods Are Low Fidelity
The Forking Paths of Model Interpretability
The Four-Measure Baseline
Building Our Own Credit Scoring Model
Using Train-Test Splits
Feature Selection and Feature Engineering
Baseline Models
The Importance of Making Your Code Work for Everyone
Execution Variability
Addressing Execution Variability with Functionalized Code
Stochastic Variability
Addressing Stochastic Variability via Resampling
Summary
PART III EDS IN PRACTICE
CHAPTER 6 BEGINNING A RESPONSIBLE DATA SCIENCE PROJECT
How the Responsible Data Science Framework Addresses the Common Cause
Datasets Used
Regression Datasets: Communities and Crime
Classification Datasets: COMPAS
Common Elements Across Our Analyses
Project Structure and Documentation
Project Structure for the Responsible Data Science Framework: Everything in Its Place
Documentation: The Responsible Thing to Do
Beginning a Responsible Data Science Project
Communities and Crime (Regression)
Justification
Compilation
Identifying Protected Classes
Preparation: Data Splitting and Feature Engineering
Datasheets
COMPAS (Classification)
Justification
Compilation
Identifying Protected Classes
Preparation
Summary
CHAPTER 7 AUDITING A RESPONSIBLE DATA SCIENCE PROJECT
Fairness and Data Science in Practice
The Many Different Conceptions of Fairness
Different Forms of Fairness Are Trade-Offs with Each Other
Quantifying Predictive Fairness Within a Data Science Project
Mitigating Bias to Improve Fairness
Preprocessing
In-processing
Postprocessing
Classification Example: COMPAS
Prework: Code Practices, Modeling, and Auditing
Justification, Compilation, and Preparation Review
Modeling
Auditing
Per-Group Metrics: Overall
Per-Group Metrics: Error
Fairness Metrics
Interpreting Our Models: Why Are They Unfair?
Analysis for Different Groups
Bias Mitigation
Preprocessing: Oversampling
Postprocessing: Optimizing Thresholds Automatically
Postprocessing: Optimizing Thresholds Manually
Summary
CHAPTER 8 AUDITING FOR NEURAL NETWORKS
Why Neural Networks Merit Their Own Chapter
Neural Networks Vary Greatly in Structure
Neural Networks Treat Features Differently
Neural Networks Repeat Themselves
A More Impenetrable Black Box
Baseline Methods
Representation Methods
Distillation Methods
Intrinsic Methods
Beginning a Responsible Neural Network Project
Justification
Moving Forward
Compilation
Tracking Experiments
Preparation
Modeling
Auditing
Per-Group Metrics: Overall
Per-Group Metrics: Unusual Definitions of “False Positive”
Fairness Metrics
Interpreting Our Models: Why Are They Unfair?
Bias Mitigation
Wrap-Up
Auditing Neural Networks for Natural Language Processing
Identifying and Addressing Sources of Bias in NLP
The Real World
Data
Models
Model Interpretability
Summary
CHAPTER 9 CONCLUSION
How Can We Do Better?
The Responsible Data Science Framework
Doing Better As Managers
Doing Better As Practitioners
A Better Future If We Can Keep It
Index
Das Medium aus der Maschine
“Computer science projects three very different images of the computer: machine, tool, medium. How can such contrasting conceptions find technological expression in one and the same artifact? What contradictions do such differing perspectives lead to in the research practice of computer science? What are the concepts through which they can be connected? And how does the weight of the images of machine, tool and medium shift over the history of the computer and of computer science?”
From the introduction
Unchanged reprint
Heidi Schelhowe, Prof. Dr., is Professor of Digital Media in Education in computer science at the University of Bremen, where she heads the research group "Digitale Medien in der Bildung" (dimeb).
Basiswissen Mobile App Testing
Fundamental methods, techniques and tools for testing mobile applications.
“Basiswissen Mobile App Testing” teaches the fundamentals of testing mobile apps and gives a well-founded overview of suitable test types, test methods, the test process and the test concept for mobile applications. It also covers quality criteria, mobile app platforms, tools and the automation of test execution. Many examples from real customer projects make it easier to put what has been learned into practice.
The topics in detail:
* Business and technical factors, challenges and risks, test strategies for mobile apps
* Tests related to the mobile platform
* Common test types and the test process for mobile apps
* Mobile app platforms, tools and environments
* Automation of test execution
The book conforms to the ISTQB® syllabus “Certified Mobile Application Tester” and, with its many examples and exercises, is not only well suited to exam preparation but also serves as a compact basic reference on the subject in practice and at universities.
About the authors:
Björn Lemke is Managing Consultant at trendig technology services GmbH. His work focuses on software quality assurance, Integrated Technology and Operations (ITOps), IT service management (ITIL), test management, test data management, test infrastructure management and mobile application testing in projects ranging from small to very large.
Nils Röttger works at imbus AG in Möhrendorf as a consultant, project manager and speaker and is responsible, among other things, for training and for the mobile testing area. In his talks he regularly addresses topics such as exploratory testing, usability and ethics in software testing.
Introducing .NET for Apache Spark
Get started using Apache Spark via C# or F# and the .NET for Apache Spark bindings. This book is an introduction to both Apache Spark and the .NET bindings. Readers new to Apache Spark will get up to speed quickly using Spark for data processing tasks performed against large and very large datasets. You will learn how to combine your knowledge of .NET with Apache Spark to bring massive computing power to bear by distributed processing of extremely large datasets across multiple servers.
This book covers how to get a local instance of Apache Spark running on your developer machine and shows you how to create your first .NET program that uses the Microsoft .NET bindings for Apache Spark. Techniques shown in the book allow you to use Apache Spark to distribute your data processing tasks over multiple compute nodes. You will learn to process data using both batch mode and streaming mode so you can make the right choice depending on whether you are processing an existing dataset or are working against new records in micro-batches as they arrive. The goal of the book is to leave you comfortable in bringing the power of Apache Spark to your favorite .NET language.
WHAT YOU WILL LEARN
* Install and configure Spark .NET on Windows, Linux, and macOS
* Write Apache Spark programs in C# and F# using the .NET bindings
* Access and invoke the Apache Spark APIs from .NET with the same high performance as Python, Scala, and R
* Encapsulate functionality in user-defined functions
* Transform and aggregate large datasets
* Execute SQL queries against files through Apache Hive
* Distribute processing of large datasets across multiple servers
* Create your own batch, streaming, and machine learning programs
WHO THIS BOOK IS FOR
.NET developers who want to perform big data processing without having to migrate to Python, Scala, or R; and Apache Spark developers who want to run natively on .NET and take advantage of the C# and F# ecosystems
ED ELLIOTT is a data engineer who has been working in IT for 20 years and has focused on data for the last 15 years. He uses Apache Spark at work and has been contributing to the Microsoft .NET for Apache Spark open source project since it was released in 2019. Ed has been blogging and writing since 2014 at his own blog as well as for SQL Server Central and Redgate. He has spoken at a number of events such as SQLBits, SQL Saturday, and the GroupBy conference.
Introduction
PART I. GETTING STARTED
1. Understanding Apache Spark
2. Setting up Spark
3. Programming with .NET for Apache Spark
PART II. THE APIS
4. User-Defined Functions
5. The DataFrame API
6. Spark SQL and Hive Tables
7. Spark Machine Learning API
PART III. EXAMPLES
8. Batch Mode Processing
9. Structured Streaming
10. Troubleshooting
11. Delta Lake
PART IV. APPENDICES
Appendix A. Running in the Cloud
Appendix B. Implementing .NET for Apache Spark Code
Becoming a Data Head
"TURN YOURSELF INTO A DATA HEAD. YOU'LL BECOME A MORE VALUABLE EMPLOYEE AND MAKE YOUR ORGANIZATION MORE SUCCESSFUL."Thomas H. Davenport, Research Fellow, Author of Competing on Analytics, Big Data @ Work, and The AI AdvantageYOU'VE HEARD THE HYPE AROUND DATA—NOW GET THE FACTS.In Becoming a Data Head: How to Think, Speak, and Understand Data Science, Statistics, and Machine Learning, award-winning data scientists Alex Gutman and Jordan Goldmeier pull back the curtain on data science and give you the language and tools necessary to talk and think critically about it.You'll learn how to:* Think statistically and understand the role variation plays in your life and decision making* Speak intelligently and ask the right questions about the statistics and results you encounter in the workplace* Understand what's really going on with machine learning, text analytics, deep learning, and artificial intelligence* Avoid common pitfalls when working with and interpreting dataBecoming a Data Head is a complete guide for data science in the workplace: covering everything from the personalities you’ll work with to the math behind the algorithms. The authors have spent years in data trenches and sought to create a fun, approachable, and eminently readable book. Anyone can become a Data Head—an active participant in data science, statistics, and machine learning. Whether you're a business professional, engineer, executive, or aspiring data scientist, this book is for you.ALEX J. GUTMAN, PHD, is a Data Scientist, Corporate Trainer, and Accredited Professional Statistician. His professional focus is on statistical and machine learning and he has extensive experience working as a Data Scientist for the Department of Defense and two Fortune 50 companies.JORDAN GOLDMEIER is a Data Scientist, author, speaker, and community leader. 
He is a seven-time recipient of the Microsoft Most Valuable Professional Award and he has taught analytics to members of the Pentagon and Fortune 500 companies.
Acknowledgments
Foreword
Introduction
PART ONE THINKING LIKE A DATA HEAD
CHAPTER 1 WHAT IS THE PROBLEM?
Questions a Data Head Should Ask
Why Is This Problem Important?
Who Does This Problem Affect?
What If We Don’t Have the Right Data?
When Is the Project Over?
What If We Don’t Like the Results?
Understanding Why Data Projects Fail
Customer Perception
Discussion
Working on Problems That Matter
Chapter Summary
CHAPTER 2 WHAT IS DATA?
Data vs. Information
An Example Dataset
Data Types
How Data Is Collected and Structured
Observational vs. Experimental Data
Structured vs. Unstructured Data
Basic Summary Statistics
Chapter Summary
CHAPTER 3 PREPARE TO THINK STATISTICALLY
Ask Questions
There Is Variation in All Things
Scenario: Customer Perception (The Sequel)
Case Study: Kidney-Cancer Rates
Probabilities and Statistics
Probability vs. Intuition
Discovery with Statistics
Chapter Summary
PART TWO SPEAKING LIKE A DATA HEAD
CHAPTER 4 ARGUE WITH THE DATA
What Would You Do?
Missing Data Disaster
Tell Me the Data Origin Story
Who Collected the Data?
How Was the Data Collected?
Is the Data Representative?
Is There Sampling Bias?
What Did You Do with Outliers?
What Data Am I Not Seeing?
How Did You Deal with Missing Values?
Can the Data Measure What You Want It to Measure?
Argue with Data of All Sizes
Chapter Summary
CHAPTER 5 EXPLORE THE DATA
Exploratory Data Analysis and You
Embracing the Exploratory Mindset
Questions to Guide You
The Setup
Can the Data Answer the Question?
Set Expectations and Use Common Sense
Do the Values Make Intuitive Sense?
Watch Out: Outliers and Missing Values
Did You Discover Any Relationships?
Understanding Correlation
Watch Out: Misinterpreting Correlation
Watch Out: Correlation Does Not Imply Causation
Did You Find New Opportunities in the Data?
Chapter Summary
CHAPTER 6 EXAMINE THE PROBABILITIES
Take a Guess
The Rules of the Game
Notation
Conditional Probability and Independent Events
The Probability of Multiple Events
Two Things That Happen Together
One Thing or the Other
Probability Thought Exercise
Next Steps
Be Careful Assuming Independence
Don’t Fall for the Gambler’s Fallacy
All Probabilities Are Conditional
Don’t Swap Dependencies
Bayes’ Theorem
Ensure the Probabilities Have Meaning
Calibration
Rare Events Can, and Do, Happen
Chapter Summary
CHAPTER 7 CHALLENGE THE STATISTICS
Quick Lessons on Inference
Give Yourself Some Wiggle Room
More Data, More Evidence
Challenge the Status Quo
Evidence to the Contrary
Balance Decision Errors
The Process of Statistical Inference
The Questions You Should Ask to Challenge the Statistics
What Is the Context for These Statistics?
What Is the Sample Size?
What Are You Testing?
What Is the Null Hypothesis?
Assuming Equivalence
What Is the Significance Level?
How Many Tests Are You Doing?
Can I See the Confidence Intervals?
Is This Practically Significant?
Are You Assuming Causality?
Chapter Summary
PART THREE UNDERSTANDING THE DATA SCIENTIST’S TOOLBOX
CHAPTER 8 SEARCH FOR HIDDEN GROUPS
Unsupervised Learning
Dimensionality Reduction
Creating Composite Features
Principal Component Analysis
Principal Components in Athletic Ability
PCA Summary
Potential Traps
Clustering
k-Means Clustering
Clustering Retail Locations
Potential Traps
Chapter Summary
CHAPTER 9 UNDERSTAND THE REGRESSION MODEL
Supervised Learning
Linear Regression: What It Does
Least Squares Regression: Not Just a Clever Name
Linear Regression: What It Gives You
Extending to Many Features
Linear Regression: What Confusion It Causes
Omitted Variables
Multicollinearity
Data Leakage
Extrapolation Failures
Many Relationships Aren’t Linear
Are You Explaining or Predicting?
Regression Performance
Other Regression Models
Chapter Summary
CHAPTER 10 UNDERSTAND THE CLASSIFICATION MODEL
Introduction to Classification
What You’ll Learn
Classification Problem Setup
Logistic Regression
Logistic Regression: So What?
Decision Trees
Ensemble Methods
Random Forests
Gradient Boosted Trees
Interpretability of Ensemble Models
Watch Out for Pitfalls
Misapplication of the Problem
Data Leakage
Not Splitting Your Data
Choosing the Right Decision Threshold
Misunderstanding Accuracy
Confusion Matrices
Chapter Summary
CHAPTER 11 UNDERSTAND TEXT ANALYTICS
Expectations of Text Analytics
How Text Becomes Numbers
A Big Bag of Words
N-Grams
Word Embeddings
Topic Modeling
Text Classification
Naïve Bayes
Sentiment Analysis
Practical Considerations When Working with Text
Big Tech Has the Upper Hand
Chapter Summary
CHAPTER 12 CONCEPTUALIZE DEEP LEARNING
Neural Networks
How Are Neural Networks Like the Brain?
A Simple Neural Network
How a Neural Network Learns
A Slightly More Complex Neural Network
Applications of Deep Learning
The Benefits of Deep Learning
How Computers “See” Images
Convolutional Neural Networks
Deep Learning on Language and Sequences
Deep Learning in Practice
Do You Have Data?
Is Your Data Structured?
What Will the Network Look Like?
Artificial Intelligence and You
Big Tech Has the Upper Hand
Ethics in Deep Learning
Chapter Summary
PART FOUR ENSURING SUCCESS
CHAPTER 13 WATCH OUT FOR PITFALLS
Biases and Weird Phenomena in Data
Survivorship Bias
Regression to the Mean
Simpson’s Paradox
Confirmation Bias
Effort Bias (aka the “Sunk Cost Fallacy”)
Algorithmic Bias
Uncategorized Bias
The Big List of Pitfalls
Statistical and Machine Learning Pitfalls
Project Pitfalls
Chapter Summary
CHAPTER 14 KNOW THE PEOPLE AND PERSONALITIES
Seven Scenes of Communication Breakdowns
The Postmortem
Storytime
The Telephone Game
Into the Weeds
The Reality Check
The Takeover
The Blowhard
Data Personalities
Data Enthusiasts
Data Cynics
Data Heads
Chapter Summary
CHAPTER 15 WHAT’S NEXT?
Index
Beginning HCL Programming
Get started with programming and using the HashiCorp Configuration Language (HCL). This book introduces you to the HCL syntax and its ecosystem, then shows you how to integrate it as part of an overall DevOps approach.

Next, you’ll learn how to implement infrastructure as code, specifically using Terraform templates; Terraform is a cloud infrastructure automation tool. As part of this discussion, you’ll cover Consul, a service mesh solution providing a full-featured control plane with service discovery, configuration, and segmentation functionality. You’ll integrate these with Vault to build HCL-based infrastructure as code solutions.

Finally, you’ll use Jenkins and HCL to provision and maintain the infrastructure as code system. After reading and using Beginning HCL Programming, you'll have the know-how and source code to get started with flexible HCL for all your cloud and DevOps needs.

WHAT YOU WILL LEARN
* Get started with programming and using HCL
* Use Vault, Consul, and Terraform
* Apply HCL to infrastructure as code
* Define the Terraform template with HCL
* Configure Consul using HCL
* Use HCL to configure Vault
* Provision and maintain infrastructure as code using Jenkins and HCL

WHO THIS BOOK IS FOR
Anyone new to HCL who has at least some prior programming experience as well as general knowledge of DevOps.

PIERLUIGI RITI is a senior DevOps engineer at Coupa Software and Synchronoss Technologies. Prior to that, he was a senior software engineer at Ericsson and Tata. His experience includes implementing DevOps in the cloud using Google Cloud Platform as well as AWS and Azure. He also has over ten years of experience in the general design and development of applications at different scales, particularly in the telco and financial industries.
He has quality development skills using the latest technologies including Java, J2EE, C#, F#, .NET, Spring.NET, EF, WPF, WF, WinForms, WebAPI, MVC, NUnit, Scala, Spring, JSP, EJB, Struts, Struts2, SOAP, REST, C, C++, Hibernate, NHibernate, WebLogic, XML, XSLT, Unix script, Ruby, and Python.

DAVID FLYNN is an Associate Analyst in Employee Access Business Operations at Mastercard. He is an electronic engineer with experience in telecommunications, networks, software, security, and financial systems. David started out as a telecommunications engineer working on voice, data, and wireless systems for Energis and later Nortel Networks, supporting systems such as Lucent G3r, Alcatel E10, and Nortel Passport. He then spent some time in transport and private security abroad before retraining in computing, cyber security, and cloud systems, and doing cyber security and telecom research for the Civil Service. He has completed separate diplomas in computing and cloud, focusing on Windows, C#, Google, AWS, and PowerShell, among other technologies. David has also worked as a C# engineer.
More recently, David has worked for various fintech companies, including Bank of America Merrill Lynch, focusing on technical and application support encompassing such technologies as RSA IGL, RSA SecurID, IBM TAM/ISAM, Postgres/Oracle databases, mainframe, Tandem, CyberArk, MaxPro, and Active Directory.

1. Introduction to HCL
The history of HCL, its basic configuration syntax, and its basic usage.
2. The HashiCorp Ecosystem
The different software created by HashiCorp, such as Vault, Consul, and Terraform.
3. Introduction to Go
A short introduction to the Go language, which is used to define the configuration templates described in the book.
4. Infrastructure as Code
What infrastructure as code is and how to implement it.
5. Introduction to the Cloud and DevOps
A short introduction to the cloud and DevOps.
6. Use HCL for Terraform
Using HCL to define Terraform templates.
7. Consul HCL
An introduction to HCL for Consul and how to configure Consul with it.
8. Vault HCL
Using HCL to configure Vault.
9. Infrastructure as Code with HCL
Designing infrastructure as code with the HashiCorp language, in particular with Terraform, Vault, and Consul.
10. Provisioning and Maintaining the Infrastructure as Code
How to use Jenkins and HCL to provision and maintain the infrastructure as code.
Practical Internet Server Configuration
Learn the skills to complete the full installation, configuration, and maintenance of an enterprise class internet server, no matter what Unix-like operating system you prefer. This book will rapidly guide you towards real system administration, with clear explanations along the way.

After a chapter explaining the most important Unix basics, you will start with a vanilla server as delivered by a hosting provider and by the end of the book, you will have a fully functional and well-secured enterprise class internet server. You will also be equipped with the expertise needed to keep your server secured and up to date. All configuration examples are given for FreeBSD, Debian and CentOS, so you are free to choose your operating system.

No single blueprint exists for an internet server, and an important part of the work of a system administrator consists of analyzing, interpreting and implementing specific wishes, demands and restrictions from different departments and viewpoints within an organization. Practical Internet Server Configuration provides the information you need to succeed as a sysadmin.

WHAT YOU'LL LEARN
* Configure DNS using BIND 9
* Set up Apache and Nginx
* Customize a mail server: IMAP (Dovecot) and SMTP (Postfix), spam filtering included
* Authenticate mail users using LDAP
* Install and maintain MariaDB and PostgreSQL databases
* Prepare SSL/TLS certificates for the encryption of web, mail and LDAP traffic
* Synchronize files, calendars and address books between devices
* Build a firewall: PF for FreeBSD and nftables for Linux

WHO THIS BOOK IS FOR
This book can be used by aspiring and beginning system administrators who are working on personal servers, or more experienced system administrators who may know Unix well but need a reference book for the more specialized work that falls outside the daily routine. Basic understanding of Unix and working on the command line is necessary.

Robert La Lau has been active on the internet since the mid-90s.
What started as a hobby – playing around with Linux, and developing small games and applications using Perl, HTML and JavaScript – turned into a job when he became a full-time freelance web developer in 1999. Shortly thereafter, a web hosting server and freelance Linux and FreeBSD administration were added. In the years that followed, new programming languages were learned, and software development was added to the range of services offered. In his spare time, Rob was involved in several smaller and larger open source projects; among other things, he was the initiator and first administrator of the official online KDE forums. After 15 years of freelance work, Rob thought he'd had enough of IT, wrapped up his ongoing affairs, and left the Netherlands to discover the world. However, IT kept calling him, and once settled in his new home country, France, he decided to return to his old métier. Only this time, it was not to get his own hands dirty in the field, executing orders for clients, but to pass his knowledge and experience on to the next generations of system administrators and developers. He rebooted his IT career translating and narrating educational books and videos, taught some Unix classes, and now seems to have found his calling in publishing books.

1. Introduction and Preparations
2. Unix and POSIX in a Few Words
3. Software Management
4. Network (Base) and Firewall
5. User Management and Permissions
6. Domain Name System (DNS)
7. Secure Shell (SSH)
8. Task Scheduling
9. Web Server Part 1: Apache/Nginx Basics
10. Traffic Encryption: SSL/TLS
11. Databases
12. Email Basics
13. Web Server Part 2: Advanced Apache/Nginx
14. Advanced Email
15. Backup and Monitoring
16. Taking it Further
Essential TypeScript 4
Learn the essentials and more of TypeScript, a popular superset of the JavaScript language that adds support for static typing. TypeScript combines the typing features of C# or Java with the flexibility of JavaScript, reducing typing errors and providing an easier path to JavaScript development.

Author ADAM FREEMAN explains how to get the most from TypeScript 4 in this second edition of his best-selling book.

He begins by describing the TypeScript language and the benefits it offers and then shows you how to use TypeScript in real-world scenarios, including development with the DOM API, and popular frameworks such as Angular and React. He starts from the nuts-and-bolts and builds up to the most advanced and sophisticated features.

Each topic is covered clearly and concisely, and is packed with the details you need to be effective. The most important features are given a no-nonsense, in-depth treatment and chapters include common problems and teach you how to avoid them.

WHAT YOU WILL LEARN
* Gain a solid understanding of the TypeScript language and tools
* Use TypeScript for client- and server-side development
* Extend and customize TypeScript
* Test your TypeScript code
* Apply TypeScript with the DOM API, Angular, React, and Vue.js

WHO THIS BOOK IS FOR
JavaScript developers who want to use TypeScript to create client-side or server-side applications

ADAM FREEMAN is an experienced IT professional who has held senior positions at a range of companies, most recently serving as chief technology officer and chief operating officer of a global bank. Now retired, he spends his time writing and long-distance running.

PART 1 - GETTING STARTED WITH TYPESCRIPT
1. Your First TypeScript Application
2. Understanding TypeScript
3. JavaScript Types Primer, Part 1
4. JavaScript Types Primer, Part 2
5. Using the TypeScript Compiler
6. Testing and Debugging TypeScript
PART 2 - WORKING WITH TYPESCRIPT
7. Understanding Static Types
8. Using Functions
9. Using Arrays, Tuples and Enums
10. Working with Objects
11. Working with Classes and Interfaces
12. Using Generic Types
13. Advanced Generic Types
14. Working with JavaScript
PART 3 - CREATING WEB APPLICATIONS
15. Creating a Stand-Alone Web App, Part 1
16. Creating a Stand-Alone Web App, Part 2
17. Creating an Angular App, Part 1
18. Creating an Angular App, Part 2
19. Creating a React App, Part 1
20. Creating a React App, Part 2
21. Creating a Vue.js App, Part 1
22. Creating a Vue.js App, Part 2
Deep Learning with Python
Master the practical aspects of implementing deep learning solutions with PyTorch, using a hands-on approach to understanding both theory and practice. This updated edition will prepare you for applying deep learning to real world problems with a sound theoretical foundation and practical know-how with PyTorch, a platform developed by Facebook’s Artificial Intelligence Research Group.

You'll start with a perspective on how and why deep learning with PyTorch has emerged as a path-breaking framework with a set of tools and techniques to solve real-world problems. Next, the book will ground you in the mathematical fundamentals of linear algebra, vector calculus, probability and optimization. Having established this foundation, you'll move on to key components and functionality of PyTorch including layers, loss functions and optimization algorithms.

You'll also gain an understanding of Graphics Processing Unit (GPU) based computation, which is essential for training deep learning models. All the key architectures in deep learning are covered, including feedforward networks, convolutional neural networks, recurrent neural networks, long short-term memory networks, autoencoders and generative adversarial networks.
Backed by a number of tricks of the trade for training and optimizing deep learning models, this edition of Deep Learning with Python explains the best practices in taking these models to production with PyTorch.

WHAT YOU'LL LEARN
* Review machine learning fundamentals such as overfitting, underfitting, and regularization
* Understand deep learning fundamentals such as feed-forward networks, convolutional neural networks, recurrent neural networks, automatic differentiation, and stochastic gradient descent
* Apply in-depth linear algebra with PyTorch
* Explore PyTorch fundamentals and its building blocks
* Work with tuning and optimizing models

WHO THIS BOOK IS FOR
Beginners with a working knowledge of Python who want to understand Deep Learning in a practical, hands-on manner.

Nikhil S. Ketkar currently leads the Machine Learning Platform team at Flipkart, India’s largest e-commerce company. He received his Ph.D. from Washington State University. Following that he conducted postdoctoral research at the University of North Carolina at Charlotte, which was followed by a brief stint in high frequency trading at Transmarket in Chicago. More recently he led the data mining team at Guavus, a startup doing big data analytics in the telecom domain, and at Indix, a startup doing data science in the e-commerce domain. His research interests include machine learning and graph theory.

Jojo Moolayil is an artificial intelligence, deep learning, machine learning, and decision science professional with over five years of industrial experience and is a published author of the book Smarter Decisions – The Intersection of IoT and Decision Science. He has worked with several industry leaders on high-impact and critical data science and machine learning projects across multiple verticals. He is currently associated with Amazon Web Services as a research scientist. He was born and raised in Pune, India and graduated from the University of Pune with a major in Information Technology Engineering.
He started his career with Mu Sigma Inc., the world’s largest pure-play analytics provider, and worked with the leaders of many Fortune 50 clients. He later worked with Flutura, an IoT analytics startup, and GE. He currently resides in Vancouver, BC. Apart from writing books on decision science and IoT, Jojo has also been a technical reviewer for various books on machine learning, deep learning and business analytics with Apress and Packt publications. He is an active data science tutor and maintains a blog at http://blog.jojomoolayil.com.

CHAPTER 1 – INTRODUCTION TO DEEP LEARNING
A brief introduction to Machine Learning and Deep Learning. We explore foundational topics that provide the building blocks for several later topics.

CHAPTER 2 – INTRODUCTION TO PYTORCH
A quick-start guide to PyTorch and a comprehensive introduction to tensors, linear algebra and mathematical operations for tensors. The chapter provides the PyTorch foundations readers need to meaningfully implement practical Deep Learning solutions for various topics within the book. Advanced PyTorch topics are explored as they come up in the exercises of later chapters.

CHAPTER 3 – FEED FORWARD NETWORKS
In this chapter, we explore the building blocks of a neural network and build an intuition for training and evaluating networks. We briefly explore the loss functions, activation functions, optimizers, and backpropagation that can be used for training. Finally, we stitch these smaller components together into a full-fledged feed-forward neural network with PyTorch.

CHAPTER 4 – AUTOMATIC DIFFERENTIATION IN DEEP LEARNING
In this chapter we open up the black box within backpropagation that enables the training of neural networks, i.e. automatic differentiation.
We cover a brief history of other techniques that were ruled out in favor of automatic differentiation, study the topic with a practical example, and implement it using PyTorch's Autograd module.

CHAPTER 5 – TRAINING DEEP NEURAL NETWORKS
In this chapter we explore a few additional important topics around deep learning and implement them in a practical example. We delve into the specifics of model performance and study overfitting and underfitting, hyperparameter tuning and regularization in detail. Finally, we leverage a real dataset and combine our learnings from the beginning of this book into a practical example using PyTorch.

CHAPTER 6 – CONVOLUTIONAL NEURAL NETWORKS
Introduction to Convolutional Neural Networks for Computer Vision. We explore the core components of CNNs with examples to understand the internals of the network, build an intuition around automated feature extraction and parameter sharing, and thus understand the holistic process of training CNNs with incremental building blocks. We also leverage hands-on exercises to study the practical implementation of CNNs for a simple dataset, i.e. MNIST (classification of handwritten digits), and later extend the exercise to a binary classification use case with the popular cats and dogs dataset.

CHAPTER 7 – RECURRENT NEURAL NETWORKS
Introduction to Recurrent Neural Networks and their variants (viz. Bidirectional RNNs and LSTMs). We explore the construction of a recurrent unit, study the mathematical background and build intuition around how RNNs are trained by exploring a simple four-step unrolled network. We then explore hands-on exercises in natural language processing that leverage vanilla RNNs and later improve their performance by using Bidirectional RNNs combined with LSTM layers.

CHAPTER 8 – RECENT ADVANCES IN DEEP LEARNING
A brief note on the cutting-edge advancements in the field.
We explore important inventions within the field without implementation details, focusing instead on the applications and the path forward.
Pro ASP.NET Core Identity
Get the most from ASP.NET Core Identity. Best-selling author ADAM FREEMAN teaches developers common authentication and user management scenarios and explains how they are implemented in applications. He covers each topic clearly and concisely, and the book is packed with the essential details you need to be effective.

The book takes a deep dive into the Identity framework and explains how the most important and useful features work in detail, creating custom implementations of key components to reveal the inner workings of ASP.NET Core Identity. ASP.NET Core Identity provides authentication and user management for ASP.NET Core applications. Identity is a complex framework in its own right, with support for a wide range of features, including authenticating users with services provided by Google, Facebook, and Twitter.

WHAT YOU WILL LEARN
* Gain a solid understanding of how Identity provides authentication and authorization for ASP.NET Core applications
* Configure ASP.NET Core Identity for common application scenarios, including self-service registration, user management, and authentication with services provided by popular social media platforms
* Create robust and reliable user management tools
* Understand how Identity works in detail

WHO THIS BOOK IS FOR
Developers with advanced knowledge of ASP.NET Core who are introducing Identity into their projects. Prior experience and knowledge of C# and ASP.NET Core is required, along with a basic understanding of authentication and authorization concepts.

ADAM FREEMAN is an experienced IT professional who has held senior positions in a range of companies, most recently serving as chief technology officer and chief operating officer of a global bank. Now retired, he spends his time writing and long-distance running.

Part 1 - Using ASP.NET Core Identity
1. Getting Ready
2. Your First Identity Application
3. Creating the Example Project
4. Using the Identity UI
5. Configuring Identity
6. Adapting Identity UI
7. Using the Identity API
8. Signing In and Out and Managing Passwords
9. Creating and Deleting Accounts
10. Using Roles and Claims
11. Two-Factor and External Authentication
12. Authenticating API Clients
Part 2 - Understanding ASP.NET Core Identity
13. Creating the Example Project
14. Working with ASP.NET Core
15. Authorizing Requests
16. Creating a User Store
17. Claims, Roles, and Confirmations
18. Signing In with Identity
19. Creating a Role Store
20. Lockouts and Two-Factor Sign Ins
21. Authenticators and Recovery Codes
22. External Authentication - Part 1
23. External Authentication - Part 2
Practical Machine Learning for Streaming Data with Python
Design, develop, and validate machine learning models with streaming data using the Scikit-Multiflow framework. This book is a quick start guide for data scientists and machine learning engineers looking to implement machine learning models for streaming data with Python to generate real-time insights.

You'll start with an introduction to streaming data, the various challenges associated with it, some of its real-world business applications, and various windowing techniques. You'll then examine incremental and online learning algorithms and the concept of model evaluation with streaming data, and get introduced to the Scikit-Multiflow framework in Python. This is followed by a review of the various change detection/concept drift detection algorithms and their implementation on various datasets using Scikit-Multiflow.

The various supervised and unsupervised algorithms for streaming data, and their implementation on various datasets using Python, are also covered. The book concludes by briefly covering other open-source tools available for streaming data such as Spark, MOA (Massive Online Analysis), Kafka, and more.

WHAT YOU'LL LEARN
* Understand machine learning with streaming data concepts
* Review incremental and online learning
* Develop models for detecting concept drift
* Explore techniques for classification, regression, and ensemble learning in streaming data contexts
* Apply best practices for debugging and validating machine learning models in streaming data context
* Get introduced to other open-source frameworks for handling streaming data

WHO THIS BOOK IS FOR
Machine learning engineers and data science professionals

Dr. Sayan Putatunda is an experienced data scientist and researcher. He holds a Ph.D. in Applied Statistics/Machine Learning from the Indian Institute of Management, Ahmedabad (IIMA), where his research was on streaming data and its applications in the transportation industry.
He has rich experience working in both senior individual contributor and managerial roles in the data science industry with multiple companies such as Amazon, VMware, Mu Sigma, and more. His research interests are in streaming data, deep learning, machine learning, spatial point processes, and directional statistics. As a researcher, he has multiple publications in top international peer-reviewed journals with reputed publishers. He has presented his work at various reputed international machine learning and statistics conferences. He is also a member of IEEE.

Chapter 1: An Introduction to Streaming Data
Chapter Goal: Introduce the readers to the concept of streaming data, the various challenges associated with it, some of its real-world business applications, and various windowing techniques, along with the concepts of incremental and online learning algorithms. This chapter will also help in understanding the concept of model evaluation in the case of streaming data and provide an introduction to the Scikit-Multiflow framework in Python.
Sub-Topics:
1. Streaming data
2. Challenges of streaming data
3. Concept drift
4. Applications of streaming data
5. Windowing techniques
6. Incremental learning and online learning
7. Illustration: Adapting batch learners into incremental learners
8. Introduction to Scikit-Multiflow framework
9. Evaluation of streaming algorithms

Chapter 2: Change Detection
Chapter Goal: Help the readers to understand the various change detection/concept drift detection algorithms and their implementation on various datasets using Scikit-Multiflow.
Sub-Topics:
1. Change detection problem
2. Concept drift detection algorithms
3. ADWIN
4. DDM
5. EDDM
6. Page Hinkley

Chapter 3: Supervised and Unsupervised Learning for Streaming Data
Chapter Goal: Help the readers to understand the various regression and classification (including Ensemble Learning) algorithms for streaming data and their implementation on various datasets using Scikit-Multiflow.
Also, discuss some approaches for clustering with streaming data and their implementation using Python.
Sub-Topics:
1. Regression with streaming data
2. Classification with streaming data
3. Ensemble Learning with streaming data
4. Clustering with streaming data

Chapter 4: Other Tools and the Path Forward
Chapter Goal: Introduce the readers to the other open source tools for handling streaming data such as Spark streaming, MOA and more. Also, educate the reader about additional reading for advanced topics within streaming data analysis.
Sub-Topics:
1. Other tools for handling streaming data
1.1. Apache Spark
1.2. Massive Online Analysis (MOA)
1.3. Apache Kafka
2. Active research areas and breakthroughs in streaming data analysis
3. Conclusion
Protective Security
This book shows you how military counter-intelligence principles and objectives are applied. It provides you with valuable advice and guidance to help your business understand threat vectors and the measures needed to reduce the risks and impacts to your organization. You will learn how business-critical assets are compromised: cyberattack, data breach, system outage, pandemic, natural disaster, and many more.

Rather than being compliance-centric, this book focuses on how your business can identify the assets that are most valuable to your organization and the threat vectors associated with these assets. You will learn how to apply appropriate mitigation controls to reduce the risks within suitable tolerances.

You will gain a comprehensive understanding of the value that effective protective security provides and how to develop an effective strategy for your type of business.

WHAT YOU WILL LEARN
* Take a deep dive into legal and regulatory perspectives and how an effective protective security strategy can help fulfill these ever-changing requirements
* Know where compliance fits into a company-wide protective security strategy
* Secure your digital footprint
* Build effective 5 Ds network architectures: Defend, detect, delay, disrupt, deter
* Secure manufacturing environments with a minimal impact on productivity
* Secure your supply chains and apply the measures needed to ensure that risks are minimized

WHO THIS BOOK IS FOR
Business owners, C-suite, information security practitioners, CISOs, cybersecurity practitioners, risk managers, IT operations managers, IT auditors, and military enthusiasts

JIM (JAMES) SEAMAN has been dedicated to the pursuit of security for his entire adult life.
He served 22 years in the RAF Police, covering a number of specialist areas (physical security, aviation security, information security management, IT security management, cyber security management, security investigations, intelligence operations, incident response and disaster recovery), before successfully transitioning his skills to corporate environments (financial services, banking, retail, manufacturing, ecommerce, marketing, etc.) to help businesses enhance their cyber/InfoSec defensive measures, working with various industry security standards.

CHAPTER 1: WHAT IS PROTECTIVE SECURITY (PS)?
An introduction to the term ‘Protective Security’ and a description of why it differs from other industry terms (e.g. Cyber Security, Information Security, IT Security, Network Security, etc.).
Why PS should be integral to your business operations.

CHAPTER 2: PROTECTIVE SECURITY (PS) IN TERMS OF THE LEGAL & REGULATORY PERSPECTIVE
A deep dive into the legal and regulatory perspectives and how an effective PS strategy can help fulfil these ever-changing requirements.
PS and the European Union General Data Protection Regulation (EU-GDPR).

CHAPTER 3: THE INTEGRATION OF COMPLIANCE WITH PROTECTIVE SECURITY (PS)
A description of where compliance fits into a company-wide PS strategy.
PS and the Payment Card Industry Data Security Standard (PCI DSS).

CHAPTER 4: THE DEVELOPMENT OF AN EFFECTIVE PROTECTIVE SECURITY (PS) STRATEGY
A comprehensive guide to the development of an effective strategy, aligning business assets to their importance for the business objectives and goals, to incorporate the threats, risks, and core components of any strategy.
Strategic alignment with the business context.

CHAPTER 5: CYBER SECURITY
A deep dive into the concept of Cyber Security, with a focus on Points of Origin (PoO) that occur in the ‘Badlands’ (e.g. outside the corporate network) to compromise internet-facing technologies (e.g. Ecommerce, Digital, Mobile, etc.).
Securing your Digital Footprint.

CHAPTER 6: NETWORK/IT SECURITY
The importance of secure by design/default networks to help safeguard your most important business IT assets from compromise.
Lateral Movement Attacks.

CHAPTER 7: INFORMATION SYSTEMS SECURITY
A guide to securing these systems, as a separate asset type, based upon the value of the data assets to the business, and to applying the 5 Ds of Security (Defend, Detect, Delay, Disrupt & Deter).
Building Effective 5 Ds Network Architectures.

CHAPTER 8: PHYSICAL SECURITY
A comprehensive guide to the development of appropriate physical security measures and their importance within the Protective Security strategy.
Fortifying Your Business Operations.

CHAPTER 9: INDUSTRIAL SYSTEMS SECURITY
Increasingly, manufacturing systems are vulnerable to cyber-attacks. Gain an insight into how securing these environments can be balanced with a minimal impact on productivity.
Manufacturing Secure Operations.

CHAPTER 10: SECURING YOUR SUPPLY CHAIN
Gain an appreciation for securing your supply chains and the measures needed to ensure that supply chain risks are minimized.
The Weakest Link?

CHAPTER 11: DEVELOPING YOUR INTERNAL FIREWALL
A focus on the development of a robust security culture, through proactive engagement with a business’ personnel assets.
Security Is Not A Dirty Word.

CHAPTER 12: STRICT ACCESS RESTRICTIONS
‘Need To Know’ and ‘Need To Access’ are the fundamental principles of any effective Protective Security strategy.
Gain an insight into why this is the case and how to ensure that these principles are applied within your organization.
The Keys To Your Empire.

CHAPTER 13: BUILDING RESILIENT SYSTEMS
Gain an appreciation for the business value of building resilient systems and an understanding of what is required to build resilience into your PS strategy.
The Ability To ‘Bounce Back’.

CHAPTER 14: DEMONSTRATING THE PROTECTIVE SECURITY (PS) RETURN ON INVESTMENT (ROI)
The value of an effective PS strategy is often underappreciated by business leaders. Gain an understanding of how to demonstrate to them that their investments continue to deliver a robust security posture and continue to ensure that the business remains a less viable target.
The Value of PS.
Datenschutz nach DS-GVO und Informationssicherheit gewährleisten
In many companies and public authorities, numerous procedures must meet the requirements of both data protection and information security. What could be more natural than determining the necessary safeguards through a single, unified approach? With this work, the author gives practitioners a guide that can be applied equally to simple and complex procedures.

The first part uses the ZAWAS process to show how the requirements of the GDPR (DS-GVO), including the data protection impact assessment (DSFA), can be implemented. In the second part of the book, the author shows how a small extension of the process allows the same approach to be used to determine the safeguards required for information security.

This approach reduces the overall effort and leads to a higher level of protection.

STEFAN MIEROWSKI, MSc., Dipl. Finanzwirt (FH), studied computer science and law. He is an officer (Referent) at the State Commissioner for Data Protection of Lower Saxony (Landesbeauftragte für den Datenschutz Niedersachsen), a former officer at the BSI, a certified ISO 27001 auditor, and the creator of the ZAWAS process.

Starting point: the demands of digitalization.- Overview of information security and data protection.- The process for selecting appropriate safeguards (ZAWAS).- Examining whether the ZAWAS process can be transferred to information security.- Conclusion.- Summary
Introducing Blockchain with Lisp
Implement a blockchain from scratch, covering all the details, with Racket, a general-purpose Lisp. You'll start by exploring what a blockchain is, so you have a solid foundation for the rest of the book. You'll then be ready to learn Racket before starting on your blockchain implementation. Once you have a working blockchain, you'll move on to extending it. The book's appendices provide supporting resources to help you in your blockchain projects. The recommended approach for the book is to follow along and write the code as it’s being explained instead of reading passively. This way you will get the most out of it. All of the source code is available for free download from GitHub.
WHAT YOU WILL LEARN
* Discover the Racket programming language and how to use it
* Implement a blockchain from scratch using Lisp
* Implement smart contracts and peer-to-peer support
* Learn how to use macros to employ more general abstractions
WHO THIS BOOK IS FOR
Novices who have at least some experience with programming, as well as some basic working experience with computers. The book also assumes some experience with high school mathematics, such as functions.
Boro Sitnikovski has over ten years of experience working professionally as a software engineer. He started programming with assembly on an Intel x86 at the age of ten. While in high school, he won several prizes in competitive programming, including 4th, 3rd, and 1st place finishes. He is an informatics graduate: his bachelor’s thesis was titled “Programming in Haskell using algebraic data structures”, and his master’s thesis was titled “Formal verification of Instruction Sets in Virtual Machines”. He has also published a few papers on software verification. His other research interests include programming languages, mathematics, logic, algorithms, and writing correct software. He is a strong believer in the open-source philosophy and contributes to various open-source projects. In his spare time, he enjoys some time off with his family.
1: Introduction to Blockchain.- 2: Racket Programming Language.- 3: Blockchain Implementation.- 4: Extending the Blockchain.- Conclusion.- Further Reading.- Appendix A: Macros
Scrum Master 2.0
The next level - new release in 04/2021! This book is written for Scrum Masters who have found that the theory of Scrum alone does not get them any further. After all, we work with and for a team of people who have their weaknesses, strengths, and quirks. The official Scrum Guide covers only a small part of what the work actually involves. This is where the book "Scrum Master 2.0" comes in: after the theoretical foundations of this agile framework, it turns to working with the team and to the day-to-day shaping of the Scrum Master's work, including approaches, tools, and interventions. Scrum Master 2.0 starts where the Scrum Guide ends. Each chapter concentrates on a different area of practice. Topics include team development and motivation, agile concepts, visualization, stress prevention, communication, coaching, contact management, agile facilitation, and much more. This makes the book an indispensable tool for every Scrum Master. Knowledge of the Scrum framework is assumed. All topics can be integrated simply and effectively into everyday work.
Neuronale Netze mit C# programmieren
With practical examples of machine learning in enterprise use. Do you want to develop neural networks and machine learning algorithms with C#? This book offers an accessible introduction to the fundamentals and shows how to put neural networks and machine learning algorithms to practical use in your own projects. Working through examples, you create and train your first neural network for the predictive maintenance of a production machine. In the practical part, you then learn how to use TensorFlow models in ML.NET or to use Infer.NET directly. You also use predictive and sentiment analysis to become familiar with machine learning algorithms. All projects presented in the book are programmed in C# and are available for download. Basic knowledge of C# is assumed. All projects can be carried out without substantial computing resources. Daniel Basler works as a lead developer and software architect. His focus areas are cross-platform apps, Android, JavaScript, and Microsoft technologies. Among other things, he develops software for rack and floor storage systems as well as plant visualization, and increasingly uses machine learning methods in this field. He also regularly writes articles for the trade magazines dotnetpro and web&mobile Developer.
Stochastic Approaches to Electron Transport in Micro- and Nanostructures
The book serves as a synergistic link between the development of mathematical models and the emergence of stochastic (Monte Carlo) methods applied to the simulation of current transport in electronic devices. Regarding the models, it traces the historical evolution from the classical charge carrier transport models of microelectronics to current quantum-based nanoelectronics. Accordingly, the solution methods are elucidated from the early phenomenological single-particle algorithms applicable to stationary homogeneous physical conditions up to the complex algorithms required for quantum transport, based on particle generation and annihilation. The book fills the gap between monographs focusing on the development of the theory and the physical aspects of models, their application, and their solution methods, and monographs dealing with purely theoretical approaches for finding stochastic solutions of Fredholm integral equations. Part I Aspects of Electron Transport Modeling: 1. Concepts of Device Modeling.- 2. The Semiconductor Model: Fundamentals.- 3. Transport Theories in Phase Space.- 4. Monte Carlo Computing.- Part II Stochastic Algorithms for Boltzmann Transport: 5. Homogeneous Transport: Empirical Approach.- 6. Homogeneous Transport: Stochastic Approach.- 7. Small Signal Analysis.- 8. Inhomogeneous Stationary Transport.- 9. General Transport: Self-Consistent Mixed Problem.- 10. Event Biasing.- Part III Stochastic Algorithms for Quantum Transport: 11. Wigner Function Modeling.- 12. Evolution in a Quantum Wire.- 13. Hierarchy of Kinetic Models.- 14. Stationary Quantum Particle Attributes.- 15. Transient Quantum Particle Attributes.