Data Wrangling
Data storage refers to the recording (storing) of information in a storage medium. Traditional methods include handwriting, phonographic recording, and magnetic tape, while modern digital data storage employs various technologies to retain vast amounts of data. Key types of computer data storage include:
Magnetic-tape data storage: A system for storing digital information on magnetic tape, using digital recording. Tape was an important medium for primary data storage in early computers and remains widely used for backups and archiving.
Optical data storage: This method involves saving data on optical media like CDs, DVDs, and Blu-ray discs. A notable advancement is 3D optical data storage, which records information with three-dimensional resolution.
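5D optical data storage: Also known as Superman memory crystal, this experimental technology writes data into nanostructured glass. The five "dimensions" are the three spatial coordinates of each nanostructure plus two optical properties, its orientation and the strength of its birefringence. The aim is high capacity and longevity.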
Holographic data storage: A potential high-capacity technology that uses the principles of holography to record data throughout the volume of the medium rather than only on its surface.
DNA digital data storage: The process of encoding and decoding binary data to and from synthesized strands of DNA, offering a compact and long-term storage solution.
Digital Data Storage (DDS): A tape format for computer data storage, used mainly for backups. It is based on the Digital Audio Tape (DAT) format, which was developed during the 1980s.
Paper data storage: Although considered outdated, it covers writing and illustrating as well as machine-readable formats such as punched cards and punched tape.
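The encoding and decoding step described for DNA digital data storage can be illustrated with a minimal sketch. The two-bits-per-nucleotide mapping and the function names below are assumptions chosen for illustration only; practical schemes add error correction and avoid sequences that are hard to synthesize or read.

```python
# Hypothetical mapping: each pair of bits becomes one nucleotide.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a nucleotide string, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Turn a nucleotide string produced by encode() back into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    strand = encode(b"Hi")
    print(strand)          # CAGACGGC
    print(decode(strand))  # b'Hi'
```

Each byte becomes four bases under this mapping, so the round trip is lossless as long as the strand is read back without errors.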
Data integration involves combining data from different sources to provide a unified view. This is crucial for businesses and enterprises that need to consolidate information for analysis, reporting, and decision-making. Key aspects of data integration include:
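ETL (Extract, Transform, Load): This process has three stages: extracting data from source systems, transforming it into a consistent format, and loading it into a target system such as a data warehouse.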
Enterprise Application Integration (EAI): The use of software and architectural principles to integrate various enterprise computer applications, ensuring seamless data flow between systems.
Data warehouse: A central repository where data from different sources is stored, transformed, and made available for analysis and reporting. It often employs both ETL and ELT processes.
Web data integration (WDI): Aggregating and managing data from different websites into a single workflow, which includes extraction, transformation, and loading of web data.
Core data integration initiatives: These often include ETL implementations and EAI, which are critical for creating a unified data view across different systems.
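The ETL process listed above can be sketched in a few lines. This is a minimal illustration, not a production pipeline: the CSV sample, the `sales` table, and the function names are all hypothetical, and an in-memory SQLite database stands in for a data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical export from a source system, with inconsistent formatting.
RAW_CSV = """id,name,amount
1, Alice ,10.50
2,bob,3
3,Carol,not-a-number
"""


def extract(text):
    """Extract: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize names, convert types, drop unparseable rows."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount is not a number
        clean.append((int(row["id"]), row["name"].strip().title(), amount))
    return clean


def load(records, conn):
    """Load: write the cleaned records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl(text, conn):
    load(transform(extract(text)), conn)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    run_etl(RAW_CSV, conn)
    print(conn.execute("SELECT id, name, amount FROM sales ORDER BY id").fetchall())
    # [(1, 'Alice', 10.5), (2, 'Bob', 3.0)]
```

Keeping the three stages as separate functions mirrors the definition. In an ELT variant, the raw rows would be loaded first and transformed inside the target system.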