联系方式

  • QQ:99515681
  • 邮箱:99515681@qq.com
  • 工作时间:8:00-23:00
  • 微信:codinghelp

您当前位置:首页 >> Database作业Database作业

日期:2024-03-26 03:53

CSCI 3280 Introduction to Multimedia Systems

Spring 2024, Assignment 2 - LZW Compression

Due Date: Mar. 28th, 2024 (11:59pm) Submission via Blackboard

Late submission penalty: 10% deduction per day (maximum 30%)

PLAGIARISM penalty: Whole Course Failed


Introduction

LZW is a dictionary-based compression algorithm and does not perform any analysis on the input text. LZW is widely known for its application in GIF and in the V.42

communication standard. The idea behind the LZW algorithm is simple but there are  implementation details that should betaken care of. It is lossless, meaning no data is lost when compressing. The algorithm is simple to implement and has the potential    for very high throughput in hardware implementations. In this assignment, you are     required to write an LZW algorithm (both the compression and decompression). The  following sections explain the details.

Code Dictionary

Code dictionary is a dictionary holding the correspondence between strings and

codes. By using the LZW algorithm, we can construct the dictionary on the run, and represent a string with its corresponding code in the dictionary so that less number of bits can be used to store the same information.

In this assignment, we will be using 12-bit code sequence to represent strings of 8- bit characters in the original file. Because a 12 bit can have values from 0 to 4095,   the dictionary can hold 4095 different strings (except code 4095 which reserved as EOF code). The first 256 entries of the dictionary need to be initialized as the 256    ASCII characters to represent all single-character strings.

An interesting aspect of this algorithm is that we don’t need to store the dictionary

in the compressed file; instead, we discard the dictionary once the files are compressed, and reconstruct on the run when westart to decompress.

Once you dictionary is full, you need to delete the dictionary and start a new dictionary.

You can refer to the example below. The first 256 and the last entries have

predefined values, while the remaining entries can be used to hold multi-character strings encountered during the compression. 

Algorithm for Compression

You can follow the procedure below to compress data using LZW algorithm.

1)   Initialize DICT with the first 256 entries

2)   Define 2 variables: STRING, CHAR

3)   Get first byte from file and store in STRING

4)   While there is more byte in file, read 1 byte and store in CHAR

5)           If (STRING+CHAR) is in DICT, let STRING = (STRING+CHAR)

6)           Else, write code of STRING to compressed file, add (STRING+CHAR) to DICT and let STRING = CHAR

7)   Write code of STRING

In this way, we will be scanning through input character stream, and constantly adding incoming characters into a variable STRING. Once the STRING + its following character CHAR forms a new string which is not currently in the dictionary, we will   record this STRING+CHAR into the DICT, then output the code corresponding to STRING, and finally, start the new STRING from the current CHAR (we will not be using the newly added code to represent the first occurrence of this unseen combination, but to use it when we see this combination again).

Make sure you start a new dictionary when the current one has no empty entries.

On the other hand, you should not start new dictionaries when you finish compressing one file and start with another, unless the dictionary is full.

Algorithm for Decompression

1)   Initialize DICT with the first 256 entries

2)   Define 4 variables: CURRENT, NEXT, STRING, CHAR

3)   Get first code from file and store in CURRENT

4)   While there is more code in input, read 1 code and store in NEXT

5)           Translate CURRENT and store in STRING, and write STRING to output

6)           If entry NEXT is non-empty in DICT, translate NEXT and store the first character in CHAR

7)           Else, let CHAR = first char of STRING

8)           Add (STRING+CHAR) to DICT and let CURRENT = NEXT

9)   Output translation of CURRENT

Following this procedure, wescan through the compressed code sequence, each time load a code into CURRENT and the following one into NEXT.

Remember that we output the code of STRING and save (STRING+CHAR) into the

DICT if and only if the current STRING is interrupted by the incoming CHAR during    the compression. To do the inverse process of the Compression, for each CURRENT, NEXT pair in the sequence, we need to first decode CURRENT back to STRING and    write STRING to the decompressed file, and then save (STRING + the following

CHAR) into DICT.

Because NEXT is representing the string following the one represented by CURRENT, in most cases we can use the first character of the translation of NEXT as CHAR. But   the tricky part is that sometimes NEXT does not exist in DICT, waiting for the newest (STRING+CHAR) to be inserted first to know its value. In this case, we can reason

that CHAR==(STRING+CHAR)[0], and CHAR==STRING[0] for any STRING that has length >= 1. As a result, we can instead use STRING[0] as the value of CHAR.

If you implemented the algorithm correctly, the DICT in decompression should be

exactly the same with the DICT in compression. Similar to the compression part, you need to start new dictionaries on demand.

Basic Part (80%)

1)   Write a Python program based on the given skeleton code to perform compression and decompression of files using LZW.

2)     Program has to support compressing multiple files into one file and decompress

back to multiple files. The code sequences of different files should be separated by   a <EOF> code. There is no need to extract filenames from input file paths; code for   reading and writing file headers has been given in the skeleton code. An example of compressed file (the header is in plaintext) is given as:

.\File1.txt\n

.\File2.txt\n

\n

<Code sequence for File1.txt> <EOF(4095)> <Code sequence for File2.txt> <EOF(4095)>

3)    The code size has to be 12-bit (dictionary size = 4096).

4)   The decompressed file has to be exactly the same with the input file.

5)     Program will be tested with both ASCII files and binary files.

6)     The input file can have arbitrary length. (You should ensure your program to work well with several mega-bytes of input data)

7)     The compressed file will not be compared with a sample compressed output,

however, there will be penalty if your compressed file is more than 10% larger than the sample compressed file (significantly over-sized output indicate you are

implementing the dictionary incorrectly).

8)   The program should run as a console program and accepts the following arguments:

For compression:

python lzw.py –c <compressed file name> <file 1> <file 2> …

(Example) python lzw.py –c .\output.lzw .\text.txt .\image.png

For decompression:

python lzw.py –d <compressed file name> -o <output dir>

(Example) python lzw.py –d .\output.lzw -o .\output

Enhancement Part (20%)

In this part, you can implement features related to LZW. Here are some examples:

?   Try implementing encryption of the compressed file. You can use your

password to manipulate the dictionary or the initial state of the compressed file to avoid decompression of meaningful content. You can choose any

simple encryption algorithms suitable to be used in the compression process.

?   Try implement binary-to-text encoding (e.g. base64, hex) to convert the

compressed file from binary file into text characters. Such method is widely used by systems like E-mailto provide compatibility for non-ASCII files.

?   Try implement variable-width code which use 8-bit code in the beginning and increase the code size by 1 each time all codes in the current dictionary have  been used up. By this means, shorter code in the early stage can be stored in  a more compact way.

You can also implement other interesting features about LZW compression.

Please write a report to describe your enhancement features. For each additional feature:

1)   Detail of the features

2)   Necessary code explanation related to the features

3)   Sample input and output (if applicable)

4)   The techniques used (if applicable)

5)   References (if applicable)

The report is important for the identification and understanding of your

implemented feature. We will do the grading and test the correctness of the source code based on the features you mentioned in the report.

If the features claimed in the report can only be barely used (e.g., with a lot of bugs) or even do not exist in the source code, marks will be deducted.

General Specifications

1.   Program should  be coded  in  Python and uses libraries listed in the skeleton code only.

2.   You are not allowed to use extra libraries that are not imported in the skeleton code for your assignment. If you are proposing enhancement features that are not mentioned in the specification, and have good reason to use extra libraries, please consult the TAs.

3.   The following files are provided for you to write and test your program. a.   lzw_skeleton.py: The skeleton code for LZW program.

i.   read_file_header,  write_file_header functions  for helping  you  to  read  and  write  file  headers  in  the  required format.

ii.   read_code, write_code functions for reading and writing N-bit code from and to files.

b. Three test files: Windows.txt, CSE.txt, web.bmp

c.    lzw.exe & compressed.lzw: A sample program for compression and decompression, and a sample compressed file containing all three test files. (The sample program does not support “-o” option to change the output directory, the decompressed file will overwrite the input file if you decompress in the same folder)

(You don’t need to have exact same output with the sample program)

4.   You are required to submit source codes only. We will test your code using a Python3 interpreter.

Please make sure that your source code is fully operational, failing to execute will receives 10-point deduction for each failed source.

Submission ( Deadline: Mar. 28th, 2024 11:59pm )

We  expect  the  following  files  zipped  into  a  file  named  by  your  student  ID  (e.g., 1155xxxxxx.zip) and have it uploaded to the course’s Blackboard system.

1.    report.pdf (Tell us anything that we should pay attention to, especially about the enhancement part)

2.    lzw.py (Basic requirement source code)

3.    lzw_enhancement.py (Enhancement part source code)

Other files that are provided are not required to be submitted.

 

 


相关文章

版权所有:留学生编程辅导网 2020 All Rights Reserved 联系方式:QQ:99515681 微信:codinghelp 电子信箱:99515681@qq.com
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。 站长地图

python代写
微信客服:codinghelp