联系方式

  • QQ:99515681
  • 邮箱:99515681@qq.com
  • 工作时间:8:00-23:00
  • 微信:codinghelp

您当前位置:首页 >> Java编程Java编程

日期:2020-08-22 11:10

KIT308/408 Multicore Architecture and Programming

Assignment 1 — Multithreading

Aims of the assignment

The purpose of this assignment is to give you experience at writing a multithreaded program for the CPU. This assignment will

give you an opportunity to demonstrate your understanding of:

creating and handling multiple threads;

partitioning work;

dynamic scheduling; and

data-sharing between threads.

Due Date

11:55pm Friday 21st of August (Week 6 of semester)

Late assignments will only be accepted in exceptional circumstances and provided that the proper procedures have been

followed (see the School Office or this link for details) assignments which are submitted late without good reason will be

subject to mark penalties if they are accepted at all (see the School office or this link for details on this as well).

Forms to request extensions of time to submit assignments are available from the School of Engineering and ICT office.

Requests must be accompanied by suitable documentation and should be submitted before the assignment due date.

Due to the tight schedule of this unit (and the reliance of future assignments on a solution to this one), extensions will be unable

to be granted and late assignments will not be accepted. If exception circumstances occur for an individual student they must

contact the lecture-in-charge before the assignment due date to arrange alternative methods of assessment in this unit.

Assignment Submission

Your assignment is to be submitted electronically via MyLO and should contain:

An assignment cover sheet;

A .zip (or .rar) containing a Visual Studio Solution containing a project for each attempted stage of the assignment (in the

format provided in the downloadable materials provided below).

A document containing:

A table of timing information comparing the original single-threaded times against each of three stages on each

scene file.

An analysis of the above timing data.

You do not need to (and shouldn't) submit executables, temporary object files, or images. In particular, you

must delete the ".vs" diretory before submission as it just Visual Studio temporary files and 100s of MBs. Do not however

delete the "Scenes" folder or the "Outputs" folder (but do delete the images within this one).

Task/Topic Marks

1. Basic Multithreaded CPU Implementation 30%

Correct (5%) and elegant (5%) thread management

(i.e. the correct image for any number of threads (even >64 threads)

10%

Correct use of "-thread" command-line argument 5%

Successful use of "-testMode" and "-colourise" command-line arguments 5%

Correct partitioning of render for images with height equally divisible by thread count

(i.e. the correct image, with exactly equal work area for each thread, and with the correct number of samples

rendered for: Tests 1, 2, 4, 5, and 7 with 8 threads)

5%

Correct partitioning of render for images with height NOT equally divisible by thread count

(i.e. the correct image, with work area allocated "evenly" across threads, and with the correct number of

samples rendered for: Tests 3 and 6 with 8 threads, and also Test 1 with 3, 5, 6, 7, and 513 threads)

5%

2. Dynamically Balanced Line-Based Multithreaded CPU Implementation 20%

Correct (5%) and elegant (5%) thread management 10%

Correct (5%) and elegant (5%) data-sharing between threads

(i.e. no possibility of threads "fighting" over shared memory locations)

10%

3. Dynamically Balanced Square-Based Multithreaded CPU Implementation 30%

Correct and elegant thread management 5%

Correct and elegant data-sharing between threads 5%

Correct use of "-blockSize" command-line argument 5%

Correct partitioning of render into squares for images with size that is equally divisible by block size

(i.e. the correct image, with exactly equal work area for each block, rendered for: Tests 1, 2, 5, and 7 with

block size 8 and 64)

5%

Correct partitioning of render into squares for images with size that is NOT equally divisible by block size

(i.e. the correct image, with work area for each block either at block size or smaller at edges, rendered for:

Tests 3, 4, and 6 with block size 8 and 64)

5%

Correct partitioning of render with no extra samples generated

(i.e. the correct image, with work area for each block either at block size or smaller at edges, and with the

correct number of samples rendered for: Tests 3, 4, and 6 with block size 8 and 64)

5%

Documentation 20%

Outputs showing timing information for each stage on all applicable scene files with all thread counts

(to get full marks for this part your code needs to pass all tests for all stages above)

10%

Analysis of data with respect to CPU used

(to get full marks for this part your code needs to pass all tests for all stages above)

10%

Penalties

Failure to comply with submission instructions (eg. no cover sheet, incorrect submission of files, submission of

unnecessary temporary or image files, abnormal solution/project structure, etc.)

-10%

Poor programming style (eg. insufficient / poor comments, poor variable names, poor indenting, obfuscated

code without documentation, compiler warnings, etc.)

up to -20%

Lateness (-20% for up to 24 hours, -50% for up to 7 days, -100% after 7 days)

Late assignments will not be accepted

up to -100%

Marking

This assignment will be marked out of 100. The following is the breakdown of marks:

Programming Style

This assignment is not focussed on programming style, but you should endeavour to follow good programming practices. You

should, for example:

comment your code;

use sensible variables names;

use correct and consistent indenting; and

internally document (with comments) any notable design decisions.

[NOTE: any examples in the provided assignment materials that don't live up to the above criteria, should be considered to be

deliberate examples of what not to do and are provided to aid your learning ;P]

The Assignment Task

You are to implement a "simple" raytracer in a multithreaded fashion (for a quick definition of raytracing see the wikipedia page).

From the provided single-threaded raytracer implementation, you will create multiple subsequent versions as follows:

1. A basic multithreaded CPU implementation.

2. A dynamically balanced line-based multithreaded CPU implementation.

3. A dynamically balanced square-based multithreaded CPU implementation.

Implementation

1. Basic Multithreaded CPU Implementation

This stage involves splitting up the render across multiple threads by splitting up the render into equal sized chunks (or almost

equal sized chunks when the number of threads doesn't divide evenly into the image height) with each chunk being rendered

entirely by an individual thread.

In order to complete this step you will need to:

Write code to partition the rendering job into chunks (so that each thread generates a different part of the final image).

Write code for the to create and manage multiple threads.

At the end of this stage the program should be able to handle any image size (any dimensions that are multiples of four, at least,

up to the maximum of 2048x2048).

To be eligible for full marks, your assignment should also use the command-line option "-threads" to set the number of threads to

any value that is less than or equal to the height of the image being rendered and also respect the "-colourise" option to tint each

section of the image with a set of colours (colour re-use is expected for a large number of threads).

2. Dynamically Balanced Line-Based Multithreaded CPU Implementation

This stage involves splitting up the render into lines which are dynamically allocated to threads from the thread pool as they

request more work.

In order to complete this step you will need to:

Start as many threads as required (specified from the command-line arguments) for the thread pool.

Manage some shared data between the threads so that each do the correct work.

3. Dynamically Balanced Square-Based Multithreaded CPU Implementation

This stage involves splitting up the render into squares which are dynamically allocated to thread from the thread pool as they

request more work.

In order to complete this step you will need to:

Partition the render into squares (requires more carefulness when writing results to the image).

NOTE: this implementation is (slighty) harder than Stage 2, but might not result in an increase in performance.

Hints / Tips

The techniques required to complete each stage rely heavily on work done in the tutorials 2 and 3 — refer to them often.

Documentation

For each stage of the assignment you attempt you should provide:

average (of 3 runs) timing information for each scene file for the total time taken for a render for particular thread counts.

See the next section for an example format for this timing table that specifies which tests are required for which scenes;

and

an explanation of the results (e.g. why there's no difference between the performance of stages 1, 2, and 3 (NOTE: this is

a made up example and isn't necessarily what to expect), or why a particular type of implementation works well (or

poorly) on a particular scene, etc.). This explanation should be with respect to the CPU on the system on which you ran the

tests, and you should discuss how the architectural features of the CPU explain the results.

Tests / Timing

The following table lists all the tests that your code needs to generate correctly at each stage. It also shows the timing tests that

need to be performed in order to fully complete the documentation section of the assignment.

In order to confirm your images match the images created by the base version of the assignment code, it's strongly

recommended you use a image comparison tool. For part of the marking for this, Image Magick will be used:

Image Magick: download for Windows/Mac/Linux.

e.g. running this from the command-line to test:

magick.exe compare -metric mae Outputs\cornell.txt_1024x1024x1_RayTracerAss1.exe.bmp

Outputs_REFERENCE\cornell.txt_1024x1024x4_RayTracerAss1.exe.bmp diff.bmp

Test

Average Time (Milliseconds)

Single

Threaded Multithreaded Technique / (Stage)

Threads

1 2 3 4 5 6 7 8

1. -input

Scenes/cornell.txt -size

1024 1024 -samples 1

Static Chunks (Stage 1) X X X X X X X

Dynamic Lines (Stage 2) X X X X X X X

Dynamic Squares (Stage 3) -- blocksize 8 X X X X X X X

Dynamic Squares (Stage 3) -- blocksize 64 X X X X X X X

2. -input

Scenes/cornell.txt -size

1024 1024 -samples 4

Static Chunks (Stage 1) X X X X X X X

Dynamic Lines (Stage 2) X X X X X X X

Dynamic Squares (Stage 3) -- blocksize 8 X X X X X X X

Dynamic Squares (Stage 3) -- blocksize 64 X X X X X X X

3. -input

Scenes/cornell.txt -size

500 300 -samples 1

Static Chunks (Stage 1) X X X X X X X

Dynamic Lines (Stage 2) X X X X X X X

Dynamic Squares (Stage 3) -- blocksize 8 X X X X X X X

Dynamic Squares (Stage 3) -- blocksize 64 X X X X X X X

4. -input

Scenes/allmaterials.txt

-size 1000 1000 -

samples 1

Static Chunks (Stage 1) X X X X X X X

Dynamic Lines (Stage 2) X X X X X X X

Dynamic Squares (Stage 3) -- blocksize 8 X X X X X X X

Dynamic Squares (Stage 3) -- blocksize 64 X X X X X X X

5. -input Scenes/cornell-

199lights.txt -size 256

256 -samples 1

Static Chunks (Stage 1) X X X X X X X

Dynamic Lines (Stage 2) X X X X X X X

Dynamic Squares (Stage 3) -- blocksize 8 X X X X X X X

Dynamic Squares (Stage 3) -- blocksize 64 X X X X X X X

6. -input

Scenes/5000spheres.txt

-size 480 270 -samples

1

Static Chunks (Stage 1) X X X X X X X

Dynamic Lines (Stage 2) X X X X X X X

Dynamic Squares (Stage 3) -- blocksize 8 X X X X X X X

Dynamic Squares (Stage 3) -- blocksize 64 X X X X X X X

7. -input Scenes/dudes.txt

-size 256 256 -samples

1

Static Chunks (Stage 1)

Dynamic Lines (Stage 2)

Dynamic Squares (Stage 3) -- blocksize 8

Dynamic Squares (Stage 3) -- blocksize 64

This produces an image ("diff.bmp") showing the differences between the two source images, and also a numeric summary

of the difference, here these images are 0.13266% different):

0.33827 (0.0013266)

(this example deliberately compares the 1x1 sampled to the 4x4 sampled image to show there is a difference).

Note: to fully complete the empty cells in the table below, around 180 total images will need be calculated (each test needs 3

runs, and the first six tests need to be run with the base code, and with 8 threads for each stage of the assignment, and test 7

needs to be run for 1–8 threads for stage 1, stage 2, and stage 3 with two different block sizes), so plan your time accordingly.

Provided Materials

The materials provided with this assignment contain:

the source code of the base single-threaded version of the raytracer;

a set of scene files to be supplied to the program;

a set of reference output files created from the single-threaded version of the program;

four batch files that will run all 7 timing tests for the base code, and each of the three stages; and

a further set of five batch files that will run comparison tests (assuming Image Magick is installed in its default location) for

each of the three stages.

Download the materials as an ZIP file.

Source Code

The provided code consists of 19 source files.

Raytracing logic:

Raytrace.cpp: this file contains the main function which reads the supplied scene file, begins the raytracing, and

writes the output BMP file. The main render loop, ray trace function, and handling of reflection and refraction is also

in this file.

Intersection.h and Intersection.cpp: these files define a datastructure for capturing relevant information at the point

of intersection between a ray and a scene object and functions for testing for individual ray-object collisions and rayscene

collisions.

Lighting.h and Lighting.cpp: these files provide functions to apply a lighting calculation at a single intersection point.

Texturing.h and Texturing.cpp: these files provide functions for the reading points from 3D procedural textures.

Constants.h: this header provide constant definitions used in the raytracing.

Basic types:

Primitives.h: this header contains definitions for points, vector, and rays. It also provides functions and overloaded

operators for performing calculations with vectors and points.

SceneObjects.h: this header file provides definitions for scene objects (ie. materials, lights, spheres, and boxes).

Colour.h: this header defines a datastructure for representing colours (with each colour component represented as a

float) and simple operations on colours, including conversions to/from the standard BGR pixel format.

Scene definition and I/O:

Scene.h and Scene.cpp: the header file contains the datastructure to represent a scene and a single function that

initialises this datastructure from a file. The scene datastructure itself consists of properties of the scene and lists of

the various scene objects as described above. The implementation file contains many functions to aide in the scene

loading process. Scene loading relies upon the functionality provided by the Config class.

Config.h and Config.cpp: this class provide facilities for parsing the scene file.

SimpleString.h: this is helper string class used by the Config class.

Image I/O:

ImageIO.h and ImageIO.cpp: these files contain the definitions of functions to read and write BMP files.

Miscellaneous:

Timer.h: this class provides a simple timer that makes use of different system functions depending on whether

TARGET_WINDOWS, TARGET_PPU, or TARGET_SPU is defined (we don't use the latter two, but I left this file unchanged in

case anyone wanted to see how such cross-platform stuff can be handled).

Executing

The program has the following functionality:

By default it will attempt to load the scene "Scenes/cornell.txt" and render it at 1024x1024 with 1x1 samples.

By default it will output a file named "Outputs/[scenefile-name]_[width]x[height]x[sample-level]_[executablefilename].bmp"

(e.g. with all the default options, "Outputs/cornell.txt_1024x1024x1_RayTracerAss1.exe.bmp")

It takes command line arguments that allow the user to specify the width and height, the anti-aliasing level (must be a

power of two), the name of the source scene file, the name of the destination BMP file, and the number of times to perform

the render (to improve the timing information).

Additionally it accepts some arguments (that are currently unused) for setting the number of threads, whether each thread

will tint the area that it renders, and the size of the block to render (only applicable for stage 3).

Further it accepts an arguments "-testMode" that will fill the output with white. When executing with multiple threads this

should be a greyscale colour that represents the thread number.

It loads the specified scene.

It renders the scene (as many times as requested).

It produces timing information for the average time taken to produce a render ignoring all file IO, and the number of

samples generated per run.

It outputs the rendered scene as a BMP file.

For example, running the program at the command line with no arguments would perform the first test (as described in the scene

file section):

On execution this would produce output similar to the following (as well as writing the resultant BMP file to

Outputs/cornell.txt_1024x1024x1_RayTracerAss1.exe.bmp):

average time taken (1 run(s)): 3672ms

Testing Batch Files

A number of batch files are provided that are intended to be executed from the command line, e.g.

For timing:

Stage1Timing.bat 3 8 will perform all the timing tests required for Stage 1 (i.e. 3 runs with 8 threads one each Test

scene).

Stage2Timing.bat 3 8 will perform all the timing tests required for Stage 2 (i.e. 3 runs with 8 threads one each Test

scene).

Stage3Timing.bat 3 8 8 will perform all the timing tests required for Stage 3 at a particular block size (i.e. 3 runs

with 8 threads and block size 8 on each Test scene).

For testing (requires Image Magick installation), e.g.:

Stage1Tests1.bat will perform all the comparison required for Stage 1 Tests where the thread count evenly divides

into the image height.

Stage1Tests2.bat will perform all the comparison required for Stage 1 Tests where the thread count doesn't evenly

divide into the image height.

Ray Tracing

The materials provided with this assignment include an implementation of a simple raytracer based on the raytracing tutorial at

codermind.com (links currently broken, but still available via the wayback machine here). This tutorial also provides a pretty good

introduction to the concepts of ray tracing.

Apart from a general structure rewrite and simplification of the code, the supplied implementation has the following similarities

and differences to the codermind tutorial:

it retains support for:

spheres;

diffuse (Lambert) and specular (Blinn-Phong) lighting;

shadows;

super-sampled anti-aliasing;

simple exposure;

a conic perspective camera model; and

reflection and refraction.

it is extended, in that is provides additional support for:

(axis-aligned) boxes;

(simple) procedural 3D textures;

setting the camera's position and (xz) rotation;

reading and writing BMP files.

it is simplified, in that it doesn't provide support for:

blobs;

gamma calculations;

perlin noise based procedural textures;

bump mapping;

auto exposure estimation;

bitmap texturing;

cubemap texturing;

an environment cubemap;

depth of field; and

support for reflection and refraction in the same material.

For those of you who want more information (as it's not entirely necessary to understand these concepts to be able to complete

the assignment) on some of the general and/or mathematical aspects of raytracing, see:

General explanations:

Codermind Tutorial (archived mirror): A step-by-step guide and the original source of the codebase of the

assignment materials.

Project Amiga Juggler: A step-by-step guide to the maths and implementation of a raytracer (handling only spheres

and a ground plane) in Java:

Step 1: vectors, rays, dot-product, and cross-product.

Step 5: camera model.

Step 8: ray-sphere intersections and Phong shading.

Step 9: ray-plane intersections.

Step 13: reflection.

Wikipedia's ray tracing page: a basic outline of the concepts.

Princeton Ray Casting Lectures: concepts, maths, and some pseudo code.

Princeton Illumination lectures: concepts, maths, and some pseudo code.

Softsurfer Algorithms: geometry algorithms.

Intersections:

Ray-sphere intersections: wikipedia, codermind (part 1), Project Amiga Juggler (step 8), Princeton Ray Casting

Lectures (Slide 11–14).

Ray-AABB (axis-aligned bounding box) intersections: medium.

Ray-plane intersections: wikipedia, Princeton Ray Casting Lectures (Slide 17), Softsurfer.com Algorithms.

Ray-triangle intersections: wikipedia, source paper, lighthouse3d, one more explanation.

Ray-something intersections: Real-time Rendering.

Lighting:

Diffuse lighting: codermind (part 1), Princeton Illumination Lectures (Slide 19–21).

Specular lighting: codermind (part 2), Princeton Illumination Lectures (Slide 23–25).

Shadowing: Project Amiga Juggler (step 8).

Perlin Noise:

2D: wikipedia.

3D: Understanding Perlin Noise.

One of the scenes uses voxel characters created by miklovesrobots originally in the .VOX format:

Voxel character / objects: Mini Mike's Metro Minis.

Magica Voxel editor and resources: MagicaVoxel.

Voxel file format: voxel-model.


版权所有:留学生编程辅导网 2018 All Rights Reserved 联系方式:QQ:99515681 电子信箱:99515681@qq.com
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。