2007-08-25
2007-08-22
You cannot modify the Hosts file or the Lmhosts file in Windows Vista
WORKAROUND
To work around this issue, follow these steps:
1. Click Start, click All Programs, click Accessories, right-click Notepad, and then click Run as administrator.
If you are prompted for an administrator password or for confirmation, type the password, or click Allow.
2. Open the Hosts file or the Lmhosts file, make the necessary changes, and then click Save on the Edit menu.
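For reference, each Hosts file line maps an address to one or more names. The entries below are hypothetical examples, not values from the article:

```text
127.0.0.1      localhost
192.168.0.10   myserver.example.com   myserver
```

The Lmhosts file uses a similar line-per-entry layout, but for NetBIOS names.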
2007-08-21
libmtsk.so
For best performance and functionality, make sure that the latest OpenMP runtime library, libmtsk.so, is installed on the running system.
How can I find out whether libmtsk.so is installed? :(
Haha~~ there is a patch for the old-school release. :D
SunOS 5.9: Microtasking libraries (libmtsk) patch
2007-08-20
Algorithm for square matrix multiplication
The trivial: O(n^3)
The Strassen: O(n^2.807)
The fastest*: O(n^2.376)
As the Strassen algorithm article [ http://0rz.tw/bf2Xo ] mentions:
The reduction in the number of multiplications however comes at the price of a somewhat reduced numeric stability.
* Coppersmith–Winograd algorithm
Reference:
*The Simultaneous Triple Product Property and Group-theoretic Results for the Exponent of Matrix Multiplication, arXiv:cs.CS/0703145
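To make the multiplication count concrete, here is a minimal C sketch of Strassen's scheme for a single 2x2 multiplication, the base case of the recursion: seven products m1..m7 instead of the eight the trivial algorithm needs. The function name is mine, for illustration.

```c
/* Strassen's seven products for a 2x2 multiplication c = a * b. */
void strassen2x2(int a[2][2], int b[2][2], int c[2][2])
{
    int m1 = (a[0][0] + a[1][1]) * (b[0][0] + b[1][1]);
    int m2 = (a[1][0] + a[1][1]) * b[0][0];
    int m3 = a[0][0] * (b[0][1] - b[1][1]);
    int m4 = a[1][1] * (b[1][0] - b[0][0]);
    int m5 = (a[0][0] + a[0][1]) * b[1][1];
    int m6 = (a[1][0] - a[0][0]) * (b[0][0] + b[0][1]);
    int m7 = (a[0][1] - a[1][1]) * (b[1][0] + b[1][1]);
    c[0][0] = m1 + m4 - m5 + m7;
    c[0][1] = m3 + m5;
    c[1][0] = m2 + m4;
    c[1][1] = m1 - m2 + m3 + m6;
}
```

Applied recursively to n/2-sized blocks, the seven recursive calls give the O(n^log2(7)) ~ O(n^2.807) bound quoted above.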
2007-08-19
Today's links
Improving Application Efficiency Through Chip Multi-Threading
2.1.1.8 Multi-Threaded malloc
malloc and free are single-threaded operations and are among the bottlenecks for multi-threaded applications. A multi-threaded malloc scales with multi-threaded requests and can improve multi-threaded application performance. The Solaris OS has two types of multi-threaded malloc libraries, mt-malloc and umem.
2.1.1.8.1 Usage
a. Using mt-malloc:
LD_PRELOAD=libmtmalloc.so
b. Using libumem:
LD_PRELOAD=libumem.so
cc [ flag... ] file... -lumem [ library... ]
This matches my own thinking, because allocating the matrices still takes a lot of time.
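A toy way to see the bottleneck described above, sketched under my own assumptions (thread count, iteration count, and function names are mine, not from the Sun document): several threads hammer malloc/free concurrently. Running it once with the default libc malloc and once under LD_PRELOAD=libmtmalloc.so (or libumem.so) lets you compare wall times.

```c
#include <pthread.h>
#include <stdlib.h>

#define THREADS 4
#define ITERS   100000

/* Each thread allocates and frees a small block in a tight loop. */
static void *hammer(void *unused)
{
    (void)unused;
    for (int i = 0; i < ITERS; i++) {
        void *p = malloc(64);
        if (p == NULL)
            return (void *)1;          /* report failure to the joiner */
        free(p);
    }
    return NULL;
}

int run_hammer(void)
{
    pthread_t t[THREADS];
    void *err;
    for (int i = 0; i < THREADS; i++)
        if (pthread_create(&t[i], NULL, hammer, NULL))
            return -1;
    for (int i = 0; i < THREADS; i++) {
        pthread_join(t[i], &err);
        if (err != NULL)
            return -1;
    }
    return 0;
}
```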
2007-08-16
gen()
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#define N1 5000
#define N2 5000
void gen( int** a, int** b)
{
    int i, j;
    /* seed each stream once; calling srand() inside the loop would
       reset the generator and make every element identical */
    srand( 0 );
    for (i = 0; i < N1; i++)
        for (j = 0; j < N2; j++)
            a[i][j] = rand() % 5 + 1;
    srand( 1 );
    for (i = 0; i < N1; i++)
        for (j = 0; j < N2; j++)
            b[i][j] = rand() % 5 + 1;
}
void mul( int** a, int** b, int** c)
{
    int i, j, k;
    for (i = 0; i < N1; i++) {
        for (j = 0; j < N2; j++) {
            c[i][j] = 0;
            for (k = 0; k < N1; k++)
                c[i][j] += a[i][k] * b[k][j];
        }
    }
}
int main(int argc, char *argv[])
{
    /* row-pointer arrays need sizeof(int*) * rows,
       not sizeof(int) * N1 * N2 */
    int** a = (int**) malloc( sizeof(int*) * N1);
    int** b = (int**) malloc( sizeof(int*) * N1);
    int** c = (int**) malloc( sizeof(int*) * N1);
    /* reserved for the threaded version */
    pthread_t* thread = (pthread_t*) malloc( sizeof(pthread_t) * N1);
    int i;
    for (i = 0; i < N1; i++) {
        a[i] = (int *) malloc( sizeof(int) * N2);
        b[i] = (int *) malloc( sizeof(int) * N2);
        c[i] = (int *) malloc( sizeof(int) * N2);
    }
    gen(a, b);
    return 0;
}
$ gcc m1.c && time ./a.out
real 3m55.660s
user 3m55.391s
sys 0m0.184s
This is the result without threading.
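For the threaded follow-up, one common pattern is to give each thread a contiguous band of rows of the result. This is my own sketch, not the post's eventual code; the sizes N and T and all names are illustrative:

```c
#include <pthread.h>

#define N 4          /* matrix dimension (illustrative) */
#define T 2          /* thread count; assumed to divide N */

static int A[N][N], B[N][N], C[N][N];

/* Each thread computes rows [lo, hi) of C = A * B. */
static void *mul_rows(void *arg)
{
    int t = (int)(long)arg;            /* thread index 0..T-1 */
    int lo = t * (N / T), hi = lo + N / T;
    for (int i = lo; i < hi; i++)
        for (int j = 0; j < N; j++) {
            C[i][j] = 0;
            for (int k = 0; k < N; k++)
                C[i][j] += A[i][k] * B[k][j];
        }
    return NULL;
}

int mul_parallel(void)
{
    pthread_t th[T];
    for (int t = 0; t < T; t++)
        if (pthread_create(&th[t], NULL, mul_rows, (void *)(long)t))
            return -1;
    for (int t = 0; t < T; t++)
        pthread_join(th[t], NULL);     /* wait for every band to finish */
    return 0;
}
```

The bands write disjoint rows of C, so no locking is needed.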
2007-08-13
2007-08-12
#include <pthread.h>
#include <stdio.h>
#define N 10
void *test(void *c)
{
    /* cast through long so the pointer-to-int conversion is safe on 64-bit */
    printf("\tI am thread: %d\n", (int)(long)c);
    pthread_exit(NULL);
}
int main()
{
    int i, result;
    pthread_t t[N];
    for (i = 0; i < N; i++) {
        result = pthread_create(&t[i], NULL, test, (void *)(long)i);
        if (result)
            printf("Cannot do create.\n");
    }
    return 0;
}
The output comes out different every time. :( The threads need to be waited for explicitly.
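The explicit wait the note asks for is pthread_join: main blocks until every worker has finished, so the result no longer depends on scheduling. A minimal sketch (the flag array and function names are mine):

```c
#include <pthread.h>

#define N 10

static int done[N];                 /* one completion flag per thread */

static void *test_fn(void *arg)
{
    done[(int)(long)arg] = 1;       /* mark this thread as finished */
    return NULL;
}

int run_and_wait(void)
{
    pthread_t t[N];
    for (int i = 0; i < N; i++)
        if (pthread_create(&t[i], NULL, test_fn, (void *)(long)i))
            return -1;
    for (int i = 0; i < N; i++)
        pthread_join(t[i], NULL);   /* the explicit wait */
    return 0;
}
```

After run_and_wait() returns, every flag is guaranteed to be set.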
Cluster Building
NCHC Formosa PC Cluster, it feels familiar.
Workshop materials, HP 64bit Cluster
Hardware information
NCHC PC Cluster
NCHC (National Center for High-performance Computing) PC Cluster discussion board
HIGH PERFORMANCE COMPUTING LAB., Tunghai University
Reaching the Goal with the Regensburg Marathon-Cluster, Hubert Feyrer
High Performance Computing Training
Books,
*Beowulf Cluster Computing with Linux
*Parallel Programming: Techniques and Applications Using Networked Workstations and Parallel Computers
*An Introduction to Parallel Computing: Design and Analysis of Algorithms
Passing Multidimensional Arrays
#include <iostream>
using std::cout;
void print_m35(int m[3][5]);
void print_mi5(int m[][5], int dim1);
//void print_mij_(int m[][], int dim1, int dim2);
void print_mij(int* m, int dim1, int dim2);
void print_m35(int m[3][5])
{
for (int i = 0; i < 3; i++) {
for (int j = 0; j < 5; j++) cout << m[i][j] << '\t';
cout << '\n';
}
}
void print_mi5(int m[][5], int dim1)
{
    for (int i = 0; i < dim1; i++) {
        for (int j = 0; j < 5; j++) cout << m[i][j] << '\t';
        cout << '\n';
    }
}
/*
void print_mij_(int m[][], int dim1, int dim2)
{
for (int i = 0; i < dim1; i++) {
for (int j = 0; j < dim2; j++) cout << m[i][j] << '\t';
cout << '\n';
}
}
*/
void print_mij(int* m, int dim1, int dim2)
{
for (int i = 0; i < dim1; i++) {
for (int j = 0; j < dim2; j++) cout << m[i * dim2 + j] << '\t';
cout << '\n';
}
}
int main()
{
int v[3][5];
for (int i = 0; i < 3; i++)
for (int j = 0; j < 5; j++)
v[i][j] = j;
print_m35(v);
print_mi5(v, 3);
print_mij(&v[0][0], 3, 5);
return 0;
}
2007-08-11
libc-dev
I have installed icc, which requires libstdc++.so.5, on my box, a box running a 2.6-series kernel. I also have gcc-4.1.2 with the libc6 libraries. At one point I wanted to debug an executable called "a.out", but had no idea how to. gdb says:
Failed to read a valid object file image from memory
That is what it prints when I run it in the debug session.
As "file" reports:
a.out: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.4.1, dynamically linked (uses shared libs), for GNU/Linux 2.4.1, not stripped
But it should actually look like this:
a.out: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs), not stripped
I didn't dare to remove the _big_base_ libc6 package, but after I reinstalled "libc-dev" it went back to the state I was hoping for. :D
SunFire 15K (Solaris & Debian)
HCL for Solaris OS
Current Status, Debian on SPARC platform
Supported are Sun4m and Sun4u machines (with a 32-bit userland).
UltraLinux
Platform Group: sun4u
gprof
Linux/Alpha or How to Make Your Applications Fly
Linux Performance Analysis Tools
[Author], David Mosberger
---
Useful Tools tutorial
gprof TIPS:
* Don't optimize before profiling
* Be careful not to optimize before profiling!
* Profile before you optimize.
Hahaha~~~ :D I had compiled it with the -O3 option in both icc and gcc. :D
2007-08-07
Linux to Power Google GPhone
HTC
"Google's first mobile phone reportedly will run a Linux operating system on a Texas Instruments "Edge" chipset, and will likely ship to T-Mobile and Orange customers in the Spring of 2008. "GPhone" call minutes and text messages will apparently be funded by mobile advertising, according to reports." The report found at the popular embedded systems Linux news site LinuxDevices.
via OSNews by donotreply@osnews.com (Eugenia Loli-Queru) on Aug 04, 2007
2007-08-06
Matrix Transpose
I did ten thousand multiplications of 800x800 matrices; each element is an integer ranging from one to five.
yrchen told me a hint yesterday afternoon: the matrices should be transposed before doing a huge computation. In today's experiment I wrote two versions of the code, one without any transpose loop (called m1) and one with it (called m1_t). Both were compiled with gcc at -O3 optimization. The result is:
m1_t:
real 5m31.300s
user 5m31.245s
sys 0m0.020s
m1:
real 5m30.802s
user 5m30.265s
sys 0m0.008s
It looks like -O3 already performs the transpose optimization before the huge matrix multiplication.
#include <stdio.h>
#include <stdlib.h>
#define DIE 800
void gen( int a[][DIE]);
void mul( int a[][DIE], int b[][DIE]);
int main(int argc, char *argv[])
{
    /* static: two 800x800 int arrays would overflow a default stack */
    static int a[DIE][DIE];
    static int b[DIE][DIE];
    int i;
    for (i = 0; i < 10000; i++) {
        srandom(0);
        gen(a);
        srandom(1);
        gen(b);
        mul(a, b);
    }
    return 0;
}
void gen( int a[][DIE] )
{
    int i, j;
    for (i = 0; i < DIE; i++)
        for (j = 0; j < DIE; j++)
            a[i][j] = random() % 5 + 1;
}
void mul( int a[][DIE], int b[][DIE])
{
    int i, j, k;
    static int c[DIE][DIE];
    static int bt[DIE][DIE];
    /* transpose b into bt so the inner loop walks both operands row-wise */
    for (i = 0; i < DIE; i++)
        for (j = 0; j < DIE; j++)
            bt[i][j] = b[j][i];
    for (i = 0; i < DIE; i++) {
        for (j = 0; j < DIE; j++) {
            c[i][j] = 0;
            for (k = 0; k < DIE; k++)
                c[i][j] += a[i][k] * bt[j][k];
        }
    }
}
2007-08-04
Matrix Multiplication Hint
Do the transpose first.
Because the CPU fetches memory along each row, while matrix multiplication walks col * row, you can do one transpose (row -> col) up front and then fetch both operands row-wise. It is faster!
Don't use recursion, because making the OS handle many stacks is a heavy load. Instead, unroll the huge matrix with several while loops, and spawn threads to split it into the smaller matrices of the Strassen algorithm.
These are general hints from yrchen.
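The first hint can be sketched in a few lines of C (the size D and the names are illustrative, not yrchen's code): transpose b once, and then both operands are scanned row-wise in the inner loop.

```c
#define D 3

/* Multiply c = a * b after transposing b, so a[i][k] and bt[j][k]
   both advance along rows (cache-friendly). */
void mul_transposed(int a[D][D], int b[D][D], int c[D][D])
{
    int bt[D][D];
    for (int i = 0; i < D; i++)
        for (int j = 0; j < D; j++)
            bt[i][j] = b[j][i];              /* one transpose, up front */
    for (int i = 0; i < D; i++)
        for (int j = 0; j < D; j++) {
            c[i][j] = 0;
            for (int k = 0; k < D; k++)
                c[i][j] += a[i][k] * bt[j][k];
        }
}
```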
2007-08-03