2007-08-25
2007-08-22
You cannot modify the Hosts file or the Lmhosts file in Windows Vista
WORKAROUND
To work around this issue, follow these steps:
1. Click Start, click All Programs, click Accessories, right-click Notepad, and then click Run as administrator.
If you are prompted for an administrator password or for confirmation, type the password, or click Allow.
2. Open the Hosts file or the Lmhosts file, make the necessary changes, and then click Save on the Edit menu.
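For reference, each Hosts file line maps an address to one or more names. The entries below are hypothetical examples, not values from the article:

```text
127.0.0.1      localhost
192.168.0.10   myserver.example.com   myserver
```

The Lmhosts file uses a similar line-per-entry layout, but for NetBIOS names.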
2007-08-21
libmtsk.so
For best performance and functionality, make sure that the latest OpenMP runtime library, libmtsk.so, is installed on the running system.
How can I find out whether libmtsk.so is installed? :(
Haha~~ there is a patch for the old-school release. :D
SunOS 5.9: Microtasking libraries (libmtsk) patch
2007-08-20
Algorithm for square matrix multiplication
The trivial: O(n^3)
The Strassen: O(n^2.807)
The fastest*: O(n^2.376)
As the Strassen algorithm article [ http://0rz.tw/bf2Xo ] mentions:
The reduction in the number of multiplications however comes at the price of a somewhat reduced numeric stability.
* Coppersmith–Winograd algorithm
Reference:
*The Simultaneous Triple Product Property and Group-theoretic Results for the Exponent of Matrix Multiplication, arXiv:cs.CS/0703145
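To make the multiplication count concrete, here is a minimal C sketch of Strassen's scheme for a single 2x2 multiplication, the base case of the recursion: seven products m1..m7 instead of the eight the trivial algorithm needs. The function name is mine, for illustration.

```c
/* Strassen's seven products for a 2x2 multiplication c = a * b. */
void strassen2x2(int a[2][2], int b[2][2], int c[2][2])
{
    int m1 = (a[0][0] + a[1][1]) * (b[0][0] + b[1][1]);
    int m2 = (a[1][0] + a[1][1]) * b[0][0];
    int m3 = a[0][0] * (b[0][1] - b[1][1]);
    int m4 = a[1][1] * (b[1][0] - b[0][0]);
    int m5 = (a[0][0] + a[0][1]) * b[1][1];
    int m6 = (a[1][0] - a[0][0]) * (b[0][0] + b[0][1]);
    int m7 = (a[0][1] - a[1][1]) * (b[1][0] + b[1][1]);
    c[0][0] = m1 + m4 - m5 + m7;
    c[0][1] = m3 + m5;
    c[1][0] = m2 + m4;
    c[1][1] = m1 - m2 + m3 + m6;
}
```

Applied recursively to n/2-sized blocks, the seven recursive calls give the O(n^log2(7)) ~ O(n^2.807) bound quoted above.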
2007-08-19
Today's links
Improving Application Efficiency Through Chip Multi-Threading
2.1.1.8 Multi-Threaded malloc
malloc and free are single-threaded operations and are among the bottlenecks for multi-threaded applications. A multi-threaded malloc scales with multi-threaded requests and can improve multi-threaded application performance. The Solaris OS has two types of multi-threaded malloc libraries, mt-malloc and umem.
2.1.1.8.1 Usage
a. Using mt-malloc:
LD_PRELOAD=libmtmalloc.so
b. Using libumem:
LD_PRELOAD=libumem.so
cc [ flag... ] file... -lumem [ library... ]
This matches my own thinking, because allocating the matrices still takes a lot of time.
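A toy way to see the bottleneck described above, sketched under my own assumptions (thread count, iteration count, and function names are mine, not from the Sun document): several threads hammer malloc/free concurrently. Running it once with the default libc malloc and once under LD_PRELOAD=libmtmalloc.so (or libumem.so) lets you compare wall times.

```c
#include <pthread.h>
#include <stdlib.h>

#define THREADS 4
#define ITERS   100000

/* Each thread allocates and frees a small block in a tight loop. */
static void *hammer(void *unused)
{
    (void)unused;
    for (int i = 0; i < ITERS; i++) {
        void *p = malloc(64);
        if (p == NULL)
            return (void *)1;          /* report failure to the joiner */
        free(p);
    }
    return NULL;
}

int run_hammer(void)
{
    pthread_t t[THREADS];
    void *err;
    for (int i = 0; i < THREADS; i++)
        if (pthread_create(&t[i], NULL, hammer, NULL))
            return -1;
    for (int i = 0; i < THREADS; i++) {
        pthread_join(t[i], &err);
        if (err != NULL)
            return -1;
    }
    return 0;
}
```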
2007-08-16
gen()
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#define N1 5000
#define N2 5000
void gen( int** a, int** b)
{
    int i, j;
    /* seed each stream once; calling srand() inside the loop would
       reset the generator and make every element identical */
    srand( 0 );
    for (i = 0; i < N1; i++)
        for (j = 0; j < N2; j++)
            a[i][j] = rand() % 5 + 1;
    srand( 1 );
    for (i = 0; i < N1; i++)
        for (j = 0; j < N2; j++)
            b[i][j] = rand() % 5 + 1;
}
void mul( int** a, int** b, int** c)
{
    int i, j, k;
    for (i = 0; i < N1; i++) {
        for (j = 0; j < N2; j++) {
            c[i][j] = 0;
            for (k = 0; k < N1; k++)
                c[i][j] += a[i][k] * b[k][j];
        }
    }
}
int main(int argc, char *argv[])
{
    /* row-pointer arrays need sizeof(int*) * rows,
       not sizeof(int) * N1 * N2 */
    int** a = (int**) malloc( sizeof(int*) * N1);
    int** b = (int**) malloc( sizeof(int*) * N1);
    int** c = (int**) malloc( sizeof(int*) * N1);
    /* reserved for the threaded version */
    pthread_t* thread = (pthread_t*) malloc( sizeof(pthread_t) * N1);
    int i;
    for (i = 0; i < N1; i++) {
        a[i] = (int *) malloc( sizeof(int) * N2);
        b[i] = (int *) malloc( sizeof(int) * N2);
        c[i] = (int *) malloc( sizeof(int) * N2);
    }
    gen(a, b);
    return 0;
}
$ gcc m1.c && time ./a.out
real 3m55.660s
user 3m55.391s
sys 0m0.184s
This is the result without threading.
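For the threaded follow-up, one common pattern is to give each thread a contiguous band of rows of the result. This is my own sketch, not the post's eventual code; the sizes N and T and all names are illustrative:

```c
#include <pthread.h>

#define N 4          /* matrix dimension (illustrative) */
#define T 2          /* thread count; assumed to divide N */

static int A[N][N], B[N][N], C[N][N];

/* Each thread computes rows [lo, hi) of C = A * B. */
static void *mul_rows(void *arg)
{
    int t = (int)(long)arg;            /* thread index 0..T-1 */
    int lo = t * (N / T), hi = lo + N / T;
    for (int i = lo; i < hi; i++)
        for (int j = 0; j < N; j++) {
            C[i][j] = 0;
            for (int k = 0; k < N; k++)
                C[i][j] += A[i][k] * B[k][j];
        }
    return NULL;
}

int mul_parallel(void)
{
    pthread_t th[T];
    for (int t = 0; t < T; t++)
        if (pthread_create(&th[t], NULL, mul_rows, (void *)(long)t))
            return -1;
    for (int t = 0; t < T; t++)
        pthread_join(th[t], NULL);     /* wait for every band to finish */
    return 0;
}
```

The bands write disjoint rows of C, so no locking is needed.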
2007-08-13
2007-08-12
#include <pthread.h>
#include <stdio.h>
#define N 10
void *test(void *c)
{
    /* cast through long so the pointer-to-int conversion is safe on 64-bit */
    printf("\tI am thread: %d\n", (int)(long)c);
    pthread_exit(NULL);
}
int main()
{
    int i, result;
    pthread_t t[N];
    for (i = 0; i < N; i++) {
        result = pthread_create(&t[i], NULL, test, (void *)(long)i);
        if (result)
            printf("Cannot do create.\n");
    }
    return 0;
}
The output comes out different every time. :( The threads need to be waited for explicitly.
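The explicit wait the note asks for is pthread_join: main blocks until every worker has finished, so the result no longer depends on scheduling. A minimal sketch (the flag array and function names are mine):

```c
#include <pthread.h>

#define N 10

static int done[N];                 /* one completion flag per thread */

static void *test_fn(void *arg)
{
    done[(int)(long)arg] = 1;       /* mark this thread as finished */
    return NULL;
}

int run_and_wait(void)
{
    pthread_t t[N];
    for (int i = 0; i < N; i++)
        if (pthread_create(&t[i], NULL, test_fn, (void *)(long)i))
            return -1;
    for (int i = 0; i < N; i++)
        pthread_join(t[i], NULL);   /* the explicit wait */
    return 0;
}
```

After run_and_wait() returns, every flag is guaranteed to be set.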
Cluster Building
NCHC Formosa PC Cluster, it feels familiar.
Workshop materials, HP 64bit Cluster
Hardware information
NCHC PC Cluster
NCHC (National Center for High-performance Computing) PC Cluster discussion board
HIGH PERFORMANCE COMPUTING LAB., Tunghai University
Reaching the Goal with the Regensburg Marathon-Cluster, Hubert Feyrer
High Performance Computing Training
Books,
*Beowulf Cluster Computing with Linux
*Parallel Programming: Techniques and Applications Using Networked Workstations and Parallel Computers
*An Introduction to Parallel Computing: Design and Analysis of Algorithms
Passing Multidimensional Arrays
#include <iostream>
using std::cout;
void print_m35(int m[3][5]);
void print_mi5(int m[][5], int dim1);
//void print_mij_(int m[][], int dim1, int dim2);
void print_mij(int* m, int dim1, int dim2);
void print_m35(int m[3][5])
{
for (int i = 0; i < 3; i++) {
for (int j = 0; j < 5; j++) cout << m[i][j] << '\t';
cout << '\n';
}
}
void print_mi5(int m[][5], int dim1)
{
    for (int i = 0; i < dim1; i++) {
        for (int j = 0; j < 5; j++) cout << m[i][j] << '\t';
        cout << '\n';
    }
}
/*
void print_mij_(int m[][], int dim1, int dim2)
{
for (int i = 0; i < dim1; i++) {
for (int j = 0; j < dim2; j++) cout << m[i][j] << '\t';
cout << '\n';
}
}
*/
void print_mij(int* m, int dim1, int dim2)
{
for (int i = 0; i < dim1; i++) {
for (int j = 0; j < dim2; j++) cout << m[i * dim2 + j] << '\t';
cout << '\n';
}
}
int main()
{
int v[3][5];
for (int i = 0; i < 3; i++)
for (int j = 0; j < 5; j++)
v[i][j] = j;
print_m35(v);
print_mi5(v, 3);
print_mij(&v[0][0], 3, 5);
return 0;
}
2007-08-11
libc-dev
I have installed icc, which requires libstdc++.so.5, on my box, a box running a 2.6-series kernel. I also have gcc-4.1.2 with the libc6 libraries. At one point I wanted to debug an executable called "a.out", but had no idea how to. gdb says:
Failed to read a valid object file image from memory
That is what it prints when I run it in the debug session.
As "file" reports:
a.out: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.4.1, dynamically linked (uses shared libs), for GNU/Linux 2.4.1, not stripped
But it should actually look like this:
a.out: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs), not stripped
I didn't dare to remove the _big_base_ libc6 package, but after I reinstalled "libc-dev" it went back to the state I was hoping for. :D
SunFire 15K (Solaris & Debian)
HCL for Solaris OS
Current Status, Debian on SPARC platform
Supported are Sun4m and Sun4u machines (with a 32-bit userland).
UltraLinux
Platform Group: sun4u
gprof
Linux/Alpha or How to Make Your Applications Fly
Linux Performance Analysis Tools
[Author], David Mosberger
---
Useful Tools tutorial
gprof TIPS:
* Don't optimize before profiling
* Be careful not to optimize before profiling!
* Profile before you optimize.
Hahaha~~~ :D I had compiled it with the -O3 option in both icc and gcc. :D
2007-08-07
Linux to Power Google GPhone
HTC
"Google's first mobile phone reportedly will run a Linux operating system on a Texas Instruments "Edge" chipset, and will likely ship to T-Mobile and Orange customers in the Spring of 2008. "GPhone" call minutes and text messages will apparently be funded by mobile advertising, according to reports." The report found at the popular embedded systems Linux news site LinuxDevices.
via OSNews by donotreply@osnews.com (Eugenia Loli-Queru) on Aug 04, 2007
2007-08-06
Matrix Transpose
I did ten thousand multiplications of 800x800 matrices; each element is an integer ranging from one to five.
yrchen told me a hint yesterday afternoon: the matrices should be transposed before doing a huge computation. In today's experiment I wrote two versions of the code, one without any transpose loop (called m1) and one with it (called m1_t). Both were compiled with gcc at -O3 optimization. The result is:
m1_t:
real 5m31.300s
user 5m31.245s
sys 0m0.020s
m1:
real 5m30.802s
user 5m30.265s
sys 0m0.008s
It looks like -O3 already performs the transpose optimization before the huge matrix multiplication.
#include <stdio.h>
#include <stdlib.h>
#define DIE 800
void gen( int a[][DIE]);
void mul( int a[][DIE], int b[][DIE]);
int main(int argc, char *argv[])
{
    /* static: two 800x800 int arrays would overflow a default stack */
    static int a[DIE][DIE];
    static int b[DIE][DIE];
    int i;
    for (i = 0; i < 10000; i++) {
        srandom(0);
        gen(a);
        srandom(1);
        gen(b);
        mul(a, b);
    }
    return 0;
}
void gen( int a[][DIE] )
{
    int i, j;
    for (i = 0; i < DIE; i++)
        for (j = 0; j < DIE; j++)
            a[i][j] = random() % 5 + 1;
}
void mul( int a[][DIE], int b[][DIE])
{
    int i, j, k;
    static int c[DIE][DIE];
    static int bt[DIE][DIE];
    /* transpose b into bt so the inner loop walks both operands row-wise */
    for (i = 0; i < DIE; i++)
        for (j = 0; j < DIE; j++)
            bt[i][j] = b[j][i];
    for (i = 0; i < DIE; i++) {
        for (j = 0; j < DIE; j++) {
            c[i][j] = 0;
            for (k = 0; k < DIE; k++)
                c[i][j] += a[i][k] * bt[j][k];
        }
    }
}
2007-08-04
Matrix Multiplication Hint
Do the transpose first.
Because the CPU fetches memory along each row, while matrix multiplication walks col * row, you can do one transpose (row -> col) up front and then fetch both operands row-wise. It is faster!
Don't use recursion, because making the OS handle many stacks is a heavy load. Instead, unroll the huge matrix with several while loops, and spawn threads to split it into the smaller matrices of the Strassen algorithm.
These are general hints from yrchen.
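The first hint can be sketched in a few lines of C (the size D and the names are illustrative, not yrchen's code): transpose b once, and then both operands are scanned row-wise in the inner loop.

```c
#define D 3

/* Multiply c = a * b after transposing b, so a[i][k] and bt[j][k]
   both advance along rows (cache-friendly). */
void mul_transposed(int a[D][D], int b[D][D], int c[D][D])
{
    int bt[D][D];
    for (int i = 0; i < D; i++)
        for (int j = 0; j < D; j++)
            bt[i][j] = b[j][i];              /* one transpose, up front */
    for (int i = 0; i < D; i++)
        for (int j = 0; j < D; j++) {
            c[i][j] = 0;
            for (int k = 0; k < D; k++)
                c[i][j] += a[i][k] * bt[j][k];
        }
}
```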
2007-08-03