In my previous post, I needed to distribute data from PGM files among 10 computers. With help from Jonathan Dursi and Shawn Chin, I have integrated the code. My program compiles, but it has a segmentation fault. I ran it, but nothing happened. The segmentation faults occur when I run the parallel program with Open MPI:
mpirun -np 10 ./exmpi_2 balloons.pgm output.pgm
The result is:
[ubuntu:04803] *** Process received signal ***
[ubuntu:04803] Signal: Segmentation fault (11)
[ubuntu:04803] Signal code: Address not mapped (1)
[ubuntu:04803] Failing at address: 0x7548d0c
[ubuntu:04803] [ 0] [0x86b410]
[ubuntu:04803] [ 1] /lib/tls/i686/cmov/libc.so.6(fclose+0x1a0) [0x186b00]
[ubuntu:04803] [ 2] ./exmpi_2(main+0x78e) [0x80492c2]
[ubuntu:04803] [ 3] /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe6) [0x141bd6]
[ubuntu:04803] [ 4] ./exmpi_2() [0x8048aa1]
[ubuntu:04803] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 4803 on node ubuntu exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
Then I tried running it under valgrind to debug the program, and output.pgm was generated:
valgrind mpirun -np 10 ./exmpi_2 balloons.pgm output.pgm
The result is:
==4632== Memcheck, a memory error detector
==4632== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
==4632== Using Valgrind-3.6.0.SVN-Debian and LibVEX; rerun with -h for copyright info
==4632== Command: mpirun -np 10 ./exmpi_2 2.pgm 10.pgm
==4632==
==4632== Syscall param sched_setaffinity(mask) points to unaddressable byte(s)
==4632== at 0x4215D37: syscall (syscall.S:31)
==4632== by 0x402B335: opal_paffinity_linux_plpa_api_probe_init (plpa_api_probe.c:56)
==4632== by 0x402B7CC: opal_paffinity_linux_plpa_init (plpa_runtime.c:37)
==4632== by 0x402B93C: opal_paffinity_linux_plpa_have_topology_information (plpa_map.c:494)
==4632== by 0x402B180: linux_module_init (paffinity_linux_module.c:119)
==4632== by 0x40BE2C3: opal_paffinity_base_select (paffinity_base_select.c:64)
==4632== by 0x40927AC: opal_init (opal_init.c:295)
==4632== by 0x4046767: orte_init (orte_init.c:76)
==4632== by 0x804A82E: orterun (orterun.c:540)
==4632== by 0x804A3EE: main (main.c:13)
==4632== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==4632==
[ubuntu:04638] *** Process received signal ***
[ubuntu:04639] *** Process received signal ***
[ubuntu:04639] Signal: Segmentation fault (11)
[ubuntu:04639] Signal code: Address not mapped (1)
[ubuntu:04639] Failing at address: 0x7548d0c
[ubuntu:04639] [ 0] [0xc50410]
[ubuntu:04639] [ 1] /lib/tls/i686/cmov/libc.so.6(fclose+0x1a0) [0xde4b00]
[ubuntu:04639] [ 2] ./exmpi_2(main+0x78e) [0x80492c2]
[ubuntu:04639] [ 3] /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe6) [0xd9fbd6]
[ubuntu:04639] [ 4] ./exmpi_2() [0x8048aa1]
[ubuntu:04639] *** End of error message ***
[ubuntu:04640] *** Process received signal ***
[ubuntu:04640] Signal: Segmentation fault (11)
[ubuntu:04640] Signal code: Address not mapped (1)
[ubuntu:04640] Failing at address: 0x7548d0c
[ubuntu:04640] [ 0] [0xdad410]
[ubuntu:04640] [ 1] /lib/tls/i686/cmov/libc.so.6(fclose+0x1a0) [0xe76b00]
[ubuntu:04640] [ 2] ./exmpi_2(main+0x78e) [0x80492c2]
[ubuntu:04640] [ 3] /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe6) [0xe31bd6]
[ubuntu:04640] [ 4] ./exmpi_2() [0x8048aa1]
[ubuntu:04640] *** End of error message ***
[ubuntu:04641] *** Process received signal ***
[ubuntu:04641] Signal: Segmentation fault (11)
[ubuntu:04641] Signal code: Address not mapped (1)
[ubuntu:04641] Failing at address: 0x7548d0c
[ubuntu:04641] [ 0] [0xe97410]
[ubuntu:04641] [ 1] /lib/tls/i686/cmov/libc.so.6(fclose+0x1a0) [0x1e8b00]
[ubuntu:04641] [ 2] ./exmpi_2(main+0x78e) [0x80492c2]
[ubuntu:04641] [ 3] /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe6) [0x1a3bd6]
[ubuntu:04641] [ 4] ./exmpi_2() [0x8048aa1]
[ubuntu:04641] *** End of error message ***
[ubuntu:04642] *** Process received signal ***
[ubuntu:04642] Signal: Segmentation fault (11)
[ubuntu:04642] Signal code: Address not mapped (1)
[ubuntu:04642] Failing at address: 0x7548d0c
[ubuntu:04642] [ 0] [0x92d410]
[ubuntu:04642] [ 1] /lib/tls/i686/cmov/libc.so.6(fclose+0x1a0) [0x216b00]
[ubuntu:04642] [ 2] ./exmpi_2(main+0x78e) [0x80492c2]
[ubuntu:04642] [ 3] /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe6) [0x1d1bd6]
[ubuntu:04642] [ 4] ./exmpi_2() [0x8048aa1]
[ubuntu:04642] *** End of error message ***
[ubuntu:04643] *** Process received signal ***
[ubuntu:04643] Signal: Segmentation fault (11)
[ubuntu:04643] Signal code: Address not mapped (1)
[ubuntu:04643] Failing at address: 0x7548d0c
[ubuntu:04643] [ 0] [0x8f4410]
[ubuntu:04643] [ 1] /lib/tls/i686/cmov/libc.so.6(fclose+0x1a0) [0x16bb00]
[ubuntu:04643] [ 2] ./exmpi_2(main+0x78e) [0x80492c2]
[ubuntu:04643] [ 3] /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe6) [0x126bd6]
[ubuntu:04643] [ 4] ./exmpi_2() [0x8048aa1]
[ubuntu:04643] *** End of error message ***
[ubuntu:04638] Signal: Segmentation fault (11)
[ubuntu:04638] Signal code: Address not mapped (1)
[ubuntu:04638] Failing at address: 0x7548d0c
[ubuntu:04638] [ 0] [0x4f6410]
[ubuntu:04638] [ 1] /lib/tls/i686/cmov/libc.so.6(fclose+0x1a0) [0x222b00]
[ubuntu:04638] [ 2] ./exmpi_2(main+0x78e) [0x80492c2]
[ubuntu:04638] [ 3] /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe6) [0x1ddbd6]
[ubuntu:04638] [ 4] ./exmpi_2() [0x8048aa1]
[ubuntu:04638] *** End of error message ***
[ubuntu:04644] *** Process received signal ***
[ubuntu:04644] Signal: Segmentation fault (11)
[ubuntu:04644] Signal code: Address not mapped (1)
[ubuntu:04644] Failing at address: 0x7548d0c
[ubuntu:04644] [ 0] [0x61f410]
[ubuntu:04644] [ 1] /lib/tls/i686/cmov/libc.so.6(fclose+0x1a0) [0x1a3b00]
[ubuntu:04644] [ 2] ./exmpi_2(main+0x78e) [0x80492c2]
[ubuntu:04644] [ 3] /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe6) [0x15ebd6]
[ubuntu:04644] [ 4] ./exmpi_2() [0x8048aa1]
[ubuntu:04644] *** End of error message ***
[ubuntu:04645] *** Process received signal ***
[ubuntu:04645] Signal: Segmentation fault (11)
[ubuntu:04645] Signal code: Address not mapped (1)
[ubuntu:04645] Failing at address: 0x7548d0c
[ubuntu:04645] [ 0] [0x7a3410]
[ubuntu:04645] [ 1] /lib/tls/i686/cmov/libc.so.6(fclose+0x1a0) [0x1d5b00]
[ubuntu:04645] [ 2] ./exmpi_2(main+0x78e) [0x80492c2]
[ubuntu:04645] [ 3] /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe6) [0x190bd6]
[ubuntu:04645] [ 4] ./exmpi_2() [0x8048aa1]
[ubuntu:04645] *** End of error message ***
[ubuntu:04647] *** Process received signal ***
[ubuntu:04647] Signal: Segmentation fault (11)
[ubuntu:04647] Signal code: Address not mapped (1)
[ubuntu:04647] Failing at address: 0x7548d0c
[ubuntu:04647] [ 0] [0xf54410]
[ubuntu:04647] [ 1] /lib/tls/i686/cmov/libc.so.6(fclose+0x1a0) [0x2bab00]
[ubuntu:04647] [ 2] ./exmpi_2(main+0x78e) [0x80492c2]
[ubuntu:04647] [ 3] /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe6) [0x275bd6]
[ubuntu:04647] [ 4] ./exmpi_2() [0x8048aa1]
[ubuntu:04647] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 2 with PID 4639 on node ubuntu exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
6 total processes killed (some possibly by mpirun during cleanup)
==4632==
==4632== HEAP SUMMARY:
==4632== in use at exit: 158,751 bytes in 1,635 blocks
==4632== total heap usage: 10,443 allocs, 8,808 frees, 15,854,537 bytes allocated
==4632==
==4632== LEAK SUMMARY:
==4632== definitely lost: 81,655 bytes in 112 blocks
==4632== indirectly lost: 5,108 bytes in 91 blocks
==4632== possibly lost: 1,043 bytes in 17 blocks
==4632== still reachable: 70,945 bytes in 1,415 blocks
==4632== suppressed: 0 bytes in 0 blocks
==4632== Rerun with --leak-check=full to see details of leaked memory
==4632==
==4632== For counts of detected and suppressed errors, rerun with: -v
==4632== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 96 from 9)
Could someone help me solve this problem? This is my source code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "mpi.h"
#include <syscall.h>

#define SIZE_X 640
#define SIZE_Y 480

int main(int argc, char **argv)
{
    FILE *FR,*FW;
    int ierr;
    int rank, size;
    int ncells;
    int greys[SIZE_X][SIZE_Y];
    int rows,cols, maxval;
    int mystart, myend, myncells;
    const int IONODE=0;
    int *disps, *counts, *mydata;
    int *data;
    int i,j,temp1;
    char dummy[50]="";

    ierr = MPI_Init(&argc, &argv);
    if (argc != 3) {
        fprintf(stderr,"Usage: %s infile outfile\n",argv[0]);
        fprintf(stderr,"outputs the negative of the input file.\n");
        return -1;
    }

    ierr = MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    ierr = MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (ierr) {
        fprintf(stderr,"Catastrophic MPI problem; exiting\n");
        MPI_Abort(MPI_COMM_WORLD,1);
    }

    if (rank == IONODE) {
        //if (read_pgm(argv[1], &greys, &rows, &cols, &maxval)) {
        //    fprintf(stderr,"Could not read file; exiting\n");
        //    MPI_Abort(MPI_COMM_WORLD,2);
        rows=SIZE_X;
        cols=SIZE_Y;
        maxval=255;
        FR=fopen(argv[1], "r+");
        fgets(dummy,50,FR);
        do{ fgets(dummy,50,FR); } while(dummy[0]=='#');
        fgets(dummy,50,FR);
        for (j = 0; j <cols; j++)
        {
            for (i = 0; i <rows; i++)
            {
                fscanf(FR,"%d",&temp1);
                greys[i][j] = temp1;
            }
        }
    }

    ncells = rows*cols;
    disps = (int *)malloc(size * sizeof(int));
    counts= (int *)malloc(size * sizeof(int));
    data = &(greys[0][0]); /* we know all the data is contiguous */

    /* everyone calculate their number of cells */
    ierr = MPI_Bcast(&ncells, 1, MPI_INT, IONODE, MPI_COMM_WORLD);
    myncells = ncells/size;
    mystart = rank*myncells;
    myend = mystart + myncells - 1;
    if (rank == size-1) myend = ncells-1;
    myncells = (myend-mystart)+1;
    mydata = (int *)malloc(myncells * sizeof(int));

    /* assemble the list of counts. Might not be equal if don't divide evenly. */
    ierr = MPI_Gather(&myncells, 1, MPI_INT, counts, 1, MPI_INT, IONODE, MPI_COMM_WORLD);
    if (rank == IONODE) {
        disps[0] = 0;
        for (i=1; i<size; i++) {
            disps[i] = disps[i-1] + counts[i-1];
        }
    }

    /* scatter the data */
    ierr = MPI_Scatterv(data, counts, disps, MPI_INT, mydata, myncells, MPI_INT, IONODE, MPI_COMM_WORLD);

    /* everyone has to know maxval */
    ierr = MPI_Bcast(&maxval, 1, MPI_INT, IONODE, MPI_COMM_WORLD);
    for (i=0; i<myncells; i++)
        mydata[i] = maxval-mydata[i];

    /* Gather the data */
    ierr = MPI_Gatherv(mydata, myncells, MPI_INT, data, counts, disps, MPI_INT, IONODE, MPI_COMM_WORLD);
    if (rank == IONODE)
    {
        // write_pgm(argv[2], greys, rows, cols, maxval);
        FW=fopen(argv[2], "w");
        fprintf(FW,"P2\n%d %d\n255\n",rows,cols);
        for(j=0;j<cols;j++)
            for(i=0;i<rows;i++)
                fprintf(FW,"%d ", greys[i][j]);
    }

    free(mydata);
    if (rank == IONODE) {
        free(counts);
        free(disps);
        //free(&(greys[0][0]));
        //free(greys);
    }

    fclose(FR);
    fclose(FW);
    MPI_Finalize();
    return 0;
}
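The partitioning arithmetic in the listing (every rank takes ncells/size cells and the last rank also takes the remainder) can be checked on its own, without MPI. The sketch below is a plain C restatement of that logic under stated assumptions: `partition_cells` is a hypothetical helper name that does not appear in the program, and it assumes `size >= 1` and caller-provided arrays of at least `size` ints.

```c
#include <assert.h>

/* Hypothetical helper (not in the program above): fills counts[] and
 * disps[] the same way the listing does. Every rank gets ncells/size
 * cells; the last rank additionally takes whatever is left over.
 * Assumes size >= 1 and that both arrays hold at least `size` ints. */
void partition_cells(int ncells, int size, int *counts, int *disps)
{
    int base;
    int rank;

    assert(size >= 1);
    base = ncells / size;

    for (rank = 0; rank < size; rank++) {
        int mystart = rank * base;
        int myend = (rank == size - 1) ? ncells - 1 : mystart + base - 1;

        counts[rank] = myend - mystart + 1; /* cells owned by this rank */
        disps[rank] = mystart;              /* offset of its first cell */
    }
}
```

For the 640 x 480 image in the listing (307200 cells) and 10 ranks, this gives 30720 cells per rank. With 7 ranks the first six would get 43885 cells and the last one 43890.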
This is the input image: http://orion.math.iastate.edu/burkardt/data/pgm/balloons.pgm
Which line is giving the segfault? – suszterpatt
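Going by the backtraces above, the faulting frame is fclose called from main, which matches the two fclose calls at the end of the listing. FR and FW are assigned only inside the `if (rank == IONODE)` blocks, so on every other rank they reach fclose uninitialized. Here is a minimal sketch of one way to guard against that, assuming the pointers are initialized to NULL where they are declared. `close_if_open` is a hypothetical helper, not part of the original program.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: closes *fp only if it refers to an open stream.
 * Returns 0 if there was nothing to close, 1 if the stream was closed
 * cleanly, and -1 if fclose reported an error. *fp is reset to NULL in
 * every case, so calling it a second time is harmless. */
int close_if_open(FILE **fp)
{
    int rc;

    assert(fp != NULL);
    if (*fp == NULL)
        return 0;

    rc = fclose(*fp);
    *fp = NULL;
    return (rc == 0) ? 1 : -1;
}
```

In the listing this would mean declaring `FILE *FR = NULL, *FW = NULL;` and replacing the final calls with `close_if_open(&FR);` and `close_if_open(&FW);`, so that ranks other than IONODE skip the close.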