2012-08-14 29 views
6

Loading RDF triples into Virtuoso Open Source

I am trying to create a local mirror of LinkedGeoData.org from this dump.

That is around 61,000,000 triples. Virtuoso is supposed to handle far more than that easily, but every time, loading stops after about 40,000,000 triples. I am using an Amazon EC2 double extra large instance with 30 GB of RAM and plenty of storage as well. Is there a problem with my configuration file? I am running Ubuntu Server 12.04, and I have tried Virtuoso both installed via apt-get (version 6.1.5) and compiled from the latest stable source on GitHub (version 6.1.6), following Jörn Hees' instructions.

I have also tried splitting the dump file into smaller pieces and loading them one by one. That also stalls after about 40,000,000 triples have been inserted.

The log file shows nothing unusual; virtuoso-t simply stops working without actually crashing, and top shows the process using 0% CPU. I left the process running for several days without any progress after the first half hour or so.

Here is my virtuoso.ini file:

[Database] 
DatabaseFile   = /var/lib/virtuoso/db/virtuoso.db 
ErrorLogFile   = /var/lib/virtuoso/db/virtuoso.log 
LockFile   = /var/lib/virtuoso/db/virtuoso.lck 
TransactionFile   = /var/lib/virtuoso/db/virtuoso.trx 
xa_persistent_file  = /var/lib/virtuoso/db/virtuoso.pxa 
ErrorLogLevel   = 7 
FileExtend   = 200 
MaxCheckpointRemap  = 625000 
Striping   = 0 
TempStorage   = TempDatabase 


[TempDatabase] 
DatabaseFile   = /var/lib/virtuoso/db/virtuoso-temp.db 
TransactionFile   = /var/lib/virtuoso/db/virtuoso-temp.trx 
MaxCheckpointRemap  = 2000 
Striping   = 0 


; 
; Server parameters 
; 
[Parameters] 
ServerPort   = 1111 
LiteMode   = 0 
DisableUnixSocket  = 1 
DisableTcpSocket  = 0 
;SSLServerPort   = 2111 
;SSLCertificate   = cert.pem 
;SSLPrivateKey   = pk.pem 
;X509ClientVerify  = 0 
;X509ClientVerifyDepth  = 0 
;X509ClientVerifyCAFile  = ca.pem 
ServerThreads   = 20 
CheckpointInterval  = 60 
O_DIRECT   = 0 
CaseMode   = 2 
MaxStaticCursorRows  = 5000 
CheckpointAuditTrail  = 0 
AllowOSCalls   = 0 
SchedulerInterval  = 10 
DirsAllowed   = ., /usr/share/virtuoso/vad, /home/ubuntu/lgd 
ThreadCleanupInterval  = 0 
ThreadThreshold   = 10 
ResourcesCleanupInterval = 0 
FreeTextBatchSize  = 100000 
SingleCPU   = 0 
VADInstallDir   = /usr/share/virtuoso/vad/ 
PrefixResultNames    = 0 
RdfFreeTextRulesSize  = 100 
IndexTreeMaps   = 256 
MaxMemPoolSize     = 200000000 
PrefixResultNames    = 0 
MacSpotlight     = 0 
IndexTreeMaps     = 64 
;; 
;; When running with large data sets, one should configure the Virtuoso 
;; process to use between 2/3 to 3/5 of free system memory and to stripe 
;; storage on all available disks. 
;; 
;; Uncomment next two lines if there is 2 GB system memory free 
;  NumberOfBuffers   = 170000 
;  MaxDirtyBuffers   = 130000 
;; Uncomment next two lines if there is 4 GB system memory free 
;  NumberOfBuffers   = 340000 
;  MaxDirtyBuffers   = 250000 
;; Uncomment next two lines if there is 8 GB system memory free 
;  NumberOfBuffers   = 680000 
;  MaxDirtyBuffers   = 500000 
;; Uncomment next two lines if there is 16 GB system memory free 
;  NumberOfBuffers   = 1360000 
;  MaxDirtyBuffers   = 1000000 
;; Uncomment next two lines if there is 32 GB system memory free 
     NumberOfBuffers   = 2720000 
     MaxDirtyBuffers   = 2000000 
;; Uncomment next two lines if there is 48 GB system memory free 
;  NumberOfBuffers   = 4000000 
;  MaxDirtyBuffers   = 3000000 
;; Uncomment next two lines if there is 64 GB system memory free 
;  NumberOfBuffers   = 5450000 
;  MaxDirtyBuffers   = 4000000 
;; 
;; Note the default settings will take very little memory 
;; but will not result in very good performance 
;; 


[HTTPServer] 
ServerPort   = 8890 
ServerRoot   = /var/lib/virtuoso/vsp 
ServerThreads   = 20 
DavRoot    = DAV 
EnabledDavVSP   = 0 
HTTPProxyEnabled  = 0 
TempASPXDir   = 0 
DefaultMailServer  = localhost:25 
ServerThreads   = 10 
MaxKeepAlives   = 10 
KeepAliveTimeout  = 10 
MaxCachedProxyConnections = 10 
ProxyConnectionCacheTimeout = 15 
HTTPThreadSize   = 280000 
HttpPrintWarningsInOutput = 0 
Charset    = UTF-8 
;HTTPLogFile    = logs/http.log 

[AutoRepair] 
BadParentLinks   = 0 

[Client] 
SQL_PREFETCH_ROWS  = 100 
SQL_PREFETCH_BYTES  = 16000 
SQL_QUERY_TIMEOUT  = 0 
SQL_TXN_TIMEOUT   = 0 
;SQL_NO_CHAR_C_ESCAPE  = 1 
;SQL_UTF8_EXECS   = 0 
;SQL_NO_SYSTEM_TABLES  = 0 
;SQL_BINARY_TIMESTAMP  = 1 
;SQL_ENCRYPTION_ON_PASSWORD = -1 

[VDB] 
ArrayOptimization  = 0 
NumArrayParameters  = 10 
VDBDisconnectTimeout  = 1000 
KeepConnectionOnFixedThread = 0 

[Replication] 
ServerName   = db-IP-10-252-61-61 
ServerEnable   = 1 
QueueMax   = 50000 


; 
; Striping setup 
; 
; These parameters have only effect when Striping is set to 1 in the 
; [Database] section, in which case the DatabaseFile parameter is ignored. 
; 
; With striping, the database is spawned across multiple segments 
; where each segment can have multiple stripes. 
; 
; Format of the lines below: 
; Segment<number> = <size>, <stripe file name> [, <stripe file name> .. ] 
; 
; <number> must be ordered from 1 up. 
; 
; The <size> is the total size of the segment which is equally divided 
; across all stripes forming the segment. Its specification can be in 
; gigabytes (g), megabytes (m), kilobytes (k) or in database blocks 
; (b, the default) 
; 
; Note that the segment size must be a multiple of the database page size 
; which is currently 8k. Also, the segment size must be divisible by the 
; number of stripe files forming the segment. 
; 
; The example below creates a 200 meg database striped on two segments 
; with two stripes of 50 meg and one of 100 meg. 
; 
; You can always add more segments to the configuration, but once 
; added, do not change the setup. 
; 
[Striping] 
Segment1   = 100M, db-seg1-1.db, db-seg1-2.db 
Segment2   = 100M, db-seg2-1.db 
;... 

;[TempStriping] 
;Segment1   = 100M, db-seg1-1.db, db-seg1-2.db 
;Segment2   = 100M, db-seg2-1.db 
;... 

;[Ucms] 
;UcmPath   = <path> 
;Ucm1    = <file> 
;Ucm2    = <file> 
;... 


[Zero Config] 
ServerName   = virtuoso (IP-10-252-61-61) 
;ServerDSN   = ZDSN 
;SSLServerName   = 
;SSLServerDSN   = 


[Mono] 
;MONO_TRACE   = Off 
;MONO_PATH   = <path_here> 
;MONO_ROOT   = <path_here> 
;MONO_CFG_DIR   = <path_here> 
;virtclr.dll   = 


[URIQA] 
DynamicLocal   = 0 
DefaultHost   = localhost:8890 


[SPARQL] 
;ExternalQuerySource  = 1 
;ExternalXsltSource   = 1 
;DefaultGraph   = http://localhost:8890/dataspace 
;ImmutableGraphs   = http://localhost:8890/dataspace 
ResultSetMaxRows   = 10000 
MaxQueryCostEstimationTime = 4000 ; in seconds 
MaxQueryExecutionTime  = 600 ; in seconds 
DefaultQuery    = select distinct ?Concept where {[] a ?Concept} LIMIT 100 
DeferInferenceRulesInit  = 0 ; controls inference rules loading 
;PingService   = http://rpc.pingthesemanticweb.com/ 
ShortenLongURIs = 1 

[Plugins] 
LoadPath   = /usr/lib/virtuoso/hosting 
Load1    = plain, wikiv 
Load2    = plain, mediawiki 
Load3    = plain, creolewiki 
Load4   = plain, im 

Any help is much appreciated.

+0

For the benefit of future readers... Jörn has updated his guide a few times. [The latest is dated 2015-11-23 and is based on Virtuoso 7.2.1 and DBpedia 2015](https://joernhees.de/blog/2015/11/23/setting-up-a-linked-data-mirror-from-rdf-dumps-dbpedia-2015-04-freebase-wikidata-linkedgeodata-with-virtuso-7-2-1-and-docker-optional/). – TallTed

+0

Also note that Virtuoso-specific questions are often answered faster through product-specific resources, such as the [Virtuoso Users mailing list](https://lists.sourceforge.net/lists/listinfo/virtuoso-users/), the public [OpenLink Support Forums](http://boards.openlinksw.com/support/index.php), or a [confidential OpenLink support case](http://support.openlinksw.com/support/online-support.vsp). ObDisclaimer: I work for [OpenLink Software](http://www.openlinksw.com/), producer of [Virtuoso](http://virtuoso.openlinksw.com/). – TallTed

Answer

4

Answering my own question. The problem was the leading spaces on the lines

     NumberOfBuffers   = 2720000 
     MaxDirtyBuffers   = 2000000 

After removing them, Virtuoso actually uses the available memory instead of the default 16 MB.
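
For anyone who wants to check their own file for the same mistake, here is a minimal sketch in Python that flags uncommented `key = value` lines starting with whitespace. It is not part of Virtuoso; the function name `find_indented_settings` and the `SAMPLE` text are made up for illustration, and it only looks for indentation, it does not check whether Virtuoso accepts the values.

```python
import re

# An uncommented "key = value" line that begins with spaces or tabs.
# Lines whose first non-blank character is ';' or '#' are treated as comments.
INDENTED_SETTING = re.compile(r"^[ \t]+([^;#\s=][^=]*?)\s*=\s*(.*?)\s*$")


def find_indented_settings(text):
    """Return (line_number, key, value) for every indented key = value line."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = INDENTED_SETTING.match(line)
        if match:
            hits.append((number, match.group(1), match.group(2)))
    return hits


# Hypothetical excerpt mirroring the relevant part of the question's file.
SAMPLE = """\
[Parameters]
ServerThreads   = 20
;  NumberOfBuffers   = 1360000
     NumberOfBuffers   = 2720000
     MaxDirtyBuffers   = 2000000
"""

if __name__ == "__main__":
    for number, key, value in find_indented_settings(SAMPLE):
        print(f"line {number}: '{key}' is indented (value {value})")
```

To run it against a real file, read the file into a string first (for example `open("/path/to/virtuoso.ini").read()`, with the path adjusted to your installation) and pass that string to `find_indented_settings`.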