2011-02-13 8 views

Answer

4

EDIT: Yet more tweaks.

Depending on precisely what the input looks like, this works for ASCII:

(?<! [:\s]) \s* (["']) (?: (?! \1) .)+ \1 
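As a rough Ruby sketch of the same ASCII pattern (the constant `ASCII_QUOTED`, the helper `first_quote`, and the shortened sample lines are made up for illustration), the `x` flag allows the free-spacing layout and `m` makes the dot match newlines, much like Perl's `/sx`:

```ruby
# Hypothetical constant name; the pattern itself is the ASCII one shown above.
ASCII_QUOTED = %r{
  (?<! [:\s] )        # not immediately after a colon or whitespace
  \s*
  (["'])              # opening quote, remembered as group 1
  (?: (?! \1 ) . )+   # one or more characters that are not that same quote
  \1                  # the same quote character closes the string
}mx

# Hypothetical helper: returns the first matching quoted run, or nil.
def first_quote(line)
  line[ASCII_QUOTED]
end

samples = [
  '"Take off, hoser!"',
  'Dorothy Parker:"Brevity is the soul of lingerie."',
  'Dorothy Parker: "Brevity is the soul of lingerie."',
  %q{Larry Wall said, "I don't know"},
  %q{Larry Wall: I don't know if it's what you want},
]

samples.each do |line|
  puts "#{first_quote(line).inspect} <- #{line}"
end
```

Two things to watch for: the match keeps any leading whitespace that `\s*` consumed, and the apostrophes in the last sample are treated as a pair of single quotes, so that line matches even though it contains no real quotation.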

To match Unicode “quotes”, you have to be a bit more careful about how they pair up, perhaps along these lines:

(?xs) (?<!:) \s+ 
    (?: (["']) (?: (?! \1) .)+ \1 
    | “ .*? ” # English etc 
    | ‘ .*? ’ 
    | « .*? » # French, Spanish, Italian 
    | ‹ .*? › 
    | „ .*? “ # German, Icelandic, Romanian 
    | ‚ .*? ‘ 
    | „ .*? ” # Hungarian 
    | ” .*? ” # Swedish 
    | ’ .*? ’  
    | » .*? « # Danish, Hungarian 
    | › .*? ‹ 
    | 「 .*? 」 # Japanese, Chinese 
    | 『 .*? 』 
) 
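One way to keep that alternation manageable in Ruby is to build it from a table of opener/closer pairs. This is only a sketch: `QUOTE_PAIRS`, `UNICODE_QUOTED`, and `unicode_quote` are hypothetical names, the German sample sentence is invented, and the pairs are simply copied from the pattern above.

```ruby
# Hypothetical table of [opener, closer] pairs, copied from the alternation above.
QUOTE_PAIRS = [
  ["“", "”"], ["‘", "’"],     # English etc
  ["«", "»"], ["‹", "›"],     # French, Spanish, Italian
  ["„", "“"], ["‚", "‘"],     # German, Icelandic, Romanian
  ["„", "”"],                 # Hungarian
  ["”", "”"], ["’", "’"],     # Swedish
  ["»", "«"], ["›", "‹"],     # Danish, Hungarian
  ["「", "」"], ["『", "』"],   # Japanese, Chinese
]

# One lazy "opener ... closer" alternative per pair.
paired = QUOTE_PAIRS.map do |opener, closer|
  "#{Regexp.escape(opener)}.*?#{Regexp.escape(closer)}"
end.join("|")

# Same shape as the pattern above: no colon before the whitespace,
# then either an ASCII-quoted run or one of the paired forms.
UNICODE_QUOTED = /(?<!:)\s+(?:(["'])(?:(?!\1).)+\1|#{paired})/m

# Hypothetical helper: the quoted text without the leading whitespace, or nil.
def unicode_quote(line)
  m = UNICODE_QUOTED.match(line)
  m && m[0].strip
end

puts unicode_quote("Larry Wall said, “I don’t know”").inspect
puts unicode_quote("Larry Wall said: “I don’t know”").inspect
puts unicode_quote("Er sagte, „Guten Tag“ und ging.").inspect
```

Because the pattern insists on `\s+`, a line that begins directly with a quotation mark is not matched by this version.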

You can read more about the kinds of paired quotation marks used by various languages here.

Here is a test program in Perl, but the principles should carry over perfectly well to Ruby:

#!/usr/bin/perl 
use strict; 
use warnings; 
use utf8; 
use open qw[ :std IO :utf8 ]; 
while (<DATA>) { 
    print if/(?<! [:\s]) \s* (["']) (?: (?! \1) .)+ \1/sx; 
} 
__END__ 
"Take off, hoser!" 
Dorothy Parker:Brevity is the soul of lingerie. 
Dorothy Parker:"Brevity is the soul of lingerie." 
Dorothy Parker: "Brevity is the soul of lingerie." 
Dorothy Parker: "Brevity is the soul of lingerie." 
Larry Wall: I don't know if it's what you want, but it's what you get. :-) 
Larry Wall said, "I don't know if it's what you want, but it's what you get. :-)" 
Larry Wall said: “I don't know if it's what you want, but it’s what you get. :-)” 
Larry Wall said: “I don't know if it's what you want, but it’s what you get. :-)” 
Larry Wall said, “I don't know if it's what you want, but it's what you get. :-)” 
Boss: And what's that "goto" doing there?!? 
Hacker: Er, I guess my finger slipped when I was typing "getservbyport"... 
‘Nevermore!’ quoth the raven. 
Quoth the raven: ‘Nevermore!’ 
'I wish I had never come here, and I don't want to see no more magic,' he said, and fell silent. 
src/perl/mg.c: "I wish I had never come here, and I don't want to see no more magic," he said, and fell silent. 
src/perl/mg.c: 'I wish I had never come here, and I don't want to see no more magic,' he said, and fell silent. 
src/perl/mg.c => "I wish I had never come here, and I don't want to see no more magic,' he said, and fell silent." 
‘I wish I had never come here, and I don’t want to see no more magic,’ he said, and fell silent.’ 
“I wish I had never come here, and I don’t want to see no more magic,’ he said, and fell silent.” 

The output is

"Take off, hoser!" 
Larry Wall: I don't know if it's what you want, but it's what you get. :-) 
Larry Wall said, "I don't know if it's what you want, but it's what you get. :-)" 
Larry Wall said: “I don't know if it's what you want, but it’s what you get. :-)” 
Larry Wall said: “I don't know if it's what you want, but it’s what you get. :-)” 
Larry Wall said, “I don't know if it's what you want, but it's what you get. :-)” 
Boss: And what's that "goto" doing there?!? 
Hacker: Er, I guess my finger slipped when I was typing "getservbyport"... 
'I wish I had never come here, and I don't want to see no more magic,' he said, and fell silent. 
src/perl/mg.c: 'I wish I had never come here, and I don't want to see no more magic,' he said, and fell silent. 
src/perl/mg.c => "I wish I had never come here, and I don't want to see no more magic,' he said, and fell silent." 

That probably looks “wrong”, but it is because of the inner quotes. Here is a more complete version that illustrates the issues better:

#!/usr/bin/perl 
use strict; 
use warnings; 
use utf8; 
use open qw[ :std IO :utf8 ]; 
while (<DATA>) { 
    chomp;  
    my $bingo = m{ 
     (?<! [:\s]) \s* 
     (?: (?<=^) 
      | (?<= \s) 
     ) 
     (?: (["']) (?: (?! \1) .)+ \1 
      | “ .*? ” # English etc 
      | ‘ .*? ’ 
     ) 
    }sx; 

    if ($bingo) { 
     printf("Line %2d, quote 「%s」\n", $., $&); 
     printf(" " x 7 . "in line 『%s』\n", $_); 
    } else { 
     printf("Line %2d IGNORE 『%s』\n", $., $_); 
    }  
}  
__END__ 
"Take off, hoser!" 
Dorothy Parker:Brevity is the soul of lingerie. 
Dorothy Parker:"Brevity is the soul of lingerie." 
Dorothy Parker: "Brevity is the soul of lingerie." 
Dorothy Parker: "Brevity is the soul of lingerie." 
Larry Wall: I don't know if it's what you want, but it's what you get. :-) 
Larry Wall said, "I don't know if it's what you want, but it's what you get. :-)" 
Larry Wall said: “I don't know if it's what you want, but it’s what you get. :-)” 
Larry Wall said: “I don't know if it's what you want, but it’s what you get. :-)” 
Larry Wall said, “I don't know if it's what you want, but it's what you get. :-)” 
Boss: And what's that "goto" doing there?!? 
Hacker: Er, I guess my finger slipped when I was typing "getservbyport"... 
‘Nevermore!’ quoth the raven. 
Quoth the raven: ‘Nevermore!’ 
'I wish I had never come here, and I don't want to see no more magic,' he said, and fell silent. 
src/perl/mg.c: "I wish I had never come here, and I don't want to see no more magic," he said, and fell silent. 
src/perl/mg.c: 'I wish I had never come here, and I don't want to see no more magic,' he said, and fell silent. 
src/perl/mg.c => "I wish I had never come here, and I don't want to see no more magic,' he said, and fell silent." 
‘I wish I had never come here, and I don’t want to see no more magic,’ he said, and fell silent.’ 
“I wish I had never come here, and I don’t want to see no more magic,’ he said, and fell silent.” 

whose output is:

Line 1, quote 「"Take off, hoser!"」 
     in line 『"Take off, hoser!"』 
Line 2 IGNORE 『Dorothy Parker:Brevity is the soul of lingerie.』 
Line 3 IGNORE 『Dorothy Parker:"Brevity is the soul of lingerie."』 
Line 4 IGNORE 『Dorothy Parker: "Brevity is the soul of lingerie."』 
Line 5 IGNORE 『Dorothy Parker: "Brevity is the soul of lingerie."』 
Line 6 IGNORE 『Larry Wall: I don't know if it's what you want, but it's what you get. :-)』 
Line 7, quote 「 "I don't know if it's what you want, but it's what you get. :-)"」 
     in line 『Larry Wall said, "I don't know if it's what you want, but it's what you get. :-)"』 
Line 8 IGNORE 『Larry Wall said: “I don't know if it's what you want, but it’s what you get. :-)”』 
Line 9 IGNORE 『Larry Wall said: “I don't know if it's what you want, but it’s what you get. :-)”』 
Line 10, quote 「 “I don't know if it's what you want, but it's what you get. :-)”」 
     in line 『Larry Wall said, “I don't know if it's what you want, but it's what you get. :-)”』 
Line 11, quote 「 "goto"」 
     in line 『Boss: And what's that "goto" doing there?!?』 
Line 12, quote 「 "getservbyport"」 
     in line 『Hacker: Er, I guess my finger slipped when I was typing "getservbyport"...』 
Line 13, quote 「‘Nevermore!’」 
     in line 『‘Nevermore!’ quoth the raven.』 
Line 14 IGNORE 『Quoth the raven: ‘Nevermore!’』 
Line 15, quote 「'I wish I had never come here, and I don'」 
     in line 『'I wish I had never come here, and I don't want to see no more magic,' he said, and fell silent.』 
Line 16 IGNORE 『src/perl/mg.c: "I wish I had never come here, and I don't want to see no more magic," he said, and fell silent.』 
Line 17 IGNORE 『src/perl/mg.c: 'I wish I had never come here, and I don't want to see no more magic,' he said, and fell silent.』 
Line 18, quote 「 "I wish I had never come here, and I don't want to see no more magic,' he said, and fell silent."」 
     in line 『src/perl/mg.c => "I wish I had never come here, and I don't want to see no more magic,' he said, and fell silent."』 
Line 19, quote 「‘I wish I had never come here, and I don’」 
     in line 『‘I wish I had never come here, and I don’t want to see no more magic,’ he said, and fell silent.’』 
Line 20, quote 「“I wish I had never come here, and I don’t want to see no more magic,’ he said, and fell silent.”」 
     in line 『“I wish I had never come here, and I don’t want to see no more magic,’ he said, and fell silent.”』 

There is also a standard derived Unicode property called \p{Quotation_Mark}, or \p{QMark} for short, but Ruby doesn't support it. You can list all of these characters using the unichars script:

$ unichars '\p{qmark}' 
" 34 0022 QUOTATION MARK 
' 39 0027 APOSTROPHE 
« 171 00AB LEFT-POINTING DOUBLE ANGLE QUOTATION MARK 
» 187 00BB RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK 
‘ 8216 2018 LEFT SINGLE QUOTATION MARK 
’ 8217 2019 RIGHT SINGLE QUOTATION MARK 
‚ 8218 201A SINGLE LOW-9 QUOTATION MARK 
‛ 8219 201B SINGLE HIGH-REVERSED-9 QUOTATION MARK 
“ 8220 201C LEFT DOUBLE QUOTATION MARK 
” 8221 201D RIGHT DOUBLE QUOTATION MARK 
„ 8222 201E DOUBLE LOW-9 QUOTATION MARK 
‟ 8223 201F DOUBLE HIGH-REVERSED-9 QUOTATION MARK 
‹ 8249 2039 SINGLE LEFT-POINTING ANGLE QUOTATION MARK 
› 8250 203A SINGLE RIGHT-POINTING ANGLE QUOTATION MARK 
「 12300 300C LEFT CORNER BRACKET 
」 12301 300D RIGHT CORNER BRACKET 
『 12302 300E LEFT WHITE CORNER BRACKET 
』 12303 300F RIGHT WHITE CORNER BRACKET 
〝 12317 301D REVERSED DOUBLE PRIME QUOTATION MARK 
〞 12318 301E DOUBLE PRIME QUOTATION MARK 
〟 12319 301F LOW DOUBLE PRIME QUOTATION MARK 
﹁ 65089 FE41 PRESENTATION FORM FOR VERTICAL LEFT CORNER BRACKET 
﹂ 65090 FE42 PRESENTATION FORM FOR VERTICAL RIGHT CORNER BRACKET 
﹃ 65091 FE43 PRESENTATION FORM FOR VERTICAL LEFT WHITE CORNER BRACKET 
﹄ 65092 FE44 PRESENTATION FORM FOR VERTICAL RIGHT WHITE CORNER BRACKET 
＂ 65282 FF02 FULLWIDTH QUOTATION MARK 
＇ 65287 FF07 FULLWIDTH APOSTROPHE 
｢ 65378 FF62 HALFWIDTH LEFT CORNER BRACKET 
｣ 65379 FF63 HALFWIDTH RIGHT CORNER BRACKET 
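If the regex engine at hand lacks \p{Quotation_Mark}, as stated above for Ruby, a possible workaround is to build the set by hand from the code points in that listing. A minimal sketch, where `QMARK_CODEPOINTS` and `QMARK` are hypothetical names:

```ruby
# Code points copied from the unichars listing above (29 entries).
QMARK_CODEPOINTS = [
  0x0022, 0x0027, 0x00AB, 0x00BB,
  0x2018, 0x2019, 0x201A, 0x201B, 0x201C, 0x201D, 0x201E, 0x201F,
  0x2039, 0x203A,
  0x300C, 0x300D, 0x300E, 0x300F, 0x301D, 0x301E, 0x301F,
  0xFE41, 0xFE42, 0xFE43, 0xFE44,
  0xFF02, 0xFF07, 0xFF62, 0xFF63,
]

# Hypothetical stand-in for \p{Quotation_Mark}: matches any single listed character.
QMARK = Regexp.union(QMARK_CODEPOINTS.map { |cp| [cp].pack("U") })

# Print every quotation mark found in a sample line with its code point.
"Quoth the raven: ‘Nevermore!’".scan(QMARK) do |ch|
  puts format("%s %d %04X", ch, ch.ord, ch.ord)
end
```

A hand-made list like this only covers the characters shown in the listing, so it would need updating if a later Unicode version adds more quotation marks.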

You can list all the properties of a code point using the uniprops script:

$ uniprops -a 2018 
U+2018 ‹‘› \N{ LEFT SINGLE QUOTATION MARK }: 
    \pP \p{Pi} 
    All Any Assigned InGeneralPunctuation Case_Ignorable CI Common Zyyy Pi P General_Punctuation Gr_Base Grapheme_Base Graph GrBase Initial_Punctuation Punct Pat_Syn Pattern_Syntax PatSyn Print Punctuation QMark Quotation_Mark X_POSIX_Graph X_POSIX_Print X_POSIX_Punct 
    Age=1.1 Bidi_Class=ON Bidi_Class=Other_Neutral BC=ON Block=General_Punctuation Canonical_Combining_Class=0 Canonical_Combining_Class=Not_Reordered CCC=NR Canonical_Combining_Class=NR Script=Common Decomposition_Type=None DT=None East_Asian_Width=A East_Asian_Width=Ambiguous EA=A Grapheme_Cluster_Break=Other GCB=XX Grapheme_Cluster_Break=XX Hangul_Syllable_Type=NA Hangul_Syllable_Type=Not_Applicable HST=NA Joining_Group=No_Joining_Group JG=NoJoiningGroup Joining_Type=Non_Joining JT=U Joining_Type=U Line_Break=QU Line_Break=Quotation LB=QU Numeric_Type=None NT=None Numeric_Value=NaN NV=NaN Present_In=1.1 IN=1.1 Present_In=2.0 IN=2.0 Present_In=2.1 IN=2.1 Present_In=3.0 IN=3.0 Present_In=3.1 IN=3.1 Present_In=3.2 IN=3.2 Present_In=4.0 IN=4.0 Present_In=4.1 IN=4.1 Present_In=5.0 IN=5.0 Present_In=5.1 IN=5.1 Present_In=5.2 IN=5.2 Present_In=6.0 IN=6.0 SC=Zyyy Script=Zyyy Sentence_Break=CL Sentence_Break=Close SB=CL Word_Break=MB Word_Break=MidNumLet WB=MB _Case_Ignorable _X_Begin 

The OP wants quoted strings that do *not* have a colon before them. Also, it should probably be (?: (?! \1) .)+ :) – Kobi


@Kobi: Oops, thanks. Better now? :) – tchrist


Actually, not quite :) If there is more than one space, it can match starting from the second one (because the first is not a colon). Also, \s+ requires at least one space, but that one is easy to fix. Interesting quotes, though... – Kobi

2

Here you go, I think: http://rubular.com/r/hFylsgU3OT

^[^:]*"(.*?)"$ 

By the way, this is the perfect way to ask a regex question: examples, a link, and clear instructions.
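For reference, a small Ruby sketch of how that pattern behaves (`SIMPLE_QUOTED` and `quoted_text` are hypothetical names, and the last sample line is invented). Note that `[^:]*` rejects a colon anywhere between the start of the line and the opening quote, not only one directly before it, and `$` requires the closing quote to end the line:

```ruby
# The pattern from this answer; group 1 holds the text between the quotes.
SIMPLE_QUOTED = /^[^:]*"(.*?)"$/

# Hypothetical helper: the quoted text, or nil when the line is rejected.
def quoted_text(line)
  m = SIMPLE_QUOTED.match(line)
  m && m[1]
end

[
  '"Take off, hoser!"',
  'Dorothy Parker: "Brevity is the soul of lingerie."',
  %q{Larry Wall said, "I don't know"},
  'Note: he said, "fine"',   # invented sample: the colon is far from the quote
].each do |line|
  puts "#{quoted_text(line).inspect} <- #{line}"
end
```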


Thanks, but I realized I forgot a test: the text should be skipped only if the colon is right before the quotes (something like :[\s*]): http://rubular.com/r/NtbcgGqX4h – krn
