Summary: | Can't open banks with non-ascii characters in path | ||
---|---|---|---|
Product: | jlscp | Reporter: | Nikita Zlobin <cook60020tmp> |
Component: | jlscp | Assignee: | Grigor Iliev <gr.iliev> |
Status: | ASSIGNED --- | ||
Severity: | major | CC: | cuse |
Priority: | P5 | ||
Version: | SVN Trunk | ||
Hardware: | PC | ||
OS: | Linux | ||
Attachments: | jlscp-fix-utf8-escaping.diff |
Description
Nikita Zlobin
2013-12-25 13:45:14 CET
Would you please check whether it works for you when using QSampler instead of JSampler/Fantasia?

For now I have linuxsampler-2.1.0.svn8, qsampler 9.1 and jsampler 0.9. I have to note that both qsampler and jsampler fail to load by filename. But when I tried in the lscp shell, it was able to load the file with Cyrillic in its name, though the command GET CHANNEL INFO 0 displays non-ASCII characters as escape codes rather than UTF-8 text.

I tried to rebuild the linuxsampler stuff from fresh sources (I use live ebuilds in Gentoo), but my gcc is old by now (7.4.0), and it says that min C++14 is required. I will look into whether I can update qsampler without a linuxsampler update.

More recent info. It's not just about passing the instrument file path, but about all non-ASCII communication with linuxsampler. For qsampler it's visible in the channel creation dialog, where device names with non-ASCII chars (all Russian) are displayed as \xXX.

It's possible to pass UTF-8 both in directly typed or pasted form and by replacing non-ASCII characters with C-like escape sequences (still in quotes, of course). For example:

gig file: "/home/nick87720z/Музыка/Yamaha C7.gig"

lscp shell input:

    CREATE AUDIO_OUTPUT_DEVICE JACK
    CREATE MIDI_INPUT_DEVICE ALSA
    ADD CHANNEL
    LOAD INSTRUMENT NON_MODAL "/home/nick87720z/Музыка/Yamaha C7.gig" 0 0

Now:

    lscp=# GET CHANNEL INFO 0
    ENGINE_NAME: GIG
    VOLUME: 1.000
    AUDIO_OUTPUT_DEVICE: 0
    AUDIO_OUTPUT_CHANNELS: 2
    AUDIO_OUTPUT_ROUTING: 0,1
    MIDI_INPUT_DEVICE: 0
    MIDI_INPUT_PORT: 0
    MIDI_INPUT_CHANNEL: 0
    INSTRUMENT_FILE: /home/nick87720z/\xd0\x9c\xd1\x83\xd0\xb7\xd1\x8b\xd0\xba\xd0\xb0/Yamaha\x20C7.gig
    INSTRUMENT_NR: 0
    INSTRUMENT_NAME: A\x27 Yamaha C7 \x2716 (Up+ Rel)
    INSTRUMENT_STATUS: 100
    MUTE: false
    SOLO: false
    MIDI_INSTRUMENT_MAP: NONE

This sequence is correct. The following instrument command is OK as well:

    LOAD INSTRUMENT NON_MODAL "/home/nick87720z/\xd0\x9c\xd1\x83\xd0\xb7\xd1\x8b\xd0\xba\xd0\xb0/Yamaha\x20C7.gig" 0 0

Now about the frontends. Right at the attempt to load the instrument, linuxsampler logs the following.

JSampler:

    Scheduling '/home/nick87720z/C7K:0/Yamaha C7.gig' (Index=0) to be loaded in background (if not loaded yet).
    Loading gig file '/home/nick87720z/C7K:0/Yamaha C7.gig'...gig::Engine error: Failed to load instrument, cause: Can't open "/home/nick87720z/C7K:0/Yamaha C7.gig": No such file or directory

QSampler:

    Scheduling '/home/nick87720z/' (Index=0) to be loaded in background (if not loaded yet).
    Loading gig file '/home/nick87720z/'...gig::Engine error: Failed to load instrument, cause: Not a RIFF file

I wonder: is it really not possible to make LSCP communication work with different encodings? UTF-8 now seems to be the standard, at least for terminals. Moreover, it is ASCII-compatible. If making it the default might break existing clients, there could still be an LSCP command to set the encoding. A faster way would be to change the GUIs to interpret \xXX sequences in both directions.

Also, about the lscp shell behavior when pasting Russian text: UTF-8 chars are displayed by the terminal correctly, but the cursor shifts by multiple positions, obviously counting bytes. And backspace erases single bytes instead of entire UTF-8 sequences.

The last attempt was with fresh qsampler and liblscp from git.

I guess I'm ready to try to fix it. At first look, it's already supposed to support escaping and pass UTF-8 to LS.

Although the problem is wider, this is enough to fix loading: https://gitlab.com/rncbc/qsampler/-/merge_requests/1

It seems linuxsampler accepts UTF-8 without problems. The test with the lscp shell is not the only example. The Qsampler code uses lscpEscapePath() in two liblscp calls, lscp_load_instrument_non_modal() and lscp_map_midi_instrument(), right inside the argument.
Although only the first case is ours here, I edited both in this way (from git diff output):

    diff --git a/src/qsamplerChannel.cpp b/src/qsamplerChannel.cpp
    index 1a5c8bf..49326e3 100644
    --- a/src/qsamplerChannel.cpp
    +++ b/src/qsamplerChannel.cpp
    @@ -224,8 +224,7 @@ bool Channel::loadInstrument ( const QString& sInstrumentFile, int iInstrumentNr
         if (::lscp_load_instrument_non_modal(
                 pMainForm->client(),
    -            qsamplerUtilities::lscpEscapePath(
    -                sInstrumentFile).toUtf8().constData(),
    +            sInstrumentFile.toUtf8().constData(),
                 iInstrumentNr, m_iChannelID
             ) != LSCP_OK) {
             appendMessagesClient("lscp_load_instrument");
    diff --git a/src/qsamplerInstrument.cpp b/src/qsamplerInstrument.cpp
    index 7beb109..c0bfb21 100644
    --- a/src/qsamplerInstrument.cpp
    +++ b/src/qsamplerInstrument.cpp
    @@ -196,8 +196,8 @@ bool Instrument::mapInstrument (void)
         if (::lscp_map_midi_instrument(pMainForm->client(), &instr,
                 m_sEngineName.toUtf8().constData(),
    -            qsamplerUtilities::lscpEscapePath(
    -                m_sInstrumentFile).toUtf8().constData(),
    +
    +            m_sInstrumentFile.toUtf8().constData(),
                 m_iInstrumentNr,
                 m_fVolume,
                 load_mode,
                 m_sName.toUtf8().constData()) != LSCP_OK) {
             pMainForm->appendMessagesClient("lscp_map_midi_instrument");

Note: when the lscpEscapePath() call is in place, toUtf8() could be replaced by e.g. toLatin1() without breakage (I guess). But even without escaping, it just works. Tracing down into the liblscp code, I found no more conversions until the send() call (sys/socket.h). The escaping requirement comes from LSCP 1.2, as stated in the comments in the lscpEscapePath() code.

I forgot this bug was about jsampler, not qsampler. Instead of fixing jsampler, I was making a fix for qsampler. The merge request above now fixes all qsampler communication; it is just waiting for rncbc's attention.

I'm trying to analyse jsampler and jlscp to find where (un)escaping happens. For now it seems to be defined in jlscp (Parser.java). I still can't find how it translates binary data to LSCP escapes.

My first suggestion. UTF-16 is the standard string encoding in Java.
It had better be translated to the system encoding (or UTF-8, which is what Linux uses now) _before_ any attempt to translate. In Qt, the QString::toUtf8() method produces a QByteArray object. It could be done the same way in Java; I'm still digging (this is my first attempt to dig into a Java project). I see a number of .getBytes("US-ASCII") calls in both jsampler and jlscp. It may be really good to call getBytes() without an argument before escaping.

I see you have been working on a fix for QSampler on the same issue (bug #314), thanks! Do you have plans for a fix for this issue on Fantasia/JSampler as well? If not, I will still leave this report open for some time, but will eventually close it as WONTFIX then, as there was nobody actively working on Fantasia/JSampler for many years now.

I really tried, but I don't have enough experience coding Java, as I never really coded in it. I only have a little of it from university lessons. From a first look I only understood that it's different from qsampler. I guess I could try sometime in the future unless someone else fixes it. For now that's the most precise thing I can tell.

Ok, I understand Nikita. For anyone that might be interested in looking at this issue: Java does not seem to have built-in translation of escape sequences (e.g. in a convenient way with String.getBytes(...)). So probably the best way to handle this in JSampler/Fantasia would be to use a regular expression to translate the incoming data from the sampler, replacing all occurrences of escape sequences by the respective Unicode characters. Then for the other way around, i.e. converting a Unicode string from JSampler/Fantasia to be sent out to the sampler: maybe just convert the Java String object into a Character object array, which can deal with Unicode for each character:

    String s = ... ;
    ...
    Character[] charObjectArray = ArrayUtils.toObject(s.toCharArray());

and finally assemble a Java string from that array where each Unicode character is replaced by an escape sequence instead.
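The two directions discussed above can be sketched roughly as follows. This is only an illustration, not the jlscp code: the class and method names are hypothetical, the choice of which bytes to escape (backslash, quotes, and everything outside printable ASCII) is an assumption, and only the \xHH form seen in the GET CHANNEL INFO output earlier in this thread is handled (no \n, \t or octal escapes). The key point is that escaping works on the UTF-8 bytes of the string, and unescaping collects bytes first and decodes them as UTF-8 only at the end.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of jlscp or JSampler.
class LscpEscapeSketch {

    // Outgoing: String -> UTF-8 bytes -> ASCII-only text with \xHH escapes.
    static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            if (v == '\\' || v == '"' || v == '\'') {
                out.append('\\').append((char) v);        // assumed quoting rule
            } else if (v < 0x20 || v > 0x7E) {
                out.append(String.format("\\x%02x", v));  // one escape per byte
            } else {
                out.append((char) v);
            }
        }
        return out.toString();
    }

    // Incoming: resolve \xHH into raw bytes, then decode the buffer as UTF-8.
    static String unescape(String escaped) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int n = escaped.length();
        int i = 0;
        while (i < n) {
            if (escaped.charAt(i) == '\\' && i + 1 < n) {
                if (escaped.charAt(i + 1) == 'x' && i + 3 < n) {
                    int hi = Character.digit(escaped.charAt(i + 2), 16);
                    int lo = Character.digit(escaped.charAt(i + 3), 16);
                    if (hi >= 0 && lo >= 0) {
                        bytes.write((hi << 4) | lo);
                        i += 4;
                        continue;
                    }
                }
                i++; // drop the backslash; the next character is copied as-is
            }
            int cp = escaped.codePointAt(i);
            byte[] raw = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
            bytes.write(raw, 0, raw.length);
            i += Character.charCount(cp);
        }
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The Cyrillic directory name is written with \u escapes so that the
        // source file encoding does not matter.
        String path = "/home/nick87720z/\u041c\u0443\u0437\u044b\u043a\u0430/Yamaha C7.gig";
        String wire = escape(path);
        System.out.println(wire);
        System.out.println(unescape(wire).equals(path));
    }
}
```

Decoding byte by byte into chars, instead of decoding the collected buffer once, would turn each two-byte UTF-8 sequence into two separate characters, which may be why a char-based unescape is not enough here.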

Tried once more; this time it was much simpler, as I just used the Eclipse IDE. The problem is really not different from the one in qsampler. The Java String class is locked to UTF-16 by design, and StringBuffer seems to have the same encoding, the only difference being that it's not immutable. This confusion seems to be common: I found an article which assumes that a String object can be recreated in a different encoding via an intermediate byte[] array (according to the docs, the charset argument in String(byte[], charset) means the incoming encoding, while a String can't be anything other than UTF-16).

For now I got input parsing (from the server) working; here is the svn diff output (the real conversion code is in jlscp, not jsampler itself):

    =================================================
    Index: src/org/linuxsampler/lscp/Parser.java
    ===================================================================
    --- src/org/linuxsampler/lscp/Parser.java (revision 3905)
    +++ src/org/linuxsampler/lscp/Parser.java (working copy)
    @@ -617,7 +617,8 @@
         public static String
         toNonEscapedString(Object obj) {
             String s = obj.toString();
    -        StringBuffer sb = new StringBuffer();
    +        byte[] sb = new byte[s.length() + 1];
    +        int j = 0;
             for(int i = 0; i < s.length(); i++) {
                 char c = s.charAt(i);
                 if(c == '\\') {
    @@ -626,34 +627,34 @@
                         break;
                     }
                     char c2 = s.charAt(++i);
    -                if(c2 == '\'') sb.append('\'');
    -                else if(c2 == '"') sb.append('"');
    -                else if(c2 == '\\') sb.append('\\');
    -                else if(c2 == 'r') sb.append('\r');
    -                else if(c2 == 'n') sb.append('\n');
    -                else if(c2 == 'f') sb.append('\f');
    -                else if(c2 == 't') sb.append('\t');
    -                else if(c2 == 'v') sb.append((char)0x0B);
    +                if (c2 == '\'') sb[j++] = '\'';
    +                else if(c2 == '"') sb[j++] = '"';
    +                else if(c2 == '\\') sb[j++] = '\\';
    +                else if(c2 == 'r') sb[j++] = '\r';
    +                else if(c2 == 'n') sb[j++] = '\n';
    +                else if(c2 == 'f') sb[j++] = '\f';
    +                else if(c2 == 't') sb[j++] = '\t';
    +                else if(c2 == 'v') sb[j++] = (char)0x0B;
                     else if(c2 == 'x') {
    -                    Character ch = getHexEscapeSequence(s, i + 1);
    -                    if(ch != null) sb.append(ch.charValue());
    +                    byte ch = getHexEscapeSequence(s, i + 1);
    +                    if(ch != 0) sb[j++] = ch;
                         i += 2;
                     } else if(c2 >= '0' && c2 <= '9') {
    -                    Character ch = getOctEscapeSequence(s, i);
    -                    if(ch != null) sb.append(ch.charValue());
    +                    byte ch = getOctEscapeSequence(s, i);
    +                    if(ch != 0) sb[j++] = ch;
                         i += 2;
                     } else Client.getLogger().info("Unknown escape sequence \\" + c2);
                 } else {
    -                sb.append(c);
    +                sb[j++] = (byte)c;
                 }
             }
    -
    -        return sb.toString();
    +        sb[j] = 0;
    +        return new String(sb, java.nio.charset.StandardCharsets.UTF_8);
         }
    -    private static Character
    +    private static byte
         getHexEscapeSequence(String s, int index) {
    -        Character c = null;
    +        byte c = 0;
             if(index + 1 >= s.length()) {
                 Client.getLogger().info("Broken escape sequence");
    @@ -660,15 +661,15 @@
                 return c;
             }
    -        try { c = (char)Integer.parseInt(s.substring(index, index + 2), 16); }
    +        try { c = (byte)Integer.parseInt(s.substring(index, index + 2), 16); }
             catch(Exception x) { Client.getLogger().info("Broken escape sequence!"); }
             return c;
         }
    -    private static Character
    +    private static byte
         getOctEscapeSequence(String s, int index) {
    -        Character c = null;
    +        byte c = 0;
             if(index + 2 >= s.length()) {
                 Client.getLogger().info("Broken escape sequence");
    @@ -675,7 +676,7 @@
                 return c;
             }
    -        try { c = (char)Integer.parseInt(s.substring(index, index + 3), 8); }
    +        try { c = (byte)Integer.parseInt(s.substring(index, index + 3), 8); }
             catch(Exception x) { Client.getLogger().info("Broken escape sequence!"); }
             return c;

Created attachment 102
jlscp-fix-utf8-escaping.diff
I forgot that patches are better attached.
Here is a more complete one, for both the input and output streams.
For me this fixes the drum maps list, the recent filenames list in the instrument settings, and sending them to linuxsampler (I hope there's no duplicate code).
It is marked as a patch, but it may lack some specific info, since I generated it with svn diff from a working copy.
Setting the product to jlscp seems more logical, but jsampler is still affected, because it uses jlscp.jar from its own sources. For the change to take effect, the patched jlscp.jar must be copied into the jsampler sources before building.
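
As a small illustration of the remark earlier in this thread that the charset argument of String(byte[], charset) names the encoding of the incoming bytes, and of what .getBytes("US-ASCII") does to Cyrillic text, here is a minimal sketch. The class name is hypothetical and nothing here is taken from jlscp; it uses only standard java.nio.charset constants.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of jlscp or JSampler.
class CharsetDemo {
    // The directory name from this report, written with \u escapes so that
    // the source file encoding does not matter.
    static final String WORD = "\u041c\u0443\u0437\u044b\u043a\u0430";

    // Six Cyrillic letters take two bytes each in UTF-8.
    static byte[] utf8Bytes() {
        return WORD.getBytes(StandardCharsets.UTF_8);
    }

    // Decoding with the charset the bytes were written in restores the text.
    static String decodedAsUtf8() {
        return new String(utf8Bytes(), StandardCharsets.UTF_8);
    }

    // Decoding the same bytes as Latin-1 yields one wrong char per byte.
    static String decodedAsLatin1() {
        return new String(utf8Bytes(), StandardCharsets.ISO_8859_1);
    }

    // Encoding to US-ASCII replaces every unmappable character with '?'.
    static String throughAscii() {
        byte[] ascii = WORD.getBytes(StandardCharsets.US_ASCII);
        return new String(ascii, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(utf8Bytes().length);          // 12
        System.out.println(decodedAsUtf8().equals(WORD)); // true
        System.out.println(decodedAsLatin1().length());  // 12
        System.out.println(throughAscii());              // ??????
    }
}
```

In other words, the String itself never changes encoding; only the byte[] on either side of it does, so any escaping has to operate on the bytes produced with the intended charset.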