Skip to content

Conversation

@jmr
Copy link
Contributor

@jmr jmr commented May 7, 2020

In Agent, Key, and Query; add functions to set the string
using a std::string_view. Remove the const char * version.

Key::set_str(const char *) => set_str(string_view)
Query::set_str(const char *) => set_str(string_view)
Agent::set_key(const char *) => set_key(string_view)
Agent::set_query(const char *) => set_query(string_view)

Also add getters:

Key::str() -> string_view
Query::str() -> string_view

These are convenient because

  string s;
  agent.set_key(s.data(), s.length());

becomes

  string s;
  agent.set_key(s);

Setting with a const char * still works, since string_view has an
implicit const char * constructor.

Similarly,

  string_view(agent.key().ptr(), agent.key.length()) == x

becomes

  agent.key().str() == x

std::string_view is a C++17 feature, so all changes are guarded
with #if __cplusplus >= 201703L.

jmr added 3 commits May 7, 2020 07:37
In Agent, Key, and Query; add functions to set the string
using a std::string_view.  Remove the const char * version.

Key::set_str(const char *) => set_str(string_view)
Query::set_str(const char *) => set_str(string_view)
Agent::set_key(const char *) => set_key(string_view)
Agent::set_query(const char *) => set_query(string_view)

Also add getters:
Key::str() -> string_view
Query::str() -> string_view

These are convenient because
  string s;
  agent.set_key(s.data(), s.length());
becomes
  string s;
  agent.set_key(s);

Setting with a const char * still works, since string_view has an
implicit const char * constructor.

Similarly,
  string_view(agent.key().ptr(), agent.key.length()) == x
becomes
  agent.key().str() == x

std::string_view is a C++17 feature, so all changes are guarded
with #if __cplusplus >= 201703L.
jmr added 2 commits May 12, 2020 03:25
Although const could prevent bugs, it's inconsistent with other function
definitions, so remove it.
This is shorter and makes the implementation parallel to set_query.
@s-yata s-yata self-assigned this Jun 14, 2020
@s-yata
Copy link
Owner

s-yata commented Jun 17, 2020

I agree with that std::string_view version getters and setters are convenient.
However, const char * version setters should not be replaced.
If the argument type changes depending on __cplusplus, users will be confused.

For example, if libmarisa is built on C++17 or later, callers must be built on C++17 or later.
If there is a conflict, the behavior is undefined.

@jmr
Copy link
Contributor Author

jmr commented Jun 17, 2020

I agree with that std::string_view version getters and setters are convenient.
However, const char * version setters should not be replaced.
If the argument type changes depending on __cplusplus, users will be confused.

There shouldn't be any confusion. You can still pass a const char * in C++17 mode, since string_view can be implicitly constructed from const char *. I didn't remove any test cases, so you can verify that this still works.

Maybe I should add a comment like, "Don't worry, you can still pass const char *"?

For example, if libmarisa is built on C++17 or later, callers must be built on C++17 or later.
If there is a conflict, the behavior is undefined.

This is a good point. Shall I make it a configure option?

@s-yata
Copy link
Owner

s-yata commented Jun 17, 2020

How about adding accessors with new names, such as set_str_view()?
We can easily avoid conflict problems.

@jmr
Copy link
Contributor Author

jmr commented Jun 17, 2020

How about adding accessors with new names, such as set_str_view()?

To me, it seems nicer to have the set_str overloads. What's the advantage of having both set_str(const char *) and set_str_view(string_view)? Just set_str(string_view) will do everything and have slightly smaller binary size.

@glebm
Copy link
Contributor

glebm commented Jun 17, 2020

If I understand correctly, C++ doesn't have that kind of stdlib ABI compatibility. If Marisa was linked against C++17 stdlib, it is not guaranteed to work with C++11 stdlib even if new C++17 stuff isn't used. Am I wrong?

@jmr
Copy link
Contributor Author

jmr commented Jun 17, 2020

If we make MARISA_USE_STRING_VIEW a config option it will work and save a lot of thinking.

The same version of gcc with different -std= options are compatible: https://stackoverflow.com/a/49119902

I think if we did something like

#if __cplusplus >= 201703L
void set_str(string_view);
#endif
void set_str(const char *);
void set_str(const char *, size_t);

it will even work with both versions, since we're just adding a non-virtual function and not changing the representation. I'm not really sure, since I haven't thought about it enough, though.

@s-yata
Copy link
Owner

s-yata commented Jun 18, 2020

If Agent::set_query(const char *) is inline, I could accept this pull request without worry.

If libmarisa is built on C++17 or later, Agent::set_query(const char *) does not exist.
In this case, users must use C++17 or later, because Agent::set_query(const char *) is only declared.
I believe it will confuse users.

Always provide these for all C++ versions and only add
set_str(string_view) for C++17.

This should provide binary compatibility if libmarisa is built with C++17,
but could still lead to confusion if built with earlier standards,
then used in something built with C++17.
@jmr
Copy link
Contributor Author

jmr commented Jun 18, 2020

If Agent::set_query(const char *) is inline, I could accept this pull request without worry.

Why is it better if it's inline?

If libmarisa is built on C++17 or later, Agent::set_query(const char *) does not exist.
In this case, users must use C++17 or later, because Agent::set_query(const char *) is only declared.
I believe it will confuse users.

Hmm. I fixed this, but now if the user build libmarisa with C++14 or earlier, trying to use set_str(string_view) (with -std=c++17 or equivalent, of course) will just result in a confusing link error (I think).

Maybe MARISA_USE_STRING_VIEW is really best?

@s-yata
Copy link
Owner

s-yata commented Jun 18, 2020

Why is it better if it's inline?

I'm sorry that I could not explain it well.

I would like to show you an example.

#ifndef ECHO_H
#define ECHO_H

#include <string_view>

class Echo {
 public:
#if __cplusplus >= 201703L
  std::string_view echo(std::string_view s) {
    return s;
  }
#else  // __cplusplus >= 201703L
  const char *echo(const char *s);
#endif
};

#endif  // ECHO_H
#include "echo.h"

#if __cplusplus < 201703L
const char *Echo::echo(const char *s) {
  return s;
}
#endif  // __cplusplus < 201703L
// main.cc
#include <iostream>
#include "echo.h"
int main() {
  Echo echo;
  std::cout << echo.echo("Mike") << std::endl;
  return 0;
}

If echo.cc is compiled with C++17 and main.cc is compiled with C++14, the linker fails due to undefined reference as follows:

$ g++ -c -std=c++17 echo.cc
$ g++ -c -std=c++14 main.cc
$ g++ echo.o main.o
main.o: In function `main':
main.cc:(.text+0x26): undefined reference to `Echo::echo(char const*)'
collect2: error: ld returned 1 exit status

This is because Echo::echo(const char *) does not exist in echo.o.

If the definition of Echo::echo(const char *) is provided in echo.h (inline), this problem does not happen.

Maybe MARISA_USE_STRING_VIEW is really best?

I think it's not a good way.

@jmr
Copy link
Contributor Author

jmr commented Jun 18, 2020

That makes sense, thanks.

2139f49 no longer has that problem, since I only add functions, not delete them, but I'll do something like your echo but with inline.

jmr added a commit to jmr/marisa-trie that referenced this pull request Jun 18, 2020
@jmr
Copy link
Contributor Author

jmr commented Jun 18, 2020

See if #31 is ok, then I'll do the inline version as you suggest.

@jmr
Copy link
Contributor Author

jmr commented Jun 19, 2020

I think the current version of this PR (without #31 ) works, with a tiny disadvantage of larger binary size. It's equivalent to this version of echo:

echo.h

#include <string_view>

class Echo {
 public:
#if __cplusplus >= 201703L
  std::string_view echo(std::string_view s) {
    return s;
  }
#endif
  const char *echo(const char *s);
};

echo.cc

#include "echo.h"

const char *Echo::echo(const char *s) {
  return s;
}
$ g++ -c -std=c++17 echo.cc && g++ -c -std=c++14 main.cc && g++ echo.o main.o && ./a.out
Mike

Can we proceed with this and decide whether/how to make Agent::set_query inline later?

@s-yata
Copy link
Owner

s-yata commented Jun 20, 2020

Sorry for my late response.

It looks good to me.
Thank you!

@s-yata s-yata merged commit d853921 into s-yata:master Jun 20, 2020
s-yata added a commit that referenced this pull request Jun 21, 2020
@jmr jmr deleted the string-view branch May 20, 2025 15:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants